On Nov 22, 2016, at 5:03 PM, Lofstead, Gerald F II <gflofst(a)sandia.gov> wrote:
Sarp: Can you share any of the materials from your previous effort?
I have a few other comments to add in:
1. A better metadata testing tool is a great idea. Let’s focus on forward-looking tools
rather than clinging to old tools. My concern is how well we can avoid “gaming” the new
tool. mdtest is well understood and can probably be controlled for.
2. We have had a lot of discussion of moving to object storage because we don’t have a
choice. The vendors are addressing the needs of 95% of their customers. I don’t think IOR is a
fair test of this. It ends up being a test of the mapping from the data structures to
objects. For example, using something like PLFS will be a HUGE advantage for “fixing” the
IO to be more object-oriented, no matter the IO API/middleware limitations. In essence, you
could cheat trivially.
I see this primarily as a customer benchmark and not a vendor benchmark. I suspect people
will always be able to cheat if they really want, so we should focus on providing useful
metrics to measure the performance of the storage. If there are optimizations like PLFS
that can be used to accelerate application performance, isn't that of benefit to the
users?
3. By doing mdtest and ior separately, we are decoupling the two.
Striping issues that hit the metadata server are part of the file creation AND IO
performance issues. Do we want to combine these in a more direct test somehow?
4. How much of what we are testing is intended to be the hardware vs. the storage
software layer (e.g., Lustre) vs. the middleware (MPI-IO + PLFS vs. ADIOS + BP) vs. IO API
(HDF/NetCDF vs. ADIOS vs. POSIX vs. MPI-IO)? Testing at each of these levels makes a lot
of sense and has different value to different audiences. I’d argue that all that matters
is the top level test since we are trying to support applications. If they do N-1 files,
unless the system ALWAYS uses PLFS, it should suffer the stack performance
characteristics. Doing something “simple” at a lower layer does not represent what end
users care about—what IO performance can I expect? I think IOR can do a lot of it, but it
isn’t a complete solution.
While benchmarks at the low level are useful for system analysis, they don't really
represent how users will interact with the storage. That is IMHO one reason why IOR is
attractive, since it can interact with different storage APIs in a similar manner to store
the same data.
5. How do we deal with burst buffers in their various incarnations?
Do we make rules about relative sizes of BB and main memory to decide if other storage
systems have to be considered? Is there a different metric such as accessible from some
external location that determines what we want to benchmark? Is that fair, since many
systems are being bought with a BB to hide that latency in the general case, believing that
there are sufficient IOPS and back-end bandwidth to drain without slowing applications?
I'd think that the BB storage should be benchmarked separately from the main storage system,
if there is one. Knowing the performance characteristics of both is quite useful.
One of the suggestions I heard was to include the total size of the storage system as one
of the components in the "single value" result (in TB or some other suitable
unit). Having a large fast storage system is probably more useful than having a small
system, since it won't run out of space before the user is finished writing their
data, and it will avoid games like using the RAM of 1/2 of client nodes to accept writes
for a few seconds at high speed but without enough capacity to hold much data. Not saying
that in itself isn't a useful activity in some cases (e.g. SCR can checkpoint into
peer client RAM), but that isn't (IMHO) as useful as a storage system with the
capacity of 30x RAM at the same speed.
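Purely as an illustration of folding capacity into a single value (no concrete
formula was proposed in this thread), a geometric mean of bandwidth, IOPS, and
capacity would reward a large, fast system while preventing a small RAM-backed
tier from winning on speed alone; all numbers below are invented:

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double gbps = 250.0, kiops = 300.0, tb = 50000.0;  /* invented */
        /* Cube root of the product: doubling any one dimension raises
         * the score by ~26%, so capacity counts but cannot dominate. */
        printf("single value: %.1f\n", cbrt(gbps * kiops * tb));
        return 0;
    }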
Cheers, Andreas
There are tons more things to consider.
Best,
Jay
From: IO-500 <io-500-bounces(a)vi4io.org> on behalf of Julian Kunkel
<juliankunkel(a)googlemail.com>
Date: Tuesday, November 22, 2016 at 12:49 AM
To: John Bent <John.Bent(a)seagategov.com>
Cc: "io-500(a)vi4io.org" <io-500(a)vi4io.org>
Subject: [EXTERNAL] Re: [IO-500] Benchmark abstraction
Dear All,
I'm not *against* using IOR, but at this stage I rather favour a clear separation
between *what* and *why* certain metrics are useful to measure and, in a second
step, *how* they are measured.
This also serves as validation that we do the right thing. I have always found this
useful when defining a test, and a benchmark is just a performance test to me. The intended
purpose helps not only in communication but also prevents unintentional optimization of
systems.
Again, I agree that IOR could be the vehicle, but I would hope the community first agrees
on the metrics before detailed discussions about the tool begin.
Regards
Julian
On Nov 21, 2016 at 10:37 PM, John Bent
<John.Bent(a)seagategov.com> wrote:
> Thanks Sarp! Some comments in-line.
>
>> On Nov 21, 2016, at 2:22 PM, Oral, H. Sarp <oralhs(a)ornl.gov> wrote:
>>
>> Well, I agree with John that trying to define a new, all-around benchmark is
highly difficult. We tried that (and looked at a few other benchmarks at the time) and
failed. No need to repeat the same mistakes, I think.
>>
>> And I also agree that the benchmarks need to be simple and easy to run and
representative of realistic scenarios.
>>
>> Rather than limiting ourselves to two IOR instances, we can perhaps increase that number slightly
to cover more I/O workloads with IOR, if needed.
>>
>> By the way, we already have an IOR version that we integrated with ADIOS. We can
share it with the community. And IOR already supports HDF5 and MPI-IO. Between POSIX and
these mid-level libraries, I think IOR covers a majority of the use cases. The trick is coming
up with good, canned command-line option sets for IOR covering various I/O workloads.
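For illustration only, two such canned sets might look like the lines below;
the flag values are placeholders rather than agreed-upon settings (the flags
themselves are standard IOR options: -a selects the API, -F means
file-per-process, -b/-t/-s set block, transfer, and segment sizes, -e fsyncs
after writes, -w/-r select the write/read phases, -o names the test file):

    # "easy": file-per-process, large aligned sequential transfers
    ior -a POSIX -F -e -w -r -b 16g -t 1m -o /scratch/ior.easy
    # "hard": single shared file, small interleaved strided transfers
    ior -a MPIIO -w -r -b 47k -t 47k -s 10000 -o /scratch/ior.hard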
>>
>> As far as I know, there is really nothing else besides mdtest for measuring metadata performance today.
>>
> In terms of mdtest, when I speak of it, I have to admit that I’m speaking of a
theoretical future mdtest which does not yet exist. IOR is beautifully engineered with a
fantastic plug-in feature as you mention. The mdtest I’m envisioning is taking mdtest.c
from its current GitHub repository, moving it into the IOR repository, and rewriting it to replace the
POSIX calls with calls to this IOR plug-in interface. The plug-in interface is already
almost a superset of what mdtest needs. I think only ‘stat’ needs to be added. That way,
when people add new plug-ins to IOR, they will simultaneously add them to mdtest. Also,
for our benchmark, they’d simply pull and ‘make’ from a single repository.
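A minimal sketch of that idea, with simplified stand-in types (the real
plug-in struct lives in IOR's src/aiori.h and differs in detail); the
create_phase helper below is hypothetical, not actual mdtest code:

    #include <stdio.h>
    #include <sys/stat.h>

    /* Stand-in types so the sketch is self-contained; in IOR these come
     * from src/aiori.h and src/ior.h. */
    typedef struct { int dummy; } IOR_param_t;

    typedef struct ior_aiori {
        char *name;
        void *(*create)(char *path, IOR_param_t *p);
        void *(*open)(char *path, IOR_param_t *p);
        void  (*close)(void *fd, IOR_param_t *p);
        void  (*delete)(char *path, IOR_param_t *p);
        /* the one addition mdtest would need, as noted above: */
        int   (*stat)(char *path, struct stat *buf, IOR_param_t *p);
    } ior_aiori_t;

    /* mdtest's file-create phase, recast against the plug-in interface
     * instead of hard-coded POSIX creat()/close(). */
    static void create_phase(const ior_aiori_t *backend,
                             IOR_param_t *param, int nfiles)
    {
        char path[256];
        for (int i = 0; i < nfiles; i++) {
            snprintf(path, sizeof(path), "mdtest.file.%d", i);
            void *fd = backend->create(path, param);  /* was creat() */
            backend->close(fd, param);                /* was close() */
        }
    }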
>
> Any volunteers? :)
>
> Here are what I believe to be the most recently maintained repositories:
>
https://github.com/MDTEST-LANL/mdtest
>
https://github.com/IOR-LANL/ior
>
> I have to admit that I have not yet looked at md-real-io to do a comparison. (sorry
Julian, it is on my TODO list…)
>
>
>>
>> So, we are on board.
>>
> Fantastic. Our survey is now 1% complete. :)
>
> Thanks,
>
> John
>
>
>>
>> Thanks,
>>
>> Sarp
>>
>> --
>> Sarp Oral, PhD
>>
>>
>>
>> National Center for Computational Sciences
>> Oak Ridge National Laboratory
>> oralhs(a)ornl.gov
>> 865-574-2173
>>
>>
>>
>> On 11/20/16, 12:33 PM, "IO-500 on behalf of Julian Kunkel"
<io-500-bounces(a)vi4io.org on behalf of juliankunkel(a)googlemail.com> wrote:
>>
>> Dear John,
>> I would definitely not go with mdtest. That one can easily be optimized for via
read-ahead / sync. Also, it is POSIX-only.
>> Note that for overcoming the caching problem, I wrote the md-real-io benchmark
that shares many things with mdtest.
>> I would wait for the community feedback and not ignore that concepts such as
ADIOS may not necessarily fit as IOR back ends; I would rather go with abstract definitions
first.
>>
>> Regards
>> Julian
>>
>>
>> On Nov 20, 2016 at 6:12 PM, John Bent
<John.Bent(a)seagategov.com> wrote:
>>
>> Attempting to define the perfect IO benchmark is quixotic. Those who dislike
IO500 will always dislike IO500 regardless of what the specific benchmark is. Those who
like the idea will accept an imperfect benchmark.
>>
>>
>> Therefore, I suggest we move forward with the straw person proposal: IOR hard,
IOR easy, mdtest hard, mdtest easy.
>>
>> * Average IOR hard and IOR easy. Average mdtest hard and mdtest easy.
>> * Their product determines the winner.
>> * Don’t report the product since it’s a meaningless unit; report the
averages.
>> * e.g. The winner of IO500 is TaihuLight with a score of 250 GB/s and 300K
IOPS.
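For concreteness, a small sketch of that scoring rule; the per-benchmark
inputs are invented, chosen only to reproduce the example numbers above:

    #include <stdio.h>

    int main(void) {
        /* hypothetical measurements for one system */
        double ior_easy = 300.0, ior_hard = 200.0;   /* GB/s */
        double md_easy  = 400e3, md_hard  = 200e3;   /* IOPS */

        double bw   = (ior_easy + ior_hard) / 2.0;   /* 250 GB/s  */
        double iops = (md_easy  + md_hard)  / 2.0;   /* 300K IOPS */
        double key  = bw * iops;  /* determines the winner */

        /* only the averages are reported, per the proposal */
        printf("score: %.0f GB/s and %.0fK IOPS (rank key %g, unitless)\n",
               bw, iops / 1e3, key);
        return 0;
    }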
>>
>>
>>
>>
>> Unless a proposal is strictly much better than IOR hard, IOR easy, mdtest
hard, mdtest easy, I don’t think we should consider it. The beauty of IOR hard, IOR easy,
mdtest hard, mdtest easy is that they are well-understood, well-accepted benchmarks that
are trivial to download and compile, and whose results are immediately
understandable. Every RFP in the world uses them. The one problem is they need a pithier
name than “IOR hard, IOR easy, mdtest hard, mdtest easy”...
>>
>>
>> My suggestion is to poll the top 100 of the top500.org list and ask them this:
>>
>>
>> "If we were to do an IO500, and our benchmark was IOR hard, IOR easy,
mdtest hard, mdtest easy, would you participate? If not, would you participate with a
different benchmark?”
>>
>>
>> If the bulk of the answers are “yes,” then we just figure out how to organize
and administer this thing.
>> If the bulk of the answers are “no,” then we give up and do something else.
>> If the bulk of the answers are “no, yes,” then we need to find a new
benchmark.
>>
>>
>> Thanks,
>>
>>
>> John
>>
>>
>> On Nov 20, 2016, at 7:11 AM, Julian Kunkel <juliankunkel(a)googlemail.com>
wrote:
>>
>> Dear all,
>> based on our discussion during the BoF at SC, we could focus on the
>> access pattern(s) of interest first. Later we can define which
>> benchmarks (such as IOR) could implement these patterns (e.g., how to
>> call existing benchmarks).
>>
>> This strategy gives other I/O paradigms the option to create a
>> benchmark with that pattern that fits their I/O paradigm/architecture.
>>
>>
>> Here is a draft of one that is probably not too difficult to discuss:
>> Goal: IOmax: Sustained performance for well-formed I/O
>>
>> Rationales:
>> The benchmark shall determine the best sustained I/O behavior without
>> in-memory caching and I/O variability. A set of real applications that
>> are highly optimized should be able to show the described access
>> behavior.
>>
>> Use case: A large data structure is distributed across N
>> threads/processes; a time series of this data structure shall be
>> stored/retrieved efficiently. (This could be a checkpoint.)
>>
>> Processing steps:
>> S0) Each thread allocates and initializes a large contiguous memory
>> region of size S with a random (but well-defined) pattern
>> S1) Repeat T times: Each process persists/reads its data to/from the
>> storage. Each iteration is protected with a global barrier and the
>> runtime is measured
>> S2) Compute the throughput (as IOmax) by dividing the total accessed
>> data volume (N*S) by the maximum observed runtime for any single
>> iteration in step S1
>>
>> Rules:
>> R1) The data of each thread and timestep must be stored individually
>> and cannot be overwritten during a benchmark run
>> R2) It must be ensured that the measured time includes everything
>> needed to persist all data from volatile memory (for writes) and that
>> no data is cached in any volatile memory prior to the start of reads
>> R3) A valid result must verify that read returns the expected (random) data
>> R4) N, T and S can be set arbitrarily. T must be >= 3. The benchmark
>> shall be repeated several times
>>
>> Reported metrics (a minimal sketch of the procedure follows below):
>> * IOmax
>> * Working set size W: N*T*S
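A minimal C/MPI sketch of steps S0-S2 for the write direction, assuming POSIX
as the storage interface and one file per process per timestep to satisfy R1;
the read direction, verification (R3), and repeated runs (R4) are omitted for
brevity:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <fcntl.h>

    #define SIZE_S  (64UL * 1024 * 1024)  /* S: per-process data size  */
    #define STEPS_T 3                     /* T: timesteps, >= 3 per R4 */

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* S0: allocate and fill a well-defined pseudo-random pattern */
        char *buf = malloc(SIZE_S);
        srand(rank + 1);                  /* reproducible per-rank seed */
        for (size_t i = 0; i < SIZE_S; i++)
            buf[i] = (char)rand();

        double max_time = 0.0;
        for (int t = 0; t < STEPS_T; t++) {
            char path[64];
            /* R1: a separate file per rank and timestep */
            snprintf(path, sizeof(path), "iomax.%d.%d", rank, t);

            /* S1: barrier-protected, timed write of the whole region */
            MPI_Barrier(MPI_COMM_WORLD);
            double t0 = MPI_Wtime();
            int fd = open(path, O_CREAT | O_WRONLY, 0644);
            if (write(fd, buf, SIZE_S) != (ssize_t)SIZE_S)
                perror("write");
            fsync(fd);                    /* R2: data must be persisted */
            close(fd);
            MPI_Barrier(MPI_COMM_WORLD);
            double dt = MPI_Wtime() - t0;
            if (dt > max_time)
                max_time = dt;
        }

        /* S2: IOmax = N*S divided by the slowest single iteration */
        double slowest;
        MPI_Reduce(&max_time, &slowest, 1, MPI_DOUBLE, MPI_MAX, 0,
                   MPI_COMM_WORLD);
        if (rank == 0)
            printf("IOmax = %.2f MB/s, working set W = %.2f GB\n",
                   (double)nprocs * SIZE_S / slowest / 1e6,
                   (double)nprocs * STEPS_T * SIZE_S / 1e9);

        free(buf);
        MPI_Finalize();
        return 0;
    }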
>>
>> Regards,
>> Julian
>
>
_______________________________________________
IO-500 mailing list
IO-500(a)vi4io.org
https://www.vi4io.org/cgi-bin/mailman/listinfo/io-500