Dear all,
I have now integrated the current discussion points into the document
(I hope I have not forgotten anything important).
It also includes a lightweight way of voting for an alternative.
1) I would not want to discuss details of the roadmap further; it is
now also described in the document. There is no need to actually have
(all) results ready during ISC. A few numbers and some first experience
would be great to present, to foster further discussion and validate the
results of the discussion.
2) Based on the discussion, I expect that we now go with a systematic
approach to (finally) define the benchmarks. If you are against this,
just enter your name into the document and we can continue discussing
it.
3) For now, we will also not discuss the specific benchmark tool
further; we will focus on issues/concerns and identify the patterns to
benchmark. There are quite a few issues with respect to alternative
storage architectures and storage APIs to resolve, and as far as I can
see, everybody is willing to make sure we define benchmarks that are
useful for most (if not all) technologies/interfaces. Why not make a
subset of the benchmarking patterns useful for different Big Data
solutions as well? And how do we make sure byte-addressable storage
works well?
4) Since I have not heard anything to the contrary, I expect we focus
on end-user-observable performance, which covers middleware, software,
hardware, and so on. If this is wrong, again, add your name in the doc.
Feel free to comment on and/or modify the document as you see fit.
Thanks & regards,
Julian
2016-11-26 1:58 GMT+01:00 Oral, H. Sarp <oralhs(a)ornl.gov>:
Just a reminder: running an I/O benchmark on large-scale systems
requires a lot of logistics in terms of scheduling downtime, securing resources/people,
and parsing through some amount of red tape, so let’s plan accordingly. Two months
(Apr-Jun) might not be enough to populate the list.
Back to John’s multiple-choice question; my suggestion is to first identify the
audience/consumer for the list. If it is the system designers/vendors/managers, then
IOR/mdtest will be sufficient. If it is the end users (i.e., app teams), then it is a bit
more complicated, but I still believe IOR/mdtest (or a version of them) will suffice with
some modifications.
Just my two cents.
Thanks,
Sarp
--
Sarp Oral, PhD
National Center for Computational Sciences
Oak Ridge National Laboratory
oralhs(a)ornl.gov
865-574-2173
On 11/23/16, 8:30 PM, "IO-500 on behalf of Andreas Dilger"
<io-500-bounces(a)vi4io.org on behalf of adilger(a)dilger.ca> wrote:
On Nov 23, 2016, at 7:58 AM, John Bent <John.Bent(a)seagategov.com> wrote:
>
> Upon further reflection, there is no need to be hasty. My concern is simply
interminable discussion. Bounded discussion OTOH sounds great. How about:
>
> 1. Open discussion from now until Jan 31 to define goals, identify concerns,
etc.
> 2. In Feb, people can submit benchmark proposals. Two months to discuss and
refine these.
> 3. If consensus is not reached by April 1, vote.
> 4. Survey the top 100 sites using the chosen benchmark proposal.
It would be good to have this ready in time for ISC'17 so that we can at least
have a bunch of sites submit results in advance.
Whether we get buy-in from all of the Top 100 sites isn't critical.  I think
anyone should be able to submit results, since sites with fast storage may not be the same
as sites with fast compute, and having some exposure at ISC'17 will make it more
visible and increase participation for SC'17.
Cheers, Andreas
> On Nov 23, 2016, at 12:37 AM, John Bent <John.Bent(a)seagategov.com> wrote:
>
>> PLFS will do horribly in this test because it will completely fail IOR
hard, and it will do horribly on both mdtest hard and easy.  If someone fixes it (like
maybe the Mohror burst-fs stuff), then that seems like fair game.
>>
>> I think we’re at a bit of an impasse in terms of whether to use IOR and
mdtest or try to create new and better benchmarks. I fear that an attempt to create new
and better benchmarks is doomed to failure. We can never make everyone happy. Dongarra
went with Linpack. It’s universally reviled yet Top 500 is massively successful.
>>
>> Now, here’s a question: do we want to be successful or do we want to advance
science? Because maybe they are different things. Has Top 500 with an imperfect
benchmark advanced science?
>>
>> In terms of our impasse, how about we spend six weeks trying to define a
more well-liked benchmark? If we fail, we go ahead with IOR and mdtest. How do we
determine failure/success? I guarantee we won’t find consensus. Maybe six weeks to
discuss and then we submit candidates and vote?
>>
>> I also like my idea of surveying the Top 100. I suspect if we propose IOR
hard, IOR easy, mdtest hard, mdtest easy to them today, we will get 30 of them that say
they will do it. I suspect if we spend six months discussing and then survey them again
with whatever we agree upon, we will get 30 of them that say they will do it. And I think
30 is enough that we should do it because 30 is large enough that the rest will follow.
Heck, I think ORNL alone is enough that the rest will follow.
>>
>> But if our goal is not just to succeed, but actually to advance science then
I’m willing to spend six weeks and see if we converge on something better than IOR and
mdtest. My prediction (which I think Andreas shares)? We won’t.
>>
>> Sarp and others who haven’t yet spoken, can you please weigh in on whether:
>>
>> A. We should survey immediately with IOR and mdtest
>> B. We should spend six weeks trying to find something better
>> C. Some other path?
>>
>> By the way, in terms of how well it would advance science, I liked Dean’s
suggestion that we change the benchmark every year.  But it absolutely terrifies me as a
practical matter.  This discussion every year?  Plus, I think sites are less likely to
participate if they have to learn how to run, and tune for, a different benchmark every
year.
>>
>> Also, replies to Jay inline:
>>
>>> On Nov 22, 2016, at 5:03 PM, Lofstead, Gerald F II
<gflofst(a)sandia.gov> wrote:
>>>
>>> Sarp: Can you share any of the materials from your previous effort?
>>>
>>> I have a few other comments to add in:
>>>
>>> 1. A better metadata testing tool is a great idea. Let’s focus on
forward-looking tools rather than clinging to old tools. My concern is how well we can
avoid “gaming” the new tool. mdtest is well understood and can probably be controlled
for.
>> If mdtest hard and mdtest easy as we’ve discussed can be gamed, then any
other benchmark can as well. But if someone “games” to do well with mdtest hard and
mdtest easy, then I do think that sets bounds for user expectations of metadata
performance.
>>
>>> 2. We have had a lot of discussion of moving to object storage because
we don’t have a choice. The vendors are addressing the needs of 95% of their customers. I
don’t think IOR is a fair test of this. It ends up being a test of the mapping from the
data structures to objects. For example, using something like PLFS will be a HUGE
advantage for “fixing” the IO to be more object oriented no matter the IO API/middleware
limitations. In essence, you could cheat trivially.
>> As I discussed above, PLFS would fail horribly.
>>
>>> 3. By doing mdtest and IOR separately, we are decoupling the two.
Striping issues that hit the metadata server are part of the file creation AND IO
performance issues. Do we want to combine these in a more direct test somehow?
>> Maybe we should do IOR hard, IOR easy, mdtest hard, mdtest easy
independently as four measurements. Then run all four concurrently as a fifth
measurement?
>>
>>> 4. How much of what we are testing is intended to be the hardware vs.
the storage software layer (e.g., Lustre) vs. the middleware (MPI-IO + PLFS vs. ADIOS +
BP) vs. the IO API (HDF/NetCDF vs. ADIOS vs. POSIX vs. MPI-IO)? Testing at each of these
levels makes a lot of sense and has different value for different audiences. I’d argue
that all that matters is the top-level test since we are trying to support applications.
If they do N-1 files, unless the system ALWAYS uses PLFS, it should suffer the stack’s
performance characteristics. Doing something “simple” at a lower layer does not represent
what end users care about: what IO performance can I expect? I think IOR can do a lot of
it, but it isn’t a complete solution.
>> I agree that all that matters is the top-level test since we are trying to
support apps. I’d phrase it as ‘we are trying to help apps predict their performance’ as
this was a comment made at the BoF by someone whose name I sadly do not know.
>>
>>> 5. How do we deal with burst buffers in their various incarnations? Do
we make rules about relative sizes of BB and main memory to decide if other storage
systems have to be considered? Is there a different metric, such as accessibility from
some external location, that determines what we want to benchmark? Is that fair, given
that many systems are being bought with a BB to hide that latency in the general case, on
the belief that there are sufficient IOPS and back-end bandwidth to drain without slowing
applications?
>>>
>> I like 5 minutes of sustained IO.  If the BB is large enough to get super
high bandwidth during 5 minutes, then I’m willing to believe it’s a good storage system.
Sure, someone might build a BB sized for 5 minutes and then not even bother to have a
second tier just because they want to win IO 500.  And they’ve done dumb stuff like that
for Top 500 too.  We can’t build a perfect benchmark.
>>
>> But . . . I’m willing to spend some time trying to build a good one.
Although, to be honest, I remain of the opinion that IOR and mdtest are already a good
one. I’ve heard good arguments against them but nothing sufficient to persuade me that we
can do any better. Maybe I just haven’t understood the arguments well enough.
>>
>>> There are tons more things to consider.
>> Agreed. So many that attempting to consider them all dooms us to inaction.
>>
>> Thanks,
>>
>> John
>>
>>>
>>> Best,
>>>
>>> Jay
>>>
>>> From: IO-500 <io-500-bounces(a)vi4io.org> on behalf of Julian Kunkel
<juliankunkel(a)googlemail.com>
>>> Date: Tuesday, November 22, 2016 at 12:49 AM
>>> To: John Bent <John.Bent(a)seagategov.com>
>>> Cc: "io-500(a)vi4io.org" <io-500(a)vi4io.org>
>>> Subject: [EXTERNAL] Re: [IO-500] Benchmark abstraction
>>>
>>> Dear All,
>>> I'm not *against* using IOR, but at this stage I rather favour a
clear separation between
>>> what metrics are useful to measure and why, and, in a second step,
>>> how they are measured.
>>>
>>> This also serves as validation that we are doing the right thing. I have
always found this useful when defining a test, and a benchmark is just a performance test
to me. The intended purpose not only helps communication but also prevents unintentional
optimization of systems.
>>>
>>> Again, I agree that IOR could be the vehicle, but I would hope the
community first agrees on the metrics before there are detailed discussions about
the tool.
>>>
>>> Regards
>>> Julian
>>>
>>>
>>> Am 21.11.2016 10:37 nachm. schrieb "John Bent"
<John.Bent(a)seagategov.com>:
>>>> Thanks Sarp! Some comments in-line.
>>>>
>>>>> On Nov 21, 2016, at 2:22 PM, Oral, H. Sarp
<oralhs(a)ornl.gov> wrote:
>>>>>
>>>>> Well, I agree with John that trying to define a new, all-around
benchmark is highly difficult. We tried that (and looked at a few other benchmarks at
the time) and failed. No need to repeat the same mistakes, I think.
>>>>>
>>>>> And I also agree that the benchmarks need to be simple and easy
to run and representative of realistic scenarios.
>>>>>
>>>>> Rather than limiting ourselves to two IOR instances, we can perhaps
increase the number slightly to cover more I/O workloads with IOR, if needed.
>>>>>
>>>>> By the way, we already have an IOR version that we integrated
with ADIOS. We can share it with the community. And IOR already supports HDF5 and MPI-IO.
Between POSIX and these mid-level libs, I think IOR covers a majority of the use cases.
The trick is coming up with good, canned command-line option sets for IOR covering various
I/O workloads.
>>>>>
>>>>> There is really nothing else today for measuring what mdtest measures,
as far as I know.
>>>>>
>>>> In terms of mdtest, when I speak of it, I have to admit that I’m
speaking of a theoretical future mdtest which does not yet exist.  IOR is beautifully
engineered with a fantastic plug-in feature, as you mention.  The mdtest I’m envisioning
takes mdtest.c from its current GitHub repository, moves it into the IOR GitHub
repository, and rewrites it to replace the POSIX calls with calls to the IOR plug-in
interface.  The plug-in interface is already almost a superset of what mdtest needs.  I
think only ‘stat’ needs to be added.  That way, when people add new plug-ins to IOR, they
will simultaneously add them to mdtest.  Also, for our benchmark, they’d simply pull and
‘make’ from a single repository.
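>>>>
>>>> To make the plug-in idea concrete, here is a minimal sketch of such a
>>>> function-pointer back-end table; all names and signatures below are
>>>> illustrative assumptions, not IOR's actual plug-in API:
>>>>
>>>>     /* Sketch of a function-pointer back-end table in the spirit of
>>>>      * IOR's plug-in layer; names and signatures are hypothetical. */
>>>>     #include <stdio.h>
>>>>     #include <fcntl.h>
>>>>     #include <unistd.h>
>>>>     #include <sys/stat.h>
>>>>
>>>>     typedef struct {
>>>>         const char *name;
>>>>         int (*create)(const char *path);    /* create a file/object */
>>>>         int (*stat_item)(const char *path); /* the call mdtest would add */
>>>>         int (*remove)(const char *path);    /* delete a file/object */
>>>>     } io_backend_t;
>>>>
>>>>     /* A POSIX implementation of the table. */
>>>>     static int posix_create(const char *path) {
>>>>         int fd = open(path, O_CREAT | O_WRONLY, 0644);
>>>>         return (fd < 0) ? -1 : close(fd);
>>>>     }
>>>>     static int posix_stat(const char *path) {
>>>>         struct stat sb;
>>>>         return stat(path, &sb);
>>>>     }
>>>>     static int posix_remove(const char *path) { return unlink(path); }
>>>>
>>>>     static io_backend_t posix_backend = {
>>>>         "POSIX", posix_create, posix_stat, posix_remove
>>>>     };
>>>>
>>>>     /* mdtest's inner loops would call through the table instead of
>>>>      * calling POSIX directly, so every new IOR plug-in would work
>>>>      * for mdtest as well. */
>>>>     int main(void) {
>>>>         io_backend_t *be = &posix_backend;
>>>>         if (be->create("testfile.0") == 0 &&
>>>>             be->stat_item("testfile.0") == 0 &&
>>>>             be->remove("testfile.0") == 0)
>>>>             printf("%s back end: create/stat/remove OK\n", be->name);
>>>>         return 0;
>>>>     }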
>>>>
>>>> Any volunteers? :)
>>>>
>>>> Here are what I believe to be the most recently maintained
repositories:
>>>>
https://github.com/MDTEST-LANL/mdtest
>>>>
https://github.com/IOR-LANL/ior
>>>>
>>>> I have to admit that I have not yet looked at md-real-io to do a
comparison. (sorry Julian, it is on my TODO list…)
>>>>
>>>>
>>>>>
>>>>> So, we are on board.
>>>>>
>>>> Fantastic. Our survey is now 1% complete. :)
>>>>
>>>> Thanks,
>>>>
>>>> John
>>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Sarp
>>>>>
>>>>> --
>>>>> Sarp Oral, PhD
>>>>>
>>>>>
>>>>>
>>>>> National Center for Computational Sciences
>>>>> Oak Ridge National Laboratory
>>>>> oralhs(a)ornl.gov
>>>>> 865-574-2173
>>>>>
>>>>>
>>>>>
>>>>> On 11/20/16, 12:33 PM, "IO-500 on behalf of Julian
Kunkel" <io-500-bounces(a)vi4io.org on behalf of juliankunkel(a)googlemail.com>
wrote:
>>>>>
>>>>> Dear John,
>>>>> I would definitely not go with mdtest. That one can be well
optimized by read-ahead/sync. Also, it is POSIX-only.
>>>>> Note that, to overcome the caching problem, I wrote the
md-real-io benchmark, which shares many things with mdtest.
>>>>> I would wait for the community feedback, not ignore that
concepts such as ADIOS may not necessarily fit as IOR back ends, and rather go with
abstract definitions first.
>>>>>
>>>>> Regards
>>>>> Julian
>>>>>
>>>>>
>>>>> Am 20.11.2016 6:12 nachm. schrieb "John Bent"
<John.Bent(a)seagategov.com>:
>>>>>
>>>>> Attempting to define the perfect IO benchmark is quixotic.
Those who dislike IO500 will always dislike IO500 regardless of what the specific
benchmark is.  Those who like the idea will accept an imperfect benchmark.
>>>>>
>>>>>
>>>>> Therefore, I suggest we move forward with the straw person
proposal: IOR hard, IOR easy, mdtest hard, mdtest easy.
>>>>>
>>>>> * Average IOR hard and IOR easy.  Average mdtest hard and
mdtest easy.
>>>>> * Their product determines the winner.
>>>>> * Don’t report the product since it’s a meaningless unit;
report the averages.
>>>>> * e.g., the winner of IO500 is TaihuLight with a score of 250
GB/s and 300K IOPS.
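>>>>>
>>>>> A worked example of this scoring rule (a sketch; all figures
>>>>> below are invented for illustration):
>>>>>
>>>>>     /* Average the two bandwidth numbers and the two metadata
>>>>>      * numbers; rank by their product, but report only the two
>>>>>      * averages. */
>>>>>     #include <stdio.h>
>>>>>
>>>>>     int main(void) {
>>>>>         double ior_hard = 180.0, ior_easy = 320.0; /* GB/s, invented */
>>>>>         double md_hard  = 200e3, md_easy  = 400e3; /* IOPS, invented */
>>>>>
>>>>>         double bw   = (ior_hard + ior_easy) / 2.0; /* 250 GB/s  */
>>>>>         double iops = (md_hard + md_easy) / 2.0;   /* 300K IOPS */
>>>>>         double key  = bw * iops; /* ordering only, never reported */
>>>>>
>>>>>         printf("score: %.0f GB/s and %.0fK IOPS (rank key %.3g)\n",
>>>>>                bw, iops / 1e3, key);
>>>>>         return 0;
>>>>>     }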
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Unless a proposal is strictly much better than IOR hard, IOR
easy, mdtest hard, mdtest easy, I don’t think we should consider it.  The beauty of IOR
hard, IOR easy, mdtest hard, mdtest easy is that they are well-understood, well-accepted
benchmarks that are trivial to download and compile, and whose results are
immediately understandable.  Every RFP in the world uses them.  The one problem is they
need a pithier name than “IOR hard, IOR easy, mdtest hard, mdtest easy”...
>>>>>
>>>>>
>>>>> My suggestion is to poll the top 100 of the top500.org
<http://top500.org> and ask them this:
>>>>>
>>>>>
>>>>> "If we were to do an IO500, and our benchmark was IOR
hard, IOR easy, mdtest hard, mdtest easy, would you participate? If not, would you
participate with a different benchmark?”
>>>>>
>>>>>
>>>>> If the bulk of the answers are “yes,” then we just figure out
how to organize and administer this thing.
>>>>> If the bulk of the answers are “no,” then we give up and do
something else.
>>>>> If the bulk of the answers are “no, yes,” then we need to
find a new benchmark.
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>> On Nov 20, 2016, at 7:11 AM, Julian Kunkel
<juliankunkel(a)googlemail.com> wrote:
>>>>>
>>>>> Dear all,
>>>>> based on our discussion during the BoF at SC, we could focus on the
access pattern(s) of interest first. Later we can define which benchmarks
(such as IOR) could implement these patterns (e.g., how to call existing
benchmarks).
>>>>>
>>>>> This strategy gives other I/O paradigms the option to create a
benchmark with that pattern that fits their I/O paradigm/architecture.
>>>>>
>>>>>
>>>>> Here is a draft of one that is probably not too difficult to
discuss:
>>>>> Goal: IOmax: Sustained performance for well-formed I/O
>>>>>
>>>>> Rationales:
>>>>> The benchmark shall determine the best sustained I/O behavior without
in-memory caching and I/O variability. A set of real applications that are
highly optimized should be able to show the described access behavior.
>>>>>
>>>>> Use case: A large data structure is distributed across N
threads/processes; a time series of this data structure shall be
stored/retrieved efficiently. (This could be a checkpoint.)
>>>>>
>>>>> Processing steps:
>>>>> S0) Each thread allocates and initializes a large consecutive memory
region of size S with a random (but well-defined) pattern
>>>>> S1) Repeat T times: each process persists/reads its data to/from the
storage. Each iteration is protected with a global barrier and the
runtime is measured
>>>>> S2) Compute the throughput (as IOmax) by dividing the total accessed
data volume (N*S) by the maximum observed runtime of any single
iteration in step S1
>>>>>
>>>>> Rules:
>>>>> R1) The data of each thread and timestep must be stored individually
and cannot be overwritten during a benchmark run
>>>>> R2) It must be ensured that the measured time includes all processing
needed to persist all data held in volatile memory (for writes), and
that prior to the start of the reads no data is cached in any volatile
memory
>>>>> R3) A valid result must verify that the read returns the expected
(random) data
>>>>> R4) N, T, and S can be set arbitrarily. T must be >= 3. The benchmark
shall be repeated several times
>>>>>
>>>>> Reported metrics:
>>>>> * IOmax
>>>>> * Working set size W: N*T*S
>>>>>
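>>>>> A minimal MPI sketch of steps S0-S2, assuming a POSIX back end and
>>>>> one file per rank per timestep; the read/verify phase (R3) and the
>>>>> full cache-defeating measures (R2) are omitted for brevity, and all
>>>>> names are illustrative:
>>>>>
>>>>>     /* Each rank writes its buffer T times, one file per rank per
>>>>>      * timestep (R1), with barriers around each iteration; IOmax is
>>>>>      * N*S divided by the slowest single iteration (S2). */
>>>>>     #include <mpi.h>
>>>>>     #include <stdio.h>
>>>>>     #include <stdlib.h>
>>>>>
>>>>>     #define S (64UL * 1024 * 1024) /* bytes per process, free (R4) */
>>>>>     #define T 3                    /* timesteps, must be >= 3 (R4) */
>>>>>
>>>>>     int main(int argc, char **argv) {
>>>>>         int rank, nprocs;
>>>>>         MPI_Init(&argc, &argv);
>>>>>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>>         MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
>>>>>
>>>>>         /* S0: allocate and fill with a reproducible random pattern */
>>>>>         char *buf = malloc(S);
>>>>>         srand(rank + 1);
>>>>>         for (size_t i = 0; i < S; i++) buf[i] = (char)rand();
>>>>>
>>>>>         double t_max = 0.0;
>>>>>         for (int t = 0; t < T; t++) {
>>>>>             /* S1: barrier, timed write of the buffer, barrier */
>>>>>             MPI_Barrier(MPI_COMM_WORLD);
>>>>>             double t0 = MPI_Wtime();
>>>>>             char path[64];
>>>>>             snprintf(path, sizeof(path), "iomax.%d.%d", rank, t);
>>>>>             FILE *f = fopen(path, "wb");
>>>>>             fwrite(buf, 1, S, f);
>>>>>             fclose(f); /* R2 would additionally require an fsync() */
>>>>>             MPI_Barrier(MPI_COMM_WORLD);
>>>>>             double dt = MPI_Wtime() - t0;
>>>>>             if (dt > t_max) t_max = dt;
>>>>>         }
>>>>>
>>>>>         /* S2: throughput over the slowest iteration (the barriers
>>>>>          * make per-rank timing a good proxy for the global time) */
>>>>>         if (rank == 0)
>>>>>             printf("IOmax = %.2f MB/s, W = %zu bytes\n",
>>>>>                    (double)nprocs * S / t_max / 1e6,
>>>>>                    (size_t)nprocs * S * T);
>>>>>
>>>>>         free(buf);
>>>>>         MPI_Finalize();
>>>>>         return 0;
>>>>>     }
>>>>>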
>>>>> Regards,
>>>>> Julian