I still think we are missing the mark a little. As someone building infrastructure,
streaming reads and writes plus completely random unaligned small-block I/O
don't really give me the information I want. We should also address the
configuration of error correction, because features like mirroring in the controllers or
stripe verification dramatically impact performance, and there's a history of abuse here.
If I were to grossly generalize, what we see is a write-dominated bimodal
distribution of random aligned 1 MiB I/Os mixed with a similar number of
4 KiB I/Os and a smattering in between. There are typically on the order of hundreds of
independent I/O streams, which makes identifying and managing/coalescing sequential
streams difficult. While we see some coalescing at the block layer, it's an area ripe
for improvement, and an I/O system that could do that would be most useful.
The proposed tests may not reward that.
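If it helps to make that concrete, here's a rough sketch (Python; the ratios and the
in-between bucket are assumptions from memory, not measurements) of the request-size
mix I'm describing:

    import random

    KIB = 1024
    # Bimodal request-size mix: 4 KiB and 1 MiB peaks plus a small
    # in-between bucket. The weights are illustrative assumptions.
    sizes = [4 * KIB, 128 * KIB, 1024 * KIB]
    weights = [0.45, 0.10, 0.45]

    def next_request_size():
        # Draw one aligned request size from the bimodal mix.
        return random.choices(sizes, weights)[0]

    # Order-hundreds of independent streams, each issuing random aligned
    # requests; sequential runs are rare and hard to coalesce downstream.
    streams = [[next_request_size() for _ in range(1000)]
               for _ in range(300)]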
It's also the case that I don't want to over-provision I/O, so we will come in well
below many sites. But even so, we have lots of unused streaming capability
and are mostly impacted by software design issues, either within Lustre or the application.
Some can be dealt with, like finding users aggressively stat'ing a
non-existent file in parallel; others are more embedded, like the geometry of a given
simulation not matching up with the underlying device structure as the
requests make their way through several layers of software such as the FORTRAN I/O
library, HDF, NetCDF, etc.
So I guess my concrete suggestion is that a mixed-workload number is most useful. That
number will be nowhere near the peak throughput or the completely
random figure, and I don't think it can be a derived quantity (i.e. a geometric mean);
it must be measured directly.
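To illustrate with made-up numbers why I say it must be measured:

    import math

    # Illustrative numbers only, not measurements from our systems.
    easy_bw = 100.0  # GB/s, streaming-friendly write
    hard_bw = 2.0    # GB/s, unaligned random write

    # A derived mixed number, e.g. the geometric mean of the two:
    derived = math.sqrt(easy_bw * hard_bw)  # ~14.1 GB/s
    print(f"derived mixed bw: {derived:.1f} GB/s")

    # A real mixed run is not bound to land there: interference between
    # the streams (seek amplification, lock contention, cache pollution)
    # can drag the measured number well below any derived figure.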
bob
-------------------------------------------------------------------------
Robert B. Ciotti Chief Systems Architect Supercomputing
NASA Advanced Supercomputing (NAS) Division TEL (650) 604-4408
NASA Ames Research Center FAX (650) 604-4377
P.O. Box 1, Moffett Field, CA 94035-0001 Bob.Ciotti(a)NASA.gov
-------------------------------------------------------------------------
On 06/16/2017 01:30 PM, John Bent wrote:
All,
Sorry for the long silence on the mailing list. However, we have made some substantial
progress recently as we prepare for our ISC BOF next week. For those of you at ISC,
please join us from 11 to 12 on Tuesday in Substanz 1&2.
The progress that we have made recently happened because a bunch of us were attending a
German workshop last month at Dagstuhl and had multiple discussions about the benchmark.
Here are the highlights of what was discussed and the progress that we made at Dagstuhl:
1. General agreement that the IOR-hard, IOR-easy, mdtest-hard, mdtest-easy approach is
appropriate.
2. We should add a ‘find’ command as this is a popular and important workload.
3. The multiple bandwidth measurements should be combined via geometric mean into one
bandwidth.
4. The multiple IOPs measurements should also be combined via geometric mean into one
IOPs.
5. The bandwidth and the IOPs should be multiplied to create one final score.
6. The ranking uses that final score but the webpage can be sorted using other metrics.
7. The webpage should allow filtering as well so, for example, people can look at only
the HDD results.
8. We should separate the write/create phases from the read/stat phases to help ensure
that caching is avoided.
9. Nathan Hjelm volunteered to combine the mdtest and IOR benchmarks into one git repo
and has now done so. This removes the #ifdef mess from mdtest, and now they both share
the nice modular IOR backend.
So the top-level summary of the benchmark in pseudo-code has become:
# write/create phase
bw1 = ior_easy -write [user supplies their own parameters maximizing data writes that can
be done in 5 minutes]
md1 = md_test_easy -create [user supplies their own parameters maximizing file creates
that can be done in 5 minutes]
bw2 = ior_hard -write [we supply parameters: unaligned strided into single shared file]
md2 = md_test_hard -create [we supply parameters: creates of 3900 byte files into single
shared directory]
# read/stat phase
bw3 = ior_easy -read [cross-node read of everything that was written in bw1]
md3 = md_test_easy -stat [cross-node stat of everything that was created in md1]
bw4 = ior_hard -read
md4 = md_test_hard -stat
# find phase
md5 = [we supply parameters to find a subset of the files that were created in the
tests]
# score phase
bw = geo_mean( bw1 bw2 bw3 bw4)
md = geo_mean( md1 md2 md3 md4 md5)
total = bw * md
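As a sanity check, here is the score phase as a small runnable Python sketch (the
measurement values below are placeholders; a real run would substitute the numbers
produced by the phases above):

    import math

    def geo_mean(values):
        # Geometric mean computed in log space for numerical safety.
        return math.exp(sum(math.log(v) for v in values) / len(values))

    # Placeholder measurements: bandwidths in GB/s, metadata in kIOPS.
    bw1, bw2, bw3, bw4 = 120.0, 3.5, 140.0, 4.0
    md1, md2, md3, md4, md5 = 50.0, 10.0, 200.0, 180.0, 90.0

    bw = geo_mean([bw1, bw2, bw3, bw4])
    md = geo_mean([md1, md2, md3, md4, md5])
    total = bw * md
    print(f"bw={bw:.1f}  md={md:.1f}  score={total:.1f}")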
Now we are moving on to precisely define what the parameters should look like for the
hard tests and to create a standard so that people can start running it on their systems.
By doing so, we will define the formal process so we can actually make this an official
benchmark. Please see the attached file, in which we’ve started precisely defining these
parameters. Let’s start iterating on this file to get them correct.
Thanks,
John