Somehow the mail from John did not get through, so here it is (if there is
an email issue, please mail me). Thanks also to everyone we had
discussions with beyond the Dagstuhl meeting...
------------------------------
*From:* John Bent <John.Bent(a)seagategov.com>
*Sent:* June 16, 2017, 22:30:23 CEST
*To:* "io-500(a)vi4io.org" <io-500(a)vi4io.org>
*Subject:* Detailed benchmark proposal
All,
Sorry for the long silence on the mailing list. However, we have made
substantial progress recently as we prepare for our ISC BOF next week. For
those of you at ISC, please join us from 11 to 12 on Tuesday in Substanz
1&2.
The progress that we have made recently happened because a bunch of us were
attending a German workshop last month at Dagstuhl and had multiple
discussions about the benchmark.
Here are the highlights of what was discussed and the progress we made
at Dagstuhl:
1. General agreement that the IOR-hard, IOR-easy, mdtest-hard,
mdtest-easy approach is appropriate.
2. We should add a ‘find’ command as this is a popular and important
workload.
3. The multiple bandwidth measurements should be combined via geometric
mean into one bandwidth.
4. The multiple IOPs measurements should also be combined via geometric
mean into one IOPs.
5. The bandwidth and the IOPs should be multiplied to create one final
score.
6. The ranking uses that final score but the webpage can be sorted using
other metrics.
7. The webpage should allow filtering as well so, for example, people
can look at only the HDD results.
8. We should separate the write/create phases from the read/stat phases
to help ensure that caching is avoided.
9. Nathan Hjelm volunteered to combine the mdtest and IOR benchmarks
into one git repo and has now done so. This removes the #ifdef mess from
mdtest, and both benchmarks now share the nice modular IOR backend.
So the top-level summary of the benchmark in pseudo-code has become:
# write/create phase
bw1 = ior_easy -write [user supplies their own parameters maximizing data
writes that can be done in 5 minutes]
md1 = md_test_easy -create [user supplies their own parameters maximizing
file creates that can be done in 5 minutes]
bw2 = ior_hard -write [we supply parameters: unaligned strided into single
shared file]
md2 = md_test_hard -create [we supply parameters: creates of 3900 byte
files into single shared directory]
# read/stat phase
bw3 = ior_easy -read [cross-node read of everything that was written in bw1]
md3 = md_test_easy -stat [cross-node stat of everything that was created in
md1]
bw4 = ior_hard -read
md4 = md_test_hard -stat
# find phase
md5 = [we supply parameters to find a subset of the files that were created
in the tests]
# score phase
bw = geo_mean( bw1 bw2 bw3 bw4 )
md = geo_mean( md1 md2 md3 md4 md5 )
total = bw * md
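The scoring step above can be sketched in Python. The helper and the
measurement values below are hypothetical and for illustration only; the
actual IO-500 harness defines its own inputs and units.

```python
import math

def geo_mean(values):
    """Geometric mean of a list of positive measurements."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical example measurements (bandwidths in GiB/s, metadata
# rates in kIOPS); real numbers come from the benchmark runs.
bw_results = [10.0, 2.0, 12.0, 3.0]          # bw1..bw4
md_results = [50.0, 9.0, 60.0, 11.0, 40.0]   # md1..md5 (md5 = find rate)

bw = geo_mean(bw_results)
md = geo_mean(md_results)
total = bw * md  # the single final score used for ranking
```

The geometric mean keeps one pathologically good (or bad) sub-result from
dominating the score the way an arithmetic mean would, which is why it was
chosen to combine the easy and hard measurements.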
Now we are moving on to precisely defining what the parameters should look
like for the hard tests and to creating a standard so that people can start
running the benchmark on their systems. In doing so, we will define the
formal process needed to make this an official benchmark. Please see the
attached file, in which we've started precisely defining these parameters.
Let's start iterating on this file to get these parameters correct.
Thanks,
John
--
Dr. Julian Kunkel
Abteilung Forschung
Deutsches Klimarechenzentrum GmbH (DKRZ)
Bundesstraße 45a • D-20146 Hamburg • Germany
Phone: +49 40 460094-161
Fax: +49 40 460094-270
E-mail: kunkel(a)dkrz.de
URL: http://www.dkrz.de
Geschäftsführer: Prof. Dr. Thomas Ludwig
Sitz der Gesellschaft: Hamburg
Amtsgericht Hamburg HRB 39784