IOR has an option to allocate a certain amount of the host's memory. I suggest that we set this to 90-95 percent and require the total amount of data written to be at least twice the size of main memory. Otherwise, the 10+ PB of main memory in SUMMIT would make the list useless ;)
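If I remember the IOR options correctly, recent versions can hog a percentage of node memory from the command line. A hedged sketch of such an invocation (the flag spellings, sizes, process count, and output path are assumptions and should be checked against the IOR documentation for the version in use):

```shell
# Hypothetical IOR invocation; verify flags against your IOR version.
# -M 90%  : hog 90% of each node's memory so the page cache cannot absorb the run
# -b/-t   : block/transfer sizes chosen so total data written is ~2x node memory
# -e      : perform fsync() when the write phase closes the file
# -F      : one file per process
mpirun -np 64 ior -a POSIX -w -F -e -M 90% -b 16g -t 1m -o /scratch/ior.out
```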
If I read everything correctly, the current run rules define an execution time of 5 minutes and simply count the number of bytes/IOPS/files touched during this time. I agree that most of the time our users do I/O in bursts. But is the benchmark then basically only about "who can write the most data with one file per process in 5 minutes"? Why 5 minutes, and not "how long does it take to dump 80% of main memory to some redundant permanent storage" (with fsync())?
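A toy version of that alternative metric, timing a fixed durable write instead of counting bytes in a fixed window (the path and the 64 MiB size are placeholders; a real run would dump ~80% of node memory to the parallel file system):

```shell
# Time how long it takes to write a fixed volume and make it durable.
# /tmp/dump.bin and the 64 MiB size are illustrative placeholders only.
SECONDS=0
dd if=/dev/zero of=/tmp/dump.bin bs=1M count=64 conv=fsync status=none
echo "dumped 64 MiB durably in ${SECONDS}s"
rm -f /tmp/dump.bin
```

dd's conv=fsync physically syncs the output file before dd exits, so the elapsed time includes the flush, not just the buffered write.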
Do we want to define some rules about how safe the data has to be? Should it be OK if this data ends up in a single burst buffer with no copy anywhere else? I would recommend that results only count as valid if the data survives the failure of one of the storage devices used.
For example: for the mdtest workload I could imagine a file system that has directory locking turned off and uses an SSD/NVRAM backend, and would thus just behave like the "IOR hard" workload.
Another point is that I am more a fan of application-driven benchmarks. The numbers above do not tell me anything about my applications, so why should I actually run the benchmark? Just to be "on the list"?
Application-driven benchmarks (something like SPEC CPU, but a SPEC I/O) that scale with the machine (and with the machine's main memory) could actually become a standard that industry could also use to advertise their systems.
In addition, if we as a site have I/O patterns that are close to one of the benchmarks, we could put some weight on that benchmark and adjust our tenders accordingly, and the industry partners would know how to design the storage system with respect to our special requirements, simply because the I/O pattern is a standard and they know how to deal with it.
One more thing the current approach does not deal with at all: in the very near future, applications will access permanent storage through interfaces that IOR does not cover, storing data with plain CPU store (mov) instructions. Thus, if the list is established using some combination of IOR+mdtest+POSIX, I think it has no chance of reflecting the really fast I/O subsystems that are coming, like http://pmem.io/
Sorry for the lengthy statement …
Regards, Michael
--
Dr.-Ing. Michael Kluge
Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany
Contact:
Falkenbrunnen, Room 240
Phone: (+49) 351 463-34217
Fax: (+49) 351 463-37773
e-mail: michael.kluge@tu-dresden.de
From: IO-500 [mailto:io-500-bounces@vi4io.org] On Behalf Of John Bent
Sent: Friday, June 16, 2017 22:30
To: io-500@vi4io.org
Subject: [IO-500] Detailed benchmark proposal
All,
Sorry for the long silence on the mailing list. However, we have made some substantial progress recently as we prepare for our ISC BOF next week. For those of you at ISC, please join us from 11 to 12 on Tuesday in Substanz 1&2.
The recent progress happened because a bunch of us attended a workshop at Dagstuhl in Germany last month and had multiple discussions about the benchmark.
Here are the highlights of what was discussed and the progress we made at Dagstuhl:
So the top-level summary of the benchmark in pseudo-code has become:
# write/create phase
bw1 = ior_easy -write [user supplies their own parameters maximizing data writes that can be done in 5 minutes]
md1 = md_test_easy -create [user supplies their own parameters maximizing file creates that can be done in 5 minutes]
bw2 = ior_hard -write [we supply parameters: unaligned strided into single shared file]
md2 = md_test_hard -create [we supply parameters: creates of 3900 byte files into single shared directory]
# read/stat phase
bw3 = ior_easy -read [cross-node read of everything that was written in bw1]
md3 = md_test_easy -stat [cross-node stat of everything that was created in md1]
bw4 = ior_hard -read
md4 = md_test_hard -stat
# find phase
md5 = [we supply parameters to find a subset of the files that were created in the tests]
# score phase
bw = geo_mean( bw1 bw2 bw3 bw4)
md = geo_mean( md1 md2 md3 md4 md5)
total = bw * md
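The scoring arithmetic above can be sketched with awk; the four bandwidth and five metadata inputs below are invented sample numbers, not real results:

```shell
# Geometric mean of each result group, then the combined score.
# Input values are made-up examples for illustration only.
bw=$(echo "2 8 4 4" | awk '{p=1; for(i=1;i<=NF;i++) p*=$i; print p^(1/NF)}')
md=$(echo "1 1 1 1 1" | awk '{p=1; for(i=1;i<=NF;i++) p*=$i; print p^(1/NF)}')
total=$(awk -v b="$bw" -v m="$md" 'BEGIN{print b*m}')
echo "bw=$bw md=$md total=$total"
```

The geometric mean keeps one outstanding phase from dominating the score: doubling any single result multiplies the group mean by only 2^(1/N).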
Now we are moving on to precisely define what the parameters should look like for the hard tests and to create a standard so that people can start running it on their systems. By doing so, we will define the formal process so we can actually make this an official benchmark. Please see the attached file in which we've started precisely defining these parameters. Please let's start iterating on this file to get these parameters correct.
Thanks,
John