I met with Ilene Carpenter about a week ago and she had a bunch of interesting thoughts about IO-500. Hers are the top-level points; my thoughts are the sub-bullets.
1. Maybe it should be called HPC IO-500 to distinguish it from the hyperscalers, since the focus is on HPC and the machines on the Top 500 list.
* On the other hand, the Top 500 is not called the HPC Top 500.
2. Benchmarks get what benchmarks get. What about performance when another job is running?
* The idea was data-easy, data-hard, metadata-easy, metadata-hard and then figure out how to combine these four into one number. Perhaps we add a fifth test which is running all four simultaneously.
3. Storage performance degrades with age. You run Linpack on day 1 and get a number. You run Linpack on day N and get the same number. If your purpose is to bound user expectation, then a number from day 1 may no longer be the correct bound on day N when the storage is fragmented.
4. Another challenge with IO-500 is that Linpack is so easy. You run it and you get two indisputable answers: the result, which can be verified to be correct, and the time that it took to get the result. IO benchmarks are much harder. Are people allowed to set up RAID0 RAM disks and run the benchmark against them?
5. IO 500 is great because it enables people to look at historical trends.
* For example, when do various systems start showing up? If I’m procuring a new storage system and I see that Ceph, BeeGFS, or OrangeFS are high up in the IO 500, then I’m much more likely to consider them instead of just Lustre and GPFS.
6. NREL just doubled their flops on a system but didn’t touch storage. It’d be nice if the IO 500 could somehow capture this. Both before and after will have the same storage performance but the after system is worse because it is imbalanced.
7. Aren’t you all behind the schedule that you yourself, John Bent, proposed after the SC BoF?
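On point 2 above, one plausible way to combine the data-easy, data-hard, metadata-easy, and metadata-hard results into a single number is a geometric mean, which keeps any one phase from dominating the total. This is only a sketch of that idea; the function name and the example numbers are made up for illustration:

```python
from math import prod

def combined_score(bw_easy, bw_hard, md_easy, md_hard):
    """Combine two bandwidth results and two metadata results
    into one number via the geometric mean, so a system cannot
    climb the list on one strong phase alone."""
    results = [bw_easy, bw_hard, md_easy, md_hard]
    return prod(results) ** (1 / len(results))

# Hypothetical system: strong on bandwidth, weak on metadata.
print(round(combined_score(10.0, 2.0, 50.0, 5.0), 2))  # prints 8.41
```

A geometric mean also behaves sensibly if a fifth "all four simultaneously" test were folded in: the list of results just grows by one entry.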
My impression was that Ilene, who is on this list (Hi Ilene!), likes the idea, supports it, and hopes we are successful with it, but just wants to identify some of the reasons why it is important and why it is challenging.
This is a side thread only regarding the naming conventions; please
have a look here:
From my perspective this is resolved:
The HPSL currently supports site (facility), supercomputer, and storage
system as components.
A site can have multiple supercomputers and storage systems.
An identifier is <site>/<supercomputer | storage>.
A storage system and supercomputer can have a name.
Most storage names provided in the list so far are dull, like
"Lustre" or "Lustre work", but this could be improved if the names of
the file systems were available or easy to find.
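To make the scheme above concrete, here is a minimal sketch of how such identifiers compose. The site and component names are hypothetical, chosen only to illustrate the <site>/<supercomputer | storage> pattern:

```python
def hpsl_identifier(site, component):
    """Build an HPSL-style identifier of the form
    <site>/<supercomputer | storage>. Both names are
    illustrative; real entries come from the list itself."""
    return f"{site}/{component}"

# One site owning a supercomputer and two storage systems:
print(hpsl_identifier("ExampleLab", "BigMachine"))    # a supercomputer
print(hpsl_identifier("ExampleLab", "Lustre-work"))   # a storage system
print(hpsl_identifier("ExampleLab", "Lustre-home"))   # a second storage system
```

The point is simply that a single site can anchor multiple supercomputer and storage entries without ambiguity.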
@Ilene and Andreas:
Please let me know if this resolves the issues or if some
modifications to the current scheme would be necessary.
The performance values of Top500, Graph500 etc. are also part of the