On Apr 5, 2017, at 10:18 PM, John Bent <John.Bent(a)seagategov.com> wrote:
All,
I met with Ilene Carpenter about a week ago and she had a bunch of interesting thoughts
about IO-500. Hers are the top-level bullets and my thoughts are the sub-bullets.
• Maybe it should be called HPC IO-500 to distinguish it from hyperscalers, since the
focus is on HPC and the machines on the Top 500 list.
• On the other hand, Top 500 is not called HPC Top 500.
• Benchmarks get what benchmarks get. What about performance when another job is
running?
• The idea was data-easy, data-hard, metadata-easy, metadata-hard and then figure out
how to combine these four into one number. Perhaps we add a fifth test which is running
all four simultaneously.
I'm not against running all of them together, but in the end the aggregate shouldn't be
worse than data-hard + metadata-hard, or by definition those weren't the "hard"
numbers we were looking for.
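One way to fold four sub-results into a single figure is a geometric mean, so that no one test dominates and a system can't hide a terrible "-hard" result behind a huge "-easy" one. This is only an illustrative sketch of one possible scoring rule, not a settled IO-500 formula, and the example numbers are made up:

```python
import math

def combined_score(scores):
    """Combine positive per-test results (e.g. GiB/s and kIOPS values)
    into one number via the geometric mean. Illustrative only; the
    actual IO-500 scoring rule is not decided by this email thread."""
    scores = list(scores)
    assert all(s > 0 for s in scores), "all sub-scores must be positive"
    return math.prod(scores) ** (1.0 / len(scores))

# Hypothetical results for the four proposed tests:
results = {"data-easy": 400.0, "data-hard": 25.0,
           "metadata-easy": 120.0, "metadata-hard": 8.0}
score = combined_score(results.values())
```

A property worth noting: halving any one sub-score reduces the combined score by the same factor (2^(1/4) here), which matches the intent that the "-hard" numbers genuinely matter in the ranking.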
• Storage performance degrades with age. You run Linpack on day 1
and get a number. You run Linpack on day N and get the same number. If your purpose is
to bound user expectation, then a number from day 1 may no longer be the correct bound on
day N when the storage is fragmented.
Agreed. While the "-easy" numbers may degrade over time, at least the
"-hard" numbers shouldn't get worse with age. Also, there is value to
knowing how the "-easy" numbers decline over time. The amount that the
"-easy" performance declines is a function of the underlying filesystem and
storage technology (e.g. SSD vs. HDD), so unlike Top 500 there would be value in re-running
the benchmarks every year and amending the storage system's entry (though I think the
ranking should be based on the peak number).
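The "amend the entry but rank on peak" idea above can be sketched as a tiny record update; the record layout here is purely hypothetical:

```python
def update_entry(entry, new_score):
    """Amend a system's entry with a re-run score while ranking on peak.
    `entry` is a hypothetical record: {'peak': float, 'history': [float]}.
    The peak never decreases, but the history makes any decline of the
    "-easy" numbers over time (e.g. due to fragmentation) visible."""
    entry["history"].append(new_score)
    entry["peak"] = max(entry["peak"], new_score)
    return entry

entry = {"peak": 100.0, "history": [100.0]}  # hypothetical day-1 result
update_entry(entry, 87.0)  # re-run on day N: "-easy" performance declined
```

The ranking key stays `entry["peak"]`, while `entry["history"]` carries the aging information that Linpack-style one-shot numbers cannot.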
• Another challenge with IO-500 is that Linpack is so easy. You run
it and you get two indisputable answers: the result, which can be verified to be correct,
and the time that it took to get the result. IO benchmarks are much harder. Are people
allowed to set up RAID-0 RAM disks and run the benchmark against them?
Sure, if that is a storage option that they actually provide for users. I know of a few
sites that use(d) RAM-based Lustre filesystems for shared storage since their application
didn't have checkpoint-restart so if any node crashed the test run was lost anyway.
• IO-500 is great because it enables people to look at historical
trends.
• For example, when do various systems start showing up? If I’m procuring a new
storage system and I see that Ceph, BeeGFS, or OrangeFS are high up in the IO-500, then I’m
much more likely to consider them instead of just Lustre and GPFS.
Definitely. One of the things I've wanted to see in the past is which filesystem each
system on the Top-500 is using. We've had to generate these results manually in the
past.
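If the list published a filesystem column alongside each entry, the historical-trend question ("when do newcomers start showing up?") becomes a simple aggregation. A minimal sketch, assuming a hypothetical list of (year, system, filesystem) tuples, since no such column exists today:

```python
from collections import Counter

# Hypothetical list entries; today these results must be gathered manually.
entries = [
    (2015, "SystemA", "Lustre"),
    (2015, "SystemB", "GPFS"),
    (2016, "SystemC", "Lustre"),
    (2016, "SystemD", "BeeGFS"),
]

def filesystems_by_year(entries):
    """Count which filesystems appear on the list each year, to spot
    when entrants like Ceph, BeeGFS, or OrangeFS start showing up."""
    by_year = {}
    for year, _system, fs in entries:
        by_year.setdefault(year, Counter())[fs] += 1
    return by_year
```

With real submission data this would directly answer the procurement question above.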
• NREL just doubled the flops on a system but didn’t touch
storage. It’d be nice if the IO-500 could somehow capture this. Both before and after
will have the same storage performance, but the after system is worse because it is
imbalanced.
While the number of clients running the tests should be part of the results, I don't
think a later change in the client system should affect previous results (see my earlier
comment about submitting updated numbers periodically). There could be the ability to link
the IO-500 storage results to the Top-500 system results, but there isn't necessarily
a 1:1 relationship between them.
In particular, many sites have Lustre or GPFS site-wide filesystems that are accessed by
multiple (and changing) compute clusters, so while a particular benchmark result may
depend on the number of clients actually running the test, the result itself shouldn't
be a function of the total number of clients accessing the storage.
Cheers, Andreas
• Aren’t you all behind the schedule that you yourself, John Bent,
proposed after the SC BoF?
• ….
My impression was that Ilene, who is on this list (Hi Ilene!), likes the idea, supports
it, and hopes we are successful with it, but just wants to identify some of the
reasons why it is important and why it is challenging.
Thanks,
John
_______________________________________________
IO-500 mailing list
IO-500(a)vi4io.org
https://www.vi4io.org/cgi-bin/mailman/listinfo/io-500