On Jun 21, 2017, at 1:31 AM, Andreas Dilger <adilger(a)dilger.ca> wrote:
On Jun 19, 2017, at 8:10 PM, Bob Ciotti <Bob.Ciotti(a)nasa.gov> wrote:
>
> I still think we are missing the mark a little. As someone building
> infrastructure, streaming reads and writes and completely random unaligned
> small-block I/O don't really give me the information I want. We should also
> address the configuration for error correction, because choices like
> mirroring in the controllers or stripe verification dramatically impact
> performance, and there is a history of abuse here.
Thanks Bob. We hope that we will be able to capture enough info to address this
concern. Julian has been very active in building the database and webpage integration, so
you should be able to do things like filter the results to show only systems that used
spinning disks and at least RAID-6 durability, for example.
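As a rough illustration of the kind of filtering this should enable (assuming, purely
hypothetically, that the results can be exported as a CSV with media and redundancy
columns), something along these lines:

    # Hypothetical sketch: keep only submissions on spinning disks with
    # RAID-6 durability. The file name and column layout are assumptions.
    awk -F, 'NR==1 || ($3 == "HDD" && $4 == "RAID-6")' io500-results.csv

The real interface will of course be the webpage; this is just to show the kind of
query we have in mind.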
I think this exposes an interesting question for the benchmark - what
data should be recorded for each run, and what should be made available on the website?
There are a number of datapoints that can be easily recorded from the client when the
test is run, and these should be automatically collected and reported by the test script
(filesystem type, filesystem name/mountpoint, filesystem version, total/free filesystem
space/inodes, client OS version, client CPU/RAM/network, client count, ...). There are a
large number of other parameters that may not be immediately visible, such as tuning
parameters, but that could still be extracted programmatically.
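As an untested sketch of what the client-side collection could look like (the field
names and the output file here are just placeholders; the commands are standard Linux
tools):

    #!/bin/bash
    # Hypothetical sketch: gather client-visible metadata for a run.
    # Output format and file name are illustrative, not a proposed spec.
    MNT=${1:-/mnt/testfs}     # mountpoint under test (assumed argument)
    {
      echo "fs_type:    $(stat -f -c %T "$MNT")"
      echo "mountpoint: $MNT"
      echo "fs_space:   $(df -h --output=size,avail "$MNT" | tail -n1)"
      echo "fs_inodes:  $(df --output=itotal,iavail "$MNT" | tail -n1)"
      echo "os:         $(uname -sr)"
      echo "cpu:        $(lscpu | awk -F: '/Model name/ {gsub(/^ +/,"",$2); print $2}')"
      echo "ram_kb:     $(awk '/MemTotal/ {print $2}' /proc/meminfo)"
    } > io500-client-info.txt

Filesystem version and client count would need filesystem- and scheduler-specific
hooks, so I've left them out of the sketch.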
There are also many parameters regarding how the servers are configured that are not
visible to the client, and these would need to be manually entered by the tester.
Possibly we could have a script that collects similar data on the client and server, that
knows about the filesystem type (Lustre, GPFS, etc.) and can extract the interesting data
about the underlying storage, tunables, OS, software, etc., but I suspect this will need to
be an incremental effort or it will never be done. One option is to dump a large amount
of data (e.g. lspci, lsscsi, lscpu, Lustre config logs, etc.) into a file that accompanies
the test result, and this can be available as a "blob" for future reference
and/or extraction as our data collection becomes more advanced. That "blob"
would grow as more information is considered relevant.
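A rough, untested sketch of that option (the blob file name and the exact command list
are just assumptions; the lctl call is guarded since it only exists on Lustre clients):

    #!/bin/bash
    # Hypothetical sketch of the "blob": dump raw environment data into a
    # tarball that travels with the result; contents are illustrative.
    TMP=$(mktemp -d)
    lspci    > "$TMP/lspci.txt"  2>&1
    lsscsi   > "$TMP/lsscsi.txt" 2>&1
    lscpu    > "$TMP/lscpu.txt"  2>&1
    uname -a > "$TMP/uname.txt"
    # Filesystem-specific extras, guarded so this degrades gracefully:
    command -v lctl >/dev/null && lctl dl > "$TMP/lustre-devices.txt" 2>&1
    tar czf io500-env-blob.tar.gz -C "$TMP" . && rm -rf "$TMP"

Later schema work could then parse fields out of the blob retroactively as we decide
they matter.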
Thank you, Andreas. I know that we are all very interested in doing just this, and
you are right that it will be hard to capture everything that anyone might later realize
is relevant. As a guiding principle, we are requiring that submissions provide enough
information that they are reproducible by others, so all tuning that was done should
be included. How much is included in a blob and how much goes into the DB schema is yet
to be determined. When I did PanFS testing at LANL, we put everything into a database and
had a bash script that ran after the benchmark that tried to collect tons of environmental
info. Maybe I can dig that up, as well as the database schema.
Thanks,
John
Cheers, Andreas