On Jun 19, 2017, at 8:10 PM, Bob Ciotti <Bob.Ciotti@nasa.gov> wrote:
> I still think we are missing the mark a little. As someone building
> infrastructure, streaming reads and writes and completely random unaligned
> small-block I/O don't really give me the information I want. We should also
> address the configuration for error correction, because settings like
> mirroring in the controllers or stripe verification dramatically impact
> performance, and there is a history of abuse here.
I think this exposes an interesting question for the benchmark - what data to record for
each run, and what is available on the website?
There are a number of data points that can be easily recorded from the client
when the test is run, and these should be automatically collected and reported
by the test script (filesystem type, filesystem name/mountpoint, filesystem
version, total/free filesystem space/inodes, client OS version, client
CPU/RAM/network, client count, ...). There are a large number of other
parameters that may not be immediately visible, such as tuning parameters,
etc., but that could be extracted programmatically; a rough sketch of the
client-side collection is below.
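As an illustration only (the field list, the example mountpoint, and the JSON
output format are placeholders, not a proposal for the final script), the
client-side collection could start out as simple as this:

  import json
  import os
  import platform
  import socket

  def fs_type(mountpoint):
      """Look up the filesystem type of a mountpoint from /proc/mounts (Linux)."""
      with open("/proc/mounts") as mounts:
          for line in mounts:
              device, mnt, fstype = line.split()[:3]
              if mnt == mountpoint:
                  return fstype
      return "unknown"

  def collect_client_info(mountpoint):
      """Gather the easily visible client-side facts for one mountpoint."""
      st = os.statvfs(mountpoint)
      return {
          "hostname": socket.gethostname(),
          "os": platform.platform(),
          "machine": platform.machine(),
          "fs_type": fs_type(mountpoint),
          "mountpoint": mountpoint,
          "total_bytes": st.f_blocks * st.f_frsize,
          "free_bytes": st.f_bavail * st.f_frsize,
          "total_inodes": st.f_files,
          "free_inodes": st.f_favail,
      }

  if __name__ == "__main__":
      # "/mnt/lustre" is just an example mountpoint
      print(json.dumps(collect_client_info("/mnt/lustre"), indent=2))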
There are also many parameters regarding how the servers are configured that
are not visible to the client, and these would need to be entered manually by
the tester.
Possibly we could have a script that collects similar data on the client and
server, knows about the filesystem type (Lustre, GPFS, etc.), and can extract
the interesting data about the underlying storage, tunables, OS, software,
etc., but I suspect this will need to be an incremental effort or it will
never be done. One option is to dump a large amount of data (e.g. lspci,
lsscsi, lscpu, Lustre config logs, etc.) into a file that accompanies the test
result, and this can be available as a "blob" for future reference and/or
extraction as our data collection becomes more advanced; a sketch of this is
below. That "blob" would grow as more information is considered relevant.
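A minimal sketch of the "blob" dump, assuming the commands above exist on the
client; the command list and output filename are placeholders that would be
tuned per filesystem type:

  import subprocess

  # Placeholder command list; a Lustre-aware version might add e.g.
  # "lctl get_param" queries, a GPFS-aware one "mmlsconfig", and so on.
  COMMANDS = ["lspci", "lsscsi", "lscpu", "uname -a", "cat /proc/mounts"]

  def dump_blob(outfile="sysinfo.blob"):
      """Run each command and append its output, under a header, to one file."""
      with open(outfile, "w") as out:
          for cmd in COMMANDS:
              out.write("===== %s =====\n" % cmd)
              try:
                  result = subprocess.run(cmd.split(), capture_output=True,
                                          text=True, timeout=30)
                  out.write(result.stdout)
                  if result.stderr:
                      out.write(result.stderr)
              except (OSError, subprocess.TimeoutExpired) as err:
                  # Record the failure rather than aborting the whole dump
                  out.write("(failed: %s)\n" % err)

  if __name__ == "__main__":
      dump_blob()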
Cheers, Andreas