All,

A great first list. Now let's think about the next list for ISC. We are in agreement that the benchmarks themselves will not change, although we might add some optional tests.

The question I'd like to put before the community right now is how to improve the usefulness of the collected numbers. I think one way to do this is to collect additional environmental information. For example, we currently record the number of client nodes and the number of procs per node, which allows analysis of client scalability and a per-client score ranking. We should also collect server-side information, such as the number of servers and the number of devices; this will allow a similar server scalability analysis.

As such, I have two specific questions right now:

1. What is the best way to collect this environmental information?

2. What environmental information should we collect?

Right now, I've added the following variables to io500.sh, which we are asking submitters to fill in. This will allow us to automate the data collection on the back-end but may require the submitter to do some research when they do their run. There is a trade-off between what is useful to collect and what puts an undue burden on the submitter.

# top level info

io500_info_system_name='xxx' # e.g. Oakforest-PACS

io500_info_institute_name='xxx' # e.g. JCAHPC

io500_info_storage_age_in_months='xxx' # not install date but age since last refresh

io500_info_storage_install_date='xxx' # MM/YY

io500_info_filesystem='xxx' # e.g. BeeGFS, DataWarp, GPFS, IME, Lustre

io500_info_filesystem_version='xxx'

# client side info

io500_info_num_client_nodes='xxx'

io500_info_procs_per_node='xxx'

# server side info

io500_info_num_metadata_server_nodes='xxx'

io500_info_num_data_server_nodes='xxx'

io500_info_num_data_storage_devices='xxx' # if you have 5 data servers, and each has 5 drives, then this number is 25

io500_info_num_metadata_storage_devices='xxx' # if you have 2 metadata servers, and each has 5 drives, then this number is 10

io500_info_data_storage_type='xxx' # HDD, SSD, persistent memory, etc, feel free to put specific models

io500_info_metadata_storage_type='xxx' # HDD, SSD, persistent memory, etc, feel free to put specific models

io500_info_storage_network='xxx' # infiniband, omnipath, ethernet, etc

io500_info_storage_interface='xxx' # SAS, SATA, NVMe, etc

# miscellaneous

io500_info_whatever='WhateverElseYouThinkRelevant'
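To keep the burden on submitters low while still catching incomplete submissions, io500.sh could warn about any field still left at its 'xxx' placeholder. The following is only a sketch: the function name check_io500_info is hypothetical and not part of the current script, and it assumes the variables above have already been set in the same shell.

```shell
# Hypothetical helper (not in the current io500.sh): print the name of every
# io500_info_* variable that is empty or still set to the 'xxx' placeholder.
# Returns non-zero if at least one such variable is found.
check_io500_info() {
  status=0
  # 'set' lists all shell variables; keep only the io500_info_* names.
  for name in $(set | sed -n 's/^\(io500_info_[A-Za-z0-9_]*\)=.*/\1/p'); do
    eval "value=\${$name}"
    if [ -z "$value" ] || [ "$value" = "xxx" ]; then
      echo "$name"
      status=1
    fi
  done
  return $status
}
```

It could be called after the variable block, for example as: check_io500_info || echo "warning: some io500_info_* fields are not filled in" >&2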

One thing that might be useful is a set of scripts to collect this info automatically. They might be specific to different filesystems. For example, perhaps we could include 'utilities/collect_XXX.sh' scripts which automatically collect useful information for various XXX filesystems like BeeGFS, DataWarp, GPFS, IME, Lustre. And perhaps there are ways to collect some of it automatically in a filesystem-agnostic way. Here is the environmental information we are currently collecting:

echo "System: " `uname -n`

echo "filesystem_utilization=$(df "${io500_workdir}" | tail -1)"
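As one idea of what a filesystem-agnostic collector might look like, here is a rough sketch. The function name collect_generic_info and all of the output key names are made up for illustration. It assumes io500_workdir is set as in io500.sh, it relies on GNU stat for the filesystem type (falling back to 'unknown' elsewhere), and it will misparse df output if the filesystem source name contains spaces.

```shell
# Hypothetical filesystem-agnostic collector (name and keys are illustrative).
# Usage: collect_generic_info [directory]; defaults to $io500_workdir, then '.'.
collect_generic_info() {
  dir=${1:-${io500_workdir:-.}}

  echo "client_kernel=$(uname -sr)"

  # df -P prints the portable POSIX format; -k reports sizes in 1024-byte blocks.
  # Line 2 holds the entry for the filesystem containing $dir.
  df -Pk "$dir" | awk 'NR==2 {
    print "filesystem_source=" $1
    print "filesystem_size_kb=" $2
    print "filesystem_used_kb=" $3
  }'

  # 'stat -f -c %T' is GNU coreutils syntax; other platforms report 'unknown'.
  fstype=$(stat -f -c %T "$dir" 2>/dev/null) || fstype=unknown
  echo "filesystem_type=$fstype"
}
```

This only sees what the client can see, so server counts and device counts would still need the per-filesystem scripts or manual entry.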

Thanks and looking forward to hearing all of your ideas about this,

John