All,

A great first list.  Now let's think about the next list for ISC.  We are in agreement that the benchmarks themselves will not change, although we might add some optional tests.

The question I'd like to put before the community right now is how to improve the usefulness of the collected numbers.  I think one way to do this is to collect additional environmental information.  For example, we currently record the number of client nodes and the number of procs per node.  This allows analysis of client scalability and a per-client score ranking.  We should also collect server-side information, such as the number of servers and the number of storage devices; this will allow a similar analysis of server scalability.

As such, I have two specific questions right now:

1. What is the best way to collect this environmental information?
2. What environmental information should we collect?

Right now, I've added the following variables to io500.sh, which we are asking submitters to fill in.  This will allow us to automate the data collection on the back end, but it may require the submitter to do some research when they do their run.  There is a trade-off between what is useful to collect and what puts an undue burden on the submitter.

  # top level info
  io500_info_system_name='xxx'      # e.g. Oakforest-PACS
  io500_info_institute_name='xxx'   # e.g. JCAHPC
  io500_info_storage_age_in_months='xxx' # not install date but age since last refresh
  io500_info_storage_install_date='xxx'  # MM/YY
  io500_info_filesystem='xxx'    # e.g. BeeGFS, DataWarp, GPFS, IME, Lustre
  io500_info_filesystem_version='xxx'
  # client side info
  io500_info_num_client_nodes='xxx'
  io500_info_procs_per_node='xxx'
  # server side info
  io500_info_num_metadata_server_nodes='xxx'
  io500_info_num_data_server_nodes='xxx'
  io500_info_num_data_storage_devices='xxx'  # if you have 5 data servers, and each has 5 drives, then this number is 25
  io500_info_num_metadata_storage_devices='xxx'  # if you have 2 metadata servers, and each has 5 drives, then this number is 10
  io500_info_data_storage_type='xxx' # HDD, SSD, persistent memory, etc, feel free to put specific models
  io500_info_metadata_storage_type='xxx' # HDD, SSD, persistent memory, etc, feel free to put specific models
  io500_info_storage_network='xxx' # infiniband, omnipath, ethernet, etc
  io500_info_storage_interface='xxx' # SAS, SATA, NVMe, etc
  # miscellaneous
  io500_info_whatever='WhateverElseYouThinkRelevant'
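
Since these variables all default to 'xxx', one way to reduce the burden on the back end would be to warn the submitter about any that were left unfilled.  The following is only a rough sketch of that idea: the function name io500_info_unfilled is hypothetical and nothing like it exists in io500.sh today.  It assumes a POSIX shell and treats both 'xxx' and an empty string as "not filled in".

```shell
# Hypothetical helper (not part of io500.sh): print the name of every
# io500_info_* variable that is still the 'xxx' placeholder or is empty.
io500_info_unfilled() {
  # 'set' lists shell variables as name=value lines; keep only the names
  # that start with io500_info_.  Assumes values contain no newlines.
  for _name in $(set | sed -n 's/^\(io500_info_[A-Za-z0-9_]*\)=.*/\1/p'); do
    # Look up the value of the variable whose name is stored in _name.
    eval "_value=\${$_name}"
    case "$_value" in
      ''|xxx) printf '%s\n' "$_name" ;;
    esac
  done
}
```

It could be called after the variables are set, for example: io500_info_unfilled | sed 's/^/WARNING: not filled in: /'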

One thing that might be useful is a set of scripts to collect this info automatically.  They might be specific to different filesystems.  For example, perhaps we could include 'utilities/collect_XXX.sh' scripts that automatically collect useful information for each filesystem XXX, such as BeeGFS, DataWarp, GPFS, IME, or Lustre.  And perhaps there are ways to collect it automatically in a filesystem-agnostic way.  Here is the environmental information we are currently collecting:

  echo "System: " "$(uname -n)"
  echo "filesystem_utilization=$(df "${io500_workdir}" | tail -1)"
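
To make the filesystem-agnostic idea a bit more concrete, here is a minimal sketch of what such a collector might look like.  The function name io500_collect_generic and the key names it prints are made up for illustration.  It relies only on 'uname' and the POSIX 'df -P' output format, so it cannot report anything filesystem-specific such as server or device counts, and it assumes the mount point contains no spaces.

```shell
# Hypothetical helper: print filesystem-agnostic facts about a directory
# as key=value lines, using only 'uname' and POSIX 'df -P'.
io500_collect_generic() {
  _dir=${1:-.}   # directory to inspect; defaults to the current directory
  echo "system=$(uname -n)"
  echo "kernel=$(uname -sr)"
  # With -P, df prints one header line followed by one data line:
  # Filesystem 1024-blocks Used Available Capacity Mounted-on
  df -P "$_dir" | awk 'NR == 2 {
    print "fs_source=" $1
    print "fs_size_kib=" $2
    print "fs_used_kib=" $3
    print "fs_capacity=" $5
    print "fs_mount=" $6
  }'
}
```

In io500.sh this would presumably be called as: io500_collect_generic "${io500_workdir}"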

Thanks and looking forward to hearing all of your ideas about this,

John