On Nov 21, 2017, at 10:24 AM, John Bent <johnbent@gmail.com> wrote:
A great first list. Now let's think about the next list for ISC. We are in
agreement that the benchmarks themselves will not change, although we might add some
additional optional tests.
The question I'd like to put before the community right now is how to improve the
usefulness of the collected numbers. I think one way to do this is to collect additional
environmental information. For example, we have been recording the number of
client nodes and the number of procs per node. This allows analysis of client scalability
and a per-client score ranking. We should also collect server-side information, such as
the number of servers and the number of storage devices; this will allow similar
server-scalability analysis.
Definitely. I think being able to measure performance-per-disk, and
performance-per-server is really what makes the list useful for admins and procurement vs.
bragging rights for the fastest aggregate system.
As such, I have two specific questions right now:
1. What is the best way to collect this environmental information?
I think as a starting point we should make a text template that lists the values we want
to collect, possibly in some form that could be easily parsed and input to the database.
Once we have the template, it is possible to write some scripts that generate the template
for a given OS and/or filesystem. Such a script would probably run once on a single
server, with a field that says "N of these", rather than running on every server.
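A minimal sketch of what such a generator might look like (the helper name and any keys beyond those already proposed are placeholders, not an agreed format):

```shell
#!/bin/sh
# Hypothetical template generator sketch. Emits key='value' pairs so
# the output is trivial to parse back into the database.
emit() { printf "%s='%s'\n" "$1" "$2"; }

# Values the script can discover automatically:
emit io500_info_system_name "$(uname -n)"
emit io500_info_os "$(uname -sr)"

# Values it cannot discover are left as placeholders for the submitter:
emit io500_info_num_data_server_nodes 'xxx'
emit io500_info_num_metadata_server_nodes 'xxx'
```

Run once on one representative server; the "N of these" count would then be filled in by hand.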
2. What environmental information should we collect?
Right now, I've added the following variables into io500.sh which we are asking
submitters to fill-in. This will allow us to automate the data collection on the back-end
but may require the submitter to do some research when they do their run. There is a
trade-off between what is useful to collect and what puts an undue burden on the
submitter.
# top level info
io500_info_system_name='xxx' # e.g. Oakforest-PACS
io500_info_institute_name='xxx' # e.g. JCAHPC
io500_info_storage_age_in_months='xxx' # not install date but age since last refresh
s/refresh/reformat/?
Filesystem free space (df, df -i) is relevant to the performance.
io500_info_storage_install_date='xxx' # MM/YY
io500_info_filesystem='xxx' # e.g. BeeGFS, DataWarp, GPFS, IME, Lustre
io500_info_filesystem_version='xxx'
"..._server_version" (may be different than client software version)
# client side info
io500_info_num_client_nodes='xxx'
io500_info_procs_per_node='xxx'
# server side info
io500_info_num_metadata_server_nodes='xxx'
io500_info_num_data_server_nodes='xxx'
io500_info_num_data_storage_devices='xxx' # if you have 5 data servers, and each has 5 drives, then this number is 25
or devices per server?
io500_info_num_metadata_storage_devices='xxx' # if you have 2 metadata servers, and each has 5 drives, then this number is 10
... same
io500_info_data_storage_type='xxx' # HDD, SSD, persistent memory, etc, feel free to put specific models
I would separate the storage class (HDD, SSD, NVMe, pmem) from the specific device
model(s), e.g.:
io500_info_data_storage_type='xxx' # class only: HDD, SSD, NVMe, pmem
io500_info_data_storage_model='xxx' # specific device model(s), if known
io500_info_metadata_storage_type='xxx' # HDD, SSD, persistent memory, etc, feel free to put specific models
io500_info_storage_network='xxx' # infiniband, omnipath, ethernet, etc
With speed... same as above. For consistency, I'd provide a list of examples so that we
don't have 20 different ways to describe the same thing (that makes sorting and
organizing the list more difficult), like:
IB-QDR, IB-FDR, IB-EDR (not sure if these have specific speed names), OPA-100Gb,
Eth-100Mb, Eth-1Gb, Eth-10Gb
io500_info_storage_interface='xxx' # SAS, SATA, NVMe, etc
I'd provide a list of examples here also, like SATA-III, SAS-3Gb, SAS-6Gb, PCIe3x8,
PCIe3x16, etc. Probably don't need USB :-)
# miscellaneous
io500_info_whatever='WhateverElseYouThinkRelevant'
RAM per meta/data server, CPU number/type?
One thing that might be useful would be scripts to automatically collect
this info. They might be specific to different filesystems. For example, perhaps we
could include 'utilities/collect_XXX.sh' scripts which automatically collect
useful information for various XXX filesystems like BeeGFS, DataWarp, GPFS, IME, Lustre.
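As a rough illustration for one filesystem, a hypothetical 'utilities/collect_lustre.sh' might look like this (the script name, output keys, and exact commands are assumptions, not an agreed format):

```shell
#!/bin/sh
# Hypothetical collect_lustre.sh sketch: count the metadata and data
# targets visible from this client via 'lfs df'. Exits quietly on
# nodes without the Lustre client tools installed.
command -v lfs >/dev/null 2>&1 || { echo "Lustre client tools not found" >&2; exit 0; }

echo "io500_info_filesystem='Lustre'"
# Note: these are target counts, not server-node counts -- several
# targets may live on one server.
echo "io500_info_num_metadata_storage_devices='$(lfs df 2>/dev/null | grep -c MDT)'"
echo "io500_info_num_data_storage_devices='$(lfs df 2>/dev/null | grep -c OST)'"
```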
And perhaps there are ways to collect it automatically in a filesystem-agnostic way.
Here is the environmental information we are currently collecting:
echo "System: " `uname -n`
echo "filesystem_utilization=$(df ${io500_workdir}|tail -1)"
Needs to be "df -kP" to avoid line split for long device names.
"filesystem_space_total=..."
"filesystem_space_avail=..."
"filesystem_inode_total=..."
"filesystem_inode_avail=..."
This is easily scripted.
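For example, a sketch of that scripted collection using POSIX df output (variable names follow the proposal above; -P keeps each mount on one line even for long device names, and -i is not in POSIX but works on Linux):

```shell
#!/bin/sh
# Split 'df' output into the proposed fields. 'set --' assigns the
# whitespace-separated columns of the last output line to $1, $2, ...
io500_workdir=${io500_workdir:-.}

set -- $(df -kP "$io500_workdir" | tail -1)
echo "filesystem_space_total=$2"    # 1K-blocks
echo "filesystem_space_avail=$4"

set -- $(df -iP "$io500_workdir" | tail -1)
echo "filesystem_inode_total=$2"
echo "filesystem_inode_avail=$4"
```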
Cheers, Andreas