On Nov 24, 2017, at 1:40 AM, Julian Kunkel <juliankunkel(a)googlemail.com> wrote:
Hi Andreas,
for the comprehensive datacenter list, I already have a JSON schema
that covers most of these things in order to describe the data center.
Runtime-specific settings of a benchmark are not covered (and not the
goal), but these are the things we are interested in adding for an
*individual* benchmark run.
I don't think (m)any of the things I proposed are run-specific values,
except the number of clients used for the test, and possibly how full
the filesystem currently is.
I think that without at least some storage details included with the
benchmark run, it is mostly just "bragging rights", as there is no way
to compare the systems being tested.
One of my goals this year is to make the schema cover most needs and
allow sites to embed a JavaScript snippet (or generated HTML) on their
page, thus allowing them to describe their system only once and reuse
this description when uploading benchmark results.
Anyhow, I find it a nice way of hierarchically describing components
of the systems.
I have enclosed the schema; if somebody is interested in participating
here -- welcome...
Some comments on the schema.
Under RAID, since this appears to be a restricted list, it needs to
at least have a reasonably full list of options, including RAID-5,
RAID-6, RAID-Z2, RAID-Z3.
Under "local storage" the "file system" option should include btrfs
(default for SLES11).
s/netto/net/g
s/IOOPs/IOPS/
Under "interconnect" the Cray network types should be listed, "Aries"
and "Gemini".
Under "benchmarks" there is no "io500"???
Under "storage system", why not change "local storage" to just
"storage"
and make "storage system" an instance of "storage"? Then the
"attached"
property of the storage should allow an "interconnect" to be specified
for network-attached storage.
Under "energy costs per kWh" s/Ammortized/Amortized/
Why is the "network" option under SYSTEM not an "interconnect"
schema?
Cheers, Andreas
2017-11-24 2:00 GMT+01:00 Andreas Dilger <adilger(a)dilger.ca>:
> On Nov 21, 2017, at 10:24 AM, John Bent <johnbent(a)gmail.com> wrote:
>>
>> A great first list. Now let's think about the next list for ISC. We are
>> in agreement that the benchmarks themselves will not change, although we
>> might add some additional optional tests.
>>
>> The question I'd like to put before the community right now is how to
>> improve the usefulness of the collected numbers. I think one way to do
>> this is to collect additional environmental information. For example, we
>> currently have been recording the number of client nodes and the number
>> of procs per node. This allows analysis of client scalability and allows
>> a per-client score ranking. We should also collect server-side
>> information, such as the number of servers and the number of devices;
>> this will allow similar server scalability analysis.
>
> Definitely. I think being able to measure performance-per-disk and
> performance-per-server is really what makes the list useful for admins
> and procurement, vs. bragging rights for the fastest aggregate system.
>
>> As such, I have two specific questions right now:
>>
>> 1. What is the best way to collect this environmental information?
>
> I think as a starting point we should make a text template that lists the
> values we want to collect, possibly in some form that could be easily
> parsed and input to the database. Once we have the template, it is
> possible to write some scripts that generate the template for a given OS
> and/or filesystem. Probably run it once on a server and then have a field
> that says "N of these" instead of running it on every server.
>
>> 2. What environmental information should we collect?
>>
>> Right now, I've added the following variables into io500.sh which we are
>> asking submitters to fill in. This will allow us to automate the data
>> collection on the back-end but may require the submitter to do some
>> research when they do their run. There is a trade-off between what is
>> useful to collect and what puts an undue burden on the submitter.
>>
>> # top level info
>> io500_info_system_name='xxx' # e.g. Oakforest-PACS
>> io500_info_institute_name='xxx' # e.g. JCAHPC
>> io500_info_storage_age_in_months='xxx' # not install date but age since last refresh
>
> s/refresh/reformat/?
>
> Filesystem free space (df, df -i) is relevant to the performance.
>
>> io500_info_storage_install_date='xxx' # MM/YY
>> io500_info_filesystem='xxx' # e.g. BeeGFS, DataWarp, GPFS, IME, Lustre
>> io500_info_filesystem_version='xxx'
>
> "..._server_version" (may be different than client software version)
>
>> # client side info
>> io500_info_num_client_nodes='xxx'
>> io500_info_procs_per_node='xxx'
>> # server side info
>> io500_info_num_metadata_server_nodes='xxx'
>> io500_info_num_data_server_nodes='xxx'
>> io500_info_num_data_storage_devices='xxx' # if you have 5 data servers, and each has 5 drives, then this number is 25
>
> or devices per server?
>
>> io500_info_num_metadata_storage_devices='xxx' # if you have 2 metadata servers, and each has 5 drives, then this number is 10
>
> ... same
>
>> io500_info_data_storage_type='xxx' # HDD, SSD, persistent memory, etc, feel free to put specific models
>
> I would separate the storage class (HDD, SSD, NVMe, pmem) from the
> specific device model(s), e.g.:
>
> io500_info_data_storage_type=
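>
> (roughly something like the following -- the "_model" variable name is
> only a sketch, not something already in io500.sh:)
>
> io500_info_data_storage_type='xxx'   # class only: HDD, SSD, NVMe, pmem
> io500_info_data_storage_model='xxx'  # specific device model(s), optional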
>
>> io500_info_metadata_storage_type='xxx' # HDD, SSD, persistent memory, etc, feel free to put specific models
>
>
>> io500_info_storage_network='xxx' # infiniband, omnipath, ethernet, etc
>
> with speed... ... same. For consistency, I'd provide a list of examples so
> that we don't have 20 different ways to describe the same thing (that makes
> sorting and organizing the list more difficult), like:
>
> IB-QDR, IB-FDR, IB-EDR (not sure if these have specific speed names),
> OPA-100Gb, Eth-100Mb, Eth-1Gb, Eth-10Gb
>
>> io500_info_storage_interface='xxx' # SAS, SATA, NVMe, etc
>
> I'd provide a list of examples here also, like SATA-III, SAS-3Gb, SAS-6Gb,
> PCIe3x8, PCIe3x16, etc. Probably don't need USB :-)
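>
> (On Linux servers this could perhaps even be collected automatically --
> just a thought; e.g. something like:
>
>   lsblk -d -o NAME,TRAN,ROTA,MODEL
>
> reports the transport (sata/sas/nvme), rotational flag, and model per
> device.)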
>
>> # miscellaneous
>> io500_info_whatever='WhateverElseYouThinkRelevant'
>
> RAM per meta/data server, CPU number/type?
>
>> One thing that might be useful is scripts to automatically collect this
>> info. They might be specific to different filesystems. For example,
>> perhaps we could include 'utilities/collect_XXX.sh' scripts which
>> automatically collect useful information for various XXX filesystems like
>> BeeGFS, DataWarp, GPFS, IME, Lustre. And perhaps there are ways to
>> automatically collect it in a filesystem-agnostic way. Here is the
>> environmental information we are currently collecting:
>>
>> echo "System: " `uname -n`
>> echo "filesystem_utilization=$(df ${io500_workdir}|tail -1)"
>
> Needs to be "df -kP" to avoid line split for long device names.
>
> "filesystem_space_total=..."
> "filesystem_space_avail=..."
> "filesystem_inode_total=..."
> "filesystem_inode_avail=..."
>
> This is easily scripted.
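>
> (A rough, untested sketch of what that could look like:)
>
>   space=$(df -kP ${io500_workdir} | tail -1)
>   inodes=$(df -iP ${io500_workdir} | tail -1)
>   # with -P output, column 2 is the total and column 4 the available/free
>   echo "filesystem_space_total=$(echo ${space} | awk '{print $2}')"
>   echo "filesystem_space_avail=$(echo ${space} | awk '{print $4}')"
>   echo "filesystem_inode_total=$(echo ${inodes} | awk '{print $2}')"
>   echo "filesystem_inode_avail=$(echo ${inodes} | awk '{print $4}')"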
>
> Cheers, Andreas
>
> _______________________________________________
> IO-500 mailing list
> IO-500(a)vi4io.org
> https://www.vi4io.org/cgi-bin/mailman/listinfo/io-500
>
--
http://wr.informatik.uni-hamburg.de/people/julian_kunkel
<schema-system.json>