Hi Andreas,
thanks for your feedback and schema suggestions.
> I don't think (m)any of the things I proposed are run-specific values,
> except the number of clients used for the test, and possibly how full
> the filesystem currently is.
That is what I also hoped for, but I don't know all the site-specific
details that are relevant for a run.
I propose adding the actual I/O path used for the run to the list,
e.g., whether local storage is used as a write-back cache.
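To illustrate what I mean, a run could carry something roughly like
the following sketch (purely illustrative, none of these keys exist in
the schema yet):

  "run": {
    "clients": 100,
    "io path": ["node-local SSD (write-back cache)", "interconnect",
                "parallel file system"]
  }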
> I think without at least some storage details being included with the
> benchmark run it is mostly just "bragging rights", as there is no way
> to compare the systems being tested.
I agree, I believe it is important to deliver a proper system description.
Data centers typically lack a proper and standardized description;
therefore, I hope that the overall approach will help in the long run...
I have propagated many changes. The pending suggestions are discussed
below:
Under "benchmarks" there is no "io500"???
That is right; TOP500 and the others are currently only reported once
per system.
At the moment there is no io500 entry; the intention is to allow joins
between this *table* and any benchmark.
That way one could add, e.g., multiple IO-500 results: one for each
different file system, number of clients, for a degraded state of the
storage, ...
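Roughly like the following sketch (the name "exampleSystem" and all
keys are made up for illustration): a system entry carries an
identifier, and each result references it:

  "system": { "name": "exampleSystem" }

  "io500": [
    { "system": "exampleSystem", "file system": "fs1",
      "clients": 100, "state": "regular" },
    { "system": "exampleSystem", "file system": "fs2",
      "clients": 500, "state": "degraded" }
  ]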
Under "storage system", why not change "local
storage" to just "storage"
and make "storage system" an instance of "storage"? Then the
"attached"
property of the storage should allow an "interconnect" to be specified
for network-attached storage.
That is an interesting suggestion; I'm not 100% sure it is the
(final) answer, and it deserves further discussion.
The current thought is that a "storage system" provides a namespace;
typically, that means it consists of server nodes providing some
services.
I'd like to specify the hardware of the server nodes similarly to the
compute nodes, as the compute power can be relevant too.
An HPSS system can already be constructed.
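For illustration, such a system might look roughly like this (the
roles, counts, and attribute names are hypothetical, not part of the
schema yet):

  "storage system": {
    "type": "HPSS",
    "servers": [
      { "count": 2, "role": "core server",
        "node": { "CPU": "...", "memory": "..." } },
      { "count": 8, "role": "mover",
        "node": { "CPU": "...", "memory": "..." } }
    ]
  }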
There are some limitations to this approach:
* Network-attached storage is not (yet) covered; maybe that becomes
something like a controller or "storage processor" instead of a
"node" and can be attached similarly to a "tape archive". I have
little experience here to suggest a good mapping.
* How to describe the integration of node-local storage (which can
already be specified on the compute nodes) into a "storage system",
e.g., as a read cache? Maybe this is not needed at that level but
rather when describing the I/O path for a run.
> Why is the "network" option under SYSTEM not an "interconnect" schema?
I understand the interconnect here as node-specific hardware, such as
an HBA, that connects the node (one or multiple times) to one or
multiple networks.
The network is then the site-specific deployment of all the hardware
that connects the nodes together; it has a topology, bisection
bandwidth, etc.
A better representation of the network should potentially allow
individual switches, and how they are connected, to be defined.
At this point I believe it is probably too much to cover such an
inventory as well. It is possible to extend the "network" schema
further and define networks in more detail.
All this depends on the information people find relevant to describe
their system.
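To make the distinction concrete, a rough sketch (the technology and
values are just examples, the keys are not final):

  "node": {
    "interconnect": [ { "type": "InfiniBand HCA", "ports": 2 } ]
  }

  "network": {
    "technology": "InfiniBand",
    "topology": "fat tree",
    "bisection bandwidth": "..."
  }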
Thanks,
Julian
--
http://wr.informatik.uni-hamburg.de/people/julian_kunkel