On Dec 2, 2018, at 3:08 AM, Julian Kunkel <juliankunkel(a)googlemail.com> wrote:
> Hi Andreas,
> Thanks for the feedback!
> I have incorporated most of the suggestions.
>> - why not run "io-500-score.sh" automatically at the end, instead of just
>> printing a message to do so?
> The reason is to keep everything separated as much as possible.
> We may have different scoring tools that e.g. upload the data to the
> webpage, compare data, and so forth. Community ideas :-)
Julian,
one problem I see with the current implementation is that it hard-codes the command lines
used for the IO-500 tests into the same config file that users need to edit. This is a
problem if that file lives in Git, because any upstream changes will conflict with the
user's local edits. I ran into the same issue when running IO-500 previously.
I think it would make more sense to have a separate template file (marked read-only in Git
so it isn't edited in place) that contains *only* the user-editable parameters, with a
comment at the top telling users to make a copy (e.g. io-500-local.sh) for local
use/editing. This local config file is then sourced by the various scripts to generate the
config (possibly passed as a command-line argument, so it is easy to maintain multiple
different config files).
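The split-config idea above could look roughly like the sketch below. The file and
variable names here are illustrative assumptions, not the actual IO-500 layout:

```shell
#!/bin/sh
# Sketch of the proposed config split (names are assumptions, not
# the real IO-500 files).

# Stand-in for the user's local copy of the read-only template:
cat > io-500-local.sh <<'EOF'
# io-500-local.sh -- local copy of the template; edit freely.
IO500_DATA_DIR=/tmp/io500-scratch
IO500_NPROC=8
EOF

# Each script takes the config as an argument, so several variants
# (io-500-local-a.sh, io-500-local-b.sh, ...) can coexist.
CONFIG=${1:-io-500-local.sh}
. "./$CONFIG"

echo "data dir: $IO500_DATA_DIR"
echo "nproc: $IO500_NPROC"
```

Since the tracked template is never edited in place, `git pull` stays conflict-free
regardless of how users tune their local copies.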
> Besides:
> * It reduces dependencies (e.g., awk on compute nodes).
Are there nodes that do *not* have awk on them?
> * Invoking the scoring automatically but producing an error may be too
> easily ignored by users. They may also be tempted to rerun the test if
> some output produces an error.
Conversely, if a user is running multiple iterations of io-500 in batch (e.g. to find
optimal parameters), it is better if the scores are computed as part of the test script,
rather than depending on the user to run the scoring manually on all of the output files.
Also, it may not be clear that a test run is invalid (e.g. under 300s write duration) if
this is not shown in the output. Having multiple separate steps makes it harder for users
to get a result.
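For example, the scorer could flag invalid runs explicitly instead of silently producing
a number. A minimal sketch, where the 300s minimum is the IO-500 rule mentioned above
but the `check_run` interface is an assumption for illustration:

```shell
#!/bin/sh
# Sketch: refuse to score runs whose write phase is too short.
# MIN_WRITE reflects the IO-500 300s rule; check_run is hypothetical.

MIN_WRITE=300   # required minimum duration of a write phase, seconds

check_run() {
    write_secs=$1
    if [ "$write_secs" -lt "$MIN_WRITE" ]; then
        echo "INVALID: write phase ran ${write_secs}s (minimum ${MIN_WRITE}s)"
        return 1
    fi
    echo "valid: write phase ran ${write_secs}s"
}

check_run 250 || true   # too short: flagged, not silently scored
check_run 412           # long enough: accepted
```

Printing the INVALID marker in the run's own output makes it hard to miss, even when
many batch runs are compared side by side.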
> * running on 1000 nodes, I don't want to waste CPU time for
> post-processing.
The post-processing would obviously run only once, as part of the job submission script,
and not on every job node. Given that io-500 takes about an hour to run, the ~1s needed
to post-process the test output into an io-500 score file shouldn't be noticeable
overhead.
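In a batch wrapper that might look like the sketch below: the benchmark fans out across
the nodes, but scoring runs exactly once on the submitting node. The run/score commands
here are placeholders so the sketch is self-contained, not the real IO-500 scripts:

```shell
#!/bin/sh
# Sketch: one serial post-processing step in the submission script.
# run_benchmark/score_results are stand-ins for the real mpirun
# invocation and io-500-score.sh.

OUTDIR=$(mktemp -d)

run_benchmark()  { echo "ior write 512.3 GB/s" > "$OUTDIR/io-500.out"; }
score_results()  { awk '{print $3}' "$OUTDIR/io-500.out" > "$OUTDIR/score.txt"; }

run_benchmark            # hour-long parallel phase in reality
score_results            # single ~1s serial pass over the output
echo "score: $(cat "$OUTDIR/score.txt")"
```

The key point is that the serial scoring step runs after the parallel phase has finished,
so no compute-node CPU time is spent on it.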
Cheers, Andreas