Hi Andreas,
> one problem I see with the current implementation is that it hard-codes
> the command lines used for the IO-500 tests into the same config file
> that needs to be edited by users. This is problematic if this file
> lives in Git, because any changes in Git will conflict with the
> parameters changed by users. I had similar problems when running
> IO-500 previously.
I cannot completely follow you here.
So far we had these related problems:
1) changes to the "fixed-io500.sh" script forced users to change
parameters in io-500.sh,
2) users had not included the batch submission parameters as well, and
3) changes to io-500.sh were not in the Git repository either.
What I do myself is change io-500-gen.sh to include the most relevant
parameters, then generate the io-500.sh scripts (e.g. for 10 nodes),
and version-control them.
When an update to io-500-gen.sh happens (as it did before to
io-500.sh), I create the delta against my script and inspect it, then
apply the changes that are needed.
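With standard tools, that delta step can be sketched like this (the
file names are only examples of mine, not the repository's):

```shell
# Sketch of the delta workflow: diff the updated upstream generator
# against the frozen, version-controlled local copy, and write a patch
# for inspection. File names are illustrative.
make_delta() {
  # $1: my version-controlled copy, $2: updated upstream script,
  # $3: output patch file
  diff -u "$1" "$2" > "$3" || true   # diff exits 1 when the files differ
}
# After inspecting the patch (e.g. with less), selected hunks can be
# applied to the local copy by hand or with patch(1).
```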
A recurring problem is how to incorporate upstream changes to the
execution into a setup one has already customized.
I absolutely expect that we will create a version number and freeze the
script for an upcoming submission.
The script will state which version it is for: "This script generates
a valid submission for the IO-500 list in 11/2018. Do not use it for
any other list!"
Users must use the right script.
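A minimal sketch of how that freeze could look inside the script; the
list date and function name are illustrative, not taken from the real
generator:

```shell
# Sketch: a frozen generator announcing and enforcing its target list.
# The expected list date and the function name are made up for illustration.
check_list_version() {
  # $1: the list the user intends to submit to; succeeds only on an
  # exact match with the list this script was frozen for
  expected="11/2018"
  echo "This script generates a valid submission for the IO-500 list in ${expected}."
  echo "Do not use it for any other list!"
  [ "$1" = "$expected" ]
}
```

The run scripts could call such a check early and abort on a mismatch.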
> I think it would make more sense to have a separate file (marked
> read-only in Git to avoid it being edited easily) that *only* contains
> user-editable parameters, and has a comment at the top telling users
> to make a copy (e.g. io-500-local.sh) for local usage/editing. This
> local config file is sourced by the various scripts to generate the
> config (possibly as a command-line argument, so that it is easy to
> have multiple different config files).
This can be done.
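A sketch of what that split could look like; the file names, variable
names, and default values here are assumptions, not the current ones:

```shell
# Hypothetical read-only template (io-500-params.sh in Git):
#   *** Do not edit this file. Copy it to io-500-local.sh and edit the copy. ***
IO500_NPROC=10              # number of client processes (example default)
IO500_WORKDIR=/tmp/io500    # scratch directory for benchmark data

# In the generator/run scripts, the local copy (or a file given on the
# command line) overrides the defaults above:
CONFIG="${1:-io-500-local.sh}"
if [ -f "$CONFIG" ]; then
  . "./$CONFIG"
fi
```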
Do you think something like:
./io-500-gen.sh config-1.sh
would do?
That one could then generate io-500-run-config-1.sh with all the
settings baked in?
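That interface could be as small as this sketch (all names here are
made up for illustration, not the real generator's):

```shell
# Sketch: ./io-500-gen.sh config-1.sh  ->  writes io-500-run-config-1.sh
generate_run_script() {
  config="$1"
  name=$(basename "$config" .sh)
  out="io-500-run-${name}.sh"
  {
    echo "#!/bin/sh"
    echo "# generated from $config -- edit $config, not this file"
    cat "$config"                # inline the user parameters
    echo 'echo "would run io-500 with NPROC=$IO500_NPROC"'
  } > "$out"
  chmod +x "$out"
  echo "$out"
}
```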
> Besides:
> * It reduces dependencies (e.g., awk on compute nodes).
Are there nodes that do *not* have awk on them?
Yes.
> Conversely, if a user is running multiple iterations of io-500 in
> batch (e.g. to find optimal parameters), it is better if the scores
> are computed as part of the test script, rather than depending on the
> user to run it manually on all of the output files. Also, it may not
> be clear that their test runs are invalid (e.g. under 300s write
> duration) if this is not shown in the output. Having multiple separate
> steps makes it harder for users to get a result.
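If the measured duration is available to the run script, that inline
validity check could look like this sketch (the function name and the
way the duration is passed in are assumptions):

```shell
check_write_duration() {
  # $1: measured write-phase duration in seconds;
  # the submission rules require at least 300 s
  if [ "$1" -lt 300 ]; then
    echo "INVALID RUN: write phase took only ${1}s (300s minimum)" >&2
    return 1
  fi
  echo "write phase ok: ${1}s"
}
```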
It is good practice, which I adopted from the big-data community, to
keep raw data separated from any data product, as it allows one not
only to run different types of analysis but also to improve the
analysis over time, e.g. to generate new data products.
You can add the execution of the post-processing to the generated
batch file or the generator if you want! You just don't have to.
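Appending it in the generator could be as simple as this sketch (the
score-script name and paths are invented for illustration):

```shell
append_postprocessing() {
  # $1: the generated batch script to extend
  cat >> "$1" <<'EOF'
# optional post-processing: compute scores from the raw output, which is
# left untouched so other analyses can still run on it later
./io500-score.sh ./results > ./results/scores.txt
EOF
}
```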
I will include a comprehensive Python validator that checks every
parameter and every output error that we have seen, to make sure that
the output is correct.
It is not useful to encode this in bash; at the same time, however, we
have faced compatibility problems running Python on the compute nodes.
That said, it is your choice whether to include it to automate the
calculation.
I would rather run the full checker to ensure I have not missed
anything, and to compute even more derived performance values than we
do for the lists.
Best,
Julian