On Dec 6, 2018, at 6:24 AM, Julian Kunkel <juliankunkel(a)googlemail.com> wrote:
Hi Andreas,
> one problem I see with the current implementation is that it hard-codes
> the command lines used for the IO-500 tests into the same config file
> that needs to be edited by users.
> This is problematic if this file lives in Git, because any changes in
> Git will conflict with the parameters changed by users. I had similar
> problems when running IO-500 previously.
I cannot completely follow you.
So far we have had these related problems:
1) changes to the "fixed-io500.sh" script forced users to change
parameters in io-500.sh;
2) users had not included the batch submission parameters as well;
3) changes to io-500.sh were not in the Git repository either.
What I do myself is change io-500-gen.sh to include the most relevant
parameters, then generate io-500.sh scripts (e.g. for 10 nodes) and put
them under version control.
When an update to io-500-gen.sh happens (as it did before with
io-500.sh), I create the delta against my script, inspect it, and then
apply the changes that are needed.
A recurring problem is how to incorporate changes to the execution into
a setup one has already customized.
The "create the delta ... and apply changes" part is what is problematic
IMHO. That is fine for you to do as author of the scripts, but for other
users it can become more complex (e.g. if there is a conflict in the patch
they may not know whether the change is mandatory or optional).
Conversely, they may modify (accidentally?) the parameters that make
IOR-hard "hard" and find they get a much better performance result...
Having the user-specified parameters separate from the execution of the
tests themselves is what I'm looking for. If the user-supplied parameters
are listed first on the execution command-line, and the mandatory (easy,
hard) parameters are listed afterward, then the mandatory ones should
override any conflicting options that were specified by the user.
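For example, a minimal sketch (the variable names are made up, and this
assumes that for a repeated ior option the last occurrence on the
command line wins):

    # user tuning, sourced from the local config file
    io500_ior_hard_params="-s 100000"       # e.g. the segment count
    # the mandatory ior-hard options come last, so they override any
    # conflicting option the user may have set
    $io500_mpirun $io500_mpiargs ./ior -w $io500_ior_hard_params \
        -t 47008 -b 47008 -o "$io500_workdir/ior_hard/file"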
I absolutely expect that we will create a version number and freeze the
script for an upcoming submission.
The script will output which version it is for: "This script generates
a valid submission for the IO-500 list in 11/2018. Do not use it for
any other list!" Users must use the right script.
I'm not strictly against this, but it means the scripts need to be
updated for each list. I think it should be done the opposite way:
the scripts have a specific version that is included in the results,
but that version is independent of the list version (e.g. "IO-500
v1.2.5" is valid for 2018-11, 2019-06, ...).
Then, we can determine which version of the results is valid for a given
IO-500 list (maybe some changes are inconsequential, and the same results
are usable across many lists). Users should get a reply that their
result is not acceptable for the next top list if it was not produced
by an approved version, but the result can still be accepted into the
DB with the old version.
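As a minimal sketch (the variable and file names here are made up), the
script would only need to stamp its own version into the results:

    IO500_SCRIPT_VERSION="v1.2.5"
    # record the script version with the results; which versions are
    # approved for a given list is decided at submission time
    echo "io-500 script version: $IO500_SCRIPT_VERSION" \
        | tee "$io500_result_dir/version.txt"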
> I think it would make more sense to have a separate file (marked
> read-only in Git to avoid it being edited easily) that *only* contains
> user-editable parameters, and has a comment at the top to make a copy
> (e.g. io-500-local.sh) for local usage/editing. This local config
> file is sourced by the various scripts to generate the config (possibly
> as a command-line argument, so that it is easy to have multiple different
> config files).
This can be done.
Do you think something like:
./io-500-gen.sh config-1.sh
Would that do? That one could then generate io-500-run-config-1.sh with
the relevant commands.
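For illustration, such a config file might contain only the
site-specific, user-editable parameters (all variable names here are
made up):

    # config-1.sh -- user-editable parameters only; copy this file
    # (e.g. to io-500-local.sh) and adapt it for your site
    io500_mpirun="mpiexec"
    io500_mpiargs="-np 10"                  # e.g. a 10-node run
    io500_workdir="/mnt/lustre/io500"       # dir on filesystem under test
    io500_result_dir="./results"
    io500_ior_easy_params="-t 2m -b 100g"   # tunable by the user
    io500_mdtest_hard_files_per_proc=100000 # tunable by the user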
Yes, this is what I was thinking, but why generate the separate static
run script at this point? The "io-500.sh config-1.sh" command would be
able to run the specific commands directly, without the (IMHO not useful)
separate step of generating a static list of commands. Otherwise, there
is also a danger that "config-1.sh" is updated, but io-500-gen.sh is not
run again, and the old static script is used for the job submission.
IMHO, the fewer steps that are needed, the better. The main difference
from the current io-500.sh script is that the user parameters are kept
in a separate config file that is not in the main git repo, so it does
not see conflicts when the script is updated, and it is easy to have
multiple different configs without having to update them all when there
are changes to the run script.
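A minimal sketch of that single-step flow, assuming the config
variables above:

    #!/bin/bash
    # io-500.sh: run the benchmark directly from the user's config file
    config=${1:?usage: $0 <config.sh>}
    . "$config"   # user parameters, kept outside the main git repo
    # the fixed benchmark invocations follow, using the sourced
    # variables and appending the mandatory options last
    $io500_mpirun $io500_mpiargs ./ior -w $io500_ior_easy_params \
        -o "$io500_workdir/ior_easy/file"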
>> Besides:
>> * It reduces dependencies (e.g., awk on compute nodes).
> Are there nodes that do *not* have awk on them?
Yes.
> Conversely, if a user is running multiple iterations of io-500 in
> batch (e.g. to find optimal parameters), it is better if the scores
> are computed as part of the test script, rather than depending on
> them to run it manually on all of the output files. Also, it may
> not be clear that their test runs are invalid (e.g. under 300s
> write duration) if this is not shown in the output. Having multiple
> separate steps makes it harder for users to get a result.
It is a very good practice, which I adopted from the big-data
community, to keep raw data separate from any derived data products, as
it allows one not only to run different types of analysis but also to
improve the analysis over time, e.g. to generate new data products.
You can add the execution of the post-processing to the generated batch
file or to the generator if you want; you just don't have to.
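For example (a sketch only; the post-processing script name and the
opt-in variable are made up):

    # in io-500-gen.sh: optionally append the scoring step to the
    # generated batch script, so the scores are computed right after
    # the benchmark phases finish
    if [ "$io500_run_postprocess" = "true" ]; then
        echo './io500-postprocess.sh "$io500_result_dir"' \
            >> io-500-run-config-1.sh
    fi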
I will include a comprehensive Python validator that will check every
parameter and every output error that we have seen, to make sure that
the output is correct.
It is not useful to encode this in bash; at the same time, however, we
have faced compatibility problems running Python on the compute nodes.
That said, it is your choice whether to include it to automate the
calculation.
I would rather run the full checker to ensure I have not missed
anything and to compute even more derived performance values than we do
for the lists.
Sure, I can agree with this, as long as it is clear to users what needs
to be done to get a valid result.
Cheers, Andreas