On Jun 18, 2017, at 3:41 AM, Georgios Markomanolis
<georgios.markomanolis(a)kaust.edu.sa> wrote:
Here is an updated version of the script in which the user can select the filesystem (via
the corresponding variable), at the moment Lustre or Cray DW; IOR then uses different
parameters depending on the filesystem.
Looking at the script, a couple of things to note:
max_ost=`lfs df -h $DIRNAME | grep OST0 | wc -l`
lfs setstripe -c $max_ost $workdir
ior_easy_params="-t 1m -b 10g"
There is no need to determine "max_ost" in order to set the stripe count explicitly.
Instead, one could just use stripe count "-c -1" to stripe over all OSTs for the
"IOR hard" case. However, for "IOR easy" and all mdtest benchmarks,
wide striping is sub-optimal, and "-c 1" should be used.
Also, the pathname should be a parameter for the test, rather than being hard-coded in the
script.
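A minimal sketch of the two suggestions above, assuming a bash script: the working directory is taken from the first argument instead of being hard-coded, and the stripe count depends on the test. The helper names stripe_count_for and setstripe_cmd and the test labels are hypothetical, and the commands are only printed so the sketch can be run without a Lustre client.

```shell
#!/bin/bash
# Hypothetical helper: choose the Lustre stripe count for a given test.
stripe_count_for() {
    case "$1" in
        ior_hard) printf '%s\n' "-1" ;;  # stripe over all OSTs
        *)        printf '%s\n' "1"  ;;  # ior_easy and all mdtest runs
    esac
}

# Print (rather than execute) the lfs command for a test and directory.
setstripe_cmd() {
    local test_name="$1" dir="$2"
    echo "lfs setstripe -c $(stripe_count_for "$test_name") $dir"
}

# The working directory is a parameter; the default is only a placeholder.
workdir="${1:-/path/to/workdir}"
for t in ior_easy ior_hard mdtest_easy mdtest_hard; do
    setstripe_cmd "$t" "$workdir/$t"
done
```

To apply the settings on a real system, the printed commands would be executed instead of echoed.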
Cheers, Andreas
Best regards,
George
________________________________________
George Markomanolis, PhD
Computational Scientist
KAUST Supercomputing Laboratory (KSL)
King Abdullah University of Science & Technology
Al Khawarizmi Bldg. (1) Room 0123
Thuwal
Kingdom of Saudi Arabia
Mob: +966 56 325 9012
Office: +966 12 808 0393
From: Georgios Markomanolis <georgios.markomanolis(a)kaust.edu.sa>
Date: Saturday, 17 June 2017 at 10:54 PM
To: Julian Kunkel <juliankunkel(a)googlemail.com>, "io-500(a)vi4io.org"
<io-500(a)vi4io.org>
Subject: Re: [IO-500] Detailed benchmark proposal
Hello everybody,
Today I had the chance to work with John for some time, and we decided to write a first
version of the script. It is attached. We have tested it on our Cray Burst Buffer at KAUST
and it works without issues, although I still have to tune it a bit for better performance.
We tried to expose the settings as variables so that they are easy to modify. The script
calculates everything automatically. We need to clarify what we want as the final result;
for now it follows the formula from the previous email, with the geometric means.
Some topics for discussion: why do we need to run mdtest and IOR for a minimum of 5 minutes?
For example, on our system, with more than 1 TB/s in IOR, I need to write 300 TB+ in 5
minutes on the BB. Would it not be reasonable to create a file a bit larger than half the
memory of the reserved resources? Do you think there is still a chance of caching when using
more than half the memory? For mdtest, creating millions of files can hurt the filesystem,
especially Lustre. We have had real cases that Lustre could not handle.
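The arithmetic behind the 300 TB figure can be sketched as follows. The function name min_data_tb is hypothetical, and the inputs are the round numbers from the paragraph above (1 TB/s sustained for 300 seconds), not measurements.

```shell
#!/bin/bash
# Data volume (TB) needed to keep a run going for a given time.
# $1: sustained bandwidth in TB/s (integer), $2: minimum run time in seconds.
min_data_tb() {
    echo $(( $1 * $2 ))
}

min_data_tb 1 300   # 1 TB/s for 5 minutes
```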
I think we should start a repository and keep the script updated there; we could also keep
optimal parameters per site.
The IOR bandwidths are reported with details in ior_$JOBID, for example:
Bandwidth 1 is 774570.92 MB/s and duration is 17.30 seconds
Bandwidth 2 is 123680.01 MB/s and duration is 7.78 seconds
Bandwidth 3 is 1250231.51 MB/s and duration is 10.72 seconds
Bandwidth 4 is 83650.59 MB/s and duration is 11.51 seconds
and the mdtests in the file mdt_$JOBID.
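As a hedged sketch, result lines in the format shown above could be parsed and checked against the 5-minute minimum like this. The function name parse_bw, the output layout, and the default 300-second threshold are assumptions for illustration; ior_$JOBID may contain other lines, which the pattern ignores.

```shell
#!/bin/bash
# Reads "Bandwidth N is X MB/s and duration is Y seconds" lines on stdin and
# prints "N X Y", appending "short" when the run lasted less than min seconds.
parse_bw() {
    local min_seconds="${1:-300}"
    awk -v min="$min_seconds" '
        /^Bandwidth [0-9]+ is / {
            flag = ($9 + 0 < min) ? " short" : ""
            printf "%s %s %s%s\n", $2, $4, $9, flag
        }'
}

# Example with one line copied from the output above.
echo "Bandwidth 1 is 774570.92 MB/s and duration is 17.30 seconds" | parse_bw
```

With the default threshold, all four runs listed above would be flagged as short, since none of them reaches 300 seconds.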
The script is prepared with SLURM commands and in this example I have used 2048 compute
nodes, so feel free to adapt.
If we use Lustre, we need to run the experiment in a different, striped folder. Even
for the BB, I need to increase the number of MPI I/O aggregators in the shared-file case,
but at least we have a first version for now.
See you at the BOF.
Best regards,
George
From: IO-500 <io-500-bounces(a)vi4io.org> on behalf of Julian Kunkel
<juliankunkel(a)googlemail.com>
Date: Saturday, 17 June 2017 at 9:42 AM
To: "io-500(a)vi4io.org" <io-500(a)vi4io.org>
Subject: [IO-500] Detailed benchmark proposal
Somehow the mail from John did not get through, so here it is (if there is an email issue,
please mail me). Thanks also to everyone we had discussions with in addition to the Dagstuhl
meeting...
From: John Bent <John.Bent(a)seagategov.com>
Sent: 16 June 2017 22:30:23 CEST
To: "io-500(a)vi4io.org" <io-500(a)vi4io.org>
Subject: Detailed benchmark proposal
All,
Sorry for the long silence on the mailing list. However, we have made some substantial
progress recently as we prepare for our ISC BOF next week. For those of you at ISC,
please join us from 11 to 12 on Tuesday in Substanz 1&2.
The progress that we have made recently happened because a bunch of us were attending a
German workshop last month at Dagstuhl and had multiple discussions about the benchmark.
Here are the highlights of what was discussed and the progress that we made at Dagstuhl:
• General agreement that the IOR-hard, IOR-easy, mdtest-hard, mdtest-easy approach is
appropriate.
• We should add a ‘find’ command as this is a popular and important workload.
• The multiple bandwidth measurements should be combined via geometric mean into one
bandwidth.
• The multiple IOPs measurements should also be combined via geometric mean into one
IOPs.
• The bandwidth and the IOPs should be multiplied to create one final score.
• The ranking uses that final score but the webpage can be sorted using other metrics.
• The webpage should allow filtering as well so, for example, people can look at only
the HDD results.
• We should separate the write/create phases from the read/stat phases to help ensure
that caching is avoided.
• Nathan Hjelm volunteered to combine the mdtest and IOR benchmarks into one git repo
and has now done so. This removes the #ifdef mess from mdtest, and now they both share the
nice modular IOR backend.
So the top-level summary of the benchmark in pseudo-code has become:
# write/create phase
bw1 = ior_easy -write [user supplies their own parameters maximizing data writes that can
be done in 5 minutes]
md1 = md_test_easy -create [user supplies their own parameters maximizing file creates
that can be done in 5 minutes]
bw2 = ior_hard -write [we supply parameters: unaligned strided into single shared file]
md2 = md_test_hard -create [we supply parameters: creates of 3900 byte files into single
shared directory]
# read/stat phase
bw3 = ior_easy -read [cross-node read of everything that was written in bw1]
md3 = md_test_easy -stat [cross-node stat of everything that was created in md1]
bw4 = ior_hard -read
md4 = md_test_hard -stat
# find phase
md5 = [we supply parameters to find a subset of the files that were created in the
tests]
# score phase
bw = geo_mean( bw1 bw2 bw3 bw4)
md = geo_mean( md1 md2 md3 md4 md5)
total = bw * md
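The score phase can be sketched in shell, with awk doing the floating-point math. This is an illustration under assumptions, not the attached script: the function names geo_mean and io500_score are hypothetical, and the values in the usage line are placeholders rather than measurements.

```shell
#!/bin/bash
# Geometric mean of all arguments: exp(mean(log(x_i))).
geo_mean() {
    printf '%s\n' "$@" |
        awk '{ s += log($1); n++ } END { printf "%.6f\n", exp(s / n) }'
}

# Final score: geometric mean of the bandwidths times geometric mean of the
# metadata rates.
# $1: space-separated bandwidths, $2: space-separated metadata rates.
io500_score() {
    local bw md
    bw=$(geo_mean $1)   # intentionally unquoted: split the list into words
    md=$(geo_mean $2)
    awk -v bw="$bw" -v md="$md" 'BEGIN { printf "%.2f\n", bw * md }'
}

io500_score "2 8" "1 10 100"   # placeholder values only
```

Summing logarithms instead of multiplying the raw values keeps the intermediate result small even when the individual bandwidths and rates are large.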
Now we are moving on to precisely define what the parameters should look like for the
hard tests and to create a standard so that people can start running it on their systems.
By doing so, we will define the formal process so we can actually make this an official
benchmark. Please see the attached file, in which we’ve started precisely defining these
parameters. Please let’s start iterating on this file to get these parameters correct.
Thanks,
John
--
Dr. Julian Kunkel
Research Department
Deutsches Klimarechenzentrum GmbH (DKRZ)
Bundesstraße 45a • D-20146 Hamburg • Germany
Phone: +49 40 460094-161
Fax: +49 40 460094-270
E-mail: kunkel(a)dkrz.de
URL: http://www.dkrz.de
Managing Director: Prof. Dr. Thomas Ludwig
Registered office: Hamburg
Hamburg District Court, HRB 39784
<io_500.sh>_______________________________________________