Can I get an invite to join Slack? The website says to ask an admin for an invite.
On 6/1/20, 8:43 AM, io-500-request(a)vi4io.org (via io-500-bounces(a)vi4io.org) wrote:
Send IO-500 mailing list submissions to
    io-500(a)vi4io.org
To subscribe or unsubscribe via the World Wide Web, visit
    https://www.vi4io.org/mailman/listinfo/io-500
or, via email, send a message with subject or body 'help' to
    io-500-request(a)vi4io.org
You can reach the person managing the list at
    io-500-owner(a)vi4io.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of IO-500 digest..."
Today's Topics:
1. Re: Find phase - discrepancies between io500.sh & io500 C
version. C-version fails (John Bent)
----------------------------------------------------------------------
Message: 1
Date: Mon, 1 Jun 2020 09:42:45 -0600
From: John Bent <johnbent(a)gmail.com>
To: Julian Kunkel <juliankunkel(a)googlemail.com>
Cc: Mark Nelson <mnelson(a)redhat.com>, io-500(a)vi4io.org
Subject: Re: [IO-500] Find phase - discrepancies between io500.sh &
io500 C version. C-version fails
Message-ID:
<CAGfFL-z4Ny6gQkfOfkApdjAem-C6CiKJcYE8fE6BsZfp9u0JDQ(a)mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Definitely like GitHub for issues. Personally I prefer the mailing list
over Slack for discussions though. For me, it feels easier to search and
easier to group things into one coherent thread. I might be showing my age
with this preference. :)
On Mon, Jun 1, 2020 at 9:35 AM Julian Kunkel via IO-500 <io-500(a)vi4io.org>
wrote:
Thanks, all, for your reports.
I personally agree with the approach: we should keep the mailing list
for the most important discussions, and post error messages like these
either as GitHub issues or discuss them on Slack, to minimize traffic.
Thanks & Best,
Julian
On Mon, Jun 1, 2020 at 4:05 PM Mark Nelson via IO-500 <io-500(a)vi4io.org>
wrote:
On 6/1/20 8:44 AM, Pinkesh Valdria via IO-500 wrote:
> I made some progress over the weekend troubleshooting why the find
> phase was not working, but I am not out of the woods yet. I would
> appreciate it if someone could confirm that I am on the right path, or
> acknowledge that the items below are known issues with workarounds.
>
> Thanks for your help. It's a long email, but detailed, to avoid a lot
> of back-and-forth.
>
> Here are the differences I found and the workarounds I had to use.
> Issue 1:
>
> io500 (C version) expects a field labelled external-extra-args in the
> config.ini file, but the non-C version (io500.sh) looks for a field
> labelled external-args (without "-extra-"); see the line below:
>
>
https://github.com/VI4IO/io500-app/blob/master/…
>
>
> io500_find_cmd_args="$(get_ini_param find external-args)"
>
> less config-full.ini
>
> [find]
> # Set to an external script to perform the find phase
> external-script =
> # Extra arguments for external scripts
> external-extra-args =
> # Startup arguments for external scripts
> external-mpi-args =
> # Set the number of processes for pfind
> nproc =
> # Pfind queue length
> pfind-queue-length = 10000
> # Pfind steal from next
> pfind-steal-next = FALSE
> # Parallelize the readdir by using hashing. Your system must support this!
> pfind-parallelize-single-dir-access-using-hashing = FALSE
Temporary workaround: I changed the io500.sh code to look for:
io500_find_cmd_args="$(get_ini_param find external-extra-args)"
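Instead of renaming the key in the script, one could make io500.sh accept both spellings by trying external-extra-args first and falling back to external-args. This is only a sketch: get_ini_param below is a minimal stand-in for io500.sh's helper (re-implemented so the example is self-contained; it does not handle values containing "="), and read_find_args is a hypothetical wrapper, not part of the real script.

```shell
#!/bin/sh
# Minimal stand-in for io500.sh's get_ini_param: prints the value of
# "key = value" inside a [section] of an ini file.
get_ini_param() {  # usage: get_ini_param <file> <section> <key>
  awk -F '=' -v section="[$2]" -v key="$3" '
    $0 == section { in_sec = 1; next }   # enter the wanted section
    /^\[/         { in_sec = 0 }         # any other [section] ends it
    in_sec && $1 ~ "^[ \t]*" key "[ \t]*$" {
      sub(/^[ \t]+/, "", $2); sub(/[ \t]+$/, "", $2); print $2
    }' "$1"
}

# Prefer the C app's key name; fall back to the older one if empty.
read_find_args() {  # usage: read_find_args <config.ini>
  args=$(get_ini_param "$1" find external-extra-args)
  [ -n "$args" ] || args=$(get_ini_param "$1" find external-args)
  printf '%s\n' "$args"
}
```

With this, the same io500.sh line works against both the config-full.ini key (external-extra-args) and the older external-args key.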
Issue 2: I had to set io500_find_mpi to True in the io500.sh script to
avoid this error: "io500_find_mpi: unbound variable". I don't know if
there is a different way to set that value for the C version using the
config.ini file; can someone share how to pass that value to the C-app
version?
function setup_find {
io500_find_mpi="True"
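The "unbound variable" message is what bash prints under `set -u` (nounset) when a variable is read before being assigned. Assuming io500.sh runs with `set -u` (the error text suggests it does), a defensive default avoids hard-coding the value inside setup_find; this is a sketch, not the script's own mechanism:

```shell
#!/bin/bash
set -u  # abort on unset variables, as io500.sh appears to do; this is
        # what triggers "io500_find_mpi: unbound variable"

# Give the variable a default before its first use: ${var:-default}
# expands safely even under set -u.
io500_find_mpi="${io500_find_mpi:-True}"

echo "$io500_find_mpi"
```

An environment variable set before invoking the script (`io500_find_mpi=False ./io500.sh …`) would then override the default.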
Issue 3: How do I validate that the parameters I am setting in the
config.ini file are being used at runtime? I set the following, but
don't see them below:
[find]
external-script = /mnt/beeond/io500-app/bin/pfind
#nproc = 30
pfind-queue-length = 2000
pfind-steal-next = TRUE
pfind-parallelize-single-dir-access-using-hashing = FALSE
The io500 C version hangs at the command below, and I don't see queue
length, steal-next, etc.:
[find]
t_start = 2020-06-01 08:28:30
exe = /mnt/beeond/io500-app/bin/pfind ./out//2020.06.01-06.57.23-app
-newer ./out//2020.06.01-06.57.23-app/timestampfile -size 3901c -name
"*01*"
nproc = 1
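One way to check whether a setting took effect is to compare the [find] keys in config.ini against what the app echoes back into its result file (the log above shows it reporting "key = value" lines under a [find] header). The helper below is hypothetical, not an io500 tool; it only assumes that both files use that ini-style layout:

```shell
#!/bin/sh
# Sketch: print a key's value as set in config.ini next to the value the
# app reported at runtime, so mismatches (or silently ignored keys) show up.
check_param() {  # usage: check_param <config.ini> <result.txt> <key>
  # Extract the [find] section (up to the next [section] or EOF),
  # then grab the first line starting with the key.
  want=$(sed -n '/^\[find\]/,/^\[/p' "$1" | grep "^$3" | head -1)
  got=$(sed -n '/^\[find\]/,/^\[/p' "$2" | grep "^$3" | head -1)
  echo "config:  ${want:-<unset>}"
  echo "runtime: ${got:-<not reported>}"
}
```

Running it for each key you set (pfind-queue-length, pfind-steal-next, …) would show immediately which ones never reach the runtime output.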
Issue 4: I used a manual workaround to make the find phase work by
setting some parameters in io500.sh, but they are not passed to the
C version, and hence that fails.
In io500.sh script:
function setup_find {
io500_find_mpi="True"
io500_find_cmd_args="-N -q 2000 -s $io500_stonewall_timer -r $io500_result_dir/pfind_results"
Non-C version (io500.sh): success
[Starting] find
[Exec] mpiexec --allow-run-as-root -mca btl self -x UCX_TLS=rc,self,sm
-x HCOLL_ENABLE_MCAST_ALL=0 -mca coll_hcoll_enable 0 -x
UCX_IB_TRAFFIC_CLASS=105 -x UCX_IB_GID_INDEX=3 -n 30 -npernode 10
--hostfile /mnt/beeond/hostsfile.cn /mnt/beeond/io500-app/bin/pfind
./out//2020.06.01-06.57.23-scr -newer
./out//2020.06.01-06.57.23-scr/timestampfile -size 3901c -name "*01*"
-N -q 2000 -s 300 -r ./results//2020.06.01-06.57.23-scr/pfind_results
[Results] in ./results//2020.06.01-06.57.23-scr/find.txt.
[FIND] MATCHED 28170/15966876 in 44.6593 seconds
[RESULT] IOPS phase 3 find 357.520 kiops : time 44.66 seconds
The io500 C version hangs at the command below, and I don't see queue
length, steal-next, etc.:
[find]
t_start = 2020-06-01 08:28:30
exe = /mnt/beeond/io500-app/bin/pfind ./out//2020.06.01-06.57.23-app
-newer ./out//2020.06.01-06.57.23-app/timestampfile -size 3901c -name
"*01*"
nproc = 1
Issue 5: As you can see in issue 4, I am passing some parameters using
bash variables. If I do the same in the config.ini file, they are passed
through as-is, without being interpreted by the bash script. How do I
pass such variables to the C version of io500?
I already tried the two forms below; they are not interpreted when
processed by the io500 C-app version.
In config.ini file:
external-extra-args = -s \$io500_stonewall_timer -r \$io500_result_dir/pfind_results
or
external-extra-args = -s $io500_stonewall_timer -r $io500_result_dir/pfind_results
[find]
t_start = 2020-05-31 11:43:52
exe = /mnt/beeond/io500-app/bin/pfind -s $io500_stonewall_timer -r
$io500_result_dir/pfind_results ./out//2020.05.31-10.52.56-app -newer
./out//2020.05.31-10.52.56-app/timestampfile -size 3901c -name "*01*"
nproc = 1
[find]
t_start = 2020-05-31 15:55:52
exe = /mnt/beeond/io500-app/bin/pfind -s \$io500_stonewall_timer -r
\$io500_result_dir/pfind_results ./out//2020.05.31-15.55.38-app -newer
./out//2020.05.31-15.55.38-app/timestampfile -size 3901c -name "*01*"
nproc = 1
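Since the C app appears to read config.ini verbatim (the exe lines above still contain the literal "$io500_..." strings), one workaround is to expand the variables in the shell yourself and generate the ini before the run. This is only a sketch, not an official io500 feature, and the values below are placeholders:

```shell
#!/bin/bash
# Placeholder values; substitute your own.
io500_stonewall_timer=300
io500_result_dir=./results/run1

# Emit a fully-expanded [find] section: the heredoc is unquoted, so the
# shell substitutes ${...} here, and the C app then sees literal values
# instead of unexpanded "$io500_..." strings.
gen_find_section() {
  cat <<EOF
[find]
external-script = /mnt/beeond/io500-app/bin/pfind
external-extra-args = -s ${io500_stonewall_timer} -r ${io500_result_dir}/pfind_results
EOF
}

gen_find_section > config-expanded.ini
```

The run would then use the generated file (`./io500 config-expanded.ini` style invocation), with every value already concrete.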
This is a small test cluster:
less 2020.06.01-06.57.23-scr/result_summary.txt
[RESULT] BW phase 1 ior_easy_write 6.062 GiB/s : time 362.17 seconds
[RESULT] IOPS phase 1 mdtest_easy_write 7.300 kiops : time 2054.93 seconds
[RESULT] BW phase 2 ior_hard_write 1.605 GiB/s : time 321.58 seconds
[RESULT] IOPS phase 2 mdtest_hard_write 3.173 kiops : time 304.69 seconds
[RESULT] IOPS phase 3 find 357.520 kiops : time 44.66 seconds
[RESULT] BW phase 3 ior_easy_read 8.269 GiB/s : time 265.51 seconds
[RESULT] IOPS phase 4 mdtest_easy_stat 144.149 kiops : time 104.06 seconds
[RESULT] BW phase 4 ior_hard_read 3.847 GiB/s : time 134.18 seconds
[RESULT] IOPS phase 5 mdtest_hard_stat 82.220 kiops : time 11.76 seconds
[RESULT] IOPS phase 6 mdtest_easy_delete 54.334 kiops : time 276.07 seconds
[RESULT] IOPS phase 7 mdtest_hard_read 22.822 kiops : time 42.37 seconds
[RESULT] IOPS phase 8 mdtest_hard_delete 8.042 kiops : time 123.97 seconds
[SCORE] Bandwidth 4.19424 GiB/s : IOPS 31.5378 kiops : TOTAL 11.5012
C-version app (partial result; it hangs at find):
[root@inst-5n58i-good-crow results]# less 2020.06.01-06.57.23-app/result.txt | egrep "\[|score"
[ior-easy-write]
score = 6.136592
[mdtest-easy-write]
score = 41.913878
[timestamp]
[ior-hard-write]
score = 1.538194
[mdtest-hard-write]
score = 3.012750
[find]
[root@inst-5n58i-good-crow results]#
FWIW, the default version of find is still working the same as it did
previously for us. This is not really related to what you are seeing,
but what I'm still noticing is that we get an unreasonable find
advantage when our easy and hard create results are highly skewed
(which I'm still working on debugging, since we are still sometimes
seeing a lot of variation). This is from an 8-node development cluster:
First run (C-app in this case):
[RESULT] ior-easy-write 59.443067 GiB/s : time 467.383 seconds
[RESULT] mdtest-easy-write 277.570531 kIOPS : time 369.282 seconds
[RESULT] ior-hard-write 11.037549 GiB/s : time 333.391 seconds
[RESULT] mdtest-hard-write 48.336249 kIOPS : time 328.619 seconds
[RESULT] find 512.379261 kIOPS : time 230.722 seconds
[RESULT] ior-easy-read 70.285275 GiB/s : time 397.338 seconds
[RESULT] mdtest-easy-stat 627.847802 kIOPS : time 165.087 seconds
[RESULT] ior-hard-read 13.879120 GiB/s : time 275.325 seconds
[RESULT] mdtest-hard-stat 179.562335 kIOPS : time 1221.149 seconds
[RESULT] mdtest-easy-delete 140.591170 kIOPS : time 917.852 seconds
[RESULT] mdtest-hard-read 43.742740 kIOPS : time 362.040 seconds
[RESULT] mdtest-hard-delete 11.804945 kIOPS : time 1341.281 seconds
[SCORE] Bandwidth 28.284600 GB/s : IOPS 124.102186 kiops : TOTAL 59.246778
Second run (Script):
[RESULT] BW phase 1 ior_easy_write 55.296 GiB/s : time 477.21 seconds
[RESULT] BW phase 2 ior_hard_write 10.749 GiB/s : time 331.08 seconds
[RESULT] BW phase 3 ior_easy_read 67.308 GiB/s : time 391.89 seconds
[RESULT] BW phase 4 ior_hard_read 14.842 GiB/s : time 239.77 seconds
[RESULT] IOPS phase 1 mdtest_easy_write 278.851 kiops : time 367.23 seconds
[RESULT] IOPS phase 2 mdtest_hard_write 8.455 kiops : time 323.59 seconds
[RESULT] IOPS phase 3 find 1470.910 kiops : time 71.48 seconds
[RESULT] IOPS phase 4 mdtest_easy_stat 967.127 kiops : time 105.89 seconds
[RESULT] IOPS phase 5 mdtest_hard_stat 47.136 kiops : time 58.04 seconds
[RESULT] IOPS phase 6 mdtest_easy_delete 224.795 kiops : time 455.53 seconds
[RESULT] IOPS phase 7 mdtest_hard_read 46.749 kiops : time 58.52 seconds
[RESULT] IOPS phase 8 mdtest_hard_delete 28.490 kiops : time 99.08 seconds
[SCORE] Bandwidth 27.7589 GiB/s : IOPS 121.449 kiops : TOTAL 58.0628
We don't really "deserve" a score of 58 on that second run because the
only reason find was so fast was that we wrote out far less
mdtest_hard_write data.
Mark
> From: Pinkesh Valdria <pinkesh.valdria(a)oracle.com>
> Date: Saturday, May 30, 2020 at 3:01 AM
> To: Andreas Dilger <adilger(a)dilger.ca>
> Cc: <io-500(a)vi4io.org>
> Subject: Re: [IO-500] Io500 runs twice - is that expected starting 2020?
>
> Thanks Andreas,
>
> The C-app benchmark failed; I mean it never completed and there was no
> score at the end. I waited for 8 hours before I hit Ctrl-C. It only
> produced 4 results vs. the 12 results in the first section. Are there
> special logs to look at for the C version of the benchmark?
>
> The below is a very small development cluster I am using until I
> figure out how to run IO500 correctly.
>
> [Leaving] datafiles in ./out//2020.05.29-17.31.34-scr
> [Summary] Results files in ./results//2020.05.29-17.31.34-scr
> [Summary] Data files in ./out//2020.05.29-17.31.34-scr
> [RESULT] BW phase 1 ior_easy_write 6.188 GiB/s : time 357.44 seconds
> [RESULT] BW phase 2 ior_hard_write 1.132 GiB/s : time 367.41 seconds
> [RESULT] BW phase 3 ior_easy_read 8.090 GiB/s : time 273.37 seconds
> [RESULT] BW phase 4 ior_hard_read 3.726 GiB/s : time 111.69 seconds
> [RESULT] IOPS phase 1 mdtest_easy_write 4.263 kiops : time 3518.34 seconds
> [RESULT] IOPS phase 2 mdtest_hard_write 2.953 kiops : time 303.52 seconds
> [RESULT] IOPS phase 3 find 91.550 kiops : time 173.64 seconds
> [RESULT] IOPS phase 4 mdtest_easy_stat 137.243 kiops : time 109.30 seconds
> [RESULT] IOPS phase 5 mdtest_hard_stat 84.140 kiops : time 10.65 seconds
> [RESULT] IOPS phase 6 mdtest_easy_delete 55.311 kiops : time 271.19 seconds
> [RESULT] IOPS phase 7 mdtest_hard_read 21.778 kiops : time 41.16 seconds
> [RESULT] IOPS phase 8 mdtest_hard_delete 7.133 kiops : time 129.24 seconds
> [SCORE] Bandwidth 3.81212 GiB/s : IOPS 24.1149 kiops : TOTAL 9.58796
>
> The io500.sh was run
>
> Running the C version of the benchmark now
>
> IO500 version io500-isc20
>
> [RESULT] ior-easy-write 6.233210 GiB/s : time 359.216 seconds
> [RESULT] mdtest-easy-write 3.750549 kIOPS : time 3999.448 seconds
> [RESULT] ior-hard-write 1.415903 GiB/s : time 349.737 seconds
> [RESULT] mdtest-hard-write 3.006432 kIOPS : time 305.006 seconds
>
> ^C
>
> [root@inst-q7cdd-good-crow io500-app]#
>
> From: Andreas Dilger <adilger(a)dilger.ca>
> Date: Saturday, May 30, 2020 at 2:33 AM
> To: Pinkesh Valdria <pinkesh.valdria(a)oracle.com>
> Cc: <io-500(a)vi4io.org>
> Subject: Re: [IO-500] Io500 runs twice - is that expected starting 2020?
>
> Hi Pinkesh,
>
> The dual runs of the IO500 benchmark for this list are intentional,
> and documented in the README-ISC20.txt file in the source tree.
> This is to allow comparison between the historical io500.sh script
> and the new C application that runs the same IOR, mdtest, and find
> commands. Please submit both results for ISC'20.
>
> We wanted to be sure that the transition to the new C-app didn't
> introduce any errors in the results. The need to run the benchmark
> twice will hopefully be gone for the SC'20 list.
>
> Cheers, Andreas(*)
>
> * speaking on my own behalf and not on behalf of the IO500 board
>
>
>
>
> On May 29, 2020, at 16:02, Pinkesh Valdria via IO-500
> <io-500(a)vi4io.org> wrote:
>
>
>
> Hello IO-500 experts,
>
> I am trying to configure io500. When I run it, it runs twice: the
> first run is the regular one, and the 2nd is called "Running the C
> version of the benchmark now". Is that because I misconfigured it, or
> is it required to run both starting in 2020? My config*.ini file is
> below.
>
> [root@inst-q7cdd-good-crow io500-app]# ./io500.sh config-test1.ini
>
> System: inst-q7cdd-good-crow
>
> …..
>
> Running the IO500 Benchmark now
>
> [Creating] directories
>
> …..
>
> [Summary] Results files in ./results//2020.05.29-17.31.34-scr
>
> [Summary] Data files in ./out//2020.05.29-17.31.34-scr
>
> [RESULT] BW phase 1 ior_easy_write 6.188 GiB/s : time 357.44 seconds
> [RESULT] BW phase 2 ior_hard_write 1.132 GiB/s : time 367.41 seconds
> [RESULT] BW phase 3 ior_easy_read 8.090 GiB/s : time 273.37 seconds
> [RESULT] BW phase 4 ior_hard_read 3.726 GiB/s : time 111.69 seconds
> [RESULT] IOPS phase 1 mdtest_easy_write 4.263 kiops : time 3518.34 seconds
> [RESULT] IOPS phase 2 mdtest_hard_write 2.953 kiops : time 303.52 seconds
> [RESULT] IOPS phase 3 find 91.550 kiops : time 173.64 seconds
> [RESULT] IOPS phase 4 mdtest_easy_stat 137.243 kiops : time 109.30 seconds
> [RESULT] IOPS phase 5 mdtest_hard_stat 84.140 kiops : time 10.65 seconds
> [RESULT] IOPS phase 6 mdtest_easy_delete 55.311 kiops : time 271.19 seconds
> [RESULT] IOPS phase 7 mdtest_hard_read 21.778 kiops : time 41.16 seconds
> [RESULT] IOPS phase 8 mdtest_hard_delete 7.133 kiops : time 129.24 seconds
> [SCORE] Bandwidth 3.81212 GiB/s : IOPS 24.1149 kiops : TOTAL 9.58796
>
> The io500.sh was run
>
> Running the C version of the benchmark now
>
> IO500 version io500-isc20
>
> <currently running … when I posted this question>
>
> ***************************************************
>
> config-test1.ini (END)
>
> ***************************************************
>
> [global]
> datadir = ./out/
> resultdir = ./results/
> timestamp-resultdir = TRUE
>
> # Choose parameters that are very small for all benchmarks
> [debug]
> stonewall-time = 300 # for testing
>
> [ior-easy]
> transferSize = 2m
> blockSize = 102400m
>
> [mdtest-easy]
> API = POSIX
> # Files per proc
> n = 500000
>
> [ior-hard]
> API = POSIX
> # Number of segments 10000000
> segmentCount = 400000
>
> [mdtest-hard]
> API = POSIX
> # Files per proc 1000000
> n = 40000
>
> [find]
> external-script = /mnt/beeond/io500-app/bin/pfind
> pfind-parallelize-single-dir-access-using-hashing = FALSE
>
> ***************************************************
>
> _______________________________________________
> IO-500 mailing list
> IO-500(a)vi4io.org
> https://www.vi4io.org/mailman/listinfo/io-500
--
Dr. Julian Kunkel
Lecturer, Department of Computer Science
+44 (0) 118 378 8218
http://www.cs.reading.ac.uk/
https://hps.vi4io.org/
PGP Fingerprint: 1468 1A86 A908 D77E B40F 45D6 2B15 73A5 9D39 A28E