Find doesn't respect stonewall, to my knowledge.
On Jun 1, 2020, at 10:23 AM, John Bent <johnbent@gmail.com> wrote:
If it was single-threaded, then won't it hit the stonewall and exit right at 300? The
super-long ones are due to stragglers which are only possible when nproc>1, no?
Thanks for the very long and detailed report. I'm happy to say that the extra month
for submissions should allow us time to help get you sorted out. I'm not sure, but I
wonder if you might have better success with these issues if you open them as individual
'Issues' at our repo:
https://github.com/VI4IO/io-500-dev/issues
Thanks,
John(*)
* speaking as individual and not for committee
On Mon, Jun 1, 2020 at 7:59 AM Carlile, Ken via IO-500 <io-500@vi4io.org> wrote:
There is definitely something up with your pfind in the C version. I used similar
parameters in my ini file and it did apply them correctly. One thing I notice in your
output is that for the C version, for whatever reason, it was setting nproc=1. This means
that it was running entirely single threaded, so it probably wasn't hung, it was just
going to take 10 years to run.
My io500.sh is vanilla except for the mpi arguments (of course), and my ini is fairly
clean without any trickiness. I did have to remove the nproc parameter because it
wasn't respected by the bash version.
--Ken
On Jun 1, 2020, at 9:47 AM, Pinkesh Valdria via IO-500 <io-500@vi4io.org> wrote:
I made some progress over the weekend troubleshooting why the find phase was not working,
but I am not out of the woods yet. I would appreciate it if someone could confirm whether
I am on the right path, or tell me whether the issues below are known and have workarounds.
Thanks for your help. It's a long email, but the detail should save a lot of
back-and-forth.
Here are the differences I found and the workarounds I had to use.
Issue1:
io500 (C version) expects a field labelled external-extra-args in the config.ini file,
but the non-C version (io500.sh) looks for a field labelled "external-args" (without
"-extra-"); see this line:
https://github.com/VI4IO/io500-app/blob/master/io500.sh#L229
io500_find_cmd_args="$(get_ini_param find external-args)"
less config-full.ini
[find]
# Set to an external script to perform the find phase
external-script =
# Extra arguments for external scripts
external-extra-args =
# Startup arguments for external scripts
external-mpi-args =
# Set the number of processes for pfind
nproc =
# Pfind queue length
pfind-queue-length = 10000
# Pfind Steal from next
pfind-steal-next = FALSE
# Parallelize the readdir by using hashing. Your system must support this!
pfind-parallelize-single-dir-access-using-hashing = FALSE
Temporary workaround: I changed the io500.sh code to look for:
io500_find_cmd_args="$(get_ini_param find external-extra-args)"
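A minimal sketch of a lookup that accepts either key name, so the same ini file works for both versions. The helper ini_value is a hypothetical stand-in for get_ini_param (its real implementation is not shown in this thread), and the variable io500_ini holding the ini path is likewise an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for get_ini_param: print the value of KEY inside
# [SECTION] of the file named by $io500_ini (prints nothing if absent).
ini_value() {
  local section="$1" key="$2"
  awk -v s="[$section]" -v k="$key" '
    /^[ \t]*\[/ { in_s = ($1 == s); next }   # track which section we are in
    in_s {
      line = $0
      sub(/[ \t]*#.*$/, "", line)            # drop comments
      n = index(line, "=")
      if (n == 0) next                       # not a key = value line
      name = substr(line, 1, n - 1)
      val  = substr(line, n + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", name)      # trim whitespace
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (name == k) { print val; exit }
    }
  ' "$io500_ini"
}

# Prefer the key the C application reads; fall back to the older name
# when the new key is missing or empty.
find_extra_args() {
  local v
  v="$(ini_value find external-extra-args)"
  [ -n "$v" ] || v="$(ini_value find external-args)"
  printf '%s\n' "$v"
}
```

With something like this in place, io500_find_cmd_args="$(find_extra_args)" would pick up whichever of the two keys is set.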
Issue2: I had to set io500_find_mpi to True in the io500.sh script to avoid this error:
"io500_find_mpi: unbound variable". I don't know whether the C version has a different way
to set that value through the config.ini file. Can someone share how to pass it to the
C-app version?
function setup_find {
io500_find_mpi="True"
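The "unbound variable" message is what bash prints when an unset variable is read while set -u (nounset) is active. A small sketch of reading the variable with a default instead of hard-coding it; the helper name find_uses_mpi and the default of "False" are assumptions made for illustration, not what io500.sh does.

```shell
#!/usr/bin/env bash
set -u   # unset variables are errors, as in the failing run

# Hypothetical helper: succeed only when find should be launched under MPI.
# ${io500_find_mpi:-False} expands to "False" when the variable is unset or
# empty, so reading it can no longer trigger "unbound variable".
find_uses_mpi() {
  [ "${io500_find_mpi:-False}" = "True" ]
}
```

A caller could then write "if find_uses_mpi; then ... fi" whether or not setup_find assigned the variable.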
Issue3: How do I validate that the parameters I am setting in the config.ini file are
being used at runtime? I set the following, but don't see them in the output below:
[find]
external-script = /mnt/beeond/io500-app/bin/pfind
#nproc = 30
pfind-queue-length = 2000
pfind-steal-next = TRUE
pfind-parallelize-single-dir-access-using-hashing = FALSE
The io500 C version hangs at the command below, and I don't see the queue length,
steal-next, etc.:
[find]
t_start = 2020-06-01 08:28:30
exe = /mnt/beeond/io500-app/bin/pfind ./out//2020.06.01-06.57.23-app -newer
./out//2020.06.01-06.57.23-app/timestampfile -size 3901c -name "*01*"
nproc = 1
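One way to check what the C version actually recorded is to print only the [find] section and look at the exe and nproc lines. This is a sketch that assumes the block above is written to the run's result file (or to a captured copy of the output); print_section is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the non-empty lines that belong to one [section] of an ini-style file.
print_section() {
  local section="$1" file="$2"
  awk -v s="[$section]" '
    /^[ \t]*\[/ { in_s = ($1 == s); next }   # a new section header starts
    in_s && NF  { print }                    # keep lines of the wanted section
  ' "$file"
}
```

For example, print_section find ./results/2020.06.01-06.57.23-app/result.txt would show whether -q or -s made it onto the exe line and which nproc was used.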
Issue4: As a manual workaround, I made the "find" phase work by setting some parameters in
io500.sh, but they are not passed to the C version, so that run fails.
In io500.sh script:
function setup_find {
io500_find_mpi="True"
io500_find_cmd_args="-N -q 2000 -s $io500_stonewall_timer -r $io500_result_dir/pfind_results"
Non-C version - success
[Starting] find
[Exec] mpiexec --allow-run-as-root -mca btl self -x UCX_TLS=rc,self,sm -x
HCOLL_ENABLE_MCAST_ALL=0 -mca coll_hcoll_enable 0 -x UCX_IB_TRAFFIC_CLASS=105 -x
UCX_IB_GID_INDEX=3 -n 30 -npernode 10 --hostfile
/mnt/beeond/hostsfile.cn
/mnt/beeond/io500-app/bin/pfind ./out//2020.06.01-06.57.23-scr -newer
./out//2020.06.01-06.57.23-scr/timestampfile -size 3901c -name "*01*" -N -q 2000
-s 300 -r ./results//2020.06.01-06.57.23-scr/pfind_results
[Results] in ./results//2020.06.01-06.57.23-scr/find.txt.
[FIND] MATCHED 28170/15966876 in 44.6593 seconds
[RESULT] IOPS phase 3 find 357.520 kiops : time 44.66 seconds
The io500 C version hangs at the command below, and I don't see the queue length,
steal-next, etc.:
[find]
t_start = 2020-06-01 08:28:30
exe = /mnt/beeond/io500-app/bin/pfind ./out//2020.06.01-06.57.23-app -newer
./out//2020.06.01-06.57.23-app/timestampfile -size 3901c -name "*01*"
nproc = 1
Issue5: As you can see in Issue4, I am passing some parameters that contain bash variables.
If I put the same text in the config.ini file, it is passed through as-is, because nothing
expands the variables. How do I pass such variables to the C version of io500?
I already tried the two forms below; neither is expanded when processed by the io500 C-app
version.
In config.ini file:
external-extra-args = -s \$io500_stonewall_timer -r \$io500_result_dir/pfind_results
or
external-extra-args = -s $io500_stonewall_timer -r $io500_result_dir/pfind_results
[find]
t_start = 2020-05-31 11:43:52
exe = /mnt/beeond/io500-app/bin/pfind -s $io500_stonewall_timer -r
$io500_result_dir/pfind_results ./out//2020.05.31-10.52.56-app -newer
./out//2020.05.31-10.52.56-app/timestampfile -size 3901c -name "*01*"
nproc = 1
[find]
t_start = 2020-05-31 15:55:52
exe = /mnt/beeond/io500-app/bin/pfind -s \$io500_stonewall_timer -r
\$io500_result_dir/pfind_results ./out//2020.05.31-15.55.38-app -newer
./out//2020.05.31-15.55.38-app/timestampfile -size 3901c -name "*01*"
nproc = 1
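Since the C application reads the ini text literally, one possible workaround is to expand the placeholders before launching it. The sketch below does that with sed; render_ini and its arguments are hypothetical, and it assumes both values are known up front (with timestamp-resultdir = TRUE the result directory name is generated at run time, so that path would have to be chosen some other way).

```shell
#!/usr/bin/env bash
# Write a copy of TEMPLATE to OUT with the two placeholders from the ini
# lines above replaced by concrete values.
# Caveat: the values must not contain "|" or "&", which are special to sed here.
render_ini() {
  local template="$1" out="$2" stonewall="$3" resultdir="$4"
  sed -e 's|\$io500_stonewall_timer|'"$stonewall"'|g' \
      -e 's|\$io500_result_dir|'"$resultdir"'|g' \
      "$template" > "$out"
}
```

For instance, render_ini config.ini config-rendered.ini 300 ./results/myrun would turn the external-extra-args line into "-s 300 -r ./results/myrun/pfind_results". Whether pfind then behaves as in the script run still has to be checked in the [find] output.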
This is a small test cluster.
less 2020.06.01-06.57.23-scr/result_summary.txt
[RESULT] BW phase 1 ior_easy_write 6.062 GiB/s : time 362.17 seconds
[RESULT] IOPS phase 1 mdtest_easy_write 7.300 kiops : time 2054.93 seconds
[RESULT] BW phase 2 ior_hard_write 1.605 GiB/s : time 321.58 seconds
[RESULT] IOPS phase 2 mdtest_hard_write 3.173 kiops : time 304.69 seconds
[RESULT] IOPS phase 3 find 357.520 kiops : time 44.66 seconds
[RESULT] BW phase 3 ior_easy_read 8.269 GiB/s : time 265.51 seconds
[RESULT] IOPS phase 4 mdtest_easy_stat 144.149 kiops : time 104.06 seconds
[RESULT] BW phase 4 ior_hard_read 3.847 GiB/s : time 134.18 seconds
[RESULT] IOPS phase 5 mdtest_hard_stat 82.220 kiops : time 11.76 seconds
[RESULT] IOPS phase 6 mdtest_easy_delete 54.334 kiops : time 276.07 seconds
[RESULT] IOPS phase 7 mdtest_hard_read 22.822 kiops : time 42.37 seconds
[RESULT] IOPS phase 8 mdtest_hard_delete 8.042 kiops : time 123.97 seconds
[SCORE] Bandwidth 4.19424 GiB/s : IOPS 31.5378 kiops : TOTAL 11.5012
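To compare a partial C-version run against the script run, it can help to recompute the sub-scores from the [RESULT] lines. The sketch below assumes the usual IO500 convention that the Bandwidth and IOPS scores are geometric means of their phases; geo_mean is a hypothetical helper, and it relies on the value being the sixth whitespace-separated field, as in the summary above.

```shell
#!/usr/bin/env bash
# Geometric mean of the values on [RESULT] lines whose second field is TYPE
# (BW or IOPS). Lines that do not start with [RESULT] are ignored.
geo_mean() {
  local type="$1" file="$2"
  awk -v t="$type" '
    $1 == "[RESULT]" && $2 == t { sum += log($6); n++ }
    END { if (n) printf "%.5f\n", exp(sum / n) }
  ' "$file"
}
```

Running geo_mean BW and geo_mean IOPS on result_summary.txt should land close to the reported 4.19424 and 31.5378, with small differences because the inputs are rounded.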
C-version app, partial result (it hangs at find):
[root@inst-5n58i-good-crow results]# less 2020.06.01-06.57.23-app/result.txt | egrep "\[|score"
[ior-easy-write]
score = 6.136592
[mdtest-easy-write]
score = 41.913878
[timestamp]
[ior-hard-write]
score = 1.538194
[mdtest-hard-write]
score = 3.012750
[find]
[root@inst-5n58i-good-crow results]#
From: Pinkesh Valdria <pinkesh.valdria@oracle.com>
Date: Saturday, May 30, 2020 at 3:01 AM
To: Andreas Dilger <adilger@dilger.ca>
Cc: <io-500@vi4io.org>
Subject: Re: [IO-500] Io500 runs twice - is that expected starting 2020 ?
Thanks Andreas,
The C-app benchmark failed: it never completed and there was no score at the end. I waited
8 hours before pressing Ctrl-C. It produced only 4 results, versus 12 in the first
section. Are there special logs I should look at for the C version of the benchmark?
The output below is from a very small development cluster that I am using until I figure
out how to run IO500 correctly.
[Leaving] datafiles in ./out//2020.05.29-17.31.34-scr
[Summary] Results files in ./results//2020.05.29-17.31.34-scr
[Summary] Data files in ./out//2020.05.29-17.31.34-scr
[RESULT] BW phase 1 ior_easy_write 6.188 GiB/s : time 357.44 seconds
[RESULT] BW phase 2 ior_hard_write 1.132 GiB/s : time 367.41 seconds
[RESULT] BW phase 3 ior_easy_read 8.090 GiB/s : time 273.37 seconds
[RESULT] BW phase 4 ior_hard_read 3.726 GiB/s : time 111.69 seconds
[RESULT] IOPS phase 1 mdtest_easy_write 4.263 kiops : time 3518.34 seconds
[RESULT] IOPS phase 2 mdtest_hard_write 2.953 kiops : time 303.52 seconds
[RESULT] IOPS phase 3 find 91.550 kiops : time 173.64 seconds
[RESULT] IOPS phase 4 mdtest_easy_stat 137.243 kiops : time 109.30 seconds
[RESULT] IOPS phase 5 mdtest_hard_stat 84.140 kiops : time 10.65 seconds
[RESULT] IOPS phase 6 mdtest_easy_delete 55.311 kiops : time 271.19 seconds
[RESULT] IOPS phase 7 mdtest_hard_read 21.778 kiops : time 41.16 seconds
[RESULT] IOPS phase 8 mdtest_hard_delete 7.133 kiops : time 129.24 seconds
[SCORE] Bandwidth 3.81212 GiB/s : IOPS 24.1149 kiops : TOTAL 9.58796
The io500.sh was run
Running the C version of the benchmark now
IO500 version io500-isc20
[RESULT] ior-easy-write 6.233210 GiB/s : time 359.216 seconds
[RESULT] mdtest-easy-write 3.750549 kIOPS : time 3999.448 seconds
[RESULT] ior-hard-write 1.415903 GiB/s : time 349.737 seconds
[RESULT] mdtest-hard-write 3.006432 kIOPS : time 305.006 seconds
^C
[root@inst-q7cdd-good-crow io500-app]#
From: Andreas Dilger <adilger@dilger.ca>
Date: Saturday, May 30, 2020 at 2:33 AM
To: Pinkesh Valdria <pinkesh.valdria@oracle.com>
Cc: <io-500@vi4io.org>
Subject: Re: [IO-500] Io500 runs twice - is that expected starting 2020 ?
Hi Pinkesh,
The dual runs of the IO500 benchmark for this list are intentional,
and documented in the README-ISC20.txt file in the source tree.
This is to allow comparison between the historical io500.sh script and the
new C application that runs the same IOR, mdtest, and find commands.
Please submit both results for ISC'20.
We wanted to be sure that the transition to the new C-app didn't
introduce any errors in the results. The need to run the benchmark
twice will hopefully be gone for the SC'20 list.
Cheers, Andreas(*)
* speaking on my own behalf and not on behalf of the IO500 board
On May 29, 2020, at 16:02, Pinkesh Valdria via IO-500 <io-500@vi4io.org> wrote:
Hello IO-500 experts,
I am trying to configure io500. When I run it, it runs twice: the first run is the regular
one and the second is announced as "Running the C version of the benchmark now". Is that
because I misconfigured it, or is running both required starting in 2020? My config*.ini
file is below.
[root@inst-q7cdd-good-crow io500-app]# ./io500.sh config-test1.ini
System: inst-q7cdd-good-crow
…..
Running the IO500 Benchmark now
[Creating] directories
…..
[Summary] Results files in ./results//2020.05.29-17.31.34-scr
[Summary] Data files in ./out//2020.05.29-17.31.34-scr
[RESULT] BW phase 1 ior_easy_write 6.188 GiB/s : time 357.44 seconds
[RESULT] BW phase 2 ior_hard_write 1.132 GiB/s : time 367.41 seconds
[RESULT] BW phase 3 ior_easy_read 8.090 GiB/s : time 273.37 seconds
[RESULT] BW phase 4 ior_hard_read 3.726 GiB/s : time 111.69 seconds
[RESULT] IOPS phase 1 mdtest_easy_write 4.263 kiops : time 3518.34 seconds
[RESULT] IOPS phase 2 mdtest_hard_write 2.953 kiops : time 303.52 seconds
[RESULT] IOPS phase 3 find 91.550 kiops : time 173.64 seconds
[RESULT] IOPS phase 4 mdtest_easy_stat 137.243 kiops : time 109.30 seconds
[RESULT] IOPS phase 5 mdtest_hard_stat 84.140 kiops : time 10.65 seconds
[RESULT] IOPS phase 6 mdtest_easy_delete 55.311 kiops : time 271.19 seconds
[RESULT] IOPS phase 7 mdtest_hard_read 21.778 kiops : time 41.16 seconds
[RESULT] IOPS phase 8 mdtest_hard_delete 7.133 kiops : time 129.24 seconds
[SCORE] Bandwidth 3.81212 GiB/s : IOPS 24.1149 kiops : TOTAL 9.58796
The io500.sh was run
Running the C version of the benchmark now
IO500 version io500-isc20
<still running when I posted this question>
***************************************************
config-test1.ini (END)
***************************************************
[global]
datadir = ./out/
resultdir = ./results/
timestamp-resultdir = TRUE
# Choose parameters that are very small for all benchmarks
[debug]
stonewall-time = 300 # for testing
[ior-easy]
transferSize = 2m
blockSize = 102400m
[mdtest-easy]
API = POSIX
# Files per proc
n = 500000
[ior-hard]
API = POSIX
# Number of segments 10000000
segmentCount = 400000
[mdtest-hard]
API = POSIX
# Files per proc 1000000
n = 40000
[find]
external-script = /mnt/beeond/io500-app/bin/pfind
pfind-parallelize-single-dir-access-using-hashing = FALSE
***************************************************
_______________________________________________
IO-500 mailing list
IO-500@vi4io.org
https://www.vi4io.org/mailman/listinfo/io-500