I made some progress over the weekend troubleshooting why the find phase was not working,
but I am not out of the woods yet. I would appreciate it if someone could confirm whether
I am on the right path, or acknowledge whether the items below are known issues with
workarounds. Thanks for your help. This is a long email, but I have included the details
to avoid a lot of back-and-forth.
Here are the differences I found and the workarounds I had to use.
Issue1:
The io500 C version expects a field named external-extra-args in the config.ini
file, but the non-C version (io500.sh) looks for a field named
“external-args” (without “-extra-”); see the line below:
https://github.com/VI4IO/io500-app/blob/master/io500.sh#L229
io500_find_cmd_args="$(get_ini_param find external-args)"
less config-full.ini
[find]
# Set to an external script to perform the find phase
external-script =
# Extra arguments for external scripts
external-extra-args =
# Startup arguments for external scripts
external-mpi-args =
# Set the number of processes for pfind
nproc =
# Pfind queue length
pfind-queue-length = 10000
# Pfind Steal from next
pfind-steal-next = FALSE
# Parallelize the readdir by using hashing. Your system must support this!
pfind-parallelize-single-dir-access-using-hashing = FALSE
Temporary workaround: I changed the io500.sh code to look for:
io500_find_cmd_args="$(get_ini_param find external-extra-args)"
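A lookup that accepts either key name would avoid editing the script each time. The following is only a minimal sketch of that idea: `ini_get`, `trim`, and `find_extra_args` are hypothetical helpers written for illustration, not the real `get_ini_param` from io500.sh, and preferring `external-extra-args` over `external-args` is an assumption.

```shell
#!/usr/bin/env bash

# Strip leading and trailing whitespace from a string.
trim() {
  local s="$1"
  s="${s#"${s%%[![:space:]]*}"}"
  s="${s%"${s##*[![:space:]]}"}"
  printf '%s' "$s"
}

# Hypothetical helper (NOT io500.sh's get_ini_param):
# print the value of KEY inside [SECTION] of an INI file, or nothing if unset.
ini_get() {
  local file="$1" section="$2" key="$3" line current="" k v
  while IFS= read -r line || [ -n "$line" ]; do
    line="$(trim "$line")"
    case "$line" in
      ""|"#"*) continue ;;                      # blank lines and comments
      "["*"]") current="${line#"["}"            # section header, e.g. [find]
               current="${current%"]"}"
               continue ;;
    esac
    [ "$current" = "$section" ] || continue     # ignore other sections
    case "$line" in *=*) ;; *) continue ;; esac # only key = value lines
    k="$(trim "${line%%=*}")"
    v="$(trim "${line#*=}")"
    if [ "$k" = "$key" ]; then printf '%s\n' "$v"; return 0; fi
  done < "$file"
}

# Prefer the C-app key name and fall back to the name io500.sh looks for.
find_extra_args() {
  local file="$1" value
  value="$(ini_get "$file" find external-extra-args)"
  [ -n "$value" ] || value="$(ini_get "$file" find external-args)"
  printf '%s\n' "$value"
}

# Small demonstration with a throwaway config file.
cfg="$(mktemp)"
printf '[find]\nexternal-extra-args = -N -q 2000\n' > "$cfg"
find_extra_args "$cfg"   # prints: -N -q 2000
rm -f "$cfg"
```

With a fallback like this, the same config.ini could in principle serve both the script and the C application, whichever key name it uses.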
Issue2: I had to set io500_find_mpi to True in the io500.sh script to avoid this
error: “io500_find_mpi: unbound variable”. I don’t know whether there is a way
to set this value for the C version through the config.ini file. Can someone share how
to pass that value to the C-app version?
function setup_find {
io500_find_mpi="True"
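The "unbound variable" message is what bash reports when `set -u` is active and a variable is read before it has been assigned. A minimal sketch of guarding against that with a default value follows; the function name `resolve_find_mpi` and the default of "False" are assumptions for illustration, not documented io500.sh behavior.

```shell
#!/usr/bin/env bash

# Runs in a subshell so that `set -u` does not leak into the caller.
resolve_find_mpi() (
  set -u
  # Assign "False" only if io500_find_mpi is unset or empty (assumed default).
  : "${io500_find_mpi:=False}"
  printf '%s\n' "$io500_find_mpi"
)

resolve_find_mpi                              # prints: False (when unset)
( io500_find_mpi=True; resolve_find_mpi )     # prints: True
```

This only removes the shell error; it does not answer whether the C application has an equivalent setting.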
Issue3: How do I validate that the parameters I set in the config.ini file are
being used at runtime? I set the following, but I don’t see them in the output below:
[find]
external-script = /mnt/beeond/io500-app/bin/pfind
#nproc = 30
pfind-queue-length = 2000
pfind-steal-next = TRUE
pfind-parallelize-single-dir-access-using-hashing = FALSE
The io500 C version hangs at the command below, and I don’t see queue length, steal-next, etc.:
[find]
t_start = 2020-06-01 08:28:30
exe = /mnt/beeond/io500-app/bin/pfind ./out//2020.06.01-06.57.23-app -newer
./out//2020.06.01-06.57.23-app/timestampfile -size 3901c -name "*01*"
nproc = 1
Issue4: As a manual workaround, I made the “find” phase work by setting some parameters in
io500.sh, but they are not passed to the C version, so that run still fails.
In the io500.sh script:
function setup_find {
io500_find_mpi="True"
io500_find_cmd_args="-N -q 2000 -s $io500_stonewall_timer -r
$io500_result_dir/pfind_results"
Non-C version - success
[Starting] find
[Exec] mpiexec --allow-run-as-root -mca btl self -x UCX_TLS=rc,self,sm -x
HCOLL_ENABLE_MCAST_ALL=0 -mca coll_hcoll_enable 0 -x UCX_IB_TRAFFIC_CLASS=105 -x
UCX_IB_GID_INDEX=3 -n 30 -npernode 10 --hostfile /mnt/beeond/hostsfile.cn
/mnt/beeond/io500-app/bin/pfind ./out//2020.06.01-06.57.23-scr -newer
./out//2020.06.01-06.57.23-scr/timestampfile -size 3901c -name "*01*" -N -q 2000
-s 300 -r ./results//2020.06.01-06.57.23-scr/pfind_results
[Results] in ./results//2020.06.01-06.57.23-scr/find.txt.
[FIND] MATCHED 28170/15966876 in 44.6593 seconds
[RESULT] IOPS phase 3 find 357.520 kiops : time 44.66 seconds
The io500 C version hangs at the command below, and I don’t see queue length, steal-next, etc.:
[find]
t_start = 2020-06-01 08:28:30
exe = /mnt/beeond/io500-app/bin/pfind ./out//2020.06.01-06.57.23-app -newer
./out//2020.06.01-06.57.23-app/timestampfile -size 3901c -name "*01*"
nproc = 1
Issue5: As you can see in issue#4, I am passing some parameters using bash variables. If
I do the same in the config.ini file, they are passed as-is, without being expanded by
the bash script. How do I pass such variables to the C version of io500?
I already tried the two forms below, and neither is expanded when processed by the
io500 C-app version.
In config.ini file:
external-extra-args = -s \$io500_stonewall_timer -r \$io500_result_dir/pfind_results
or
external-extra-args = -s $io500_stonewall_timer -r $io500_result_dir/pfind_results
[find]
t_start = 2020-05-31 11:43:52
exe = /mnt/beeond/io500-app/bin/pfind -s $io500_stonewall_timer -r
$io500_result_dir/pfind_results ./out//2020.05.31-10.52.56-app -newer
./out//2020.05.31-10.52.56-app/timestampfile -size 3901c -name "*01*"
nproc = 1
[find]
t_start = 2020-05-31 15:55:52
exe = /mnt/beeond/io500-app/bin/pfind -s \$io500_stonewall_timer -r
\$io500_result_dir/pfind_results ./out//2020.05.31-15.55.38-app -newer
./out//2020.05.31-15.55.38-app/timestampfile -size 3901c -name "*01*"
nproc = 1
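Since the logs above show the C application passing the text through literally, one possible workaround is to let the shell expand the variables and write the resulting literal values into the config file before launching. The following is only a sketch of that idea, not a confirmed IO500 mechanism: `render_find_section` is a hypothetical helper, and the timer and result path are placeholder values. With timestamp-resultdir = TRUE the result directory name is presumably chosen at runtime, so a fixed path is assumed here.

```shell
#!/usr/bin/env bash

# Placeholder values for illustration only.
io500_stonewall_timer=300
io500_result_dir=./results/example-run

# Emit a [find] section with the shell variables already expanded,
# so the config file contains only literal values.
render_find_section() {
  cat <<EOF
[find]
external-script = /mnt/beeond/io500-app/bin/pfind
external-extra-args = -s ${io500_stonewall_timer} -r ${io500_result_dir}/pfind_results
EOF
}

render_find_section
```

The output could be appended to a generated config file that is then passed to the benchmark in place of a hand-written one.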
This is a small test cluster
less 2020.06.01-06.57.23-scr/result_summary.txt
[RESULT] BW phase 1 ior_easy_write 6.062 GiB/s : time 362.17 seconds
[RESULT] IOPS phase 1 mdtest_easy_write 7.300 kiops : time 2054.93 seconds
[RESULT] BW phase 2 ior_hard_write 1.605 GiB/s : time 321.58 seconds
[RESULT] IOPS phase 2 mdtest_hard_write 3.173 kiops : time 304.69 seconds
[RESULT] IOPS phase 3 find 357.520 kiops : time 44.66 seconds
[RESULT] BW phase 3 ior_easy_read 8.269 GiB/s : time 265.51 seconds
[RESULT] IOPS phase 4 mdtest_easy_stat 144.149 kiops : time 104.06 seconds
[RESULT] BW phase 4 ior_hard_read 3.847 GiB/s : time 134.18 seconds
[RESULT] IOPS phase 5 mdtest_hard_stat 82.220 kiops : time 11.76 seconds
[RESULT] IOPS phase 6 mdtest_easy_delete 54.334 kiops : time 276.07 seconds
[RESULT] IOPS phase 7 mdtest_hard_read 22.822 kiops : time 42.37 seconds
[RESULT] IOPS phase 8 mdtest_hard_delete 8.042 kiops : time 123.97 seconds
[SCORE] Bandwidth 4.19424 GiB/s : IOPS 31.5378 kiops : TOTAL 11.5012
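For reference, the TOTAL on the [SCORE] line is the geometric mean of the Bandwidth and IOPS scores, each of which is itself a geometric mean of its phases. A small sketch for checking a summary by hand follows; the helper name `geomean` is made up for this example.

```shell
#!/usr/bin/env bash

# Geometric mean of the positive numbers passed as arguments.
geomean() {
  awk 'BEGIN {
    n = ARGC - 1
    sum = 0
    for (i = 1; i <= n; i++) sum += log(ARGV[i])   # sum of natural logs
    printf "%.4f\n", exp(sum / n)                  # exp of their average
  }' "$@"
}

# Bandwidth and IOPS scores from the [SCORE] line above.
geomean 4.19424 31.5378   # prints: 11.5012
```

Feeding the individual phase values into the same function should land close to the reported sub-scores, with small differences because the listed phase values are rounded.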
C-version app – partial result – which hangs at find:
[root@inst-5n58i-good-crow results]# less 2020.06.01-06.57.23-app/result.txt | egrep
"\[|score"
[ior-easy-write]
score = 6.136592
[mdtest-easy-write]
score = 41.913878
[timestamp]
[ior-hard-write]
score = 1.538194
[mdtest-hard-write]
score = 3.012750
[find]
[root@inst-5n58i-good-crow results]#
From: Pinkesh Valdria <pinkesh.valdria(a)oracle.com>
Date: Saturday, May 30, 2020 at 3:01 AM
To: Andreas Dilger <adilger(a)dilger.ca>
Cc: <io-500(a)vi4io.org>
Subject: Re: [IO-500] Io500 runs twice - is that expected starting 2020 ?
Thanks Andreas,
The C-app benchmark failed: it never completed and there was no score at the end.
I waited for 8 hours before pressing Ctrl-C. It has only 4 results, versus 12 results in
the first section. Are there special logs to check for the C version of the benchmark?
The output below is from a very small development cluster that I am using until I figure
out how to run IO500 correctly.
[Leaving] datafiles in ./out//2020.05.29-17.31.34-scr
[Summary] Results files in ./results//2020.05.29-17.31.34-scr
[Summary] Data files in ./out//2020.05.29-17.31.34-scr
[RESULT] BW phase 1 ior_easy_write 6.188 GiB/s : time 357.44 seconds
[RESULT] BW phase 2 ior_hard_write 1.132 GiB/s : time 367.41 seconds
[RESULT] BW phase 3 ior_easy_read 8.090 GiB/s : time 273.37 seconds
[RESULT] BW phase 4 ior_hard_read 3.726 GiB/s : time 111.69 seconds
[RESULT] IOPS phase 1 mdtest_easy_write 4.263 kiops : time 3518.34 seconds
[RESULT] IOPS phase 2 mdtest_hard_write 2.953 kiops : time 303.52 seconds
[RESULT] IOPS phase 3 find 91.550 kiops : time 173.64 seconds
[RESULT] IOPS phase 4 mdtest_easy_stat 137.243 kiops : time 109.30 seconds
[RESULT] IOPS phase 5 mdtest_hard_stat 84.140 kiops : time 10.65 seconds
[RESULT] IOPS phase 6 mdtest_easy_delete 55.311 kiops : time 271.19 seconds
[RESULT] IOPS phase 7 mdtest_hard_read 21.778 kiops : time 41.16 seconds
[RESULT] IOPS phase 8 mdtest_hard_delete 7.133 kiops : time 129.24 seconds
[SCORE] Bandwidth 3.81212 GiB/s : IOPS 24.1149 kiops : TOTAL 9.58796
The io500.sh was run
Running the C version of the benchmark now
IO500 version io500-isc20
[RESULT] ior-easy-write 6.233210 GiB/s : time 359.216 seconds
[RESULT] mdtest-easy-write 3.750549 kIOPS : time 3999.448 seconds
[RESULT] ior-hard-write 1.415903 GiB/s : time 349.737 seconds
[RESULT] mdtest-hard-write 3.006432 kIOPS : time 305.006 seconds
^C
[root@inst-q7cdd-good-crow io500-app]#
From: Andreas Dilger <adilger(a)dilger.ca>
Date: Saturday, May 30, 2020 at 2:33 AM
To: Pinkesh Valdria <pinkesh.valdria(a)oracle.com>
Cc: <io-500(a)vi4io.org>
Subject: Re: [IO-500] Io500 runs twice - is that expected starting 2020 ?
Hi Pinkesh,
The dual runs of the IO500 benchmark for this list are intentional,
and documented in the README-ISC20.txt file in the source tree.
This is to allow comparison between the historical io500.sh script and the
new C application that runs the same IOR, mdtest, and find commands.
Please submit both results for ISC'20.
We wanted to be sure that the transition to the new C-app didn't
introduce any errors in the results. The need to run the benchmark
twice will hopefully be gone for the SC'20 list.
Cheers, Andreas(*)
* speaking on my own behalf and not on behalf of the IO500 board
On May 29, 2020, at 16:02, Pinkesh Valdria via IO-500 <io-500(a)vi4io.org> wrote:
Hello IO-500 experts,
I am trying to configure io500. When I run it, it runs twice: the first run is the regular
one and the second is announced as “Running the C version of the benchmark now”. Is this
because I misconfigured it, or is it required to run both starting in 2020? My config*.ini
file is below.
[root@inst-q7cdd-good-crow io500-app]# ./io500.sh config-test1.ini
System: inst-q7cdd-good-crow
…..
Running the IO500 Benchmark now
[Creating] directories
…..
[Summary] Results files in ./results//2020.05.29-17.31.34-scr
[Summary] Data files in ./out//2020.05.29-17.31.34-scr
[RESULT] BW phase 1 ior_easy_write 6.188 GiB/s : time 357.44 seconds
[RESULT] BW phase 2 ior_hard_write 1.132 GiB/s : time 367.41 seconds
[RESULT] BW phase 3 ior_easy_read 8.090 GiB/s : time 273.37 seconds
[RESULT] BW phase 4 ior_hard_read 3.726 GiB/s : time 111.69 seconds
[RESULT] IOPS phase 1 mdtest_easy_write 4.263 kiops : time 3518.34 seconds
[RESULT] IOPS phase 2 mdtest_hard_write 2.953 kiops : time 303.52 seconds
[RESULT] IOPS phase 3 find 91.550 kiops : time 173.64 seconds
[RESULT] IOPS phase 4 mdtest_easy_stat 137.243 kiops : time 109.30 seconds
[RESULT] IOPS phase 5 mdtest_hard_stat 84.140 kiops : time 10.65 seconds
[RESULT] IOPS phase 6 mdtest_easy_delete 55.311 kiops : time 271.19 seconds
[RESULT] IOPS phase 7 mdtest_hard_read 21.778 kiops : time 41.16 seconds
[RESULT] IOPS phase 8 mdtest_hard_delete 7.133 kiops : time 129.24 seconds
[SCORE] Bandwidth 3.81212 GiB/s : IOPS 24.1149 kiops : TOTAL 9.58796
The io500.sh was run
Running the C version of the benchmark now
IO500 version io500-isc20
<currently running …when I posted this question ….
***************************************************
config-test1.ini (END)
***************************************************
[global]
datadir = ./out/
resultdir = ./results/
timestamp-resultdir = TRUE
# Chose parameters that are very small for all benchmarks
[debug]
stonewall-time = 300 # for testing
[ior-easy]
transferSize = 2m
blockSize = 102400m
[mdtest-easy]
API = POSIX
# Files per proc
n = 500000
[ior-hard]
API = POSIX
# Number of segments 10000000
segmentCount = 400000
[mdtest-hard]
API = POSIX
# Files per proc 1000000
n = 40000
[find]
external-script = /mnt/beeond/io500-app/bin/pfind
pfind-parallelize-single-dir-access-using-hashing = FALSE
***************************************************