I made some progress over the weekend to troubleshoot why find phase was not working, but I am not out of the woods.   Appreciate if someone can confirm if I am on the right path or acknowledge if the below are known issues and there are workarounds for them. 

 

Thanks for your help.  It’s a long email, but detailed to ensure, there is not a lot of back-n-forth. 

 

Here are some differences I found and workaround I had to use. 

 

Issue1: 

Io500 (C version) expects a field labelled:  external-extra-args    in the config.ini file,  but the non-C version (io500.sh) logic is looking for field labelled:  “external-args” (without “-extra-“)  , see below line:

https://github.com/VI4IO/io500-app/blob/master/io500.sh#L229

  io500_find_cmd_args="$(get_ini_param find external-args)"

 

less config-full.ini

[find]

# Set to an external script to perform the find phase

external-script =

# Extra arguments for external scripts

external-extra-args =

# Startup arguments for external scripts

external-mpi-args =

# Set the number of processes for pfind

nproc =

# Pfind queue length

pfind-queue-length = 10000

# Pfind Steal from next

pfind-steal-next = FALSE

# Parallelize the readdir by using hashing. Your system must support this!

pfind-parallelize-single-dir-access-using-hashing = FALSE

 

Temporary workaround:   I changed the io500.sh code to look for:    io500_find_cmd_args="$(get_ini_param find external-extra-args)"

 

 

Issue2:   I had to set io500_find_mpi to True in the io500.sh script to avoid getting this error: “io500_find_mpi: unbound variable”  , but I don’t know if there is a different way to set the value for the C-version using config.ini file,  can someone share how to pass that value for the C-app version? 

 

function setup_find {

  io500_find_mpi="True"

 

Issue3:  How do I validate that the parameters I am setting in the config.ini file are been used at runtime.   I set the following, but don’t see them below:

 

[find]

external-script = /mnt/beeond/io500-app/bin/pfind

#nproc = 30

pfind-queue-length = 2000

pfind-steal-next = TRUE

pfind-parallelize-single-dir-access-using-hashing = FALSE

 

io500 – C version hangs at the below command and I don’t see queue length, steal-next, etc

[find]

t_start         = 2020-06-01 08:28:30

exe             =  /mnt/beeond/io500-app/bin/pfind  ./out//2020.06.01-06.57.23-app -newer ./out//2020.06.01-06.57.23-app/timestampfile -size 3901c -name "*01*"

nproc           = 1

 

 

Issue4:   Manual workaround to make “find” phase work by setting some parameters in io500.sh, but they are not passed to the C-version and hence that fails. 

 

In io500.sh script: 

function setup_find {

  io500_find_mpi="True"

  io500_find_cmd_args="-N -q 2000 -s $io500_stonewall_timer -r $io500_result_dir/pfind_results"

 

 

Non-C version  - success

[Starting] find

[Exec] mpiexec --allow-run-as-root -mca btl self -x UCX_TLS=rc,self,sm -x HCOLL_ENABLE_MCAST_ALL=0 -mca coll_hcoll_enable 0 -x UCX_IB_TRAFFIC_CLASS=105 -x UCX_IB_GID_INDEX=3 -n 30 -npernode 10 --hostfile /mnt/beeond/hostsfile.cn /mnt/beeond/io500-app/bin/pfind ./out//2020.06.01-06.57.23-scr -newer ./out//2020.06.01-06.57.23-scr/timestampfile -size 3901c -name "*01*" -N -q 2000 -s 300 -r ./results//2020.06.01-06.57.23-scr/pfind_results

[Results] in ./results//2020.06.01-06.57.23-scr/find.txt.

[FIND] MATCHED 28170/15966876 in 44.6593 seconds

[RESULT] IOPS phase 3                      find              357.520 kiops : time  44.66 seconds

 

io500 – C version hangs at the below command and I don’t see queue length, steal-next, etc

[find]

t_start         = 2020-06-01 08:28:30

exe             =  /mnt/beeond/io500-app/bin/pfind  ./out//2020.06.01-06.57.23-app -newer ./out//2020.06.01-06.57.23-app/timestampfile -size 3901c -name "*01*"

nproc           = 1

 

 

 

Issue5:   As you can see in issue#4,  I am passing some parameters with bash variables, if I do the same in the config.ini file,   they will be passed as is without getting interpreted by the bash script.  How do I pass such variable for the C-version of io500 ?

 

Already tried the below and they are not interpreted when processed by io500 C-app version.

In config.ini file:

external-extra-args =  -s \$io500_stonewall_timer -r \$io500_result_dir/pfind_results

or

external-extra-args =  -s $io500_stonewall_timer -r $io500_result_dir/pfind_results

 

 

[find]

t_start         = 2020-05-31 11:43:52

exe             =  /mnt/beeond/io500-app/bin/pfind -s $io500_stonewall_timer -r $io500_result_dir/pfind_results ./out//2020.05.31-10.52.56-app -newer ./out//2020.05.31-10.52.56-app/timestampfile -size 3901c -name "*01*"

nproc           = 1

 

[find]

t_start         = 2020-05-31 15:55:52

exe             =  /mnt/beeond/io500-app/bin/pfind -s \$io500_stonewall_timer -r \$io500_result_dir/pfind_results ./out//2020.05.31-15.55.38-app -newer ./out//2020.05.31-15.55.38-app/timestampfile -size 3901c -name "*01*"

nproc           = 1

 

 

This is a small test cluster

 

less 2020.06.01-06.57.23-scr/result_summary.txt

[RESULT] BW   phase 1            ior_easy_write                6.062 GiB/s : time 362.17 seconds

[RESULT] IOPS phase 1         mdtest_easy_write                7.300 kiops : time 2054.93 seconds

[RESULT] BW   phase 2            ior_hard_write                1.605 GiB/s : time 321.58 seconds

[RESULT] IOPS phase 2         mdtest_hard_write                3.173 kiops : time 304.69 seconds

[RESULT] IOPS phase 3                      find              357.520 kiops : time  44.66 seconds

[RESULT] BW   phase 3             ior_easy_read                8.269 GiB/s : time 265.51 seconds

[RESULT] IOPS phase 4          mdtest_easy_stat              144.149 kiops : time 104.06 seconds

[RESULT] BW   phase 4             ior_hard_read                3.847 GiB/s : time 134.18 seconds

[RESULT] IOPS phase 5          mdtest_hard_stat               82.220 kiops : time  11.76 seconds

[RESULT] IOPS phase 6        mdtest_easy_delete               54.334 kiops : time 276.07 seconds

[RESULT] IOPS phase 7          mdtest_hard_read               22.822 kiops : time  42.37 seconds

[RESULT] IOPS phase 8        mdtest_hard_delete                8.042 kiops : time 123.97 seconds

[SCORE] Bandwidth 4.19424 GiB/s : IOPS 31.5378 kiops : TOTAL 11.5012

 

C-version app – partial result – which hangs at find:

[root@inst-5n58i-good-crow results]# less 2020.06.01-06.57.23-app/result.txt  | egrep "\[|score"

[ior-easy-write]

score           = 6.136592

[mdtest-easy-write]

score           = 41.913878

[timestamp]

[ior-hard-write]

score           = 1.538194

[mdtest-hard-write]

score           = 3.012750

[find]

[root@inst-5n58i-good-crow results]#

 

 

 

From: Pinkesh Valdria <pinkesh.valdria@oracle.com>
Date: Saturday, May 30, 2020 at 3:01 AM
To: Andreas Dilger <adilger@dilger.ca>
Cc: <io-500@vi4io.org>
Subject: Re: [IO-500] Io500 runs twice - is that expected starting 2020 ?

 

Thanks Andreas, 

 

The C-app benchmark failed,  I mean it never completed and there was no score at the end,  I waited for 8 hours before I did ctrl-c .    It only has 4 results vs 12 results in the first section.    Are there special logs to see for the C version of the benchmark. 

 

The below is a very small development cluster I am using until I figure out how to run IO500 correctly.   

[Leaving] datafiles in ./out//2020.05.29-17.31.34-scr

[Summary] Results files in ./results//2020.05.29-17.31.34-scr

[Summary] Data files in ./out//2020.05.29-17.31.34-scr

[RESULT] BW   phase 1            ior_easy_write                6.188 GiB/s : time 357.44 seconds

[RESULT] BW   phase 2            ior_hard_write                1.132 GiB/s : time 367.41 seconds

[RESULT] BW   phase 3             ior_easy_read                8.090 GiB/s : time 273.37 seconds

[RESULT] BW   phase 4             ior_hard_read                3.726 GiB/s : time 111.69 seconds

[RESULT] IOPS phase 1         mdtest_easy_write                4.263 kiops : time 3518.34 seconds

[RESULT] IOPS phase 2         mdtest_hard_write                2.953 kiops : time 303.52 seconds

[RESULT] IOPS phase 3                      find               91.550 kiops : time 173.64 seconds

[RESULT] IOPS phase 4          mdtest_easy_stat              137.243 kiops : time 109.30 seconds

[RESULT] IOPS phase 5          mdtest_hard_stat               84.140 kiops : time  10.65 seconds

[RESULT] IOPS phase 6        mdtest_easy_delete               55.311 kiops : time 271.19 seconds

[RESULT] IOPS phase 7          mdtest_hard_read               21.778 kiops : time  41.16 seconds

[RESULT] IOPS phase 8        mdtest_hard_delete                7.133 kiops : time 129.24 seconds

[SCORE] Bandwidth 3.81212 GiB/s : IOPS 24.1149 kiops : TOTAL 9.58796

The io500.sh was run

 

Running the C version of the benchmark now

IO500 version io500-isc20

[RESULT]       ior-easy-write        6.233210 GiB/s  : time 359.216 seconds

[RESULT]    mdtest-easy-write        3.750549 kIOPS : time 3999.448 seconds

[RESULT]       ior-hard-write        1.415903 GiB/s  : time 349.737 seconds

[RESULT]    mdtest-hard-write        3.006432 kIOPS : time 305.006 seconds

^C

[root@inst-q7cdd-good-crow io500-app]#

 

 

 

 

 

 

From: Andreas Dilger <adilger@dilger.ca>
Date: Saturday, May 30, 2020 at 2:33 AM
To: Pinkesh Valdria <pinkesh.valdria@oracle.com>
Cc: <io-500@vi4io.org>
Subject: Re: [IO-500] Io500 runs twice - is that expected starting 2020 ?

 

Hi Pinkesh,

The dual runs of the IO500 benchmark for this list are intentional,

and documented in the README-ISC20.txt file in the source tree.

This is to allow comparison between the historical io500.sh script and the

new C application that runs the same IOR, mdtest, and find commands.

Please submit both results for ISC'20. 

 

We wanted to be sure that the transition to the new C-app didn't

introduce any errors in the results. The need to run the benchmark

twice will hopefully be gone for the SC'20 list. 

 

Cheers, Andreas(*)

* speaking on my own behalf and not on behalf of the IO500 board




On May 29, 2020, at 16:02, Pinkesh Valdria via IO-500 <io-500@vi4io.org> wrote:



Hello IO-500 experts, 

 

I am trying to configure io500 .  When I run it,  it runs twice,  first one is regular and 2nd one is called “Running the C version of the benchmark now”.   Is it because I misconfigured it or is it required to run both, starting 2020 ?     My config*.ini file is below. 

 

 

[root@inst-q7cdd-good-crow io500-app]# ./io500.sh config-test1.ini

System:  inst-q7cdd-good-crow

…..

Running the IO500 Benchmark now

[Creating] directories

…..

[Summary] Results files in ./results//2020.05.29-17.31.34-scr

[Summary] Data files in ./out//2020.05.29-17.31.34-scr

[RESULT] BW   phase 1            ior_easy_write                6.188 GiB/s : time 357.44 seconds

[RESULT] BW   phase 2            ior_hard_write                1.132 GiB/s : time 367.41 seconds

[RESULT] BW   phase 3             ior_easy_read                8.090 GiB/s : time 273.37 seconds

[RESULT] BW   phase 4             ior_hard_read                3.726 GiB/s : time 111.69 seconds

[RESULT] IOPS phase 1         mdtest_easy_write                4.263 kiops : time 3518.34 seconds

[RESULT] IOPS phase 2         mdtest_hard_write                2.953 kiops : time 303.52 seconds

[RESULT] IOPS phase 3                      find               91.550 kiops : time 173.64 seconds

[RESULT] IOPS phase 4          mdtest_easy_stat              137.243 kiops : time 109.30 seconds

[RESULT] IOPS phase 5          mdtest_hard_stat               84.140 kiops : time  10.65 seconds

[RESULT] IOPS phase 6        mdtest_easy_delete               55.311 kiops : time 271.19 seconds

[RESULT] IOPS phase 7          mdtest_hard_read               21.778 kiops : time  41.16 seconds

[RESULT] IOPS phase 8        mdtest_hard_delete                7.133 kiops : time 129.24 seconds

[SCORE] Bandwidth 3.81212 GiB/s : IOPS 24.1149 kiops : TOTAL 9.58796

The io500.sh was run

 

Running the C version of the benchmark now

IO500 version io500-isc20

<currently running …when I posted this question ….

 

 

***************************************************

config-test1.ini (END)

***************************************************

[global]

datadir = ./out/

resultdir = ./results/

 

timestamp-resultdir = TRUE

 

# Chose parameters that are very small for all benchmarks

 

[debug]

stonewall-time = 300 # for testing

 

[ior-easy]

transferSize = 2m

blockSize = 102400m

 

[mdtest-easy]

API = POSIX

# Files per proc

n = 500000

 

[ior-hard]

API = POSIX

# Number of segments  10000000

segmentCount = 400000

 

[mdtest-hard]

API = POSIX

# Files per proc 1000000

n = 40000

 

[find]

external-script = /mnt/beeond/io500-app/bin/pfind

pfind-parallelize-single-dir-access-using-hashing = FALSE

 

***************************************************

 

_______________________________________________
IO-500 mailing list
IO-500@vi4io.org
https://www.vi4io.org/mailman/listinfo/io-500