This is the [find] section of my result file:

 [find]
 t_start         = 2020-06-01 07:10:16
 exe             = ./pfind ./datadir/2020.06.01-06.12.29-app -newer ./datadir/2020.06.01-06.12.29-app/timestampfile -size 3901c -name *01* -C -N -H 1 -q 20000
 found           = 12236
 total-files     = 44775906
 score           = 8.283603
 t_delta         = 5405.4574
 t_end           = 2020-06-01 08:40:21

and my ini file:

 [find]
 # Pfind Steal from next
 pfind-steal-next = TRUE
 # Pfind queue length
 pfind-queue-length = 20000
 # Pfind with hashing
 pfind-parallelize-single-dir-access-using-hashing = TRUE
 # Set the number of processes for pfind
 #nproc = 900

I know that the C version can be particularly sensitive to the syntax in the settings, especially the booleans. 
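On that note, one quick sanity check (a sketch only, and it assumes the parser silently ignores unknown option names rather than erroring) is to diff the key names in your ini against the reference config-full.ini that ships with io500. The file contents below are illustrative stand-ins, including a deliberately misspelled key:

```shell
#!/bin/sh
# Sketch: compare option names in an ini against the reference file to
# catch typos. Both files here are stand-ins written for the demo.
cat > /tmp/my.ini <<'EOF'
[find]
pfind-steal-next = TRUE
pfind-q-length = 20000
EOF
cat > /tmp/ref.ini <<'EOF'
[find]
pfind-steal-next = FALSE
pfind-queue-length = 10000
pfind-parallelize-single-dir-access-using-hashing = FALSE
EOF
# Extract "key" from "key = value" lines, sorted for comm(1).
keys() { sed -n 's/^\([a-z][a-z0-9-]*\)[[:space:]]*=.*/\1/p' "$1" | sort; }
keys /tmp/my.ini  > /tmp/my.keys
keys /tmp/ref.ini > /tmp/ref.keys
# Keys present in my.ini but unknown to the reference show up here.
comm -23 /tmp/my.keys /tmp/ref.keys   # → pfind-q-length
```

Anything printed by the final `comm` is a key the reference config does not know about, which is exactly the kind of silent mismatch that bites here.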

--Ken

On Jun 1, 2020, at 9:58 AM, Carlile, Ken via IO-500 <io-500@vi4io.org> wrote:

There is definitely something up with your pfind in the C version. I used similar parameters in my ini file and it did apply them correctly. One thing I notice in your output is that for the C version, for whatever reason, it was setting nproc=1. This means it was running entirely single-threaded, so it probably wasn't hung; it was just going to take ten years to run.

My io500.sh is vanilla except for the mpi arguments (of course), and my ini is fairly clean without any trickiness. I did have to remove the nproc parameter because it wasn't respected by the bash version. 

--Ken

On Jun 1, 2020, at 9:47 AM, Pinkesh Valdria via IO-500 <io-500@vi4io.org> wrote:


I made some progress over the weekend troubleshooting why the find phase was not working, but I am not out of the woods yet. I would appreciate it if someone could confirm that I am on the right path, or whether the issues below are known and have workarounds.
 
Thanks for your help. It's a long email, but detailed, to avoid a lot of back-and-forth.
 
Here are the differences I found and the workarounds I had to use.
 
Issue 1:
The io500 C version expects a field labelled external-extra-args in the config.ini file, but the non-C version (io500.sh) looks for a field labelled external-args (without "-extra-"); see the line below:
  io500_find_cmd_args="$(get_ini_param find external-args)"
 
less config-full.ini
[find]
# Set to an external script to perform the find phase
external-script =
# Extra arguments for external scripts
external-extra-args =
# Startup arguments for external scripts
external-mpi-args =
# Set the number of processes for pfind
nproc =
# Pfind queue length
pfind-queue-length = 10000
# Pfind Steal from next
pfind-steal-next = FALSE
# Parallelize the readdir by using hashing. Your system must support this!
pfind-parallelize-single-dir-access-using-hashing = FALSE
 
Temporary workaround: I changed the io500.sh code to read: io500_find_cmd_args="$(get_ini_param find external-extra-args)"
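For the record, that workaround can be applied as a one-line sed edit. The snippet below rehearses it on a stand-in file containing just the affected line (the real edit would of course target io500.sh itself):

```shell
#!/bin/sh
# Stand-in for the one line in io500.sh that reads the wrong key name.
cat > /tmp/io500-snippet.sh <<'EOF'
io500_find_cmd_args="$(get_ini_param find external-args)"
EOF
# Rename the ini key the script asks for to match the C version's name.
sed -i 's/get_ini_param find external-args/get_ini_param find external-extra-args/' /tmp/io500-snippet.sh
cat /tmp/io500-snippet.sh
```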
 
 
Issue 2:   I had to set io500_find_mpi to True in the io500.sh script to avoid the error "io500_find_mpi: unbound variable". I don't know whether there is a different way to set this value for the C version via the config.ini file; can someone share how to pass that value to the C-app version?
 
function setup_find {
  io500_find_mpi="True"
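If the goal is only to silence the unbound-variable error, a parameter-expansion default is a lighter touch than hard-coding the value, a sketch assuming io500.sh runs under `set -u` (which is what produces that error message):

```shell
#!/bin/sh
set -u
# Under `set -u`, referencing an unset variable aborts the script.
# ${var:-default} is safe even when the variable is unset, so this
# keeps any value set elsewhere and falls back to "True" otherwise.
io500_find_mpi="${io500_find_mpi:-True}"
echo "$io500_find_mpi"   # → True when nothing set it earlier
```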
 
Issue 3:   How do I validate that the parameters I set in the config.ini file are being used at runtime? I set the following, but don't see them below:
 
[find]
external-script = /mnt/beeond/io500-app/bin/pfind
#nproc = 30
pfind-queue-length = 2000
pfind-steal-next = TRUE
pfind-parallelize-single-dir-access-using-hashing = FALSE
 
The io500 C version hangs at the command below, and I don't see the queue length, steal-next, etc.:
[find]
t_start         = 2020-06-01 08:28:30
exe             =  /mnt/beeond/io500-app/bin/pfind  ./out//2020.06.01-06.57.23-app -newer ./out//2020.06.01-06.57.23-app/timestampfile -size 3901c -name "*01*"
nproc           = 1
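One way to check which flags actually reached pfind is to grep the recorded `exe =` line for each flag you expect. The result file below is a stand-in mimicking the output above; the flag list is whatever you configured:

```shell
#!/bin/sh
# Stand-in for the [find] section the C app writes into its result file.
cat > /tmp/result.txt <<'EOF'
[find]
exe = /mnt/beeond/io500-app/bin/pfind ./out -newer ./out/timestampfile -size 3901c -name "*01*"
nproc = 1
EOF
# Report which of the expected pfind flags made it onto the command line.
for flag in '-q 2000' '-N' '-H'; do
  if grep -q -- "$flag" /tmp/result.txt; then
    echo "present: $flag"
  else
    echo "missing: $flag"
  fi
done
# → all three report "missing" for the exe line above
```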
 
 
Issue 4:   A manual workaround makes the "find" phase work by setting some parameters in io500.sh, but they are not passed to the C version, which therefore fails.
 
In io500.sh script:  
function setup_find {
  io500_find_mpi="True"
  io500_find_cmd_args="-N -q 2000 -s $io500_stonewall_timer -r $io500_result_dir/pfind_results"
 
 
Non-C version – success:
[Starting] find
[Exec] mpiexec --allow-run-as-root -mca btl self -x UCX_TLS=rc,self,sm -x HCOLL_ENABLE_MCAST_ALL=0 -mca coll_hcoll_enable 0 -x UCX_IB_TRAFFIC_CLASS=105 -x UCX_IB_GID_INDEX=3 -n 30 -npernode 10 --hostfile /mnt/beeond/hostsfile.cn /mnt/beeond/io500-app/bin/pfind ./out//2020.06.01-06.57.23-scr -newer ./out//2020.06.01-06.57.23-scr/timestampfile -size 3901c -name "*01*" -N -q 2000 -s 300 -r ./results//2020.06.01-06.57.23-scr/pfind_results
[Results] in ./results//2020.06.01-06.57.23-scr/find.txt.
[FIND] MATCHED 28170/15966876 in 44.6593 seconds
[RESULT] IOPS phase 3                      find              357.520 kiops : time  44.66 seconds
 
The io500 C version hangs at the command below, and I don't see the queue length, steal-next, etc.:
[find]
t_start         = 2020-06-01 08:28:30
exe             =  /mnt/beeond/io500-app/bin/pfind  ./out//2020.06.01-06.57.23-app -newer ./out//2020.06.01-06.57.23-app/timestampfile -size 3901c -name "*01*"
nproc           = 1
 
 
 
Issue 5:   As you can see in Issue 4, I am passing some parameters via bash variables. If I do the same in the config.ini file, they are passed as-is without being interpreted by the bash script. How do I pass such variables to the C version of io500?
 
I already tried the forms below; they are not interpreted when processed by the io500 C-app version.
In config.ini file: 
external-extra-args =  -s \$io500_stonewall_timer -r \$io500_result_dir/pfind_results
or
external-extra-args =  -s $io500_stonewall_timer -r $io500_result_dir/pfind_results
 
 
[find]
t_start         = 2020-05-31 11:43:52
exe             =  /mnt/beeond/io500-app/bin/pfind -s $io500_stonewall_timer -r $io500_result_dir/pfind_results ./out//2020.05.31-10.52.56-app -newer ./out//2020.05.31-10.52.56-app/timestampfile -size 3901c -name "*01*"
nproc           = 1
 
[find]
t_start         = 2020-05-31 15:55:52
exe             =  /mnt/beeond/io500-app/bin/pfind -s \$io500_stonewall_timer -r \$io500_result_dir/pfind_results ./out//2020.05.31-15.55.38-app -newer ./out//2020.05.31-15.55.38-app/timestampfile -size 3901c -name "*01*"
nproc           = 1
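Since the C app appears to read config.ini verbatim with no shell expansion, one workaround sketch is to expand the variables yourself and write literal values into the ini before launching. The variable names mirror those in io500.sh, but the generation step itself is my assumption, not something the tool provides:

```shell
#!/bin/sh
# Expand the shell variables ourselves, then emit an ini fragment with
# literal values the C app can use as-is. Values here are examples.
io500_stonewall_timer=300
io500_result_dir=./results/run1
cat > /tmp/find-section.ini <<EOF
[find]
external-extra-args = -s ${io500_stonewall_timer} -r ${io500_result_dir}/pfind_results
EOF
cat /tmp/find-section.ini
```

The unquoted heredoc delimiter is what makes the shell substitute the variables before the file is written, so the ini ends up containing `-s 300 -r ./results/run1/pfind_results` rather than the literal `$` forms.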
 
 
This is a small test cluster 
 
less 2020.06.01-06.57.23-scr/result_summary.txt
[RESULT] BW   phase 1            ior_easy_write                6.062 GiB/s : time 362.17 seconds
[RESULT] IOPS phase 1         mdtest_easy_write                7.300 kiops : time 2054.93 seconds
[RESULT] BW   phase 2            ior_hard_write                1.605 GiB/s : time 321.58 seconds
[RESULT] IOPS phase 2         mdtest_hard_write                3.173 kiops : time 304.69 seconds
[RESULT] IOPS phase 3                      find              357.520 kiops : time  44.66 seconds
[RESULT] BW   phase 3             ior_easy_read                8.269 GiB/s : time 265.51 seconds
[RESULT] IOPS phase 4          mdtest_easy_stat              144.149 kiops : time 104.06 seconds
[RESULT] BW   phase 4             ior_hard_read                3.847 GiB/s : time 134.18 seconds
[RESULT] IOPS phase 5          mdtest_hard_stat               82.220 kiops : time  11.76 seconds
[RESULT] IOPS phase 6        mdtest_easy_delete               54.334 kiops : time 276.07 seconds
[RESULT] IOPS phase 7          mdtest_hard_read               22.822 kiops : time  42.37 seconds
[RESULT] IOPS phase 8        mdtest_hard_delete                8.042 kiops : time 123.97 seconds
[SCORE] Bandwidth 4.19424 GiB/s : IOPS 31.5378 kiops : TOTAL 11.5012
 
C-version app – partial result (it hangs at find):
[root@inst-5n58i-good-crow results]# less 2020.06.01-06.57.23-app/result.txt  | egrep "\[|score"
[ior-easy-write]
score           = 6.136592
[mdtest-easy-write]
score           = 41.913878
[timestamp]
[ior-hard-write]
score           = 1.538194
[mdtest-hard-write]
score           = 3.012750
[find]
[root@inst-5n58i-good-crow results]#
 
 
 
From: Pinkesh Valdria <pinkesh.valdria@oracle.com>
Date: Saturday, May 30, 2020 at 3:01 AM
To: Andreas Dilger <adilger@dilger.ca>
Cc: <io-500@vi4io.org>
Subject: Re: [IO-500] Io500 runs twice - is that expected starting 2020 ?
 
Thanks Andreas,  
 
The C-app benchmark failed; I mean it never completed and there was no score at the end. I waited 8 hours before pressing Ctrl-C. It only has 4 results vs. 12 results in the first section. Are there special logs to check for the C version of the benchmark?
 
The below is a very small development cluster I am using until I figure out how to run IO500 correctly.   
[Leaving] datafiles in ./out//2020.05.29-17.31.34-scr
[Summary] Results files in ./results//2020.05.29-17.31.34-scr
[Summary] Data files in ./out//2020.05.29-17.31.34-scr
[RESULT] BW   phase 1            ior_easy_write                6.188 GiB/s : time 357.44 seconds
[RESULT] BW   phase 2            ior_hard_write                1.132 GiB/s : time 367.41 seconds
[RESULT] BW   phase 3             ior_easy_read                8.090 GiB/s : time 273.37 seconds
[RESULT] BW   phase 4             ior_hard_read                3.726 GiB/s : time 111.69 seconds
[RESULT] IOPS phase 1         mdtest_easy_write                4.263 kiops : time 3518.34 seconds
[RESULT] IOPS phase 2         mdtest_hard_write                2.953 kiops : time 303.52 seconds
[RESULT] IOPS phase 3                      find               91.550 kiops : time 173.64 seconds
[RESULT] IOPS phase 4          mdtest_easy_stat              137.243 kiops : time 109.30 seconds
[RESULT] IOPS phase 5          mdtest_hard_stat               84.140 kiops : time  10.65 seconds
[RESULT] IOPS phase 6        mdtest_easy_delete               55.311 kiops : time 271.19 seconds
[RESULT] IOPS phase 7          mdtest_hard_read               21.778 kiops : time  41.16 seconds
[RESULT] IOPS phase 8        mdtest_hard_delete                7.133 kiops : time 129.24 seconds
[SCORE] Bandwidth 3.81212 GiB/s : IOPS 24.1149 kiops : TOTAL 9.58796
The io500.sh was run
 
Running the C version of the benchmark now
IO500 version io500-isc20
[RESULT]       ior-easy-write        6.233210 GiB/s  : time 359.216 seconds
[RESULT]    mdtest-easy-write        3.750549 kIOPS : time 3999.448 seconds
[RESULT]       ior-hard-write        1.415903 GiB/s  : time 349.737 seconds
[RESULT]    mdtest-hard-write        3.006432 kIOPS : time 305.006 seconds
^C
[root@inst-q7cdd-good-crow io500-app]#
 
 
 
 
 
 
From: Andreas Dilger <adilger@dilger.ca>
Date: Saturday, May 30, 2020 at 2:33 AM
To: Pinkesh Valdria <pinkesh.valdria@oracle.com>
Cc: <io-500@vi4io.org>
Subject: Re: [IO-500] Io500 runs twice - is that expected starting 2020 ?
 
Hi Pinkesh,
The dual runs of the IO500 benchmark for this list are intentional,
and documented in the README-ISC20.txt file in the source tree.
This is to allow comparison between the historical io500.sh script and the
new C application that runs the same IOR, mdtest, and find commands.
Please submit both results for ISC'20. 
 
We wanted to be sure that the transition to the new C-app didn't
introduce any errors in the results. The need to run the benchmark
twice will hopefully be gone for the SC'20 list. 
 
Cheers, Andreas(*)
* speaking on my own behalf and not on behalf of the IO500 board



On May 29, 2020, at 16:02, Pinkesh Valdria via IO-500 <io-500@vi4io.org> wrote:


Hello IO-500 experts,  
 
I am trying to configure io500. When I run it, it runs twice: the first run is the regular one and the second is labelled "Running the C version of the benchmark now". Is it because I misconfigured it, or is running both required starting in 2020? My config*.ini file is below.
 
 
[root@inst-q7cdd-good-crow io500-app]# ./io500.sh config-test1.ini
System:  inst-q7cdd-good-crow
…..
Running the IO500 Benchmark now
[Creating] directories
…..
[Summary] Results files in ./results//2020.05.29-17.31.34-scr
[Summary] Data files in ./out//2020.05.29-17.31.34-scr
[RESULT] BW   phase 1            ior_easy_write                6.188 GiB/s : time 357.44 seconds
[RESULT] BW   phase 2            ior_hard_write                1.132 GiB/s : time 367.41 seconds
[RESULT] BW   phase 3             ior_easy_read                8.090 GiB/s : time 273.37 seconds
[RESULT] BW   phase 4             ior_hard_read                3.726 GiB/s : time 111.69 seconds
[RESULT] IOPS phase 1         mdtest_easy_write                4.263 kiops : time 3518.34 seconds
[RESULT] IOPS phase 2         mdtest_hard_write                2.953 kiops : time 303.52 seconds
[RESULT] IOPS phase 3                      find               91.550 kiops : time 173.64 seconds
[RESULT] IOPS phase 4          mdtest_easy_stat              137.243 kiops : time 109.30 seconds
[RESULT] IOPS phase 5          mdtest_hard_stat               84.140 kiops : time  10.65 seconds
[RESULT] IOPS phase 6        mdtest_easy_delete               55.311 kiops : time 271.19 seconds
[RESULT] IOPS phase 7          mdtest_hard_read               21.778 kiops : time  41.16 seconds
[RESULT] IOPS phase 8        mdtest_hard_delete                7.133 kiops : time 129.24 seconds
[SCORE] Bandwidth 3.81212 GiB/s : IOPS 24.1149 kiops : TOTAL 9.58796
The io500.sh was run
 
Running the C version of the benchmark now
IO500 version io500-isc20
<currently running when I posted this question …>
 
 
***************************************************
config-test1.ini (END)
***************************************************
[global]
datadir = ./out/
resultdir = ./results/
 
timestamp-resultdir = TRUE
 
# Chose parameters that are very small for all benchmarks
 
[debug]
stonewall-time = 300 # for testing
 
[ior-easy]
transferSize = 2m
blockSize = 102400m
 
[mdtest-easy]
API = POSIX
# Files per proc
n = 500000
 
[ior-hard]
API = POSIX
# Number of segments  10000000
segmentCount = 400000
 
[mdtest-hard]
API = POSIX
# Files per proc 1000000
n = 40000
 
[find]
external-script = /mnt/beeond/io500-app/bin/pfind
pfind-parallelize-single-dir-access-using-hashing = FALSE
 
***************************************************
 
_______________________________________________
IO-500 mailing list
IO-500@vi4io.org
https://www.vi4io.org/mailman/listinfo/io-500