If the test is running on a non-production Lustre filesystem, another option is to enable the "large_dir" feature for ldiskfs to allow larger single directories. 

The code for the feature has existed in ldiskfs for a long time, but support was only added to e2fsprogs-1.44, so the older e2fsprogs-1.42.12 will refuse to touch such a filesystem. While there is an e2fsprogs-1.44.3-wc1 version that supports it today, it is still in testing, so I'm reluctant to advise people to enable the feature in production. 
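
If you want to check which e2fsprogs is installed on the MDS before deciding, a minimal check (the package query assumes an RPM-based distro; use your distro's equivalent otherwise):

     rpm -q e2fsprogs     # installed package version, e.g. e2fsprogs-1.44.x
     debugfs -V           # version of the e2fsprogs tools/library actually in use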

To enable the feature on an existing MDT, run the following command on the MDS as root:

     tune2fs -O large_dir /dev/<mdtdev>

which should also work with older e2fsprogs. 
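
To verify the flag actually got set afterward (a quick check, using the same <mdtdev> placeholder as above):

     tune2fs -l /dev/<mdtdev> | grep -i features

and large_dir should now appear on the "Filesystem features:" line.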

In theory this should allow a single directory to hold up to 6B entries, though ldiskfs has a cap of 4B inodes today.  Performance will start to decline somewhat beyond about 10M entries, as the directory index gains another level, and the MDS will need enough RAM to cache the directory effectively (about 1GB per 10M entries).
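
As a rough worked example of that sizing rule (illustrative shell arithmetic only, using the ~1GB per 10M entries figure above):

     entries=80000000   # e.g. an 80M-entry mdtest-hard directory
     echo "~$((entries / 10000000)) GB of MDS RAM to cache it"   # prints ~8 GB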

Cheers, Andreas

On Sep 27, 2018, at 10:08, John Bent <johnbent@gmail.com> wrote:

Hello Julian,

There is another customer who is struggling with this same problem and also wants to use stonewall.  Are the two features compatible yet?

Thanks,

John

On Thu, Jul 26, 2018 at 3:57 AM Julian Kunkel <juliankunkel@googlemail.com> wrote:
Dear Osama,
thanks for pointing this out.
I was not aware of this problem.

I have now made some modifications to mdtest to compensate for this limitation.
It can now limit the number of files per directory, creating extra
top-level directories as needed.
E.g. with a limit of 50 files and a total of 200 files, one would see
in the top-level testing directory:
#test-dir.0-0
#test-dir.0-1
#test-dir.0-2
#test-dir.0-3

The subdirectory tree of each directory is the complete tree as before
under #test-dir.0-0.
There is no synchronization when switching between these trees.
I expect that this emulates the behavior of a single large directory
quite well, though admittedly there is no perfect solution.


The stuff is already integrated into the testing branch of io-500-dev.
To use it:

In io-500-dev
$ git checkout testing
$ ./utilities/prepare.sh

In io500.sh, line 85 currently reads:
io500_mdtest_hard_other_options=""
Change that to:
io500_mdtest_hard_other_options="-I 8000000"
This will effectively create a new directory for each batch of 8M files.
Note that the number of items (-n) must be a multiple of (-I).

Note that this feature cannot be used with stonewalling at the moment!
At some point, the mdtest code needs a redesign to allow such changes.

Hence, in line 30 of io500.sh, set:
io500_stonewall_timer=0
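
Taken together, the relevant io500.sh lines would then look roughly like this
(the line numbers and the 8M batch size are just the example values from above):

io500_mdtest_hard_other_options="-I 8000000"   # line 85: one directory per batch of 8M files
io500_stonewall_timer=0                        # line 30: stonewalling disabled for now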

If you want to give it a try, I would welcome it...

Best,
Julian

2018-07-26 3:05 GMT+01:00 Osamu Tatebe <tatebe@cs.tsukuba.ac.jp>:
> Hi,
>
> I would like to bring up again an issue regarding the mdtest-hard
> benchmark.  It creates files in a single directory for at least five
> minutes.  As metadata performance improves further, a greater number
> of files will be created.  This will hit the limit on the number of
> files in a single directory.
>
> Actually, the limit of the Lustre file system is about 8M files,
> although it depends on the length of the pathname.  When the metadata
> performance is better than 27K IOPS, it is not possible to run the
> mdtest-hard create benchmark for five minutes, since the number of
> files created would exceed 8M.
>
> This is the reason we cannot submit a Lustre result.  I expect this
> will become quite common as metadata performance improves further.
>
> Regards,
> Osamu
>
> ---
> Osamu Tatebe, Ph.D.
> Center for Computational Sciences, University of Tsukuba
> 1-1-1 Tennodai, Tsukuba, Ibaraki 3058577 Japan



--
Dr. Julian Kunkel
Lecturer, Department of Computer Science
+44 (0) 118 378 8218
http://www.cs.reading.ac.uk/
https://hps.vi4io.org/
_______________________________________________
IO-500 mailing list
IO-500@vi4io.org
https://www.vi4io.org/mailman/listinfo/io-500