If the test is running on a non-production Lustre filesystem, another option is to enable
the "large_dir" feature for ldiskfs to allow larger single directories.
The code for the feature has existed in ldiskfs for a long time, but userspace support
was only added in e2fsprogs-1.44, so older releases such as e2fsprogs-1.42.12 will
refuse to touch such a filesystem. While there is an e2fsprogs-1.44.3-wc1 version that
supports it today, it is still in testing, so I'm reluctant to advise people to enable
the feature in production.
To enable the feature on an existing MDT, run the following command on the MDS as root:
tune2fs -O large_dir /dev/<mdtdev>
which should also work with older e2fsprogs.
This should allow a single directory to hold up to 6B entries in theory, though ldiskfs
is capped at 4B inodes today. Performance will start to decline somewhat after 10M
entries, as that adds another level to the directory tree, and the MDS will need enough
RAM to cache the directory effectively (about 1GB per 10M entries).
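As a rough illustration of that rule of thumb, here is a back-of-the-envelope sizing
sketch (the 100M-entry target is a made-up example, not a figure from this thread):

```shell
# Hypothetical sizing example: estimate MDS cache RAM from the
# ~1GB-per-10M-entries rule of thumb above, rounding up.
entries=100000000   # example target: 100M directory entries
ram_gb=$(( (entries + 9999999) / 10000000 ))
echo "estimated MDS cache RAM: ${ram_gb} GB"
```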
Cheers, Andreas
On Sep 27, 2018, at 10:08, John Bent <johnbent(a)gmail.com> wrote:
Hello Julian,
There is another customer who is struggling with this same problem and also wants to use
stonewall. Are the two features compatible yet?
Thanks,
John
> On Thu, Jul 26, 2018 at 3:57 AM Julian Kunkel <juliankunkel(a)googlemail.com> wrote:
> Dear Osama,
> thanks for pointing this out.
> I was not aware of this problem.
>
> I have now made some modifications to mdtest to compensate for this limitation.
> It can now limit the number of files per directory, creating extra
> top-level directories as needed.
> E.g. with a limit of 50 files and a total of 200 files, one would see
> in the top-level testing directory:
> #test-dir.0-0
> #test-dir.0-1
> #test-dir.0-2
> #test-dir.0-3
>
> The subdirectory tree of each directory is the complete tree as before
> under #test-dir.0-0.
> There is no synchronization when switching between these trees.
> I expect that this emulates the behavior of a single large
> directory quite well, though there is no perfect solution.
>
>
> This is already integrated into the testing branch of io-500-dev.
> To use it:
>
> In io-500-dev
> $ git checkout testing
> $ ./utilities/prepare.sh
>
> in io500.sh
> Line 85:
> io500_mdtest_hard_other_options=""
> Change that to:
> io500_mdtest_hard_other_options="-I 8000000"
> This will effectively create a new directory for each batch of 8M files.
> Note that the number of items (-n) must be a multiple of (-I).
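A quick sanity check for that constraint could look like the following sketch (the
values are illustrative, not taken from io500.sh):

```shell
# Hypothetical check: the -n (total items) value must be a
# multiple of the -I (items per directory) value.
n=16000000    # example total item count (-n)
i=8000000     # example items per directory (-I)
if [ $(( n % i )) -eq 0 ]; then
  echo "ok: -n is a multiple of -I"
else
  echo "error: -n=${n} is not a multiple of -I=${i}" >&2
  exit 1
fi
```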
>
> Note that that feature cannot be used with stonewalling at the moment!
> At some point, the mdtest code needs a redesign to allow such changes.
>
> Hence, set in line 30 of io500.sh:
> io500_stonewall_timer=0
>
> If you want to give it a try, I would welcome it...
>
> Best,
> Julian
>
> 2018-07-26 3:05 GMT+01:00 Osamu Tatebe <tatebe(a)cs.tsukuba.ac.jp>:
> > Hi,
> >
> > I would like to raise an issue regarding the mdtest-hard benchmark. It
> > creates files in a single directory for at least five minutes. As
> > metadata performance improves, a greater number of files will be
> > created, which will hit the limit on the number of files in a
> > single directory.
> >
> > Actually, the limit of the Lustre file system is about 8M files,
> > although it depends on the length of the pathname. When the metadata
> > performance is better than 27K IOPS, it is not possible to run the
> > mdtest-hard create benchmark for five minutes, since the number of
> > files created exceeds 8M.
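For reference, the arithmetic behind that 27K IOPS figure can be sketched as follows
(300 s being the five-minute create phase):

```shell
# Back-of-the-envelope: at 27K creates/s, a 5-minute create phase
# already exceeds the ~8M-file single-directory limit.
iops=27000
seconds=300
echo "files created in 5 minutes: $(( iops * seconds ))"
```

This prints 8100000, which is just over the ~8M-entry limit described above.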
> >
> > This is the reason we cannot submit a Lustre result. I expect this
> > will become quite common as metadata performance improves.
> >
> > Regards,
> > Osamu
> >
> > ---
> > Osamu Tatebe, Ph.D.
> > Center for Computational Sciences, University of Tsukuba
> > 1-1-1 Tennodai, Tsukuba, Ibaraki 3058577 Japan
> > _______________________________________________
> > IO-500 mailing list
> > IO-500(a)vi4io.org
> > https://www.vi4io.org/mailman/listinfo/io-500
>
>
>
> --
> Dr. Julian Kunkel
> Lecturer, Department of Computer Science
> +44 (0) 118 378 8218
> http://www.cs.reading.ac.uk/
> https://hps.vi4io.org/