Hi Guys,
I've been working on improving CephFS scores during the hard portions of
the test suite and noticed something interesting. We currently have a
fairly large discrepancy between our easy and hard mdtest write results,
which means there is also a fairly large disparity in the amount of data
written during each test. What I've noticed is that as I've improved
the throughput of mdtest hard writes, our find throughput has dropped.
You can see an example of that happening in the results posted here:
https://github.com/ceph/ceph/pull/34574
That makes sense if we are slower at finding files in a single giant
directory than when they are spread across many directories serviced by
dedicated MDSes. In other words, I think the find score is probably
inflated when significantly less data is written during the hard mdtest
write phase than during the easy write phase. Does this seem like a
reasonable hypothesis? If so, it seems like this could be solved by
running separate find tests across the easy and hard datasets and
scoring each of them in the usual way.
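To make the hypothesis concrete, here is a small back-of-the-envelope sketch. It assumes the reported find rate is simply total files found divided by total elapsed time, and that find's per-file rate on each dataset stays fixed. The rates and file counts (RATE_EASY, RATE_HARD, N_EASY and the n_hard values) are made-up illustrative numbers, not measurements from the PR above. Under those assumptions a single find over both datasets reports a file-count-weighted harmonic mean of the two rates. Creating more hard files therefore drags the combined number down even though neither per-dataset rate changes. Scoring the two finds separately (combined here with a geometric mean, assuming the usual geometric-mean scoring) does not depend on how many files each write phase produced.

```python
from math import prod

# Hypothetical, illustrative numbers only (files and files/s).
RATE_EASY = 1_000_000   # assumed find rate over the easy (many-directory) dataset
RATE_HARD = 100_000     # assumed find rate over the hard (single-directory) dataset
N_EASY = 10_000_000     # assumed number of files created by mdtest easy


def combined_find_rate(n_easy, rate_easy, n_hard, rate_hard):
    """Rate a single find over both datasets would report.

    Total files divided by total time, i.e. a file-count-weighted
    harmonic mean of the per-dataset rates.
    """
    total_time = n_easy / rate_easy + n_hard / rate_hard
    return (n_easy + n_hard) / total_time


def geometric_mean(values):
    """Geometric mean of the per-dataset scores."""
    return prod(values) ** (1.0 / len(values))


if __name__ == "__main__":
    # Faster hard writes -> more hard files -> lower combined find rate.
    for n_hard in (500_000, 5_000_000):
        rate = combined_find_rate(N_EASY, RATE_EASY, n_hard, RATE_HARD)
        print(f"n_hard={n_hard:,}: combined find rate {rate:,.0f} files/s")
    # Scoring easy and hard finds separately is independent of file counts.
    separate = geometric_mean([RATE_EASY, RATE_HARD])
    print(f"separate (geometric mean): {separate:,.0f} files/s")
```

With these numbers the combined rate falls from 700,000 to 250,000 files/s when the hard phase creates ten times as many files, while the separately scored result stays at about 316,228 files/s either way.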
Thoughts?
Mark