Hey Mark,

I believe the rationale is that creating directories takes time.  Some of the tests might create a lot of directories, and we want that included in the measured time.  Precreating the directories would move that work from the measured benchmark phase to an unmeasured precreate phase.  Does that help?

Thanks,

John(*)
* These statements merely reflect my own personal view; the only mechanism for announcing official IO500 policies and decisions is the committee@io500.org email address.

On Thu, May 28, 2020 at 12:17 PM Mark Nelson <mnelson@redhat.com> wrote:
Thinking about this more, could I please ask what the rationale here
is?  Ultimately we'll do the pinning one way or another (maybe not for
ISC20, we'll see).  Right now we pin the easy mdtest subdirs
individually in the script.  We can accomplish the same thing with a
top-level xattr and round-robin as subdirectories are created inside
ceph itself, but that just trades user-control over the pinning scheme
for the convenience of setting a top-level xattr.  The whole thing is
pretty arbitrary except that this isn't how we do it right now.


I'd like to understand where you guys are coming from on this one.  Are
you worried about being able to game the benchmark if you can set subdir
xattrs?  Wouldn't a real performance-focused user potentially want to
set subdir tunings (and not just in the ceph case) for a real-world
use-case that the easy mdtest benchmark is supposed to represent?


Mark


On 5/28/20 12:31 PM, Mark Nelson wrote:
> Sigh. That means I'll need to have our ephemeral pinning code do it
> inside ceph rather than just pinning those directories in the script
> as I've been doing previously.  Not impossible, just more work to do
> under the time crunch while also trying to debug the API issues with
> the new C version of the benchmark.  This is getting rather frustrating.
>
>
> Mark
>
>
> On 5/28/20 12:24 PM, John Bent wrote:
>> Mark and all,
>>
>> The committee just added a rule clarifying precreation of directories
>> to the rules page: https://www.vi4io.org/io500/rules/submission. The
>> newly added rule states:
>>
>> "Each of the four main phases (IOR easy and hard, and mdtest easy and
>> hard) has a subdirectory which can be precreated and tuned (e.g.
>> using tools such as lfs_setstripe or beegfs_ctl); however, additional
>> subdirectories within these subdirectories cannot be precreated."
>>
>> Below my signature, I am including my standard disclaimer that my
>> email is not necessarily an official IO500 position but note that the
>> rules page itself is.  :)
>>
>> Hope this is clear; please do reply with any questions or need for
>> further clarification,
>>
>> Thanks,
>>
>> John(*)
>> * These statements merely reflect my own personal view; the only
>> mechanism for announcing official IO500 policies and decisions is the
>> committee@io500.org email address.
>>
>>
>> On Wed, May 27, 2020 at 5:14 PM John Bent <johnbent@gmail.com> wrote:
>>
>>     Hey Mark,
>>
>>     Thanks for the interest.  It will be great to get your
>>     contributions!
>>
>>     1.  Must be exactly 300 seconds.
>>     2. Does not include the directories.  Other historical submissions
>>     have tuned the directories exactly as you describe.
>>     3. Yes, 10+ metal nodes in AWS satisfies this requirement.
>>
>>     Other committee members, and community members, please chime in if
>>     I got anything wrong!  Mark, you might note the disclaimer below
>>     my signature which is just our committee's way of being careful.
>>     I'll make sure to discuss this email with the rest of the
>>     committee and will let you know if any of my answers need official
>>     clarification.
>>
>>     Thanks,
>>
>>     John(*)
>>
>>     * These statements merely reflect my own personal view; the only
>>     mechanism for announcing official IO500 policies and decisions is
>>     the committee@io500.org email address.
>>
>>
>>     On Wed, May 27, 2020 at 4:44 PM Mark Nelson via IO-500
>>     <io-500@vi4io.org> wrote:
>>
>>         Hi Folks,
>>
>>
>>         We are thinking about throwing together some cephfs io500 results
>>         for ISC20 and I just wanted to make sure that we are doing the
>>         right thing in a couple of cases.  Any help would be much
>>         appreciated since we've never submitted results before.  We might
>>         have a couple of additional questions later on, but for now:
>>
>>
>>         1) "All create/write phases must run for at least 300 seconds;
>>         the stonewall flag must be set to 300 which should ensure this."
>>
>>         Is it acceptable to set the stonewall higher than 300, or is a
>>         setting of exactly 300 required?
>>
>>
>>         2) "The file names for the mdtest output files may not be
>>         pre-created."
>>
>>         Does this also include the directories?  We have the ability to
>>         pin directories to specific MDSes that helps in the easy tests.
>>         We also have an experimental feature that more or less does this
>>         pseudo-randomly behind the scenes so long as a top level xattr is
>>         set, but it would be convenient if we could just pre-create the
>>         mdtest directories and set the xattr to pin them individually in
>>         the "directory setup" phase of the test if allowed.  Likewise, we
>>         have code that allows users to provide a hint if a specific
>>         directory is expected to have lots of files which can improve
>>         performance in the hard tests.  I would like to pre-create the
>>         mdtest directory so that we can set the xattr informing ceph that
>>         we expect a lot of files to be written in that directory.
>>
>>
>>         3) "Only submissions using at least 10 physical client nodes are
>>         eligible to win IO500 awards and at least one benchmark process
>>         must run on each."
>>
>>         We are planning on running on AWS.  So long as we are using 10+
>>         metal nodes does that meet the requirement to have "at least 10
>>         physical client nodes"?
>>
>>
>>         Thanks,
>>
>>         Mark
>>
>>         _______________________________________________
>>         IO-500 mailing list
>>         IO-500@vi4io.org
>>         https://www.vi4io.org/mailman/listinfo/io-500
>>