To my knowledge, this is not a new rule. It seems to have been omitted from the rules this
time around for an unknown reason, but being able to pre-create the directories will skew
the results in relation to last fall's results (the last big reset). I think there is
a definite desire not to have a reset list again.
--Ken Carlile
On May 28, 2020, at 3:01 PM, Mark Nelson via IO-500
<io-500@vi4io.org> wrote:
So the gist of it is that currently we let applications/users pin a given directory to a
given MDS via an xattr. That takes it out of the automatic subtree partitioning scheme so
the user has direct control over where it goes. I.e., in the real world you might have an
application decide that each process is going to pin directories round-robin over all
MDSes, or use a hashing scheme, or perhaps if there is some locality involved pin their
directory to an MDS running in the same node/rack/zone/etc. The idea here is that once
the user/application decides to set their own directory pin, we stay out of the way.
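
As a rough illustration of the two user-side strategies mentioned above (round-robin and hashing), here is a minimal Python sketch. It assumes the per-directory pin is the `ceph.dir.pin` xattr described in the CephFS export-pin documentation; the function names, the example directory names, and the MDS count are hypothetical and not taken from this thread.

```python
import os
import zlib


def round_robin_pins(directories, num_mds):
    """Map each directory to an MDS rank in round-robin order."""
    if num_mds < 1:
        raise ValueError("num_mds must be at least 1")
    return {d: i % num_mds for i, d in enumerate(directories)}


def hashed_pins(directories, num_mds):
    """Map each directory to an MDS rank by hashing its path.

    crc32 is used only because it is stable across runs; any
    deterministic hash would do for this illustration.
    """
    if num_mds < 1:
        raise ValueError("num_mds must be at least 1")
    return {d: zlib.crc32(d.encode("utf-8")) % num_mds for d in directories}


def apply_pins(pins):
    """Set the pin xattr on each directory.

    Assumption: the xattr is `ceph.dir.pin` and the directories already
    exist on a CephFS mount. os.setxattr is available on Linux only.
    """
    for path, rank in pins.items():
        os.setxattr(path, "ceph.dir.pin", str(rank).encode("ascii"))


if __name__ == "__main__":
    # Hypothetical per-process directory names and MDS count.
    dirs = [f"proc-{i}" for i in range(6)]
    print(round_robin_pins(dirs, 4))
```

The two planning functions are pure and can be checked anywhere; `apply_pins` is only meaningful on a Linux client where the directories already exist on a CephFS mount.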
If I understand correctly, the new rule would mean that we can't pre-create any of
the sub-directories, so mdtest itself would need to set the pin xattr per
directory, unless we can somehow go in and set them after they are created by mdtest but
before files are written. Alternatively, the rules would allow us to set an xattr on the
top level directories and do the pinning behind the scenes (something like a
"use_rr_pinning" xattr). We're planning on adding something like that anyway
as a convenience option, but the rule would still limit any kind of user-defined pinning
strategies using our existing pinning scheme.
I guess I just want to understand the rationale. Why allow the top level directory to be
tuned but not the lower levels? Why is the rule being introduced now in the middle of a
Call for Submissions?
Mark
On 5/28/20 1:25 PM, Julian Kunkel wrote:
Hi Mark,
just to be sure:
What you cannot do with the current ruleset is pre-create a
directory for each individual *process* in mdtest-easy, because the
process itself is expected to create that directory.
You can create the top-level directories, e.g., md-easy, md-hard and
configure them differently.
Setting a top-level xattr seems good to me. Is there a serious problem
with this approach?
Best,
Julian
P.S. These statements merely reflect my own personal view; the only
mechanism for announcing official IO500 policies and decisions is the
committee@io500.org email address.
On Thu, May 28, 2020 at 7:17 PM Mark Nelson via IO-500
<io-500@vi4io.org> wrote:
Thinking about this more, could I please ask what the rationale here
is? Ultimately we'll do the pinning one way or another (maybe not for
ISC20, we'll see). Right now we pin the easy mdtest subdirs
individually in the script. We can accomplish the same thing with a
top-level xattr and round-robin as subdirectories are created inside
ceph itself, but that just trades user-control over the pinning scheme
for the convenience of setting a top-level xattr. The whole thing is
pretty arbitrary except that this isn't how we do it right now.
I'd like to understand where you guys are coming from on this one. Are
you worried about being able to game the benchmark if you can set subdir
xattrs? Wouldn't a real performance-focused user potentially want to
set subdir tunings (and not just in the ceph case) for a real-world
use-case that the easy mdtest benchmark is supposed to represent?
Mark
On 5/28/20 12:31 PM, Mark Nelson wrote:
Sigh. That means I'll need to have our ephemeral pinning code do it
inside ceph rather than just pinning those directories in the script
as I've been doing previously. Not impossible, just more work to do
under the time crunch while also trying to debug the API issues with
the new C version of the benchmark. This is getting rather frustrating.
Mark
On 5/28/20 12:24 PM, John Bent wrote:
Mark and all,
The committee just added a rule clarifying precreation of directories
to the rules page:
https://urldefense.com/v3/__https://www.vi4io.org/io500/rules/submission_...
The newly added rule states:
"Each of the four main phases (IOR easy and hard, and mdtest easy and
hard) has a subdirectory which can be precreated and tuned (e.g.
using tools such as lfs_setstripe or beegfs_ctl); however, additional
subdirectories within these subdirectories cannot be precreated."
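
One way to read the quoted rule is as a simple check on the paths a setup script creates. The sketch below is an interpretation, not an official IO500 tool: the phase directory names and the helper name are assumptions made for illustration, and paths are taken relative to the benchmark's data directory.

```python
from pathlib import PurePosixPath

# Assumed names for the four phase directories; adjust to the actual layout.
PHASE_DIRS = frozenset({"ior-easy", "ior-hard", "mdtest-easy", "mdtest-hard"})


def nested_precreated(precreated, phase_dirs=PHASE_DIRS):
    """Return the precreated paths that sit inside a phase directory.

    Under the quoted rule the phase directory itself may be precreated
    and tuned, but anything below it may not.
    """
    flagged = []
    for path in precreated:
        parts = PurePosixPath(path).parts
        if len(parts) > 1 and parts[0] in phase_dirs:
            flagged.append(path)
    return flagged


if __name__ == "__main__":
    # Hypothetical setup-script output.
    print(nested_precreated(["mdtest-easy", "mdtest-easy/rank0", "ior-hard"]))
```

Paths outside the four phase directories are ignored here because the quoted rule does not mention them.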
Below my signature, I am including my standard disclaimer that my
email is not necessarily an official IO500 position but note that the
rules page itself is. :)
Hope this is clear; please do reply with any questions or need for
further clarification,
Thanks,
John(*)
* These statements merely reflect my own personal view; the only
mechanism for announcing official IO500 policies and decisions is the
committee@io500.org email address.
On Wed, May 27, 2020 at 5:14 PM John Bent
<johnbent@gmail.com> wrote:
Hey Mark,
Thanks for the interest. It will be great to get your
contributions!
1. Must be exactly 300 seconds.
2. Does not include the directories. Other historical submissions
have tuned the directories exactly as you describe.
3. Yes, 10+ metal nodes in AWS satisfies this requirement.
Other committee members, and community members, please chime in if
I got anything wrong! Mark, you might note the disclaimer below
my signature which is just our committee's way of being careful.
I'll make sure to discuss this email with the rest of the
committee and will let you know if any of my answers need official
clarification.
Thanks,
John(*)
* These statements merely reflect my own personal view; the only
mechanism for announcing official IO500 policies and decisions is
the committee@io500.org email address.
On Wed, May 27, 2020 at 4:44 PM Mark Nelson via IO-500
<io-500@vi4io.org> wrote:
Hi Folks,
We are thinking about throwing together some cephfs io500 results for
ISC20 and I just wanted to make sure that we are doing the right thing
in a couple of cases. Any help would be much appreciated since we've
never submitted results before. We might have a couple of additional
questions later on, but for now:
1) "All create/write phases must run for at least 300 seconds; the
stonewall flag must be set to 300 which should ensure this."
Is it acceptable to set the stonewall higher than 300, or is a setting
of exactly 300 required?
2) "The file names for the mdtest output files may not be pre-created."
Does this also include the directories? We have the ability to pin
directories to specific MDSes that helps in the easy tests. We also have
an experimental feature that more or less does this pseudo-randomly
behind the scenes so long as a top level xattr is set, but it would be
convenient if we could just pre-create the mdtest directories and set
the xattr to pin them individually in the "directory setup" phase of the
test if allowed. Likewise, we have code that allows users to provide a
hint if a specific directory is expected to have lots of files which can
improve performance in the hard tests. I would like to pre-create the
mdtest directory so that we can set the xattr informing ceph that we
expect a lot of files to be written in that directory.
3) "Only submissions using at least 10 physical client nodes are
eligible to win IO500 awards and at least one benchmark process must run
on each."
We are planning on running on AWS. So long as we are using 10+ metal
nodes does that meet the requirement to have "at least 10 physical
client nodes"?
Thanks,
Mark
_______________________________________________
IO-500 mailing list
IO-500@vi4io.org
https://urldefense.com/v3/__https://www.vi4io.org/mailman/listinfo/io-500...
--
Dr. Julian Kunkel
Lecturer, Department of Computer Science
+44 (0) 118 378 8218
https://urldefense.com/v3/__http://www.cs.reading.ac.uk/__;!!Eh6p8Q!Rczxa...
PGP Fingerprint: 1468 1A86 A908 D77E B40F 45D6 2B15 73A5 9D39 A28E