Thanks to everyone for the great discussion. From re-reading everyone's
comments, here is what I gleaned:
1. I think I derailed the conversation with my 1PB suggestion. It was
simply a capacity that I felt couldn't reasonably be stored in a single
node. So if the concern is shared memory on a single node, then that was
just one way to get around the issue without dictating architecture. But it
sounds like it was a poor suggestion :)
2. It sounds like everyone has different ideas for why the 10 node
challenge exists (e.g., let small systems submit, evaluate smaller apps on
large storage systems, encourage participation). While this seems ok (and
it is possibly even desirable to have so many reasons for a benchmark's
existence), the actual results seem less useful than the hero runs: what
guidance do they provide to our users? Some entries have huge backends,
some have a few TBs and a few storage servers... so it's really varied. I
guess it's 'reader of list beware' (as always seems to be the case).
3. I like the thread limit suggestion: it keeps things hard for many file
systems and is probably a little more realistic about how real jobs run on
physical h/w and utilize storage (as Steve stated). But it's not clear to
me that the 10 node challenge is primarily focused on the real world
anyway, given the myriad of reasons for its existence.
Finally, I do still think it would be useful for the committee/community to
come up with a more singular focus for the 10 node challenge rather than
the grab bag that exists today.
Thanks to everyone again. Beyond fixing FS perf bugs, the io500 also seems
like a great way for the community to discuss real issues.
On Fri, Oct 4, 2019 at 3:18 PM Carlile, Ken <carlilek(a)janelia.hhmi.org>
wrote:
+++
The 1 PB limit barring all-flash systems is exactly what I had in mind.
While I do have a few all flash systems (or ones that can convincingly make
the argument for the purposes of io500), arguably my fastest is 500T. I
suspect I have a faster one that’s only 100T, but I’ll never have a chance
to test that one...
--Ken
Sent from my <advertising redacted>
> On Oct 4, 2019, at 5:55 PM, Andreas Dilger <adilger(a)dilger.ca> wrote:
>
> IMHO, the "10 physical nodes" requirement makes sense from the point of view
> previously stated, that running 10 virtual hosts on the same physical machine
> can dramatically skew the results since they can share the same memory and
> do virtually no actual network traffic, bypassing the "read your neighbour"
> requirement.
>
> One of the main motivators for IO-500 is to explore filesystem scaling in real
> clusters, and if we allow 10 virtual nodes, why not devolve into 10 containers
> in the same OS instance, or 10 mountpoints on a single node with a local server?
> I think that is bypassing the intent of the benchmark completely.
>
> As for why the 10-node challenge exists, IMHO there are two motivations for this:
> - see storage performance with a limited number of clients so that users/admins
> can get a realistic sense of how IO performance will scale based on the number
> of clients (i.e. assuming benchmark numbers are limited by protocol and network
> performance), while the hero numbers are based on the number of servers (i.e.
> assuming benchmark numbers are limited by aggregate storage bandwidth and IOPS).
> - let's be realistic: since the list doesn't yet have 500 submissions, let alone
> 500 top systems, this is a reasonable way to improve audience participation
> that actually provides some useful metrics.
>
> In particular I like the idea of comparing the 10-node result with the N-node
> result to see just how much of the storage bandwidth can be driven by a small
> number of clients. In theory, the 10-node case could saturate the network
> bandwidth (assuming storage bandwidth > 10x network bandwidth), but in practice
> this is not always the case, and the difference shows areas that could improve
> CPU or protocol or server efficiency. I think the IO-500 has already driven
> real-world improvements (c.f. Cambridge) that have improved the lives of users.
> I think that 10-node results will continue to be useful to submit even after
> there are more than 500 larger results available on the list.
>
> As for the 1PB minimum, I think that would drive down participation, especially
> in the (IMHO important) flash storage arena, since that can be cost-prohibitive
> today. I think the list will naturally fill out over time, with new and large
> systems coming online submitting results during their acceptance phase, pushing
> the 10-node results from the top spots, and eventually from the IO-500 list
> entirely. In the meantime, I don't see a need to refuse valid results while
> there is still a lot of room on the list that needs to be filled.
>
> Cheers, Andreas
>
>
>> On Oct 4, 2019, at 10:46 AM, Carlile, Ken via IO-500 <io-500(a)vi4io.org> wrote:
>>
>> I think the 1PB is a non-starter. Why exclude the small guys?
>>
>> What confuses me is the statement that it's ok to run multiple VMs as long as the iron count is 10. Or am I misreading that?
>>
>> 10 clients makes sense to me because certain places simply don't HAVE that many clients to throw at the benchmark, and it normalizes the speeds across a standard number of clients.
>>
>> --Ken
>>
>>>> On Oct 4, 2019, at 12:43 PM, Dean Hildebrand via IO-500 <io-500(a)vi4io.org> wrote:
>>>
>>> Julian, thanks for the examples.
>>>
>>> I think what you may be getting at is that the 10 client challenge is really about, "Given a large storage system that submits a result to the standard io500, how well does it do with only 10 clients?"
>>>
>>> If this is the case, and we don't want to encourage the submission of small non-scalable storage systems, then maybe there are other ways to achieve it, such as:
>>> - A submission to the 10 client challenge is only valid if a submission is also made to the standard io500 list. Users can then look at both rankings to get an understanding of the system.
>>> - Each submission must have at least 1PB of storage capacity, which will increase by 10% each year.
>>>
>>> Just rough ideas, but maybe we need to clarify why an io500 list cares about 10 clients?
>>> Dean
>>>
>>>
>>> On 10/3/19 1:39 AM, Julian Kunkel wrote:
>>>> Hi,
>>>> IMHO: A simple way of seeing this matter for the 10 node challenge is
>>>> that it really should be about 10 nodes with interconnects to
>>>> normalize results to some extent. Such runs can be seen in a real
>>>> configuration.
>>>> However, deploying 10 VMs on a single host and seeing a performance
>>>> gain vs. running directly on the host seems to be artificial.
>>>>
>>>> Regarding cheating: theoretically one could run 10 VMs on one big
>>>> node, where the host slows down the creation rates to a limit such that
>>>> all data is available in a big cache (say NVDIMMs) from the
>>>> perspective of the host (and thus the VMs). Every read would then be
>>>> cached.
>>>>
>>>> Here is a rather artificial example (if you have more appropriate
>>>> numbers, use them):
>>>>
>>>> For IOR BW assume
>>>> * writes 5 GiB/s to NVDIMMs (throttled) => 1.5 * 2 TB space needed / doable.
>>>> * read 500 GiB/s.
>>>> => (5*5*500*500)^0.25 = 50 score
>>>> Not an issue so far.
>>>>
>>>> For MD, 10 million IOPS for create and 100 million for any
>>>> read/delete and find (i.e. 10000 and 100000 in the kIOPS units the
>>>> score uses) would give
>>>> (10000*10000*100000*100000*100000*100000*100000*100000)^(1/8)
>>>> => 56234.13
>>>>
>>>> Total score: sqrt(56234*50) = 1676.812
>>>>
>>>> Yes, it is a synthetic example, but there could be technology out there
>>>> that generates such numbers, or people may create an IOR backend to
>>>> exploit such a setup.
>>>> You could also use two nodes, and then only 1/5th of the data needs to be
>>>> transferred over the network (as the IO500 does rank-shifting), which
>>>> would also lead to an artificially inflated number.
>>>>
>>>> Personally I would be interested in such gaming results, you can
>>>> always submit such numbers to the full list as synthetic "upper
>>>> bounds".
>>>>
>>>> Best,
>>>> Julian
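For reference, the arithmetic in Julian's example can be reproduced with a short script. This is only a sketch of the scoring scheme as it appears in his formulas (geometric mean of the four bandwidth phases in GiB/s and of the eight metadata phases in kIOPS, then the geometric mean of the two sub-scores); the phase counts and units are taken from his numbers, not from the benchmark code itself:

```python
from math import prod

def geo_mean(values):
    """Geometric mean of a list of positive numbers."""
    return prod(values) ** (1.0 / len(values))

# Bandwidth phases in GiB/s: two write phases throttled to 5 GiB/s,
# two read phases served from cache at 500 GiB/s.
bw_score = geo_mean([5, 5, 500, 500])              # ~50

# Metadata phases in kIOPS: 10 M IOPS (= 10000 kIOPS) for the two
# create phases, 100 M IOPS (= 100000 kIOPS) for the other six.
md_score = geo_mean([10000] * 2 + [100000] * 6)    # ~56234.13

# Overall score: geometric mean of the two sub-scores.
total = (bw_score * md_score) ** 0.5               # ~1676.81
print(round(bw_score, 2), round(md_score, 2), round(total, 2))
```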
>>>>
>>>> On Wed, Oct 2, 2019 at 10:02 PM Dean Hildebrand via IO-500
>>>> <io-500(a)vi4io.org> wrote:
>>>>> As a cloud provider, this rule isn't too onerous as there is always a way to get dedicated machines through sole tenant offerings and simply using large VMs (although it is a waste of $$ to use clients that have 60+ cores just to run a single benchmark process).
>>>>>
>>>>> I'm more curious about the thinking here; can someone from the committee provide some background? This is one of those funny and rare cases where we are worried about someone with fewer resources having an advantage over someone with more resources. If a system with 1 or 2 clients can beat 10... isn't that one measure of success from an HPC point of view?
>>>>>
>>>>> Dean
>>>>>
>>>>> On 9/30/19 9:10 AM, John Bent via IO-500 wrote:
>>>>>
>>>>>> To IO500 Community,
>>>>>>
>>>>>>
>>>>>> The committee has received some queries about the rules concerning virtual machines for the 10 Node Challenge. As such, the committee has added the following rule:
>>>>>>
>>>>>>
>>>>>> 13. For the 10 Node Challenge, there must be exactly 10 physical nodes for client processes, and at least one benchmark process must run on each.
>>>>>>
>>>>>> Virtual machines can be used, but the above rule must be followed. More than one virtual machine can be run on each physical node.
>>>>>>
>>>>>>
>>>>>> Although we recognize that this may disadvantage cloud architectures, we do want to stress that this rule only applies to the 10 Node Challenge. The committee did feel it was important to add this rule to ensure that the 10 Node Challenge sublist offers the maximum potential for fair comparisons by ensuring equivalent client hardware quantities. Submissions with any number/combination of virtual and physical machines can of course always be submitted to the full list.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thank you,
>>>>>>
>>>>>>
>>>>>> The IO500 Committee
>>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> IO-500 mailing list
>>>>> IO-500(a)vi4io.org
>>>>>
>>>>> https://www.vi4io.org/mailman/listinfo/io-500
>>>>>
>>>>
>>>>
>>>
>>