Hi all,


I just wanted to add a note before the holidays make everyone disappear for a week or two.


I think we have settled on the following:


1. Anything that can be treated as “storage” should be benchmarked separately. This will give us some idea of how things like SCR will perform. It will also tell us the worst-case performance when data is not available in the fast tier, or when fast-tier capacity is exceeded and data migration/retargeting is required. If data migration is required, the simultaneous drain/fill should also be measured (a rough sketch follows the list below).

2. We need to settle on benchmarks for traditional HPC workloads, such as engineering codes and bulk synchronous simulations with distributed but dependent data sets. We also need to decide what benchmarks we want to support for all phases of data movement/staging in other workloads, such as bio/genomics, chemistry, or data analytics. Data distribution and read performance are both important. In the case of flash, erasing the data left by a previous application needs to be included unless there is some guarantee that it won’t be an issue (doubtful).
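
To make the drain/fill point in item 1 concrete, here is a rough sketch of what I have in mind. It is only a sketch in Python: the mount points, file count, and file size are placeholders rather than anything we have agreed on, and a real run would use whatever I/O benchmark we settle on instead of plain file copies.

import os
import shutil
import threading
import time

# Placeholder tier mount points and sizes; adjust to the real tiers.
FAST_TIER = "/mnt/fast"
CAPACITY_TIER = "/mnt/capacity"
FILE_SIZE = 64 * 1024 * 1024   # 64 MiB per file
NUM_FILES = 8

def make_files(directory, prefix):
    # Pre-stage NUM_FILES files of FILE_SIZE bytes in the given tier.
    paths = []
    for i in range(NUM_FILES):
        path = os.path.join(directory, f"{prefix}_{i}.dat")
        with open(path, "wb") as f:
            f.write(os.urandom(FILE_SIZE))
        paths.append(path)
    return paths

def copy_files(paths, dest, results, key):
    # Copy one set of files to the other tier and record aggregate MiB/s.
    start = time.time()
    for p in paths:
        shutil.copy(p, os.path.join(dest, os.path.basename(p)))
    results[key] = (NUM_FILES * FILE_SIZE) / (time.time() - start) / 2**20

drain_set = make_files(FAST_TIER, "drain")    # data to push off the fast tier
fill_set = make_files(CAPACITY_TIER, "fill")  # data to pull onto the fast tier

results = {}
drain = threading.Thread(target=copy_files,
                         args=(drain_set, CAPACITY_TIER, results, "drain"))
fill = threading.Thread(target=copy_files,
                        args=(fill_set, FAST_TIER, results, "fill"))

# Run the drain and the fill at the same time so we see how they interfere.
drain.start(); fill.start()
drain.join(); fill.join()

print(f"drain: {results['drain']:.1f} MiB/s, fill: {results['fill']:.1f} MiB/s")

The only part of the sketch that matters is that the drain and the fill overlap in time; the sizes, tools, and file layout are all up for discussion.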


Irene had some ideas about cloud workloads that may differ from those described above. Hopefully she can educate the rest of us on what we should include. Checking for Swift or S3 API support is a simple, but likely inadequate, first step.
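
By a simple first step I mean little more than a smoke test along these lines. Again, this is only a sketch: the endpoint and bucket names are made up, and it assumes an S3-compatible endpoint with credentials already configured in the environment and a bucket that already exists.

import uuid
import boto3

# Placeholder endpoint and bucket; assumes credentials are configured
# in the environment and the bucket already exists.
ENDPOINT = "http://object-store.example.com:9000"
BUCKET = "benchmark-smoke-test"

s3 = boto3.client("s3", endpoint_url=ENDPOINT)

key = f"probe-{uuid.uuid4()}"
payload = b"x" * (4 * 1024 * 1024)  # 4 MiB probe object

# Round-trip one small object to confirm the S3 API path works end to end.
s3.put_object(Bucket=BUCKET, Key=key, Body=payload)
returned = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
assert returned == payload, "S3 round-trip returned different data"
s3.delete_object(Bucket=BUCKET, Key=key)
print("S3 API smoke test passed")

It proves the API path exists, but it tells us nothing about the throughput or metadata behavior we actually care about, which is why I call it inadequate on its own.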


Sarp has attempted this effort previously but ran into serious issues. If he could share the specifics of what they ran into and/or what they developed before abandoning the project, that would be extremely valuable.


Best,


Jay