UM version 4.5 benchmarks

Benchmarking UM version 4.5 on different architectures

=Preamble=


 * Cluster/parallel file systems are often a bottleneck. Timings are for writing to local disk unless specified otherwise.
 * If the model is not filesystem-bound, it is often bound by MPI message latency.
 * Only the master process writes output. This can lead to load-balance issues, which hinder scaling.
 * Worst-case message latencies across a cohort of processors are what matter for scaling. The vast majority of messages are either ~100 bytes or ~1 KB in size, so latencies are reported for these key message sizes.
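
As a rough illustration of the latency point above, the sketch below uses the common latency/bandwidth cost model (time = latency + size / bandwidth) and takes the worst case across a set of ranks. The latency and bandwidth figures and the function names are illustrative assumptions, not measurements from any of the systems on this page.

```python
def message_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time for one message under the latency/bandwidth model."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def latency_fraction(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Share of the message time that is pure latency (between 0 and 1)."""
    return latency_s / message_time(size_bytes, latency_s, bandwidth_bytes_per_s)


def worst_case(latencies_s):
    """A synchronised step waits for the slowest rank, so take the maximum."""
    return max(latencies_s)


# Hypothetical figures, chosen only for illustration.
LATENCY_S = 2e-6   # assumed 2 microseconds per message
BANDWIDTH = 3e9    # assumed 3 GB/s link bandwidth

for size in (100, 1024):
    frac = latency_fraction(size, LATENCY_S, BANDWIDTH)
    print(f"{size:5d} bytes: {frac:.1%} of the transfer time is latency")
```

With parameters of this order, both key message sizes spend most of their transfer time in latency rather than bandwidth, which is why the worst-case latency is the figure to watch.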

=Emerald=


 * Intel Westmere E5649 (2.53GHz)
 * QDR InfiniBand (non-RoCE)
 * GCOMv3.1

HadCM3

=Intel SandyBridge Test System=


 * Test system: quad-socket, 8-core E5-4650L (2.60GHz); the L suffix denotes low power
 * 20MB L3 cache
 * GCOMv3.1

FAMOUS

 * The last line of this table shows a real problem scaling beyond 16 cores. Load balance?  (Latencies are much better than QDR IB.)
 * Would like to try to improve file writing performance and re-run.
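
One way to reason about the load-balance question above is an Amdahl-style model in which compute time divides across cores while the master's output time does not. The sketch below assumes that simple model. The function names are hypothetical and the timings are placeholders, not measurements from this page.

```python
def run_time(cores, compute_s, write_s):
    """Wall time when compute scales perfectly but one process writes output."""
    return compute_s / cores + write_s


def speedup(cores, compute_s, write_s):
    """Speedup relative to a single core under the same model."""
    return run_time(1, compute_s, write_s) / run_time(cores, compute_s, write_s)


# Placeholder timings in seconds, for illustration only.
COMPUTE_S = 1000.0
WRITE_S = 50.0

for cores in (1, 2, 4, 8, 16, 32):
    s = speedup(cores, COMPUTE_S, WRITE_S)
    print(f"{cores:3d} cores: speedup {s:5.2f}, efficiency {s / cores:.0%}")
```

Under this model the speedup can never exceed (compute + write) / write, however many cores are added, so reducing the write time raises the ceiling directly.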

HadCM3

=Polaris=


 * Intel E5-2670 @ 2.60GHz
 * InfiniBand: Mellanox Technologies MT27500 Family [ConnectX-3]
 * Lustre