Difference between revisions of "UM version4.5 benchmarks"

 

Revision as of 15:24, 17 December 2012

Benchmarking UM Version4.5 on different Architectures

Preamble

  • Cluster/Parallel file systems are often a bottleneck.
  • If the model is not filesystem-bound, it is often bound by MPI message latency.
  • Only the master process writes output; this can lead to load-balance issues that hinder scaling.

AMD Bulldozer

Intel Westmere

Intel SandyBridge

  • Test system: quad-socket, 8-core Intel Xeon E5-4650L (2.60 GHz; the L suffix denotes low power)
  • 20MB L3 cache


MPI message latency

                   0 bytes     128 bytes   1024 bytes
between sockets    ~0.70 us    ~1.15 us    ~2.0 us

FAMOUS

Domain decomposition    Model-years/day
4x2                     ~327
8x2                     ~450
8x4                     ~480

HadCM3

Domain decomposition    Model-years/day
8x2                     ~48
8x4                     ~65
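
To make the scaling behaviour in the FAMOUS and HadCM3 tables easier to read, the following is a minimal sketch that converts the quoted throughputs into speedup and parallel efficiency relative to the smallest decomposition listed for each model. It assumes that a decomposition written "AxB" uses A*B MPI processes; the page does not state this explicitly. The names BENCHMARKS, process_count and scaling are illustrative and are not part of the UM.

```python
# Throughput in model-years/day, copied from the FAMOUS and HadCM3 tables.
# The source values are approximate ("~"), so the results are approximate too.
BENCHMARKS = {
    "FAMOUS": {"4x2": 327, "8x2": 450, "8x4": 480},
    "HadCM3": {"8x2": 48, "8x4": 65},
}


def process_count(decomposition):
    """Assumption: a decomposition 'AxB' runs on A*B MPI processes."""
    a, b = decomposition.split("x")
    return int(a) * int(b)


def scaling(results):
    """Return (decomposition, processes, speedup, efficiency) per run.

    Speedup and efficiency are measured against the run with the fewest
    processes, not against a serial run (none is reported on this page).
    """
    runs = sorted(results.items(), key=lambda kv: process_count(kv[0]))
    base_decomp, base_rate = runs[0]
    base_procs = process_count(base_decomp)
    rows = []
    for decomp, rate in runs:
        procs = process_count(decomp)
        speedup = rate / base_rate
        efficiency = speedup / (procs / base_procs)
        rows.append((decomp, procs, speedup, efficiency))
    return rows


if __name__ == "__main__":
    for model, results in BENCHMARKS.items():
        for decomp, procs, speedup, eff in scaling(results):
            print(f"{model:7s} {decomp:4s} {procs:3d} procs  "
                  f"speedup {speedup:.2f}  efficiency {eff:.0%}")
```

Under that assumption, FAMOUS retains roughly 69% efficiency going from 8 to 16 processes and roughly 37% at 32 processes, while HadCM3 retains roughly 68% going from 16 to 32 processes. This is consistent with the preamble's point that latency and the single writing process limit scaling, although the tables alone do not show which of the two dominates.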