Fluid Simulations Accelerated With 16 Bits: Approaching 4x Speedup on A64FX by Squeezing ShallowWaters.jl Into Float16.
Klöwer, Milan; Hatfield, Sam; Croci, Matteo; Düben, Peter D; Palmer, Tim N.
Affiliation
  • Klöwer M; Atmospheric, Oceanic and Planetary Physics, University of Oxford, Oxford, UK.
  • Hatfield S; European Centre for Medium-Range Weather Forecasts, Reading, UK.
  • Croci M; Mathematical Institute, University of Oxford, Oxford, UK.
  • Düben PD; European Centre for Medium-Range Weather Forecasts, Reading, UK.
  • Palmer TN; Atmospheric, Oceanic and Planetary Physics, University of Oxford, Oxford, UK.
J Adv Model Earth Syst ; 14(2): e2021MS002684, 2022 Feb.
Article in English | MEDLINE | ID: mdl-35866041
Most Earth-system simulations run on conventional central processing units using 64-bit double-precision floating-point numbers (Float64), although the need for high-precision calculations in the presence of large uncertainties has been questioned. Fugaku, currently the world's fastest supercomputer, is based on A64FX microprocessors, which also support the 16-bit low-precision format Float16. We investigate the Float16 performance on A64FX with ShallowWaters.jl, the first fluid circulation model that runs entirely with 16-bit arithmetic. The model implements techniques that address precision and dynamic range issues in 16 bits. The precision-critical time integration is augmented to include compensated summation to minimize rounding errors. Such a compensated time integration is as precise as, but faster than, mixed precision with 16-bit and 32-bit floats. Because subnormals are inefficiently supported on A64FX, the very limited range available in Float16 is 6 × 10⁻⁵ to 65,504. We develop the analysis-number format Sherlogs.jl to log the arithmetic results during the simulation. The equations in ShallowWaters.jl are then systematically rescaled to fit into Float16, using 97% of the available representable numbers. Consequently, we benchmark speedups of up to 3.8x on A64FX with Float16. With a compensated time integration added, speedups reach up to 3.6x. Although ShallowWaters.jl is simplified compared to large Earth-system models, it shares essential algorithms and therefore shows that 16-bit calculations are indeed a competitive way to accelerate Earth-system simulations on available hardware.
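
The role of compensated summation in a 16-bit time integration can be illustrated with a minimal sketch. It is not the code of ShallowWaters.jl, which is written in Julia. It assumes classic Kahan summation, which may differ in detail from the scheme used in the paper, and a constant tendency `du` per time step. The helper `f16` and both `integrate_*` functions are hypothetical names introduced here. Float16 is emulated with the half-precision format code `"e"` of Python's standard `struct` module.

```python
import struct


def f16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def integrate_naive(u0: float, du: float, nsteps: int) -> float:
    """Add the increment du to u once per step, rounding to Float16 each time."""
    u, du = f16(u0), f16(du)
    for _ in range(nsteps):
        u = f16(u + du)
    return u


def integrate_compensated(u0: float, du: float, nsteps: int) -> float:
    """Same update with Kahan compensation, every operation rounded to Float16."""
    u, du = f16(u0), f16(du)
    c = 0.0  # running estimate of the rounding error of previous additions
    for _ in range(nsteps):
        y = f16(du - c)          # increment corrected by the stored error
        t = f16(u + y)           # low-order bits of y may be lost here
        c = f16(f16(t - u) - y)  # recover what was lost (or gained) in t
        u = t
    return u


if __name__ == "__main__":
    # Near 1.0 the Float16 spacing is 2**-10 (about 0.00098), so an increment
    # of 1e-4 is rounded away by the naive update on every single step.
    print(integrate_naive(1.0, 1e-4, 1000))
    print(integrate_compensated(1.0, 1e-4, 1000))
```

In this setup the naive update never leaves its initial value, because each increment is smaller than half the spacing between neighbouring Float16 numbers. The compensated version stores the lost part in `c` and feeds it back in later steps, so the result stays close to the exact sum of about 1.1 while all arithmetic remains in 16 bits.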

Full text: 1 Collection: 01-international Database: MEDLINE Language: En Journal: J Adv Model Earth Syst Year: 2022 Document type: Article Country of publication: United States
