Speaker
Prof.
David HILL
(Blaise Pascal University)
Description
For years, numerical reproducibility was the rule, thanks to single-core CPUs, numerical computing standards (such as IEEE 754 for floating-point arithmetic), and hardware features such as ECC memory. It is a key requirement for computational science and for the experimental scientific method. Being able to obtain exactly the same results from run to run, when the environment and parameters are the same, is essential for debugging. However, hardware developments over the past decade have made it almost impossible to ensure computational reproducibility on high-performance systems without a significant loss of performance. Not being able to debug a program for some scientific cases means we have lost an essential feature of our Turing machines. While top scientists are aware of the importance of numerical reproducibility and of the sources of non-reproducibility, many colleagues still need training to appreciate the impact of this problem on their own numerical computing. In this talk we will look at the causes of this loss of reproducibility. We will start with CPUs that use out-of-order execution to improve performance, then examine so-called soft errors on large computing systems. Finally, we will present some methods for avoiding reproducibility losses in stochastic simulations.
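As a small illustration of why evaluation order and random-number handling matter, here is a minimal Python sketch. It is not taken from the talk: the function name simulate, the seed values, and the choice of examples are assumptions made for illustration only. It shows that IEEE 754 addition is not associative, so any change in the order of operations can change the result, and that giving a stochastic run its own explicitly seeded generator makes it repeatable.

```python
import random

# Floating-point addition is not associative: the same three numbers,
# grouped differently, round to different binary64 results.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
order_matters = left != right

def simulate(seed, n=5):
    """Hypothetical stand-in for a stochastic simulation run."""
    # A private, explicitly seeded generator avoids hidden global state,
    # so the same seed always yields the same stream of draws.
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

same_seed_repeats = simulate(42) == simulate(42)
different_seeds_differ = simulate(1) != simulate(2)
```

On a parallel machine the grouping of a sum may vary from run to run, which is one way identical inputs can produce different outputs. Recording the seed alongside the results is a simple way to make a stochastic run replayable.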