Arteaga, A., Fuhrer, O., & Hoefler, T. (2014). Designing Bit-Reproducible Portable High-Performance Applications. 2014 IEEE 28th International Parallel and Distributed Processing Symposium. doi:10.1109/ipdps.2014.127
Bientinesi, P., Quintana-Ortí, E. S., & Geijn, R. A. van de. (2005). Representing linear algebra algorithms in code: the FLAME application program interfaces. ACM Transactions on Mathematical Software, 31(1), 27-59. doi:10.1145/1055531.1055533
Chohra, C., Langlois, P., & Parello, D. (2016). Efficiency of Reproducible Level 1 BLAS. Lecture Notes in Computer Science, 99-108. doi:10.1007/978-3-319-31769-4_8
[+]
Arteaga, A., Fuhrer, O., & Hoefler, T. (2014). Designing Bit-Reproducible Portable High-Performance Applications. 2014 IEEE 28th International Parallel and Distributed Processing Symposium. doi:10.1109/ipdps.2014.127
Bientinesi, P., Quintana-Ortí, E. S., & Geijn, R. A. van de. (2005). Representing linear algebra algorithms in code: the FLAME application program interfaces. ACM Transactions on Mathematical Software, 31(1), 27-59. doi:10.1145/1055531.1055533
Chohra, C., Langlois, P., & Parello, D. (2016). Efficiency of Reproducible Level 1 BLAS. Lecture Notes in Computer Science, 99-108. doi:10.1007/978-3-319-31769-4_8
Collange, S., Defour, D., Graillat, S., & Iakymchuk, R. (2015). Numerical reproducibility for the parallel reduction on multi- and many-core architectures. Parallel Computing, 49, 83-97. doi:10.1016/j.parco.2015.09.001
Demmel, J., & Hong Diep Nguyen. (2013). Fast Reproducible Floating-Point Summation. 2013 IEEE 21st Symposium on Computer Arithmetic. doi:10.1109/arith.2013.9
Demmel, J., & Nguyen, H. D. (2015). Parallel Reproducible Summation. IEEE Transactions on Computers, 64(7), 2060-2070. doi:10.1109/tc.2014.2345391
Dongarra, J. J., Du Croz, J., Hammarling, S., & Duff, I. S. (1990). A set of level 3 basic linear algebra subprograms. ACM Transactions on Mathematical Software, 16(1), 1-17. doi:10.1145/77626.79170
Dongarra, J., Hittinger, J., Bell, J., Chacon, L., Falgout, R., Heroux, M., … Wild, S. (2014). Applied Mathematics Research for Exascale Computing. doi:10.2172/1149042
Fousse, L., Hanrot, G., Lefèvre, V., Pélissier, P., & Zimmermann, P. (2007). MPFR. ACM Transactions on Mathematical Software, 33(2), 13. doi:10.1145/1236463.1236468
Haidar, A., Dong, T., Luszczek, P., Tomov, S., & Dongarra, J. (2015). Batched matrix computations on hardware accelerators based on GPUs. The International Journal of High Performance Computing Applications, 29(2), 193-208. doi:10.1177/1094342014567546
Hida, Y., Li, X. S., & Bailey, D. H. (s. f.). Algorithms for quad-double precision floating point arithmetic. Proceedings 15th IEEE Symposium on Computer Arithmetic. ARITH-15 2001. doi:10.1109/arith.2001.930115
Higham, N. J. (2002). Accuracy and Stability of Numerical Algorithms. doi:10.1137/1.9780898718027
Iakymchuk, R., Defour, D., Collange, S., & Graillat, S. (2015). Reproducible Triangular Solvers for High-Performance Computing. 2015 12th International Conference on Information Technology - New Generations. doi:10.1109/itng.2015.63
Iakymchuk, R., Defour, D., Collange, S., & Graillat, S. (2016). Reproducible and Accurate Matrix Multiplication. Lecture Notes in Computer Science, 126-137. doi:10.1007/978-3-319-31769-4_11
Kulisch, U., & Snyder, V. (2010). The exact dot product as basic tool for long interval arithmetic. Computing, 91(3), 307-313. doi:10.1007/s00607-010-0127-7
Li, X. S., Demmel, J. W., Bailey, D. H., Henry, G., Hida, Y., Iskandar, J., … Yoo, D. J. (2002). Design, implementation and testing of extended and mixed precision BLAS. ACM Transactions on Mathematical Software, 28(2), 152-205. doi:10.1145/567806.567808
Muller, J.-M., Brisebarre, N., de Dinechin, F., Jeannerod, C.-P., Lefèvre, V., Melquiond, G., … Torres, S. (2010). Handbook of Floating-Point Arithmetic. doi:10.1007/978-0-8176-4705-6
Ogita, T., Rump, S. M., & Oishi, S. (2005). Accurate Sum and Dot Product. SIAM Journal on Scientific Computing, 26(6), 1955-1988. doi:10.1137/030601818
Ortega, J. . (1988). The ijk forms of factorization methods I. Vector computers. Parallel Computing, 7(2), 135-147. doi:10.1016/0167-8191(88)90035-x
Rump, S. M. (2009). Ultimately Fast Accurate Summation. SIAM Journal on Scientific Computing, 31(5), 3466-3502. doi:10.1137/080738490
Skeel, R. D. (1979). Scaling for Numerical Stability in Gaussian Elimination. Journal of the ACM, 26(3), 494-526. doi:10.1145/322139.322148
Zhu, Y.-K., & Hayes, W. B. (2010). Algorithm 908. ACM Transactions on Mathematical Software, 37(3), 1-13. doi:10.1145/1824801.1824815
[-]