Deng L et al (2013) Recent advances in deep learning for speech research at Microsoft. In: 2013 IEEE international conference on acoustics, speech and signal processing, May, pp 8604–8608
Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th international conference on neural information processing systems—vol 1, ser. NIPS’12. Curran Associates Inc., USA, pp 1097–1105
Zhang J, Zong C (2015) Deep neural networks in machine translation: an overview. IEEE Intell Syst 30(5):16–25
Devlin J et al (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol 1, pp 4171–4186
Sze V et al (2017) Efficient processing of deep neural networks: a tutorial and survey. Proc IEEE 105(12):2295–2329
Vaswani A et al (2017) Attention is all you need. In: Advances in neural information processing systems, vol 30, pp 5998–6008
Chellapilla K, Puri S, Simard P (2006) High performance convolutional neural networks for document processing. In: International workshop on frontiers in handwriting recognition, available as INRIA-00112631 report from https://hal.inria.fr/inria-00112631
Van Zee FG, van de Geijn RA (2015) BLIS: a framework for rapidly instantiating BLAS functionality. ACM Trans Math Softw 41(3):14:1–14:33
Dongarra JJ, Du Croz J, Hammarling S, Duff I (1990) A set of level 3 basic linear algebra subprograms. ACM Trans Math Softw 16(1):1–17
Goto K, van de Geijn R (2008) Anatomy of high-performance matrix multiplication. ACM Trans Math Softw 34(3):12:1–12:25
Low TM, Igual FD, Smith TM, Quintana-Ortí ES (2016) Analytical modeling is enough for high-performance BLIS. ACM Trans Math Softw 43(2):1–18. https://doi.org/10.1145/2925987
Fabeiro JF, Andrade D, Fraguela BB (2016) Writing a performance-portable matrix multiplication. Parallel Comput 52:65–77
Van Zee FG, Smith TM, Marker B, Low TM, van de Geijn RA, Igual FD, Smelyanskiy M, Zhang X, Kistler M, Austel V, Gunnels JA, Killough L (2016) The BLIS framework: experiments in portability. ACM Trans Math Softw 42(2):1–19. https://doi.org/10.1145/2755561
Smith TM, van de Geijn R, Smelyanskiy M, Hammond JR, Van Zee FG (2014) Anatomy of high-performance many-threaded matrix multiplication. In: IPDPS ’14: Proceedings of the international parallel and distributed processing symposium (to appear)
Catalán S et al (2016) Architecture-aware configuration and scheduling of matrix multiplication on asymmetric multicore processors. Cluster Comput 19(3):1037–1051
Hennessy JL, Patterson DA (2003) Computer architecture: a quantitative approach. Morgan Kaufmann Pub, San Francisco
San Juan P, Castelló PS, Dolz MF, Alonso-Jordá P, Quintana-Ortí ES (2020) High performance and portable convolution operators for multicore processors. In: Proceedings of the 32nd international symposium on computer architecture and high performance computing (SBAC-PAD), pp 91–98
BLIS Performance benchmarks (2020). https://github.com/flame/blis/blob/master/docs/Performance.md