dc.contributor.author | Ramírez-Betancourth, Cristian | es_ES |
dc.contributor.author | Castelló, Adrián | es_ES |
dc.contributor.author | Quintana-Ortí, Enrique S. | es_ES |
dc.date.accessioned | 2023-07-21T18:04:35Z | |
dc.date.available | 2023-07-21T18:04:35Z | |
dc.date.issued | 2022-11 | es_ES |
dc.identifier.issn | 0920-8542 | es_ES |
dc.identifier.uri | http://hdl.handle.net/10251/195322 | |
dc.description.abstract | [EN] We address the efficient realization of matrix multiplication (gemm), with application in the convolution operator for machine learning, for the RISC-V core present in the GreenWaves GAP8 processor. Our approach leverages BLIS (Basic Linear Algebra Instantiation Software) to develop an implementation that (1) re-organizes the gemm algorithm adapting its micro-kernel to exploit the hardware-supported dot product kernel in the GAP8; (2) explicitly orchestrates the data transfers across the hierarchy of scratchpad memories via DMA (direct memory access); and (3) operates with integer arithmetic. | es_ES |
dc.description.sponsorship | This work was supported by the research project PID2020-113656RB-C22 of MCIN/AEI/10.13039/501100011033. C. Ramirez is a "Santiago Grisolia" fellow supported by Generalitat Valenciana. Adrian Castello is a FJC2019-039222-I fellow supported by MCIN/AEI/10.13039/501100011033. This project has received funding from the European High-Performance Computing Joint Undertaking (JU) under Grant Agreement No. 955558. The JU receives support from the European Union's Horizon 2020 research and innovation program, and Spain, Germany, France, Italy, Poland, Switzerland, Norway. | es_ES |
dc.language | English | es_ES |
dc.publisher | Springer-Verlag | es_ES |
dc.relation.ispartof | The Journal of Supercomputing | es_ES |
dc.rights | Attribution (by) | es_ES |
dc.subject | Matrix multiplication | es_ES |
dc.subject | High performance | es_ES |
dc.subject | RISC-V GAP8 | es_ES |
dc.subject.classification | COMPUTER ARCHITECTURE AND TECHNOLOGY | es_ES |
dc.title | A BLIS-like matrix multiplication for machine learning in the RISC-V ISA-based GAP8 processor | es_ES |
dc.type | Article | es_ES |
dc.identifier.doi | 10.1007/s11227-022-04581-6 | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113656RB-C22/ES/COMPUTACION Y COMUNICACIONES DE ALTAS PRESTACIONES CONSCIENTES DEL CONSUMO ENERGETICO. APLICACIONES AL APRENDIZAJE PROFUNDO COMPUTACIONAL - UPV/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/AGENCIA ESTATAL DE INVESTIGACION//FJC2019-039222-I//AYUDA JUAN DE LA CIERVA FORMACION-CASTELLO GIMENO, ADRIAN/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/EC/H2020/955558/EU | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/GENERALITAT VALENCIANA//GRISOLIAP%2F2020%2F086//AYUDA SANTIAGO GRISOLIA: COMPUTACION DE ALTAS PRESTACIONES CONSCIENTE DEL CONSUMO PARA REDES NEURONALES PROFUNDAS./ | es_ES |
dc.rights.accessRights | Open | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors | es_ES |
dc.description.bibliographicCitation | Ramírez-Betancourth, C.; Castelló, A.; Quintana-Ortí, ES. (2022). A BLIS-like matrix multiplication for machine learning in the RISC-V ISA-based GAP8 processor. The Journal of Supercomputing. 78(16):18051-18060. https://doi.org/10.1007/s11227-022-04581-6 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | https://doi.org/10.1007/s11227-022-04581-6 | es_ES |
dc.description.upvformatpinicio | 18051 | es_ES |
dc.description.upvformatpfin | 18060 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.description.volume | 78 | es_ES |
dc.description.issue | 16 | es_ES |
dc.relation.pasarela | S\468501 | es_ES |
dc.contributor.funder | European Commission | es_ES |
dc.contributor.funder | GENERALITAT VALENCIANA | es_ES |
dc.contributor.funder | AGENCIA ESTATAL DE INVESTIGACION | es_ES |
dc.description.references | Hazelwood K et al (2018) Applied machine learning at Facebook: a datacenter infrastructure perspective. In: IEEE International Symposium on High Performance Computer Architecture, pp 620–629 | es_ES |
dc.description.references | Park J et al (2018) Deep learning inference in Facebook data centers: characterization, performance optimizations and hardware implications. arXiv:1811.09886 | es_ES |
dc.description.references | Wu C et al (2019) Machine learning at Facebook: understanding inference at the edge. In: International Symposium on High Performance Computer Architecture, pp 331–344 | es_ES |
dc.description.references | Yi S, Li C, Li Q (2015) A survey of fog computing: concepts, applications and issues. In: Proceedings of the 2015 Workshop on Mobile Big Data, ser. Mobidata’15, pp 37–42 | es_ES |
dc.description.references | Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS), vol 1, pp 1097–1105 | es_ES |
dc.description.references | Pouyanfar S et al (2018) A survey on deep learning: algorithms, techniques, and applications. ACM Comput Surv 51(5):92:1-92:36 | es_ES |
dc.description.references | Sze V et al (2017) Efficient processing of deep neural networks: a tutorial and survey. Proc IEEE 105(12):2295–2329 | es_ES |
dc.description.references | Chellapilla K, Puri S, Simard P (2006) High performance convolutional neural networks for document processing. In: Tenth International Workshop on Frontiers in Handwriting Recognition | es_ES |
dc.description.references | Georganas E et al (2018) Anatomy of high-performance deep learning convolutions on SIMD architectures. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, ser. SC ’18. IEEE Press | es_ES |
dc.description.references | San Juan P et al (2020) High performance and portable convolution operators for multicore processors. In: Proceedings of the 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp 91–98 | es_ES |
dc.description.references | Van Zee FG, van de Geijn RA (2015) BLIS: a framework for rapidly instantiating BLAS functionality. ACM Trans Math Softw 41(3):14:1-14:33 | es_ES |
dc.description.references | Flamand E et al (2018) GAP-8: A RISC-V SoC for AI at the edge of the IoT. In: IEEE 29th International Conference on Application-Specific Systems, Architectures and Processors, pp 1–4 | es_ES |
dc.description.references | Ali M et al (2012) Level-3 BLAS on the TI C6678 multi-core DSP. In: 2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing | es_ES |
dc.description.references | Lavin A, Gray S (2016) Fast algorithms for convolutional neural networks. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 4013–4021 | es_ES |
dc.description.references | Zlateski A, Jia Z, Li K, Durand F (2019) The anatomy of efficient FFT and Winograd convolutions on modern CPUs. In: Proceedings of the ACM International Conference on Supercomputing, ser. ICS ’19, pp 414–424 | es_ES |
dc.description.references | Low TM et al (2016) Analytical modeling is enough for high-performance BLIS. ACM Trans Math Softw 43(2) | es_ES |
dc.description.references | Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 | es_ES |
dc.description.references | Gunnels JA, Gustavson FG, Henry GM, van de Geijn RA (2004) A family of high-performance matrix multiplication algorithms. In: Proceedings of the 7th International Conference on Applied Parallel Computing: State of the Art in Scientific Computing, ser. PARA’04. Berlin, Heidelberg, pp 256–265. https://doi.org/10.1007/11558958_30 | es_ES |
dc.description.references | Smith TM, van de Geijn RA (2019) The MOMMS family of matrix multiplication algorithms. CoRR, vol. abs/1904.05717. arXiv:1904.05717 | es_ES |
dc.description.references | Castelló A, Igual FD, Quintana-Ortí ES (2022) Anatomy of the BLIS family of algorithms for matrix multiplication. In: 30th Euromicro Workshop on Parallel, Distributed and Networked Processing PDP 2022, to appear | es_ES |
upv.costeAPC | 2345 | es_ES |