Mostrar el registro sencillo del ítem
dc.contributor.author | Jozsa, Csaba M | es_ES |
dc.contributor.author | Domene Oltra, Fernando | es_ES |
dc.contributor.author | Vidal Maciá, Antonio Manuel | es_ES |
dc.contributor.author | Piñero Sipán, María Gemma | es_ES |
dc.contributor.author | González Salvador, Alberto | es_ES |
dc.date.accessioned | 2015-04-28T07:56:54Z | |
dc.date.available | 2015-04-28T07:56:54Z | |
dc.date.issued | 2014-11 | |
dc.identifier.issn | 0920-8542 | |
dc.identifier.uri | http://hdl.handle.net/10251/49342 | |
dc.description | The final publication is available at Springer via http://dx.doi.org/10.1007/s11227-014-1201-2 | es_ES |
dc.description.abstract | The lattice reduction (LR) technique has become very important in many engineering fields. However, its high complexity makes difficult its use in real-time applications, especially in applications that deal with large matrices. As a solution, the modified block LLL (MB-LLL) algorithm was introduced, where several levels of parallelism were exploited: (a) fine-grained parallelism was achieved through the cost-reduced all-swap LLL (CR-AS-LLL) algorithm introduced together with the MB-LLL by Jzsa et al. (Proceedings of the tenth international symposium on wireless communication systems, 2013) and (b) coarse-grained parallelism was achieved by applying the block-reduction concept presented by Wetzel (Algorithmic number theory. Springer, New York, pp 323-337, 1998). In this paper, we present the cost-reduced MB-LLL (CR-MB-LLL) algorithm, which allows to significantly reduce the computational complexity of the MB-LLL by allowing the relaxation of the first LLL condition while executing the LR of submatrices, resulting in the delay of the Gram-Schmidt coefficients update and by using less costly procedures during the boundary checks. The effects of complexity reduction and implementation details are analyzed and discussed for several architectures. A mapping of the CR-MB-LLL on a heterogeneous platform is proposed and it is compared with implementations running on a dynamic parallelism enabled GPU and a multi-core CPU. The mapping on the architecture proposed allows a dynamic scheduling of kernels where the overhead introduced is hidden by the use of several CUDA streams. Results show that the execution time of the CR-MB-LLL algorithm on the heterogeneous platform outperforms the multi-core CPU and it is more efficient than the CR-AS-LLL algorithm in case of large matrices. | es_ES |
dc.description.sponsorship | Financial support for this study was provided by grants TAMOP-4.2.1./B-11/2/KMR-2011-0002, TAMOP-4.2.2/B-10/1-2010-0014 from the Pazmany Peter Catholic University, European Union ERDF, Spanish Government through TEC2012-38142-C04-01 project and Generalitat Valenciana through PROMETEO/2009/013 project. | en_EN |
dc.language | Inglés | es_ES |
dc.publisher | Springer Verlag (Germany) | es_ES |
dc.relation.ispartof | Journal of Supercomputing | es_ES |
dc.rights | Reserva de todos los derechos | es_ES |
dc.subject | Lattice reduction | es_ES |
dc.subject | LLL | es_ES |
dc.subject | GPU | es_ES |
dc.subject | CUDA | es_ES |
dc.subject | OpenMP | es_ES |
dc.subject.classification | CIENCIAS DE LA COMPUTACION E INTELIGENCIA ARTIFICIAL | es_ES |
dc.subject.classification | TEORIA DE LA SEÑAL Y COMUNICACIONES | es_ES |
dc.title | High performance lattice reduction on heterogeneous computing platform | es_ES |
dc.type | Artículo | es_ES |
dc.identifier.doi | 10.1007/s11227-014-1201-2 | |
dc.relation.projectID | info:eu-repo/grantAgreement/PPCU//TAMOP-4.2.1./B-11/2/KMR-2011-0002/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/PPCU//TAMOP-4.2.2/B-10/1-2010-0014/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/AEI//TEC2012-38142-C04-01/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/Generalitat Valenciana//PROMETEO09%2F2009%2F013/ES/Computacion de altas prestaciones sobre arquitecturas actuales en porblemas de procesado múltiple de señal/ | es_ES |
dc.rights.accessRights | Abierto | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Instituto Universitario de Telecomunicación y Aplicaciones Multimedia - Institut Universitari de Telecomunicacions i Aplicacions Multimèdia | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Departamento de Comunicaciones - Departament de Comunicacions | es_ES |
dc.description.bibliographicCitation | Jozsa, CM.; Domene Oltra, F.; Vidal Maciá, AM.; Piñero Sipán, MG.; González Salvador, A. (2014). High performance lattice reduction on heterogeneous computing platform. Journal of Supercomputing. 70(2):772-785. https://doi.org/10.1007/s11227-014-1201-2 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | http://dx.doi.org/10.1007/s11227-014-1201-2 | es_ES |
dc.description.upvformatpinicio | 772 | es_ES |
dc.description.upvformatpfin | 785 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.description.volume | 70 | es_ES |
dc.description.issue | 2 | es_ES |
dc.relation.senia | 279028 | |
dc.contributor.funder | Generalitat Valenciana | es_ES |
dc.contributor.funder | Agencia Estatal de Investigación | |
dc.contributor.funder | Pázmány Péter Catholic University | |
dc.description.references | Józsa CM, Domene F, Piñero G, González A, Vidal AM (2013) Efficient GPU implementation of lattice-reduction-aided multiuser precoding. In: Proceedings of the tenth international symposium on wireless communication systems (ISWCS 2013) | es_ES |
dc.description.references | Wetzel S (1998) An efficient parallel block-reduction algorithm. In: Buhler JP (ed) Algorithmic number theory. Lecture notes in computer science, vol 1423. Springer, Berlin, Heidelberg, pp 323–337 | es_ES |
dc.description.references | Wubben D, Seethaler D, Jaldén J, Matz G (2011) Lattice reduction. Signal Process Mag IEEE 28(3):70–91 | es_ES |
dc.description.references | Lenstra AK, Lenstra HW, Lovász L (1982) Factoring polynomials with rational coefficients. Math Ann 261(4):515–534 | es_ES |
dc.description.references | Bremner MR (2012) Lattice basis reduction: an introduction to the LLL algorithm and its applications. CRC Press, USA | es_ES |
dc.description.references | Wu D, Eilert J, Liu D (2008) A programmable lattice-reduction aided detector for MIMO-OFDMA. In: 4th IEEE international conference on circuits and systems for communications (ICCSC 2008), pp 293–297 | es_ES |
dc.description.references | Barbero LG, Milliner DL, Ratnarajah T, Barry JR, Cowan C (2009) Rapid prototyping of Clarkson’s lattice reduction for MIMO detection. In: IEEE international conference on communications (ICC’09), pp 1–5 | es_ES |
dc.description.references | Gestner B, Zhang W, Ma X, Anderson D (2011) Lattice reduction for MIMO detection: from theoretical analysis to hardware realization. IEEE Trans Circ Syst I Regul Pap 58(4):813–826 | es_ES |
dc.description.references | Shabany M, Youssef A, Gulak G (2013) High-throughput 0.13- $$\upmu $$ μ m CMOS lattice reduction core supporting 880 Mb/s detection. IEEE Trans Very Large Scale Integr (VLSI) Syst 21(5):848–861 | es_ES |
dc.description.references | Luo Y, Qiao S (2011) A parallel LLL algorithm. In: Proceedings of the fourth international C* conference on computer science and software engineering, pp 93–101 | es_ES |
dc.description.references | Backes W, Wetzel S (2011) Parallel lattice basis reduction—the road to many-core. In: IEEE 13th international conference on high performance computing and communications (HPCC) | es_ES |
dc.description.references | Ahmad U, Amin A, Li M, Pollin S, Van der Perre L, Catthoor F (2011) Scalable block-based parallel lattice reduction algorithm for an SDR baseband processor. In: 2011 IEEE international conference on communications (ICC) | es_ES |
dc.description.references | Villard G (1992) Parallel lattice basis reduction. In: Papers from the international symposium on symbolic and algebraic computation (ISSAC’92). ACM, New York | es_ES |
dc.description.references | Domene F, Józsa CM, Vidal AM, Piñero G, Gonzalez A (2013) Performance analysis of a parallel lattice reduction algorithm on many-core architectures. In: Proceedings of the 13th international conference on computational and mathematical methods in science and engineering | es_ES |
dc.description.references | Gestner B, Zhang W, Ma X, Anderson DV (2008) VLSI implementation of a lattice reduction algorithm for low-complexity equalization. In: 4th IEEE international conference on circuits and systems for communications (ICCSC 2008), pp 643–647 | es_ES |
dc.description.references | Burg A, Seethaler D, Matz G (2007) VLSI implementation of a lattice-reduction algorithm for multi-antenna broadcast precoding. In: IEEE international symposium on circuits and systems (ISCAS 2007), pp 673–676 | es_ES |
dc.description.references | Bruderer L, Studer C, Wenk M, Seethaler D, Burg A (2010) VLSI implementation of a low-complexity LLL lattice reduction algorithm for MIMO detection. In: Proceedings of 2010 IEEE international symposium on circuits and systems (ISCAS) | es_ES |