High performance lattice reduction on heterogeneous computing platform

RiuNet: Institutional Repository of the Universitat Politècnica de València

dc.contributor.author Jozsa, Csaba M es_ES
dc.contributor.author Domene Oltra, Fernando es_ES
dc.contributor.author Vidal Maciá, Antonio Manuel es_ES
dc.contributor.author Piñero Sipán, María Gemma es_ES
dc.contributor.author González Salvador, Alberto es_ES
dc.date.accessioned 2015-04-28T07:56:54Z
dc.date.available 2015-04-28T07:56:54Z
dc.date.issued 2014-11
dc.identifier.issn 0920-8542
dc.identifier.uri http://hdl.handle.net/10251/49342
dc.description The final publication is available at Springer via http://dx.doi.org/10.1007/s11227-014-1201-2 es_ES
dc.description.abstract The lattice reduction (LR) technique has become very important in many engineering fields. However, its high complexity makes it difficult to use in real-time applications, especially those that deal with large matrices. As a solution, the modified block LLL (MB-LLL) algorithm was introduced, where several levels of parallelism were exploited: (a) fine-grained parallelism was achieved through the cost-reduced all-swap LLL (CR-AS-LLL) algorithm introduced together with the MB-LLL by Józsa et al. (Proceedings of the tenth international symposium on wireless communication systems, 2013) and (b) coarse-grained parallelism was achieved by applying the block-reduction concept presented by Wetzel (Algorithmic number theory. Springer, New York, pp 323-337, 1998). In this paper, we present the cost-reduced MB-LLL (CR-MB-LLL) algorithm, which significantly reduces the computational complexity of the MB-LLL in two ways: by relaxing the first LLL condition while executing the LR of submatrices, which delays the update of the Gram-Schmidt coefficients, and by using less costly procedures during the boundary checks. The effects of complexity reduction and implementation details are analyzed and discussed for several architectures. A mapping of the CR-MB-LLL on a heterogeneous platform is proposed and compared with implementations running on a dynamic parallelism enabled GPU and a multi-core CPU. The mapping on the proposed architecture allows dynamic scheduling of kernels, where the overhead introduced is hidden by the use of several CUDA streams. Results show that, in terms of execution time, the CR-MB-LLL algorithm on the heterogeneous platform outperforms the multi-core CPU implementation and is more efficient than the CR-AS-LLL algorithm for large matrices. es_ES
dc.description.sponsorship Financial support for this study was provided by grants TAMOP-4.2.1./B-11/2/KMR-2011-0002, TAMOP-4.2.2/B-10/1-2010-0014 from the Pazmany Peter Catholic University, European Union ERDF, Spanish Government through TEC2012-38142-C04-01 project and Generalitat Valenciana through PROMETEO/2009/013 project. en_EN
dc.language English es_ES
dc.publisher Springer Verlag (Germany) es_ES
dc.relation.ispartof Journal of Supercomputing es_ES
dc.rights All rights reserved es_ES
dc.subject Lattice reduction es_ES
dc.subject LLL es_ES
dc.subject GPU es_ES
dc.subject CUDA es_ES
dc.subject OpenMP es_ES
dc.subject.classification COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE es_ES
dc.subject.classification SIGNAL THEORY AND COMMUNICATIONS es_ES
dc.title High performance lattice reduction on heterogeneous computing platform es_ES
dc.type Article es_ES
dc.identifier.doi 10.1007/s11227-014-1201-2
dc.relation.projectID info:eu-repo/grantAgreement/PPCU//TAMOP-4.2.1./B-11/2/KMR-2011-0002/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/PPCU//TAMOP-4.2.2/B-10/1-2010-0014/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/AEI//TEC2012-38142-C04-01/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/Generalitat Valenciana//PROMETEO09%2F2009%2F013/ES/Computacion de altas prestaciones sobre arquitecturas actuales en porblemas de procesado múltiple de señal/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Instituto Universitario de Telecomunicación y Aplicaciones Multimedia - Institut Universitari de Telecomunicacions i Aplicacions Multimèdia es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Comunicaciones - Departament de Comunicacions es_ES
dc.description.bibliographicCitation Jozsa, CM.; Domene Oltra, F.; Vidal Maciá, AM.; Piñero Sipán, MG.; González Salvador, A. (2014). High performance lattice reduction on heterogeneous computing platform. Journal of Supercomputing. 70(2):772-785. https://doi.org/10.1007/s11227-014-1201-2 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion http://dx.doi.org/10.1007/s11227-014-1201-2 es_ES
dc.description.upvformatpinicio 772 es_ES
dc.description.upvformatpfin 785 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 70 es_ES
dc.description.issue 2 es_ES
dc.relation.senia 279028
dc.contributor.funder Generalitat Valenciana es_ES
dc.contributor.funder Agencia Estatal de Investigación
dc.contributor.funder Pázmány Péter Catholic University
dc.description.references Józsa CM, Domene F, Piñero G, González A, Vidal AM (2013) Efficient GPU implementation of lattice-reduction-aided multiuser precoding. In: Proceedings of the tenth international symposium on wireless communication systems (ISWCS 2013) es_ES
dc.description.references Wetzel S (1998) An efficient parallel block-reduction algorithm. In: Buhler JP (ed) Algorithmic number theory. Lecture notes in computer science, vol 1423. Springer, Berlin, Heidelberg, pp 323–337 es_ES
dc.description.references Wübben D, Seethaler D, Jaldén J, Matz G (2011) Lattice reduction. IEEE Signal Process Mag 28(3):70–91 es_ES
dc.description.references Lenstra AK, Lenstra HW, Lovász L (1982) Factoring polynomials with rational coefficients. Math Ann 261(4):515–534 es_ES
dc.description.references Bremner MR (2012) Lattice basis reduction: an introduction to the LLL algorithm and its applications. CRC Press, USA es_ES
dc.description.references Wu D, Eilert J, Liu D (2008) A programmable lattice-reduction aided detector for MIMO-OFDMA. In: 4th IEEE international conference on circuits and systems for communications (ICCSC 2008), pp 293–297 es_ES
dc.description.references Barbero LG, Milliner DL, Ratnarajah T, Barry JR, Cowan C (2009) Rapid prototyping of Clarkson’s lattice reduction for MIMO detection. In: IEEE international conference on communications (ICC’09), pp 1–5 es_ES
dc.description.references Gestner B, Zhang W, Ma X, Anderson D (2011) Lattice reduction for MIMO detection: from theoretical analysis to hardware realization. IEEE Trans Circ Syst I Regul Pap 58(4):813–826 es_ES
dc.description.references Shabany M, Youssef A, Gulak G (2013) High-throughput 0.13-μm CMOS lattice reduction core supporting 880 Mb/s detection. IEEE Trans Very Large Scale Integr (VLSI) Syst 21(5):848–861 es_ES
dc.description.references Luo Y, Qiao S (2011) A parallel LLL algorithm. In: Proceedings of the fourth international C* conference on computer science and software engineering, pp 93–101 es_ES
dc.description.references Backes W, Wetzel S (2011) Parallel lattice basis reduction—the road to many-core. In: IEEE 13th international conference on high performance computing and communications (HPCC) es_ES
dc.description.references Ahmad U, Amin A, Li M, Pollin S, Van der Perre L, Catthoor F (2011) Scalable block-based parallel lattice reduction algorithm for an SDR baseband processor. In: 2011 IEEE international conference on communications (ICC) es_ES
dc.description.references Villard G (1992) Parallel lattice basis reduction. In: Papers from the international symposium on symbolic and algebraic computation (ISSAC’92). ACM, New York es_ES
dc.description.references Domene F, Józsa CM, Vidal AM, Piñero G, Gonzalez A (2013) Performance analysis of a parallel lattice reduction algorithm on many-core architectures. In: Proceedings of the 13th international conference on computational and mathematical methods in science and engineering es_ES
dc.description.references Gestner B, Zhang W, Ma X, Anderson DV (2008) VLSI implementation of a lattice reduction algorithm for low-complexity equalization. In: 4th IEEE international conference on circuits and systems for communications (ICCSC 2008), pp 643–647 es_ES
dc.description.references Burg A, Seethaler D, Matz G (2007) VLSI implementation of a lattice-reduction algorithm for multi-antenna broadcast precoding. In: IEEE international symposium on circuits and systems (ISCAS 2007), pp 673–676 es_ES
dc.description.references Bruderer L, Studer C, Wenk M, Seethaler D, Burg A (2010) VLSI implementation of a low-complexity LLL lattice reduction algorithm for MIMO detection. In: Proceedings of 2010 IEEE international symposium on circuits and systems (ISCAS) es_ES

