Over the last decade, one of the most significant technological developments leading to the new generation of broadband wireless systems has been communication via multiple-input multiple-output (MIMO) systems. MIMO technologies have been adopted by many wireless standards, such as Long Term Evolution (LTE), Worldwide Interoperability for Microwave Access (WiMAX) and Wireless Local Area Network (WLAN). This is mainly due to their ability to increase the maximum transmission rate, as well as the reliability and coverage of current wireless communications, all without requiring additional bandwidth or transmit power. Nevertheless, the advantages provided by MIMO systems come at the expense of a substantial increase in the cost of deploying multiple antennas and in receiver complexity, which has a major impact on power consumption. Therefore, the design of low-complexity receivers is an important issue, which is tackled throughout this thesis. First, the use of MIMO channel matrix preprocessing techniques is investigated, either to decrease the computational cost of optimal sphere decoders or to improve the performance of suboptimal linear, successive interference cancellation (SIC) or tree-search detectors. A detailed overview of two widely employed preprocessing techniques is presented: the Lenstra-Lenstra-Lovász (LLL) lattice-reduction (LR) algorithm and the vertical Bell Labs layered space-time zero-forcing decision-feedback equalization (VBLAST ZF-DFE) ordering. Both the complexity and the performance of these methods are evaluated and compared. In addition, a low-complexity implementation of the VBLAST ZF-DFE is proposed and included in the evaluation. Second, a low-complexity tree-search MIMO detector, called the variable-breadth (VB) K-Best detector, is developed. The main idea of this method is to exploit the impact of the channel matrix condition number on data detection in order to decrease the complexity of previously proposed detection schemes.
In the VB K-Best method, the value of the K parameter is varied depending on the channel matrix condition number. The proposed approach includes a low-complexity condition number estimator stage and a threshold selection method. The results show that the proposed scheme has lower average complexity than a fixed K-Best detector of similar performance. In addition, a second detection scheme is proposed, which employs the idea of condition number thresholding to avoid carrying out a lattice-reduction stage when the channel matrix is already well conditioned. This way, a large number of LR calls is avoided while good detection performance is maintained. In the third part of this thesis, several contributions involving the use of LR for MIMO communications are presented. First, the combination of LR with the K-Best algorithm is investigated, and alternative implementations that outperform previous proposals are developed. An extended LLL algorithm for LR is proposed to assist the preprocessing part of some lattice-reduction-aided (LRA) K-Best schemes. Finally, this extended LLL algorithm is exploited to decrease the computational cost of several LRA precoding methods. In addition, the most widely employed signal precoding approaches are evaluated and compared in terms of both computational cost and performance. Next, the problem of efficient soft detection in MIMO bit-interleaved coded-modulation (MIMO-BICM) systems is addressed. An efficient fixed-complexity demodulator for systems working with quantized reliability information is proposed. This approach reduces the complexity of previously proposed schemes through the combination of two strategies: a novel tree pruning based on quantization and a clipping-based pruning. Results after quantization reveal that a significant complexity reduction is achieved with negligible performance degradation. The last part of the thesis is devoted to the use of graphics processing units (GPUs) for the efficient implementation of MIMO receivers.
Both a hard-output and a soft-output version of the fixed-complexity sphere decoder are implemented on a GPU. Results show that the proposed implementations considerably decrease the computational time required for the data detection stage in MIMO systems with respect to a conventional CPU implementation. Moreover, a fully parallel soft-output scheme with a GPU-aware preprocessing stage is developed. Again, the execution time of the proposed GPU implementation is compared with that of the same scheme on a high-performance CPU, showing that the GPU outperforms the CPU. Furthermore, the throughputs of all the algorithms are shown to be higher than those of other recent implementations while ensuring near-optimal detection performance.
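
As a rough illustration of the condition-number thresholding idea behind the VB K-Best detector, the following minimal Python sketch selects K from a channel matrix condition number. It is not the estimator or the threshold selection method developed in the thesis: the function names, the threshold values, the candidate K values and the exact 2x2 real-valued condition number computation are all assumptions made for this example, including the assumption that worse-conditioned channels are given a larger K.

```python
import bisect
import math

# Hypothetical thresholds and breadths, chosen only for illustration.
# They are NOT values from the thesis.
THRESHOLDS = [5.0, 20.0]   # condition-number boundaries, ascending
K_VALUES = [2, 4, 8]       # one K per interval (len(THRESHOLDS) + 1 entries)


def cond_2x2(a, b, c, d):
    """Exact 2-norm condition number of the real matrix [[a, b], [c, d]].

    Uses s1^2 + s2^2 = ||A||_F^2 and s1 * s2 = |det(A)|, where s1 >= s2
    are the singular values. A practical detector would use a cheaper
    estimate; this closed form is an assumption of the example.
    """
    frob_sq = a * a + b * b + c * c + d * d
    det = a * d - b * c
    disc = math.sqrt(max(0.0, frob_sq * frob_sq - 4.0 * det * det))
    s1_sq = (frob_sq + disc) / 2.0
    s2_sq = (frob_sq - disc) / 2.0
    if s2_sq <= 0.0:
        return math.inf  # singular matrix
    return math.sqrt(s1_sq / s2_sq)


def select_k(cond, thresholds=THRESHOLDS, k_values=K_VALUES):
    """Pick the K-Best breadth for a given condition number.

    Assumption: well-conditioned channels get a small K and
    ill-conditioned channels get a large K.
    """
    return k_values[bisect.bisect_right(thresholds, cond)]


if __name__ == "__main__":
    for h in [(1.0, 0.0, 0.0, 1.0), (1.0, 0.0, 0.0, 0.1), (1.0, 0.0, 0.0, 0.01)]:
        kappa = cond_2x2(*h)
        print(f"cond = {kappa:.2f}, K = {select_k(kappa)}")
```

With such a rule, the tree search keeps few candidates per level for most channel realizations and widens only when the channel is poorly conditioned, which is how a variable-breadth scheme can lower average complexity relative to a fixed K.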