Document downloaded from:

http://hdl.handle.net/10251/189066

This paper must be cited as:

Kumar, V.; Mukherjee, M.; Lloret, J. (2020). A Hardware-Efficient and Reconfigurable UFMC Transmitter Architecture With its FPGA Prototype. IEEE Embedded Systems Letters. 12(4):109-112. https://doi.org/10.1109/LES.2019.2961850



The final publication is available at https://doi.org/10.1109/LES.2019.2961850

Copyright Institute of Electrical and Electronics Engineers


# A Hardware-efficient and Reconfigurable UFMC Transmitter Architecture with its FPGA Prototype

Vikas Kumar, Member, IEEE, Mithun Mukherjee, Member, IEEE, and Jaime Lloret, Senior Member, IEEE

Abstract: Universal-filtered multi-carrier (UFMC) is one of the potential candidates for 5G multicarrier waveforms owing to attractive features such as suppressed out-of-band radiation into nearby sub-bands. However, the hardware realization of UFMC systems is limited by the large number of arithmetic units required for the inverse fast Fourier transform (IFFT) and the pulse shaping filters. In this letter, we propose a low-complexity architecture for the baseband UFMC transmitter with a Dolph-Chebyshev filter. Compared to the ROM-based state-of-the-art, the proposed architecture requires fewer ROM locations and has the flexibility to externally select the inverse discrete Fourier transform (IDFT) size, the number of sub-bands, and the number of subcarriers in a sub-band. Moreover, we implement the proposed architecture on a commercially available Virtex-5 field-programmable gate array (FPGA) device for testing and analyzing the baseband UFMC signal. Finally, the XILINX post-route results are found to be comparable with MATLAB simulations.

## I. INTRODUCTION

Fifth-generation (5G) wireless communication is rapidly approaching its deployment stage, with several field trials under way. Universal-filtered multi-carrier (UFMC) [1], [2] is an attractive choice to support the diverse requirements of 5G wireless communications with relaxed time-frequency alignment [3], owing to its low out-of-band spectrum leakage and the short filter length it needs for filtering a series of successive subcarriers. However, as in other multicarrier systems, the complex arithmetic units for the inverse discrete Fourier transform/inverse fast Fourier transform (IDFT/IFFT) and for the pulse shaping filter are the primary hardware cost in UFMC systems [4]-[7]. An end-to-end hardware platform for multicarrier waveforms intended for 5G systems was demonstrated in [8]. A transmitter architecture with reduced hardware complexity was proposed in [4], where the IFFT size was reduced to 64 points instead of the commonly used 1024 points. In [6], a 64-point IFFT block was used and its outputs were then upsampled by zero padding the remaining points to reach a 1024-point IFFT. Most recently, a hardware-efficient architecture using a radix-2 decimation-in-time IFFT block was suggested in [9]. To meet the timing requirements of 10-MHz LTE channelization, a field-programmable gate array (FPGA) implementation of the UFMC transmitter was discussed in [10]. However, the ROM-based architecture suggested in [9], [10], where the sine/cosine terms of the twiddle factors and the filter sample points are stored, lacks the flexibility to change the number of subcarriers in a sub-band, the number of sub-bands, and the IDFT size used in baseband UFMC signal generation.

Fig. 1. UFMC baseband transmitter model.

*Motivation:* To support multi-service provisioning [3] in 5G, the air interface must have the flexibility to select the number of subcarriers and the desired filter length in a sub-band for UFMC systems without any significant change in hardware resources. Previous architectures, e.g., the Read-Only-Memory (ROM)-based approach [9], do not provide this flexibility. Thus, there is an increasing motivation to design a hardware-efficient *reconfigurable* architecture for the UFMC transmitter, even at the baseband level.

The main contributions of this paper are summarized as follows:

- We design a reconfigurable architecture for the baseband UFMC transmitter. The proposed UFMC architecture supports a variable IDFT size of up to 2<sup>(D-1)</sup>, where D is the word size, chosen by an external input selection line. In addition, the number of sub-bands and the number of subcarriers in each sub-band can be changed using the external selection lines.
- From the hardware point of view, the COordinate-Rotation-DIgital-Computer (CORDIC) [11] algorithm, one of the well-known algorithms for determining the in-phase (I) and quadrature (Q) values of any angle, is used for all trigonometric computations in the proposed UFMC architecture.
- We further prototype the proposed architecture on a commercially available Virtex-5 FPGA device to analyze the baseband UFMC signal.

## II. SYSTEM MODEL: UFMC BASEBAND TRANSMITTER

As illustrated in Fig. 1, we consider an uplink baseband UFMC transmitter with a total of N subcarriers. Let  $B_k$  be the total number of sub-bands for the *k*th user. We consider that the *l*th sub-band contains  $N_l$  subcarriers. Let  $\mathbf{b}_k^l = \left[ b_k^l(0), b_k^l(1), \dots, b_k^l(N_l - 1) \right]^\top \in \mathbb{C}^{N_l \times 1}$  be the data for the *k*th user in the *l*th sub-band with  $\mathbb{E} \left[ \mathbf{b}_k^l \left( \mathbf{b}_k^l \right)^{\dagger} \right] = \mathbf{I}_{N_l}$  and  $\mathbb{E} \left[ \mathbf{b}_k^l(\rho) \left( \mathbf{b}_k^l(\rho') \right)^{\dagger} \right] = \mathbf{0}_{N_l}, \forall \rho \neq \rho'$ , where  $(\cdot)^\top$  and  $(\cdot)^{\dagger}$  denote the transpose and conjugate transpose, respectively;  $\mathbb{E}[\cdot]$  represents the mathematical expectation; and  $\mathbf{I}_N$  and  $\mathbf{0}_N$  are the  $N \times N$  identity and zero matrices, respectively. The Dolph-Chebyshev filter [1], [12] is selected as the pulse shaping filter to suppress the side-lobe power leaking into adjacent sub-bands. The baseband UFMC transmitted signal for the *k*th user in the *l*th sub-band is  $\mathbf{s}_k^l = (\mathbf{W}_k^l)^\top \otimes (\mathbf{V}^l)^\dagger \mathbf{b}_k^l \in \mathbb{C}^{N \times 1}$ , where  $\otimes$  denotes the Hadamard product, and  $\mathbf{W}_k^l \in \mathbb{C}^{N_l \times N}$  and  $\mathbf{V}^l \in \mathbb{C}^{N_l \times N}$  are the filter matrix for the *k*th user in the *l*th sub-band and the Fourier matrix used in the *l*th sub-band, respectively.

V. Kumar is with the Bharat Sanchar Nigam Limited, Patna 800001, India (e-mail: vikas.kumar@bsnl.co.in).

M. Mukherjee is with the Guangdong Provincial Key Laboratory of Petrochemical Equipment Fault Diagnosis, Guangdong University of Petrochemical Technology, Maoming 525000, China (e-mail: m.mukherjee@ieee.org).

J. Lloret is with the Universitat Politecnica de Valencia, 46022 Valencia, Spain (e-mail: jlloret@dcom.upv.es).

Fig. 2. Proposed IDFT and filtering unit for UFMC baseband symbol generator.

*CORDIC Algorithm:* To handle the large number of computations involving the Fourier coefficients  $\mathbf{V}^l$  and the filter coefficients  $\mathbf{W}^l_k$ , we use the well-known CORDIC algorithm [11], [13], which requires only shift and add operations to implement iterative vector rotation. The algorithm proceeds through a sequence of micro-rotation stages, each rotating by a prefixed angle  $\alpha_i$ . The basic equations of a micro-rotation stage are  $x_{i+1} = \cos \alpha_i (x_i - s_i y_i \tan \alpha_i)$ ,  $y_{i+1} = \cos \alpha_i (s_i x_i \tan \alpha_i + y_i)$ , and  $z_{i+1} = z_i - s_i \alpha_i$ , where  $(x_{i+1}, y_{i+1})$  is the vector that results when  $(x_i, y_i)$  is rotated through the angle  $\alpha_i = \tan^{-1}(2^{-i})$ , so that multiplying by  $\tan \alpha_i = 2^{-i}$  reduces to a shift;  $s_i \in \{+1, -1\}$  is the sign bit and equals the sign bit of  $z_i$ ; and  $i$  denotes the iteration stage, varying from 0 to  $(m-1)$ , where the integer  $m$  equals the bit precision, i.e., the number of micro-rotations. In general, the factor  $\cos \alpha_i$  is omitted when the CORDIC iteration stages are implemented. The accumulated factor  $\mu = \prod_{i=0}^{m-1} \cos \alpha_i \approx 0.6073$  is instead compensated by the compensated CORDIC unit [13].
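The micro-rotation equations above can be illustrated with a minimal floating-point sketch. It is not the fixed-point hardware of [13]: the function names `cordic_rotate` and `cordic_scale` are hypothetical, the multiplications by `2.0 ** -i` stand in for hardware shifts, and pre-scaling the input by $\mu$ is only one assumed way of modelling the compensation step.

```python
import math

def cordic_rotate(x, y, z, m=16):
    """Rotate (x, y) by z radians with m shift-and-add micro-rotations.

    The cos(alpha_i) factors are omitted, so the result is scaled by 1/mu.
    """
    for i in range(m):
        s = 1.0 if z >= 0 else -1.0          # s_i: sign of the residual angle z_i
        t = 2.0 ** -i                        # tan(alpha_i), a right shift in hardware
        x, y = x - s * y * t, y + s * x * t  # both updates use the old x and y
        z -= s * math.atan(t)                # alpha_i, read from a small ROM in hardware
    return x, y, z

def cordic_scale(m=16):
    """Accumulated factor mu, the product of cos(alpha_i) over all stages."""
    mu = 1.0
    for i in range(m):
        mu *= math.cos(math.atan(2.0 ** -i))
    return mu

mu = cordic_scale()
# Assumption: compensate the 1/mu gain by pre-scaling the input vector by mu.
cos_out, sin_out, _ = cordic_rotate(mu, 0.0, math.pi / 6)
print(round(mu, 4), round(cos_out, 3), round(sin_out, 3))
```

With 16 stages the residual angle is bounded by $\tan^{-1}(2^{-15})$, so the outputs approximate $\cos(\pi/6)$ and $\sin(\pi/6)$ to roughly four decimal places.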

## III. PROPOSED RECONFIGURABLE UFMC BASEBAND TRANSMITTER ARCHITECTURE

The proposed transmitter architecture has two main units: a) the IDFT and filtering unit and b) the angle generator unit.

### A. IDFT and Filtering Unit

As illustrated in Fig. 2, the IDFT unit combined with the filtering unit comprises a quadrature amplitude modulation (QAM) mapper, three compensated-CORDIC [13] blocks (namely, CORDIC0, CORDIC1, and CORDIC2), and a clock generator unit. The QAM mapper takes the user information bits (denoted user\_data) as input. Its outputs are the polar form of the mapped user\_data, i.e., amplitude and argument. The mapped user data is then sent to the CORDIC2 unit, as shown in Fig. 2. The CORDIC2 unit computes the corresponding IDFT sample points and modulates the subcarriers with the mapped symbols generated by the QAM mapper. The quadrature and in-phase components of a QAM-modulated sample point of a sub-band (the real part, Re, and the imaginary part, Im, are labeled in Fig. 2) are stored in registers REG3 and REG4, respectively, with the help of multiplexers MUX1, MUX2, MUX3, and MUX4, registers REG1 and REG2, selection line SEL1, and clock RD1. Further, during filtering, instead of storing the windowed time samples w(n), we store  $\beta(n) = \arccos(w(n) \mu)$  in an *L*-size ROM, where *L* is the filter length. This allows the same CORDIC0 and CORDIC1 units to perform the multiplication of the IDFT sample points by the filter coefficients and the compensation for the scale factor  $\mu$ , in addition to shifting the filtered UFMC symbol in the frequency domain using z4shift, thereby avoiding direct multipliers.
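One way to see why storing $\beta(n) = \arccos(w(n)\mu)$ can replace a multiplier: rotating the vector $(x, 0)$ through $\beta(n)$ with an uncompensated CORDIC yields an x-component of $x \cos\beta(n) / \mu = x\,w(n)$. The sketch below checks this numerically for a single real sample. It is only an interpretation under stated assumptions, not the data path of Fig. 2: the helper names and the tap value `w` are hypothetical, and the z4shift frequency shift is left out.

```python
import math

def cordic_rotate_raw(x, y, z, m=16):
    """Uncompensated CORDIC rotation: the output carries a gain of 1/mu."""
    for i in range(m):
        s = 1.0 if z >= 0 else -1.0
        t = 2.0 ** -i
        x, y = x - s * y * t, y + s * x * t
        z -= s * math.atan(t)
    return x, y

def cordic_scale(m=16):
    """Accumulated factor mu, the product of cos(alpha_i) over all stages."""
    mu = 1.0
    for i in range(m):
        mu *= math.cos(math.atan(2.0 ** -i))
    return mu

mu = cordic_scale()
w = 0.35                     # hypothetical window sample w(n), assumed 0 <= w <= 1
beta = math.acos(w * mu)     # the value that would be stored in the L-size ROM
sample = 0.8                 # hypothetical real-valued IDFT sample point
x_out, _ = cordic_rotate_raw(sample, 0.0, beta)
print(x_out, sample * w)     # the two values should agree closely
```

For $0 \le w(n) \le 1$ the stored angle lies between about 0.92 rad and $\pi/2$, inside the CORDIC convergence range of roughly 1.74 rad, so no quadrant correction is needed in this sketch.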

Afterward, we perform the convolution array operation, in which the convolved data are stored in an (N + L - 1)-size RAM. Each cell of the storage elements (RAM or ROM) used in the proposed architecture has a size equal to the word size D. For the convolution operation, the in-phase and quadrature components of the filtered sample points, i.e., the outputs of the CORDIC0 and CORDIC1 units, are stored in two  $(L \times L)$ -size RAMs. During convolution, the filter coefficients  $\{w(0), w(1), \ldots\}$  are multiplied by each sample point of the sub-band. The convolved samples are then stored in the next-stage (N + L - 1)-size RAMs<sup>1</sup>. After the computation of the UFMC symbol is complete, clk4dac helps to convert these (N + L - 1) data to analog I- and Q-channel outputs with the help of a digital-to-analog converter (DAC). The clock generator unit takes the master clock, which acts as the primary clock for the proposed architecture, as input and produces clk4band and clk4dac as outputs.
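The output length N + L - 1 follows from linear convolution: each of the N sample points is multiplied by all L filter coefficients, and the product of sample n with tap j accumulates at index n + j. A minimal sketch follows, with hypothetical names and toy sizes, using ordinary multiplications in place of the CORDIC units and not modelling the intermediate $(L \times L)$ RAMs.

```python
def convolve_into(acc, samples, taps):
    """Accumulate the linear convolution of samples with taps into acc."""
    for n, s in enumerate(samples):
        for j, w in enumerate(taps):
            acc[n + j] += s * w   # product of sample n and tap j lands at n + j
    return acc

N, L = 8, 3                                    # toy sizes; Fig. 4 uses N = 256, L = 64
acc = [0j] * (N + L - 1)                       # models the (N + L - 1)-size RAM
subband = [complex(n, -n) for n in range(N)]   # hypothetical I/Q sample points
taps = [0.25, 0.5, 0.25]                       # hypothetical filter coefficients
convolve_into(acc, subband, taps)
print(len(acc), acc[1], acc[-1])
```

With N = 256 and L = 64 the same indexing gives 319 output locations, which matches the 319-word RAM reported in Section IV.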

### B. Angle Generator Unit

The angle generator unit has four down counters, one hard-wired shifter [13], and several control signals, as shown in Fig. 3(a). Down counter 1 takes the number of subcarriers  $N_l$  as input and governs the computation of one sample point over all subcarriers in a sub-band. This down counter has RD1 and SEL1 as outputs. The filter length L is the input to down counter 2, which keeps track of the multiplication of a sample point by the filter coefficients in a given sub-band. The output of this counter is RD2. The IDFT length N and the clock signal RD3 are the input and output, respectively, of down counter 3. This counter governs the computation of all the sample points of a single sub-band required for the IDFT length, i.e., N. The number of sub-bands  $B_k$ , which is required for the computation of all the sample points of all the sub-bands, is the input to down counter 4. The control signal NXT\_FR is the output of this down counter. Furthermore, a hard-wired shifter is used that takes the IDFT size N as input and outputs  $2\pi/N$ , as in [13]. The *ang\_incr for row element* unit generates the argument equal to the angle increment between two sample points of a subcarrier. The detailed architecture of this unit is shown in Fig. 3(b). As illustrated in Fig. 3(c), the argument z4shift is generated by the *ang\_incr for frequency shift* unit to shift the filtered sub-band symbols in the frequency domain. The remaining control signals and their functionality are summarized in Table I.

<sup>1</sup>Higher throughput can be achieved by using the state-of-the-art pipelined and high-radix CORDIC-based FFT architecture proposed in [14] for the IDFT and filtering unit; high-speed convolution can likewise be carried out using the reconfigurable instruction-based multi-core parallel convolution of [15], at the expense of additional hardware resources.

Fig. 3. Proposed (a) angle generator unit for UFMC baseband transmitter, (b) argument generator for angle increment for subcarrier sample point, and (c) argument generator for frequency domain shift to sub-band.
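The role of the angle generator can be sketched as nested loops in which every CORDIC2 argument is obtained by addition from the single constant $2\pi/N$. This is a hedged illustration only: the function name, the loop order, the variables `base` and `row_incr`, and the use of `cmath.exp` in place of the CORDIC2 rotation are all assumptions, and the usual 1/N IDFT scaling is omitted.

```python
import cmath
import math

def subband_samples(symbols, k0, N):
    """N time samples of a sub-band whose first subcarrier index is k0."""
    step = 2 * math.pi / N   # output of the hard-wired shifter
    base = 0.0               # argument of subcarrier k0 at the current sample
    row_incr = 0.0           # argument gap between adjacent subcarriers
    out = []
    for _ in range(N):                           # models down counter 3
        acc, angle = 0j, base                    # accumulator cleared per sample
        for sym in symbols:                      # models down counter 1
            acc += sym * cmath.exp(1j * angle)   # stands in for CORDIC2
            angle += row_incr
        out.append(acc)
        base += k0 * step        # offset is fixed once per sub-band
        row_incr += step
    return out

N, k0 = 16, 3
symbols = [1 + 1j, -1 + 1j, 1 - 1j]              # hypothetical QAM symbols
out = subband_samples(symbols, k0, N)
ref = [sum(s * cmath.exp(2j * math.pi * (k0 + j) * n / N)
           for j, s in enumerate(symbols)) for n in range(N)]
print(max(abs(a - b) for a, b in zip(out, ref)))
```

The printed maximum deviation from the directly evaluated sum should be at the level of floating-point rounding, showing that additive angle stepping reproduces the arguments $2\pi (k_0 + j) n / N$.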

## IV. FPGA PROTOTYPE AND EXPERIMENTAL RESULTS

We prototype the proposed reconfigurable UFMC baseband transmitter architecture in the Verilog hardware description language on the XILINX platform with a 16-bit word size. We evaluate the FPGA prototype with a master clock frequency of 120 MHz using a Tektronix AFG3252 arbitrary function generator. We use the Digilent Pmod-DA2 digital-to-analog converter (DAC), a 12-bit DAC powered by the Texas Instruments DAC121S101, for analog conversion of the baseband UFMC digital signal on the I- and Q-channels simultaneously. From Fig. 4, we observe that the time-domain waveform (both I- and Q-channel) simulated in MATLAB agrees with the waveform generated from XILINX 14.2 post-route simulation data. The corresponding worst-case absolute errors in the I- and Q-channel amplitudes relative to MATLAB for this experiment are  $0.8415 \times 10^{-3}$  and  $0.5808 \times 10^{-3}$ , respectively. The power spectral density (PSD) of the baseband UFMC signal from the post-route simulation data is shown in Fig. 5. We obtain the following figure-of-merit (FOM) parameters for the spectrum of the baseband signal measured by the Tektronix 3303B: 23.053 dBm main-lobe power, 697.749 kHz occupied bandwidth, and 1.105 dBm maximum sidelobe power. Our proposed architecture (with 16-bit word size) uses only a 16×16-bit ROM for the CORDIC block, a 64×16-bit ROM for the filter coefficients, a 64×64 array of 16-bit word-size RAM, and a 319-location RAM of 16-bit word size (N + L - 1 = 319 for N = 256 and L = 64) for the convolution operation, with a maximum throughput of 15 Mbps. The device utilization report is summarized in Table II.

The main insights are discussed as follows:

- Compared to the state-of-the-art ROM-based architecture [9], [10], where direct multipliers and large storage elements are used for the IDFT operation ( $4 \times 5120$  direct multipliers with IDFT size N = 1024) and for the frequency domain filter coefficient shifting ( $2 \times 86$  ROM locations), the proposed architecture requires only 16 ROM locations to store the micro-rotation arguments while avoiding any direct multipliers. The filtering and the frequency domain shifting are performed online without any previously stored spectrum shifting coefficients. Moreover, the architecture has the flexibility to select an IDFT size of up to  $2^{(D-1)}$ .

- Without any significant change in device utilization, the proposed architecture can select any number of sub-bands and any sub-band size, subject to the IDFT size. This benefits multi-service provisioning with different numbers of subcarriers in a sub-band for UFMC systems.
- By using the CORDIC algorithm, the proposed architecture avoids direct multipliers. Thus, no DSP slices are used, unlike in [9], where they perform the direct multiplication with the finite impulse response (FIR) filter coefficients.
- Finally, the argument z4shift from the *ang\_incr for frequency shift* unit enables the spectrum shifting for each individual sub-band of the baseband UFMC signal.

## V. CONCLUSION

In this letter, we have proposed a reconfigurable hardware architecture for the UFMC baseband transmitter and prototyped it on a commercially available FPGA. The reconfigurable architecture offers the flexibility to select the number of sub-bands as well as the number of subcarriers in each sub-band of the UFMC system, which enables multi-service provisioning with different numbers of subcarriers per sub-band. The receiver architecture is also worth investigating and is left for our future work. Further research is needed to design a real-time, high-speed pipelined architecture combined with a high-speed DAC for UFMC systems.

## REFERENCES

- [1] F. Schaich, T. Wild, and Y. Chen, "Waveform contenders for 5G – suitability for short packet and low latency transmissions," in *Proc. 79th IEEE VTC-Spring*, May 2014, pp. 1–5.
- [2] Z. Zhang, H. Wang, G. Yu, Y. Zhang, and X. Wang, "Universal filtered multi-carrier transmission with adaptive active interference cancellation," *IEEE Trans. Commun.*, vol. 65, no. 6, pp. 2554–2567, June 2017.
- [3] L. Zhang, A. Ijaz, P. Xiao, and R. Tafazolli, "Multi-service system: An enabler of flexible 5G air interface," *IEEE Commun. Mag.*, vol. 55, no. 10, pp. 152–159, Oct. 2017.

 TABLE I

 Control signals with their functionality

| Signal | Source | Function |
|----------|----------|----------|
| clk4symb | Clock generator | This clock drives the filtering unit (i.e., the CORDIC0 and CORDIC1 blocks). |
| clk4dac | Clock generator | This clock signal helps to convert the I- and Q-channel digital output signals to analog signals using the DAC. |
| clk4band | Clock generator | This clock operates the IDFT unit (in CORDIC2) to generate the modulated OFDM symbol. |
| EOC2 | NOR-ing the output of the internal down counter of compensated CORDIC2 | EOC stands for end of computation. If EOC2 is in the high state (i.e., 1), it sends the new value of the argument $Z_{in}$ corresponding to the subcarrier sample point to the CORDIC2 unit. It updates the registers REG1 and REG2 with the new cumulative values of sin and cos. It works as the clock for down counter 1. |
| EOC0 | NOR-ing the output of the internal down counter of compensated CORDIC0 | If in the high state (i.e., 1), it sends the new value of the argument z4shift to the CORDIC0 and CORDIC1 blocks. It also indicates that the multiplication of a sample point of the UFMC sub-band by one filter coefficient is complete and that the result is stored row-wise in one of the cells of the $L \times L$ RAM. It works as the clock for down counter 2. |
| RD1 | OR-ing the output of down counter 1 | It is the clock signal for the registers REG3 and REG4 and for down counter 3. If in the high state (i.e., 1), this signal updates the contents of registers REG3 and REG4 with the contents of registers REG1 and REG2, which contain the cumulative value of a sample over all subcarriers in a sub-band. If in the low state (i.e., 0), it indicates that the computation of a sample point of the sub-band is in progress. |
| SEL1 | NOR-ing the output of down counter 1 | In the low state (i.e., 0), it indicates that a sample point of a sub-band is being computed. In the high state (i.e., 1), it clears the registers REG1 and REG2 to zero with the help of multiplexers MUX1, MUX2, MUX3, and MUX4 for the accumulation of the next sample point of the sub-band. |
| RD2 | OR-ing the output of down counter 2 | If in the high state (i.e., 1), it indicates that an OFDM sub-band sample point is being multiplied by the L filter coefficients. If in the low state (i.e., 0), one row-wise write operation in the $L \times L$-size RAM is complete and one convolved datum is stored in one of the cells of the $(N + L - 1)$-size RAM. |
| RD3 | OR-ing the output of down counter 3 | It acts as the clock signal for down counter 4. If in the high state (i.e., 1), it indicates that the OFDM symbol is computed for a sub-band. If in the low state (i.e., 0), it indicates completion of one sub-band computation. |
| SEL2 | NOR-ing the output of down counter 3 | If in the high state (i.e., 1), it indicates that the OFDM symbol for each sub-band is being computed. If in the low state (i.e., 0), it indicates completion of one sub-band computation. It also sets the offset value for the next sub-band, and the angle generator unit accordingly generates the arguments for the subcarriers, i.e., $Z_{in}$, and for the frequency shift of the filter, i.e., z4shift. |
| NXT_FR | OR-ing the output of down counter 4 | The high state of this signal indicates that the computation of the UFMC symbol is in progress; the low state (i.e., 0) indicates that the UFMC symbol for the current set of user_data is complete. It also clears the contents of all RAM cells to zero. |



Fig. 4. Absolute error at each sample point of the UFMC baseband signal between XILINX post-route simulation and MATLAB outputs for (a) I-channel and (b) Q-channel; (c) proposed architecture outputs taken from the Virtex-5, interfaced with the TI DAC121S101 DAC and captured through a Tektronix MSO2024. IDFT size N = 256, filter length L = 64, sidelobe attenuation level = 60 dB, number of sub-bands  $B_k = 5$ , and  $N_l = 15$  subcarriers.

 TABLE II

 Device Utilization, Target Device: XC5VLX110T-2FF1136

| Logic Utilization    | Used | Available | % of Utilization |
|----------------------|------|-----------|------------------|
| Slice Registers      | 722  | 69,120    | 1                |
| Slice look-up tables | 2119 | 69,120    | 3                |
| LUT-FF pairs         | 640  | 2,201     | 29               |
| Block RAM            | 4    | 148       | 2.7              |



Fig. 5. Spectrum of 16-QAM modulated baseband UFMC transmitted signal.

- [4] T. Wild and F. Schaich, "A reduced complexity transmitter for UF-OFDM," in Proc. IEEE 81st VTC-Spring, May 2015, pp. 1–6.
- [5] M. Saad, M. Alawieh, A. C. A. Ghouwayel, H. Hijazi, and S.-M. Omar, "On the hardware implementation of a reduced complexity UFMC chain," in *Proc. IEEE Int. Conf. Computer and Applications (ICCA)*, Aug. 2018.
- [6] R. Knopp, F. Kaltenberger, C. Vitiello, and M. Luise, "Universal filtered multicarrier for machine type communications in 5G," in *Proc. Eur. Conf. Netw. Commun. (EUCNC)*, June 2016, pp. 27–30.

- [7] V. Kumar, M. Mukherjee, and J. Lloret, "Reconfigurable architecture of UFMC transmitter for 5G and its FPGA prototype," *IEEE Systems J.*, pp. 1–11, 2019.
- [8] J. Nadal, C. A. Nour, and A. Baghdadi, "Flexible and efficient hardware platform and architectures for waveform design and proof-of-concept in the context of 5G," AEU - Int. J. Electron. Commun., vol. 97, pp. 85–93, Dec. 2018.
- [9] A. R. Jafri, J. Majid, M. A. Shami, M. A. Imran, and M. Najam-Ul-Islam, "Hardware complexity reduction in universal filtered multicarrier transmitter implementation," *IEEE Access*, vol. 5, pp. 13401–13408, Aug. 2017.
- [10] A. R. Jafri, J. Majid, L. Zhang, M. A. Imran, and M. N. ul Islam, "FPGA implementation of UFMC based baseband transmitter: Case study for LTE 10MHz channelization," *Wireless Communications and Mobile Computing*, vol. 2018, pp. 1–12, 2018.
- [11] J. S. Walther, "A unified algorithm for elementary functions," in *Proc.* ACM Spring Joint Computer Conf., May 1971, pp. 379–385.
- [12] M. Mukherjee, L. Shu, V. Kumar, P. Kumar, and R. Matam, "Reduced out-of-band radiation-based filter optimization for UFMC systems in 5G," in *IEEE IWCMC*, Aug 2015, pp. 1150–1155.
- [13] V. Kumar, K. C. Ray, and P. Kumar, "CORDIC-based VLSI architecture for real time implementation of flat top window," *Microprocessors and Microsystems*, vol. 38, no. 8, Part B, pp. 1063–1071, 2014.
- [14] X. Chen, Y. Lei, Z. Lu, and S. Chen, "A variable-size FFT hardware accelerator based on matrix transposition," *IEEE Trans. on VLSI Systems*, vol. 26, no. 10, pp. 1953–1966, Oct. 2018.
- [15] Q. Zhou, L. Yang, and X. Yan, "Reconfigurable instruction-based multicore parallel convolution and its application in real-time template matching," *IEEE Trans. Comput.*, vol. 67, no. 12, pp. 1780–1793, Dec. 2018.