Document downloaded from:

http://hdl.handle.net/10251/187689

This paper must be cited as:

Kumar, V.; Mukherjee, M.; Lloret, J. (2020). Reconfigurable Architecture of UFMC Transmitter for 5G and Its FPGA Prototype. IEEE Systems Journal. 14(1):28-38. https://doi.org/10.1109/JSYST.2019.2923549



The final publication is available at https://doi.org/10.1109/JSYST.2019.2923549

Copyright Institute of Electrical and Electronics Engineers


# Reconfigurable Architecture of UFMC Transmitter for 5G and its FPGA Prototype

Vikas Kumar, Member, IEEE, Mithun Mukherjee, Member, IEEE, and Jaime Lloret, Senior Member, IEEE

Abstract—The universal-filtered multi-carrier (UFMC) system, a generalization of filtered orthogonal frequency division multiplexing (OFDM) and filter bank-based multicarrier (FBMC), is being considered as a potential candidate for 5G due to its robustness against inter-carrier interference as in cyclic-prefix-based OFDM systems. However, real-time hardware realization of multicarrier systems is limited by the large number of arithmetic units needed for the inverse fast Fourier transform (IFFT) and the pulse shaping filters. In this paper, we propose a low-complexity and reconfigurable architecture for the baseband UFMC transmitter. To the best of our knowledge, the proposed architecture is the first reconfigurable architecture that has the flexibility to choose the number of subcarriers in a subband without any change in hardware resources. In addition, the proposed architecture selects the filter from a group of filters with a single selection line. Moreover, we use a commercially available field-programmable gate array (FPGA) device for real-time testing and analysis of the baseband UFMC signal. Through extensive experiments, we study the occupied bandwidth, main-lobe power, and side-lobe power of the baseband signal with different filters in real-time scenarios. Finally, we measure the quantization error in baseband signal generation for the proposed UFMC transmitter architecture and find it comparable with the error bound.

*Index Terms*—UFMC, Pulse shaping filters, Reconfigurable architectures, Error analysis.

# I. INTRODUCTION

We are currently witnessing the deployment stages of fifth generation (5G) wireless communication, with several field trials under way. The air interface plays a significant role in the diverse use-cases of the 5G deployment, which are broadly divided into massive machine-type communications (mMTC), evolved mobile broadband (eMBB), and ultra-reliable and low-latency communications (uRLLC) [1]. On the one hand, the symbol duration can be large for delay-tolerant mMTC; on the other hand, the latency-sensitive uRLLC requires a short symbol duration. In addition, the symbol duration in eMBB is limited by the doubly dispersive channel. Thus, orthogonal/non-orthogonal and synchronous/asynchronous multicarrier waveforms are currently being investigated to support the diverse requirements of 5G wireless communications with a relaxed time-frequency alignment [2].

In multi-service provisioning, multiple services require different numbers of subbands and different subband filtering over the same air interface. The traditional cyclic-prefix-based OFDM is not suitable for multi-service provisioning due to its high out-of-band radiation into the nearby sidebands. Filter-bank multi-carrier (FBMC) communication is therefore preferred for its reduced side-lobe levels; however, its long filter length limits its applicability for short burst traffic [3]. On the other hand, UFMC [3]–[6] is an attractive choice due to its low out-of-band spectrum leakage as well as its short filter length, which is applied to a group of successive subcarriers. In fact, UFMC retains the benefits of OFDM and leverages the advantages of FBMC with a reduced filter length. Moreover, UFMC reduces the time-frequency misalignment, which makes UFMC a preferred candidate for the 5G multicarrier waveform.

However, the selection of the pulse shaping filter [6]–[9] is one of the critical factors in UFMC for suppressing the side-lobe power leaking into the adjacent subbands. The Dolph-Chebyshev filter [5] was initially adopted as the pulse shaping filter for UFMC systems; however, its side-lobe fall rate is very slow (0 dB/octave), resulting in significant spectrum leakage into the nearby subbands. To relax the selection of pulse shaping filters, an adaptive interference cancellation technique was suggested in [6] at the expense of cancellation subcarriers inserted in the narrow guard bands. Moreover, based on the channel-state information at the transmitter, Han et al. [8] suggested several waveforms for the UFMC pulse shaping filters. The aforementioned studies suggest that the design of the pulse shaping filter is a challenging issue in UFMC systems, because it must offer flexibility in subcarrier and filter-length selection as well as architecture reconfigurability.

Moreover, the computational complexity [10]–[13] of the hardware implementation becomes an important factor in UFMC systems due to the power and resource constraints of the end-devices. In fact, the complex arithmetic units for the inverse discrete Fourier transform/inverse fast Fourier transform (IDFT/IFFT) and the pulse shaping filters are the primary consumers of hardware resources in UFMC systems. To this end, a transmitter architecture with reduced hardware complexity was proposed in [11], where the IFFT size was reduced to 64-point instead of the commonly used 1024-point. Moreover, in [13], a 64-point IFFT block was used and its outputs were then upsampled by zero padding the remaining points to reach a 1024-point IFFT. In this way, the computational complexity was further reduced in [13] compared to [11].

In recent years, Jarfi et al. [12] suggested a hardware-efficient architecture with the additional flexibility to select the IFFT-size, filter length, and parameters for spectrum shifting in the UFMC transmitter. In the above architecture, the redundant radix-2 decimation in time (DIT) butterfly operations were avoided, and a significant reduction in hardware complexity was then observed in the filtering scheme. It is worthwhile to note that the spectrum shifting operation required only a few memory units, one multiplier, and one adder with a 10 MHz LTE channelization specification. Taking into account real-time FPGA implementation of the UFMC transmitter, they provided a more detailed architecture to meet the timing requirements of 10 MHz LTE channelization in [14]. A notable reduction in hardware complexity and in the processing time to generate the subcarriers was observed in [14] compared to previous work [3]. Although the aforementioned works [12], [14] laid a strong foundation for real-time FPGA implementation of the UFMC transmitter, they rely on read-only-memory (ROM)-based approaches that directly *store* the sine/cosine values for the IDFT/IFFT unit and the pulse shaping filter sample points. Consequently, the design methodologies for changing the filter type and filter length in the UFMC transmitter remain unknown.

V. Kumar was with Electrical Engineering Department, Indian Institute of Technology, Patna, India, 801103, now he is with Bharat Sanchar Nigam Limited, Patna, Bihar, India, 800001, e-mail: vikas.kumar@bsnl.co.in

M. Mukherjee is with the Guangdong Provincial Key Laboratory of Petrochemical Equipment Fault Diagnosis, Guangdong University of Petrochemical Technology, Maoming 525000, China, e-mail: m.mukherjee@ieee.org

J. Lloret is with Universitat Politecnica de Valencia, Spain, e-mail: jlloret@dcom.upv.es

# A. Motivation

We observe that the Dolph-Chebyshev filter, which was adopted as the pulse shaping filter in [5], has a slow side-lobe fall rate, i.e., 0 dB/octave. From the signal processing literature [15], [16], we observe that the Hanning filter has a very sharp side-lobe fall rate (-18 dB/octave); however, its highest side-lobe power is high (-32 dB). On the other hand, the Hamming filter has a lower highest side-lobe power (-43 dB); however, its side-lobe fall rate is only about -6 dB/octave. Although the Blackman filter has a side-lobe fall rate similar to that of the Hanning filter, its -3 dB main-lobe bandwidth is narrower than that of the Flat Top filter. The Blackman-Harris filter, in turn, has a low highest side-lobe power; however, its side-lobe fall rate is slower than that of the Blackman filter. Therefore, it is worthwhile to investigate figures of merit (FOMs) such as the -3 dB main-lobe power, highest side-lobe power, and side-lobe roll-off factor of the above-mentioned filters for UFMC systems in real-time scenarios.

Furthermore, UFMC systems require a large number of computational operations for the IDFT/IFFT unit and the pulse shaping filters even at the baseband signal processing stage. The most recent architecture in [14] focused on reducing the computational complexity by using data- and process-level pipelining to obtain the highest operational frequency in the UFMC transmitter. However, the sine/cosine terms of the twiddle factors and the filter sample points are stored in ROM, which limits the flexibility to select the number of subcarriers in a subband and to select a variable-length pulse shaping filter for each subband. Thus, the design of a reconfigurable, real-time, and hardware-efficient architecture for the baseband transmitter, together with the required pulse shaping filter, is one of the critical challenges in UFMC systems for multi-service provisioning in 5G. In addition, studies on the quantitative evaluation of the quantization error of a reconfigurable UFMC architecture are still lacking.

## B. Our Contribution

The main contributions of this paper include:

- We aim to design a reconfigurable architecture for the baseband UFMC transmitter. We prototype the proposed architecture on the commercially available Field-Programmable Gate Array (FPGA) device for real-time testing. We further analyze the baseband UFMC signal using Digital-to-Analog-Converter (DAC) and obtain the FOMs for the UFMC baseband transmitter with different pulse shaping filters.
- The proposed architecture has the flexibility to select the pulse shaping filters from a group of widely used filters based on the FOMs such as -3dB main-lobe power, highest side-lobe power, and side-lobe roll-off factor.
- Moreover, in the proposed reconfigurable architecture, the maximum number of available subcarriers can be extended to $2^{(d-1)}$, where $d$ is the data-length of the architecture<sup>1</sup>. It is important to note that the number of subcarriers in a subband can take any value below the maximum limit, i.e., $2^{(d-1)}$, without any change in the hardware resources of the proposed architecture. This allows additional flexibility to assign the number of subcarriers in a subband for the UFMC transmitter.
- We further calculate the quantization error for the proposed baseband UFMC transmitter architecture. We consider the approximation and the truncation error due to the fixed-size representation of the data-path width for the baseband UFMC signal generation.

<sup>1</sup>As an example, for a 16-bit data-length architecture, the number of subcarriers can take any value up to $2^{15}=32768$.

Fig. 1. System model of uplink baseband UFMC transmitter for the kth user.

The rest of the paper is organized as follows. Section II presents the UFMC transmitter model and well-known pulse shaping filters. We propose the reconfigurable baseband UFMC architecture in Section III. The FPGA prototyping and real-time experimental results are presented in Section IV. Finally, conclusions are drawn in Section VI.

# **II. SYSTEM MODEL**

### A. UFMC Transmitter

As illustrated in Fig. 1, we consider an uplink baseband UFMC transmitter with a total of $N$ available subcarriers. We assume that these $N$ subcarriers are divided into $B$ subbands. Let $B_k$ be the total number of subbands for the $k$th user. The $l$th subband contains $N_l$ subcarriers. For subband-wise filtering, a sequence of $N_l$ complex symbols is converted into a block of $N_l$ parallel symbols, and the duration of each block is $T_d$. Let $\mathbf{b}_k^l = [b_k^l(0), b_k^l(1), \dots, b_k^l(N_l - 1)]^\top \in \mathbb{C}^{N_l \times 1}$ be the data for the $k$th user in the $l$th subband with $\mathbb{E}[\mathbf{b}_k^l(\mathbf{b}_k^l)^{\dagger}] = \mathbf{I}_{N_l}$ and $\mathbb{E}[\mathbf{b}_k^l(\rho)(\mathbf{b}_k^l(\rho'))^{\dagger}] = \mathbf{0}_{N_l}, \forall \rho \neq \rho'$, where $(\cdot)^{\top}$ and $(\cdot)^{\dagger}$ denote the transpose and conjugate transpose, respectively; $\mathbb{E}[\cdot]$ represents the mathematical expectation; and $\mathbf{I}_N$ and $\mathbf{0}_N$ are the $N \times N$ identity and zero matrices, respectively. Finally, the baseband UFMC signal at the transmitter can be expressed as

$$s_k(t) = \sum_{l=1}^{B_k} \sum_{u=0}^{N_l-1} b_k^l(u) \exp(j2\pi u\Delta ft) w_k^l(t-uT_d), \quad (1)$$

where $w_k^l(t)$ is the pulse shaping filter used in the $l$th subband for the $k$th user. The subcarrier spacing $\Delta f$ maintains the subcarrier orthogonality such that $\Delta f = 1/T_d$ [17]. In vector-matrix form, (1) can be rewritten as $\mathbf{s}_k^l = \left((\mathbf{W}_k^l)^\top \otimes (\mathbf{V}^l)^{\dagger}\right) \mathbf{b}_k^l \in \mathbb{C}^{N \times 1}$, where $\otimes$ denotes the Hadamard product, and $\mathbf{W}_k^l \in \mathbb{C}^{N_l \times N}$ and $\mathbf{V}^l \in \mathbb{C}^{N_l \times N}$ are the filter matrix for the $k$th user in the $l$th subband and the Fourier matrix used in the $l$th subband, respectively.
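The vector-matrix form can be read element-wise as $[\mathbf{s}_k^l]_n = \sum_u [\mathbf{W}_k^l]_{u,n} [\mathbf{V}^l]^*_{u,n} b_k^l(u)$. The following Python sketch evaluates this sum directly. It is only an illustration under stated assumptions: the Fourier matrix entries are taken as $\exp(-j2\pi(u_0+u)n/N)$ with a hypothetical subband offset `u0`, every row of the filter matrix is assumed to hold the same window samples, and the function names are not from the paper.

```python
import cmath
import math

def fourier_matrix(n_l, n, u0=0):
    """N_l x N Fourier matrix; u0 is a hypothetical first-subcarrier index."""
    return [[cmath.exp(-2j * math.pi * (u0 + u) * k / n) for k in range(n)]
            for u in range(n_l)]

def filter_matrix(n_l, w):
    """N_l x N filter matrix; assumes every subcarrier sees the same window w."""
    return [list(w) for _ in range(n_l)]

def subband_signal(W, V, b):
    """Evaluate s = (W^T (Hadamard) V^dagger) b element by element."""
    n_l, n = len(V), len(V[0])
    return [sum(W[u][k] * V[u][k].conjugate() * b[u] for u in range(n_l))
            for k in range(n)]

# Usage: two subcarriers, rectangular window, only the second subcarrier active.
V = fourier_matrix(2, 8)
W = filter_matrix(2, [1.0] * 8)
s = subband_signal(W, V, [0, 1])
```

With a rectangular window the result reduces to an unnormalized IDFT of the subband data, which is a quick sanity check on the matrix orientation.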



Fig. 2. Proposed reconfigurable UFMC transmitter architecture.

# B. Pulse Shaping Filters

We incorporate the popular pulse shaping filters such as Flat Top  $(w_{\rm ft}(n))$ , Blackman-Harris  $(w_{\rm bh}(n))$ , Blackman  $(w_{\rm bl}(n))$ , Hamming  $(w_{\rm hm}(n))$ , and Hanning  $(w_{\rm hn}(n))$  filters for the proposed hardware-efficient UFMC architecture, where  $n = \{0, 1, 2, \ldots, (L-1)\}$  is the discrete time index and L represents the filter-length. The above filter functions are expressed as follows:

$$w_{\rm ft}(n) = a_0 - a_1 \cos(2\pi n/L) + a_2 \cos(4\pi n/L) - a_3 \cos(6\pi n/L) + a_4 \cos(8\pi n/L), \tag{2a}$$

$$w_{\rm bh}(n) = a_5 + a_6 \cos(2\pi n/L) + a_7 \cos(4\pi n/L) + a_8 \cos(6\pi n/L), \tag{2b}$$

$$w_{\rm bl}(n) = a_9 - a_{10}\cos(2\pi n/L) + a_{11}\cos(4\pi n/L), \tag{2c}$$

$$w_{\rm hm}(n) = a_{12} - a_{13}\cos(2\pi n/L), \tag{2d}$$

$$w_{\rm hn}(n) = a_{14} - a_{15}\cos(2\pi n/L), \tag{2e}$$

where $a_0, a_1, \ldots, a_{15}$ are the filter coefficient values [18]. In the proposed architecture, we introduce a unified pulse shape filtering approach that enables the selection of an appropriate pulse shaping filter (say, type-x) from the above-mentioned filters according to the FOMs, such as the -3 dB main-lobe power, highest side-lobe power, and side-lobe roll-off rate.
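As a rough illustration of (2a)-(2e), the Python sketch below evaluates each window as a sum of cosines. The coefficient values are commonly quoted textbook values and are an assumption here; they are not necessarily the values of [18]. Signs are folded into the stored coefficients so that, for example, the Hamming entry holds $[a_{12}, -a_{13}]$ and the Blackman-Harris entry holds $[a_5, a_6, a_7, a_8]$ with negative $a_6$ and $a_8$.

```python
import math

# Signed cosine-sum coefficients c_k, so that w(n) = sum_k c_k cos(2*pi*k*n/L).
# Values are assumed textbook constants, not taken from the paper.
COEFFS = {
    "flat_top": [0.21557895, -0.41663158, 0.277263158, -0.083578947, 0.006947368],
    "blackman_harris": [0.35875, -0.48829, 0.14128, -0.01168],
    "blackman": [0.42, -0.5, 0.08],
    "hamming": [0.54, -0.46],
    "hanning": [0.5, -0.5],
}

def window(name, length):
    """Return the samples w(0), ..., w(L-1) of the selected filter."""
    coeffs = COEFFS[name]
    return [sum(c * math.cos(2 * math.pi * k * n / length)
                for k, c in enumerate(coeffs))
            for n in range(length)]

# Usage: an 8-point Hamming window.
w_hm = window("hamming", 8)
```

Because all five filters share the same cosine arguments, a single selection input can switch between them by changing only the coefficient set, which is the idea behind the unified filtering approach.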

### C. CORDIC Algorithm

For the large number of computations involving the Fourier coefficients $\mathbf{V}^l$ and the filter coefficients $\mathbf{W}_k^l$, we use the well-known CORDIC algorithm [19], [20], which requires only shift and add operations to implement iterative vector rotation. The algorithm is carried out as a sequence of micro-rotation stages, each through a prefixed angle $\alpha_i$. The basic equations for the micro-rotation stages are $x_{i+1} = \cos \alpha_i (x_i - s_i y_i \tan \alpha_i)$, $y_{i+1} = \cos \alpha_i (s_i x_i \tan \alpha_i + y_i)$, and $z_{i+1} = z_i - s_i \alpha_i$, where $(x_{i+1}, y_{i+1})$ is the vector that results when the vector $(x_i, y_i)$ is rotated through an angle $\alpha_i = \tan^{-1}(2^{-i})$, $s_i \in \{+1, -1\}$ is the rotation direction given by the sign bit of $z_i$, and $i$ denotes the iteration stage, which runs from 0 to $(m-1)$, where $m$ is an integer equal to the bit-precision or the number of micro-rotations. In general, the factor $\cos \alpha_i$ is neglected when implementing the CORDIC iteration stages. The resulting scale factor $\mu = \prod_{i=0}^{m-1} \cos \alpha_i \approx 0.6073$ is compensated afterwards by the compensated CORDIC unit [19].
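The micro-rotation equations can be sketched in a few lines of Python. This is a floating-point model for illustration only: the multiplications by $2^{-i}$ stand in for the hardware shifts, the default of 16 iterations is an assumption, and the function names are hypothetical.

```python
import math

def cordic_rotate(x, y, z, m=16):
    """Rotation-mode CORDIC with the cos(alpha_i) factor neglected."""
    for i in range(m):
        s = 1 if z >= 0 else -1          # direction from the sign of z_i
        shift = 2.0 ** -i                # tan(alpha_i) = 2^-i, a shift in hardware
        x, y = x - s * y * shift, y + s * x * shift
        z -= s * math.atan(shift)        # alpha_i = atan(2^-i)
    return x, y, z

def scale_factor(m=16):
    """mu = product of cos(alpha_i) over the m micro-rotations."""
    mu = 1.0
    for i in range(m):
        mu *= math.cos(math.atan(2.0 ** -i))
    return mu

# Usage: rotate (1, 0) through 0.5 rad, then compensate the gain.
x, y, _ = cordic_rotate(1.0, 0.0, 0.5)
cos_est, sin_est = scale_factor() * x, scale_factor() * y
```

Starting from $(1, 0)$, the uncompensated outputs equal $\cos z_0/\mu$ and $\sin z_0/\mu$, so multiplying by $\mu$ recovers the cosine and sine. Basic rotation-mode CORDIC converges only for input angles of magnitude up to roughly 1.74 rad.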

# III. PROPOSED RECONFIGURABLE UFMC TRANSMITTER ARCHITECTURE

As illustrated in Fig. 2, the proposed architecture consists of two parts: 1) the angle generator unit for the baseband UFMC transmitter and 2) the reconfigurable IDFT integrated with reconfigurable unified filtering.



Fig. 3. Proposed (a) angle generator unit and (b) reconfigurable IDFT and filtering unit for the UFMC transmitter.

## A. Angle Generator Unit for the UFMC Transmitter

The angle generator unit, shown in Fig. 3(a), generates the cosine and sine arguments for the UFMC baseband signal in (1). The registers *REG1*, *REG2*, and *REG3* store the arguments required in (1). Specifically, this unit consists of two Hardwired Shifters (HSs), say *HS1* and *HS2*, three down counters, adders, and multiplexers. *HS1* and *HS2* generate the angle increments required for the angle sequence in (1). In the following, we detail the three down counters:

- 1) Down Counter1: This counter takes the number of active subcarriers in the subband, $N_l$, as input. The remaining $(N-N_l)$ subcarriers are zero-padded. The signals SEL1 and RD1 are the outputs of this counter.
- 2) Down Counter2: The number of subbands, i.e., $B_k$, is the input to this counter. This counter has the signals SEL2 and RD2 as its outputs.
- 3) Down Counter3: This counter takes the IDFT-size, i.e., $N$, as input. Its output, NXT\_FR, indicates the completion of one UFMC symbol.

# B. Reconfigurable architecture for IDFT and unified filtering

Fig. 3(b) illustrates the proposed reconfigurable architecture for the IDFT and unified pulse shaping filtering. The Master\_clock acts as the primary clock that drives the proposed architecture. We take $Z_{in}$, generated by the angle generator unit, as the input angle argument to the CORDIC 0 unit. The information bit Data\_in modulates the available orthogonal subcarriers $N_l$. Moreover, the output signal EOC indicates the completion of one full cycle of CORDIC iterations in the CORDIC 0 unit for each new argument $Z_{in}$. Note that the CORDIC 0 unit in the proposed architecture is controlled by an internal counter that keeps track of the number of CORDIC iterations. The other control signals associated with the reconfigurable IDFT and filtering unit and their functionalities are summarized in Table I. At different stages during the hardware computation of (1), the groups of registers {*REG4*, *REG6*, *REG8*, *REG10*} and {*REG5*, *REG7*, *REG9*, *REG11*} store the in-phase component (I-channel) and quadrature component (Q-channel) of the UFMC signal, respectively.

TABLE I

Control Signals with Their Sources and Functionality

| Signal | Source | Function |
|---|---|---|
| clk4_symb_gen | Clock generator unit | - The clock generator unit, with the help of Master_clk, generates this clock, which drives the IDFT and filtering unit. |
| clk4_filter | Clock generator unit | - This clock is generated by the clock generator unit with the help of Master_clk and is responsible for generating the sample points of the selected pulse shaping filter. |
| EOC | NOR-ing the output of the internal counter of compensated-circular *CORDIC 0* | - It stands for the end of computation.<br>- If high-state (i.e., 1), it updates *REG3* with the new value of the argument $Z_{in}$. It also updates the registers *REG4* and *REG5* with the new cumulative values of sin and cos of the current argument $Z_{in}$.<br>- If low-state (i.e., 0), the computation for the input argument $Z_{in}$ is in progress. Moreover, it also acts as the driving clock for *Down Counter1*. |
| SEL1 | NOR-ing the output of *Down Counter1* | - In the low-state (i.e., 0), it lets the registers *REG4* and *REG5* accumulate a sample point over all the subcarriers available in a subband. It denotes that the sample point of one subband of the available $B_k$ subbands is in progress.<br>- In the high-state (i.e., 1), it clears the registers *REG4* and *REG5* to zero for the accumulation of a sample point over all the subcarriers of the next subband. Note that before *REG4* and *REG5* are cleared, their contents are transferred to the registers *REG6* and *REG7*, respectively, by the clock RD1. |
| RD1 | OR-ing the output of *Down Counter1* | - It is the clock signal for the registers {*REG1*, *REG6*, *REG7*, *REG8*, *REG9*} and for *Down Counter2*, as shown in Fig. 3.<br>- If high-state (i.e., 1), it updates the register *REG1* with a new argument corresponding to the subcarrier sample point. This signal updates the contents of registers *REG6* and *REG7* with the corresponding values of registers *REG4* and *REG5*, which contain the cumulative value of all subcarriers in a subband. It also updates registers *REG8* and *REG9* with the new cumulative value of a sample point of a subband among the available $B_k$ subbands.<br>- If low-state (i.e., 0), it shows that the computation of the sample point of a subband is in progress. |
| SEL2 | NOR-ing the output of *Down Counter2* | - In the low-state (i.e., 0), it lets the registers *REG8* and *REG9* accumulate a sample point over all the subbands of the UFMC symbol. It denotes that the computation of a sample point of the UFMC symbol is in progress.<br>- In the high-state (i.e., 1), it clears the registers *REG8* and *REG9* to zero for the accumulation of the next sample point of the UFMC symbol. Note that before *REG8* and *REG9* are cleared, their contents are transferred to the registers *REG10* and *REG11*, respectively, by the clock RD2. |
| RD2 | OR-ing the output of *Down Counter2* | - If high-state (i.e., 1), it updates the register *REG2* with a new argument for the computation of a sample of the UFMC symbol and updates the registers *REG10* and *REG11* with the new cumulative value of a sample point of all $N_l$ subcarriers of all $B_k$ subbands of the UFMC symbol.<br>- If low-state (i.e., 0), it shows that the computation of the sample point of the UFMC symbol is in progress. |
| NXT_FR | NOR-ing the output of *Down Counter3* | - If high-state (i.e., 1), the next set of the user's data is sent to the IDFT and filtering unit.<br>- If low-state (i.e., 0), it shows that the UFMC symbol generation for the current set of Data_in is in progress. |
| Reset | External signal | - If high-state (i.e., 1), the proposed architecture is reset.<br>- If low-state (i.e., 0), it indicates that UFMC symbol generation is in progress. |
| enab | External signal | - If high-state (i.e., 1), it shows that UFMC symbol generation is in progress.<br>- If low-state (i.e., 0), the proposed architecture is in an inactive state. |
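The accumulate, latch, and clear behaviour of the register groups can be mimicked in software with three nested loops, one per down counter. The Python sketch below is a behavioural reading of Table I and not a cycle-accurate model: each complex variable stands for an I/Q register pair, `cmath.exp` replaces the CORDIC 0 unit, pulse shaping is left out for brevity, and the data layout and function name are hypothetical.

```python
import cmath
import math

def ufmc_symbol_sketch(subbands, n_fft):
    """subbands: list of subbands, each a list of (subcarrier_index, symbol)."""
    out = []
    for n in range(n_fft):                    # Down Counter3: N samples per symbol
        reg8_9 = 0j                           # accumulates over subbands
        for band in subbands:                 # Down Counter2: B_k subbands
            reg4_5 = 0j                       # accumulates over subcarriers
            for u, b in band:                 # Down Counter1: N_l subcarriers
                theta = 2 * math.pi * u * n / n_fft
                reg4_5 += b * cmath.exp(1j * theta)
            reg6_7 = reg4_5                   # RD1 latches the subband sum, SEL1 clears
            reg8_9 += reg6_7
        reg10_11 = reg8_9                     # RD2 latches the output sample, SEL2 clears
        out.append(reg10_11)
    return out

# Usage: two single-subcarrier subbands in a 4-point symbol.
samples = ufmc_symbol_sketch([[(0, 1)], [(2, 1)]], 4)
```

Changing the number of subcarriers per subband only changes the loop bounds, which mirrors the claim that the counters' initial values, rather than the hardware resources, set $N_l$, $B_k$, and $N$.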

# C. Reconfigurable Pulse Shaping Filters Architecture

Fig. 4 illustrates the proposed architecture for the unified pulse shaping filter in the baseband UFMC transmitter. This architecture has four compensated-circular CORDIC units and an angle generator unit. The proposed architecture has the flexibility of a) generating an arbitrary filter-length up to $L$ and b) choosing the filter type from among five filters<sup>2</sup> (namely the Flat Top, Blackman-Harris, Blackman, Hamming, and Hanning filters). The user selects one of these five filters through a filter-selection input, i.e., ws. Afterward, the decoder output controls the clock and the coefficients supplied to the CORDIC units. The coefficient selection unit sends the corresponding coefficients of the pulse shaping filter, $\{a, \beta_1, \beta_2, \beta_3, \beta_4\}$, to the CORDIC units based on the decoder output. The gated-clock unit controls the clock input to the CORDIC units and keeps the unused CORDIC units in an idle state, minimizing the dynamic power dissipation in the hardware.

Fig. 4. The proposed reconfigurable popular pulse shaping filters architecture.

<sup>2</sup>To obtain the corresponding time samples of the Dolph-Chebyshev filter, we need to apply a DFT on the samples of the frequency-domain Dolph-Chebyshev filter and afterwards scale to unity peak amplitude. Thus, the design of a reconfigurable Dolph-Chebyshev filter in the time domain, which is itself a challenging task, is part of our future work.

Fig. 5. Angle generator unit for the pulse shaping filters.

Fig. 6. Hard-wired shifter.

The signal EOC0 represents the end of a computation of the CORDIC iteration within a pulse shaping filter unit and acts as the clock signal for the angle generator unit of the pulse shaping filters. We take the initial CORDIC input vector $(x_0, y_0)$ as (1, 0), whose components are represented as 16'd1 and 16'd0, respectively, considering a 16-bit data-length (see Fig. 4). Further, the angle generator unit for the selected filter generates the cosine arguments for (2).

1) Angle generator for pulse shaping filters: This unit takes the pulse shaping filter-length $L$ as input. Note that the cosine terms in (2) have the arguments $\{2\pi n/L, 4\pi n/L, 6\pi n/L, 8\pi n/L\}$. The angle sequences for the sample points are expressed as follows: $\theta_1(n+1) = \theta_1(n) + 2\pi/L$, $\theta_2(n+1) = \theta_2(n) + 4\pi/L$, $\theta_3(n+1) = \theta_3(n) + 6\pi/L$, and $\theta_4(n+1) = \theta_4(n) + 8\pi/L$, where the arguments $\theta_1(n)$, $\theta_2(n)$, $\theta_3(n)$, and $\theta_4(n)$ are zero for $n = 0$ and take values in $[0, 2\pi]$ for $n > 0$. A hard-wired shifter, *HS0*, generates the angle increment required in the angle sequence of $\theta_1$ at every sample point of the pulse shaping filter. The input to *HS0* is the filter-length $L$ and the output is $2\pi/L$. The input and output binary sequences of the hard-wired shifter are weighted as shown in Fig. 6. The angle sequences $\theta_2$ and $\theta_4$ are obtained by left shifting $\theta_1$ by 1 bit and 2 bits, respectively. Moreover, the angle sequence $\theta_3$ is obtained as $\theta_3 = \theta_1 + \theta_2$, as shown in Fig. 5.
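The shift-and-add relationships between the four angle sequences can be sketched with integer arithmetic. The Python sketch below makes two assumptions that the text does not state: angles are held as unsigned 16-bit binary fractions of a full turn (so $2\pi$ corresponds to $2^{16}$ and wrap-around is a bit mask), and $L$ is a power of two so that $2\pi/L$ is itself a pure shift. The names are hypothetical.

```python
D = 16                    # assumed data-length in bits
FULL_TURN = 1 << D        # assumed fixed-point code for 2*pi
MASK = FULL_TURN - 1      # masking implements the wrap-around modulo 2*pi

def filter_angles(length):
    """Yield (theta1, theta2, theta3, theta4) codes for n = 0, ..., L-1."""
    if length <= 0 or length & (length - 1):
        raise ValueError("this sketch assumes L is a power of two")
    increment = FULL_TURN // length           # 2*pi/L, the role of the shifter
    theta1 = 0
    for _ in range(length):
        theta2 = (theta1 << 1) & MASK         # 1-bit left shift of theta1
        theta4 = (theta1 << 2) & MASK         # 2-bit left shift of theta1
        theta3 = (theta1 + theta2) & MASK     # one adder: theta1 + theta2
        yield theta1, theta2, theta3, theta4
        theta1 = (theta1 + increment) & MASK

# Usage: convert the codes of an 8-point filter to radians.
TWO_PI = 6.283185307179586
radians = [[code * TWO_PI / FULL_TURN for code in row] for row in filter_angles(8)]
```

Only $\theta_1$ needs an accumulator in this arrangement; the other three sequences follow from it by wiring and a single addition.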

2) Parallel scale-factor compensation for the CORDIC architecture: The scale-factor is compensated using the parallel scale-factor compensation architecture [19]. We consider $\beta = \cos^{-1}(a_x\mu)$, where $a_x \in \{a_1, a_2, a_3, a_4, a_6, a_7, a_8, a_{10}, a_{11}, a_{13}, a_{15}\}$. Therefore, the output of an individual compensated-CORDIC unit is generated as $a_x \cos \theta$, where $\theta$ is the input argument to the CORDIC unit. In this way, we avoid the use of a direct multiplier, as in [19], to compensate the scale factor as well as to multiply the cosine terms by $a_x$. As shown in Fig. 4, the coefficient selection control unit in the proposed architecture has the outputs $\{a, \beta_1, \beta_2, \beta_3, \beta_4\}$, where $a \in \{a_0, a_5, a_9, a_{12}, a_{14}\}$, $\beta_1 \in$ 



Fig. 7. Architecture for Digilent Pmod-DA2 DAC interfacing with the UFMC transmitter output.



Fig. 8. Time diagram for the Digilent Pmod-DA2 DAC interfacing.



Fig. 9. Experimental set up for the FPGA prototyping of the proposed architecture for UFMC transmitter.

 $\{\cos^{-1}(-a_1\mu), \cos^{-1}(a_6\mu), \cos^{-1}(-a_{10}\mu), \cos^{-1}(-a_{13}\mu), \cos^{-1}(-a_{15}\mu)\}, \beta_2 \in \{\cos^{-1}(a_2\mu), \cos^{-1}(a_7\mu), \cos^{-1}(a_{11}\mu)\}, \beta_3 \in \{\cos^{-1}(-a_3\mu), \cos^{-1}(a_8\mu)\}, \text{ and } \beta_4 \in \{\cos^{-1}(a_4\mu)\}.$ This arrangement of coefficient multiplication within the pulse shaping filter reduces the additional latency compared to the direct multipliers.
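The idea behind folding the coefficient into the angle $\beta$ can be checked numerically. The sketch below assumes that $\mu$ is the reciprocal of the CORDIC gain $K$ and that the two parallel units are fed $(\theta + \beta)$ and $(\theta - \beta)$ and then averaged; the function names are hypothetical, and the arithmetic is floating point rather than 16-bit. It relies on the identity $\cos(\theta+\beta) + \cos(\theta-\beta) = 2\cos\theta\cos\beta$.

```python
import math

M = 16  # number of CORDIC iterations (assumed equal to the 16-bit data-length)
ALPHAS = [math.atan(2.0 ** -i) for i in range(M)]             # micro-rotation angles
K = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(M))   # CORDIC gain, about 1.6468

def cordic_x(theta):
    """Uncompensated rotation-mode circular CORDIC; returns x_m, close to K*cos(theta).

    Valid only while |theta| stays inside the convergence range (about 1.74 rad).
    """
    x, y, z = 1.0, 0.0, theta
    for i in range(M):
        s = 1.0 if z >= 0.0 else -1.0
        x, y = x - s * y * 2.0 ** -i, y + s * x * 2.0 ** -i
        z -= s * ALPHAS[i]
    return x

def compensated_cos(theta, a, mu=1.0 / K):
    """Return approximately a*cos(theta) using two CORDIC runs and a halving."""
    beta = math.acos(a * mu)                     # beta = arccos(a * mu)
    # K*cos(theta+beta) + K*cos(theta-beta) = 2*K*cos(theta)*cos(beta) = 2*a*cos(theta)
    return 0.5 * (cordic_x(theta + beta) + cordic_x(theta - beta))
```

For example, `compensated_cos(0.3, 0.5)` should agree with `0.5 * math.cos(0.3)` to within the angle approximation error of the 16 iterations. No multiplication by the coefficient or by $1/K$ appears in the data path of the sketch; the halving corresponds to a one-bit shift.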

### D. Digital-to-Analog (DAC) interfacing

We use the Digilent Pmod-DA2 digital-to-analog converter $(DAC)^3$, a 12-bit DAC powered by the Texas Instruments DAC121S101<sup>4</sup>, for analog conversion of the baseband UFMC digital signal for both the I-channel (i.e., D\_I) and the Q-channel (i.e., D\_Q) simultaneously. The Pmod-DA2 interfacing is performed by two 16-bit control-word registers and two parallel-to-serial converters. A clock generator unit is designed to synchronize all the modules in the proposed architecture. The serial clock to the DAC, i.e., SCLK4DAC, and the signal SYNC\_BAR control the timing for the conversion of the digital

<sup>3</sup>DIGILENT Pmod DA2 Reference Manual, Accessed 20 Feb. 2018. [Online]. https://reference.digilentinc.com/reference/pmod/pmodda2/referencemanual

<sup>4</sup>TEXAS INSTRUMENTS–DAC121S101/-Q1 12-Bit Micro Power, RRO Digital-to-Analog Converter, Accessed 20 Feb. 2018. [Online]. https://reference.digilentinc.com/reference/pmod/pmodda2/reference-manual



Fig. 10. MATLAB simulation and experimental output of the baseband UFMC signal captured through Tektronix MSO2024 for the I- and Q-components with the Blackman window.

signal to the analog signal. When the SYNC\_BAR signal goes to the low state, the serial data enters the Pmod-DA2 DAC. When the SYNC\_BAR signal goes to the high state, the DAC converts the stored binary digits to the analog output.
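The role of the 16-bit control-word register and the parallel-to-serial converter can be illustrated with a short sketch. The bit layout used here (two unused leading bits, two power-down mode bits, then the 12 data bits, shifted out MSB first) is an assumption based on our reading of the DAC121S101 datasheet and is not stated in the text above; the function names are hypothetical.

```python
def dac_control_word(sample12, power_mode=0b00):
    """Pack a 12-bit sample into a 16-bit control word (assumed layout).

    Bits 15..14: unused, bits 13..12: power-down mode (00 = normal operation),
    bits 11..0: data.
    """
    if not 0 <= sample12 < 4096:
        raise ValueError("sample must fit in 12 bits")
    return ((power_mode & 0b11) << 12) | sample12

def serialize_msb_first(word16):
    """Model the parallel-to-serial converter: one bit per serial clock, MSB first."""
    return [(word16 >> bit) & 1 for bit in range(15, -1, -1)]
```

In this model, the 16 bits returned by `serialize_msb_first` would be shifted out while SYNC\_BAR is low, one list per channel (D\_I and D\_Q).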

# **IV. FPGA PROTOTYPING RESULTS**

We prototype the proposed UFMC transmitter architecture in the Verilog hardware description language on the XILINX platform with a 16-bit data-length. The architecture can handle maximum values of N and L equal to $2^{16-1} = 2^{15} = 32768$; each additional bit of data-length doubles this maximum value. Based on the first-order analysis without considering DAC interfacing, we observe that the proposed architecture for a 16-bit data-length requires 1314 flip-flops, 80 full-adders of size 16-bit, 3 hard-wired shifters, 2 ROMs of $16 \times 16$-bit size, and 36 variable shifters of 16-bit data-length. The interested reader may refer to [19] for the first-order analysis and device utilization of the basic CORDIC architecture used in the proposed UFMC architecture. Furthermore, the critical path delay of the proposed architecture consists only of the adder/subtracter delay and the shifter delay due to the use of the CORDIC algorithm.

*FPGA prototyping:* We evaluate the FPGA prototype of the proposed architecture with a master clock frequency of 120 MHz using a Tektronix AFG3252 arbitrary function generator, under the assumption that the kth user has $B_k = 4$ subbands, each subband contains $N_l = 8$ subcarriers, and N = 1024. Fig. 9 shows the experimental setup, which consists of an FPGA board, a Tektronix AFG3252 function generator, a Tektronix MSO2024 mixed signal oscilloscope, a Tektronix RSA3303B spectrum analyzer, and a Digilent Pmod-DA2 digital-to-analog converter (DAC) based on the Texas Instruments DAC121S101.

We consider the Binary Phase Shift-Keying  $(BPSK)^5$ . For the experimental purpose, the input data for the *k*th user is

#### TABLE II FIGURE-OF-MERIT FOR UFMC BASEBAND SIGNAL

| Pulse shaping filter used | Main-lobe power (dBm) | Occupied bandwidth (kHz) | Highest side-lobe power (dBm) | Main-lobe and maximum side-lobe difference (dB) |
|---------------------------|-----------------------|--------------------------|-------------------------------|--------------------------------------------------|
| Rectangular               | 6.05                  | <b>370.743</b>           | $-19.78$                      | 25.83                                            |
| Flat-top                  | <b>6.11</b>           | 380.996                  | $-21.32$                      | <b>27.43</b>                                     |
| Blackman-Harris           | 6.07                  | 378.377                  | $-21.20$                      | 27.27                                            |
| Blackman                  | 6.03                  | 380.675                  | $-21.18$                      | 27.21                                            |
| Hamming                   | 6.05                  | 374.460                  | $-20.97$                      | 27.02                                            |
| Hanning                   | 6.04                  | 377.783                  | $-20.89$                      | 26.93                                            |

#### TABLE III DEVICE UTILIZATION SUMMARY

| Units                  | Used | Available | % of Utilization |
|------------------------|------|-----------|------------------|
| # Slice Registers      | 1000 | 12480     | 8                |
| # Slice look-up tables | 3002 | 12480     | <b>24</b>        |
| # Flip-flop pairs      | 929  | 3070      | 30               |
| # Block RAM            | 4    | 148       | 3                |

assumed as:


$$\begin{aligned} \mathbf{b}_{k} &= \left[\mathbf{b}_{k}^{1}, \mathbf{b}_{k}^{2}, \mathbf{b}_{k}^{3}, \mathbf{b}_{k}^{4}\right], \left[\mathbf{b}_{k}^{1}, \mathbf{b}_{k}^{2}, \mathbf{b}_{k}^{3}, \mathbf{b}_{k}^{4}\right] \dots \\ &= \left[00001000, 11000110, 10100100, 00101000\right], \\ &\quad\; \left[01001010, 01000010, 10000100, 00100000\right] \dots \end{aligned} \tag{3}$$
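As an illustration of how the bit pattern in (3) could be turned into per-subcarrier phases, the following sketch maps each bit of a subband word to a BPSK phase. The mapping convention (bit 0 to phase 0, bit 1 to phase $\pi$) and the names `bpsk_phases` and `frame_phases` are assumptions for illustration; the text does not specify the convention used in the prototype.

```python
import math

N_L = 8   # subcarriers per subband
B_K = 4   # subbands of the kth user

# First group of subband words from (3)
FRAME_0 = ["00001000", "11000110", "10100100", "00101000"]

def bpsk_phases(bits):
    """Map a bit string to one phase per subcarrier (assumed: '0' -> 0, '1' -> pi)."""
    return [0.0 if bit == "0" else math.pi for bit in bits]

def frame_phases(frame):
    """Return a B_K x N_L list of phases, one row per subband."""
    return [bpsk_phases(subband) for subband in frame]
```

Each phase would then be added to the angle fed to the corresponding CORDIC unit, so that the cosine output changes sign for a phase of $\pi$.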

The BPSK-modulated FPGA output of the baseband signal and the MATLAB simulation results are comparable, as shown in Fig. 10. The I- and Q-channel outputs of the baseband signal are acquired for the bit-pattern $b_k$ shown in (3). Furthermore, we examine the real-time testing of the I- and Q-channel outputs of the UFMC baseband signal at the transmitter with different pulse shaping filters. The FOM parameters for the spectrum of the baseband signal are depicted in Figs. 11(a), 11(b), and 11(c). Moreover, the main-lobe power (in dBm), occupied bandwidth (OBW) (in kHz), and maximum side-lobe power (in dBm) for the spectrum of the baseband UFMC signal with different pulse shaping filters are measured and summarized in Table II.

Hardware utilization in the proposed architecture: The total hardware utilization, based on the physical synthesis report generated by XILINX XST for the implementation of the proposed architecture with Digilent Pmod-DA2 DAC interfacing targeting the FPGA device XC5VLX20T-2FF323, is presented in Table III. Moreover, Table IV summarizes the comparison between the state-of-the-art [12] and the proposed architecture for the baseband UFMC transmitter. The throughput of the proposed architecture is 7.5 MSps (mega samples per second) or 120 Mbps at a 120 MHz frequency of operation. Note that the speed can be further enhanced with pipelined and other variants of the CORDIC architecture with fast adders. For example, the pipelined architecture can achieve 16 times higher speed for the experimented 16-bit architecture [21]. Moreover, we aim to extend the proposed architecture to design real-time and high-speed pipelined architectures combined with high-speed DACs [22], [23] for UFMC transceivers. It will be an interesting extension to study how data- and process-level pipelining [14] can be leveraged to increase the operational frequency of the proposed architecture.
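The reported throughput figures are consistent with the following arithmetic, under the assumption (ours, not stated explicitly above) that the iterative 16-bit CORDIC needs 16 clock cycles per output sample and that each sample is 16 bits wide:

$$\frac{120\ \text{MHz}}{16\ \text{cycles/sample}} = 7.5\ \text{MSps}, \qquad 7.5\ \text{MSps} \times 16\ \text{bits/sample} = 120\ \text{Mbps}.$$

Under the same assumption, a fully pipelined CORDIC that delivers one sample per clock cycle would give the 16-fold speed-up mentioned above.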

# V. ERROR ANALYSIS FOR THE PROPOSED ARCHITECTURE

Approximation Error: In the CORDIC algorithm, the value of  $\sum_{i=0}^{m-1} s_i \alpha_i$  aims to reach the initial argument  $z_0$  with

<sup>5</sup>However, our proposed architecture can support M-ary phase-shift keying (PSK) modulation schemes. In addition, the proposed architecture supports quadrature amplitude modulation (QAM). To explain, for QAM, we need a QAM mapper with Data\_in as input and outputs in polar form. Then, the argument part can be added to $Z_{in}$ in place of the M-ary PSK mapper phase in the angle generator unit. The amplitude part can be handled by the scale-factor compensation of the CORDICO unit using the updated $\hat{\beta} = \arccos(\text{amplitude of mapped data} \times \mu)$.



Fig. 11. FOM for the spectrum of the baseband UFMC signal with the Blackman pulse shaping filter.

#### TABLE IV COMPARISON WITH STATE-OF-THE-ART UFMC HARDWARE ARCHITECTURE

|                                         | ROM-based UFMC Transmitter Architecture [12] | Proposed Reconfigurable UFMC Transmitter Architecture |
|-----------------------------------------|----------------------------------------------|--------------------------------------------------------|
| Storage for filter coefficients         | L (i.e., filter-length) ROM locations are required to store the filter coefficients. | Only d (i.e., data-length) ROM locations are required to store the micro-rotation angles $\alpha_i$, and only 16 additional ROM locations are needed to store the filter coefficients. |
| Frequency shifting coefficients storage | For $N = 1024$, $2 \times 86$ ROM locations are required (86 locations each for the in-phase and quadrature parts). | No storage is required; frequency shifting and filtering are done online by the CORDIC unit, with only 16 ROM locations internal to the CORDIC block, considering $d = 16$. |
| IDFT implementation                     | 5120 butterflies are required for $N = 1024$, which results in $4 \times 5120$ direct multipliers for both the in-phase and quadrature parts. | The CORDIC algorithm avoids the use of direct multipliers and produces sine and cosine outputs. |
| Filtering                               | Requires multiplication of the subcarriers with the filter coefficients at each sample point after the IDFT stage. | No separate multiplication with the filter coefficients is required. Online filtering is achieved at each sample point of the subcarriers; filter coefficient multiplication and frequency shifting are done simultaneously. |
|                                         | The filter sample points are stored in ROM locations, resulting in a fixed <i>L</i>-length filter. | The filter length is flexible, and the user can choose any filter-length up to $L = 2^{(d-1)}$. |
| Reconfigurability                       | Only the Dolph-Chebyshev filter was considered. | We can select any filter among five pulse shaping filters (i.e., Flat Top, Blackman, Blackman-Harris, Hann, and Hamming filters). |
|                                         | The total number of used subcarriers in the UFMC system is fixed due to the fixed IDFT-size. | The total number of subcarriers in the reconfigurable UFMC system is flexible; the user can choose an IDFT-size up to $N = 2^{(d-1)}$. |

the iteration stages. An angle approximation error is caused by the residual angle through which the vector still has to be rotated after the predefined micro-rotations. For an ideal output, the residual angle that leads to the approximation error must be zero. Then, the angle approximation error [21], [24] is expressed as $e_{ap} = \theta - \sum_{i=0}^{m-1} s_i \alpha_i$, where $\theta = z_0$. Without considering the truncation error in the angle representation, the error-bound for the 16-bit circular CORDIC architecture is [24] $|e_{ap}| \leq \tan^{-1}(2^{-15}) = 3.051 \times 10^{-5}$. Using (4), the resulting vector after the *i*th iteration becomes $\mathbf{v}_{(i+1)} = \mathbf{p}_i \mathbf{v}_i$, where $\mathbf{v}_i = [x_i \quad y_i]^{\top}$ is the CORDIC input vector and $\mathbf{p}_i$ is the rotation matrix [24] during the *i*th CORDIC iteration stage and is expressed as

$$\mathbf{p}_i = \begin{bmatrix} 1 & -\eta \, s_i \, 2^{-i} \\ s_i \, 2^{-i} & 1 \end{bmatrix}. \tag{4}$$
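The angle approximation bound quoted above can be checked numerically with a small sketch. It tracks only the angle accumulator of a rotation-mode circular CORDIC with $m = 16$ micro-rotations in floating point (so no truncation error is included); the function name `residual_angle` is hypothetical.

```python
import math

M = 16                                              # number of micro-rotations
ALPHAS = [math.atan(2.0 ** -i) for i in range(M)]   # alpha_i = atan(2^-i)
BOUND = math.atan(2.0 ** -(M - 1))                  # atan(2^-15), about 3.05e-5

def residual_angle(theta):
    """Return e_ap = theta - sum(s_i * alpha_i) after M micro-rotations."""
    z = theta
    for alpha in ALPHAS:
        s = 1.0 if z >= 0.0 else -1.0   # rotation direction s_i
        z -= s * alpha
    return z
```

For any input angle inside the CORDIC convergence range (roughly $\pm 1.74$ rad), the magnitude of `residual_angle(theta)` is expected to stay at or below `BOUND`.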

Let  $\mathbf{v}_m^*|_{e_{ap}=0} = \Theta \mathbf{v}_m$  be the actual value of the approximated final vector  $\mathbf{v}_m$  after the *m*th rotation, where

$$\mathbf{\Theta} = \begin{bmatrix} \cos e_{\mathrm{ap}} & s_{(m-1)} \sin e_{\mathrm{ap}} \\ -s_{(m-1)} \sin e_{\mathrm{ap}} & \cos e_{\mathrm{ap}} \end{bmatrix}. \tag{5}$$

Therefore, the approximation error is expressed as  $\mathbf{v}_m^* - \mathbf{v}_m = (\mathbf{\Theta} - \mathbf{I}_2) \mathbf{v}_m$  and its absolute value becomes

$$|\mathbf{v}_m^* - \mathbf{v}_m| = \|(\mathbf{\Theta} - \mathbf{I}_2)\| \, |\mathbf{v}_m| = e_{\mathrm{ap}} \mathbf{v}_m \tag{6}$$
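The last step in (6) can be made explicit. As a sketch, assuming the spectral norm and using $s_{(m-1)}^2 = 1$, the matrix $\mathbf{\Theta} - \mathbf{I}_2$ is a scaled rotation, so that

$$\|\mathbf{\Theta} - \mathbf{I}_2\|_2 = \sqrt{(\cos e_{\mathrm{ap}} - 1)^2 + \sin^2 e_{\mathrm{ap}}} = \sqrt{2 - 2\cos e_{\mathrm{ap}}} = 2\left|\sin\frac{e_{\mathrm{ap}}}{2}\right| \approx |e_{\mathrm{ap}}|,$$

where the approximation holds for small $e_{\mathrm{ap}}$, as is the case for the bound $|e_{\mathrm{ap}}| \leq \tan^{-1}(2^{-15})$.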

*Truncation error:* Truncation error occurs due to the fixed size representation of the data-path width. We consider the error propagation mathematical model as in [24]. The truncation error is expressed as  $2^{-(d'+1)}$ , where d' is the number of bits for the fractional representation of  $x_i$  and  $y_i$ . In our 16-bit architecture, two MSB represent the sign and integer



Fig. 12. Analytical and experimental absolute error of the envelope of UFMC signal at each sample point for N = 1024,  $N_l = 8$ , and  $B_k = 4$ .

bit, respectively; the rest of the bits correspond to the fractional representation. Thus, the truncation error in the vector's representation becomes $2^{-(d-2+1)} = 2^{-15}$. We consider that $e_{x_i}$ and $e_{y_i}$ are the truncation errors that occur at the *i*th iteration due to the quantized representation of $x_i$ and $y_i$, respectively. Thereafter, we define the quantization operator Q[.] as $Q[\mathbf{v}_i] = \mathbf{v}_i + \mathbf{e}_i$, where $\mathbf{e}_i = [e_{x_i} \quad e_{y_i}]^{\top}$. Finally, the overall truncation error

TABLE V Worst-case Absolute Error in Magnitude of Envelope of Baseband UFMC Signal

| Filters used    | I-component error ( $\times 10^{-4}$ ) | <b>Q-component error</b> ( $\times 10^{-4}$ ) |
|-----------------|----------------------------------------|-----------------------------------------------|
| Rectangular     | 5.3                                    | 5.3                                           |
| Flat Top        | 10.7                                   | 7.9                                           |
| Blackman-Harris | 8.2                                    | 9.2                                           |
| Blackman        | 7.7                                    | 7.0                                           |
| Hamming         | 7.0                                    | 6.5                                           |
| Hanning         | 7.9                                    | 5.7                                           |

after the *m*th rotation is expressed as [24]:

$$f(m) = Q\left[\mathbf{v}_m\right] - \mathbf{v}_m = \mathbf{e}_m + \sum_{j=1}^{m-1} \left(\prod_{i=j}^{m-1} \mathbf{p}_i\right) \mathbf{e}_j. \tag{7}$$

Note that f(m) includes all truncation errors that occurred during the previous iterations as well as in the *m*th rotation. Taking $\eta = 1$ and without considering the scale-factor compensation effect, the total quantization error due to the angle approximation error and the truncation error in the circular CORDIC, using (6) and (7), becomes $e_{\text{circular}} = e_{ap}\mathbf{v}_m + f_{\text{circular}}(m)$, where $f_{\text{circular}}(m)$ is the truncation error in the circular CORDIC unit after the *m*th rotation.
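The combined effect of angle approximation and truncation can be observed with a small fixed-point model. This is a sketch under stated assumptions and not the prototype itself: $x$ and $y$ are held as integers with 14 fractional bits, shifts truncate toward minus infinity (so the per-stage error can reach one LSB, $2^{-14}$, rather than the half-LSB figure used in the analysis), the angle accumulator is kept in floating point, and no scale-factor compensation is applied. The name `cordic_fixed` is hypothetical.

```python
import math

D = 16                 # data-length in bits
FRAC = D - 2           # fractional bits (two MSBs: sign and integer bit)
M = 16                 # number of CORDIC iterations
ALPHAS = [math.atan(2.0 ** -i) for i in range(M)]
K = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(M))  # CORDIC gain

def cordic_fixed(theta):
    """Rotation-mode circular CORDIC with x, y stored as integers scaled by 2**FRAC."""
    x, y, z = 1 << FRAC, 0, theta          # (x0, y0) = (1, 0)
    for i in range(M):
        s = 1 if z >= 0.0 else -1
        # Arithmetic right shift models the truncating variable shifter
        x, y = x - s * (y >> i), y + s * (x >> i)
        z -= s * ALPHAS[i]
    return x / 2 ** FRAC, y / 2 ** FRAC    # approximately (K*cos, K*sin)

def worst_case_error(num_points=200):
    """Largest deviation from the ideal K*cos / K*sin over a sweep of angles."""
    worst = 0.0
    for k in range(num_points + 1):
        theta = -1.5 + 3.0 * k / num_points
        x, y = cordic_fixed(theta)
        worst = max(worst, abs(x - K * math.cos(theta)), abs(y - K * math.sin(theta)))
    return worst
```

Calling `worst_case_error()` gives an empirical figure that can be set against the analytical total $e_{\text{circular}}$; in this model it should remain within a few multiples of $16 \times 2^{-14}$, since each of the 16 stages contributes at most about one LSB before being scaled by the CORDIC gain.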

As shown in [19], the scale-factor is compensated by two parallel circular CORDIC units and an additional hard-shifter. The two parallel circular CORDIC units have the individual input angles $(\theta + \beta)$ and $(\theta - \beta)$, respectively. We denote the total quantization errors in the outputs of these two circular CORDIC units as $e_{\beta_+}$ and $e_{\beta_-}$, respectively. Thus, the absolute quantization error in a compensated circular CORDIC becomes $\hat{e}_{\text{circular}} = \frac{1}{2} (e_{\beta_+}^2 + e_{\beta_-}^2)^{\frac{1}{2}}$. Moreover, we calculate the truncation error due to the fixed-size representation of $a \in \{a_0, a_5, a_9, a_{12}, a_{14}\}$ as $e_a = 2^{-(d-1)}$.

Analytical error in pulse shaping filters: In the proposed unified pulse shaping filter architecture, there are four compensated circular CORDIC units. Denote the absolute quantization error of the individual compensated circular CORDIC unit as $\hat{e}_{\text{circular},\sigma}$, $\sigma \in \{1, 2, 3, 4\}$. We denote $e_{\rm ft}$, $e_{\rm bh}$, $e_{\rm bl}$, $e_{\rm hm}$, and $e_{\rm hn}$ as the analytical total absolute errors in the Flat Top, Blackman-Harris, Blackman, Hamming, and Hanning pulse shaping filters, respectively. These errors are expressed as: $e_{\rm ft}(n) = (e_a^2 + \sum_{\sigma=1}^4 \hat{e}_{\text{circular},\sigma}^2)^{\frac{1}{2}}$, $e_{\rm bh}(n) = (e_a^2 + \sum_{\sigma=1}^3 \hat{e}_{\text{circular},\sigma}^2)^{\frac{1}{2}}$, $e_{\rm bl}(n) = (e_a^2 + \sum_{\sigma=1}^2 \hat{e}_{\text{circular},\sigma}^2)^{\frac{1}{2}}$, $e_{\rm hm}(n) = (e_a^2 + \hat{e}_{\text{circular},1}^2)^{\frac{1}{2}}$, and $e_{\rm hn}(n) = (e_a^2 + \hat{e}_{\text{circular},1}^2)^{\frac{1}{2}}$.

Comparison of analytical error with experimental error in the proposed UFMC architecture: Let s(l, u, n) be the amplitude of the *n*th sample point for the *u*th subcarrier in the *l*th subband of the IDFT unit. Moreover, we denote $w_x(n)$ as the amplitude of the *n*th sample point of the type-x pulse shaping filter. To obtain the absolute error in s(l, u, n), the relative errors are computed at each sample point of the IDFT unit and the type-x pulse shaping filtering unit. Therefore, the total absolute error for the baseband UFMC signal is expressed as

$$\begin{aligned} |e(n)| = {} & \frac{1}{\sqrt{N_l}} \sum_{l=1}^{B_k} \sum_{u=0}^{N_l-1} |s(l, u, n)\, w_x(n)| \\ & \times \sqrt{\left(e_s(l, u, n)/s(l, u, n)\right)^2 + \left(e_x(n)/w_x(n)\right)^2}, \end{aligned} \tag{8}$$

where $e_s(l, u, n)$ is the quantization error in the *n*th sample point of the *u*th subcarrier of the *l*th subband of the IDFT unit and $e_x(n)$ is the absolute error of the *n*th sample point for the type-x pulse shaping filtering unit. Fig. 12 compares the analytical and experimental errors in the I- and Q-components for the envelope of the baseband UFMC signal with the proposed architecture with the Blackman window. We compute the analytical error using (8). The experimental error is computed as the difference between the envelope of the MATLAB output and the FPGA simulation output at each of the 1024 UFMC sample points. Moreover, Table V summarizes the worst-case error in the magnitude of the baseband UFMC signal with the proposed 16-bit architecture compared to MATLAB simulations with different pulse shaping filters.
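The evaluation of (8) at a single sample point n can be sketched as below. The function name `total_abs_error` and the nested-list data layout are assumptions for illustration. The sketch uses the algebraically equivalent form $\sqrt{(e_s w_x)^2 + (e_x s)^2}$ for each term, which equals $|s\,w_x|\sqrt{(e_s/s)^2 + (e_x/w_x)^2}$ whenever $s$ and $w_x$ are nonzero and avoids a division by zero at sample points where either amplitude vanishes.

```python
import math

def total_abs_error(s, w, e_s, e_x, n_l):
    """Evaluate |e(n)| of (8) at one sample point n.

    s    : s[l][u], subcarrier amplitudes for each subband l and subcarrier u
    w    : filter amplitude w_x(n)
    e_s  : e_s[l][u], quantization errors of the IDFT samples
    e_x  : absolute error of the filter sample
    n_l  : number of subcarriers per subband (N_l)
    """
    total = 0.0
    for s_row, e_row in zip(s, e_s):
        for s_lu, e_lu in zip(s_row, e_row):
            # |s*w| * sqrt((e_s/s)^2 + (e_x/w)^2) == hypot(e_s*w, e_x*s)
            total += math.hypot(e_lu * w, e_x * s_lu)
    return total / math.sqrt(n_l)
```

Sweeping n over the 1024 sample points with this function would give an analytical curve of the kind plotted in Fig. 12, provided the per-sample error terms are available.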

### A. Discussion and Take-away Message

For multi-service provisioning in 5G, the air interface must be capable of handling different subband filtering and different numbers of subcarriers. However, each pulse shaping filter requires a different number of cosine terms. To address this issue, our proposed pulse shaping filter architecture can be used for all five filter types. Consequently, not all four compensated CORDIC units are used for every pulse shaping filter. For example, on the one hand, the flat top filter requires all four compensated CORDIC units; on the other hand, both the Hanning and Hamming filters require only one CORDIC unit to generate the cosine terms. In our proposed architecture, the CORDIC units that are not used for the selected pulse shaping filter are kept in the idle state with the help of a gated clock; thus, their dynamic power dissipation in the architecture is avoided.

Furthermore, we do not use any digital signal processing (DSP) block in the proposed architecture. Moreover, it is interesting to note that once we implement the above architecture on the FPGA, the same hardware can be used for any filter-type, number of subbands $(B_k)$, number of subcarriers in each subband $(N_l)$, filter length (L), and IDFT size (N), which are the external inputs to the proposed architecture. In fact, changing these values with the external select line does not affect the device utilization, which enables multi-service provisioning with different numbers of subcarriers in a subband for UFMC systems.

# VI. CONCLUSION

In this paper, we have proposed a hardware-efficient reconfigurable architecture for the baseband UFMC transmitter. The proposed architecture has the flexibility to choose the number of subcarriers in a subband and the pulse shaping filter from a group of pulse shaping filters, based on the required figures of merit, without any significant changes in hardware resources. The experimental baseband signal corroborates the simulations. Moreover, we have performed the error analysis for the proposed architecture and compared the measured error with the error-bound. The proposed reconfigurable architecture for the UFMC transmitter is suitable for 5G systems due to its reconfigurability, hardware efficiency, and reuse of several hardware components compared to the state-of-the-art. The proposed architecture can be further extended to apply data- and process-level pipelining for increased operational frequency in UFMC systems. As a part of future work, we aim to design a pipelined architecture for high-speed reconfigurable multicarrier systems.

#### REFERENCES

- [1] C. Li, C.-P. Li, K. Hosseini, S. B. Lee, J. Jiang, W. Chen, G. Horn, T. Ji, J. E. Smee, and J. Li, "5G-based systems design for tactile Internet," *Proc. of the IEEE*, pp. 1–18, 2018.
- [2] L. Zhang, A. Ijaz, P. Xiao, and R. Tafazolli, "Multi-service system: An enabler of flexible 5G air interface," *IEEE Commun. Mag.*, vol. 55, no. 10, pp. 152–159, Oct. 2017.
- [3] R. Gerzaguet, N. Bartzoudis, L. G. Baltar, V. Berg, J.-B. Doré, D. Kténas, O. Font-Bach, X. Mestre, M. Payaró, M. Färber, and K. Roth, "The 5G candidate waveform race: A comparison of complexity and performance," *EURASIP J. on Wireless Commun. and Netw.*, vol. 2017, no. 1, pp. 1–13, Jan. 2017.
- [4] F. Schaich, T. Wild, and Y. Chen, "Waveform contenders for 5G-suitability for short packet and low latency transmissions," in *Proc. 79th IEEE VTC-Spring*, May 2014, pp. 1–5.
- [5] V. Vakilian, T. Wild, F. Schaich, S. Brink, and J.-F. Frigon, "Universal-filtered multi-carrier technique for wireless systems beyond LTE," in *Proc. IEEE GLOBECOM (Wksp.)*, Dec. 2013, pp. 223–228.
- [6] Z. Zhang, H. Wang, G. Yu, Y. Zhang, and X. Wang, "Universal filtered multi-carrier transmission with adaptive active interference cancellation,"
- [7] J. Wen, J. Hua, W. Lu, Y. Zhang, and D. Wang, "Design of waveform shaping filter in the UFMC system," *IEEE Access*, pp. 1–9, May 2018.
- [8] D.-J. Han, J. Moon, J.-Y. Sohn, S. Jo, and J. H. Kim, "Combined window-filter waveform design with transmitter-side channel state information," *IEEE Trans. on Vehi. Technol.*, pp. 1–5, June 2018.
- [9] M. Mukherjee, L. Shu, V. Kumar, P. Kumar, and R. Matam, "Reduced out-of-band radiation-based filter optimization for UFMC systems in 5G," in *Proc. IWCMC*, Aug. 2015, pp. 1150–1155.
- [10] Y.-P. Lin and S. M. Phoong, "Window designs for DFT-based multicarrier systems," *IEEE Trans. Signal Process.*, vol. 53, no. 3, pp. 1015–1024, Mar. 2005.
- [11] T. Wild and F. Schaich, "A reduced complexity transmitter for UF-OFDM," in *Proc. IEEE 81st VTC-Spring*, May 2015, pp. 1–6.
- [12] A. R. Jafri, J. Majid, M. A. Shami, M. A. Imran, and M. Najam-Ul-Islam, "Hardware complexity reduction in universal filtered multicarrier transmitter implementation," *IEEE Access*, vol. 5, pp. 13401–13408, Aug. 2017.
- [13] R. Knopp, F. Kaltenberger, C. Vitiello, and M. Luise, "Universal filtered multicarrier for machine type communications in 5G," in *Proc. Eur.*
- [14] A. R. Jafri, J. Majid, L. Zhang, M. A. Imran, and M. N. ul Islam, "FPGA implementation of UFMC based baseband transmitter: Case study for LTE 10MHz channelization," *Wireless Communications and Mobile Computing*, vol. 2018, pp. 1–12, July 2018.
- [15] F. J. Harris, "On the use of windows for harmonic analysis with the discrete Fourier transform," *Proc. of the IEEE*, vol. 66, no. 1, pp. 51–83, Jan. 1978.
- [16] A. D. Poularikas, *Handbook of Formulas and Tables for Signal Processing*, 1st ed. CRC Press, 1999.
- [17] V. Kumar, K. C. Ray, and P. Kumar, "Low-complexity CORDIC-based VLSI design and FPGA prototype of CI-OFDMA system for next-generation," in *Proc. IEEE 12th Int. Colloquium on Signal Process. Its Appli. (CSPA)*, Mar. 2016, pp. 22–27.
- [18] A. V. Oppenheim, R. W. Schafer, and J. R. Buck, *Discrete-time Signal Processing (2nd Ed.)*. Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1999.
- [19] V. Kumar, K. C. Ray, and P. Kumar, "CORDIC-based VLSI architecture for real time implementation of flat top window," *Microprocessors and Microsystems*, vol. 38, no. 8, Part B, pp. 1063–1071, 2014.
- [20] J. S. Walther, "A unified algorithm for elementary functions," in *Proc. ACM Spring Joint Computer Conf.*, May 1971, pp. 379–385.
- [21] T. Kulshreshtha and A. S. Dhar, "CORDIC-based high throughput sliding DFT architecture with reduced error-accumulation," *Circuits, Systems, and Signal Process.*, Apr. 2018.
- [22] B. Hu, Y. Du, R. Huang, J. Lee, Y. K. Chen, and M. C. F. Chang, "An R2R-DAC-based architecture for equalization-equipped voltage-mode PAM-4 wireline transmitter design," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 25, no. 11, pp. 3260–3264, Nov. 2017.
- [23] W. Yuan and J. S. Walling, "A switched-capacitor-controlled digital-current modulated class-E transmitter," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 25, no. 11, pp. 3218–3226, Nov. 2017.
- [24] K. Kota and J. Cavallaro, "Numerical accuracy and hardware tradeoffs for CORDIC arithmetic for special-purpose processors," *IEEE Trans. Comp.*, vol. 42, no. 7, pp. 769–779, July 1993.