# Simulation of Memory-Based 1024-Point Fast Fourier Transform for Broadband Wireless Access Technology on FPGA 

Dinda Pramanta ${ }^{\text {a }}$, Denny Darlis ${ }^{\mathrm{b}, *}$, Iswahyudi Hidayat ${ }^{\mathrm{c}}$<br>${ }^{\text {b }}$ Diploma of Telecommunication Engineering, Telkom University, Indonesia<br>${ }^{\text {a,c }}$ Electrical Engineering, Telkom University, Indonesia

## ARTICLE INFO

Received June 7, 2017
Revised July 24, 2017
Accepted August 8, 2017
Available online August 14, 2017

Keywords
BWA, FFT, FPGA, VHDL


#### Abstract

The limited radio frequency spectrum that can be used for transmission leads to bandwidth and power efficiency being a key requirement in the development of wireless access technology from 3G to 5G today. Data communication technology also requires this as mentioned on high speed network standards such as DSL, WLAN and WMAN with its products ADSL, WiFi and Wimax. In the last few decades we have seen the evolution of the Orthogonal Frequency Division Multiplexing (OFDM) modulation technique used in the technologies mentioned earlier to this day. This technique is regarded as a standard technology for broadband wireless access technology. In hardware implementation, the most preferred by many researchers is the Field Programmable Gate Array chip, as it can be reconfigured. The OFDM technique can be easily implemented because it uses Fast Fourier Transform (FFT) algorithms that are coding and programming capable of reducing the computational time of Discrete Fourier Transform. This paper discusses the implementation of the memory-based 1024-point IFFT / FFT for BWA communications. The design is focused on synthesizing and implementing the system block FFT 1024-point radix-4 using Decimation in Frequency (DIF) method. Implementation for IFFT / FFT 1024-point resource usage slice number $1 \%$, the number of slice flip-flop $1 \%$, the number 4 LUT (Look Up Table) $1 \%$, and the number of IOB $27 \%$. of the FPGA are used.


[^0]ORCID ID:
Second Author: 0000-0001-9242-9522

[^1]
## 1. Introduction

Digital communication has grown so rapidly along with the growth and development of the era of computerization and telecommunications. To facilitate the integration of reliable telecommunications systems, digital communication has many advantages more than the analog communication such as the generation of reset signals is much easier, more resistant to distortion and interference, as well as smaller error rate number.

The current BWA standards can now be operated and is widely used standards issued by the Institute of Electrical and Electronics Engineering (IEEE), such as the 802.15 standard for Personal Area Network (PAN), 802.11 for Wireless Fidelity networks (Wi-Fi), and 802.16 for the network WiMAX. WiMAX is a broadband solution to blank spot address areas. WiMAX technology for broadband services is not only used as a fixed but also mobile access that has speeds up to 70 Mbps.

OFDM is an access method that is used or applied to BWA. This technique first applied on wireline technology such as ADSL and Broadband Powerline [1]. OFDM has become the preferred modulation scheme for spectral efficiency and robustness against various interferences in wireless technology such as mention above $[3,5]$. Although the components and the overall structure are different, OFDM protocols are functionally similar [6]. FFT block is an important part of an OFDM, because the process of sharing the high-speed data stream into several low speed sub stream underway [8]. This division process aims to provide high efficiency in the use of the frequency spectrum. Therefore, the ability of sending and receiving data on the process of communication depends on the performance of the FFT processor [2,4].

In this research, a VHDL-based FFT 1024 points using the radix-4 as the basis for calculation has been designed. By using the Cooley-Tukey algorithm, the calculation of the DFT on which the basic implementation of FFT on FPGA becoming fewer and simpler so that the process is faster and require less memory than using the usual FFT transformation [7]. The calculation complexities of FFT make a huge manufacturing cost in a hardware application. Therefore, in this research, the process of calculating the FFT 1024-point radix-4 performed using the Cooley-Tukey algorithm by using Xilinx Virtex-4 XCVLX25, so it can be easier for us when designing and implementing it on FPGA board.

## 2. Theoretical Background

### 2.1. Orthogonal Frequency Division Multiplexing

Orthogonal Frequency Division Multiplexing (OFDM) modulation technique known as commonly used in the wireless channel that always changing due to fading. When information is transmitted over the wireless channel, the signal may be distorted due to multipath fading.

Unlike single carrier systems, OFDM communication system does not rely on an increase in symbol rate to achieve a higher data rate. This makes the Inter Symbol Interference (ISI) task becomes easier, because the data is transmitted in parallel would be better than the series, then the symbol of the OFDM symbol is longer than a single carrier with a data rate that is commensurate.

In addition to providing the advantages, OFDM has two relative susceptibilities to the single carrier system, which is a weakness of the error signal carrier frequency and the extent of peak-to-average power ratio (PAPR) [4].

### 2.2. FFT Radix-4 [12]

FFT to calculate the DFT by reducing the number of calculations and leaving direct DFT calculation. However, this requires additional steps for the preparation of the data back, to determine the result. However, the additional steps will not add a significant step in the computational complexity of the calculations, as it is implemented efficiently. As a result, the FFT is a very efficient way that produces the best implementation in hardware.

As mentioned above, the general equation of the FFT is:

$$
\begin{equation*}
\mathrm{X}[\mathrm{k}]=\sum_{n=0}^{N-1} x[n] . W_{N}^{n k} ; k=0,1,2, \ldots, N-1 . \tag{1}
\end{equation*}
$$

where $W_{N}^{n k}$ referred to twiddle factor that has the value $e^{\frac{j 2 \pi n k}{N}}$, with the result that:

$$
\begin{equation*}
\mathrm{X}[\mathrm{k}]=\sum_{n=0}^{N-1} x[n] \cdot e^{\frac{j 2 \pi n k}{N}} ; k=0,1,2, \ldots, N-1 \tag{2}
\end{equation*}
$$

Accordingly, for a 4-point FFT we can do a simple calculation, using the Decimation in frequency (DIF). On the basis of the above calculation, we can perform calculations with a larger 4-point as radix which is for 1024 points using the radix-4 which requires 5 steps $\left(1024=4^{5}\right)$ as shown in Figure 1 for 16 point FFT. The application of the most widely used for this equation is the signal analysis. Signal $x(n)$ is the signal being analyzed, amounting to $N$ points. Signal $x(n)$ is the signal cycle length N 1 point. DFT of $x(n)$ produces $x(k)$. At $k=1, \mid x$ (k) | has a high peak, meaning that $\mathrm{x}(\mathrm{n})$ has an element of one cycle per frequency component N points. In addition, at $\mathrm{k}=\mathrm{N}-1$, other peaks were observed. Output N point DFT has a characteristic cycle.


Figure 1 DFT Radix-4 with 16 Point. [9-12]
Then, the index $\mathrm{k}=\mathrm{N}-1$ so that $\mathrm{k}=-1$, the negative frequency. Since the signal $\mathrm{x}(\mathrm{n})$ consists of frequency components $\mathrm{k}=1$ (positive) and $\mathrm{k}=-1$ (negative), then the components of $\mathrm{x}(\mathrm{k})$ has a component $\mathrm{k}-1$ and $\mathrm{k}=-1$.

## 3. Design and Simulation

### 3.1. Design of Memory-based 1024-point FFT Radix-4

Designing 1024 point FFT is using the FFT algorithm 4-points or more commonly known as algorithm radix-4. Designing is done in several stages as
shown on Figure 2. First, the mathematical calculation of FFT 1024 point using radix-4 algorithm to realize the designed block diagram [10].


Figure 2 Design Flow Diagram for Simulation of 1024-Point FFT

The next stage is simulation using MATLAB that can be tested on mathematical calculations and algorithms that have been specified. MATLAB output will then be used as a comparison to the output encoding is done in ModelSim VHDL language. Elaboration about the stages of 1024 point FFT is described as follow.

1. Step 1; By defining X1 variable as the result of FFT calculation on the first stage, then from the equation (3) obtained:

$$
\begin{equation*}
X_{1}\left(n_{0}+4 n_{1}+16 n_{2}+64 n_{3}+256 k_{0}\right)=W_{1024}^{64 n_{3} k_{0}} \sum_{n_{4}=0}^{3} x(n) W_{1024}^{256 n_{4} k_{0}} \tag{3}
\end{equation*}
$$

Multiplication with twiddle factor $W_{1024}^{64 n_{3} k_{0}}$ is a type of non-trivial multiplication. Twiddle factor can be simplified to $W_{16}^{n_{3} k_{0}}$, then obtaining 16 complex value twiddle costants. This value then is translated into the value of sine and cosine and then stored in the ROM.
2. Step 2; By defining the variable $X 2$ as a result of the second stage of the FFT calculation for the value $n_{-} 3=0,1,2,3$ then obtained:
$X_{2}\left(n_{0}+4 n_{1}+16 n_{2}+64 k_{1}+256 k_{0}\right)=W_{1024}^{16 n_{2} k_{0}} W_{1024}^{64 n_{2} k_{1}} \sum_{n_{3}=0}^{3} X_{1}\left(n_{0}+4 n_{1}+16 n_{2}+64 n_{3}+\right.$
$\left.256 k_{0}\right) W_{1024}^{256 n_{3} k_{1}} \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots$

Nontrivial multiplication re-acquired at multiplication with twiddle factor $W_{1024}^{256 n_{3} k_{1}}$ and by elaborating twiddle factor, will be obtained 15 twiddle constant for non-trivial multiplication. In line with the previous stage, the twiddle constants stored in ROM.
3. Step 3; By defining the variable X 3 as a result of the third stage of the FFT calculation for the value $n \_2=0,1,2,3$ then obtained:

$$
\begin{align*}
& X_{3}\left(n_{0}+4 n_{1}+16 k_{2}+64 k_{1}+256 k_{0}\right)=W_{1024}^{4 n_{1} k_{0}} W_{1024}^{16 n_{1} k_{1}} W_{1024}^{64 n_{1} k_{2}} \sum_{n_{2}=0}^{3} X_{2}\left(n_{0}+4 n_{1}+16 n_{2}+64 k_{1}+\right. \\
& \left.256 k_{0}\right) W_{1024}^{256 n_{2} k_{2}} \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \tag{5}
\end{align*}
$$

Nontrivial multiplication re-acquired at multiplication with twiddle factor $W_{1024}^{256 n_{2} k_{2}}$ and by elaborating twiddle factor, will be obtained 63 twiddle constant for non-trivial multiplication. In line with the previous stage, the twiddle constants stored in ROM.
4. Step 4; By defining the variable X 4 as a result of the fourth stage of the FFT calculation for the value $n \_1=0,1,2,3$ then obtained :

$$
\begin{align*}
& X_{4}\left(n_{0}+4 k_{3}+16 k_{2}+64 k_{1}+256 k_{0}\right)=W_{1024}^{n_{0} k_{0}} W_{1024}^{4 n_{0} k_{1}} W_{1024}^{16 n_{0} k_{2}} W_{1024}^{64 n_{0} k_{3}} \sum_{n_{1}=0}^{3} X_{3}\left(n_{0}+4 n_{1}+\right. \\
& \left.16 k_{2}+64 k_{1}+256 k_{0}\right) W_{1024}^{256 n_{1} k_{3}} \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \tag{6}
\end{align*}
$$

Nontrivial multiplication re-acquired at multiplication with twiddle factor $W_{1024}^{256 n_{1} k_{3}}$ and by elaborating twiddle factor, will be obtained 255 twiddle constant for non-trivial multiplication. In line with the previous stage, the twiddle constants stored in ROM.
5. Step 5; After going through the fourth stage calculation, the FFT equation becomes:

$$
\begin{equation*}
X(k)=\sum_{n_{0}=0}^{3} X_{4}\left(n_{0}+4 k_{3}+16 k_{2}+64 k_{1}+256 k_{0}\right) W_{1024}^{256 n_{0} k_{4}} \tag{7}
\end{equation*}
$$

By defining the variable X 3 as a result of the third stage of the FFT calculation, then from the equation (3.10) is obtained:

At this stage, there is no non-trivial multiplication. Multiplication that happening is multiplication by twiddle factor $W_{1024}^{256 n_{0} k_{4}}$. Equation (7) ends the 1024 point FFT computation and output worth valid.
6. Step 6- The Reorder; Because the index value at the FFT output stage is not in correct order, where the index is generated at step 5 is $X_{5}\left(k_{4}+4 k_{3}+16 k_{2}+64 k_{1}+256 k_{0}\right)$, while the output index should be $\mathrm{X}\left(k_{0}+4 k_{1}+16 k_{2}+64 k_{3}+256 k_{4}\right)$. Therefore, it takes one more step, namely the sorting stage (reorder).

In overall conclusion, the calculation of 1024 point FFT computation requires 5 stages and one stage of preparation or sequencing. Architectural details can be seen in the Figure 3.


Figure 3 Architecture of 16 Point DIF FFT Calculation [9]


Figure 4 Architecture of I/FFT 1024 Point Radix 4
Figure 4 explains the data processing at each stage of 1024 point FFT. There are 5 stages of multiplication 4 point FFT butterfly. The output of each stage will be directly stored and processed for the next stage. Complex value which is the result of butterfly $W_{1024}^{n k}$, calculated by determining the cosine and sine values by
conventional calculation. The results of the calculation of complex values are then stored in ROM to meet the needs of each of its state.

### 3.2. MATLAB-based Simulation

In the process of designing a processor FFT 1024 point radix-4, the first stage is done by simulating the system in MATLAB. System design process in MATLAB refers to the design of the block FFT. Fourier transformation block is split into five main stages of derivation results FFT 1024 points. Furthermore, each stage of the FFT block is processed by the method of radix-4, and randomly pulled out to fit with the frequency decimation techniques. Finally, the data output of the third stage rearranged so that the order of the data returned to normal. The results of the function FFT is designed and then compared with the results of the function of the FFT in MATLAB 2012a. The following are the results of simulation in MATLAB with 4 times the input testing techniques. Figure 5 (a) and (b) shows the simulation results of FFT in MATLAB by providing appropriate input in the Table 1.

Table 1 Sample of Input Feed for In phase and Quadrature

| Index Number | $\mathbf{0}$ | $\mathbf{1}$ | $\mathbf{2}$ | $\mathbf{3}$ | $\mathbf{4}$ | $\mathbf{5}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| I | 1 | 1 | 1 | 1 | 0 | 0 |
| Q | 0 | 0 | 0 | 0 | 0 | 0 |



Figure 5 Value Plot of FFT 1024-Point in MATLAB for Data on Table I. (a) In-Phase Component, (b) Quadrature Component

## 4. Result

Figure 6 shows the result of the simulation on timing using ModelSim for 1024FFT system with appropriate input. The simulation as designed before take 4 stage to finish the calculation and reorder the results as shown in Figure 7.


Figure 6 First Stage Input Simulation System FFT 1024 Point Radix-4


Figure 7 Simulation Result FFT system 1024 points Radix-4
The first data output from the simulation results will be issued at 615.15 ns as show in Figure 7. This will be the processing delay on the FPGA if we simulate the system with 100 MHz clock as we targeted it on FPGA.


Figure 8. Dataflow Simulation of 1024-Point FFT Radix-4
Figure 8 shows the process of sending data from input to system output. So the data in the form of bits that were entered in the input will be processed in each block in accordance with their respective functions and then issued to the output. From the simulation results above, it was found that the design of FFT 1024 point system with radix-4 which has been designed has the same shape and value with modeling result in MATLAB and manual calculation on Ms. Excel. The system simulation to be implemented in this FPGA board requires a total clock cycle of 7177 clocks to run this system.

## 5. Conclusion

Design of FFT 1024-point radix-4 Cooley-Tukey algorithm using the method of Decimation in Frequency has been designed using VHDL language and verified using MATLAB. For system simulation, ModelSim is used to verify logic timing. This research will become our background for synthesize and implement the
system to FPGA. The FFT block also can be develop to implement the inverse Fast Fourier Transform.

## Bibliography

[1] Denny Darlis, Achmad Ali Muayyadi, Sony Sumaryo., "Perancangan dan Implementasi Prosesor OFDM Baseband untuk Prototipe Modem PLC pada FPGA", Jurnal Penelitian dan Pengembangan Telekomunikasi, Volume 15, No.2, Desember 2010, Institut Teknologi Telkom, ISSN No. 1410 - 7066
[2] Manalu, Manoto J.F. 2011. Perancangan dan Implementasi Prosesor I/FFT 512 Titik Radiks-8 Pada FPGA. Research on Telkom Institute of Technology: Unpublished.
[3] Irawan, Ade. 2008. Model dan Perancangan FFT/IFFT 256 Titik Radix-4 untuk Chipset Baseband WiMAX. Jurnal thesis pada Institut Teknologi Bandung: Unpublished.
[4] Kadiran, Kamaru A. 2005. Design and Implementation of OFDM Transmitter adn Receiver on FPGA Hardware. Thesis on Universitiy Technology of Malaysia : Unpublished.
[5] Khairuddin, Labib Ahmad. 2011. Perancangan dan Implementasi Prosesor FFT 256 Titik-OFDM Baseband Berbasis Pengkodean VHDL pada FPGA. Research on Telkom Institute of Technology: Unpublished.
[6] Maesa, Siska Cucu. 2011. Perancangan dan Implementasi Sistem Multicarrier Code Division Multiple Access (MC-CDMA) Menggunakan VHDL pada FPGA. Research on Telkom Institute of Technology: Unpublished.
[7] Nilsson, Magnus. 2001. FFT Realization and Implementation on FPGA. Griffith University.
[8] Orfanidis J., Sophocles. 2010. Introduction to Signal Processing. Rutgers University.
[9] Oppenheim, Alan V., 1998. Discrete-Time Signal Processing. Massachusetts Institute Of Technology.
[10] Pedroni, Volney A. 2004. Circuit Design With VHDL. MIT Press Cambridge, Massachusetts London, England.
[11] Uwe Meyer-Bease. Digital Signal Processing with Field Programmable Gate Array. Springle-Verlag, 2001
[12] Wada, Tomohisa. The 10th International LSI Design Contest. The University of The Ryukyus. [Online] October 2006. Available : http://www.ie.uryukyu. ac.jp/~wada/design07/spec_e.html.


[^0]:    * Corresponding author at:

    Diploma of Telecommunication Engineering, Telkom University,
    Jl. Telekomunikasi No. 1, Terusan Buah Batu, Bandung, 40257
    Indonesia.
    E-mail address: denny.darlis@tass.telkomuniversity.ac.id

[^1]:    https://doi.org/10.25124/ijait.v1i02.872
    Paper registration number IJAIT000010204 2017 © The Authors. Published by School of Applied Science, Telkom University.
    This is an open access article under the CC BY-NC license (https://creativecommons.org/licenses/by-nc/4.0/).

