3. Architectural components of FPGA devices

The MS decoding, in both layered and flooded strategies, consists of simple arithmetic operations performed on small operands (3–8 bits). The variations of the MS algorithm target decoding performance improvement. OMS and NMS are based on the fact that the minimum computed at the check node level overestimates the check node message [10]. Therefore, both approaches try to reduce the value of the check node message computed by the check node unit.

The OMS approach subtracts an offset of 1 from the absolute value of $\beta_{i,j}$ in order to reduce its value. The check node computation in the OMS algorithm becomes:

$$\operatorname{sign}(\beta_{i,j}) = \prod_{k \in \{V(i) \setminus j\}} \operatorname{sign}(\alpha_{i,k}), \quad \forall j \in V(i) \tag{9}$$

$$|\beta^{*}_{i,j}| = \min\left(|\alpha_{i,k}|\right), \quad \forall j \in V(i),\ k \in \{V(i) \setminus j\} \tag{10}$$

$$|\beta_{i,j}| = |\beta^{*}_{i,j}| - 1 \tag{11}$$

The NMS approach scales the absolute value of $\beta_{i,j}$ in order to reduce its value, by multiplying it with a normalization factor $\lambda$ (usually 0.75 or 0.875). The check node computation in the NMS algorithm becomes:

$$\operatorname{sign}(\beta_{i,j}) = \prod_{k \in \{V(i) \setminus j\}} \operatorname{sign}(\alpha_{i,k}), \quad \forall j \in V(i) \tag{12}$$

$$|\beta^{*}_{i,j}| = \min\left(|\alpha_{i,k}|\right), \quad \forall j \in V(i),\ k \in \{V(i) \setminus j\} \tag{13}$$

$$|\beta_{i,j}| = \lambda \cdot |\beta^{*}_{i,j}| \tag{14}$$

SCMS aims at improving the error correction capability by erasing the variable node messages which change their sign after an iteration [11]. The same message cannot be erased in two consecutive iterations. The modification of the variable node update for a layered scheduling for the SCMS algorithm is:

$$\alpha^{\mathrm{new}}_{i,j} = \tilde{\gamma}_{i} - \beta_{i,j}, \quad \forall j \in V(i) \tag{15}$$

$$e^{\mathrm{new}}_{i,j} = \left(!e^{\mathrm{old}}_{i,j}\right) \,\&\, \left(\operatorname{sign}(\alpha^{\mathrm{new}}_{i,j}) \oplus \operatorname{sign}(\alpha^{\mathrm{old}}_{i,j})\right) \tag{16}$$

$$\alpha_{i,j} = \begin{cases} \alpha^{\mathrm{new}}_{i,j}, & e^{\mathrm{new}}_{i,j} = 0 \\ 0, & e^{\mathrm{new}}_{i,j} = 1 \end{cases} \tag{17}$$

FAID decoding aims at improving the error floor region of the LDPC decoding. It changes the variable node operations by implementing a dedicated nonlinear function for the variable node message update, based on the channel information and the check node messages [12]. For a flooded scheduling, the variable node processing is replaced by this dedicated update function.
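The check node updates of Eqs. (9) to (14) can be sketched in a few lines of Python. This is only an illustrative model, not a hardware description: the function name `check_node_update`, the treatment of a zero message as positive, the clamping of the OMS magnitude at zero, and the truncation of the scaled NMS magnitude are all assumptions made for the example. The minimum is taken over message magnitudes, as is usual for min-sum decoding.

```python
def check_node_update(alpha, variant="MS", offset=1, lam=0.75):
    """Return the messages beta[j] sent by one check node.

    alpha   -- list of signed integer messages alpha[k] received by the node
    variant -- "MS" (plain min-sum), "OMS" or "NMS"
    """
    betas = []
    for j in range(len(alpha)):
        others = alpha[:j] + alpha[j + 1:]       # k in V(i) \ j
        sign = 1
        for a in others:                         # product of signs, Eqs. (9)/(12)
            if a < 0:                            # zero counts as positive (assumption)
                sign = -sign
        mag = min(abs(a) for a in others)        # minimum magnitude, Eqs. (10)/(13)
        if variant == "OMS":
            mag = max(mag - offset, 0)           # Eq. (11); clamp at 0 is an assumption
        elif variant == "NMS":
            mag = int(lam * mag)                 # Eq. (14); truncation is an assumption
        betas.append(sign * mag)
    return betas


alpha = [8, -12, 16, -20]
ms = check_node_update(alpha)
oms = check_node_update(alpha, variant="OMS")
nms = check_node_update(alpha, variant="NMS", lam=0.75)
```

For the same inputs, both OMS and NMS return messages with the same signs as plain MS but with smaller magnitudes, which is the intended correction of the overestimation.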

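The SCMS erasure rule of Eqs. (15) to (17) can be illustrated for a single message as follows. This is a minimal sketch under stated assumptions: `scms_update` and `sign_bit` are hypothetical helper names, `gamma` stands for the γ̃ term of Eq. (15), a zero value is treated as positive, and the caller is assumed to store `alpha_old` and the erasure flag `e_old` from the previous iteration.

```python
def sign_bit(x):
    """1 for negative values, 0 otherwise (zero counts as positive: assumption)."""
    return 1 if x < 0 else 0


def scms_update(gamma, beta, alpha_old, e_old):
    """Return (alpha, e_new) for one variable-to-check message."""
    alpha_new = gamma - beta                                  # Eq. (15)
    sign_changed = sign_bit(alpha_new) ^ sign_bit(alpha_old)
    e_new = (1 - e_old) & sign_changed                        # Eq. (16)
    alpha = 0 if e_new else alpha_new                         # Eq. (17)
    return alpha, e_new


erased = scms_update(gamma=5, beta=8, alpha_old=4, e_old=0)      # sign flips: erase
not_twice = scms_update(gamma=5, beta=8, alpha_old=4, e_old=1)   # erased last time: keep
stable = scms_update(gamma=9, beta=2, alpha_old=4, e_old=0)      # no sign change: keep
```

The second call shows the role of the `!e_old` term: a message that was erased in the previous iteration is passed through even though its sign differs from the stored one.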

FPGAs are digital devices with a programmable structure. This structure gives FPGAs very high flexibility, which makes them ideal candidates for prototyping, for products with tight time-to-market constraints, and for applications which require a high degree of flexibility. Furthermore, FPGAs have a built-in structure which allows a high degree of parallelization for applications that rely on fixed-point computations.

The main digital building blocks of modern FPGA devices are the configurable logic block (CLB), the embedded memory block RAM (BRAM), and the DSP block. DSP blocks implement multiplication, multiply-accumulate, fused multiply-add, and addition operations on operands of 18 bits or wider [14]. Because they are optimized for operand sizes of 18 bits or more, and mainly for multiplication-based operations, they are of little use for the implementation of LDPC decoders, which rely on additions and comparisons performed on 3–8 bit operands.

CLBs are the main logic resource and implement both sequential and combinational logic elements [15]. Usually, a CLB is composed of several slices, each slice combining look-up tables (LUTs) and D flip-flops with additional dedicated logic, such as carry logic and dedicated wires for ripple-carry addition. The combinational logic is implemented using LUTs, with modern FPGAs having six-input LUTs. Therefore, in a LUT and flip-flop pair, a six-input combinational function has the same cost as a one- or two-input combinational function. For specific families, the LUT can also be used as a memory circuit, such as the distributed RAM in Xilinx FPGAs. The D flip-flop is the basic sequential element. Because the combinational logic is paired with the D flip-flop in the same structural unit, pipelining can be implemented in modern FPGA devices easily and without significant resource overhead.
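The claim that a six-input function costs the same as a two-input one follows from how a LUT works: it is a 64-entry truth table addressed by its inputs. The sketch below models this behaviour in software. The helper names `make_lut6` and `lut6_eval` are hypothetical, and the bit ordering (input 0 as the least significant address bit) is an assumption of the example.

```python
def make_lut6(func):
    """Pack a 6-input Boolean function into a 64-bit truth table."""
    table = 0
    for addr in range(64):
        bits = [(addr >> n) & 1 for n in range(6)]   # input n is address bit n
        if func(*bits):
            table |= 1 << addr
    return table


def lut6_eval(table, *inputs):
    """Look up the output bit selected by the six input bits."""
    addr = sum(bit << n for n, bit in enumerate(inputs))
    return (table >> addr) & 1


# A two-input AND (four inputs unused) and a six-input XOR both fit one table.
and2 = make_lut6(lambda a, b, c, d, e, f: a & b)
xor6 = make_lut6(lambda a, b, c, d, e, f: a ^ b ^ c ^ d ^ e ^ f)
```

Both functions occupy exactly one 64-bit table, so the simpler function does not save any LUT resources.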

Another important feature of modern FPGAs is the built-in memory blocks [16]. For large memories, FPGAs include the block RAM, which is a block of 9 or 18 kbits. BRAMs have a configurable width (9, 18, 36, or 72 bits), with the depth being determined by the width (for an 18 kbit BRAM and a 72 bit word, the depth is 256 words). The number of BRAMs used by a design is highly dependent on the width and the depth of the required memory. For example, a memory which requires 96 bit words, and only 64 words, will consume 2 BRAM blocks, although it stores significantly fewer bits than a single BRAM can hold. Another important aspect of the BRAM block is the number of read/write ports: it is optimized for 1 read and 1 write port. The maximum number of memory ports for a BRAM is 2 read and 2 write ports, but with limitations on the word size. For memories with few bits, and/or memories with a high number of ports, the distributed RAM implemented in CLBs is used.
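The dependence of the BRAM count on word width and depth can be made concrete with a small estimate. The function below is a simplified model and not a vendor mapping algorithm: the name `bram_count` is hypothetical, the capacity and width options are taken from the text, and port constraints, parity bits and tool-specific packing are ignored.

```python
from math import ceil


def bram_count(word_bits, depth_words, bram_bits=18 * 1024,
               widths=(9, 18, 36, 72)):
    """Estimate the BRAMs needed, trying each supported width configuration."""
    best = None
    for w in widths:
        depth_per_bram = bram_bits // w              # depth follows from the width
        columns = ceil(word_bits / w)                # BRAMs side by side for the word
        rows = ceil(depth_words / depth_per_bram)    # BRAMs stacked for the depth
        count = columns * rows
        best = count if best is None else min(best, count)
    return best


needed = bram_count(96, 64)                   # the 96 bit x 64 word example
used_fraction = (96 * 64) / (needed * 18 * 1024)
```

Under these assumptions the 96 bit by 64 word memory needs two blocks because of its width alone, while filling only about one sixth of their capacity.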

From an LDPC decoder perspective, the FPGA implementation uses the CLBs for the processing nodes and the routing network, and uses memories, either BRAM or distributed RAM.
