Several trade-offs arise in the design of LDPC decoder architectures [2]. These trade-offs take into account the throughput, error correction capability, cost of the hardware implementation, and power consumption.

In this chapter, we will present the most important architectural options for both flooded and layered LDPC decoders implemented on field programmable gate array (FPGA) devices. The implementation of LDPC decoders on such devices is motivated by the increased flexibility of FPGAs, which makes them suitable for highly versatile solutions in both wireless communications—such as software defined radios—and storage systems—such as software defined storage—as well as by their high degree of parallelism for fixed-point computations, which makes high decoder throughputs attainable.

This book chapter is organized as follows: Section 2 presents the algorithms for LDPC decoding, as well as the scheduling strategies; Section 3 summarizes the main features and building blocks of modern FPGA devices; Section 4 presents the implementation and design trade-offs for FPGA-based flooded LDPC decoding architectures; layered architectures are detailed in Section 5; the last section is dedicated to concluding remarks.

2. Theoretical background of LDPC decoding

LDPC codes are a class of linear algebraic codes, defined by a sparse parity check matrix H [1]. An LDPC code can also be represented by a bipartite graph, called the Tanner graph [6]. This graph contains two types of nodes: variable or bit nodes—corresponding to the columns of the H matrix and the codeword bits—and check nodes—corresponding to the rows of the H matrix and the parity check equations. A check node is connected to a variable node if the corresponding entry in the parity check matrix is nonzero. Figure 1 depicts a simple parity check matrix and its associated Tanner graph. LDPC decoding is performed in an iterative manner, consisting of message exchanges between the check and variable nodes along the edges of the Tanner graph in several rounds or iterations. This type of decoding is called message passing (MP) decoding [4]. LDPC codes defined in communication or storage standards use parity check matrices consisting of thousands of columns, such as 2304 columns for WiMAX, 64800 columns for DVB-S2, or 1944 columns for WiFi. The number of nonzero entries in each column represents the variable node degree, dv, while the number of nonzero elements in each row of the H matrix represents the check node degree, dc. An LDPC code is said to be regular if all the rows/columns of the parity check matrix contain an equal number of nonzero entries; otherwise, the LDPC code is irregular.
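For illustration, the node degrees can be read directly off H; a minimal Python sketch, using a small hypothetical parity check matrix (not the one of Figure 1):

```python
import numpy as np

# Hypothetical sparse parity check matrix: 4 rows (check nodes),
# 8 columns (variable nodes). Illustrative only.
H = np.array([
    [1, 1, 0, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 0, 0, 1],
    [1, 0, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 0, 1],
])

dv = H.sum(axis=0)  # variable node degrees: nonzeros per column
dc = H.sum(axis=1)  # check node degrees: nonzeros per row

# Tanner graph edges: one edge per nonzero entry of H
edges = list(zip(*np.nonzero(H)))

# The code is regular iff all columns share one degree and all rows share one degree
is_regular = len(set(dv)) == 1 and len(set(dc)) == 1
print(dv, dc, is_regular)  # here: dv = 2 everywhere, dc = 4 everywhere -> regular
```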

In order to enable efficient hardware implementations, quasi-cyclic LDPC (QC-LDPC) codes are used in most of the standards [7]. This subclass of LDPC codes presents highly structured parity check matrices, defined by blocks of circulant matrices. A QC-LDPC code is defined by a base matrix B, consisting of -1 entries and nonnegative entries. The parity check matrix H is obtained from the matrix B in the following way: each -1 entry is expanded into the z × z all-zero matrix, while each nonnegative entry of the matrix B is expanded into the z × z identity matrix cyclically shifted by the value of that entry. The coefficient z is known as the expansion factor of the QC-LDPC code.


Figure 1. Parity check matrix and its associated Tanner graph.









Figure 2. Base matrix for WiMAX rate ½ LDPC code [2].

Figure 2 depicts the B matrix for the WiMAX LDPC code of rate ½, with 2304 columns, 1152 rows, and an expansion factor of 96. A horizontal layer of the H matrix is defined as the set of z consecutive rows which correspond to one row of the base matrix. Composite layers, consisting of integer multiples of z rows of the parity check matrix, may also be used.
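The expansion from B to H is mechanical; the following minimal Python sketch illustrates it under the convention that a nonnegative entry s produces the identity matrix cyclically shifted by s columns (the function name and the toy base matrix are illustrative, not the WiMAX code of Figure 2):

```python
import numpy as np

def expand_base_matrix(B, z):
    """Expand a QC-LDPC base matrix B into the parity check matrix H.

    Each -1 entry becomes a z x z all-zero block; each entry s >= 0
    becomes the z x z identity matrix cyclically shifted by s columns.
    """
    rows, cols = B.shape
    H = np.zeros((rows * z, cols * z), dtype=np.uint8)
    for r in range(rows):
        for c in range(cols):
            s = B[r, c]
            if s >= 0:
                # np.roll shifts the identity's columns cyclically by s
                H[r*z:(r+1)*z, c*z:(c+1)*z] = np.roll(
                    np.eye(z, dtype=np.uint8), s % z, axis=1)
    return H

# Toy example: 2 x 3 base matrix, expansion factor z = 4
B = np.array([[0, 2, -1],
              [1, -1, 3]])
H = expand_base_matrix(B, z=4)
assert H.shape == (8, 12)
```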

MP LDPC decoding may be performed using different scheduling strategies. These strategies indicate the order in which the check node and variable node computations are performed during the decoding iterations [13]. Two types of strategies may be employed: flooded and layered. Flooded decoding represents the conventional approach: each iteration consists of the update of the messages at the check nodes, which subsequently pass their output messages to the variable nodes, which, in turn, update their corresponding messages [5]. Using this strategy, both the variable nodes and the check nodes are updated once per iteration. Layered scheduling consists of splitting the parity check matrix into horizontal layers; these layers are processed in a serial manner, while the check node updates within the same layer are processed in a similar manner to flooded scheduling [8]. The variable node updates are performed after each layer is processed. Therefore, in layered scheduling, the number of updates per iteration at the variable node level is equal to the number of layers. Layered scheduling has two major advantages with respect to flooded scheduling: (i) faster convergence and (ii) reduced memory requirements [8]. The flooded approach has the advantages of increased resilience to faults in the hardware architecture [9], as well as the possibility of very high throughputs due to the high level of parallelism at the decoder level.

LDPC decoding can be performed by different types of algorithms, with different error correction capabilities. These can be split into two major classes [13]:

1. Hard-decision algorithms: These algorithms rely on 1-bit messages exchanged between the processing units. Such algorithms include bit-flipping, gradient descent and probabilistic gradient descent bit-flipping, and Gallager-A and Gallager-B. The advantage of these algorithms is represented by their low requirements in terms of resource usage and power consumption. Their main drawback is their low error correction capability with respect to soft-decision algorithms, for both the BSC and BIAWGN channel models.

2. Soft-decision algorithms: These algorithms use messages quantized on several bits (usually between 3 and 7), which are exchanged between the variable nodes and check nodes. The hardware implementations of soft-decision algorithms are significantly more costly than the hard-decision versions. However, using soft decoding, LDPC codes achieve capacity-approaching error correction capabilities, which makes them suitable candidates for a wide range of communication standards.


In this chapter, we will discuss the implementation aspects related to soft-decision-based LDPC decoders. The most important class of soft-decision LDPC decoding is represented by the min-sum (MS) algorithm [13] and its variants: offset MS (OMS) [10], normalized MS (NMS) [10], self-correcting MS (SCMS) [11], and finite alphabet iterative decoding (FAID) [12]. In these algorithms, the following messages are used [13]:

1. Input log-likelihood ratio (LLR): These messages represent the input from the communication channel. For the BSC channel model—used in storage systems—the input LLR is on 1 bit, while for the BIAWGN channel model—used in wireless communications—the input LLR is quantized on several bits. The input LLR is denoted as γ and is quantized on quant(γ) bits.

2. Variable node messages: These messages are the outputs of the variable node units and serve as inputs for the check node units. They are denoted as α and are quantized on quant(α) bits.

3. Check node messages: These messages represent the output of the check nodes and are the inputs for the variable nodes. They are denoted as β and are quantized on quant(β) bits.

4. A posteriori LLR (AP-LLR): These messages represent the output of each decoding iteration/layer. The output of the decoder is given by the sign of the AP-LLR. It is denoted as γ̃.


Flooded MS decoding of LDPC codes consists of several iterations, where each variable node message—and check node message—is updated once. Each iteration consists of the following steps [5, 13]:

1. Variable node update









$$\alpha\_{i,j} = \gamma\_i + \sum\_{k \in \mathcal{C}(i) \setminus j} \beta\_{i,k}, \quad \forall j \in \mathcal{C}(i) \tag{1}$$

$$\tilde{\gamma}\_i = \gamma\_i + \sum\_{k \in \mathcal{C}(i)} \beta\_{i,k} \tag{2}$$

2. Check node update

$$\operatorname{sign}(\beta\_{l,j}) = \prod\_{k \in V(l) \setminus j} \operatorname{sign}(\alpha\_{l,k}), \quad \forall j \in V(l) \tag{3}$$

$$|\beta\_{l,j}| = \min\_{k \in V(l) \setminus j} |\alpha\_{l,k}|, \quad \forall j \in V(l) \tag{4}$$

C(i) denotes the set of check nodes connected to the variable node i, while V(l) denotes the set of variable nodes connected to the check node l. The number of variable nodes is equal to the number of columns in the parity check matrix, while the number of check nodes is equal to the number of rows in the H matrix.
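The following minimal Python sketch illustrates one flooded MS iteration along Eqs. (1)-(4); it uses floating-point messages and dense matrices for readability, whereas a hardware decoder operates on quant(α)- and quant(β)-bit fixed-point messages (all names are illustrative):

```python
import numpy as np

def flooded_ms_iteration(H, gamma, beta):
    """One flooded MS iteration. H: binary parity check matrix,
    gamma: input LLRs (one per variable node), beta: check node
    messages stored edge-aligned in an array of H's shape."""
    checks, vns = np.nonzero(H)

    # Variable node update, Eq. (1): gamma plus all incoming beta
    # except the one on the target edge (extrinsic principle)
    beta_col_sum = beta.sum(axis=0)            # per variable node
    alpha = np.zeros_like(beta)
    alpha[checks, vns] = gamma[vns] + beta_col_sum[vns] - beta[checks, vns]

    # AP-LLR, Eq. (2): gamma plus all incoming beta
    gamma_tilde = gamma + beta_col_sum

    # Check node update, Eqs. (3)-(4): extrinsic sign product and
    # extrinsic minimum of |alpha| on each row of H
    new_beta = np.zeros_like(beta)
    for l in range(H.shape[0]):
        idx = np.nonzero(H[l])[0]
        a = alpha[l, idx]
        signs = np.where(a < 0, -1.0, 1.0)
        for pos, j in enumerate(idx):
            others = np.delete(np.arange(len(idx)), pos)
            new_beta[l, j] = signs[others].prod() * np.abs(a[others]).min()
    return new_beta, gamma_tilde
```

Starting from β = 0, repeated calls of this function, followed by a parity check on the signs of γ̃, implement the flooded decoder.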

Layered decoding is performed layer by layer, each layer consisting of the following steps [8, 13]:

1. Variable node update

$$\alpha\_{i,j} = \tilde{\gamma}\_j - \beta\_{i,j}, \quad \forall j \in V(i) \tag{5}$$

2. Check node update

$$\operatorname{sign}(\beta\_{i,j}) = \prod\_{k \in V(i) \setminus j} \operatorname{sign}(\alpha\_{i,k}), \quad \forall j \in V(i) \tag{6}$$

$$|\beta\_{i,j}| = \min\_{k \in V(i) \setminus j} |\alpha\_{i,k}|, \quad \forall j \in V(i) \tag{7}$$

3. AP-LLR update

$$\tilde{\gamma}\_j = \alpha\_{i,j} + \beta\_{i,j}, \quad \forall j \in V(i) \tag{8}$$

For both flooded and layered scheduling, decoding is stopped either when a codeword is found—all the parity check equations are satisfied—or when the maximum number of iterations is reached.
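A minimal Python sketch of the layered schedule of Eqs. (5)-(8), including the stopping criterion, under the simplifying assumption of one row of H per layer (a QC-LDPC decoder would process one block row of z rows per layer; all names are illustrative):

```python
import numpy as np

def layered_ms_decode(H, gamma, max_iter=20):
    """Layered MS decoding with one row of H per layer (Eqs. (5)-(8))."""
    gamma_tilde = gamma.astype(float).copy()   # AP-LLR, initialized with channel LLRs
    beta = np.zeros_like(H, dtype=float)       # check node messages, edge-aligned

    for _ in range(max_iter):
        for i in range(H.shape[0]):            # serial pass over the layers
            idx = np.nonzero(H[i])[0]
            alpha = gamma_tilde[idx] - beta[i, idx]        # Eq. (5)
            signs = np.where(alpha < 0, -1.0, 1.0)
            new_beta = np.empty_like(alpha)
            for pos in range(len(idx)):                    # Eqs. (6)-(7)
                others = np.delete(np.arange(len(idx)), pos)
                new_beta[pos] = signs[others].prod() * np.abs(alpha[others]).min()
            gamma_tilde[idx] = alpha + new_beta            # Eq. (8)
            beta[i, idx] = new_beta

        word = (gamma_tilde < 0).astype(np.uint8)  # hard decision on AP-LLR signs
        if not ((H.astype(int) @ word) % 2).any():  # all parity checks satisfied
            break
    return word
```

Note how γ̃ is refreshed after every layer, which is the source of the faster convergence of the layered schedule.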

MS decoding, in both layered and flooded strategies, comprises simple arithmetic operations performed on small operands (3–8 bits). The variants of the MS algorithm target improved decoding performance. OMS and NMS are based on the fact that the minimum computation at the check node level overestimates the check node message [10]. Therefore, both approaches reduce the value of the check node message computed by the check node unit.

The OMS approach subtracts an offset (here equal to 1) from the absolute value of βi,j in order to reduce its value. The check node computation in the OMS algorithm becomes (β̄i,j denoting the uncorrected MS magnitude):

$$\operatorname{sign}(\beta\_{i,j}) = \prod\_{k \in V(i) \setminus j} \operatorname{sign}(\alpha\_{i,k}), \quad \forall j \in V(i) \tag{9}$$

$$|\bar{\beta}\_{i,j}| = \min\_{k \in V(i) \setminus j} |\alpha\_{i,k}|, \quad \forall j \in V(i) \tag{10}$$

$$|\beta\_{i,j}| = \max\left(|\bar{\beta}\_{i,j}| - 1, \, 0\right) \tag{11}$$

The NMS approach scales the absolute value of βi,j by a normalization factor λ (usually 0.75 or 0.875) in order to reduce its value. The check node computation in the NMS algorithm becomes:

$$\operatorname{sign}(\beta\_{i,j}) = \prod\_{k \in V(i) \setminus j} \operatorname{sign}(\alpha\_{i,k}), \quad \forall j \in V(i) \tag{12}$$

$$|\bar{\beta}\_{i,j}| = \min\_{k \in V(i) \setminus j} |\alpha\_{i,k}|, \quad \forall j \in V(i) \tag{13}$$

$$|\beta\_{i,j}| = \lambda \cdot |\bar{\beta}\_{i,j}| \tag{14}$$
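Both corrections are one-line changes to the check node magnitude of Eq. (4); a minimal sketch (the clamp at zero in the OMS case keeps the magnitude nonnegative):

```python
def oms_magnitude(min_abs_alpha, offset=1.0):
    # Eq. (11): subtract the offset, clamped at zero
    return max(min_abs_alpha - offset, 0.0)

def nms_magnitude(min_abs_alpha, lam=0.875):
    # Eq. (14): scale by the normalization factor
    return lam * min_abs_alpha
```

With λ = 0.75 or 0.875, the multiplication of Eq. (14) reduces to a subtraction and an arithmetic shift in fixed point (e.g., 0.875·x = x − x/8), which is one reason these particular values are preferred in hardware.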

SCMS is an approach which aims at improving the error correction capability by erasing the variable node messages that change their sign from one iteration to the next [11]. The erasure process cannot be performed in two consecutive iterations. For layered scheduling, the modified variable node update of the SCMS algorithm is:

$$\alpha\_{i,j}^{new} = \tilde{\gamma}\_j - \beta\_{i,j}, \quad \forall j \in V(i) \tag{15}$$

$$e\_{i,j}^{new} = \left(\neg e\_{i,j}^{old}\right) \wedge \left(\operatorname{sign}(\alpha\_{i,j}^{new}) \oplus \operatorname{sign}(\alpha\_{i,j}^{old})\right) \tag{16}$$

$$\alpha\_{i,j} = \begin{cases} \alpha\_{i,j}^{new}, & e\_{i,j}^{new} = 0 \\ 0, & e\_{i,j}^{new} = 1 \end{cases} \tag{17}$$
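A minimal sketch of the SCMS rule of Eqs. (15)-(17) for a single edge, tracking the erasure flag and the previous message sign per edge (function and variable names are illustrative):

```python
def scms_vn_update(gamma_tilde, beta, alpha_old, erased_old):
    """SCMS variable node update for one edge (Eqs. (15)-(17)).

    Returns the (possibly erased) outgoing message, plus the raw
    message and erasure flag to store for the next iteration."""
    alpha_new = gamma_tilde - beta                       # Eq. (15)
    sign_changed = (alpha_new < 0) != (alpha_old < 0)
    # Eq. (16): erase only on a sign change, and never twice in a row
    erased_new = (not erased_old) and sign_changed
    alpha = 0.0 if erased_new else alpha_new             # Eq. (17)
    return alpha, alpha_new, erased_new
```

In hardware, the extra state amounts to one erasure bit and one sign bit per edge, which is the main cost of SCMS over plain MS.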

FAID decoding aims at improving the error floor region of LDPC decoding. It changes the variable node operations by implementing a dedicated nonlinear function for the variable node message update, based on the channel information and the check node messages [12]. For flooded scheduling, the variable node processing becomes:

$$\alpha\_{i,j} = \operatorname{FAID}\left(\gamma\_i, \{\beta\_{i,k}\}\_{k \in \mathcal{C}(i) \setminus j}\right), \quad \forall j \in \mathcal{C}(i) \tag{18}$$

$$\tilde{\gamma}\_i = \gamma\_i + \sum\_{k \in \mathcal{C}(i)} \beta\_{i,k} \tag{19}$$

The implementation of the FAID function is done using dedicated look-up tables (LUTs). The complexity of these tables depends on the check node message quantization and on the variable node degree dv.
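As an illustration, the sketch below implements Eq. (18) as a table lookup for a hypothetical degree-3 variable node with messages on the 3-level alphabet {-1, 0, +1}; the table contents here are a placeholder clipped-sum rule, whereas a real FAID table is designed offline for error-floor performance [12] and is generally not a linear function:

```python
import itertools

LEVELS = (-1, 0, 1)

# Placeholder FAID table: alpha = FAID(gamma, beta_1, beta_2), indexed
# by the channel value and the two extrinsic check node messages.
# Filled with a clipped sum purely for illustration.
FAID_TABLE = {
    (g, b1, b2): max(-1, min(1, g + b1 + b2))
    for g, b1, b2 in itertools.product(LEVELS, repeat=3)
}

def faid_vn_update(gamma, betas, j):
    """Eq. (18): output message on edge j from the channel value and
    the extrinsic (all-but-j) check node messages of a dv = 3 node."""
    extrinsic = betas[:j] + betas[j+1:]
    return FAID_TABLE[(gamma, *extrinsic)]

# Example: channel says -1, the two other check messages say 0 and +1
print(faid_vn_update(-1, [0, 1, -1], j=2))  # -> 0
```

The table grows with the message alphabet size and with dv, which is why FAID is typically used with small alphabets and low-degree variable nodes.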
