There are several reasons for the wide application of RS codes. First, modern long-distance optical networks, especially long-haul networks, are very high-speed systems (10 Gbps and beyond). For other promising FEC codes, such as LDPC or Turbo codes, the decoding throughput usually cannot meet such a stringent data-rate requirement, or can do so only at the cost of very high hardware complexity. In contrast, an RS decoder can achieve such high throughput with affordable hardware resources. Second, for local optical networks, such as Ethernet, real-time response is an important metric for system design. Compared with its counterparts, RS code has a particular advantage in low decoding latency. Therefore, RS codes are widely employed in modern optical transmission systems and are believed to play an important role in next-generation optical networks.

Considering the importance of RS codes, their efficient implementation is quite important for optical transmission systems. A low-complexity, high-speed RS encoding/decoding system will improve the overall performance significantly. In particular, since RS decoding is the most complex procedure in an RS-based FEC system, efficient RS decoder design deserves careful study. Therefore, targeting different levels of optical communication, ranging from short-distance Ethernet networks to long-haul backbone systems, this chapter introduces efficient VLSI design of RS decoders. In addition, to meet the requirements of the 100 Gbps era, this chapter also discusses some new FEC schemes for ultra-high-speed applications (beyond 100 Gbps).

The chapter is organized as follows. Section 2 reviews RS decoding. Low-complexity, high-speed RS decoders for short-distance networks are discussed in Section 3. Section 4 analyzes a performance-improved RS burst-error decoder for medium-distance systems. Some recent FEC schemes targeting 100 Gbps long-haul networks are introduced in Section 5. Section 6 draws the conclusion.

**2. Review of RS decoding**

According to the coding theory in [4], the procedure for decoding an RS code contains three main steps: **syndrome computation** (SC), **key equation solving** (KES) and **Chien search & error evaluation** (CSEE). The decoding procedure is summarized below:

**Step 1. (Syndrome computation):** For an (*n*, *k*) RS code defined over GF(2<sup>*m*</sup>) with primitive element *α* [4], let **C**(*x*) and **R**(*x*) be the transmitted and received codeword polynomials, respectively, and assume **R**(*x*) = **C**(*x*) + **E**(*x*), where **E**(*x*) is the error polynomial that reflects the errors induced by transmission channel noise. Then, the syndrome polynomial **S**(*x*) is computed as follows:

$$\mathbf{S}(x) = s_0 + s_1 x + s_2 x^2 + \dots + s_{2t-1} x^{2t-1}, \text{ where } s_i = \mathbf{R}(\alpha^{i+1}) \text{ and } t = (n-k)/2. \tag{1}$$

The architecture of the SC block of an example RS (255, 239) decoder is shown in Fig. 1. Here **R**(*x*) = *r*<sub>*n*-1</sub>*x*<sup>*n*-1</sup> + *r*<sub>*n*-2</sub>*x*<sup>*n*-2</sup> + … + *r*<sub>1</sub>*x* + *r*<sub>0</sub> is serially transmitted to the SC block in the order *r*<sub>*n*-1</sub>, *r*<sub>*n*-2</sub>, …, *r*<sub>0</sub>. Each partial syndrome is updated in every clock cycle by one of the multiply-accumulate (MAC) circuits shown. After *n* = 255 clock cycles, the 2*t* = 16 syndromes are computed and serially transmitted to the next KES block.

**Figure 1.** The block diagram of syndrome computation for example RS (255, 239) code.
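The serial syndrome computation of Eq. (1) can be sketched in software as a bank of 2*t* multiply-accumulate updates, one per received symbol. This is a minimal illustration rather than the decoder's actual implementation: the field is taken to be GF(2<sup>8</sup>) with primitive polynomial 0x11D (an assumption, since the text does not state one), and the names `gf_mul` and `syndromes` are hypothetical helpers introduced here.

```python
# GF(2^8) log/antilog tables. The primitive polynomial 0x11D
# (x^8 + x^4 + x^3 + x^2 + 1) is an assumption of this sketch.
PRIM_POLY = 0x11D
EXP = [0] * 510   # EXP[i] = alpha^i; doubled so EXP[a + b] needs no modulo
LOG = [0] * 256
_v = 1
for _i in range(255):
    EXP[_i] = EXP[_i + 255] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x100:
        _v ^= PRIM_POLY


def gf_mul(a, b):
    """Multiply two GF(2^8) elements."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def syndromes(received, t):
    """Return [s_0, ..., s_{2t-1}] with s_i = R(alpha^(i+1)), as in Eq. (1).

    `received` lists the symbols in arrival order r_{n-1}, ..., r_0.
    """
    s = [0] * (2 * t)
    for r in received:              # one received symbol per clock cycle
        for i in range(2 * t):      # in hardware these 2t MACs run in parallel
            # Horner step: multiply the register by alpha^(i+1), then add r
            s[i] = gf_mul(s[i], EXP[i + 1]) ^ r
    return s
```

Because the symbols arrive highest degree first, each register performs Horner's rule, so after *n* updates register *i* holds **R**(*α*<sup>*i*+1</sup>). For an error-free codeword all 2*t* syndromes are zero.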

**Step 2. (Key equation solving):** Taking **S**(*x*) as input, the key equation solver (KES) block calculates the error evaluator polynomial **Ω**(*x*) and the error locator polynomial **Λ**(*x*) by solving the key equation **Λ**(*x*)**S**(*x*) ≡ **Ω**(*x*) mod *x*<sup>2*t*</sup>. This is the most important step in the whole RS decoding procedure and usually dominates the performance of the overall decoder. Therefore, this chapter focuses on the algorithm and architecture optimization of the KES block.

Generally, either the Berlekamp-Massey (BM) algorithm or the modified Euclidean (ME) algorithm can be employed to solve the key equation. To date, many efforts have addressed the efficient VLSI implementation of these two algorithms. In [5], the BM algorithm was reformulated as RiBM, which has the same regular architecture as the conventional ME algorithm in [6] and [7], and a folded BM algorithm based on RiBM was introduced in [8]. References [6] and [7] implemented the conventional ME algorithm with systolic and recursive architectures. Based on these efforts, Sections 3 and 4 discuss some improved KES algorithms and their corresponding hardware implementations for efficient RS decoder design.
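As a point of reference for the key equation, the following is a minimal software sketch of the textbook (inversion-based) Berlekamp-Massey iteration, followed by **Ω**(*x*) = **Λ**(*x*)**S**(*x*) mod *x*<sup>2*t*</sup>. It is not the RiBM, folded BM, or ME architecture of [5]-[8]. The field GF(2<sup>8</sup>) with primitive polynomial 0x11D and all function names are assumptions made for illustration. Polynomials are stored as coefficient lists, lowest degree first.

```python
# GF(2^8) tables; the primitive polynomial 0x11D is an assumption of this sketch.
EXP, LOG = [0] * 510, [0] * 256
_v = 1
for _i in range(255):
    EXP[_i] = EXP[_i + 255] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x100:
        _v ^= 0x11D


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    return EXP[255 - LOG[a]]          # a must be nonzero


def poly_eval(p, x):
    y = 0
    for c in reversed(p):             # Horner's rule
        y = gf_mul(y, x) ^ c
    return y


def berlekamp_massey(S):
    """Return Lambda(x) for the syndrome list S = [s_0, ..., s_{2t-1}]."""
    lam, prev = [1], [1]              # Lambda(x) and its copy from the last length change
    L, m, b = 0, 1, 1                 # LFSR length, shift count, last nonzero discrepancy
    for r in range(len(S)):
        d = 0                         # discrepancy: sum_j lam_j * s_{r-j}
        for j in range(min(L, len(lam) - 1) + 1):
            d ^= gf_mul(lam[j], S[r - j])
        if d == 0:
            m += 1
            continue
        coef = gf_mul(d, gf_inv(b))
        # cand(x) = lam(x) + coef * x^m * prev(x)   (addition is XOR)
        cand = lam + [0] * max(0, len(prev) + m - len(lam))
        for j, pj in enumerate(prev):
            cand[j + m] ^= gf_mul(coef, pj)
        if 2 * L <= r:                # length change
            prev, L, b, m = lam, r + 1 - L, d, 1
        else:
            m += 1
        lam = cand
    while len(lam) > 1 and lam[-1] == 0:
        lam.pop()                     # drop trailing zero coefficients
    return lam


def error_evaluator(lam, S):
    """Return Omega(x) = Lambda(x) * S(x) mod x^(2t)."""
    omega = [0] * len(S)
    for i, li in enumerate(lam):
        for j, sj in enumerate(S):
            if i + j < len(S):
                omega[i + j] ^= gf_mul(li, sj)
    return omega
```

When at most *t* errors occurred, the degree of the returned **Λ**(*x*) equals the number of errors and its roots are the inverses of the error locators. The division by the previous discrepancy in each update is the inversion that the name "inversion-based" refers to.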

**Step 3. (Chien search & error evaluation):** After the KES block finishes its computation for the current codeword, the calculated error locator polynomial **Λ**(*x*) and error evaluator polynomial **Ω**(*x*) are passed to the CSEE block to generate the error positions and magnitudes.

Chien search is a widely employed approach to locating errors. Its basic idea is simple but efficient: if **Λ**(*α*<sup>-*i*</sup>) = 0 for the current *i*, the *i*-th symbol of the received codeword is wrong and needs to be corrected. After the error position is obtained, the following Forney algorithm is applied to determine the error value:

$$Y_i = -\frac{\Omega(x)}{x\,\Lambda'(x)}\bigg|_{x=\alpha^{-i}} \tag{2}$$

where *Y*<sub>*i*</sub> is the error magnitude for the *i*-th erroneous symbol.
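The two operations above can be sketched together in software: evaluate **Λ** at *α*<sup>-*i*</sup> for every position, and apply Forney's formula wherever a zero is found. The sketch below is a minimal illustration with assumed names and an assumed field (GF(2<sup>8</sup>), primitive polynomial 0x11D). One caveat is made explicit as a parameter: the power of *x* that accompanies **Λ**′(*x*) depends on the exponent of the first root used for the syndromes. The hypothetical argument `first_root` carries that exponent: `first_root=0` reproduces Eq. (2) exactly as written, while the indexing *s<sub>i</sub>* = **R**(*α*<sup>*i*+1</sup>) of Eq. (1) corresponds to `first_root=1`, for which the factor *x* cancels.

```python
# GF(2^8) tables; the primitive polynomial 0x11D is an assumption of this sketch.
EXP, LOG = [0] * 510, [0] * 256
_v = 1
for _i in range(255):
    EXP[_i] = EXP[_i + 255] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x100:
        _v ^= 0x11D


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    return EXP[255 - LOG[a]]          # a must be nonzero


def poly_eval(p, x):
    y = 0
    for c in reversed(p):             # Horner's rule, coefficients lowest degree first
        y = gf_mul(y, x) ^ c
    return y


def formal_derivative(p):
    # In characteristic 2 only the odd-degree terms of p survive differentiation.
    return [p[k] if k % 2 else 0 for k in range(1, len(p))]


def chien_forney(lam, omega, n, first_root):
    """Return {position i: error value Y_i} for every i with Lambda(alpha^-i) = 0."""
    lam_d = formal_derivative(lam)
    errors = {}
    for i in range(n):
        x = EXP[(255 - i) % 255]                  # alpha^(-i)
        if poly_eval(lam, x) != 0:                # Chien search: not a root
            continue
        den = poly_eval(lam_d, x)
        if den == 0:
            raise ValueError("repeated root in Lambda(x): decoding failure")
        # Omega / Lambda'; the minus sign of Eq. (2) is a no-op in GF(2^m)
        y = gf_mul(poly_eval(omega, x), gf_inv(den))
        # scale by alpha^(i * (1 - first_root)); for first_root = 0 this is
        # the division by x = alpha^(-i) that appears in Eq. (2)
        y = gf_mul(y, EXP[(i * (1 - first_root)) % 255])
        errors[i] = y
    return errors
```

The received symbol at each returned position is then corrected by adding (XOR-ing) the corresponding error value.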

Based on the Chien search and Forney algorithm described above, the architecture of the CSEE block for the example RS (255, 239) code is illustrated in Fig. 2. It is built from several unit cells (shown in Fig. 2(a)); both the sub-block that carries out the Chien search and the sub-block that carries out the Forney algorithm consist of these basic cells. In the beginning, *λ*<sub>*i*</sub> and *ω*<sub>*i*</sub> (represented by *U*<sub>*i*</sub> in the figure), the coefficients of **Λ**(*x*) and **Ω**(*x*), are loaded in parallel into these basic cells (enable=1). Then, during the next 255 cycles, the basic cells carry out multiplications iteratively. Fig. 2(b) shows the overall architecture of the CSEE block. Once a zero is detected in the Chien search, the corresponding error magnitude is computed by executing the Forney algorithm above.

**Figure 2.** (a) The diagram of CSEE cell. (b) The block diagram of CSEE.
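The load-then-multiply behaviour of the basic cells can be mimicked with one register per coefficient. The sketch below rests on an assumption, since the figure is not reproduced here: cell *k* is taken to multiply its register by the fixed constant *α*<sup>-*k*</sup> in every cycle, so that the sum of all registers in cycle *i* equals **Λ**(*α*<sup>-*i*</sup>). The real cell may use a different constant or evaluation order. The field GF(2<sup>8</sup>) with primitive polynomial 0x11D and the name `chien_cells` are also assumptions.

```python
# GF(2^8) tables; the primitive polynomial 0x11D is an assumption of this sketch.
EXP, LOG = [0] * 510, [0] * 256
_v = 1
for _i in range(255):
    EXP[_i] = EXP[_i + 255] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x100:
        _v ^= 0x11D


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def chien_cells(lam, n=255):
    """Yield (i, Lambda(alpha^-i)) for i = 0..n-1, one value per cycle."""
    regs = list(lam)                  # parallel load of the coefficients (enable = 1)
    for i in range(n):
        total = 0
        for r in regs:                # adder across all cells
            total ^= r
        yield i, total
        # assumed cell behaviour: register k is multiplied by alpha^(-k) each cycle
        regs = [gf_mul(r, EXP[(-k) % 255]) for k, r in enumerate(regs)]


# Usage: positions at which the Chien search detects a zero
lam = [1, EXP[3] ^ EXP[200], EXP[203]]      # (1 + alpha^3 x)(1 + alpha^200 x)
error_positions = [i for i, value in chien_cells(lam) if value == 0]
```

Each cell needs only a constant multiplier and a register, which is why the same cell can be reused for both **Λ**(*x*) and **Ω**(*x*).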

The overall architecture of RS decoding is summarized in Fig. 3.

**Figure 3.** The overall architecture of RS decoder.

As mentioned above, since KES is the dominant step in RS decoding, Sections 3 and 4 focus on the algorithm and architecture optimization of the KES block.
