**4. High-performance low-complexity RS burst-error decoders for medium-distance networks**

278 Optical Communication

**Figure 7.** The diagram of PI-DC block.

**Figure 8.** The diagram of overall PI-iBM architecture.

**Table 2.** Implementation results and comparisons

| Design | Tech. (μm) | Total of gates | fmax (MHz) | Throughput (Gb/s) | Latency (cycles) |
|---|---|---|---|---|---|
| Retimed iBM in [17] | 0.18 | 8423 | 654 | 5.23 | 288 |
| RiBM in [18] | 0.18 | 9566 | 400 | 3.20 | 128 |
| Folded ME in [7] | 0.13 | 17000 | 625 | 5.00 | 256 |
| PI-iBM | 0.18 | 10951 | 980 | 12.5 | 160 |
| Multi-mode Systolic ME in [19] | 0.13 | 102500 | 770 | 6.16 | 80 |

For medium-distance optical networks, such as Metro Ethernet networks, traditional RS decoding cannot provide enough coding gain for data transmission. Instead, an enhanced FEC scheme with improved error-correcting capability should be employed. In such systems, long burst errors are the dominant error pattern during transmission; therefore, burst-error decoding algorithms and architectures are attractive solutions. In this section, we introduce an efficient burst-error-correcting RS decoder that meets the requirements of this type of application.

#### **4.1. Reformulated inversionless burst-error correcting (RiBC) algorithm**

As a Maximum Distance Separable (MDS) code [20], the RS code is very effective in correcting long burst errors. However, the earlier RS burst decoding algorithms in [21] and [22] are infeasible for hardware implementation because of their high computational complexity. In [20], Wu proposed a new approach to track the position of a burst of errors. By introducing a new polynomial that is a special linear function of the syndromes, this approach can correct a burst of errors of length up to 2*t*-1-2*β* plus a maximum of *β* random errors, where *β* is a pre-chosen parameter that determines the specific error-correcting capability. In this case, the miscorrection probability is upper bounded by $(n-2f)(n-f)^{\beta}\,2^{m(\beta+f-2t)}$, where *f* is the burst length.
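The trade-off set by *β* can be tabulated directly from the 2*t*-1-2*β* burst-length bound quoted above. The short Python sketch below does this for the RS (255, 239) example code (*t*=8); the function name `max_burst_length` is purely illustrative.

```python
def max_burst_length(t, beta):
    """Longest correctable burst (in symbols) when up to `beta` random errors
    are corrected as well, using the 2t - 1 - 2*beta bound from the text."""
    length = 2 * t - 1 - 2 * beta
    if beta < 0 or length < 1:
        raise ValueError("beta is out of range for this t")
    return length


t = 8  # RS(255, 239): 2t = 16 parity symbols
for beta in range(4):
    print(f"beta={beta}: burst up to {max_burst_length(t, beta)} symbols "
          f"+ up to {beta} random errors")
```

Each extra random error that the decoder is asked to handle costs two symbols of correctable burst length. With *β*=0, a single burst of up to 15 symbols can be handled, compared with at most 8 random symbol errors for a conventional decoder of the same code.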

Although the approach in [20] reduces the computational complexity, it still contains inversion operations and a long data path, which impede an efficient VLSI implementation. Therefore, the algorithm in [20] was reformulated into the RiBC algorithm. The RiBC algorithm is a kind of list decoding algorithm in which 8 polynomials are updated simultaneously in each iteration. After every 2*β* inner iterations, the candidate error locator polynomial of the random errors is computed for the current *l*-th outer iteration. When *l* reaches *n*, we track the candidate that remains identical over the longest run of consecutive *l*'s and record the last element *l*\* of that run. The corresponding error locator and error evaluator polynomials at the *l*\*-th loop are then taken as the overall error locator polynomial $\Lambda^{*}(x)$ and the overall error evaluator polynomial $\Omega^{*}(x)$, respectively. Finally, the Forney algorithm is used to calculate the error value at each error position, with a miscorrection probability of at most $(n-2f)(n-f)^{\beta}\,2^{m(\beta+f-2t)}$.

The RiBC algorithm targets a burst error plus some random errors. Step 2.3 and step 2.4 are quite similar to the essential update equations of the RiBM algorithm (see [5]). This suggests that the RiBC algorithm for burst-error correction and the RiBM algorithm for random-error correction can be implemented on the same hardware. Furthermore, since the single-burst-error-correcting algorithm in [20] is a special case of the RiBC algorithm with *β*=0, it can also be implemented on the RiBC architecture. Accordingly, a unified hybrid RS decoder that can be configured for any of these three error-correcting modes is introduced in the next subsection.
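Because the RiBC update equations are modeled on RiBM, a software sketch of the standard RiBM key-equation solver (the formulation of Sarwate and Shanbhag) may help when comparing the two modes. This is the random-error RiBM baseline (mode-3), not the RiBC algorithm itself. It assumes GF(2^8) built from the primitive polynomial x^8+x^4+x^3+x^2+1 (0x11D) with α = 2, and syndromes S_i = r(α^i) for i = 0, ..., 2t-1; the helper names `gf_mul`, `gf_pow`, `poly_eval` and `ribm` are hypothetical.

```python
# GF(2^8) log/antilog tables; field polynomial 0x11D and alpha = 2 are assumptions.
EXP = [0] * 512
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]  # duplicate so LOG[a] + LOG[b] needs no modulo


def gf_mul(a, b):
    """Multiply two GF(2^8) elements."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_pow(a, e):
    """Raise a nonzero element to an integer power (negative powers allowed)."""
    return EXP[(LOG[a] * e) % 255]


def poly_eval(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) with Horner's rule; addition is XOR."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c
    return acc


def ribm(S, t):
    """Standard RiBM: returns (locator coeffs lambda_0..t, evaluator coeffs omega_0..t-1)."""
    size = 3 * t + 1
    delta = list(S[:2 * t]) + [0] * t + [1]   # delta_i = S_i, delta_{3t} = 1
    theta = delta[:]
    gamma, k = 1, 0
    for _ in range(2 * t):
        d0 = delta[0]
        shifted = delta[1:] + [0]             # delta_{i+1}^{(r)}, with delta_{3t+1} = 0
        new_delta = [gf_mul(gamma, shifted[i]) ^ gf_mul(d0, theta[i])
                     for i in range(size)]
        if d0 != 0 and k >= 0:
            theta, gamma, k = shifted, d0, -k - 1
        else:
            k += 1                            # theta and gamma keep their values
        delta = new_delta
    return delta[t:2 * t + 1], delta[:t]


t = 8
errors = {3: 0x37, 100: 0x5A}                 # position -> error value (test data)
S = [0] * (2 * t)
for i in range(2 * t):
    for pos, val in errors.items():
        S[i] ^= gf_mul(val, gf_pow(2, i * pos))   # S_i = r(alpha^i)
lam, omega = ribm(S, t)
roots = [j for j in range(255) if poly_eval(lam, gf_pow(2, -j)) == 0]
print(roots)
```

With two errors injected at positions 3 and 100, a Chien search over the returned locator should find exactly those positions. Note how each iteration needs only the multiply-and-subtract update plus a conditional register swap, which is the structure the RiBC inner loop shares.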

**Algorithm: Reformulated Inversionless Burst-Error Correcting (RiBC)** (correcting an *f*-length ($f \le 2t-2\beta$) burst of errors plus a maximum of *β* random errors)

Input: syndromes $S_0, S_1, S_2, \ldots, S_{2t-1}$;

step1: Compute a fixed polynomial as a product of first-degree factors, expanded into the form $1 + \cdots$;

step2: For $l = 0$ step 1 until $n-1$ do

step2.1: Compute the polynomial associated with the current burst position *l*, also of the form $1 + \cdots$;

step2.2: Compute the modified syndrome polynomial, whose coefficients are linear combinations of the syndromes;

step2.3: Initialize the eight polynomials used in step2.4 from the results of step2.1, step2.2 and the syndromes, and set $\gamma^{(0)} = 1$, $k^{(0)} = 0$;

step2.4: For $r = 0$ step 1 until $2\beta-1$ do

step2.4.1: Compute $\delta_i^{(r+1)} = \gamma^{(r)}\delta_{i+1}^{(r)} - \delta_0^{(r)}\theta_i^{(r)}$; $\delta_i^{*(r+1)} = \gamma^{(r)}\delta_{i+1}^{*(r)} - \delta_0^{(r)}\theta_i^{*(r)}$; and, for each of the two error-locator-type polynomials with its auxiliary polynomial $B(x) = b_0 + b_1x + \cdots$, $\lambda_i^{(r+1)} = \gamma^{(r)}\lambda_i^{(r)} - \delta_0^{(r)}b_{i-1}^{(r)}$;

step2.4.2: If $\delta_0^{(r)} \neq 0$ and $k^{(r)} \ge 0$ then $a = 1$; else $a = 0$;

step2.4.3: Update according to *a*:

$$\begin{array}{l|l|l} \text{step 2.4.3:} & a = 1 & a = 0 \\ \hline b_i^{(r+1)} & \lambda_i^{(r)} & b_{i-1}^{(r)} \\ \theta_i^{(r+1)} & \delta_{i+1}^{(r)} & \theta_i^{(r)} \\ \theta_i^{*(r+1)} & \delta_{i+1}^{*(r)} & \theta_i^{*(r)} \\ \gamma^{(r+1)} & \delta_0^{(r)} & \gamma^{(r)} \\ k^{(r+1)} & -k^{(r)}-1 & k^{(r)}+1 \end{array}$$

step3: Track the longest run of consecutive *l*'s for which the candidate error locator polynomial of the random errors is identical, and record the last element *l*\* of that run. The overall error locator polynomial $\Lambda^{*}(x)$ and the overall error evaluator polynomial $\Omega^{*}(x)$ are the corresponding polynomials at the *l*\*-th outer loop.

Output: $\Lambda^{*}(x)$, $\Omega^{*}(x)$.

#### **4.2. Unified hybrid decoding (UHD) architecture**


The overall UHD architecture is shown in Fig. 9, where different blocks process different steps of the algorithm. Since all blocks other than the key equation solver (KES) and position track (PT) blocks are straightforward to implement, this section introduces only the architectures of the KES and PT blocks and focuses the discussion on the RiBC work mode. Interested readers can refer to [23] for the other blocks and modes.

**Figure 9.** The overall architecture of the UHD decoder. Three types of lines illustrate the data flows for the different work modes: solid line (mode-1) for combined burst- and random-error correction (RiBC algorithm), dashed line (mode-2) for burst-error correction only, and dotted line (mode-3) for random-error correction only.

#### *4.2.1. KES block architecture*

In RiBC mode, the KES block carries out step 2.4. Fig. 10 presents the overall architecture of the KES block and the internal structure of its two types of processing elements (PEs): PE0 and PE1. As shown in Fig. 10(a), the KES block consists of 2*t*-1 PE0's and 2*t* PE1's. In the *r*-th iteration, each register in PE0*i*/PE1*i* stores the corresponding coefficients of the different polynomials (Fig. 10(b) and (c)). For each outer iteration, the block takes 2*β* cycles to compute the coefficients of the candidate error locator and error evaluator polynomials. Meanwhile, the coefficients of the candidate error locator polynomial of the random errors are also computed and output to the PT block, which tracks the longest run of consecutive identical candidates.
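The PE counts and per-iteration cycle count above give a quick sizing estimate. The sketch below computes them for the RS (255, 239) example; *β*=2 is an arbitrary example value, `kes_resources` is a hypothetical name, and the total cycle count rests on an assumption not stated in the text, namely that the *n* outer iterations run back to back without overlap.

```python
def kes_resources(n, t, beta):
    """Return (#PE0, #PE1, KES cycles per codeword) in RiBC mode.

    PE counts follow the text: 2t-1 PE0's and 2t PE1's.
    The cycle count ASSUMES the n outer iterations are not overlapped,
    each taking 2*beta cycles.
    """
    num_pe0 = 2 * t - 1
    num_pe1 = 2 * t
    cycles = n * 2 * beta
    return num_pe0, num_pe1, cycles


print(kes_resources(255, 8, 2))  # (15, 16, 1020)
```

Under this assumption the KES latency grows linearly in both *n* and *β*, so a larger *β* buys more random-error correction at the cost of proportionally more cycles per codeword.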

#### *4.2.2. Position track (PT) block architecture*

The PT block tracks the longest run of consecutive identical candidate polynomials (step 3). Fig. 11 illustrates the architecture of the PT block. The three coefficient inputs delivered by the KES block at the *l*-th outer iteration are denoted with the argument (*l*). In addition, the *temp* registers hold the tracked candidate's coefficients from the previous outer iteration (*l*-1), the *store* registers hold the coefficients of the current run of identical candidate polynomials, and the *longest* registers hold the coefficients of the *longest* such run found so far. The control signals *shift* and *equal* are generated by the signal generation schedule. After *l* reaches *n*, the contents of the *longest* registers for the locator and evaluator are output as the coefficients of the overall error locator polynomial Λ\*(*x*) and the overall error evaluator polynomial Ω\*(*x*).
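A minimal behavioral sketch of the PT decision logic may make the register roles concrete. It assumes each candidate polynomial is compared as a tuple of coefficients and that a tie in run length keeps the earlier run. The class name `PositionTracker` and the counters `run_len`/`best_len` are hypothetical; `temp`, `longest`, `equal` and `shift` mirror the names in the text. For brevity, the sketch captures the locator/evaluator pair directly when *shift* is asserted instead of modeling the *store* registers separately.

```python
class PositionTracker:
    """Hypothetical behavioral model of the PT block (one call per outer iteration l)."""

    def __init__(self):
        self.temp = None      # tracked candidate from outer iteration l-1
        self.run_len = 0      # length of the current run of identical candidates
        self.best_len = 0     # length of the longest run seen so far
        self.l_star = -1      # last index l of the longest run
        self.longest = None   # (locator, evaluator) captured at l_star

    def step(self, l, candidate, locator, evaluator):
        equal = self.temp is not None and candidate == self.temp
        self.run_len = self.run_len + 1 if equal else 1
        shift = self.run_len > self.best_len   # strictly longer: ties keep the earlier run
        if shift:
            self.best_len = self.run_len
            self.l_star = l
            self.longest = (locator, evaluator)
        self.temp = candidate
        return int(equal), int(shift)          # the two control signals


pt = PositionTracker()
cands = [(1, 5), (1, 7), (1, 7), (1, 7), (1, 2)]  # toy candidate coefficient tuples
signals = [pt.step(l, c, ("loc", l), ("eval", l)) for l, c in enumerate(cands)]
print(signals, pt.l_star)
```

Because *shift* fires on every iteration that extends the record run, the captured pair always belongs to the last element *l*\* of the longest run, matching the selection rule of step 3.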

**Figure 10.** (a) The overall architecture of KES block. (b) The block diagram of PE0*i*. (c) The block diagram of PE1*i*.

**Figure 11.** The architecture of PT block for mode-1.

The control procedure of the PT block for mode-1 is as follows:

Initialization: $l_1 = 0$, $l_2 = 1$;

Step S1: Input the coefficients of the candidate error locator polynomial of the random errors for the current *l*, denoted as $\lambda_i(l)$ for $i = 0, 1, \ldots, 2t-2$;

Step S2: If $\lambda_i(l) = \lambda_i(\mathit{temp})$ for all $i = 0, 1, \ldots, 2t-2$, then $equal = 1$, $l_2 = l_2 + 1$; else $equal = 0$, $l_2 = 1$;

Step S3: If $l_2 > l_1$ then $shift = 1$; else $shift = 0$;

Step S4: If $shift = 1$ then $l_1 = l_2$; else $l_1$ remains;

Step S5: Output $equal$, $shift$;

Step S6: Go to Step S1.

Table 3 presents a comparison between the UHD and RiBM decoders for the example RS (255, 239) code, for which *n*=255, *t*=8 and *m*=8. The hardware complexity is estimated based on the work in [24]. Although the UHD decoder requires about 1.7 times the area of the RiBM decoder, it achieves significantly enhanced burst-error-correcting capability. In channel environments that are likely to generate long bursts of errors (*f*>8), the traditional RiBM decoder, which can correct at most *t*=8 symbol errors, fails to decode the codewords, while the UHD decoder can still be effective. In short, the UHD design provides an efficient and attractive unified solution for multi-mode RS decoding in optical applications that demand enhanced error-correcting capability.

**Table 3.** Comparisons of performance on hardware and error correction capability.

**Figure 12.** The timing charts for RiBC architecture.
