## **2. Overview of pipelined architecture**

According to the eight great ideas of computer architecture [12], pipelining is one of the techniques used to improve a processor's performance.

*Perspective Chapter: The Importance of Pipeline in Modern Cryptosystem DOI: http://dx.doi.org/10.5772/intechopen.102983*

Whenever a program is executed, each of its instructions passes through five phases:

a. Instruction / Opcode Fetch (IF)

b. Instruction Decode (ID)

c. Operand Fetch (OF)

d. Operand Execute (OE)

e. Operand Store (OS)

**Figure 1** shows the organization of a Computer. Steps involved in the execution of a program are:

Step 1: The Instructions / Program as well as operands are stored in the Main Memory initially

Step 2: The Processor fetches the instruction from the Instruction Memory

Step 3: The Control Unit decodes the instruction and finds out the operand registers as well as the operation to be performed on the operand and sends the control signals to all the components

Step 4: Operands are fetched from the Data Memory

Step 5: The Arithmetic and Logic Unit (ALU) performs the operation decoded by the Control Unit on the operands

Step 6: The output from the ALU is stored back to the Data Memory

In a non-pipelined architecture, the (i + 1)th instruction is initiated only after the ith instruction has finished executing, and the (i + 2)th instruction is initiated only after the (i + 1)th instruction has finished. Each instruction therefore starts only when the previous one has completed. This way of execution is known as instruction-wise interleaved execution or non-pipelined execution, and it is shown in **Figure 2**.

tn: Execution duration of an instruction.

tp: Phase duration.

**Figure 1.** *Organization of a computer.*

**Figure 2.** *Non-Pipelined Execution of instructions.*

**Figure 3.** *Pipelined execution of instructions.*

In a pipelined architecture, while the ith instruction moves from the first phase (IF) to the second phase (ID), the (i + 1)th instruction enters the first phase (IF). While the ith instruction moves from the ID phase to the OF phase, the (i + 1)th instruction moves from the IF phase to the ID phase and the (i + 2)th instruction enters the IF phase. This way of execution is known as phase-wise interleaved execution or pipelined execution. The execution periods of the instructions overlap, so the instructions execute in parallel, as shown in **Figure 3**. At most one instruction occupies a given phase in any time slot, and all the instructions in flight are in different phases, so there is no contention for a phase.

Total execution time in the non-pipelined architecture = 15 tp.

Total execution time in the pipelined architecture = 7 tp.

So, a program is executed faster in the pipelined architecture than in the non-pipelined architecture.
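The two execution styles can be compared with a minimal Python sketch. It assumes three instructions and five phases of equal duration tp, an assumption chosen only because it reproduces the counts of 15 and 7 quoted above. The function names are illustrative and do not come from the chapter.

```python
PHASES = ["IF", "ID", "OF", "OE", "OS"]


def non_pipelined_schedule(n, k=len(PHASES)):
    """Map (instruction, phase index) to a time slot when each instruction
    starts only after the previous one has finished all k phases."""
    return {(i, p): i * k + p for i in range(n) for p in range(k)}


def pipelined_schedule(n, k=len(PHASES)):
    """Map (instruction, phase index) to a time slot when instruction i
    enters the first phase one slot after instruction i - 1."""
    return {(i, p): i + p for i in range(n) for p in range(k)}


def total_slots(schedule):
    """Number of time slots (each of duration tp) the schedule occupies."""
    return max(schedule.values()) + 1


def has_contention(schedule):
    """True if two instructions occupy the same phase in the same slot."""
    seen = set()
    for (_, phase), slot in schedule.items():
        if (phase, slot) in seen:
            return True
        seen.add((phase, slot))
    return False


n = 3  # assumed number of instructions
print(total_slots(non_pipelined_schedule(n)))  # 15
print(total_slots(pipelined_schedule(n)))      # 7
print(has_contention(pipelined_schedule(n)))   # False
```

The contention check mirrors the statement that no two instructions share a phase in the same time slot.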


The parameters used to evaluate the performance of the pipelined execution are the Speed-Up ratio (S), Frequency (f), Efficiency (E), and Throughput (T).

**Figure 4** shows a two-stage pipelined implementation. Consider a pipelined implementation with two stages, Si and Si+1. The pipeline is implemented by inserting buffers/latches between successive stages. The input passes out of a latch when the clock is enabled.

τd = Latch delay (Delay taken by the input to go out of the latch).

τm = Maximum Phase Duration

$$\text{Clock Period } (\tau) = \tau_m + \tau_d \tag{1}$$

The clock period is the sum of the Maximum Phase Duration (τm) and the Latch delay (τd). All the phases may not have the same phase duration. So, the maximum phase duration among the five phases is considered for calculating the clock period.
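As a small illustration of Eq. (1), the sketch below picks the slowest phase and adds the latch delay. The phase durations and the latch delay are hypothetical values in nanoseconds, not figures from the chapter.

```python
def clock_period(phase_durations, latch_delay):
    """Clock period per Eq. (1): the longest phase duration plus the latch delay."""
    return max(phase_durations) + latch_delay


# Hypothetical durations (ns) for IF, ID, OF, OE, OS and a hypothetical latch delay.
phase_durations_ns = [8, 6, 9, 10, 7]
latch_delay_ns = 1

tau_ns = clock_period(phase_durations_ns, latch_delay_ns)
print(tau_ns)  # 11
```

Because every stage is clocked together, the faster phases sit idle for part of each cycle, which is why balancing the phase durations matters.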

In **Figure 5**, the number of instructions (n) = 5.

The number of clock cycles = 9.

**Figure 4.** *Two-stage pipelined execution of instructions.*


**Figure 5.** *Execution of instructions in a pipelined processor.*

The number of Phases (k) = 5. i.e.,

$$\text{The number of Clock Cycles} = k + n - 1 \tag{2}$$

Speed-Up ratio = Non-Pipelined Execution Time/Pipelined Execution Time. i.e.,

$$S = \frac{n \times t_n}{(k + n - 1) \times \tau} \tag{3}$$

Since tn = k × tp and tp = τ,

$$S = \frac{n \times k}{k + n - 1} \tag{4}$$

If n ≫ k, then k + n − 1 ≈ n, so

$$S \approx \frac{t_n}{\tau} = k \tag{5}$$

$$\text{Frequency}, f = \frac{1}{\tau} \tag{6}$$

$$\text{Pipeline Efficiency}, E = \frac{S}{k} \tag{7}$$

By substituting Eq. (4) into Eq. (7),

$$E = \frac{n}{k + n - 1} \tag{8}$$
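The speed-up and efficiency expressions can be checked numerically. The sketch below assumes tp = τ, as in Eq. (4), and uses k = 5 with a few illustrative instruction counts. The function names are hypothetical.

```python
def speed_up(n, k):
    """Eq. (4): speed-up for n instructions on a k-phase pipeline, with tp = tau."""
    return (n * k) / (k + n - 1)


def efficiency(n, k):
    """Eq. (7): efficiency is the speed-up divided by the number of phases."""
    return speed_up(n, k) / k


k = 5
for n in (5, 100, 10_000):
    print(n, round(speed_up(n, k), 3), round(efficiency(n, k), 3))
```

For n = 5 and k = 5 this gives S = 25/9 ≈ 2.78 and E = 5/9 ≈ 0.56. As n grows, S approaches k and E approaches 1, which is the n ≫ k limit.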

$$\text{Throughput, } T = \frac{B \times f}{N} \tag{9}$$

where B is the message block size, f is the clock frequency, and N is the number of clock cycles required by the implementation.
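A brief sketch of Eq. (9) follows. The block size, clock frequency, and cycle count are hypothetical values chosen for illustration, not measurements from the chapter.

```python
def throughput_bps(block_size_bits, clock_hz, cycles_per_block):
    """Eq. (9): T = (B x f) / N, in bits per second."""
    return block_size_bits * clock_hz / cycles_per_block


# Hypothetical figures: a 128-bit block, a 100 MHz clock, 40 clock cycles per block.
t = throughput_bps(128, 100e6, 40)
print(t / 1e6, "Mbit/s")  # 320.0 Mbit/s
```

Reducing N, for example by overlapping rounds in a pipeline, raises the throughput at a fixed clock frequency.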

## **3. SMS4-BSK cryptosystem**

The SMS4-BSK cryptosystem [13] is a 128-bit symmetric key block cipher. The algorithm is designed to protect messages transmitted over a Wireless Local Area Network (WLAN). Thirty-two rounding operations are used in the encryption and the key generation algorithms, which share the same architecture. A unique non-linear S-Box, the BSK Processing Block, is implemented in the design and operates over GF($2^{16}$).

**Figure 6** shows the process flow of the SMS4-BSK encryption architecture. The algorithm is designed especially for sectors that use short messages (e.g., the defense sector uses short messages to alert the army base camp to an enemy intrusion).


**Figure 6.** *Process flow of SMS4-BSK cryptosystem.*

#### **3.1 Encryption Algorithm**

Step 1: Message Mixing - In this step, the plaintext is split into eight sub-blocks of equal size initially and then mixed with the nearest sub-blocks in a round fashion.

Step 2: Message Swapping – In this step, the first eight bits are swapped with the second eight bits in each sub-block.

Step 3: Key Mixing (Key generation is mentioned in the Key Scheduling Algorithm) – The generated left and right half keys, 16 bits each, are mixed with each sub-block after undergoing a linear left/right shift.

Step 4: 32 Round BSK Processing – Step 10 of the Key scheduling algorithm is applied here.

Step 5: Message Mixing 2 – In this step, the processed message sub-blocks are linearly mixed with other sub-blocks.

Step 6: 32 Rounding

Step 7: Rounded Encrypted Message Mixing – In this step, the message sub-blocks are mixed in the opposite way to Step 1.

Step 8: Mixed Encrypted Message Swapping - In this step, the bits in the message sub-blocks are swapped in the same way as in Step 2.
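The sub-block split of Step 1 and the swap of Steps 2 and 8 can be sketched as follows. This is only an illustration under stated assumptions: the plaintext block is taken to be 128 bits, so each of the eight sub-blocks is 16 bits, and the sub-blocks are ordered most significant first. The mixing operations are not implemented because this section does not define them, and the function names are hypothetical.

```python
def split_sub_blocks(block):
    """Split a 128-bit integer into eight 16-bit sub-blocks (assumed MSB first)."""
    if not 0 <= block < 1 << 128:
        raise ValueError("block must fit in 128 bits")
    return [(block >> (16 * (7 - i))) & 0xFFFF for i in range(8)]


def swap_halves(sub_block):
    """Swap the first eight bits with the second eight bits of a 16-bit value."""
    return ((sub_block & 0x00FF) << 8) | (sub_block >> 8)


def join_sub_blocks(sub_blocks):
    """Reassemble eight 16-bit sub-blocks into one 128-bit integer."""
    result = 0
    for sub_block in sub_blocks:
        result = (result << 16) | sub_block
    return result


plaintext = 0x00112233445566778899AABBCCDDEEFF  # arbitrary example value
swapped = [swap_halves(sb) for sb in split_sub_blocks(plaintext)]
print(hex(join_sub_blocks(swapped)))  # 0x11003322554477669988bbaaddccffee
```

Applying `swap_halves` twice returns the original sub-block, so the same routine can serve wherever the swap has to be undone.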
