**Abstract**

In today's digital world, any information transmitted over a wireless channel is exposed to security threats. Alongside security, encryption speed is a significant factor, because data must be transmitted as fast as possible. Pipelining is a technique used to improve the throughput of the encryption process, that is, the amount of data encrypted per unit time. In this chapter, the design of the modern SMS4-BSK cryptosystem is briefly described, various pipelined designs of the SMS4 algorithm are surveyed, and a pipelined implementation of the SMS4-BSK cryptosystem is analyzed. The SMS4-BSK cryptosystem is robust and fast, with a throughput of 7.4 Gbps. This modern cryptosystem can resist all kinds of cryptanalysis attacks. To improve the throughput further, pipelining is applied to the encryption architecture of the cryptosystem. The pipelined design is implemented on a Kintex-7 FPGA and achieves a throughput of 9.9 Gbps. Because encryption and key scheduling use the same architecture, the pipeline implementation can also be extended to the key scheduling architecture. As per the SMS4-BSK algorithm, the keys are generated in the host system to improve the throughput.

**Keywords:** pipeline, SMS4-BSK cryptosystem, throughput

## **1. Introduction**

Pipelining is a common and widely used technique for enhancing the performance of a system without significant investment in hardware. In pipelining, a computation is partitioned into a set of sub-computations (elaborated in Section 2), and the sub-computations are executed in an overlapped fashion. Ideally, the speed of execution increases by a factor equal to the number of sub-computations obtained from the partitioning. Pipelining is used in different areas of computer design, such as memory access, instruction execution and arithmetic computation.
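The effect of overlapped execution can be illustrated with a simple cycle count. The following is a minimal sketch, not the chapter's design: it assumes an idealized pipeline in which every stage takes exactly one clock cycle, the stages are perfectly balanced, and no stalls occur. The function names are hypothetical and chosen for illustration only.

```python
def sequential_cycles(num_blocks: int, num_stages: int) -> int:
    """Cycles needed when each block finishes all stages before the next block starts."""
    return num_blocks * num_stages


def pipelined_cycles(num_blocks: int, num_stages: int) -> int:
    """Cycles needed with overlapped execution (idealized, one cycle per stage).

    The first block takes num_stages cycles to fill the pipeline;
    every following block completes one cycle after the previous one.
    """
    if num_blocks == 0:
        return 0
    return num_stages + (num_blocks - 1)


def speedup(num_blocks: int, num_stages: int) -> float:
    """Ratio of sequential to pipelined cycle counts."""
    return sequential_cycles(num_blocks, num_stages) / pipelined_cycles(num_blocks, num_stages)


if __name__ == "__main__":
    # Hypothetical workload: 1000 blocks through a 4-stage pipeline.
    for blocks in (1, 10, 1000):
        print(blocks, "blocks:", round(speedup(blocks, 4), 3))
```

Under these assumptions the speedup for a single block is 1, and it approaches the number of stages as the number of blocks grows, which is why pipelining pays off mainly for long data streams.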

To improve the speed of computation, the following methods can be adopted:

Method 1: Replicating the hardware.

Method 2: Partitioning the computations.

In the former method, replicating the hardware improves performance at the expense of hardware cost. In the latter method, partitioning the computation enables overlapped execution, and the performance improvement is quite close to that of the former method.

Jin *et al.* have described [1] a pipelined design and a folded design of the SMS4 block cipher and implemented both on a Xilinx Virtex-4 FPGA device. The implementation results show improved throughput for the pipelined design and minimized area for the folded design. According to the authors, the proposed designs offer a flexible choice for both area-critical and speed-critical cases. Gao *et al.* have proposed [2] an SMS4 cryptosystem based on rolling and unrolling architectures. The rolling structure uses a feedback system to control the entire processing mechanism, while the unrolling structure is a fully pipelined architecture. The combination of rolling and unrolling provides good processing speed, averaging one clock cycle per 128-bit block. Han *et al.* have designed [3] an SMS4 architecture optimized for power dissipation and implementation cost. The authors proposed a cryptographic algorithm for a programmable security processor. The design implements three-stage pipelining and a 16-bit instruction set to enhance security. The cost of the processor and the code density of the design are significantly lower. The round keys are stored in shared memory, and a security scheme is proposed to protect them. Zhao *et al.* have proposed [4] a Galois Counter Mode (GCM) based SMS4 architecture. The architecture is fully pipelined to improve encryption speed and can process 128 bits of data per clock period on average. The proposed design is implemented on both Virtex-4 and Virtex-5 FPGAs. Zhao *et al.* have proposed [5] a novel FPGA implementation scheme for the SMS4 cryptosystem. The throughput achieved in this scheme is 1.9 Gbps.

Lee *et al.* have surveyed [6] the study pattern of computer architecture among students in implementing pipelining in processor design. The design required only 21 Million Instructions per Second (MIPS). Abdel-Hafeez *et al.* have implemented pipelining [7] in the Advanced Encryption Standard (AES). The architecture is implemented on an Altera Max 3000A Field Programmable Gate Array (FPGA). The authors claim that the pipelined AES design has 16% higher throughput and 36% less hardware area than other designs. Guo *et al.* have proposed a pipelined AES [8] cryptosystem that combines pipelining with parallel processing and reconfiguration techniques. The design achieved a throughput of 8.83 Gbps. Teo *et al.* have implemented [9] pipelining in the Data Encryption Standard (DES) cryptosystem. The architecture is implemented on an Altera Complex Programmable Logic Device (CPLD). A four-stage pipeline approach is used to improve the throughput of the DES architecture.

Babu *et al.* have implemented pipelining in the SMS4 [10] cryptosystem. The design is implemented on an Altera FPGA. Pipelining is applied to the Twisted Binary Decision Diagram (BDD) S-Box architecture. The Twisted BDD with m = 4 possesses good speed and throughput. The pipeline is implemented after the transformation block at each round. There are 32 round operations in the encryption architecture. Taherkhani *et al.* have designed [11] a pipelined DES cryptosystem and used a Virtex-6 FPGA to implement the architecture. The non-pipelined and the pipelined DES architectures are implemented in the same environment and analyzed. The results showed that the performance and throughput of the pipelined architecture are better.
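Several of the surveyed designs report throughput in Gbps or state that one 128-bit block is processed per clock period. The relation between these quantities can be sketched as follows. This is an illustrative calculation only: the 100 MHz clock frequency is a hypothetical value, not one taken from any of the cited works, and the function name is an assumption introduced here.

```python
def throughput_gbps(block_bits: int, clock_mhz: float, cycles_per_block: float) -> float:
    """Throughput in Gbps for a cipher core.

    block_bits:       size of one data block in bits (128 for SMS4)
    clock_mhz:        clock frequency in MHz (hypothetical in the example below)
    cycles_per_block: average clock cycles between completed blocks
    """
    if cycles_per_block <= 0:
        raise ValueError("cycles_per_block must be positive")
    blocks_per_second = clock_mhz * 1e6 / cycles_per_block
    return block_bits * blocks_per_second / 1e9


if __name__ == "__main__":
    clock_mhz = 100.0  # hypothetical clock, for illustration only

    # Fully pipelined core: one 128-bit block completes every cycle.
    print("fully pipelined:", throughput_gbps(128, clock_mhz, 1), "Gbps")

    # Iterative core reusing one round unit for all 32 rounds.
    print("iterative (32 cycles/block):", throughput_gbps(128, clock_mhz, 32), "Gbps")
```

At the same clock frequency, the fully pipelined case delivers 32 times the throughput of the iterative case in this idealized model; in practice, the achievable clock frequency and the area cost differ between the two architectures.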
