3.2 Nonadjacent form

Like addition, point subtraction on an elliptic curve is efficient, because the negative of a point is obtained by simply changing the sign of one coordinate: P = (x, y) gives -P = (x, -y). We can therefore use a signed representation of the bits of the integer k. A particularly interesting representation is the nonadjacent form (NAF), which uses the digit set {-1, 0, 1}: k = ∑_{i=0}^{l-1} k_i 2^i, where k_i ∈ {-1, 0, 1}. To compute the scalar multiplication [k]P by NAF, the digits of the NAF representation of the scalar k are scanned from the most significant digit to the least significant digit. For each digit, a point doubling is performed, followed by a point addition when the digit equals 1 or a point subtraction when it equals -1. The advantage of this representation is that it possesses the following properties:

1. k has a unique NAF denoted NAF (k).

2. NAF(k) has the fewest non-zero digits of any signed digit representation of k.

3. The length of NAF(k) is at most one more than the length of the binary representation of k.

For example, for k = 255 = (11111111)2, where the density of non-zero digits is maximal, computing 255P requires seven point additions. But if we transform it into 256P - P, i.e., (100000000 - 1)P, only one addition (in fact, a subtraction) is needed. Thus NAF(k) = (1 0 0 0 0 0 0 0 -1)2. NAF(k) can be generated by successively dividing k by 2: if k is odd, the remainder r ∈ {-1, 1} is chosen so that the quotient (k - r)/2 is even; thus, the next digit of the NAF representation will be equal to 0.

```
Algorithm 6. Computing NAF for scalar k.
Input: k (the scalar, an integer)
Output: NAF(k)
Begin
    i ← 0
    while k ≥ 1 do
        if k odd then
            ki ← 2 - (k mod 4)
            k ← k - ki
        else
            ki ← 0
        end
        k ← k / 2
        i ← i + 1
    end
    return (ki-1, ..., k1, k0)
End
```
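As a sanity check, Algorithm 6 can be sketched in Python (an illustrative sketch of ours, not code from the chapter; the function name `naf` is an assumption):

```python
def naf(k):
    """Return the non-adjacent form of k, least significant digit first (our sketch)."""
    digits = []
    while k >= 1:
        if k % 2 == 1:
            d = 2 - (k % 4)      # d in {-1, 1}, chosen so that (k - d)/2 is even
            k -= d               # subtract the digit so the next digit is 0
        else:
            d = 0
        digits.append(d)
        k //= 2
    return digits

# k = 255: NAF is 2^8 - 2^0, i.e., (1 0 0 0 0 0 0 0 -1) read MSB first
print(naf(255))   # -> [-1, 0, 0, 0, 0, 0, 0, 0, 1]
```

Note that the output never contains two adjacent non-zero digits, which is the defining NAF property.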
Based on the left-to-right DA algorithm, Algorithm 7 computes the scalar multiplication by using NAF(k).

Thus, the average density of non-zero digits (-1 or 1) over all NAF(k) of length (l-1) digits is approximately 1/3, so Algorithm 7 requires on average (l-1) point doublings and (l-1)/3 point additions. However, it requires a scalar conversion from k to NAF(k) (see Algorithm 6). The NAF method can be generalized to a set of digits C_{2^w} = {-2^{w-1}, ..., 2^{w-1}} to represent the scalar k, which is equivalent to splitting it into fixed-size windows of width w.

A Survey of Fast Scalar Multiplication on Elliptic Curve Cryptography for Lightweight… DOI: http://dx.doi.org/10.5772/intechopen.86584

```
Algorithm 7. NAF method.
```

Modern Cryptography – Current Challenges and Solutions


```
Input: NAF(k), P ∈ E(Fp)
Output: Q = [k]P
Begin
    Q ← ∞
    for i ← l - 1 downto 0 do    // scan from most significant digit to least significant
        Q ← 2Q                   // compute point doubling
        if ki = 1 then
            Q ← Q + P            // compute point addition
        end
        if ki = -1 then
            Q ← Q - P            // compute point subtraction
        end
    end
    return Q
End
```
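The main loop of Algorithm 7 can be illustrated with integers standing in for curve points, where 2·Q models point doubling and Q ± P models point addition/subtraction (a sketch of ours, not the chapter's code):

```python
def mult_naf(digits_msb_first, P):
    """Left-to-right double-and-add over NAF digits, most significant first.

    Integers stand in for points here: 2*Q models doubling, Q +/- P models
    point addition and subtraction.
    """
    Q = 0                        # stands for the point at infinity
    for d in digits_msb_first:
        Q = 2 * Q                # point doubling
        if d == 1:
            Q = Q + P            # point addition
        elif d == -1:
            Q = Q - P            # point subtraction
    return Q

# NAF(379) = (1 0 -1 0 0 0 0 -1 0 -1), so the loop recovers 379*P
print(mult_naf([1, 0, -1, 0, 0, 0, 0, -1, 0, -1], 1))   # -> 379
```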
For example, for w = 3, C_{2^3} = {-4, -3, -2, -1, 0, 1, 2, 3, 4}. We can define NAF_w(k) as follows:

$$\mathrm{NAF}_w(k) = \sum_{i=0}^{l-1} k_i 2^i, \quad \text{with } |k_i| < 2^{w-1}.$$

For example, if the scalar k = 379 = (101111011)2, then NAF2(k), NAF3(k), and NAF4(k) can be computed (writing negative digits explicitly):

1. NAF2(k) = (1 0 -1 0 0 0 0 -1 0 -1)

2. NAF3(k) = (3 0 0 0 -1 0 0 3)

3. NAF4(k) = (3 0 0 0 0 0 0 -5)

Algorithm 8 presents the DA method using the NAF of the scalar k with fixed-size windows.

```
Algorithm 8. NAF method with fixed-size windows.
Input: NAFw(k), P ∈ E(Fp), precomputed points [j]P for j ∈ {1, 3, ..., 2^(w-1) - 1}
Output: Q = [k]P
Begin
    Q ← ∞
    for i ← l - 1 downto 0 do    // scan from most significant digit to least significant
        Q ← 2Q                   // compute point doubling
        if ki ≠ 0 then
            if ki > 0 then
                Q ← Q + [ki]P    // compute point addition
            else
                Q ← Q - [|ki|]P  // compute point subtraction
            end
        end
    end
    return Q
End
```


The average density of non-zero digits over all NAF_w(k) of length l digits is approximately 1/(w+1). Thus, Algorithm 8 performs on average (l-1) point doublings and l/(w+1) point additions. However, this method requires the precomputed points [j]P for j = 1, 3, ..., 2^{w-1} - 1. Despite the cost of this precomputation (1 point doubling + (2^{w-2} - 1) point additions), using NAF_w(k) with windows remains more interesting than NAF without windows.
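The width-w recoding can be sketched in Python (our illustration; the function name `naf_w` is an assumption). It reproduces the NAF3 and NAF4 digits of k = 379 given above:

```python
def naf_w(k, w):
    """Width-w NAF of k: odd digits d with |d| < 2**(w-1), LSB first (our sketch)."""
    digits = []
    while k > 0:
        if k % 2 == 1:
            d = k % (1 << w)              # residue modulo 2^w ...
            if d >= 1 << (w - 1):
                d -= 1 << w               # ... taken in (-2^(w-1), 2^(w-1))
            k -= d
        else:
            d = 0
        digits.append(d)
        k //= 2
    return digits

# k = 379, digits printed most significant first:
print(naf_w(379, 3)[::-1])   # -> [3, 0, 0, 0, -1, 0, 0, 3]
print(naf_w(379, 4)[::-1])   # -> [3, 0, 0, 0, 0, 0, 0, -5]
```

Each non-zero digit is odd, so only the odd multiples [j]P need to be precomputed.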


A final generalization of this method is to use NAF_w(k) with variable (or sliding) window sizes up to a maximum number of digits. These windows begin and end with a non-zero digit. If we take the example of NAF2(k) = (1 0 -1 0 0 0 0 -1 0 -1) with sliding windows having a maximum length of three digits, these windows begin and end with non-zero digits. We thus obtain

$$\mathrm{NAF}_2(k) = \underline{\mathbf{1}\,\mathbf{0}\,\overline{\mathbf{1}}}\;\mathbf{0}\,\mathbf{0}\,\mathbf{0}\,\mathbf{0}\;\underline{\overline{\mathbf{1}}\,\mathbf{0}\,\overline{\mathbf{1}}}$$

The precomputed points are [3]P and [5]P; the scalar multiplication proceeds as follows: [3]P → [6]P (point doubling) → [12]P (point doubling) → [24]P (point doubling) → [48]P (point doubling) → [96]P (point doubling) → [192]P (point doubling) → [384]P (point doubling) → [379]P (point subtraction of [5]P). Thus, we perform 8 point operations, against 12 in the case where the windows are fixed.

#### 3.3 Mutual opposite form (MOF) algorithm

More recent mechanisms like the mutual opposite form (MOF) [22] and the complementary recoding algorithm [23] use a signed representation with digits {-1, 0, 1}.

In MOF, the representation of the scalar k is obtained by subtracting each bit k_{i-1} from the bit k_i. The most significant digit is 1 and the least significant digit is -1. Its output is comparable to that of NAF.

For example, if the scalar k = 379 = (101111011)2, then MOF(k) = (1 -1 1 0 0 0 -1 1 0 -1). The conversion is simpler than that of NAF because it only requires subtraction operations. In addition, MOF can scan the bits from left to right or vice versa, which makes it more flexible.
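The digit-wise subtraction can be sketched in Python (our illustration, assuming the convention k_l = 0 above the top bit):

```python
def mof(k):
    """Mutual opposite form of k, most significant digit first (our sketch)."""
    bits = [int(b) for b in bin(k)[2:]]   # k_(l-1) ... k_0, MSB first
    shifted = [0] + bits                  # prepend k_l = 0
    # digit i is k_(i-1) - k_i; the top digit is k_(l-1), the bottom is -k_0
    return [shifted[j + 1] - shifted[j] for j in range(len(bits))] + [-bits[-1]]

digits = mof(379)
print(digits)   # -> [1, -1, 1, 0, 0, 0, -1, 1, 0, -1]
```

Unlike NAF, adjacent non-zero digits may occur (here the leading 1, -1 pair), but the value is preserved.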

#### 3.4 One's complementary recoding algorithm (CR1)

In the one's complementary recoding method, the representation of the scalar k is obtained through its complement k̄, using k = ∑_{i=0}^{l-1} k_i 2^i = 2^l - k̄ - 1. The complement k̄ is obtained by inverting each bit of the scalar k. For example, if the scalar k = 379 = (101111011)2, then k̄ = (010000100)2 and k = 2^9 - k̄ - 1 = (1000000000)2 - (010000100)2 - 1 = (1 0 -1 0 0 0 0 -1 0 -1)2. Thus, we can see that the density of non-zero digits is reduced from 7 to 4. However, if the number of 1s in the original scalar k is greater than l/2, the method is no longer interesting, because the goal is to have the fewest 1s in the final representation.
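The identity k = 2^l - k̄ - 1 translates directly into a signed-digit string (a sketch of ours; the function name is an assumption):

```python
def complement_recoding(k):
    """One's-complement recoding of k as 2^l - kbar - 1, MSB first (our sketch)."""
    l = k.bit_length()
    kbar = (1 << l) - k - 1                            # invert each of the l bits
    # +1 at position l, minus the bits of kbar, and a trailing extra -1
    digits = [1] + [-int(b) for b in format(kbar, "0{}b".format(l))]
    digits[-1] -= 1
    return digits

digits = complement_recoding(379)
print(digits)   # -> [1, 0, -1, 0, 0, 0, 0, -1, 0, -1]
```

For k = 379 the recoding has 4 non-zero digits instead of the 7 ones of the plain binary form.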

#### 3.5 Double-base number system

In the methods discussed above, the scalar is represented in a single base; the double-base number system (DBNS) offers a representation in two bases [11]. The scalar k is represented as a sum of mixed powers of 2 and 3: k = ∑_{i=1}^{l_t} k_i 2^{a_i} 3^{b_i}, where k_i ∈ {-1, 1} and a_i, b_i ≥ 0. The direct usage of this system can induce a high computational cost (∑ b_i triplings and ∑ a_i doublings). A significant improvement can reduce


costs by reusing all intermediate calculations. We keep the initial representation of k with the additional constraint that the exponents form two decreasing sequences: a_max ≥ a_1 ≥ a_2 ≥ … ≥ a_{l_t} and b_max ≥ b_1 ≥ b_2 ≥ … ≥ b_{l_t}. This formulation makes it possible to compute only a_max doublings, b_max triplings, and (l_t - 1) additions. For example, 752 = 2^3·3^4 + 2^2·3^3 - 2^2.

The scalar multiplication is then evaluated as 2^2(3^3(2·3P + P) - P). Thus, the cost of this scalar multiplication is 4 triplings + 3 doublings + 2 additions. This approach has been generalized using a slightly larger digit set, requiring precomputed points [24]. In this case the values of k_i are prime numbers other than 3: {±1, ±5, ±7, ±11}.
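The decomposition and its nested evaluation can be checked numerically, with the integer 1 standing in for the point P (our sanity check, not the chapter's code):

```python
# 752 = 2^3*3^4 + 2^2*3^3 - 2^2
assert 2**3 * 3**4 + 2**2 * 3**3 - 2**2 == 752

# Nested evaluation 2^2 * (3^3 * (2*3*P + P) - P):
# 4 triplings (3P and 3^3), 3 doublings (2* and 2^2), 2 additions (+P, -P)
P = 1                                    # integer stand-in for the point P
result = 2**2 * (3**3 * (2 * 3 * P + P) - P)
print(result)                            # -> 752
```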

#### 3.6 Comparison


If memory storage is available, precomputed points can be used to decrease the computation time. The window (or block) method can be applied to signed representations such as NAF, MOF, and complement recoding, or to unsigned representations such as double-and-add. With sliding-window representations, the number of precomputed points varies according to the method. Take the example of variable (sliding) window sizes with a maximum of five digits.

For the double-and-add method, we will have all odd combinations of at most 5 bits, i.e., those that begin and end with a 1. We will thus have at most 15 precomputed points: [3]P, [5]P, [7]P, [9]P, [11]P, [13]P, [15]P, [17]P, [19]P, [21]P, [23]P, [25]P, [27]P, [29]P, and [31]P.

For the wNAF method, the blocks are processed through variable (sliding) windows with a maximum of five digits. These windows begin and end with a non-zero digit. As a result, the value V_i of each block of the scalar k is odd and less than 2^w. Since there are no two consecutive non-zero digits, the number of zero digits in a block is at least the number of non-zero digits minus one. The maximum number of precomputed points required is (2^{w-2}) - 1. If the maximum length of the window is 5 bits, the largest corresponding precomputed point is (10101)2 = 21P, and the possible combinations for precomputed points are the following:

w = 3 bits: 1 0 1 = 5P; 1 0 -1 = 3P; -1 0 1 = -3P; -1 0 -1 = -5P
w = 4 bits: 1 0 0 1 = 9P; 1 0 0 -1 = 7P; -1 0 0 1 = -7P; -1 0 0 -1 = -9P
w = 5 bits: 1 0 1 0 1 = 21P; 1 0 1 0 -1 = 19P; 1 0 0 0 1 = 17P; 1 0 0 0 -1 = 15P; 1 0 -1 0 1 = 13P; 1 0 -1 0 -1 = 11P (and the corresponding negatives -11P, …, -21P)

Note that the negative points are the symmetric counterparts of the positive points; they are neither stored nor computed, as they are obtained almost for free. For windows with a maximum size of 5 bits, the number of precomputed points is 10.

MOF uses a signed representation just like NAF, but it may contain two consecutive non-zero digits. For windows of maximum length 5 bits, the derivation of the precomputed points is done by subtracting each bit k_{i-1} of the block from the bit k_i.


For example, for some values (16–31) of 5-bit blocks, we have:

10000 → 11000 → P
10001 → 110011 → 9P
10010 → 110110 → 9P
10011 → 110101 → 5P
10100 → 111100 → 5P
10101 → 111111 → 11P
10110 → 111010 → 11P
10111 → 111001 → 3P
11000 → 101000 → 3P
11001 → 101110 → 13P
11010 → 101011 → 13P
11011 → 101101 → 7P
11100 → 100100 → 7P
11101 → 100111 → 15P
11110 → 100010 → 15P
11111 → 100001 → P

The remaining combinations give the same negative values. Thus, the number of precomputed points is 7: [3]P, [5]P, [7]P, [9]P, [11]P, [13]P, and [15]P.

Complement recoding uses the same representation as MOF, but for the derivation of precomputed points it takes all combinations of up to 5 bits beginning and ending with a non-zero digit, i.e., (2^{w-1} - 1) = 15 precomputed points: [3]P, [5]P, [7]P, [9]P, [11]P, [13]P, [15]P, [17]P, [19]P, [21]P, [23]P, [25]P, [27]P, [29]P, and [31]P. Table 1 presents a comparison between the different methods.

#### 3.7 Scalar reduction method

We have developed a scalar reduction (SR) algorithm; its main advantage is that it can easily be applied to almost all of the fast scalar multiplication methods described in the previous sections. This scalar reduction scheme is an improvement based on the negative of a point: it makes a specific reduction of the scalar in a selected interval. Using negation is a well-known trick, in cryptanalysis as well as in cryptography, for computing scalar multiplication with addition-subtraction chains [25, 26]. The scheme replaces the point kP by an equivalent representation tP in the scalar multiplication operation, where k and t are scalars and k > |t|. The technique is applied in the interval [⌊n/2⌋ + 1, n - 1], where ⌊n/2⌋ is the integer part of n/2. As the negative of a point is obtained almost for free, we use it to speed up the computation: given a point P = (x_P, y_P) in affine coordinates, the negative of the point kP = (x_kP, y_kP) is -kP = (x_kP, -y_kP), obtained by changing the sign of the y-coordinate. Thus, from kP the scalar reduction technique gets the equivalent point tP through Eq. (5).


| Methods | Cost | Precomputed points | W = 5 | Directions |
|---|---|---|---|---|
| DA | (l-1)D + ((l-1)/2)A | 0 | …. | ⇆ |
| NAF | (l-1)D + ((l-1)/3)A | 0 | …. | → |
| MOF | (l-1)D + ((l-1)/2)A | 0 | …. | ⇆ |
| RC1 | <(l-1)D + ((l-1)/3)A | 0 | …. | ⇆ |
| wNAF | (l-1)D + (l/(w+1))A | < 2^{w-1} | 10 | ⇆ |
| wMOF | (l-1)D + (l/(w+1))A | ≤ 2^{w-1} | 7 | ⇆ |
| wRC1 | <(l-1)D + (l/(w+1))A | 2^{w-1} - 1 | 15 | ⇆ |

#### Table 1.

Cost for computation and memory storage.


$$\begin{cases} \text{1. If } k \in \left[\left\lfloor \frac{n}{2} \right\rfloor + 1,\ n-1\right], & kP = tP \text{ where } t = k - n \\[4pt] \text{2. If } k \in \left[0,\ \left\lfloor \frac{n}{2} \right\rfloor\right], & kP = tP \text{ where } t = k \end{cases} \tag{5}$$
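The case split of Eq. (5) is a one-line function (our sketch; the function name is an assumption):

```python
def reduce_scalar(k, n):
    """Eq. (5): map k in [n/2 + 1, n - 1] to the shorter scalar t = k - n (our sketch)."""
    return k - n if k > n // 2 else k

# With group order n = 28:
print([reduce_scalar(k, 28) for k in (5, 16, 21, 27)])   # -> [5, -12, -7, -1]
```

The reduced scalar t is negative in case 1, so kP is evaluated as -( |t| P ), and the negation is almost free.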

For example, take the prime p = 23 (chosen small just to explain the technique; in practice p is much bigger). For the elliptic curve E over F23 defined by y^2 = x^3 + x + 1, we have #E(F23) = 28, E(F23) is a cyclic group, and P(0, 1) is a generator point. SR makes an equivalent representation of the points with scalars in [⌊n/2⌋ + 1, n - 1], so that the points 16P, 21P, and 27P can be replaced, respectively, by -12P, -7P, and -P. In this case, the computation of 27P is replaced by the computation of -P and is almost free. For WSN or IoT embedded devices, replacing the computation of kP by tP using case 1 of Eq. (5) in [⌊n/2⌋ + 1, n - 1] can significantly accelerate scalar multiplication. From Eq. (6), all scalars can be scanned: in the interval [⌊n/2⌋ + 1, n - 1], for this example we have the following equivalence representations.
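The F23 example can be reproduced in a few lines of Python (a toy sketch of ours; real deployments use much larger fields and constant-time code):

```python
# Toy curve from the text: y^2 = x^3 + x + 1 over F_23, P0 = (0, 1), order n = 28
p, a = 23, 1

def neg(P):
    return None if P is None else (P[0], (-P[1]) % p)   # None is the point at infinity

def add(P, Q):
    if P is None: return Q
    if Q is None: return P
    if P[0] == Q[0] and (P[1] + Q[1]) % p == 0:
        return None                                     # P + (-P) = infinity
    if P == Q:
        lam = (3 * P[0] ** 2 + a) * pow(2 * P[1], -1, p) % p
    else:
        lam = (Q[1] - P[1]) * pow(Q[0] - P[0], -1, p) % p
    x = (lam * lam - P[0] - Q[0]) % p
    return (x, (lam * (P[0] - x) - P[1]) % p)

def mul(k, P):
    if k < 0:
        return neg(mul(-k, P))                          # negation is almost free
    Q = None
    for bit in bin(k)[2:]:                              # plain double-and-add
        Q = add(Q, Q)
        if bit == "1":
            Q = add(Q, P)
    return Q

P0 = (0, 1)
print(mul(28, P0))                     # -> None (the point at infinity)
print(mul(27, P0) == mul(-1, P0))      # -> True: 27P is obtained as -P
print(mul(16, P0) == mul(-12, P0), mul(21, P0) == mul(-7, P0))   # -> True True
```

This confirms the equivalences used by SR: since 28P is the point at infinity, kP = (k - 28)P for every scalar in the upper half of the interval.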

• [15]P = [13]P + 2([1]P)

• [16]P = [12]P + 2([2]P)

• [17]P = [11]P + 2([3]P)

• .........= .........+……….

• [26]P = [2]P + 2([12]P)

• [27]P = [1]P + 2([13]P)


It can be inferred that

$$\sum_{k=\lfloor n/2 \rfloor + 1}^{n-1} kP = \sum_{k=1}^{\lfloor n/2 \rfloor - 1} kP + 2\sum_{k=1}^{\lfloor n/2 \rfloor - 1} kP.$$

Thus,

$$\sum_{k=1}^{n-1} kP = 2\sum_{k=1}^{\lfloor n/2 \rfloor - 1} kP + \left\lfloor \frac{n}{2} \right\rfloor P + \sum_{k=1}^{\lfloor n/2 \rfloor - 1} kP \tag{6}$$

In the SR technique, [15]P, [16]P, …, [26]P, [27]P can be replaced, respectively, by [-13]P, [-12]P, …, [-2]P, [-1]P in the interval [⌊n/2⌋ + 1, n - 1]. The expression $\sum_{k=\lfloor n/2 \rfloor + 1}^{n-1} kP$ can be replaced by

$$\sum\_{\mathbf{k}=1}^{\mathbf{n}-1} \mathbf{k} \mathbf{P} = 2 \sum\_{\mathbf{k}=1}^{\lfloor \mathbf{n}/2 \rfloor -1} \mathbf{k} \mathbf{P} + \sum\_{\mathbf{k}=1}^{\lfloor \mathbf{n}/2 \rfloor -1} |\mathbf{k}| \mathbf{P} + \left\lfloor \frac{\mathbf{n}}{2} \right\rfloor \mathbf{P} \tag{7}$$

The complexity of scalar multiplication can be determined by the bit length of k, which is equal to ⌊log2(k)⌋ + 1, or log2(k) if k = 2^x for an integer x. In binary representation, log2(k) can be replaced in the scalar reduction technique by

$$\log\_2\left(\mathbf{k} - 2\left(\mathbf{k} - \frac{\mathbf{n}}{2}\right)\right) = \log\_2\mathbf{k} + \log\_2\left(\mathbf{1} + \frac{\mathbf{n} - 2\mathbf{k}}{\mathbf{k}}\right) \tag{8}$$

Thus, the gain α in bit length is

$$\alpha = \log_2\left|1 + \frac{n-2k}{k}\right| = \log_2\left|\frac{t}{k}\right|. \tag{9}$$
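Numerically, the gain of Eq. (9) can be checked for the running example (our sketch; `gain_bits` is an assumed name, and we take the absolute value of the gain):

```python
import math

def gain_bits(k, n):
    """Bit-length gain |alpha| = |log2(|t| / k)| with t = k - n (our sketch)."""
    t = k - n
    return abs(math.log2(abs(t) / k))

# k = 27, n = 28: t = -1, so the ~5-bit scalar collapses to 1 bit
print(round(gain_bits(27, 28), 2))    # -> 4.75
```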


#### Table 2.

Running times (ms) using affine coordinates.

The SR technique is tested in affine coordinates. The scalars are in binary and NAF form, combined with the scalar reduction scheme. The gain rate depends on the value of k. For comparison, (α_sr/da), (α_sr/naf), and (α_sr-naf/da) denote, respectively, the gain rate of the scalar reduction (SR) method compared to double-and-add (DA), of SR compared to NAF, and of SR combined with NAF compared to DA. The results are given in Table 2.


DOI: http://dx.doi.org/10.5772/intechopen.86584


#### 4. Parallelization of scalar multiplication on scalar arithmetic

Parallel computing is another way to accelerate computation and balance workload. In a distributed system, a task can be divided into smaller ones which are then carried out simultaneously by different processors. Parallel computation of scalar multiplication is an active research topic in cryptography. It can be achieved at one or more arithmetic levels: in the formulas of operations such as addition and doubling, between the operations themselves, or on the scalar by partitioning it. Various solutions have been proposed in the literature, but in this chapter we present works based on scalar arithmetic.

#### 4.1 Efficient elliptic curve exponentiation

Efficient elliptic curve exponentiation based on point precomputation is proposed in [24]. To calculate Q = kP, where Q and P are two points represented in Jacobian coordinates and k is a positive integer of 160 bits, a precomputed table consisting of 62 points is prepared.

$A[s] = \sum_{j=0}^{4} a_{s,j}\, 2^{32j} P$ and $B[s] = \sum_{j=0}^{4} a_{s,j}\, 2^{16+32j} P$, where $1 \le s \le 31$ and $a_{s,4}, \ldots, a_{s,0}$ is the binary representation of $s = \sum_{j=0}^{4} a_{s,j} 2^{j}$. The calculation of kP is then done by Algorithm 9.

Since this method is based on precomputation, a precomputed table is prepared, and the exponentiation loop can be performed separately by different processors.

#### 4.2 Parallel scalar multiplication on two processors

In [27], two processors and a circular buffer are used to perform parallel scalar multiplication. A buffer acts as a communication channel between the two


Algorithm 9. Elliptic curve exponentiation based on precomputation.

```
Input: k = ∑_{i=0}^{l-1} k_i·2^i, P
Output: kP
Begin
  A[0] ← ∞
  B[0] ← ∞
  T ← ∞
  for 0 ≤ j ≤ 15 do
    u_j ← ∑_{i=0}^{4} k_{32i+j}·2^i
    v_j ← ∑_{i=0}^{4} k_{32i+16+j}·2^i
  end
  for j from 15 to 0 do
    T ← 2T
    T ← T + A[u_j] + B[v_j]
  end
  Return T
end
```
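The role of the two tables can be checked with plain integers standing in for curve points, so that "point addition" is integer addition and the final T must equal k·P. This is only a sketch of the table layout described above; `precompute_tables` and `scalar_mul` are illustrative names, and a real implementation would store actual curve points:

```python
def precompute_tables(P):
    # A[s] = sum_j a_{s,j} * 2^(32j) * P and B[s] = 2^16 * A[s] for s = 1..31,
    # where a_{s,4}..a_{s,0} is the binary representation of s: 62 entries total.
    A, B = {0: 0}, {0: 0}
    for s in range(1, 32):
        val = sum(((s >> j) & 1) << (32 * j) for j in range(5)) * P
        A[s] = val
        B[s] = (1 << 16) * val
    return A, B

def scalar_mul(k, P):
    # Algorithm 9: only 16 doublings, with two table lookups per step.
    A, B = precompute_tables(P)
    T = 0
    for j in range(15, -1, -1):
        u = sum(((k >> (32 * i + j)) & 1) << i for i in range(5))
        v = sum(((k >> (32 * i + 16 + j)) & 1) << i for i in range(5))
        T = 2 * T               # point doubling
        T = T + A[u] + B[v]     # point additions via the tables
    return T

k = (1 << 160) - 12345          # any 160-bit scalar
assert scalar_mul(k, 7) == k * 7
```

The bit at position 32i + j carries weight 2^j · 2^(32i) when added in iteration j, which is exactly what A (columns 0–15 of each 32-bit group) and B (columns 16–31) encode.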

Modern Cryptography – Current Challenges and Solutions


processors to reduce the average time of the scalar multiplication. As in the producer-consumer problem, the first processor initially reads P and then keeps scanning k<sub>i</sub> and computing point doublings. It writes 2<sup>i</sup>P into the buffer whenever a non-zero k<sub>i</sub> is detected. The second processor reads 2<sup>i</sup>P from the buffer and performs the additions. The computation terminates when no 2<sup>i</sup>P remains in the buffer.
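The producer-consumer scheme can be sketched with two Python threads and a queue acting as the buffer; integers stand in for curve points, so the consumer's running sum must equal kP. This is a sketch, and `parallel_scalar_mul` is an illustrative name:

```python
import queue
import threading

def parallel_scalar_mul(k, P):
    buf = queue.Queue()
    DONE = object()               # sentinel: no more 2^i P coming

    def producer():
        # Scans bits of k right to left, doubling as it goes, and writes
        # 2^i * P into the buffer whenever bit i is non-zero.
        bits, point = k, P
        while bits:
            if bits & 1:          # non-zero k_i detected
                buf.put(point)
            point = 2 * point     # point doubling
            bits >>= 1
        buf.put(DONE)

    t = threading.Thread(target=producer)
    t.start()
    result = 0
    while True:                   # consumer: performs the additions
        item = buf.get()
        if item is DONE:
            break
        result += item            # point addition
    t.join()
    return result

assert parallel_scalar_mul(379, 5) == 379 * 5
```

The queue decouples the doubling chain from the addition chain, which is the point of the two-processor scheme in [27].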

#### 4.3 Parallelization by partitioning the scalar

For other schemes, this technique of parallelization consists in partitioning the scalar k (represented on l bits) into m fixed-size blocks on SIMD architectures [28]. This partitioning generates precomputed points that need to be calculated and stored prior to starting parallel calculations.

Recent work [29], inspired by [30], uses this technique in m blocks of length v bits in wireless sensor networks. The scalar is represented on l bits and is divided into m blocks B<sub>i</sub> of length v = l/m according to the m sensors chosen to participate in the computation.

$$kP = B\_0 2^{0v} P + B\_1 2^{1v} P + B\_2 2^{2v} P + \dots + B\_{m-1} 2^{(m-1)v} P \tag{10}$$

where $B_i = \sum_{j=iv}^{iv+v-1} l_j\, 2^{j-iv}$, with $l_j$ the bit at position j of the binary sequence of length l.

This partitioning generates precomputed points P<sub>i</sub> = 2<sup>iv</sup>P. For example, consider a scalar k of 160 bits and a point P, and suppose we want to compute kP on four sensors. The scalar k is broken down into four blocks of 40 bits:

$$\begin{aligned} kP &= \underbrace{(B\_{0,0}B\_{0,1}\dots B\_{0,39})\_2 \cdot 2^{0}P}\_{\text{block } B\_0} + \underbrace{(B\_{1,40}B\_{1,41}\dots B\_{1,79})\_2 \cdot 2^{40}P}\_{\text{block } B\_1} \\ &+ \underbrace{(B\_{2,80}B\_{2,81}\dots B\_{2,119})\_2 \cdot 2^{80}P}\_{\text{block } B\_2} + \underbrace{(B\_{3,120}B\_{3,121}\dots B\_{3,159})\_2 \cdot 2^{120}P}\_{\text{block } B\_3} \end{aligned}$$

Precomputed points are 2<sup>40</sup>P, 2<sup>80</sup>P, and 2<sup>120</sup>P. Note that all parallelism techniques based on scalar partitioning generate precomputed points, which must first be calculated and stored, leading to additional memory and energy consumption.
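The 160-bit example above can be checked with integer arithmetic standing in for point operations. Here `partition` is an illustrative helper, and each term B<sub>i</sub> · 2<sup>iv</sup> · P corresponds to the share one sensor computes from its precomputed point:

```python
def partition(k, l=160, m=4):
    # Split the l-bit scalar k into m blocks B_i of v = l/m bits each.
    v = l // m
    mask = (1 << v) - 1
    return [(k >> (i * v)) & mask for i in range(m)], v

k = (1 << 159) | 0xDEADBEEF     # a 160-bit scalar
P = 11                          # integer stand-in for the point P
blocks, v = partition(k)
# Sensor i holds the precomputed point 2^(i*v) * P and multiplies it by B_i;
# adding the four partial results reconstructs kP, matching Eq. (10).
kP = sum(B * (1 << (i * v)) * P for i, B in enumerate(blocks))
assert kP == k * P
```

The identity holds for any k below 2<sup>l</sup>, since the blocks are just disjoint bit ranges of k.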

In [31], a parallel computation of kP between N sensor nodes is presented by partitioning the scalar k into m blocks of length v = l/N bits, where l is the bit length of k, and each block is computed by one sensor node. A distributed algorithm (double-and-add, NAF, etc.) composed of m blocks is also proposed, and each block of the distributed algorithm operates on one block B<sub>i</sub> of the scalar. Algorithms 10 and 11 show, respectively, block i for the double-and-add and NAF algorithms.

Algorithm 10. Double-and-add for node i.

```
Input: d = (d_{v-1}, ..., d_1, d_0)_2, precomputed point 2^{vi}P ∈ E(Fp)
Output: Q = [d]·2^{vi}P, the partial result of node i
Begin
  Q ← ∞
  T ← 2^{vi}P                  // T starts at the precomputed point
  for j ← 0 to v-1 do          // scan bits from right to left
    if d_j = 1 then
      Q ← Q + T                // point addition
    end
    T ← 2T                     // point doubling
  end
  return Q
end
```

Algorithm 11. NAF method for node i.

```
Input: NAF(d) = (d_{v-1}, ..., d_1, d_0), precomputed point 2^{vi}P ∈ E(Fp)
Output: Q = [d]·2^{vi}P, the partial result of node i
Begin
  Q ← ∞
  T ← 2^{vi}P                  // T starts at the precomputed point
  for j ← 0 to v-1 do          // scan digits from right to left
    if d_j = 1 then
      Q ← Q + T                // point addition
    end
    if d_j = -1 then
      Q ← Q - T                // point subtraction
    end
    T ← 2T                     // point doubling
  end
  return Q
end
```

So as not to compromise security when partitioning the scalar, reliability and efficiency are taken into account. The authors demonstrate that after partitioning the scalar k into m blocks of length v, the node which leads the calculation keeps one of the m blocks in its local memory and distributes the (m-1) other blocks to the other nodes. In this case, one possibility is to send the (m-1) blocks securely by symmetric encryption. If the blocks are sent randomly without encryption, an intruder who gains (m-1) of the m blocks must still perform (m!2^v)P operations to find the private scalar k. Moreover, if the intruder gains the (m-1) results sent by the other nodes, security is not compromised: the intruder still faces the ECDLP. It is as difficult to find k from kP as to find k from the (m-1) points derived from the scalar multiplications on the (m-1) blocks; for each block d_iP, the intruder needs to find d_i, and afterward it still needs to perform (m!2^v)P before recovering the scalar k.

### 4.4 Performance measurement

The predominance of scalar multiplication among all operations makes the performance of the cryptosystem rest largely on this one operation. Theoretically, the efficiency of a formula using Jacobian coordinates can be determined by the number of multiplication (M) and square (S) operations which compose it. Operations such as addition and subtraction, denoted by A, and multiplication by a constant are negligible compared with squarings and multiplications of two variables. It is widely accepted that the cost of a square is equivalent to 0.6–1 times the cost of a multiplication [32–34]. Hence, for a scalar multiplication with a scalar of length n bits, we can determine the ratio (r = S/M) from which each approach achieves better performance.

#### 5. Conclusion

To perform fast computation of scalar multiplication, which is the major computation involved in ECC, much research has been devoted to the point arithmetic level and to the scalar arithmetic level. In this chapter, we have presented only works at the scalar arithmetic level. Almost all the methods studied are based on scanning the bits or digits of the scalar with a given scan step. In the comparative studies, we found that calculations can be faster when the number of bits scanned at a time is higher. However, scanning more than 1 bit at a time requires precomputed points that must be computed or stored beforehand. In future work, we can explore mechanisms for accelerating the calculation of precomputed points in order to avoid storing them. As with the point doubling formula, we can look for efficient point operation formulas which should allow the scan step to be increased.

Author details

Youssou Faye<sup>1</sup>*, Hervé Guyennet<sup>2</sup> and Ibrahima Niang<sup>3</sup>

1 University Assane Seck, Senegal

2 University of Franche Comte, Femto-St, France

3 University Cheikh Anta Diop, Senegal

*Address all correspondence to: yfaye@univ-zig.sn

© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
