**Innovative Multilingual CAPTCHA Based on Handwritten Characteristics**

DOI: 10.5772/intechopen.72599

Maha Hamad Aldosari

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/intechopen.72599

#### **Abstract**

Completely Automated Public Turing Test to Tell Computers and Humans Apart (CAPTCHA) is a kind of test commonly used by websites on the Internet to differentiate between humans and automated bots. Most websites require users to pass a CAPTCHA before signing up or submitting forms. Today, CAPTCHA is also used in some mobile applications to provide a higher security level that protects websites and mobile applications against malicious attacks by automated bots and spammers. The technique proposed in this work relies on the human recognition ability, which automated bots and machines lack, by leveraging handwriting characteristics in the CAPTCHA design. Its novelty is that it adopts handwritten characters of four different languages (English, Arabic, Spanish, and French) to generate handwritten multilingual CAPTCHA text. The technique was tested, and the initial experimental results show a promising security level.

**Keywords:** CAPTCHA, handwritten CAPTCHA, web security, optical character recognition (OCR)

#### **1. Introduction**

Web applications have increased rapidly and become a daily necessity for most people [1]. Creating an email account, using social networking sites, and accessing websites are examples of day-to-day activities for Internet users. The fast evolution of the Internet has also put its security under threat [2]. The number of bot (robot) programs that attack websites has increased. These bots can bring down a site and cause a significant amount of damage. Their attacks can take many forms, such as DDoS attacks, viruses, worms, and many other kinds of malicious software. They are also considered the primary source of email spam [3].

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Therefore, stopping such bots by means of a reliable Completely Automated Public Turing Test to Tell Computers and Humans Apart (CAPTCHA) is essential. Moreover, in a multilingual world, multilingual CAPTCHAs are indispensable.

• The second step is sample distribution. In this step, the CAPTCHA character samples were distributed to 100 volunteers; each volunteer wrote the sample characters of the 4 adopted languages in their own handwriting style [1]. As a result, we had a total of 100 different instances with different handwriting styles for each character in each language. So, we ended up with more than 10,700 character samples that need to be stored in a database in order to be used later in the CAPTCHA implementation phase.

• The third step is transforming samples into digital format. Here, all the collected samples were scanned and stored in digital formats (images).

• The fourth step is sorting data. In this step, we sorted the collected data into four languages. So, at this point, we have 4 sets of images, each set belongs to one language, and there are 100 different images for each character in each language. Moreover, 4 tables were created in the database to store the images that will be used later to generate the CAPTCHA [1].

• The fifth step is classifying the countries of the world into categories according to the language spoken there. Each country is assigned to one of the four adopted languages; countries whose main spoken language is not one of the four adopted languages are classified as English-speaking countries. After that, we stored the list of countries with their matched languages in the database [1].

• The sixth step is identifying a list of inappropriate words in each language and storing it in the database as well. **Figure 1** summarizes the data gathering and preparation phase steps.

**Figure 1.** Steps of the data gathering and preparation phase.

**2.2. Algorithm technique**

**Figure 2** shows an abstract view of the technique process.

**2.3. Handwriting characteristics**

The choice of handwriting as the basis for a new CAPTCHA technique was not made at random or without logical reasons. On the contrary, it was made after a long search and study of the characteristics that handwriting has and of how it could be utilized in the security field.

Nevertheless, handwriting in general has some characteristics that can only be utilized by humans. Due to its superior ability, the human brain can analyze and recognize unclear handwritten characters and digits; it can also recognize various different

Completely Automated Public Turing Test to Tell Computers and Humans Apart (CAPTCHA) is considered one of the most common techniques that can be used to distinguish between humans and artificial agents (or bots). At present, the exponential growth of free web services has led to the misuse of automated bots and spam [4], which has resulted in serious security issues in web services. Using CAPTCHA in its various types has proven to be effective in protecting websites, and the services they provide, from harm caused by bot attacks [1].
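The data gathering and preparation steps described earlier (100 handwriting samples per character, a country-to-language list with an English fallback, and a per-language list of inappropriate words) suggest how the stored data might be combined at generation time. The following Python sketch is only an illustration under stated assumptions, not the chapter's implementation: the alphabets, the country codes, the word lists, the random-text strategy, and every function name here are hypothetical.

```python
import random
import string

# Assumed alphabets for the four adopted languages (not taken from the chapter).
# Their sizes total 107 characters, which at 100 samples each would be
# consistent with the roughly 10,700 samples the text reports.
ALPHABETS = {
    "english": string.ascii_lowercase,
    "french": string.ascii_lowercase,
    "spanish": string.ascii_lowercase + "ñ",
    "arabic": "ابتثجحخدذرزسشصضطظعغفقكلمنهوي",
}

SAMPLES_PER_CHARACTER = 100  # one handwriting sample per volunteer

# Hypothetical excerpt of the country-to-language list (fifth step).
COUNTRY_LANGUAGE = {
    "SA": "arabic", "EG": "arabic",
    "ES": "spanish", "MX": "spanish",
    "FR": "french",
    "US": "english", "GB": "english",
}

# Placeholder inappropriate-word lists (sixth step).
BLOCKED_WORDS = {"english": {"spam"}, "french": set(),
                 "spanish": set(), "arabic": set()}


def language_for_country(country_code):
    """Return the matched language, falling back to English."""
    return COUNTRY_LANGUAGE.get(country_code.upper(), "english")


def is_appropriate(text, blocked_words):
    """Reject text that contains any blocked word as a substring."""
    lowered = text.lower()
    return not any(word in lowered for word in blocked_words)


def pick_samples(text, language, rng):
    """Choose one of the stored handwriting samples for each character."""
    return [(language, ch, rng.randrange(SAMPLES_PER_CHARACTER)) for ch in text]


def generate_captcha(country_code, length=6, rng=None, max_tries=1000):
    """Return (text, samples) for a visitor from the given country."""
    rng = rng or random.Random()
    language = language_for_country(country_code)
    alphabet = ALPHABETS[language]
    for _ in range(max_tries):
        text = "".join(rng.choice(alphabet) for _ in range(length))
        if is_appropriate(text, BLOCKED_WORDS[language]):
            return text, pick_samples(text, language, rng)
    raise RuntimeError("could not generate an acceptable CAPTCHA text")
```

Calling `generate_captcha("FR")` would return a six-character French-alphabet string together with one `(language, character, sample_index)` triple per character; in a real system each triple would identify an image row in the corresponding language table. Picking a different sample index for every character is one plausible way to vary the handwriting style from one CAPTCHA to the next.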
