*Android Application Security Scanning Process DOI: http://dx.doi.org/10.5772/intechopen.86661*



*Telecommunication Systems – Principles and Applications of Wireless-Optical Technologies*

static analysis in a machine learning system to distinguish malware from trusted applications. They considered linear support vector machines for classification. This approach, however, cannot detect runtime-loaded and obfuscated malicious applications because it relies on static analysis [47].

Yang [49] developed a prototype (AppContext) that detects malicious apps using static analysis. They mined 633 benign apps from Google Play and 202 malware apps from various malware datasets. AppContext identifies applications using machine learning based on the contexts that trigger security-sensitive behaviors (e.g., the conditions and events that cause those behaviors to occur). However, this approach can be evaded by dynamic code loading, and it consumes huge human effort in labeling each security-sensitive behavior [50].

Akhuseyinoglu and Akhuseyinoglu [51] proposed an automated feature-based static malware detection system for Android devices called AntiWare. It is automated in that it uses machine learning to detect malicious applications from the apps' extracted features. As the feature set, they considered the requested permissions and Google market data, including the developer name, download time, and user ratings from Google Play. AntiWare predicts whether an application queried by the user is malicious or benign and then reports the result to the user. Its main disadvantage is its heavy dependence on Google Play market data and the requested permissions. The market data is unreliable since many applications are published by different new developers all the time, and the permissions on their own are not sufficient to assess the malicious behavior of an application.

#### **3.3 Dynamic analysis**

In dynamic analysis, the application's actions are analyzed and monitored during execution. Unexecuted code might be missed by this approach, but it can effectively detect malware behaviors that are not detectable by static analysis. Since this approach operates at runtime, it can be performed in a controlled environment to avoid damaging the device [52].

Android dynamic malware analysis detection techniques (see **Figure 3**) can be classified into [53, 54]:

• **Anomaly detection:** anomaly-based detection can identify suspicious behaviors that indicate the presence of malware. A drawback of this technique is that it can sometimes flag a benign application as malware because it displays behaviors similar to malware.

• **Taint analysis:** an efficient technique that checks and monitors sensitive information. Its limitation is that performance becomes very slow, rendering it unsuitable for real-time use.

• **Emulation-based detection:** a technique that scans an application's behavior by simulating the conditions of its execution environment and determines from that behavior whether the application is benign or malware. Sandbox-based detection is similar; the main difference lies in the design details of each approach. A major drawback of this approach is that it requires more resources.

Tam [55] applied a dynamic analysis method and machine learning to detect malware. They capture the real-time system calls performed by the application as the key information to discriminate between ransomware, malware, and trusted files, and called the resulting system CopperDroid. CopperDroid runs the Android application in a sandbox and records all system calls, in particular inter-process communication (IPC) and remote procedure call (RPC) interactions, which are essential to understanding an application's malicious behavior. However, some types of malware can detect the virtual environment and act differently (as if benign), which yields false negatives.

Recent research [56] dynamically classifies Android applications as malicious or benign at the first launch of the app. The classification is based on the frequency of system calls as an indicator of suspicious behavior. The authors built a syscall-capture system to record and analyze the system call traces made by each application at runtime, achieving accuracy of 85% and 88% using the decision tree algorithm and the random forest algorithm, respectively.
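The frequency-of-system-calls representation described above can be sketched as follows. The syscall vocabulary, the trace format, and the nearest-centroid stand-in for the decision tree / random forest classifiers are illustrative assumptions, not the implementation from [56]:

```python
from collections import Counter

# Illustrative syscall vocabulary; the real feature set in [56] is not shown here.
SYSCALLS = ["read", "write", "open", "sendto", "recvfrom", "ptrace"]

def frequency_vector(trace):
    """Turn a list of syscall names into a normalized frequency vector."""
    counts = Counter(trace)
    total = max(len(trace), 1)
    return [counts[s] / total for s in SYSCALLS]

def nearest_centroid(vec, centroids):
    """Label a vector by its closest class centroid (squared Euclidean distance).
    A simple stand-in for the tree-based classifiers used in [56]."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist(vec, centroids[label]))

# Hypothetical centroids, as if learned from labeled traces.
centroids = {
    "benign":    frequency_vector(["read", "write", "open", "read"]),
    "malicious": frequency_vector(["sendto", "recvfrom", "ptrace", "sendto"]),
}

label = nearest_centroid(frequency_vector(["sendto", "ptrace", "recvfrom"]), centroids)
```

The key point the sketch captures is that the classifier never sees the raw trace, only the per-app distribution of syscalls.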

Also, Wang [57] proposed a dynamic analysis approach that analyzes an application on the fly to detect malicious behavior. They developed a prototype called Droid-AntiRM to identify malware that employs anti-analysis techniques. The prototype identifies the condition statements in applications that could trigger the malicious acts of malware, which cannot be recognized by static analysis. However, the prototype cannot handle dynamic code loading, encryption, or various other obfuscation techniques.

Many tools have been developed from the dynamic perspective, such as TaintDroid [44], Droidbox [58], and MobSF [59]. Additionally, some tools, such as VirusTotal [60], combine both static and dynamic analysis in their solutions.
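The dynamic taint-tracking idea behind TaintDroid — tag sensitive data at its source and raise an alarm when tagged data reaches a sink such as the network — can be illustrated with a minimal value wrapper. The class, the `concat` propagation rule, and the sink name below are invented for this sketch and bear no relation to TaintDroid's actual Dalvik-level implementation:

```python
class Tainted:
    """Wraps a value together with the set of sources that tainted it."""
    def __init__(self, value, tags):
        self.value = value
        self.tags = set(tags)

def concat(a, b):
    """Taint propagation: the result carries the union of the inputs' tags."""
    av, at = (a.value, a.tags) if isinstance(a, Tainted) else (a, set())
    bv, bt = (b.value, b.tags) if isinstance(b, Tainted) else (b, set())
    tags = at | bt
    return Tainted(av + bv, tags) if tags else av + bv

def network_sink(data):
    """Sink check: report a leak if tainted data is about to leave the device."""
    if isinstance(data, Tainted):
        return ("leak", sorted(data.tags))
    return ("ok", [])

imei = Tainted("356938035643809", {"IMEI"})   # taint source (device identifier)
payload = concat("id=", imei)                 # taint propagates through the string
result = network_sink(payload)                # the sink sees the tag and reports a leak
```

The slow performance noted for taint analysis comes precisely from having to apply a propagation rule like `concat` to every data-flow operation the app performs.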

#### **3.4 Ransomware detection**

Unfortunately, there has been very little research studying ransomware, where the malicious app blocks access to the Android device and/or its data. In [61] the authors presented a tool called Cryptolock that detects ransomware by tracking changes in user data in real time. They implemented the tool on the Windows platform. However, Cryptolock may raise false-positive alerts because it cannot differentiate whether the user or the ransomware is encrypting a set of files [62]. It focuses on changes to the user's data rather than trying to discover ransomware by investigating its execution (e.g., API call monitoring and access permissions).
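One common way to track "changes in user data" of this kind is to flag file writes whose content looks encrypted, using the Shannon entropy of the written bytes as an indicator. This is a generic heuristic sketch, not Cryptolock's actual mechanism, and the 7.5 bits/byte threshold is an illustrative assumption:

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte; well-encrypted data approaches 8.0."""
    if not data:
        return 0.0
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def looks_encrypted(data: bytes, threshold: float = 7.5) -> bool:
    """Heuristic: a high-entropy write replacing plain-text content is suspicious."""
    return shannon_entropy(data) > threshold

plain = b"The quick brown fox jumps over the lazy dog. " * 50
random_like = bytes(range(256)) * 16   # stand-in for ciphertext
```

The false-positive problem noted above is visible here too: a user legitimately saving a compressed or encrypted archive produces the same high-entropy signal as ransomware.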

The HelDroid tool [63] was developed to analyze Android ransomware and to detect both crypto and locking ransomware. The tool includes a text classifier that uses natural language processing (NLP) features, a lightweight Smali emulation technique to detect the locking scheme, and taint tracking to detect file-encrypting flows. The primary disadvantage of this approach is its heavy dependence on the text classifier, since it assumes that text is available. It also cannot be applied to some languages that have no specific phrase structure, such as Chinese, Korean, and Japanese. The approach can easily be evaded by ransomware that applies techniques such as encryption and code obfuscation [63]. Moreover, like any machine learning approach, HelDroid trains a classifier in order to label an app as ransomware, so the detection capability of the model depends on the training dataset [64–66].

Another work in the literature exploring ransomware detection on Android mobiles was presented in [67]. The authors presented R-PackDroid, a static analysis approach that classifies Android applications as ransomware, malware, or benign using a random forest classifier. The classification was based on information taken from the system API packages. An advantage over the previous approach (HelDroid) is its ability to detect ransomware regardless of the application's language. It also flags applications that were recognized as ransomware with very low confidence by the VirusTotal service. However, R-PackDroid cannot analyze applications whose feature code is dynamically loaded at runtime or whose classes are fully encrypted, because it relies on static analysis.
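R-PackDroid's feature extraction — counting how often an app uses each system API package — can be sketched as follows. The package list and the method-reference format are simplified assumptions; the real system feeds such counts into a random forest classifier, which is not reproduced here:

```python
from collections import Counter

# Illustrative subset of system API packages tracked as features.
API_PACKAGES = ["android.telephony", "javax.crypto", "java.net", "android.app.admin"]

def package_features(method_refs):
    """Count how many invoked methods fall under each tracked API package."""
    counts = Counter()
    for ref in method_refs:
        for pkg in API_PACKAGES:
            if ref.startswith(pkg + "."):
                counts[pkg] += 1
    return [counts[p] for p in API_PACKAGES]

# Method references as they might be extracted from an app's DEX code.
refs = [
    "javax.crypto.Cipher.doFinal",
    "javax.crypto.spec.SecretKeySpec.<init>",
    "java.net.URL.openConnection",
    "android.app.admin.DevicePolicyManager.lockNow",
]
features = package_features(refs)
```

Because the features come from API packages rather than strings or resources, the same vector is produced whatever human language the app's UI uses, which is the language-independence advantage noted above.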

Likewise, Mercaldo [68] focused on ransomware detection specifically on Android, testing a dataset composed of 2477 samples of real-world ransomware and benign applications. The main issue with this approach is that it is manual and requires a lot of effort to analyze and build the logic rules used for classification [69].

Another automated detection approach was introduced in [70] to analyze and penetrate malicious ransomware. The authors introduced features for both static and dynamic analysis of malware. In static analysis, malicious features can be discovered from permissions, API calls, and the APK structure, while malicious features in dynamic analysis may include access to sensitive data or sensitive paths, access to an HTTP server, charging the user without notification, and bypassing permissions. The aim was to produce a better-performing apparatus supporting ransomware detection on Android mobiles, which they designed but did not implement. The authors analyzed one malware sample and listed the steps of APK analysis as a concept, but since the proposed design was never implemented, there are no results that can prove the effectiveness of their approach.

In [71], the authors experimentally presented a new framework called DNADroid, a hybrid of static and dynamic techniques. The framework employs static analysis to classify apps as suspicious, malware, or trusted. Only applications classified as suspicious are then inspected by dynamic analysis to determine whether they are ransomware. The main weakness is that dynamic analysis is applied only to suspicious applications, leaving the possibility of malware not successfully recognized by static analysis.
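The two-stage design described for DNADroid — a cheap static pass whose "suspicious" verdicts gate the expensive dynamic pass — can be sketched as a pipeline. The classifier stubs below are placeholders, and the sketch also makes the stated weakness visible: an app the static stage mislabels as trusted never reaches dynamic analysis:

```python
def hybrid_scan(app, static_classify, dynamic_is_ransomware):
    """DNADroid-style gate: run dynamic analysis only on 'suspicious' apps."""
    verdict = static_classify(app)          # 'suspicious' | 'malware' | 'trusted'
    if verdict == "suspicious":
        return "ransomware" if dynamic_is_ransomware(app) else "not ransomware"
    # Weakness: an app misclassified as 'trusted' is never inspected dynamically.
    return verdict

# Placeholder stage implementations for the sketch.
static_classify = lambda app: app.get("static_label", "trusted")
dynamic_is_ransomware = lambda app: app.get("locks_screen", False)

r1 = hybrid_scan({"static_label": "suspicious", "locks_screen": True},
                 static_classify, dynamic_is_ransomware)
r2 = hybrid_scan({"static_label": "trusted", "locks_screen": True},  # slips through
                 static_classify, dynamic_is_ransomware)
```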

#### **3.5 Dataset creation and utilization**

Datasets are mainly of two types. The first type is Android application datasets, which include both benign and malicious apps. For benign apps, the majority of researchers collect them from app stores such as the Google Play Store [30, 37, 60]. For malicious apps, the source depends on the malicious behavior under study: for malware Android apps, VirusTotal has been one of the main sources for many researchers [38, 60], while for ransomware apps the HelDroid project [63] and the RansomProper project [38] have also been used.

The second type is the datasets generated after extracting the apps' features. Researchers can either use existing datasets built around the features under study or build new ones by screening the apps and extracting their features. The main concerns regarding the use of existing datasets are (1) the absence of up-to-date apps and operating system versions, (2) the inclusion of many duplicate samples, and (3) inaccessibility. These concerns could motivate researchers to build their own up-to-date, labeled datasets.
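The duplicate-sample concern above is commonly addressed by hashing each APK and keeping one sample per digest. A minimal sketch over in-memory byte strings (real pipelines would hash the APK files on disk):

```python
import hashlib

def dedupe_samples(samples):
    """Keep the first sample for each distinct SHA-256 digest."""
    seen = set()
    unique = []
    for name, data in samples:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(name)
    return unique

samples = [
    ("app_a.apk", b"\x50\x4b\x03\x04 payload-A"),
    ("app_b.apk", b"\x50\x4b\x03\x04 payload-B"),
    ("app_a_copy.apk", b"\x50\x4b\x03\x04 payload-A"),  # byte-identical duplicate
]
kept = dedupe_samples(samples)
```

Note that hashing catches only byte-identical duplicates; repackaged variants of the same malware hash differently and need fuzzier similarity measures.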

### **4. Android malware application detection and ranking**

Many previous works have considered the problem of ranking Android apps and classifying them as either malware or benign. The majority of these solutions have relied mainly on the permission model and on which types of permissions are requested or used by the application, with different methods and depths of analysis in this regard.

The work presented in [72] studied permission occurrences in market apps and in malware apps. The authors also analyzed the rules (combinations of permissions) defined in Kirin [73] in order to calculate risk signals and reduce warning rates. Gates et al. [74] compared the work presented in [72, 73], naïve-based algorithms, and two proposed risk-scoring methods; these methods consider the rarity of permissions as the primary indicator for raising a warning, and performance was compared in terms of detection rate.

The authors in [75] used a similar hypothesis, listing the permissions in each app and counting occurrences of permissions in similar apps (a game category in their case), excluding user-defined permissions that do not affect privacy. In their solution, they gave the user the choice to turn off permissions. In [76], the authors used combinations of features (permissions) to compare clean-app values with malware values and derive thresholds for classifying new Android apps. Within the same context, the idea presented in [77] was to construct a standard permission vector model for each category, which can be used as a baseline to measure and assess the risk of applications within the same category. For each downloaded app, the permission vector is extracted and compared with the standard one; the amount of deviation from the baseline determines the app's risk.

While discussing the approaches of existing risk-scoring systems and their main dependency on Android permissions, it is worth asking how many of them have considered involving the user with the scoring results and, if they did, how the risk was displayed and communicated to the user. The empirical study conducted in [78] intensively examined the most used permissions with a high risk level. The authors calculated the risk level based on the type of each permission and the probability that it will be requested by the app. The risk value for each permission, along with its technical name and description, was presented to the users. Although a color code was used to indicate the level of risk, users are still confronted with technical details that do not help them take proper decisions about app installation.
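The category-baseline idea of [77] can be sketched as follows: build a per-category vector of permission request frequencies, then score a new app by its deviation from that baseline. The permission names, the category data, and the L1 deviation measure are illustrative assumptions, not the paper's exact formulation:

```python
PERMISSIONS = ["INTERNET", "READ_CONTACTS", "SEND_SMS", "CAMERA"]

def permission_vector(requested):
    """Binary indicator vector over the tracked permissions."""
    return [1.0 if p in requested else 0.0 for p in PERMISSIONS]

def category_baseline(apps):
    """Mean permission vector over a category's apps ([77]'s 'standard' vector)."""
    vecs = [permission_vector(a) for a in apps]
    n = len(vecs)
    return [sum(col) / n for col in zip(*vecs)]

def risk_score(app, baseline):
    """Deviation from the baseline; here the L1 distance, one possible measure."""
    return sum(abs(x - y) for x, y in zip(permission_vector(app), baseline))

# Hypothetical 'game' category: every app uses INTERNET, none sends SMS.
games = [{"INTERNET"}, {"INTERNET", "CAMERA"}, {"INTERNET"}]
baseline = category_baseline(games)

low = risk_score({"INTERNET"}, baseline)                     # typical game
high = risk_score({"SEND_SMS", "READ_CONTACTS"}, baseline)   # far from the baseline
```

An app requesting SMS and contacts access scores far above a typical game, which is exactly the deviation signal [77] turns into a risk value.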
The work presented in [79] utilized fuzzy logic to measure the risk score. In addition to the permissions and their categories, the authors took input from different antivirus tools to calculate the score. Their system allowed the user to upload an app's APK through the browser and provided a risk report showing the risk score, the permission usage rate, and the unnecessary-permission usage rate, in addition to the list of permissions, their categories, and their risk levels. On the other hand, the authors in [80] considered the statistical distribution of Android permissions in addition to probabilistic functions. Permissions that are declared but not exploited, and vice versa, were all considered in their risk analysis. Machine learning was also utilized to measure risk.

In terms of visualizing the permissions and their risks, the authors in [81] introduced Papilio to visualize Android application permissions graphically. This helped them find the relations among applications and among applications' permissions. Papilio was able to find the permissions requested frequently by applications and the permissions that were never or infrequently requested. The authors in [82] stressed the importance of visualizing statistical information related to Android permissions: a graphical representation of the permission statistics within a specific category encouraged users to more often choose apps with fewer permissions. A privacy meter was used in [83] to visualize the permission statistics through a slider bar, which outperformed existing warning systems such as Google's permission screens. Visualizing app activities enhances users' awareness of and sensitivity to the privacy intrusiveness of mobile applications [84]. Another attempt to visualize permission statistics was introduced in [85], using lifelog analysis views in terms of risk history and an app's risk view.

From the above related work, we can observe that the majority of previous solutions have relied mainly on permissions, either statistically or probabilistically, to analyze Android apps, classify them as malware, and measure their risk level. Although permissions are important for analyzing and classifying Android applications, they should be up to date, and other important static and dynamic metrics also need to be considered to guarantee a comprehensive evaluation and, consequently, an accurate detection of malware apps.


There has been much research on designing malicious-detection approaches. Some approaches resort to static analysis of the malware, others use dynamic analysis, and some utilize both static and dynamic analyses to achieve better detection of a malicious incident. Moreover, the generated datasets are analyzed in order to detect potential security threats, regardless of whether they were constructed from static tests, dynamic tests, or both. Data mining techniques can be used for detecting and classifying attacks [42, 52], and intelligence techniques can be utilized to rank the risk by assigning the attack a risk score [42, 86].

The scanning service might bear fruit as a mobile application installed on users' devices that examines an Android application, discriminates whether it is clean or malicious, warns the user, and protects the Android device. DREBIN [87] is one of the malware detection systems available for smartphones. One of its major features is instantaneous malware detection: when a new application is downloaded, DREBIN starts the analysis process directly, so the user is protected against unreliable sources. Another example of anti-malware software is HinDroid [88], which has been integrated as one of Comodo's mobile security scanning tools. HinDroid structures the APIs as a heterogeneous information network in order to make predictions about the tested application; consequently, HinDroid can reduce the time and cost of analyzing Android apps.
