*Coding Theory - Recent Advances, New Perspectives and Applications*

code scanning environment. The code scanning environment consists of a web app, an integrated development environment (IDE) plugin, code analyzers, and refactoring tools. Google Assistant was chosen as the virtual assistant because of its popularity and its easy-to-use App Engine and Dialogflow frameworks. The process flow is as follows: a user invokes a Google Assistant app (also known as a Google Actions app) using a set of phrases understood by the system. This app is specially designed to understand trigger phrases associated with code analysis. Trigger phrases are training phrases that are entered into Dialogflow using an intent management system. Dialogflow is a natural language understanding platform that allows users to design and integrate a conversational user interface into a mobile app, web application, device, bot, interactive voice response system, etc. [34]. **Figure 6** captures the current intents incorporated into MyCodeAnalyzer. Each intent is backed by machine learning and NLP technology that uses named entity recognition (NER) and other approaches to extract entities from speech, determine context, and carry out tasks.


#### **Figure 6.**

*Current Dialogflow intents used by MyCodeAnalyzer.*

*Conversational Code Analysis: The Future of Secure Coding DOI: http://dx.doi.org/10.5772/intechopen.98362*

The intents in MyCodeAnalyzer are organized into six main categories: *Default Welcome Intent*, *vulnerability-scanning*, *clone-detection*, *Cancel*, *Bye*, and *Default Fallback Intent*. The *Default Welcome Intent* is used to welcome the user to the system and provide a description of potential requests that the application can fulfill. The *vulnerability-scanning* intent is the most complex of the intents and uses a tree-like structure to allow the user to conversationally scan a project for vulnerabilities, email a scan report, or auto-fix errors based on the capabilities of the code analyzer. The *clone-detection* intent is used to scan a project for duplicated code and to provide a visualization showing a side-by-side comparison of similar code. While clones may not be vulnerable, they could become bloat in a project and could potentially lead to vulnerabilities. The *Cancel* intent is used to exit a task currently underway. *Bye* is used to exit the system, and the *Default Fallback Intent*, as the name suggests, is used to ask the user to repeat a phrase for clarification or to serve as a graceful failure mechanism.

Once invoked, the Google Assistant app communicates with the Google Conversation API to determine the user's intent. After intent has been determined, the Google Actions app then uses webhooks to communicate with a web service running on the user's computer. Using a tunneling service, the web service interacts with the user's IDE by way of a plugin. This plugin invokes a code analyzer or refactoring tool, takes actions based on the user's request, and places a message in a message queue. The web service then reads the queue and returns the message to the Google Assistant app, which then reads the message back to the user. The webhooks were set up in Dialogflow and run as servlets on Google App Engine. A servlet accepts valid Dialogflow POST requests and responds with data that is processed by the Google Assistant app and returned as output messages to the user. **Figure 7** further shows the internals of the system during a conversation between the user and the assistant. While only the static analysis portion of the system is demonstrated in this work, the system is modular enough for dynamic and hybrid analysis tools to be incorporated using the PnP approach discussed in Section 4. This approach provides a more complete code analysis depending on the user's preferences.

**Figure 7.** *Internals of MyCodeAnalyzer showing the flow of information throughout the system.*
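The webhook step described above can be illustrated with a small sketch. It is written in Python for brevity, whereas the actual system uses Java servlets, and it is not the authors' implementation. It assumes the Dialogflow ES webhook format, in which the matched intent arrives under `queryResult.intent.displayName` and the reply is returned as `fulfillmentText`. The function name `handle_request` and the reply strings are hypothetical; only the intent names come from the text.

```python
import json

# Hypothetical reply texts, keyed by the intent names listed in the chapter.
REPLIES = {
    "Default Welcome Intent": "Welcome to MyCodeAnalyzer. You can ask me to scan a project.",
    "vulnerability-scanning": "Starting a vulnerability scan of your active project.",
    "clone-detection": "Looking for duplicated code in your active project.",
    "Cancel": "The current task has been cancelled.",
    "Bye": "Goodbye.",
}
FALLBACK = "Sorry, I did not understand that. Could you repeat it?"


def handle_request(body: str) -> str:
    """Map a Dialogflow-style webhook request body to a JSON reply.

    In the real system the reply would come from the message queue that
    the IDE plugin fills; here it is a fixed string per intent.
    """
    request = json.loads(body)
    intent = (
        request.get("queryResult", {})
        .get("intent", {})
        .get("displayName", "")
    )
    # Unknown intents fall through to the graceful-fail message.
    text = REPLIES.get(intent, FALLBACK)
    return json.dumps({"fulfillmentText": text})
```

A web framework would call `handle_request` with the body of each POST request and send the returned string back with a JSON content type.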

### **5.2 Accessing information about the coding environment**

Two types of code-related information are accessed on the user's computer: code within the IDE and code from a Git repository (e.g., on GitHub) currently opened in a web browser. The first type of information is important because it lets us scan code being actively developed, while the second type is used when the user would like to ensure that a repository is safe before forking it. MyCodeAnalyzer can detect GitHub pages that are open in a browser. On systems running macOS, AppleScript is used to communicate with the web browser. Other approaches will be employed in the future to reproduce this functionality on machines running other operating systems.

In order to access the user's computer to scan the code being worked on in the IDE or referenced in the browser, a methodology must be established to access this information in a minimally invasive manner. To do so, we created a plugin for a given IDE. Currently, we have plugins for IntelliJ IDEA and Eclipse. The plugin becomes a part of the IDE, monitors the code being developed, and updates a message queue (data file) with information about the code files and projects manipulated by the programmer. Also, special system calls are used to access any browser tabs that point to GitHub projects. A local web app in the form of a Spring MVC REST API [35] runs on the user's computer. The job of the local web app is to communicate with MyCodeAnalyzer by way of a tunnel in order to scan local code or GitHub projects displayed in the user's web browser.

#### *5.2.1 Accessing code within the IDE*

Listing 1 shows the AppleScript code that is used to check for GUI-based applications that are currently open on the user's computer. It is followed by a sample of the corresponding output, which includes the IntelliJ IDEA IDE (listed as "idea"). This AppleScript code is added to the REST app, where it runs on localhost and is invoked by MyCodeAnalyzer to determine whether the user is actively using an IDE. To further contextualize the process of determining which code the user would like to scan, it is also of interest to find the *frontmost*, or most active, application on the user's computer. To do so, the code shown in Listing 2 was used. This code is expected to return a single application, which in turn allows MyCodeAnalyzer to return a more direct response to the user. For example, a response might be, "*Say IDE, if you would like me to scan the code that you are currently working on in IntelliJ*" instead of using indirect phrases such as "*… may be working on.*"

    -- Join the list of process names with a comma and a space
    set AppleScript's text item delimiters to ", "
    tell application "System Events"
        (name of every process where background only is false) as text
    end tell

Listing 1. AppleScript code used to list all GUI-based applications that are currently running on the user's computer.

The following is a sample output generated using the code in Listing 1: "Google Chrome, Sublime Text, Terminal, idea, pycharm, Teams, Mail, teXShop, Notes, Spotify, Finder, Microsoft PowerPoint, X11.bin, AdobeReader, iTunes, Microsoft Excel, Script Editor, Activity Monitor, System Preferences, Safari, Preview"
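As a rough illustration of how such output could be used, the sketch below parses the comma-separated process list, picks out known IDEs, and chooses between a direct and an indirect prompt based on the frontmost application. It is an assumption-laden sketch rather than the authors' code: the `KNOWN_IDES` table, the function names, and the wording of the fallback prompts are hypothetical, and the process name `eclipse` in particular is a guess.

```python
from typing import List

# Hypothetical mapping from process names to display names.
# "idea" and "pycharm" appear in the sample output; "eclipse" is assumed.
KNOWN_IDES = {"idea": "IntelliJ IDEA", "pycharm": "PyCharm", "eclipse": "Eclipse"}


def parse_process_list(output: str) -> List[str]:
    """Split the comma-separated text produced by Listing 1."""
    return [name.strip() for name in output.split(",") if name.strip()]


def running_ides(processes: List[str]) -> List[str]:
    """Return display names of the known IDEs found among the processes."""
    return [KNOWN_IDES[p.lower()] for p in processes if p.lower() in KNOWN_IDES]


def scan_prompt(processes: List[str], frontmost: str) -> str:
    """Build a direct prompt if an IDE is frontmost, else an indirect one."""
    ides = running_ides(processes)
    if not ides:
        return "I could not find an open IDE."
    front = KNOWN_IDES.get(frontmost.lower())
    if front:
        return (
            "Say IDE, if you would like me to scan the code that you are "
            f"currently working on in {front}."
        )
    return f"You may be working on code in {ides[0]}. Say IDE, if you would like me to scan it."
```

Splitting on the comma and stripping whitespace tolerates both ", " and "," as separators, so the parser does not depend on the exact delimiter used in the script.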


Since most IDEs are standalone applications, we believe the best way to access the user's code in a minimally invasive manner is to be an "insider" (that is, to use a plugin that becomes part of the IDE). Consequently, the goal of the plugins was to monitor the code being developed by taking note of the coding project and the code files being manipulated by the user. To accomplish this, listeners were added to the IDE to detect when the text editor portion of the IDE is active, when tabs are activated or switched, and when code files are edited. The message queue is updated with the following pieces of information when the aforementioned actions are performed: *ProjectName, ProjectLocation, CurrentFile, DateAdded, CurrentlyActive*. This queue is then queried for active files and projects when POST requests are made by the Google Assistant app to the local REST service running on the user's computer.
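The queue described above might be modeled as in the following sketch. It is an in-memory stand-in, whereas the text describes the queue as a data file, and it assumes that only the most recently touched file is marked as active. The class and method names are hypothetical; the five fields mirror those named in the text.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class QueueEntry:
    # Fields mirror ProjectName, ProjectLocation, CurrentFile,
    # DateAdded, and CurrentlyActive from the text.
    project_name: str
    project_location: str
    current_file: str
    date_added: datetime
    currently_active: bool


class MessageQueue:
    """In-memory stand-in for the plugin's queue (the paper uses a data file)."""

    def __init__(self) -> None:
        self.entries: List[QueueEntry] = []

    def record(self, project_name: str, project_location: str,
               current_file: str, when: datetime) -> None:
        """Called by an IDE listener when a tab is activated or a file is edited."""
        # Assumption: only the most recent entry is the active one.
        for entry in self.entries:
            entry.currently_active = False
        self.entries.append(
            QueueEntry(project_name, project_location, current_file, when, True)
        )

    def active(self) -> List[QueueEntry]:
        """Return the entries the REST service would report as active."""
        return [entry for entry in self.entries if entry.currently_active]
```

The local REST service would call `active()` when a POST request arrives and pass the resulting project location to the code analyzer.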

    -- Ask System Events for the process that currently has focus
    tell application "System Events"
        name of application processes whose frontmost is true
    end tell

Listing 2. AppleScript code used to determine the most active application on a computer.

#### *5.2.2 Accessing code referenced by tabs opened in the web browser*

Like IDEs, web browsers provide little to no way for outside tools to access their core areas. However, the AppleScript-based techniques used previously for accessing the System Events utility can also be used to access the tabs that are currently open in the web browser on the user's device. Listing 3 retrieves the tabs currently open in Google Chrome. This script can be modified to get tabs in other browsers such as Firefox or Safari. MyCodeAnalyzer then checks whether any of the URLs point to valid public GitHub accounts, which are then searched for coding projects if the user requests a scan of a Git project.

```
-- Join the tab URLs with commas
set AppleScript's text item delimiters to ","
tell application "Google Chrome"
	(URL of tabs of every window) as text
end tell
```
Listing 3. AppleScript code used to retrieve tabs currently open in Google Chrome.
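The URL check that follows tab retrieval could look something like the sketch below. It only inspects the shape of each URL (host `github.com` followed by an owner and a repository segment) and does not verify that the account or repository actually exists or is public, which the real system would still need to do. The function names are hypothetical, and a URL that itself contains a comma would be split incorrectly by this simple approach.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlparse

GITHUB_HOSTS = ("github.com", "www.github.com")


def github_repo(url: str) -> Optional[Tuple[str, str]]:
    """Return (owner, repository) if the URL looks like a GitHub project page."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        return None
    if parsed.hostname not in GITHUB_HOSTS:
        return None
    parts = [segment for segment in parsed.path.split("/") if segment]
    if len(parts) < 2:
        # Only an account page (or the home page), not a project.
        return None
    return parts[0], parts[1]


def repos_from_tabs(tabs_text: str) -> List[Tuple[str, str]]:
    """Extract distinct candidate repositories from comma-separated tab URLs."""
    found: List[Tuple[str, str]] = []
    for url in tabs_text.split(","):
        repo = github_repo(url)
        if repo is not None and repo not in found:
            found.append(repo)
    return found
```

Several tabs that point into the same project (for example, its issues page and its main page) collapse into a single candidate, so the user is asked about each repository only once.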

### **6. Case study**

In this section, we present a case study that demonstrates an implementation of our proposed methodology. The main goal of this case study is to demonstrate the applicability of integrating a virtual assistant into a code analysis framework to allow the user to conversationally scan their code for vulnerabilities. The system is currently in a prototypical stage. Here we perform a scan of a coding project using the Google Assistant app via an Apple iPhone.

The following was done based on the proposed approach discussed in Section 5:

1. Create a Google Assistant app

A Google Assistant app was created based on the intents depicted in **Figure 6**. Dialogflow, Google App Engine, and the Google Actions Console are key components in the design of the app. Once designed, the app was tested using the Google Actions API Simulator, then released in alpha mode and tested on a smartphone running the Google Assistant.

#### **Figure 8.**

*A conversation between MyCodeAnalyzer and a human tester while scanning the OWASP WebGoat project.*

2. Create a local web app to interface with the Google Assistant app and the coding environment

The local web app was created using Spring Boot [35] and was launched on the computer via Apache Tomcat [36].

3. Create an IDE plugin for IntelliJ IDEA

Our IntelliJ IDEA plugin was created and installed in IntelliJ version 2020.3.2. The plugin was installed using the IntelliJ plugin installer, which installs a local plugin from a JAR (Java ARchive) file.

4. Choose and integrate a code analyzer

PMD [37] static code analyzer (version 6.31.0) was chosen for this study. PMD uses a rule-based system to find common programming flaws in code written in 8 programming languages, offering the most support for Java and Apex. The rules used by PMD are divided into categories such as best practices, error prone, and security. For this case study, a set of rules was selected from the error prone and security categories.
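To make the analyzer step more concrete, the sketch below assembles a PMD 6 command line that the plugin could launch. It is only an illustration under several assumptions: the launcher name `run.sh`, the helper name `pmd_command`, and the project path are hypothetical, and it references the whole Java `errorprone` and `security` categories, whereas the case study selected an unspecified subset of rules from them.

```python
from typing import List

# Built-in PMD 6 category rulesets for Java; the case study used a
# selection of rules from these categories, not necessarily all of them.
DEFAULT_RULESETS = [
    "category/java/errorprone.xml",
    "category/java/security.xml",
]


def pmd_command(source_dir: str,
                rulesets: List[str] = DEFAULT_RULESETS,
                report_format: str = "text",
                launcher: str = "run.sh") -> List[str]:
    """Build the argument list for a PMD 6 run.

    -d is the source directory, -R the comma-separated rulesets,
    and -f the report format. The launcher path is an assumption.
    """
    return [
        launcher, "pmd",
        "-d", source_dir,
        "-R", ",".join(rulesets),
        "-f", report_format,
    ]
```

The resulting list can be passed to `subprocess.run`, and the report text can then be placed in the message queue for the assistant to read back.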

5. Choose a vulnerable project

The OWASP WebGoat [38] project was used to evaluate the system. WebGoat is an insecure application that allows researchers and developers to test vulnerabilities commonly found in Java-based applications that use common and popular open source components [38].

6. Test the system and report results

To integrate the Google Actions app with the local web app, Ngrok [39] was chosen as the tunneling tool. Ngrok is a tool that exposes local servers behind NATs and firewalls to the public Internet over secure tunnels [39].
