May Big Data Analysis Be Used to Diagnose Early Autism?

*Terje Solsvik Kristensen*

#### **Abstract**

In this paper, a technique for early autism identification is presented. Both a multilayered perceptron (MLP) neural network and a support vector machine (SVM) have been used for classification. Early detection of autism is important, since the prognosis for treatment is then much better. The patterns used by both methods have been extracted from high-performance liquid chromatography (HPLC) data of urine samples. The training samples are of two types, one from normal children and one from children with autism. The classification rate of both algorithms has been estimated at about 80% or better. The algorithm that gave the best result was SVM. We developed the analysis program in Java. A lot of work remains to improve the results and increase the recognition rate on the data. The parameter values used in both networks, and the configuration of the networks, are not yet optimal. This could be addressed by using a particle swarm optimization (PSO) method. We have not yet used a deep learning network, for instance one built with TensorFlow, to raise the classification rate. Nor have we yet classified between the different types of autism on the autism spectrum. All this belongs to future work.

**Keywords:** autism, HPLC spectra, MLP, SVM, PSO, TensorFlow

#### **1. Introduction**

Autism is usually diagnosed from a series of behavioral tests and symptoms [1]. Autism affects information processing in the brain by changing how the nerve cells (neurons) and their synapses connect and organize themselves. What triggers this process is not yet understood. Globally, about 25 million people are estimated to suffer from autism. Autism is therefore a huge problem to solve. However, at the moment there is no known cure for it.

Signs of autism may be identified as early as 5 months of age. However, a clear diagnosis is usually not possible before the children are between one and a half and three years old. There seems to be growing evidence that the earlier behavioral therapies for autism are started, the better the chances are for the children to live relatively normal lives when growing up.

Autism may be linked to metabolic abnormalities, and these metabolic changes may be detectable in the children's urine. By using high-performance liquid chromatography (HPLC) spectral data [2], we have found that children of the autism group and the normal group seem to have distinct chemical fingerprints in their urine.

An early test to identify children at risk of developing autism may soon be a reality. The urine of children with autism may have a certain chemical signature. This also suggests that there may be certain substances in the urine that trigger the onset of autism [3, 4].

If we are able to develop a method to identify early autism by a chemical or statistical test rather than by observing full-blown behavior, we can start the treatment earlier. Some scientists link autism with the production of toxins that may interfere with brain development [5]. One compound that may be identified in the urine is N-methyl nicotinamide (NMND), which has also been associated with Parkinson's disease. Other scientists argue that autism may be associated with metabolic products of certain bacteria, which must be identified [6].

This work, however, concerns how to use a multilayered perceptron (MLP) neural network [7, 8] and a support vector machine (SVM) [9] to distinguish between HPLC samples from normal children and samples from children with autism.

The organization of the chapter is as follows: in Section 2, the data is described, and in Section 3 the feature extraction techniques are presented. Section 4 defines an MLP neural network and the algorithm used to train it. In Section 5, the SVM and its training algorithm are presented, and in Section 6 we present the results of both a small-scale experiment and a proof-of-concept experiment. Section 7 gives the conclusion. In Section 8, we present further work on how more advanced technology may be used to confirm and validate the results achieved in this chapter.

#### **2. The data**

The HPLC data was recorded by the company Tipogen Ltd. at Bergen Hightech Centre, Norway. The company went bankrupt some years ago. The first experiment was based on 30 samples of urine spectra from both normal children and children with autism. The aim of this first experiment was to establish a so-called *proof of principle*. We first wanted to find out whether data mining based on machine learning algorithms could be used to identify autism from HPLC spectra of children [2]. **Figure 1** shows what an HPLC spectrum looks like.

The first axis represents what is called "*retention time*." This represents the peak ID. The second axis represents the intensities, or the "*peak area*." The data was delivered in spreadsheet format, with one spreadsheet for normal children and one for children with autism.

#### **3. Feature extraction**

The sample length may vary for both control and autism data. This was not easy to handle adequately early in the analysis. An example of a pattern generated from the data is given in **Figure 2**. Each sample has a specific number of peaks. The differing number of peaks per sample complicates the analysis and was an important parameter to be estimated in the recognition process.
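Classifiers such as an MLP or an SVM expect input vectors of equal length, so samples with different numbers of peaks have to be mapped to a common representation. The following minimal sketch shows one common way to do this: summing peak areas into equal-width retention-time bins. It is an illustration only and not necessarily the procedure used in this work. The function name `to_fixed_length`, the time window `t_max`, the bin count `n_bins`, and all sample values are assumptions made for the example.

```python
from typing import List, Tuple

# A peak is assumed to be a (retention_time, peak_area) pair.
Peak = Tuple[float, float]


def to_fixed_length(peaks: List[Peak], t_max: float, n_bins: int) -> List[float]:
    """Sum peak areas into n_bins equal-width retention-time bins over [0, t_max]."""
    vector = [0.0] * n_bins
    for retention_time, area in peaks:
        if not 0.0 <= retention_time <= t_max:
            continue  # ignore peaks outside the assumed time window
        # A peak exactly at t_max is folded into the last bin.
        index = min(int(retention_time / t_max * n_bins), n_bins - 1)
        vector[index] += area
    return vector


# Two made-up samples with different numbers of peaks (illustrative values only).
sample_a = [(1.2, 10.0), (3.7, 4.0), (3.9, 6.0), (9.5, 2.0)]
sample_b = [(0.4, 5.0), (7.1, 8.0)]

vec_a = to_fixed_length(sample_a, t_max=10.0, n_bins=5)
vec_b = to_fixed_length(sample_b, t_max=10.0, n_bins=5)
print(vec_a)
print(vec_b)
```

Both vectors have the same length regardless of how many peaks each sample contains. The price is resolution: peaks that fall in the same bin are merged, so the bin count would itself have to be tuned.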

#### **Figure 1.**

*An example of an HPLC spectrum of a child with autism.*

#### **Figure 2.**

*The data patterns are generated by HPLC peak extraction.*

#### **3.1 Pattern diagnostics**

The most important aspect of a pattern diagnostic technique is discriminating between patterns acquired from healthy individuals and patterns acquired from individuals affected by a disease. Pattern diagnostics is based on the analysis of a huge amount of HPLC data [2] and may be used to find patterns that identify a disease.

Mass spectrometry (MS) of blood samples is an alternative method [10, 11]. This method has given promising results in the detection of early cancer [12] and may also be used to detect early autism. Mass spectrometry data consists of a set of m/z values (m is the mass of the ion and z is its charge) and the corresponding relative intensities of all molecules present with that m/z ratio. The MS data of a chemical sample is thus an indication of the actual molecules present. The data might therefore be used to predict the presence of a disease condition and to distinguish such a sample from one taken from a healthy individual.
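To make the structure of such data concrete, the sketch below represents a spectrum as a mapping from m/z value to raw intensity and converts it to relative intensities, expressed as a percentage of the most intense (base) peak, which is the usual convention. The function name `relative_intensities` and the numerical values are hypothetical and serve only as an illustration; they are not taken from the data used in this work.

```python
from typing import Dict


def relative_intensities(spectrum: Dict[float, float]) -> Dict[float, float]:
    """Map m/z -> raw intensity to m/z -> percent of the base (largest) peak."""
    if not spectrum:
        raise ValueError("spectrum is empty")
    base = max(spectrum.values())
    if base <= 0:
        raise ValueError("base peak intensity must be positive")
    return {mz: 100.0 * intensity / base for mz, intensity in spectrum.items()}


# Made-up spectrum: m/z value -> raw detector intensity (illustrative only).
raw = {101.1: 250.0, 137.0: 1000.0, 152.2: 500.0}
rel = relative_intensities(raw)
print(rel)
```

Scaling by the base peak makes spectra comparable across samples whose absolute signal levels differ, which matters when the patterns are later fed to a classifier.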
