**3. Mathematics for forensic image processing**

Forensics can provide information security when the information source is not trusted. Forgery detection algorithms exist to detect discrepancies between the actual and the modified details. Probability and linear algebra help in learning to detect forgery. Fourier transform infrared micro-spectroscopy is used in analysing traumatic brain injuries [16]. Hypothesis testing with local estimates helps in distinguishing unaltered from falsified images. Stochastic gradient descent approaches are also used in detecting forensic editing. These methods are needed because metadata is unreliable, and multimedia forensics is found superior in such cases. In more advanced studies, graph theory is also employed for image identification purposes. We now walk through different aspects to capture the essence of the approaches utilized in forensic science.

#### **3.1 Steganography**

The art of hidden writing is known as steganography. Steganography differs from cryptography (the art of secret writing), which is used to make a message unreadable to anyone other than its intended recipient. Steganography has commercial uses in the digital world, most notably digital watermarking. If someone steals a file and claims the work as his or her own, the artist can later prove ownership because only he or she can recover the watermark [17, 18].
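To make the idea concrete, here is a minimal, illustrative sketch of least-significant-bit (LSB) embedding, one simple way a payload or watermark can be hidden inside an image. The cover image, message bits, and helper names are invented for this example; a real watermarking scheme would be considerably more robust.

```python
import numpy as np

def embed_lsb(cover: np.ndarray, message_bits: np.ndarray) -> np.ndarray:
    """Hide message bits in the least significant bit of the first pixels."""
    stego = cover.copy().ravel()
    if message_bits.size > stego.size:
        raise ValueError("Message is too long for this cover image.")
    # Clear the LSB of the first len(message_bits) pixels, then write the bits.
    stego[:message_bits.size] = (stego[:message_bits.size] & 0xFE) | message_bits
    return stego.reshape(cover.shape)

def extract_lsb(stego: np.ndarray, n_bits: int) -> np.ndarray:
    """Recover the first n_bits hidden bits from the stego image."""
    return stego.ravel()[:n_bits] & 1

# Toy usage: a random 8-bit grayscale "cover" image and a short bit string.
rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
bits = rng.integers(0, 2, size=128, dtype=np.uint8)

stego = embed_lsb(cover, bits)
assert np.array_equal(extract_lsb(stego, bits.size), bits)
# Each modified pixel changes by at most 1, so the change is visually imperceptible.
print("Maximum pixel change:", np.max(np.abs(stego.astype(int) - cover.astype(int))))
```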

Kessler [19] noted that a computer forensics examiner might suspect the use of steganography because of the nature of the crime, books in the suspect's library, the type of hardware or software discovered, large sets of seemingly duplicate images, statements made by the suspect or witnesses, or other factors. A website might be suspect because of the nature of its content or the population that it serves. These same items might also give the examiner clues to passwords. Searching for steganography is not only necessary in criminal investigations and intelligence-gathering operations; forensic accounting investigators are also realizing the need to search for steganography, as it has become a viable way to hide financial records [20].

#### **3.2 Discrete Fourier transform**

A discrete Fourier transform transforms a sequence of N complex numbers $\{x_n\} := x_0, x_1, \ldots, x_{N-1}$ into another sequence of complex numbers $\{X_r\} := X_0, X_1, \ldots, X_{N-1}$ as,


$$X\_r = \sum\_{n=0}^{N-1} x\_n e^{\frac{-2\pi i r n}{N}} = \sum\_{n=0}^{N-1} x\_n \left( \cos \left( \frac{-2\pi rn}{N} \right) + i \sin \left( \frac{-2\pi rn}{N} \right) \right) \tag{30}$$

$$X = F(x)\tag{31}$$
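Eq. (30) can be checked numerically. The short sketch below (with an arbitrary illustrative sequence) evaluates the summation directly and compares it against numpy's FFT:

```python
import numpy as np

def dft(x: np.ndarray) -> np.ndarray:
    """Direct implementation of Eq. (30): X_r = sum_n x_n * exp(-2*pi*i*r*n/N)."""
    N = x.size
    n = np.arange(N)
    r = n.reshape(-1, 1)                 # one row per output coefficient X_r
    return (x * np.exp(-2j * np.pi * r * n / N)).sum(axis=1)

x = np.array([1.0, 2.0, 0.0, -1.0])      # illustrative sequence
X = dft(x)
print(np.allclose(X, np.fft.fft(x)))     # True: matches the library FFT
```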

#### **3.3 Score-based likelihood**

Statistical methods are widely used in digital image forensics. A score-based likelihood ratio (SLR) for camera device identification is one among them. A score-based likelihood is a deciding factor in assessing the similarity between pieces of evidence, such as a fingerprint obtained from a crime scene and one taken from a suspect. Photo-response non-uniformity serves as the camera fingerprint, and one minus the normalized correlation serves as the similarity score [21]. The procedure covers source identification problems as well as forgery and tampering detection problems. A camera device identification problem utilizes the SLR: it decides whether an unknown digital image was taken by a known camera device or not, together with the strength or weakness of the evidence in favour of the decision. It helps in quantifying the weight of the evidence. SLRs fit probability density functions to both sets of scores. Both pdfs are evaluated at the score between the noise residual of the image in question and the camera fingerprint, and the SLR is the ratio of the results.

Let us discuss a simple example.

Let $I_1$ be the image output from a camera that includes noise, and let $I_0$ be the perfect image without noise. Denote the camera fingerprint by $K$ and all other noise components in the image by $\phi$. Then the image output $I_1$ can be modelled as,

$$I\_1 = I\_0 + I\_0 K + \phi \tag{32}$$

The true fingerprint $K$ is not obtainable in practice, so we estimate it with the maximum likelihood estimator $\hat{K}$.

$\hat{K}$ can be obtained by collecting the first $N$ images from the camera in question, $I_1^{(1)}, I_1^{(2)}, \ldots, I_1^{(N)}$.

The noise residuals from each image can be computed utilizing a filter *F* using the formula

$$\mathbf{W}^{(i)} = I\_1^{(i)} - F\left(I\_1^{(i)}\right) \tag{33}$$

with $i = 1, 2, \ldots, N$.

Then the photo-response non-uniformity is given by the formula,

$$\hat{K} = \frac{\sum\_{i=1}^{N} \mathbf{W}^{(i)} I\_1^{(i)}}{\sum\_{i=1}^{N} \left( I\_1^{(i)} \right)^2} \tag{34}$$

For a more accurate similarity score, the peak-to-correlation energy can be used in place of the normalized correlation, since it offers better decision-threshold stability [21]. Further, a trace-anchored score-based likelihood ratio could also be studied. The SLR framework uses hypothesis-testing methodology to justify conclusions based on the evidence received.
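The following is a minimal numerical sketch of Eqs. (33)–(34) under simplifying assumptions: a Gaussian blur stands in for the denoising filter F, the fingerprint and images are synthetic, and the similarity score is the plain normalized correlation rather than a peak-to-correlation statistic.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(1)
H, W, N = 64, 64, 20

# Synthetic ground truth: fingerprint K and N noisy images I1 = I0 + I0*K + phi (Eq. 32).
K_true = 0.02 * rng.standard_normal((H, W))
images = []
for _ in range(N):
    I0 = rng.uniform(50, 200, size=(H, W))
    images.append(I0 + I0 * K_true + rng.standard_normal((H, W)))

def residual(img: np.ndarray) -> np.ndarray:
    """Eq. (33): noise residual W = I - F(I), with F a Gaussian denoiser."""
    return img - gaussian_filter(img, sigma=1.5)

# Eq. (34): maximum-likelihood PRNU estimate from the N images.
num = sum(residual(I) * I for I in images)
den = sum(I ** 2 for I in images)
K_hat = num / den

def score(img: np.ndarray, K: np.ndarray) -> float:
    """Normalized correlation between an image's residual and K * image."""
    a, b = residual(img).ravel(), (K * img).ravel()
    a, b = a - a.mean(), b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# A fresh image from the same camera vs. one from a fingerprint-free camera.
base = rng.uniform(50, 200, (H, W))
same = base + base * K_true + rng.standard_normal((H, W))
other = rng.uniform(50, 200, (H, W)) + rng.standard_normal((H, W))
print("same-camera score :", score(same, K_hat))
print("other-camera score:", score(other, K_hat))
```

In a full SLR pipeline, many such same-camera and different-camera scores would be collected, a probability density would be fitted to each set, and the ratio of the two densities at the observed score would give the likelihood ratio.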

#### **3.4 Basics of neural network**

A neural network consists of artificial neurons connected in a network structure. Its structure is inspired by the human brain, and it learns from the data fed to it. It has a powerful function-approximation property. There are several types of neural networks, namely recurrent neural networks, convolutional neural networks, radial basis function neural networks, feedforward neural networks, and modular neural networks. A basic structure is discussed here to give insight into the architecture. A single-layer neural network is expressed mathematically as

$$y\_1 = f(a\_{11}x\_1 + a\_{12}x\_2 + a\_{13}x\_3) \tag{35}$$

The output $y_1$ is derived from the inputs $x_j$ using weights $a_{1j}$, $j = 1$ to $3$, through the activation function $f$ (**Figure 1**).

#### *3.4.1 Neural network with one hidden layer*

The neural network with one hidden layer, three inputs, and one output is given by,

**Figure 1.**

*Representation of a single layer neural network without hidden layer.*


$$\begin{aligned} y\_1^{(2)} &= \mathbf{g} \left( a\_{11}^{(1)} \mathbf{x}\_1 + a\_{12}^{(1)} \mathbf{x}\_2 + a\_{13}^{(1)} \mathbf{x}\_3 \right) \\ y\_2^{(2)} &= \mathbf{g} \left( a\_{21}^{(1)} \mathbf{x}\_1 + a\_{22}^{(1)} \mathbf{x}\_2 + a\_{23}^{(1)} \mathbf{x}\_3 \right) \\ y\_3^{(2)} &= \mathbf{g} \left( a\_{31}^{(1)} \mathbf{x}\_1 + a\_{32}^{(1)} \mathbf{x}\_2 + a\_{33}^{(1)} \mathbf{x}\_3 \right) \\ \mathbf{y}\_1^{(3)} &= \mathbf{g} \left( a\_{11}^{(2)} \mathbf{y}\_1^{(2)} + a\_{12}^{(2)} \mathbf{y}\_2^{(2)} + a\_{13}^{(2)} \mathbf{y}\_3^{(2)} \right) \end{aligned} \tag{36}$$

The hidden-layer outputs $y_j^{(2)}$ are derived from the inputs $x_j$ using weights $a_{ij}^{(1)}$ through the activation function $g$, for $i, j = 1$ to $3$; the superscript denotes the layer. The final output $y_1^{(3)}$ is then computed from the hidden-layer outputs. Architectures with two or more hidden layers can be generated similarly (**Figure 2**).
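Below is a minimal sketch of the forward pass of Eq. (36); the weight values and the choice of a sigmoid for the activation $g$ are purely illustrative.

```python
import numpy as np

def g(z):
    """Sigmoid activation used for both layers (an illustrative choice)."""
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])            # three inputs x1, x2, x3
A1 = np.array([[0.2, -0.4, 0.1],          # weights a_ij^(1): input -> hidden layer
               [0.7,  0.3, -0.5],
               [-0.6, 0.9,  0.2]])
A2 = np.array([0.4, -0.3, 0.8])           # weights a_1j^(2): hidden layer -> output

y2 = g(A1 @ x)                            # hidden-layer outputs y_j^(2)
y3 = g(A2 @ y2)                           # final output y_1^(3)
print(y2, y3)
```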

Such structures are classified under single-layer feedforward networks. In multilayer feedforward networks, the inclusion of one or more hidden layers makes the network computationally more effective. Feedback networks take the output back as input, closing the loop to gain improvements in the procedure. There are other patterns as well that enhance the performance of the network.

Kingston [22] noted that simulation experiments with a type of neural network known as a Hopfield net indicate that it may have value for the storage of tool mark patterns (including bullet striation patterns) and for the subsequent retrieval of the matching pattern using another mark made by the same tool as input.

A convolutional neural network is a deep learning algorithm that takes an image as input, assigns importance to various aspects of it, and is able to differentiate between features. Its architecture is loosely analogous to the connectivity of neurons in the human brain. A small image, say a 3×3 matrix, can be flattened into a 9×1 vector and fed to a multilayer perceptron for classification. Digital forensics implements neural networks for better analysis [23].

**Figure 2.**

*Visualization of the neural network with one hidden layer having three inputs and one output.*

#### **3.5 Basics of probability**

Forensic analysis uses probabilistic inference and probabilistic reasoning. Probability can be termed a specialised facet of logical reasoning, whereas statistics deals with the collection and summary of data; both may be utilized in measurement. Forensic work may include criminal proceedings involving chemical analysis of suspicious substances and measurements of the elemental composition of glass fragments. Probability is a branch of mathematics which aims to conceptualise uncertainty and render it tractable to decision-making (Aitken-Roberts-Jackson). In the criminal justice context, the accused is either factually guilty or factually innocent; there is no other possibility. Hence, p(Guilty, G) + p(Innocent, I) = 1. Applying the ordinary rules of number, this further implies that p(G) = 1 − p(I) and p(I) = 1 − p(G). Probabilistic formulae are also used to measure the level of uncertainty associated with a particular estimate.

Probability is widely used in AI, since AI revolves around collecting, handling, and analysing data to reach conclusive outcomes. With the foreseen utility of AI in fields such as agriculture, engineering, demography, medicine, education, and marketing, probability will be at the core of its algorithms. An overview of a few of the important basic concepts is included in this section.

Statistics deals with data characterisation and analysis. It involves grouping data and choosing between hypotheses via hypothesis testing. An overview is included here.

An experiment is a process of observation and measurement. Randomness is the key requirement for the experiments considered here. The set of all possible outcomes of an experiment is its sample space, and subsets of the sample space are events.

If an experiment has finitely many equally likely outcomes, then the probability P(A) of an event A is

$$P(A) = \frac{\text{Number of outcomes favourable to the occurrence of event A}}{\text{Total number of equally likely outcomes}}$$

$$\mathbf{0} \le P(A) \le \mathbf{1}$$

P(A) = 0 corresponds to an impossible event, and P(A) = 1 to a certain event.

Conditional probability deals with the occurrence of one event given that another event has already occurred. The probability of event A given that event B has already occurred is

$$P\left(\frac{A}{B}\right) = \frac{P(A \cap B)}{P(B)}, P(B) \neq \mathbf{0}$$
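As a small numerical check of the conditional-probability formula, the sketch below simulates die rolls (an invented example) and estimates P(A|B) from relative frequencies:

```python
import numpy as np

rng = np.random.default_rng(2)
rolls = rng.integers(1, 7, size=100_000)    # simulate a fair six-sided die

A = rolls >= 5                               # event A: roll is 5 or 6
B = rolls % 2 == 0                           # event B: roll is even

p_B = B.mean()                               # P(B)
p_A_and_B = (A & B).mean()                   # P(A intersection B)
print("P(A|B) estimated:", p_A_and_B / p_B)  # close to 1/3 (only a 6 is even and >= 5)
```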

#### **3.6 Bayes theorem**

Bayes theorem is implemented in a variety of applications where conditional probability is involved. It is used to classify the occurrence of an event by updating the prior probability that a hypothesis is true, held before any specific evidence, in light of the evidence observed.

Bayes theorem states that

If we consider $S$ to be a sample space subdivided into $C_1, C_2, \ldots, C_k$ with $P(C_i) \neq 0$ for $i = 1, 2, \ldots, k$, then for any arbitrary event $A$ in $S$ with $P(A) \neq 0$ we have, for $r = 1, 2, \ldots, k$,


$$P\left(\frac{C\_r}{A}\right) = \frac{P(C\_r \cap A)}{\sum\_{i=1}^k P(C\_i \cap A)} = \frac{P(C\_r)P\left(\frac{A}{C\_r}\right)}{\sum\_{i=1}^k P(C\_i)P\left(\frac{A}{C\_i}\right)}\tag{37}$$
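A short numerical sketch of Eq. (37) for a sample space partitioned into three classes; the prior and conditional probabilities are invented for illustration:

```python
import numpy as np

priors = np.array([0.5, 0.3, 0.2])          # P(C_1), P(C_2), P(C_3): must sum to 1
likelihoods = np.array([0.10, 0.40, 0.70])  # P(A | C_i) for each class

# Eq. (37): posterior P(C_r | A) = P(C_r) P(A|C_r) / sum_i P(C_i) P(A|C_i)
posteriors = priors * likelihoods / np.sum(priors * likelihoods)
print(posteriors, posteriors.sum())          # posteriors sum to 1
```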

Although Bayes theorem is widely used, it has a few limitations: the requirement of mutually exclusive and exhaustive classes is not always achievable in practice, and when the data are huge the computation may not be feasible. In many problems the requirement that the relevant conditional probabilities remain stable over time is also not achievable. Further, assuming constant likelihoods in the denominator is often not feasible in practical cases. Bayes theorem helps in making interpretations from the samples collected at the scene of crime [24].

#### **3.7 Probability density functions**

The probability density function is widely used in AI and neural networks. A probability density function describes how probability is distributed over the range of a continuous random variable; the area under the curve over an interval gives the probability that the continuous random variable falls in that interval.

$f(x)$ is called a probability density function if,

$$\int\_{-\infty}^{\infty} f(\mathbf{x})d\mathbf{x} = \mathbf{1} \quad f(\mathbf{x}) \ge \mathbf{0} \quad \forall \, \mathbf{x} \tag{38}$$
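As a quick check of Eq. (38), the sketch below takes a standard normal density (an illustrative choice), verifies it is non-negative on a grid, and integrates it over the real line:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

f = norm(loc=0.0, scale=1.0).pdf            # a standard normal density
area, _ = quad(f, -np.inf, np.inf)           # integral over the whole real line

x = np.linspace(-5, 5, 1001)
print(np.all(f(x) >= 0), round(area, 6))     # True, 1.0
```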

A probability density function differs from a probability mass function, which gives the probability that a discrete random variable takes a specific value rather than falls in a range, as in the case of a probability density function.

The probability density function of an image is defined as follows (Rafael C. Gonzales):

$$A = \sum\_{g=0}^{255} m\_{I\_k}(g+1)\tag{39}$$

$m_{I_k}(g+1)$ represents the number of pixels in $I_k$ ($k$ denotes the colour band of image $I$) with value $g$. $A$ denotes the number of pixels in $I$. The gray-level probability density function of $I_k$ is given by,

$$p\_{I\_k}(\mathbf{g} + \mathbf{1}) = \frac{1}{A} m\_{I\_k}(\mathbf{g} + \mathbf{1})\tag{40}$$
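A minimal sketch of Eqs. (39)–(40): count the pixels at each gray level $g$ in one colour band and divide by the total number of pixels. The image here is synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)
I_k = rng.integers(0, 256, size=(128, 128), dtype=np.uint8)   # one colour band of an image

m = np.bincount(I_k.ravel(), minlength=256)   # m_{I_k}(g+1): pixel count at each gray level g
A = m.sum()                                    # Eq. (39): total number of pixels in the band
p = m / A                                      # Eq. (40): gray-level probability density
print(A == I_k.size, p.sum())                  # True, 1.0
```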

The use of probability lies at the starting point of neural networks for making decisions. Causal and probabilistic decisions are made naturally by the human brain, but a machine needs to be trained using probability theory. Machine learning uses Bayes theorem, which relates marginal and conditional probabilities. Experiment outcomes, termed events, are mutually exclusive if they cannot occur simultaneously. Events are called exhaustive if together they constitute all the possibilities.

Mathematically, events A and B are said to be mutually exclusive and exhaustive if

$$A \cap B = \emptyset$$

$$A \cup B = S$$

where S constitutes all possible outcomes of the sample space.

For further enhancement of these approaches, many more algorithms are required, which can be understood by starting with the ideas discussed above. Probability is utilized to understand the interpretations of forensics and law when reaching legal conclusions [25].
