**Abstract**

We propose a new concept and new multimedia data processing techniques for varied multimedia sources. Our work addresses speech, video and images, handwriting and text documents. The methods and techniques will form a toolkit that can be used in different cases in a court of law in order to extract information from different multimedia sources. In addition to the data mentioned above, social media (e.g., Facebook and Twitter) provide multimedia data in new formats. Such data allow the investigator to identify and compare objects, events or persons based on data properties, including biometric features or more symbolic features that point to coincidences and anomalies. We continue Part I of our work on novel methods for forensic multimedia data analysis with a description of related work and a proposal of the methods and techniques we are developing beyond the state of the art for handwriting, multimedia feature extraction, novelty detection, legal aspects and cloud computing. We then describe the tasks the system should solve for the different kinds of multimedia data and present the expected results of our proposed toolkit. Finally, we summarize the objective of our work.

**Keywords:** multimedia forensic data analysis, standardization of forensic data analysis, video and image enhancement, video analysis, image analysis, speech analysis, case-based reasoning, multimedia feature extraction, handwriting, Twitter data analysis, novelty detection, legal aspects

### **1. Introduction**

The objective of this work is to provide novel methods and techniques for the analysis of forensic multimedia data. These methods and techniques should form a novel toolkit for automatic forensic multimedia data analysis. The data modalities the proposed work considers are images and videos, text, handwriting, speech and audio signals, social media data, log data and genetic data. The integration of methods for all these different data modalities in one toolkit should allow the cross-analysis of these data and the detection of events by interlinking between them. The proposed methods will address standard forensic tasks, e.g., identification of events, persons or groups, and device recognition. Together with the end users and the police forces, new standard tasks will be worked out during the project and will provide new input to the standardization of forensic data analysis.

The proposed novel methods and techniques will consider all aspects of multimedia data analysis such as device identification and trustworthiness of the data, signal enhancement, preprocessing, feature extraction, signal and data analysis and interpretation.

*Novel Methods for Forensic Multimedia Data Analysis: Part II*

*DOI: http://dx.doi.org/10.5772/intechopen.92548*


This chapter is a continuation of Part I of our work on novel methods for forensic multimedia data analysis [1]. The main aspects of multimedia forensic data analysis, the background and the system architecture are described in Part I. Related work and the proposed methods and techniques that go beyond the state of the art for video, images, speech, multimedia feature extraction, text and Twitter data analysis are also described in Part I. Here we continue our description of related work and of the proposed methods and techniques that go beyond the state of the art for handwriting, novelty detection, legal aspects and cloud computing in Section 2. The tasks the system has to solve based on the different multimedia data, for case-based reasoning and for legal aspects, are explained in Section 3. In Section 4, we describe what kind of results we expect from our system and our work. Finally, we give conclusions in Section 5.

### **2. Related work to be continued from Part I**

### **2.1 Handwriting recognition**

### *2.1.1 State of the art*

Criminologists and forensic document examiners have been investigating clues to identify handwriting for decades [2]. However, identification by human experts has the drawbacks of nonobjective measurements and nonreproducible decisions, in addition to the cost of expert training. Recent attempts in computer-supported handwriting identification aim to address the challenge of doing this task using computer vision and pattern recognition techniques. The Forensic Information System for Handwriting (FISH) developed by German law enforcement was followed by more recent systems including WANDA and CEDAR-FOX [3–5].

As an initial step toward identification, the documents need to be processed to handle noise and for enhancement [6]. Then similarity measures have to be defined that identify all the writings of a person as the same and differentiate them from the writings of others [7]. Writer identification systems exploit the individuality of handwriting and try to capture its characteristics at a macro level through the style of handwriting and at a micro level through the shapes of the characters.

The literature is mostly dominated by character-based systems. HCLUS prototype matching techniques [3], dynamic time warping-based techniques [8, 9] and structural features [10] have recently been proposed for matching allographs (prototypical character shapes). The disadvantage of character-based systems is the requirement for character segmentation and character classification (whether the character is y or g, etc.) prior to finding the similarities and dissimilarities on a specific character. However, both tasks are challenging and error-prone for free handwriting. The shape of the characters may change with the size of writing or with pen size and type. The characters can be written differently in a different context, i.e., in different words, which requires consideration of the preceding and following characters. Most importantly, it is very difficult to apply such systems to large datasets.
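The dynamic time warping matching used by the techniques cited above can be sketched in a few lines. This is a minimal, self-contained illustration on made-up one-dimensional contour sequences, not the actual allograph matchers of [8, 9]: DTW aligns two sequences of possibly different lengths by letting one element match several elements of the other, so two renderings of the same shape sampled at different rates still align cheaply.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences.

    cost[i][j] holds the cheapest alignment of a[:i] with b[:j];
    each step either matches the current pair or stretches one sequence.
    """
    n, m = len(a), len(b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])          # local distance
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # match pair
    return cost[n][m]

# The same "allograph" contour sampled at two rates aligns with zero
# cost, while a genuinely different contour yields a larger cost.
same = dtw_distance([0, 1, 2, 3, 2, 1], [0, 1, 1, 2, 3, 3, 2, 1])
diff = dtw_distance([0, 1, 2, 3, 2, 1], [3, 3, 0, 0, 3, 3])
```

The quadratic cost of this dynamic program is one reason such character-level matching scales poorly to large datasets, as noted above.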

### *2.1.2 Beyond the state of the art*


When documents in forensic applications are considered, the preprocessing step gains importance due to the variety of artifacts and degradations that are generated intentionally to hide the evidence or that arise from environmental conditions. Enhancement methods should be applied carefully so as not to remove important cues while reducing the noise. The methods that will be developed will be adaptive and will learn from large amounts of data by incorporating expert feedback when necessary.
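The kind of conservative enhancement described above can be illustrated with a median filter, a standard impulse-noise remover that tends to preserve edges (and thus stroke cues) better than averaging. This is a minimal sketch on a toy grayscale patch, not the adaptive, feedback-driven method to be developed:

```python
def median_filter(img, k=3):
    """k x k median filter on a grayscale image given as a list of rows.

    Each output pixel is the median of its neighborhood, so isolated
    noise spikes are removed while sharp edges largely survive.
    """
    h, w = len(img), len(img[0])
    r = k // 2
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(0, y - r), min(h, y + r + 1))
                      for xx in range(max(0, x - r), min(w, x + r + 1))]
            window.sort()
            out[y][x] = window[len(window) // 2]
    return out

# A single salt-noise pixel (255) inside a dark page region is removed.
page = [[10, 10, 10],
        [10, 255, 10],
        [10, 10, 10]]
clean = median_filter(page)
```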

Inspired by cognitive studies that have observed the human tendency to read whole words at a time [11], recent studies have proposed word-spotting techniques as an alternative to character-based systems in the word retrieval literature [12–16]. As a novel direction in forensic handwriting identification, and in order to work at large scale, we will follow the word-spotting direction and describe and match words rather than characters. Generic image features will be used to describe the word images as a whole. This will enable us to capture the writing habits and styles of individuals as well as character variations in different contexts.
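As a rough illustration of the holistic word-spotting idea, a word image can be described as a whole by simple ink-projection profiles and compared without any character segmentation. The tiny binary images and the profile descriptor here are toy stand-ins, not the generic image features that will actually be used:

```python
import math

def word_profile(img):
    """Holistic descriptor for a binary word image: per-column plus
    per-row ink counts, capturing overall word shape rather than
    individual characters."""
    cols = [sum(col) for col in zip(*img)]
    rows = [sum(row) for row in img]
    return cols + rows

def cosine(u, v):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

word_a  = [[1, 1, 0, 0],   # a word image...
           [1, 0, 0, 0],
           [1, 1, 0, 0]]
word_a2 = [[1, 1, 0, 0],   # ...the same word, slightly varied
           [1, 0, 0, 0],
           [1, 1, 1, 0]]
word_b  = [[0, 0, 1, 1],   # a different word
           [0, 0, 0, 1],
           [0, 0, 1, 1]]

sim_same = cosine(word_profile(word_a), word_profile(word_a2))
sim_diff = cosine(word_profile(word_a), word_profile(word_b))
```

Two renderings of the same word score higher than two different words, which is the property a word-spotting retrieval step relies on.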

Going beyond writer identification and verification, recognition of printed and handwritten text from documents (such as letters and notes) and of the text in photographs taken in the environment (such as the labels of shops, buildings or streets and advertisements on boards) will be another issue considered. While commercially available optical character recognition systems are very successful for printed documents, recognition of handwritten text continues to be a challenge [17, 18]. More importantly, the recognition of words in unconstrained settings or "in the wild" is still an open problem [19]. Besides the scene and object detection and recognition methods that will be developed, recognition of text in images, ranging from license plates to shop labels and even text on clothes, will provide important cues for the identification of places.

Both identification and recognition will be approached similarly to generic object recognition, and word images will be described using advanced image descriptors in order to recognize and identify text in multi-author, multi-language cases. Besides the SIFT [20]-based and k-AS [21]-based features that we considered in our previous studies [22–24], SURF [25], FREAK [26] and Shape Context [27]-based features will be adopted for word description. Together with the statistical analysis of the occurrence of some features, the spatial layout of the extracted features will also be considered.
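A hypothetical sketch of the "statistical analysis of the occurrence of some features": local descriptors (two-dimensional stand-ins for SIFT/SURF-like vectors) are quantized against a small codebook, and word images are compared by histogram intersection. The codebook and descriptor values are invented for illustration:

```python
from collections import Counter

def bovw_histogram(descriptors, codebook):
    """Quantize each local descriptor to its nearest codebook entry and
    count occurrences (a bag-of-visual-words histogram)."""
    def nearest(d):
        return min(range(len(codebook)),
                   key=lambda i: sum((a - b) ** 2
                                     for a, b in zip(d, codebook[i])))
    counts = Counter(nearest(d) for d in descriptors)
    return [counts.get(i, 0) for i in range(len(codebook))]

def intersection(h1, h2):
    """Histogram intersection: larger means more shared feature mass."""
    return sum(min(a, b) for a, b in zip(h1, h2))

codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
h1 = bovw_histogram([(0.1, 0.2), (1.2, 0.9), (0.9, 1.1)], codebook)
h2 = bovw_histogram([(0.2, 0.1), (1.0, 1.0), (1.1, 0.8)], codebook)
h3 = bovw_histogram([(5.1, 4.9), (4.8, 5.2), (5.0, 5.0)], codebook)
```

Adding the spatial layout mentioned above would mean, e.g., computing such histograms per image region and concatenating them, rather than pooling over the whole word image.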

Large volumes of data will be exploited through data mining techniques in order to learn the in-class similarities and between-class differences. Novel similarity measures will be developed. The main goal will be to provide a sufficient amount of data to the experts for validation. The main challenge will be not to miss any important data while reducing the huge volume. Therefore, the similarity methods that will be developed should be both robust and fast. We will benefit from the expertise of partners in different areas to design new similarity measures for handwriting matching.
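The recall-oriented reduction described above can be sketched as a shortlist rule: keep the top-k most similar candidates plus everything above a conservative similarity floor, so that borderline matches still reach the expert instead of being silently dropped. The document ids, scores and parameters below are hypothetical:

```python
def shortlist(query_sim, k=3, floor=0.5):
    """Reduce a large candidate pool for expert validation.

    query_sim: dict mapping candidate id -> similarity in [0, 1].
    Keeps the k most similar candidates plus any candidate at or
    above a conservative similarity floor.
    """
    ranked = sorted(query_sim, key=query_sim.get, reverse=True)
    keep = set(ranked[:k])
    keep.update(c for c, s in query_sim.items() if s >= floor)
    return sorted(keep, key=query_sim.get, reverse=True)

sims = {"doc1": 0.91, "doc2": 0.55, "doc3": 0.40,
        "doc4": 0.10, "doc5": 0.62}
cands = shortlist(sims, k=2, floor=0.5)  # doc2 survives via the floor
```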

### **2.2 Novelty detection**

### *2.2.1 State of the art*

The aim of novelty detection is to recognize inputs that cannot be properly represented by the information provided by previous inputs (i.e., a nominal distribution). Recognizing that an input differs in some respect from previous inputs is a very important capability of a learning system. In classification problems, novelty detection is particularly useful when a relevant class is under-represented in the data, so that a classifier cannot be trained to reliably recognize that class, or when hierarchical classifiers trained on different concept information disagree on the output.
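A minimal sketch of this idea, assuming a one-dimensional Gaussian nominal model: fit the nominal distribution, then flag inputs whose likelihood falls below a threshold calibrated on the nominal data itself, which gives some control over the false-alarm rate:

```python
import math

class GaussianNoveltyDetector:
    """One-class detector: fit a Gaussian to nominal data and flag
    inputs whose log-likelihood falls below a threshold chosen as a
    low quantile of the nominal scores."""

    def fit(self, data, quantile=0.05):
        n = len(data)
        self.mu = sum(data) / n
        self.var = sum((x - self.mu) ** 2 for x in data) / n or 1e-12
        scores = sorted(self._logpdf(x) for x in data)
        # threshold = the `quantile` lowest nominal log-likelihood,
        # so roughly that fraction of nominal inputs would be flagged
        self.threshold = scores[int(quantile * n)]
        return self

    def _logpdf(self, x):
        return (-0.5 * math.log(2 * math.pi * self.var)
                - (x - self.mu) ** 2 / (2 * self.var))

    def is_novel(self, x):
        return self._logpdf(x) < self.threshold

det = GaussianNoveltyDetector().fit(
    [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3])
```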

The goal of novelty detection is twofold: to be as accurate as possible in detecting inputs that do deviate from the nominal distribution (true positives) and to predict how many normal inputs will be erroneously flagged as positives (false positives). Novelty detection is also known as one-class classification [28] or learning from only positive (or only negative) examples. The standard approach has been to assume that novelties are outliers with respect to the nominal distribution and to build a novelty detector by estimating a level set of the nominal density. This approach allows fixing a threshold for the acceptance of new data while retaining a degree of control over the number of false alarms raised. Using this framework, novelty detection can be interpreted as a binary classification problem. Several approaches have been utilized to tackle this problem: statistical methods, neural networks and support vector approaches (see [29–32] for good reviews of these techniques). Bayesian methods have been used to provide a nonparametric estimation of the probability distribution [33], and case-based reasoning (CBR), based on Bayesian decision theory, has also been utilized. A common drawback of all these approaches is the assumption that novelties are uniformly distributed on the support of the nominal distribution, which is not true in most cases, especially when the feature space dimension is high.

Novelty detection has already been applied with success on single modalities of forensic multimedia data. For instance, the detection and classification of abnormal events in surveillance video has been studied in [34–36]. In [33], novelty detection is applied in online document clustering, and in [37], novelty detection is applied to image sequences.

A new and promising approach to novelty detection in audiovisual data has been proposed in [38]. In this approach, novelty detection is not the negative output of multiple classifiers but the disagreement of several hierarchical concept classifiers trained on different but hierarchically related concepts. Here, the novelty is represented not by a fully new item but by relevant changes from previously seen items. In forensic multimedia data, this situation is very common when only one modality is affected.

According to [33], several factors make the novelty detection problem very challenging:

• The availability of labeled data for training/validation of the models used by anomaly detection techniques is usually a major problem.

• Often, the data contain noise that tends to be similar to the actual anomalies and hence is difficult to distinguish and remove.

### *2.2.2 Beyond the state of the art*

We will consider novelty detection as a CBR problem [34]. The CBR-based novelty detection will consist of successively adapting or evolving the previously obtained solutions, taking into account the data properties, the user's needs and any other prior knowledge. We will use a combination of statistical and similarity-based methods as the solution to the problems underlying the CBR methodology. Our proposed scheme differs from existing methodologies for novelty detection [29, 30, 39–41] since it can simultaneously perform novelty detection and handling and also considers the incremental nature of the data.

### **2.3 Legal aspects**

### *2.3.1 State of the art*

The normative compliance of the methodologies and tools proposed by the project will be assessed by reference to the European and national legal framework on data protection and privacy [42].

Data protection is a fundamental right in Europe, enshrined in Article 8 of the Charter of Fundamental Rights of the European Union as well as in Article 16(1) of the Treaty on the Functioning of the European Union (TFEU).

The central legislative instrument for the protection of personal data in Europe is the EU's 1995 Directive<sup>1</sup>. In order to come to a complete revision of the entire framework concerning data protection, more consistent with changes in the single market and with stronger needs to ensure the security of European citizens, since 2009 the European Commission has launched public consultations on data protection<sup>2</sup> and engaged in intensive dialog with stakeholders. On 4 November 2010, the Commission published a communication on a comprehensive approach to personal data protection in the European Union<sup>3</sup> that sets out the main themes of the reform. "After assessing the impacts of different policy options<sup>4</sup>, the European Commission is now proposing a strong and consistent legislative framework across Union policies, enhancing individuals' rights, the Single Market dimension of data protection and cutting red tape for businesses"<sup>5</sup>. The Commission proposes that the new framework should consist of:

<sup>1</sup> Directive 95/46/EC on the protection of individuals with regard to the processing of personal data and on the free movement of such data, OJ L 281, 23.11.1995, p. 31.

<sup>2</sup> Two public consultations have been launched on the data protection reform: one from July to December 2009 (http://ec.europa.eu/justice/news/consulting\_public/news\_consulting\_0003\_en.htm) and a second one from November 2010 till January 2011 (http://ec.europa.eu/justice/news/consulting\_public/news\_consulting\_0006\_en.htm).

<sup>3</sup> COM(2010)609.

<sup>4</sup> The Impact Assessment SEC(2012)72.

<sup>5</sup> See p. 4 of the Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions, "Safeguarding Privacy in a Connected World: A European Data Protection Framework for the 21st Century", Brussels, 25.1.2012, COM(2012) 9.

