
of the different types of similarity measures is another challenging topic. Specific knowledge for the different types of data, such as text [40, 41], images [42–44], video [45], 1-D signals, and meta-learning [36], is required in this work. New similarity measures for the multimedia data types, as well as new data representations and ontologies, will be developed. A complex CBR system that can handle so many different data types, similarities, and data sources is a novelty.
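As an illustration of how such heterogeneous similarities can be combined, the sketch below dispatches each case attribute to a type-specific local similarity measure and aggregates the results with weights. This is a minimal sketch of the general idea, not the system described here; all names (`Case`, `cosine_sim`, the weights) are hypothetical.

```python
import numpy as np
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Case:
    text_vec: np.ndarray    # e.g., a TF-IDF vector of a report
    image_vec: np.ndarray   # e.g., an LBP or HOG histogram
    signal_vec: np.ndarray  # e.g., spectral features of an audio track

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def histogram_intersection(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.minimum(a, b).sum() / (a.sum() + 1e-12))

# One local similarity measure per modality; each entry can be swapped
# for a measure tailored to that data type.
LOCAL_SIMS = {"text": cosine_sim, "image": histogram_intersection,
              "signal": cosine_sim}

def case_similarity(q: Case, c: Case,
                    weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted aggregation of per-modality similarities (higher = more similar)."""
    weights = weights or {"text": 0.5, "image": 0.3, "signal": 0.2}
    return (weights["text"] * LOCAL_SIMS["text"](q.text_vec, c.text_vec)
            + weights["image"] * LOCAL_SIMS["image"](q.image_vec, c.image_vec)
            + weights["signal"] * LOCAL_SIMS["signal"](q.signal_vec, c.signal_vec))
```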

Retrieval of multimedia data from a case base can be refined by relevance feedback mechanisms [46–52]. The user is asked to mark retrieved results as "relevant" or not with respect to his/her interests. Then, the feature weights and the similarity measures are adapted to reflect the user's interests. Relevance feedback can be implemented in a number of ways, for example, as the solution of an optimization problem or as a classification problem. According to the problem at hand, the most suitable formulation has to be devised. Thus, the main challenge will be to formulate the relevance feedback problem for forensic applications so that the search is driven toward the cases most relevant to the case at hand.
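One common formulation of such feedback, sketched below purely as an illustration and not necessarily the mechanism intended here, re-weights each feature by the inverse of its standard deviation over the results the user marked as relevant: features that are consistent across relevant cases are assumed discriminative and receive larger weights in the next retrieval round.

```python
import numpy as np

def update_feature_weights(relevant_feats: np.ndarray,
                           eps: float = 1e-6) -> np.ndarray:
    """Standard-deviation-based relevance feedback.

    relevant_feats: (n_relevant, n_features) matrix of feature vectors
    of the results marked as relevant by the user.
    """
    std = relevant_feats.std(axis=0)
    w = 1.0 / (std + eps)
    return w / w.sum()  # normalize so the weights sum to 1

def weighted_distance(q: np.ndarray, c: np.ndarray, w: np.ndarray) -> float:
    """Weighted Euclidean distance used for the next retrieval round."""
    return float(np.sqrt(((q - c) ** 2 * w).sum()))
```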

Research has been described for the learning of feature weights and similarity measures [53–55]. Case mining from raw data in order to obtain more generalized cases has been described by Jaenichen and Perner [56]. Learning of generalized cases and of the hierarchy over the case base has been presented by the authors of Refs. [45, 57]. These works demonstrate that the system performance can be significantly improved by these functions of a CBR system.

New techniques for the learning of feature weights and similarity measures and for case generalization for the different multimedia types are necessary and will be developed for these tasks.

The question of the life cycle of a CBR system goes along with the learning capabilities, the case base organization and maintenance mechanisms, standardization, and software engineering, for which new concepts should be developed. As a result, we should come up with generic components for a CBR system for multimedia data analysis and interpretation that form a set of modules that can be easily integrated into and updated within the CBR architecture. The CBR system architecture should easily allow configuring modules for newly arising tasks.


The partner IBAI has a number of national and international patents that protect its work on CBR for images and signals. It is to be expected that new methods will be developed that can be protected by patents and can ensure the international competitiveness of European entities on CBR systems.


### **4.3 Multimedia feature extraction**

### *4.3.1 State of the art*

Most computer vision algorithms rely on the extraction of meaningful features that transform raw data values into a more significant representation, better suited to classification and recognition. Although often not considered a central problem, the quality of the feature representation can have critically important implications for the performance of the subsequent recognition methods.

Features are usually defined and selected according to a problem-oriented strategy, that is, ad hoc in light of the information considered relevant for the task at hand. In forensics, a plethora of features have been defined for automated solutions to different problems, such as face detection, retrieval, and recognition in video and images [58–60], the tracking of individual people over video sequences [61, 62], recognition of different biometric parameters (ear, gait, and iris) in images or videos [63, 64], speaker identification in audio signals, suspicious word detection, and handwriting recognition in text documents.


The main challenges in forensic scenarios concern the unconstrained conditions under which multimedia data are collected. For audio signals, these usually take the form of channel distortion and/or ambient noise. For videos and images, problems arise from changes in the illumination direction and/or in the pose of the subjects, occlusions, aging, and so on.

For images and videos, according to the problem at hand, the selected features can be based on specific morphologic parameters of individuals, such as face characteristics (e.g., nose width and eye distance) [65], posture and gesture, ear details, and so on, or on general appearance features computed with low-level descriptors. These descriptors can be either global or local and can exhibit different degrees of invariance. The global descriptor category includes features based on Principal Component Analysis (PCA) [66] and Linear Discriminant Analysis (LDA) [67]. The local descriptor category is currently gaining ground and comprises features based on local values of color, intensity, or texture. To this category belong the Scale-Invariant Feature Transform (SIFT) [68], the Local Binary Pattern (LBP) [69], Histograms of Oriented Gradients (HOG) [70], and Gabor wavelets [71]. LBP is a well-known texture descriptor and a successful local descriptor robust to local illumination variations [72]. LBP descriptors are compact and easy to compare by various histogram metrics. In addition, there are many LBP variants that improve the description performance; among these, the most popular is Multi-Scale LBP (MSLBP) [73]. HOG has been successfully applied to tasks such as human detection [70] and face recognition [74]. Similar to LBP, the edge information captured by gradients within blocks is packed into a histogram. By discarding pixel location information through block-based histogram binning, LBP and HOG gain invariance to local changes such as small facial expressions and pose variations in pedestrian images. Gabor wavelets are also successful descriptors that capture global shape information centered at a pixel [75]. The convolution of multiple Gaussian-like kernels with different scales and orientations captures information insensitive to expression variation and blur at a pixel's location. Recently, a generalization of the Pairs of Pixels (POP) descriptor, called Centre Symmetric-Pairs of Pixels (CCS-POP), has been presented for face identification [76]. Another line of research currently gaining attention regards the computation of biologically inspired descriptors that result from the attempt to mimic natural visual systems. Several works have shown interesting results in a variety of different face and object recognition contexts [77–79].
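To make the LBP idea concrete, the following sketch computes the basic 8-neighbor LBP code for each pixel and pools the codes into a normalized histogram. It is a minimal illustration of the descriptor family discussed above, not the specific variants cited.

```python
import numpy as np

def lbp_image(gray: np.ndarray) -> np.ndarray:
    """Basic 3x3 LBP: each pixel is encoded by thresholding its eight
    neighbors against the center, yielding an 8-bit code (0..255)."""
    g = gray.astype(np.int32)
    c = g[1:-1, 1:-1]  # interior pixels (centers)
    # Eight neighbors, enumerated clockwise from the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offsets):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.int32) << bit
    return code

def lbp_histogram(gray: np.ndarray) -> np.ndarray:
    """Normalized 256-bin histogram of LBP codes; this pooling step is
    what makes the descriptor robust to small local changes."""
    codes = lbp_image(gray)
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()
```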

### *4.3.2 Beyond the state of the art*

The approach based on local descriptors has recently gained popularity, especially in connection with the spread of the bag-of-features representation. In this framework, local feature descriptors, which can achieve high robustness with respect to appearance variations, are employed to build a bag of descriptors that represents the image content. All such descriptors are then quantized using learned visual words to facilitate retrieval or classification [80–83]. The approach seems promising in forensic scenarios for coping with the high variation of object appearance across different views, since some very informative local features can compensate for bad localizations or partial visibility [62].
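A minimal bag-of-visual-words pipeline is sketched below, under the assumption that local descriptors (e.g., SIFT vectors) have already been extracted per image: a codebook of visual words is learned by k-means, and each image is then represented as a normalized histogram of word assignments.

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_codebook(all_descriptors: np.ndarray, n_words: int = 256) -> KMeans:
    """Learn visual words by clustering local descriptors pooled from a
    training set (rows = descriptors, e.g., 128-D SIFT vectors)."""
    return KMeans(n_clusters=n_words, n_init=10, random_state=0).fit(all_descriptors)

def bovw_histogram(image_descriptors: np.ndarray, codebook: KMeans) -> np.ndarray:
    """Quantize an image's descriptors against the codebook and return
    the normalized visual-word histogram (the 'bag')."""
    words = codebook.predict(image_descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / (hist.sum() + 1e-12)
```

The resulting fixed-length histograms can be compared with any standard similarity measure, which is what makes the representation convenient for retrieval and classification.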

The problem of automatically extracting relevant information from the enormous and steadily growing amount of electronic text data is becoming increasingly pressing. To address this problem, various technologies for information management systems have been explored within the Natural Language Processing (NLP) community. Two promising lines of research are represented by the investigation and development of technologies for (a) ontology learning from document collections and (b) feature extraction from texts.

Ontology learning is concerned with knowledge acquisition from texts as a basis for the construction of ontologies, that is, explicit and formal specifications of the concepts of a given domain and of the relations holding between them; the learning process is typically carried out by combining NLP technologies with machine learning techniques. Buitelaar [84] organized the knowledge acquisition process into a "layer cake" of increasingly complex subtasks, ranging from terminology extraction and synonym acquisition to the bootstrapping of concepts and of the relations linking them. Term extraction is a prerequisite for all aspects of ontology learning from text: measures for termhood assessment range from raw frequency to Information Retrieval measures such as TF-IDF, up to more sophisticated measures [85–88]. The dynamic acquisition of synonyms from texts is typically carried out through clustering techniques and lexical association measures [89, 90]. The most challenging research area in this domain is the identification and extraction of relationships between concepts (taxonomical ones, but not only); this area has strong connections with the extraction of relational information, both relations and events, from texts (see below).
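As a concrete example of the simplest termhood measures mentioned above, the sketch below scores candidate terms by TF-IDF over a document collection. Real term extraction systems add linguistic filters (e.g., noun-phrase patterns) on top of such scores; this is only the scoring step.

```python
import math
from collections import Counter

def tfidf_termhood(docs: list) -> dict:
    """Score each term by its best TF-IDF value across a collection.

    docs: tokenized documents (lists of lowercased tokens).
    Returns a term -> score map; higher scores suggest better
    domain-term candidates.
    """
    n_docs = len(docs)
    df = Counter()                 # document frequency per term
    for doc in docs:
        df.update(set(doc))
    scores = {}
    for doc in docs:
        tf = Counter(doc)
        for term, count in tf.items():
            idf = math.log(n_docs / df[term])
            score = (count / len(doc)) * idf  # term frequency * rarity
            scores[term] = max(scores.get(term, 0.0), score)
    return scores
```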

With feature extraction, we refer to the task of automatically identifying in texts instances of semantic classes defined in an ontology. This task includes the recognition and semantic classification of items representing the domain's referential entities ("Named Entity Recognition" or NER), either "named entities" proper or any kind of word or expression that refers to a domain-specific entity. Recently, the extraction of inter-entity relational information has become a crucial task: the relations to be extracted range from "place\_of," "author\_of," etc. to specific events in which entities participate, usually with predefined roles ("Relation Extraction"). Currently, there exist several feature extraction approaches, addressing different requirements, operating in different domains and on different text types, and extracting different information bits. If we look at the type of the underlying extraction methodology, systems can be classified into the following classes:

• rule-based systems, using hand-crafted rules. Rule-based systems are particularly appropriate for dealing with documents showing very regular patterns, such as standard tables of data, Web pages with HTML markup, or highly structured text documents;

• systems incorporating supervised machine learning: an alternative to the time-consuming process of hand coding detailed and specific rules is represented by supervised semantic annotation systems, which learn feature extraction rules from a collection of previously annotated documents; and

• systems using unsupervised machine learning: they represent a viable alternative, currently being explored in different systems, to supervised machine learning approaches, as they dispense with the need for training data, whose production may be as time consuming as rule hand coding.

Depending on the nature and depth of the features to be extracted, different amounts of linguistic knowledge must be resorted to, which means that the type and role of the linguistic analysis differ from one system to another. The condition part of a feature extraction rule may check the presence of a given lexical item, the syntactic category of words in context, and their syntactic dependencies; a rule of this kind is sketched below. Different clues such as typographical features, the relative position of words, or even coreference relations can also be exploited. Most feature extraction systems therefore involve linguistic text processing and semantic knowledge: segmentation into words, morphosyntactic tagging, (either shallow or full) syntactic analysis, and sometimes even lexical disambiguation, semantic tagging, or anaphora resolution.
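The following is a minimal sketch of such a condition-action rule, assuming tokens have already been tagged with part-of-speech labels; all names here are hypothetical. The condition checks a lexical trigger and the syntactic category of the right context, and the action emits a typed entity.

```python
from dataclasses import dataclass

@dataclass
class Token:
    text: str
    pos: str  # part-of-speech tag, e.g., "NNP" for a proper noun

def extract_locations(tokens: list) -> list:
    """Toy rule: a lexical trigger ('in', 'at', 'near') immediately
    followed by one or more proper nouns is emitted as a LOCATION."""
    triggers = {"in", "at", "near"}
    found, i = [], 0
    while i < len(tokens):
        if tokens[i].text.lower() in triggers:
            j = i + 1
            names = []
            while j < len(tokens) and tokens[j].pos == "NNP":
                names.append(tokens[j].text)
                j += 1
            if names:
                found.append(" ".join(names))
            i = j
        else:
            i += 1
    return found

# Usage: extract_locations([Token("riots", "NNS"), Token("in", "IN"),
#                           Token("Rome", "NNP")]) -> ["Rome"]
```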


Text analysis can be carried out either at the preprocessing stage or as part of the feature extraction process. In the former case, the whole text is first analyzed. The analysis is global in the sense that items that are spread all over the document can contribute to build the normalized and enriched representation of the text. Then, the feature extraction process operates on the enriched representation of the text. In the latter case, text analysis is driven by the process of verifying a specific condition. The linguistic analysis is local, focuses on the context of the triggering item associated with a specific feature, and fully depends on the conditions to be checked for that feature.

Different approaches to feature extraction will be investigated to assess their strength and effectiveness in detecting and describing the multimedia data content relevant to forensic activities. Both biometric features and local informative descriptors will be studied and collected to create a range of different opportunities for describing multimedia data content. More precisely, low-level, local, invariant descriptors will be explored to ensure good performance of the detection algorithms, especially for recognition in the wild, whereas global biometric features and properties will be considered as high-level information that is more easily understandable by end users.

A formal model will be adopted to define the features of the different kinds. This will result in an ontological model that will organize the different classes of features and foster their sharing and reuse. This will be a very innovative result, since the ontology will be general and will address the domain of multimedia data analysis. It will go beyond current metadata standards such as MPEG-7 or MPEG-21 and will be much more comprehensive and specific than other existing ontologies, which are only partially focused on feature extraction and are always aimed at other problems, such as multimedia data annotation. Additionally, the ontology will be enriched with algorithms to compute the features it includes, resulting in a toolbox for feature extraction. This will be another very innovative result.
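One plausible reading of such an ontology-plus-toolbox is sketched below with entirely hypothetical names: each feature class carries its place in the class hierarchy together with a callable that computes it, so that tools can discover and run extractors through the ontology rather than through hard-coded pipelines.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

@dataclass
class FeatureClass:
    name: str                         # e.g., "LBPHistogram"
    parent: Optional[str]             # superclass in the feature hierarchy
    media_type: str                   # "image", "audio", "text", ...
    extractor: Callable[[Any], Any]   # algorithm attached to the concept

REGISTRY: Dict[str, FeatureClass] = {}

def register(fc: FeatureClass) -> None:
    REGISTRY[fc.name] = fc

# A tiny fragment of a hypothetical hierarchy: an LBP histogram as a
# subclass of local descriptors, reusing the LBP sketch given earlier.
register(FeatureClass("LocalDescriptor", None, "image", lambda img: img))
register(FeatureClass("LBPHistogram", "LocalDescriptor", "image",
                      lambda img: lbp_histogram(img)))

def extract_all(media_type: str, item: Any) -> Dict[str, Any]:
    """Run every registered extractor applicable to a media type."""
    return {name: fc.extractor(item)
            for name, fc in REGISTRY.items() if fc.media_type == media_type}
```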

As far as feature extraction from texts is concerned, the main challenge is represented by the typology of the texts to be dealt with, which exhibit noncanonical language usage.

### **4.4 Text mining**

### *4.4.1 State of the art*

Twitter is a new multimedia communication channel that is rapidly gaining popularity and users, yet police forces lack adequate methods to analyze the large amounts of textual data that are generated each day. Recently, several retrospective investigations concerning football riots revealed that Twitter was actively used by rival gang members to plan their assaults. Twitter data are hard to analyze because the text fragments are very short, multiple persons can be involved in a conversation about various topics, and the data are rapidly changing.

Twitter is a recently introduced microblogging and information sharing platform [91] with over 140 million users and 340 million tweets per day. In the past, several studies have been dedicated to analyzing Twitter feeds, for example, in the fields of opinion mining and sentiment analysis. For instance, in Ref. [92], the authors analyzed the text content of daily Twitter feeds with two mood tracking tools: OpinionFinder, which measures positive versus negative mood, and the Google Profile of Mood States (GPOMS), which measures mood in terms of six dimensions (Calm, Alert, Sure, Vital, Kind, and Happy). They cross-validated the resulting mood time series by comparing their ability to detect the public's response to the presidential election and Thanksgiving Day in 2008. Ratkiewicz et al. [93] used machine learning to analyze politically motivated individuals and organizations that use multiple centrally controlled Twitter accounts to create the appearance of widespread support for a candidate or opinion and to support the dissemination of political misinformation.
