**2. Related work**

*Artificial Intelligence - Latest Advances, New Paradigms and Novel Applications*

portion size from food images remains uncertain.

Network (CNN) and Recurrent Neural Network (RNN).

*Limitation of integrating purchase transactions across different databases.*

tool to enrich our lives. It can be defined as a system that recommends items to the

Accordingly, to achieve better food recommendation, it would be useful to analyze the foods that people actually eat in daily life. POS (Point of Sale) data are large-scale transaction data that reflect customers' purchase tendencies [6]. Such data are used only by the individual store and are not open to the public. Therefore, we cannot analyze food purchase data across different stores, restaurants, canteens, and so on. *Amazon Go* is a smart store where purchase transactions are detected by cameras. Compared with the POS system, *Amazon Go* provides automatic management of information about the foods that people bought, including the items associated with those products and the appearance of each item with individual preference [7]. In addition, the system can predict market expectations. Note that this kind of *Amazon-Go-like* system has a similar constraint on collecting big data. The databases obtained from different sources therefore vary. Consequently, there is a limitation in integrating purchase transactions across different databases, as shown in

From the diagram, it seems meaningful to create a system that analyzes big data of food images from various communities, including companies, restaurants, and groups in social network systems, to extract people's preferences for food combination, food design, and food appearance by applying image recognition technology. In the field of representation learning, there are many established models such as Artificial Neural Network (ANN), Convolutional Neural

ANN is a broad term that encompasses any form of neural network model, including deep learning models. It can be either shallow or deep depending on the number of hidden layers. CNNs are designed specifically for computer vision. They differ from standard ANN layers in that they are constructed to receive and process pixel data. RNNs are the "time series version" of ANNs. They are meant to process sequences of data and form the basis of forecasting models and language models. The most common kinds of recurrent layers are LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit). They contain a series of small-scale ANNs that are able

It has been demonstrated that digital imaging can estimate food information in many environments and has many advantages over other methods [4, 5]. However, deriving food information such as food type, food combination and

users/customers within an environment depending on their past activities.


**Figure 1.**

**Figure 1**.

Food image recognition is one of the promising applications of visual object recognition, as it helps estimate food characteristics and analyze people's eating choices in daily life. Many studies have made food recognition more practical by using the convolutional neural network (CNN) model [10–12]. CNN was applied to the tasks of food detection and recognition through parameter optimization. A dataset of the most frequent food items was constructed from a publicly available food-logging system. The CNN showed significantly higher accuracy than a conventional method did. In addition, a comparison of the results of two groups of controlled trials showed that the color feature is not always helpful for improving accuracy. It was reported that the performance of the CNN model was 70–80% on one dataset and 60% on the multi-food dataset. Improvements could be expected from collecting more images and optimizing the network architecture and hyper-parameters.

For example, a Deep Convolutional Neural Network (DCNN) was introduced for food recognition based on a combination of CNN-related techniques such as pre-training with the large-scale ImageNet data, fine-tuning, and activation features extracted from the pre-trained CNN [13, 14]. Another approach was based on two main steps: first, producing a food activation map on the input image (i.e. a heat map of probabilities) to generate bounding box proposals and, second, recognizing each of the food types or food-related objects present in each bounding box [15]. Interestingly, the max-pooling function was applied to the data, and the features extracted by this function were used to train the network. An accuracy of 86.97% was achieved for the classes of the FOOD-101 data set [16]. It was found that image classification could be extended using prominent features that could categorize food images. Note that the feature-based approach and the multi-level classification approach (hierarchical approach) were highly valuable for avoiding misclassifications as the number of classes increased. However, these methodologies required considerable computation time.
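The first step of the two-step approach described above, turning a heat map of probabilities into bounding box proposals, can be illustrated with a minimal sketch. This is not the method of [15]: the function name `propose_boxes`, the threshold of 0.5, the use of 4-connectivity, and the toy heat map are all assumptions chosen for illustration. The second step, classifying the content of each box, is omitted.

```python
def propose_boxes(heat_map, threshold=0.5):
    """Return bounding boxes (top, left, bottom, right) of connected regions
    whose probability is at least `threshold` (4-connectivity, assumed)."""
    rows, cols = len(heat_map), len(heat_map[0])
    visited = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if visited[r][c] or heat_map[r][c] < threshold:
                continue
            # Flood-fill one activated region while tracking its extent
            stack = [(r, c)]
            visited[r][c] = True
            top, left, bottom, right = r, c, r, c
            while stack:
                y, x = stack.pop()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not visited[ny][nx]
                            and heat_map[ny][nx] >= threshold):
                        visited[ny][nx] = True
                        stack.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Hypothetical 5x5 heat map with two activated regions
heat_map = [
    [0.1, 0.2, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.8, 0.0, 0.0],
    [0.0, 0.7, 0.6, 0.0, 0.1],
    [0.0, 0.0, 0.0, 0.2, 0.9],
    [0.0, 0.0, 0.0, 0.1, 0.8],
]
boxes = propose_boxes(heat_map)
print(boxes)  # [(1, 1, 2, 2), (3, 4, 4, 4)]
```

Each proposed box would then be cropped from the input image and passed to a classifier; a practical system would likely also filter out very small regions.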

#### **2.1 Concept of convolutional neural network (CNN)**

A convolutional neural network is a network that employs a mathematical operation called convolution. There are two main processes in a CNN architecture: learning extraction and classification [17].
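As a minimal sketch of the convolution operation mentioned above, the following pure-Python example slides a small filter over a grayscale image and sums the element-wise products at each position. The function name `convolve2d_valid`, the 3×3 vertical-edge filter, and the toy image are illustrative assumptions and are not taken from [17]. Like most CNN libraries, the sketch does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    img_h, img_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h, out_w = img_h - k_h + 1, img_w - k_w + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Sum of element-wise products between the filter and the image patch
            total = 0
            for i in range(k_h):
                for j in range(k_w):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 4x5 image: dark (0) on the left, bright (1) on the right
image = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

# Hypothetical filter that responds to dark-to-bright vertical edges
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d_valid(image, vertical_edge)
print(feature_map)  # [[0, 3, 3], [0, 3, 3]]
```

The output is zero where the patch is uniform and positive where the patch spans the dark-to-bright boundary, which is how a filter of this kind highlights edges. In a trained CNN the filter values are learned rather than set by hand.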

#### **Step-1: Learning extraction**

This process performs feature extraction from images through the following three layers:

a. Convolution layer: this is the first layer to extract features from an input image. Matrix filters are multiplied with the image to produce feature maps that capture features such as edges, blur, and color.

