2.3.2. Visual features

To compute visual features, each image is represented by a vector of visual words. A visual word is a cluster of keypoints that represents a specific pattern shared by the keypoints in that cluster. A keypoint is a region of an image that is highly distinctive, so that it can be matched correctly against a large database of features. Keypoints are detected from various image features. In this study, four types of features are used to detect keypoints: histogram of oriented gradients (HOG) [15], gray-level co-occurrence matrix (GLCM) [16], color histogram (CH) [17], and scale-invariant feature transform (SIFT) [14].

HOG is a feature descriptor computed by counting occurrences of gradient orientations in localized portions of an image. Because it operates on local cells, HOG is invariant to geometric and photometric transformations, except for object orientation.
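
As an illustration of the counting step, the following minimal sketch builds a gradient-orientation histogram for a single cell with NumPy. It is not the full HOG pipeline of [15]: the grid of cells and block normalization are omitted, and the function name `cell_orientation_histogram`, the 9 bins, and the unsigned 0 to 180 degree range are assumptions made for the example.

```python
import numpy as np

def cell_orientation_histogram(cell, bins=9):
    """Magnitude-weighted histogram of gradient orientations in one cell."""
    cell = np.asarray(cell, dtype=float)
    # np.gradient returns derivatives along axis 0 (rows) and axis 1 (columns).
    gy, gx = np.gradient(cell)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees (an assumption of this sketch).
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, 180.0), weights=magnitude)
    return hist

# Intensity increases left to right: all gradients point along the x axis.
horizontal_ramp = np.tile(np.arange(8.0), (8, 1))
# The same pattern rotated by 90 degrees: all gradients point along the y axis.
vertical_ramp = horizontal_ramp.T

hist_horizontal = cell_orientation_histogram(horizontal_ramp)
hist_vertical = cell_orientation_histogram(vertical_ramp)
```

The two ramps contain the same pattern at different orientations, yet their histograms peak in different bins, which is the orientation sensitivity noted above.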

GLCM is obtained by counting how often pairs of pixels with specific values and in a specified spatial relationship occur in an image. It is used to describe texture, such as that of a land surface. It provides useful information about the texture of an object but not about its shape or size.
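
The counting procedure can be sketched as follows. This is a minimal, unnormalized and non-symmetric version written for illustration; the function name `glcm`, the offset parameters `dx` and `dy`, and the 4-level toy image are assumptions, not details taken from [16] or from this study.

```python
import numpy as np

def glcm(image, dx=1, dy=0, levels=4):
    """Count how often gray level i has gray level j at offset (dy, dx)."""
    image = np.asarray(image)
    rows, cols = image.shape
    matrix = np.zeros((levels, levels), dtype=np.int64)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Skip pairs whose second pixel falls outside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r, c], image[r2, c2]] += 1
    return matrix

# Toy image with gray levels 0..3.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 2, 2, 2],
                  [2, 2, 3, 3]])

# Pairs formed by each pixel and its right-hand neighbor.
counts = glcm(image, dx=1, dy=0)
```

Entry `counts[i, j]` is the number of times a pixel of level `i` has a pixel of level `j` immediately to its right. Large values on the diagonal indicate smooth regions, while off-diagonal mass indicates abrupt changes in gray level.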

CH is the distribution of colors in an image. It records the number of pixels whose color falls in each of a fixed list of color ranges. A major drawback of a color histogram is that it does not take into account the size and shape of objects.
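
A minimal sketch of a color histogram, and of the drawback just mentioned, is given below. The per-channel binning, the 4 bins per channel, and the function name `color_histogram` are assumptions chosen for the example; [17] may define the color ranges differently.

```python
import numpy as np

def color_histogram(image, bins=4):
    """Concatenated per-channel pixel counts for an 8-bit RGB image."""
    image = np.asarray(image)
    per_channel = [
        np.histogram(image[..., ch], bins=bins, range=(0, 256))[0]
        for ch in range(3)
    ]
    return np.concatenate(per_channel)

red, blue = [255, 0, 0], [0, 0, 255]

# Top row red, bottom row blue (horizontal stripes).
horizontal_stripes = np.array([[red, red],
                               [blue, blue]], dtype=np.uint8)
# Left column red, right column blue (vertical stripes).
vertical_stripes = horizontal_stripes.transpose(1, 0, 2)

hist_h = color_histogram(horizontal_stripes)
hist_v = color_histogram(vertical_stripes)
```

The two images have different spatial layouts but identical histograms, because the histogram only counts pixels per color range and discards where they are located.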

SIFT is an algorithm to detect and describe local features in images. It produces an image descriptor for image-based matching and recognition. It detects interest points in a grayscale image and, at each interest point, accumulates statistics of local gradient directions of image intensities to give a summarizing description of the local image structure around that point. The descriptors are then used to match corresponding interest points between different images.
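
The matching step can be illustrated with a small sketch. It does not compute SIFT descriptors; it assumes they are already available as rows of 128 numbers and uses synthetic data in their place. Nearest-neighbor matching with a distance-ratio test is one common choice and is an assumption here, as are the function name `match_descriptors` and the threshold of 0.8.

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Return (i, j) pairs where desc_a[i] is matched to desc_b[j].

    desc_b must contain at least two descriptors.
    """
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    # Pairwise Euclidean distances, shape (len(desc_a), len(desc_b)).
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(dists):
        order = np.argsort(row)
        best, second = order[0], order[1]
        # Accept only if the best match is clearly better than the runner-up.
        if row[best] < ratio * row[second]:
            matches.append((i, int(best)))
    return matches

# Synthetic stand-ins for descriptors from two images (not real SIFT output).
rng = np.random.default_rng(0)
descriptors_b = rng.random((5, 128))
# Image A sees points 3 and 0 of image B, with a little noise.
descriptors_a = descriptors_b[[3, 0]] + 0.001 * rng.standard_normal((2, 128))

matches = match_descriptors(descriptors_a, descriptors_b)
```

Each returned pair links an interest point in the first image to its counterpart in the second; ambiguous points, whose two closest candidates are almost equally distant, are left unmatched.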

To compute the visual words, the four types of features are first calculated for an image. Second, keypoints are derived from these features. Third, the K-means clustering algorithm is used to group the keypoints into a large number of clusters. Each cluster is then treated as a visual word that represents a specific pattern. In this way, the clustering process generates a visual-word vocabulary describing different patterns in the images. The number of clusters determines the size of the vocabulary.
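
The clustering step above can be sketched as follows, under several assumptions: each keypoint is described by a numeric vector (2-D toy vectors here), K-means is implemented as a plain Lloyd iteration with random initialization, and the visual-word vector of an image is taken to be the count of its keypoints assigned to each word. The function names, the toy data, and the vocabulary size of 2 are all hypothetical and far smaller than the settings a real study would use.

```python
import numpy as np

def nearest_center(points, centers):
    """Index of the closest center for every point (Euclidean distance)."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def build_vocabulary(descriptors, k, iters=20, seed=0):
    """Cluster keypoint descriptors with K-means; each center is a visual word."""
    descriptors = np.asarray(descriptors, dtype=float)
    rng = np.random.default_rng(seed)
    start = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[start].copy()
    for _ in range(iters):
        labels = nearest_center(descriptors, centers)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers

def visual_word_vector(descriptors, centers):
    """Count how many keypoints of one image fall into each visual word."""
    words = nearest_center(np.asarray(descriptors, dtype=float), centers)
    return np.bincount(words, minlength=len(centers))

# Toy keypoint descriptors drawn around two distinct patterns.
rng = np.random.default_rng(1)
pattern_a = rng.normal(loc=0.0, scale=0.1, size=(30, 2))
pattern_b = rng.normal(loc=10.0, scale=0.1, size=(30, 2))
vocabulary = build_vocabulary(np.vstack([pattern_a, pattern_b]), k=2)

# An image containing 7 keypoints of pattern A and 3 of pattern B.
image_descriptors = np.vstack([pattern_a[:7], pattern_b[:3]])
vector = visual_word_vector(image_descriptors, vocabulary)
```

The length of `vector` equals the number of clusters `k`, which reflects the statement that the number of clusters determines the size of the vocabulary.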
