## 6.1.2 PASCAL VOC

The PASCAL VOC 2012 dataset [19] is a commonly used benchmark for evaluating semantic segmentation. Its training and validation set has 11,540 images containing 27,450 bounding-box-annotated objects and 6929 segmentations in 20 categories plus one background category. The data are divided into 5717 images for training and 5823 images for testing (the val split). The 20 object classes are: person, bird, cat, cow, dog, horse, sheep, aeroplane, bicycle, boat, bus, car, motorbike, train, bottle, chair, dining table, potted plant, sofa, and tv/monitor.

## 6.1.3 MS COCO

The MS COCO dataset contains about 80,000 images for training and 40,000 images for validation [20]. It consists of ~500,000 annotated objects from 80 classes plus one background class. COCO is a more challenging dataset because its objects span a wide range of scales, from small (<32<sup>2</sup> pixels) to large (>96<sup>2</sup> pixels).
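The scale ranges mentioned above can be illustrated with a small helper. This is only a sketch: the function name `coco_size_bucket`, the "medium" label for the in-between range, and the handling of the exact boundary values are assumptions made for illustration, not details taken from this chapter.

```python
def coco_size_bucket(area_in_pixels):
    """Classify an object by its segmentation area (in pixels).

    Thresholds follow the text: small is below 32^2, large is above 96^2.
    The name "medium" for everything in between is an assumption here.
    """
    small_limit = 32 ** 2   # 1024 pixels
    large_limit = 96 ** 2   # 9216 pixels
    if area_in_pixels < small_limit:
        return "small"
    if area_in_pixels > large_limit:
        return "large"
    return "medium"


# Example: a 20 x 30 object is small, a 200 x 150 object is large.
buckets = [coco_size_bucket(20 * 30), coco_size_bucket(200 * 150)]
```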

## 6.2 Experiment 1: object segmentation by RLS

This section compares our object segmentation method, RLS, against Chan-Vese's LS [10], DRLSE [23], Li et al.'s method [24], L2S [25], and a simple neural network with two fully-connected layers and a ReLU layer in between, which we name F-ConNet. The experiments are validated on synthetic images, medical images, and natural images collected in the wild. The input images are resized to 64 × 64; thus, the number of units in the fully-connected layers and in the hidden cell of the GRU is 4096.
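To make the F-ConNet baseline concrete, the following is a minimal sketch of its forward pass: a flattened image goes through a fully-connected layer, a ReLU, and a second fully-connected layer. Everything beyond that layer order is an assumption: the helper names, the random initialization, and the reading of the output as one score per pixel are illustrative only. Plain Python is used for clarity, and the demo uses an 8 × 8 image so that it runs quickly; with 64 × 64 inputs each layer would have 64 · 64 = 4096 units.

```python
import random


def relu(values):
    # Element-wise max(0, x).
    return [max(0.0, v) for v in values]


def linear(x, weights, bias):
    # weights is a list of rows; each row has len(x) entries.
    return [sum(w * a for w, a in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    # Hypothetical initialization: small uniform weights, zero bias.
    scale = 1.0 / n_in ** 0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def fconnet_forward(image, layer1, layer2):
    # Flatten the 2-D image, then apply FC -> ReLU -> FC.
    x = [pixel for row in image for pixel in row]
    hidden = relu(linear(x, *layer1))
    return linear(hidden, *layer2)


# Small demo (8 x 8 instead of 64 x 64 to keep plain Python fast).
side = 8
units = side * side
rng = random.Random(0)
layer1 = init_layer(units, units, rng)
layer2 = init_layer(units, units, rng)
image = [[rng.random() for _ in range(side)] for _ in range(side)]
scores = fconnet_forward(image, layer1, layer2)
```

In practice a tensor library would replace the nested lists; the sketch only shows the data flow.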

#### Figure 7.

Comparison between the proposed RLS and other LS-based methods. 1st row: input images; 2nd row: CLS [10]; 3rd row: DRLSE [23]; 4th row: Li et al. [24]; 5th row: L2S [25]; 6th row: our RLS; 7th row: ground truth. The best results are shown for [10, 23–25].


#### Table 2.

Average F-measure (FM) and testing time obtained by CV's model [10], DRLSE [23], Li et al. [24], L2S [25], F-ConNet, and our proposed RLS with two different ground truths (GT1 and GT2) on the Weizmann database.

Recurrent Level Set Networks for Instance Segmentation DOI: http://dx.doi.org/10.5772/intechopen.84675

#### Table 3.

Quantitative results and comparisons against existing CNN-based semantic segmentation methods (SDS (AlexNet) [27], Hypercolumn [28], CFM [29], MNC [26]) on the PASCAL VOC 2012 validation set.

#### Figure 8.

Some examples of semantic segmentation on the PASCAL VOC 2012 database: the input image (1st column), MNC [26] (2nd column), our semantic segmentation CRLS (3rd column), and the ground truth (4th column) (best viewed in color).

Figure 7 compares our proposed RLS with other methods on image segmentation, where the CLS model [10] is used as the baseline, DRLSE [23] represents the global approach, Li et al. [24] and L2S [25] represent the inhomogeneity approach, and F-ConNet is a baseline deep learning approach, on the synthetic, medical, and natural datasets. The best results from CLS, DRLSE, Li et al.'s method, and L2S are given in the second, third, fourth, and fifth rows, respectively. The performance of our proposed RLS is given in the sixth row, and the last row shows the ground truth. A quantitative assessment in terms of F-measure is given in Table 2 for the two separate ground-truth versions (GT1 and GT2) provided by two different annotators on the Weizmann database [22]. Compared with the other methods, RLS achieves the best segmentation performance in this experiment on both ground truths.
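For readers unfamiliar with the F-measure used in this comparison, a minimal sketch follows. It assumes the common definition as the harmonic mean of pixel-wise precision and recall (beta = 1) on binary masks; the chapter does not spell out its exact formula, so both the definition and the function name `f_measure` are assumptions. The toy masks are made up for illustration and are not data from the experiments.

```python
def f_measure(pred, gt, beta=1.0):
    """Pixel-wise F-measure between two binary masks (lists of rows)."""
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                tp += 1      # foreground in both masks
            elif p and not g:
                fp += 1      # predicted foreground, annotated background
            elif g and not p:
                fn += 1      # missed foreground pixel
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# One prediction scored against two hypothetical annotations (GT1, GT2).
prediction = [[1, 1], [0, 0]]
gt1 = [[1, 0], [0, 0]]
gt2 = [[1, 1], [1, 0]]
fm_gt1 = f_measure(prediction, gt1)
fm_gt2 = f_measure(prediction, gt2)
```

Scoring the same prediction against each annotator separately mirrors the GT1/GT2 split reported in Table 2.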

In terms of running time, the baseline method CLS takes 13.5 s, while the Li et al. and DRLSE approaches take a similar 20.4 and 23.5 s, respectively, on average to process one image at its original size. L2S takes less time (10.2 s) than CLS, Li et al., and DRLSE, whereas the proposed RLS takes 0.008 s and F-ConNet is the fastest at 0.001 s.
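The relative speedups implied by these timings can be worked out directly. The sketch below only restates the per-image times quoted above; it assumes that 20.4 s and 23.5 s belong to Li et al. and DRLSE in that order, and the names `seconds_per_image` and `speedup` are illustrative.

```python
# Average seconds per image, as quoted in the text.
seconds_per_image = {
    "CLS": 13.5,
    "Li et al.": 20.4,   # assumed order: Li et al. first, DRLSE second
    "DRLSE": 23.5,
    "L2S": 10.2,
    "RLS": 0.008,
    "F-ConNet": 0.001,
}


def speedup(slower, faster, times=seconds_per_image):
    """How many times faster `faster` is than `slower`."""
    return times[slower] / times[faster]


# RLS versus each level-set baseline.
rls_speedups = {name: speedup(name, "RLS")
                for name in ("CLS", "Li et al.", "DRLSE", "L2S")}
```

Under these numbers RLS is roughly three orders of magnitude faster than the level-set baselines, while F-ConNet is about eight times faster than RLS.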

#### Figure 9.

Some examples of semantic segmentation on the MS COCO validation set: the input image (1st column), our semantic segmentation CRLS (2nd column), and the ground truth (3rd column) (best viewed in color).



#### Table 4.

Quantitative results and comparisons against existing CNN-based semantic segmentation method MNC [26] on the MS COCO 2014 database.

## 6.3 Experiment 2: semantic instance segmentation by CRLS

We demonstrate our proposed CRLS approach on two common semantic segmentation datasets, i.e., PASCAL VOC 2012 and MS COCO. The end-to-end CRLS network is trained starting from the ImageNet pre-trained VGG-16 model. We follow the same protocols and report the same metrics used in recent semantic object segmentation papers [26–29].

On the PASCAL VOC dataset, we evaluate mAP<sup>r</sup> at IoU thresholds of 0.5 and 0.7. As shown in Table 3, we compare our proposed CRLS against state-of-the-art CNN-based semantic segmentation methods, including SDS [27], Hypercolumn [28], CFM [29], and MNC [26]. Our CRLS achieves a higher mAP<sup>r</sup> (by about 3%) at both 0.5 and 0.7 than previous methods using the same testing protocol. Besides higher segmentation accuracy, the experimental results also show that our proposed CRLS has a better testing time (0.54 s per image). Some illustrations of multi-instance object segmentation by our proposed CRLS are shown in Figure 8 (PASCAL VOC 2012) and Figure 9 (MS COCO).
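As a rough illustration of how a region-level average precision at a fixed IoU threshold can be computed, consider the sketch below. It is a simplified stand-in, not the official evaluation protocol: it handles a single class, matches predictions to ground-truth masks greedily by descending score, counts a match when the IoU is at least the threshold, and uses non-interpolated AP. All function names are hypothetical. mAP<sup>r</sup> would then be the mean of such AP values over the object classes.

```python
def mask_iou(a, b):
    """Intersection over union of two binary masks (lists of rows)."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += 1 if (pa and pb) else 0
            union += 1 if (pa or pb) else 0
    return inter / union if union else 0.0


def average_precision(predictions, gt_masks, iou_threshold):
    """predictions: list of (score, mask) for one class; gt_masks: list of masks."""
    matched = set()
    hits = []
    # Visit predictions from most to least confident.
    for score, mask in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(gt_masks):
            if j in matched:
                continue  # each ground-truth mask can be matched only once
            iou = mask_iou(mask, gt)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j is not None and best_iou >= iou_threshold:
            matched.add(best_j)
            hits.append(True)
        else:
            hits.append(False)
    # Non-interpolated AP: mean of precision values at each true positive.
    true_positives = 0
    precision_sum = 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / len(gt_masks) if gt_masks else 0.0


# Toy case: one ground-truth object and one prediction covering half of it.
gt_masks = [[[1, 1], [0, 0]]]
predictions = [(0.8, [[1, 0], [0, 0]])]
ap_at_05 = average_precision(predictions, gt_masks, 0.5)
ap_at_07 = average_precision(predictions, gt_masks, 0.7)
```

The toy prediction has an IoU of 0.5 with the ground truth, so it counts as correct at the 0.5 threshold but not at 0.7, which is why the stricter threshold is the harder test of mask quality.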

On MS COCO 2014, we use 80k images for training and the 20k images of the test set (test-dev) for evaluation. Performance is measured by mAP<sup>r</sup> averaged over IoU thresholds between 0.5 and 0.95, and by mAP<sup>r</sup> at an IoU of 0.5 (as in the PASCAL VOC metrics). As shown in Table 4, our CRLS achieves better results than the previous method (MNC) on the COCO dataset.
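The difference between the two COCO metrics can be sketched as follows. The step of 0.05 between thresholds follows the usual COCO convention and is an assumption with respect to this chapter, which only gives the range; `ap_at` stands for any hypothetical function returning the AP at a given IoU threshold.

```python
def coco_iou_thresholds():
    # 0.50, 0.55, ..., 0.95 (ten thresholds, assumed step of 0.05).
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]


def average_over_thresholds(ap_at, thresholds=None):
    """Average a per-threshold AP function over a list of IoU thresholds."""
    if thresholds is None:
        thresholds = coco_iou_thresholds()
    return sum(ap_at(t) for t in thresholds) / len(thresholds)


# Toy AP function: perfect up to IoU 0.7, zero beyond it.
def toy_ap(threshold):
    return 1.0 if threshold <= 0.7 else 0.0


ap_range = average_over_thresholds(toy_ap)          # averaged over 0.5:0.95
ap_50 = average_over_thresholds(toy_ap, [0.5])      # single threshold at 0.5
```

In the toy case the single-threshold score is perfect while the averaged score is only one half, which shows why the 0.5:0.95 metric rewards more precisely localized masks.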
