**4.** ***F*<sup>1</sup> score**

Classification accuracy on the test data is assessed with the *F*<sup>1</sup> score, the harmonic mean of Precision and Recall, where both metrics are considered simultaneously [10]. Precision describes how precise the model's predicted outputs are with respect to the required data [10]. Recall represents the percentage of relevant information the model successfully retrieves. The formula for the *F*<sup>1</sup> score is as follows:

$$F_1\ \text{Score} = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \tag{5}$$

The *F*<sup>1</sup> score is a suitable evaluation standard for predictive classification results when the data contains a class imbalance.
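Eq. (5) can be verified with a small worked example. The confusion-matrix counts below are illustrative, not taken from the study's dataset:

```python
# Worked example of Eq. (5): F1 as the harmonic mean of precision and recall.
# The counts (tp, fp, fn) are illustrative, not the paper's results.

def f1_from_counts(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

precision, recall, f1 = f1_from_counts(tp=40, fp=10, fn=20)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Because F1 is a harmonic rather than arithmetic mean, it is pulled toward the lower of the two metrics, which is why it penalizes models that trade Recall for Precision (or vice versa) on imbalanced classes.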

The steps taken to conduct this research are described in the following subsection.


#### **4.1 Results and discussion**

This research begins by analyzing the prepared dataset to determine whether it contains missing values, class imbalance, or other data problems. The preprocessing stage then removes symbols, emoji, numbers, punctuation, and redundant whitespace from the text, and filtering removes stop words. To identify the frequency of occurrence of each word in the documents, the data is transformed into vector form, and the Term Frequency (TF) and Inverse Document Frequency (IDF) values are calculated for each token (word). The cleaned and weighted data is then divided into training and testing sets with varying ratios. The support vector machine method with the radial basis function (RBF) kernel is used to classify the Instagram caption data. The process ends with evaluating the algorithm's performance using the *F*<sup>1</sup> score, which accounts for the imbalanced data. The varying proportions of training and testing data are used to examine whether the split ratio affects the resulting *F*<sup>1</sup> score.

The outcome of the analysis is as follows:

**Table 3** reports the *F*<sup>1</sup> scores obtained from the experiment. Each score is generated from the Recall and Precision values under a distinct proportion of training and testing data. The results show that a larger proportion of training data relative to testing data produces a higher *F*<sup>1</sup> score than the other proportions.

**Tables 4**–**7** show the detailed Precision, Recall, and *F*<sup>1</sup> score for each category under varied proportions of training and testing data (70:30, 60:40, 50:50, and 40:60, respectively). The following are the calculated Precision, Recall, and *F*<sup>1</sup> score values for each data proportion:

**Table 3** presents the classification results of the Support Vector Machine algorithm with the Radial Basis Function (RBF) kernel. The average *F*<sup>1</sup> score is above 88%, and the largest *F*<sup>1</sup> score is achieved with the 70:30 proportion of training and testing data. This indicates that a larger amount of training data can produce better results. The *F*<sup>1</sup> scores for each category under the different training and testing proportions are shown in **Tables 4**–**7**; the 70:30 split again gives the best results, especially in the Technology category.

It might be interesting to perform another experiment that splits the data into 80 per cent training and 20 per cent test sets. The result could give higher or lower accuracy compared with the previous experiments; however, based on the references, this depends on the method and algorithm used.
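Such a ratio comparison, including the proposed 80:20 split, could be sketched as follows. The synthetic data here stands in for the Instagram captions, so the printed scores are illustrative only:

```python
# Illustrative comparison of train/test ratios, adding 80:20 to those tested.
# Synthetic data stands in for the Instagram caption features.
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=20, random_state=0)

for test_size in (0.2, 0.3, 0.4, 0.5, 0.6):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=0
    )
    clf = SVC(kernel="rbf").fit(X_tr, y_tr)  # same RBF-kernel SVM as above
    f1 = f1_score(y_te, clf.predict(X_te))
    print(f"train:test = {100 - test_size * 100:.0f}:{test_size * 100:.0f}"
          f" -> F1 = {f1:.3f}")
```

Whether 80:20 outperforms 70:30 in practice would depend on the dataset size and the variance introduced by the smaller test set.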


**Table 3.**

*Comparison of precision, recall, and F*<sup>1</sup> *score for each training and testing proportion.*
