**2. TF–IDF**

After text processing, the next step is the TF-IDF method. At this stage, each word is assigned a weight based on how frequently it appears in the manuscript or document [6]. The technique comprises the computation of Term Frequency (TF) and Inverse Document Frequency (IDF). The steps are as follows:

1. Term Frequency (TF)

2. Inverse Document Frequency (IDF)

3. Term Frequency-Inverse Document Frequency (TF-IDF)

**TF (Term Frequency)** is the number of occurrences of a word in a document. The more frequently a term appears in a text, the higher its TF [2]. The TF formula is:

$$TF = \begin{cases} 1 + \log_{10}\left(f_{t,d}\right) & , \quad f_{t,d} > 0 \\ 0 & , \quad f_{t,d} = 0 \end{cases} \tag{1}$$

Here, $f_{t,d}$ is the frequency of term *t* in document *d*.
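The log-scaled TF of Eq. (1) can be sketched as a short function; this is a minimal illustration assuming a pre-tokenized document (the function name and sample tokens are only for demonstration):

```python
import math

def term_frequency(term, document_tokens):
    # Raw count f_t,d: occurrences of the term in the tokenized document.
    f_td = document_tokens.count(term)
    # Eq. (1): 1 + log10(f_t,d) when f_t,d > 0, otherwise 0.
    return 1 + math.log10(f_td) if f_td > 0 else 0

tokens = ["the", "cat", "sat", "on", "the", "mat"]
term_frequency("the", tokens)  # f_t,d = 2 -> 1 + log10(2) ~ 1.301
term_frequency("dog", tokens)  # term absent -> 0
```

The log scaling dampens the effect of raw counts, so a term occurring 100 times is not weighted 100 times more than a term occurring once.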

**IDF (Inverse Document Frequency)** measures how a term is distributed across a collection of related texts, and thereby also reflects the relationship between the terms in the text [7]. The fewer documents that contain a particular term, the larger its IDF. The IDF formula is:

$$IDF_t = \log\left(\frac{N}{df_t}\right) \tag{2}$$

$N$: the number of text documents. $df_t$: the number of documents that contain the term *t*.
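Eq. (2) can be sketched in the same style. The source leaves the logarithm base unspecified; base 10 is assumed here to match Eq. (1), and the sample collection is purely illustrative:

```python
import math

def inverse_document_frequency(term, documents):
    # df_t: number of documents in the collection containing the term.
    df_t = sum(1 for doc in documents if term in doc)
    # N: size of the collection.
    n = len(documents)
    # Eq. (2): IDF_t = log(N / df_t); a term in no document gets weight 0.
    return math.log10(n / df_t) if df_t > 0 else 0.0

docs = [["a", "cat"], ["a", "dog"], ["a", "bird"], ["rare", "term"]]
inverse_document_frequency("a", docs)     # df_t = 3, N = 4 -> log10(4/3) ~ 0.125
inverse_document_frequency("rare", docs)  # df_t = 1, N = 4 -> log10(4)   ~ 0.602
```

Note how the common term "a" receives a much lower IDF than the term appearing in only one document, which is exactly the discriminative behavior IDF is designed for.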

**TF-IDF** is the product of a term's frequency weight and the inverse frequency of the documents containing that term [7]. The TF-IDF formula is:

$$w_{t,d} = TF_{t,d} \times IDF_t \tag{3}$$

$TF$: the term frequency from Eq. (1). $IDF_t$: the inverse document frequency from Eq. (2).
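Combining Eqs. (1)-(3) gives the full weighting in one function. This is a minimal sketch under the same assumptions as above (base-10 logarithm, pre-tokenized documents, illustrative names):

```python
import math

def tfidf_weight(term, document_tokens, documents):
    # TF: log-scaled term frequency, Eq. (1).
    f_td = document_tokens.count(term)
    tf = 1 + math.log10(f_td) if f_td > 0 else 0.0
    # IDF: log of N over the term's document frequency, Eq. (2).
    df_t = sum(1 for doc in documents if term in doc)
    idf = math.log10(len(documents) / df_t) if df_t > 0 else 0.0
    # Eq. (3): the TF-IDF weight is the product of the two.
    return tf * idf

docs = [["good", "movie"], ["bad", "movie"], ["good", "good", "plot"]]
# "good" in docs[2]: tf = 1 + log10(2), idf = log10(3/2)
tfidf_weight("good", docs[2], docs)
```

A term scores highest when it occurs often in the given document but in few documents overall, which is why TF-IDF vectors are a common input to the classification stage that follows.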
