**Table 1.**

*Description of the different datasets used in random forest training.*

This approach introduces many gaps into a sequence whenever a region of the sequence of interest is not similar to the corresponding region of the other sequence. Aligning multiple sequences is not as simple as it may seem at first glance, and the positional information of the amino acids is put at risk. Therefore, we propose a method that preserves the positional information of amino acids in the sequence by translating sequences into terms, so that the same representation techniques used for text data can be applied. A window covering a subrange of the sequence, with the size that gave the best metrics, was slid along the sequence with a fixed step, and each nucleotide (amino acid) segment it covered was stored as a term. The smallest window size that yielded better metric values was 3. **Figure 1** illustrates this sequence transformation using sliding windows of sizes one to four.
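The sliding-window transformation can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the step size defaults to 1 (the text only says "a fixed step"), trailing segments shorter than the window are dropped, and the toy sequence `MKVLA` is invented for the example.

```python
from collections import Counter


def sequence_to_terms(sequence: str, window: int = 3, step: int = 1) -> list[str]:
    """Slide a fixed-size window along the sequence; each covered segment becomes a term.

    Assumption: step defaults to 1; segments shorter than `window` at the end are dropped.
    """
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive integers")
    return [sequence[i:i + window] for i in range(0, len(sequence) - window + 1, step)]


def to_document(sequence: str, window: int = 3, step: int = 1) -> str:
    """Join the terms with spaces so the sequence reads like a text document."""
    return " ".join(sequence_to_terms(sequence, window, step))


def term_counts(sequence: str, window: int = 3, step: int = 1) -> Counter:
    """Bag-of-terms counts, one simple text-style representation of the sequence."""
    return Counter(sequence_to_terms(sequence, window, step))


if __name__ == "__main__":
    seq = "MKVLA"  # hypothetical toy amino-acid sequence
    # Mirror the window sizes one to four described in the text.
    for k in range(1, 5):
        print(k, sequence_to_terms(seq, window=k))
```

Running the script prints the terms for window sizes 1 to 4; for size 3 the sequence becomes `MKV KVL VLA`. Because consecutive terms overlap, each amino acid keeps its local context, and the space-joined "document" can be passed to any bag-of-words or TF-IDF vectorizer.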
