**Abstract**

The interaction between protein and Ribonucleic Acid (RNA) plays crucial roles in many biological aspects such as gene expression, posttranscriptional regulation, and protein synthesis. However, the experimental screening of protein-RNA binding affinity is laborious and time-consuming, there is a pressing desire of accurate and reliable computational approaches. In this study, we proposed a novel method to predict that interaction based on both sequences of protein and RNA. The Random Forest was trained and tested on a combination of benchmark datasets and the term frequency–inverse document frequency method combined with XgBoost algorithm was used to extract useful information from sequences. The performance of our method was very impressive, and the accuracy was as high as 94%, the Area Under the Curve of 0.98 and the Matthew Correlation Coefficient (MCC) of 0.90. All these high metrics, especially the MCC, show that our method is robust enough to keep its performance on unseen datasets.

**Keywords:** protein, RNA, interaction, random forest, TFIDF, machine learning
