**1. Introduction**

Principal component analysis (PCA) is a multivariate statistical technique in frequent use today [1]. It compresses the content of large datasets into a smaller number of summary indices that can be examined and evaluated more quickly. Related to factor analysis, it is widely applied in the disciplines of pattern recognition and signal processing. As a dimensionality reduction technique, it condenses a large number of variables into a smaller set while retaining most of the information in the original dataset. Because smaller datasets are easier to examine and visualize, machine learning algorithms can process the reduced data more quickly and efficiently, without having to handle extraneous variables.

PCA is also commonly employed in exploratory data analysis and in building predictive models. It is frequently used for dimensionality reduction by projecting each data point onto only the first few principal components (PCs), yielding lower-dimensional data that preserves as much of the data's variance as possible. The first PC is the direction that maximizes the variance of the projected data. The *i*-th PC is the direction orthogonal to the first *i* − 1 PCs that maximizes the variance of the projected data.
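In the standard formulation, the PCs can be written as a sequence of constrained variance-maximization problems. The notation below is an assumption introduced here for illustration and does not come from the cited work: $X$ is the mean-centered $n \times p$ data matrix, $S$ is its sample covariance matrix, and $w_i$ is the unit vector giving the *i*-th PC direction.

$$
S = \frac{1}{n-1} X^{\top} X, \qquad
w_1 = \arg\max_{\lVert w \rVert = 1} w^{\top} S\, w, \qquad
w_i = \arg\max_{\substack{\lVert w \rVert = 1 \\ w \perp w_1, \dots, w_{i-1}}} w^{\top} S\, w .
$$

Here $w^{\top} S\, w$ is the variance of the data projected onto $w$, so each PC captures the largest remaining variance among directions orthogonal to the PCs already chosen.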

The principal components can be shown to be eigenvectors of the data covariance matrix. As a result, they are typically computed by eigendecomposition of the data covariance matrix or by singular value decomposition (SVD) of the data matrix. PCA is the simplest of the true eigenvector-based multivariate techniques and is closely related to factor analysis. Factor analysis, however, typically makes additional domain-specific assumptions about the underlying structure and solves for the eigenvectors of a slightly different matrix. Canonical correlation analysis (CCA) is also related to PCA. PCA defines a new orthogonal coordinate system that optimally describes the variance in a single dataset, whereas CCA defines coordinate systems that optimally describe the cross-covariance between two datasets.
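The two computational routes mentioned above can be compared with a minimal NumPy sketch. This is an illustration under stated assumptions, not the implementation used in this work: the function names `pca_eig` and `pca_svd`, the synthetic data, and the choice of `k = 2` are all hypothetical.

```python
import numpy as np


def pca_eig(X, k):
    """Top-k PCs via eigendecomposition of the sample covariance matrix."""
    Xc = X - X.mean(axis=0)                  # center each variable
    cov = np.cov(Xc, rowvar=False)           # p x p covariance (ddof=1)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest
    return eigvals[order], eigvecs[:, order]


def pca_svd(X, k):
    """Top-k PCs via SVD of the centered data matrix."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = s ** 2 / (X.shape[0] - 1)    # singular values -> variances
    return variances[:k], Vt[:k].T           # rows of Vt are PC directions


# Synthetic correlated data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

vals_e, vecs_e = pca_eig(X, 2)
vals_s, vecs_s = pca_svd(X, 2)

# Project the centered data onto the first two PCs.
scores = (X - X.mean(axis=0)) @ vecs_s
```

Both routes should return the same explained variances and the same directions up to sign, since the sign of an eigenvector is arbitrary. The SVD route avoids forming the covariance matrix explicitly, which is often preferred for numerical stability.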

The purpose of this research is to explore and recommend a suitable PCA variant for reducing the dimensionality of the data, which in turn helps to identify malware data points more effectively.
