**3.2 Algorithm**

Data preparation is a key process in data analysis. The basic preparation and cleaning procedures are:


Specifically, the cleaning includes the following items:


More advanced techniques include:

• Coding:

Categorical variables are labeled as character variables and must be converted to a factor type for modeling purposes. Queues perform this task.
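As a minimal sketch of this conversion, the snippet below maps string labels to integer codes, in the spirit of a factor with levels. The helper name `encode_categories` and the sample data are hypothetical and not taken from the study.

```python
def encode_categories(values):
    """Map string labels to integer codes, similar in spirit to a factor."""
    # The sorted unique labels play the role of factor levels.
    levels = sorted(set(values))
    index = {label: code for code, label in enumerate(levels)}
    # Each observation is replaced by the position of its label in `levels`.
    codes = [index[v] for v in values]
    return levels, codes


# Hypothetical categorical column stored as character strings.
colors = ["red", "green", "blue", "green", "red"]
levels, codes = encode_categories(colors)
print(levels)
print(codes)
```

Sorting the labels keeps the mapping deterministic; the codes carry no ordinal meaning unless the categories are genuinely ordered.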

• Outliers:

For numeric variables, outliers can be flagged numerically from the skewness of the variable's distribution.
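To illustrate, and assuming the measure meant here is the skewness of the distribution, the sketch below computes the population skewness (third central moment divided by the cubed standard deviation). The function name and the sample values are illustrative only.

```python
def skewness(values):
    """Population skewness: third central moment over std cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical samples: one symmetric, one with a single extreme value.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = [1.0, 2.0, 3.0, 4.0, 50.0]

print(skewness(symmetric))
print(skewness(with_outlier))
```

A value near zero suggests a roughly symmetric variable, while a large positive value points to extreme observations in the right tail that may deserve a closer look.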

• Normalization/logarithmic transformation:

One technique for normalizing a skewed distribution is the logarithmic transformation. First, a new log-transformed variable is created; then the skewness of this new variable is calculated and printed.
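The two steps described above (create the transformed variable, then print its asymmetry measure) might look as follows. This is a sketch under assumptions: the measure is taken to be the skewness, `math.log1p` is used so that zeros are tolerated, and the data are made up.

```python
import math


def skewness(values):
    """Population skewness: third central moment over std cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed variable (values double at each step).
x = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0]

# Step 1: create the new, log-transformed variable.
log_x = [math.log1p(v) for v in x]

# Step 2: compute and print the skewness before and after the transform.
print("raw:", skewness(x))
print("log:", skewness(log_x))
```

For right-skewed data of this kind the transformed variable is expected to be much closer to symmetric, which is what the printed comparison is meant to confirm.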

• Standardization:

One standardization technique centers every feature around zero and rescales it to approximately unit variance. Scaling is applied to each variable, so the resulting variables have a mean of zero and a standard deviation of one.
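A z-score transformation of this kind can be sketched with the standard library alone. The population standard deviation is used here as an assumption; some tools divide by the sample standard deviation instead, which changes the result slightly on small samples.

```python
import statistics


def standardize(values):
    """Return z-scores: subtract the mean, divide by the population std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant variable has no spread; only centering is possible.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical numeric variable.
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
z = standardize(heights)
print(z)
```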

In preparation for PCA, missing values in the dataset were first filled with zeros. The data was then scaled with a standard scaler, which standardizes features by removing the mean and scaling to unit variance. The preprocessed dataset was then used for:


All three PCA methods were instantiated with the number of components set to 7. After PCA, the transformed data was passed through several clustering methods in order to compare results. The clustering methods used for each PCA method are:


Furthermore, each clustering method was also executed on the preprocessed data alone, without PCA, again for the purpose of comparing results.
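The preprocessing and dimensionality-reduction steps described above can be sketched as follows. This is an illustration under assumptions, not the code used in the study: it relies on NumPy, uses plain PCA computed via SVD as a stand-in for the PCA variants, and runs on randomly generated data. The helper names `preprocess` and `pca` are hypothetical.

```python
import numpy as np


def preprocess(X):
    """Fill missing values with zeros, then standardize each column."""
    X = np.where(np.isnan(X), 0.0, X)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns centered but unscaled
    return (X - mean) / std


def pca(X, n_components):
    """Project centered data onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


# Hypothetical dataset: 100 samples, 12 features, about 5% missing values.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 12))
X[rng.random(X.shape) < 0.05] = np.nan

X_prep = preprocess(X)               # input for the "without PCA" runs
X_pca = pca(X_prep, n_components=7)  # input for the "with PCA" runs
print(X_prep.shape, X_pca.shape)
```

Both `X_prep` and `X_pca` would then be handed to the same clustering methods, so that results with and without PCA can be compared on equal footing.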

**Algorithm 1**: Principal Component Analysis.
