**6.1 Ensure that the data sets used to train and evaluate the prediction models are representative and high quality**

Ensuring a representative dataset and high-quality data is crucial to avoid social biases in the predictions made by an AI product. For example, if a dataset is composed mainly of one demographic group, the AI product may not accurately capture the needs or behaviors of other groups, leading to biased predictions. Additionally, if data quality is low, the AI product may generate inaccurate predictions, leading to social bias. Therefore, it is essential to take measures to ensure that the data set used to train an AI model is diverse, balanced, and high quality to minimize the risk of social biases in the predictions made by the AI product.

Several analytical procedures can help achieve a representative, high-quality data set: including samples from all relevant groups and populations, using statistical methods to identify and remove bias in the dataset before training the AI model, or using multiple sources of data to train the AI model, which reduces the likelihood of bias from a single source. Many actionable guidelines are available for those actively involved in the development, evaluation, and implementation of AI-based prediction models (e.g., [23]), some adapted to specific sectors. For illustration, de Hond et al. [24] provide a general guide for healthcare. In the financial sector, several guidelines can be applied depending on the specific case at hand; for instance, Zampino et al. [25] offer an example of a guide for creditworthiness assessment.

*Human Factor on Artificial Intelligence: The Way to Ethical and Responsible Economic Growth DOI: http://dx.doi.org/10.5772/intechopen.111915*
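As a minimal sketch of the first procedure above, one could compare each group's share in the training sample against a reference population share and flag groups whose gap exceeds a tolerance. The helper `representation_gaps`, the group labels, and the benchmark shares below are illustrative assumptions, not part of any cited guideline:

```python
from collections import Counter

def representation_gaps(samples, population_shares, tolerance=0.05):
    """Flag groups whose share in `samples` deviates from the reference
    `population_shares` by more than `tolerance` (absolute difference)."""
    counts = Counter(samples)
    total = len(samples)
    gaps = {}
    for group, expected in population_shares.items():
        observed = counts.get(group, 0) / total
        gaps[group] = observed - expected
    return {g: d for g, d in gaps.items() if abs(d) > tolerance}

# Hypothetical sample skewed toward group "A" relative to a 50/30/20 benchmark.
sample = ["A"] * 62 + ["B"] * 28 + ["C"] * 10
benchmark = {"A": 0.50, "B": 0.30, "C": 0.20}
flagged = representation_gaps(sample, benchmark)
# "A" (+0.12) and "C" (-0.10) exceed the tolerance; "B" (-0.02) does not.
```

In practice, such a check would use real demographic attributes and census-style benchmarks, and would precede any statistical debiasing or resampling step.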
