**5.5 DQ control and follow-up**

Based on all DQ solutions tested, the most appropriate solution(s) should be selected for implementation. The success of implementation depends on the guidance provided to all stakeholders: in essence, everybody involved must be informed about the solution and its effect on all (related) business processes. In addition, business rules, definitions, roles and responsibilities must be defined in consultation with all stakeholders.

Obviously, close monitoring is needed to follow up on the effectiveness of the implemented DQ solution in the real-world setting, as a means to validate the (positive) impact of the proposed DQ solution. At the same time, it allows for the detection of errors that were unanticipated in the experimental test phase, and the swift adoption of corrective measures where required. Specific monitoring tools that can be used here include control charts (also known as Shewhart charts), cause-and-effect diagrams, check sheets, histograms, Pareto charts, scatter diagrams, … [25].
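A minimal sketch of how a Shewhart-style control chart could support this monitoring: baseline observations of a DQ metric (here, a hypothetical daily error rate from the test phase) define a center line and control limits, and new production measurements falling outside those limits signal that corrective action may be needed. The metric and the numbers are illustrative assumptions, not from the source.

```python
import statistics

def control_limits(samples, sigma_level=3):
    """Shewhart-style limits: center line +/- sigma_level * sigma,
    estimated from a baseline series of a DQ metric."""
    center = statistics.mean(samples)
    sigma = statistics.stdev(samples)
    return center - sigma_level * sigma, center, center + sigma_level * sigma

def out_of_control(baseline, new_points, sigma_level=3):
    """Flag new observations outside the control limits, signalling
    that the implemented DQ solution may be degrading."""
    lcl, _, ucl = control_limits(baseline, sigma_level)
    return [(i, x) for i, x in enumerate(new_points) if x < lcl or x > ucl]

# Baseline: hypothetical error rates observed during the test phase.
baseline = [0.021, 0.018, 0.025, 0.020, 0.022, 0.019, 0.023]
# New production measurements; the spike at index 2 is flagged.
flagged = out_of_control(baseline, [0.020, 0.024, 0.045])
```

In practice the choice of metric (error rate, completeness, duplication rate) and the sigma level would follow from the business rules agreed with the stakeholders.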

With regard to the author disambiguation example described, business processes must be established that allow a unique author ID to be coupled to the corresponding research publications. This requires close cooperation between the authors, research administrators, data analysts and data system/IT staff on the definitions, business rules and responsibilities of each stakeholder. For instance, authors might be obliged to enter a unique author ID into a database system in a fixed format, rather than in a free-text field. A business rule could state that for each author, an author ID of a given type (i.e., ORCID, ResearcherID, Scopus ID, ResearchGate ID) should be kept in the data system, which translates into a derived validation rule on the format of the stored value (e.g., an integer for a Scopus ID).

This author ID field might then be used to search large bibliometric databases such as Web of Science, Scopus, … for publications that can be coupled to this author ID and added to the bibliometric profile of the researcher. Publications not yet coupled to the author ID might also be retrieved through an author name search. An authentication step is therefore required, in which the author carries a critical responsibility to validate these publications. Research administrators and data analysts should be informed about this authentication process so that they can use the information correctly.

Although this might seem a perfect solution, practice demonstrates that continuous follow-up is required, as authors sometimes use several author IDs of the same type. A corrective action could therefore be to adapt the business rules to allow only one author ID of a given type within the data system, combined with notifying the author to take corrective measures in this respect and following up on those measures.
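The derived validation rules and the one-ID-per-type business rule described above could be sketched as follows. The format patterns are illustrative assumptions (ORCID iDs are four hyphen-separated groups of four characters, the last of which may end in "X"; the ResearcherID pattern here is a rough approximation), and the example IDs are hypothetical.

```python
import re

# Derived validation rules per ID type. Patterns are illustrative:
# the ResearcherID pattern in particular is an assumed approximation.
ID_PATTERNS = {
    "ORCID": re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$"),
    "ResearcherID": re.compile(r"^[A-Z]{1,3}-\d{4}-\d{4}$"),
}

def validate_id(id_type, value):
    """Check a submitted author ID against the fixed format for its type,
    as a free-text field could not."""
    pattern = ID_PATTERNS.get(id_type)
    return bool(pattern and pattern.match(value))

def duplicate_id_types(author_ids):
    """Return ID types registered more than once for one author,
    violating the 'one author ID per type' business rule."""
    seen, duplicates = set(), set()
    for id_type, _ in author_ids:
        if id_type in seen:
            duplicates.add(id_type)
        seen.add(id_type)
    return duplicates

# Hypothetical record: the same author registered two ORCIDs,
# which the corrective business rule should flag.
ids = [("ORCID", "0000-0002-1825-0097"), ("ORCID", "0000-0001-5109-3700")]
```

A check like `duplicate_id_types` could run as part of the continuous monitoring, triggering the notification to the author described above.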

It is clear from the example described above that data quality improvement is a process requiring continuous monitoring, as both internal and external factors may affect data quality and its related processes. Systematically and continuously repeating the DQ improvement workflow is therefore the only way to consistently obtain the high-quality data needed for high-quality data analyses.
