*On the Use of Modified Winsorization with Graphical Diagnostic for Obtaining… DOI: http://dx.doi.org/10.5772/intechopen.104539*

influential observations resulting either from an incorrect distributional assumption or from the inherent variability of the dataset [30]. Often, these hidden influential observations are not considered by any of the above methods for optimizing the actual hit rate. Consequently, the PDF solution obtained by either of the two approaches may be optimal but not statistically optimal. To overcome the problem of hidden influential observations, Iduseri and Osemwenkhae [6] proposed a novel method for attaining an optimal training sample. Their method, known as the modified winsorization with graphical diagnostic (MW-GD) method, yielded a PDF solution that was statistically optimal both for the training sample that gave rise to it and for other training samples from the same population. However, the graphical diagnostic associated with this method may be difficult to interpret when there is no significant difference in a variable's shape across the groups of the 2-D area plot, and yet there is evidence of hidden influential observations in the training sample.

This paper provides a more comprehensive analysis of the idea and concept of the MW-GD method and proposes an alternative statistical interpretation of the informative graphical diagnostic associated with the method, for use when it is difficult to differentiate a variable's shape across the groups of the 2-D area plot. The remaining sections of this paper are organized as follows. Sections 2 and 3 discuss the problems posed by the presence of outliers and legitimate contaminants in the training sample that yields the PDF, the concept of statistical optimality of the PDF classification accuracy, and the robustness of the PDF. Section 4 describes in detail the idea and concept of the modified winsorization with graphical diagnostic for obtaining a statistically optimal training sample, and presents the proposed alternative statistical (numerical) interpretation of the informative graphical diagnostic. Section 5 presents the results and discussion based on applications to two real-life samples, while Section 6 presents the conclusions.
