Emerging Pollutants - Some Strategies for the Quality Preservation of Our Environment

atmospheric dispersion models with proper meteorological data and predefined typical air pollution sources [1, 2].

A lot of studies, however, do not give enough information about the possible relationships between sampling and meteorological parameters, or about the formal tools that best fit them, in order to enable modeling and the determination of patterns which are characteristic of the investigated area. Proposing conceptual models enables decision-makers at many levels to assess and manage air quality as a whole, rather than pollutant by pollutant. By developing a holistic approach to air quality, it is possible to evaluate the extensive benefits of more effective developments in existing air-quality features, thereby avoiding the growth of air-quality ceilings, and to consider air quality within its wider meteorological context [3, 4]. Once it is established why and when air pollution episodes may occur across a region, strategies can be designed and implemented to deal with them. The possibility of forecasting pollutant concentration near the ground with high spatial detail offers the opportunity of constantly monitoring and managing the territory. Air-quality modeling procedures can forecast the behavior and the effects of the substances emitted from identified sources, particularly using data from meteorological instruments. These models can supply the distribution of pollutant concentrations on the ground, and are used for thermoelectric power plant management, being very useful in the case of exceptional events, such as when a highly dangerous pollutant escapes [5].

This study analyses the main relationships between air micro-pollution and meteorological conditions of the area surrounding Siracusa, a city located in Sicily. This was done by measuring air samples from a receiving station near a small town called Melilli, a Sicilian industrial area with a high environmental risk rate [6, 7].

This station has been chosen because it provides a complete picture with respect to the amount of micro-pollution data and the description of meteorological variables [8]. The most reliable parameters for the dispersion of micro-pollutants were then identified, and the various critical scenarios were checked, so that all available air pollution sources were considered [9–11]. In particular, a specially designed model able to forecast air pollution has been developed, working independently of any knowledge of the local sources [12]. This monitoring model uses temperature and wind vertical profiles, measured by a Radio Acoustic Sounding System (RASS, a remote-sensing system that measures the vertical temperature profile of the lower atmosphere) and SOnic Detection And Ranging (SODAR, a meteorological instrument used as a wind profiler to measure the scattering of sound waves by atmospheric turbulence), and concentration data from ground stations. The local values are correlated with the characteristics of the thermal profile and the direction and intensity of the wind at a selected altitude. On the basis of stored and statistically analyzed data, the model is able to forecast the pollution in the area surrounding the ground station [13] and to give useful information about the management of its main sources.

From the methodological point of view, the proposed approach is in the framework of multicriteria decision analysis, where many different points of view, often conflicting with one another, are explicitly considered together to support effective decisions. The utility or, better, the necessity of a multicriteria evaluation in public policies has been recently underlined by Munda [14].

**2. The rough set theory**

The rough set theory (RST), introduced by Pawlak [15–17], has proved to be an excellent tool for data analysis, even in the presence of inconsistencies and ambiguities. The main idea of the rough set approach (RSA) is that every object in the universe U (the data to be analyzed) is associated with a certain amount of information (data, knowledge), expressed by means of some attributes used for its description (e.g. if the objects are air pollution observations recorded by monitoring stations, the attributes may be air temperature, the relative humidity index, wind direction and speed, quantities of some micro-pollutants, etc.). Objects having the same description [18] in terms of these attributes are called indiscernible (similar); the indiscernibility relation thus generated induces a partition of the universe U into blocks of indiscernible objects, called elementary sets or granules of knowledge, which therefore result in information granulation. If U is divided into classes, indiscernible objects should belong to the same class to be consistent with the indiscernibility principle.

From the universe U, any subset X can be expressed either precisely (as a union of elementary sets) or approximately. In the latter case, the subset X may be characterized by two ordinary sets, called the lower and upper approximations. The lower approximation of X is composed of all the elementary sets included in X (whose elements, therefore, certainly belong to X), while the upper approximation of X consists of all the elementary sets which have a nonempty intersection with X (whose elements, therefore, may belong to X). A rough set is defined by means of these two approximations, which coincide in the case of an ordinary set. The difference between the two approximations represents the boundary region, whose elements cannot be characterized with certainty as belonging or not to X. The information about objects from the boundary region is, therefore, inconsistent or ambiguous.
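As an illustration of the definitions above, the following minimal sketch partitions a small universe into elementary sets and computes the lower approximation, upper approximation, and boundary region of a subset X. The toy observations, the attribute names (`temp`, `humidity`), and the function names are assumptions made for this example only; they are not taken from the chapter's data.

```python
from collections import defaultdict


def elementary_sets(table, attributes):
    """Partition the universe into blocks of indiscernible objects."""
    blocks = defaultdict(set)
    for obj, description in table.items():
        # Objects with identical values on all attributes share a block.
        key = tuple(description[a] for a in attributes)
        blocks[key].add(obj)
    return list(blocks.values())


def approximations(table, attributes, X):
    """Return (lower, upper, boundary) of the subset X of the universe."""
    lower, upper = set(), set()
    for block in elementary_sets(table, attributes):
        if block <= X:   # block entirely inside X: certainly in X
            lower |= block
        if block & X:    # block intersects X: possibly in X
            upper |= block
    return lower, upper, upper - lower


# Hypothetical toy observations (not the chapter's data).
table = {
    "o1": {"temp": "high", "humidity": "low"},
    "o2": {"temp": "high", "humidity": "low"},
    "o3": {"temp": "low", "humidity": "high"},
    "o4": {"temp": "low", "humidity": "low"},
}
X = {"o1", "o3"}  # e.g. the observations labelled as polluted

lower, upper, boundary = approximations(table, ["temp", "humidity"], X)
```

Here `o1` and `o2` are indiscernible but only `o1` belongs to X, so both fall into the boundary region, while `o3` is the only object that certainly belongs to X.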


Rough Set Applied to Air Pollution: A New Approach to Manage Pollutions in High Risk Rate…

http://dx.doi.org/10.5772/intechopen.75630


The original RSA based on the indiscernibility relation (usually called the classical rough set approach) cannot, however, deal with preference-ordered attribute domains (so-called criteria) and preference-ordered decision classes (sorting problems), which are very often crucial for applications to real problems in the field of multicriteria decision analysis.

To deal with criteria and ordered decision classes, Greco et al. [19–25] proposed an extension of the original rough set theory, called the dominance-based rough set approach (DRSA), mainly based on the substitution of the indiscernibility relation by a dominance relation: object *a* dominates object *b* if and only if *a* is at least as good as *b* with respect to all considered conditional criteria. In a similar way, the decision attribute *d* makes a partition of U into a finite number of preference-ordered classes, Cl = {Clt, t = 1,…, *n*}, each *x*∈U belonging to one and only one class, Clt∈Cl. We can therefore state a basic consistency principle with respect to the dominance relation: if object *a* dominates object *b* with respect to a set of criteria and *b* belongs to class Clt, then *a* should belong at least to class Clt (that is, to the upward union of Clt). Otherwise, there is an inconsistency with respect to the dominance principle. Therefore, *x* belongs to the lower approximation of the upward union of Clt if all objects dominating *x* belong to Clt or better, that is, *x* belongs to Clt or better without any ambiguity; *x* belongs to the upper approximation of the upward union of Clt if among the objects dominated by *x* there is at least one object *y* belonging to Clt or better. In a similar way, it is possible to define the lower and upper approximations of the downward unions of classes. Also in DRSA, the difference between the two approximations represents the boundary region.

The objects from U can be split into decision classes by the decisional criterion *d*, obtaining a decision table (DT), where each object *x* is described using some independent variables, called conditional attributes/criteria, and each object is assigned to a class of this partition, considered as a dependent variable. The quality of classification expresses the ratio between the number of objects which have been correctly classified and the total number of objects in the DT; it lies between 0 (no object is correctly classified) and 1 (all the objects of the universe are correctly classified), and therefore it measures the goodness of the classification.

Besides, the classification quality may be unaltered if certain condition attributes are eliminated because they are superfluous. The minimal sets of attributes which maintain the same classification quality as the entire table are called reducts. The intersection of all the reducts is the core (the set of the most important attributes, which consequently cannot be eliminated without deteriorating the quality of the classification). Therefore, the attributes belonging to the core are indispensable, while the attributes belonging to the reducts but not to the core are exchangeable with one another; the others are actually superfluous.

The relations existing among conditional attributes/criteria and decisional classes in the multicriteria sorting problem are expressed by decision rules. These are logical statements of the type 'if…, then…', where the antecedent (condition part) specifies values assumed by one or more condition attributes/criteria and the consequent (decision part) specifies an assignment to one or more decision classes. If there is only one possible consequent, the rule is said to be certain; otherwise, it is said to be approximate or ambiguous. An object *x*∈U supports decision rule *r* if its description matches both the condition part and the decision part of the rule; certain rules are supported only by objects from the lower approximation of the corresponding decision class; approximate rules are supported only by objects from the boundaries of the corresponding decision classes.

The generation of decision rules from a decision table is a complex task, and a number of procedures have been proposed to solve it [18, 25–27].

The existing induction algorithms use one of the following strategies:

• the generation of a minimal set of rules covering all objects from a decision table;

• the generation of an exhaustive set of rules consisting of all possible rules for a decision table; and

• the generation of a set of 'strong' decision rules, even partly discriminant, covering relatively many objects from the decision table (but not necessarily all of them).



In this chapter, the rules have been inferred with the jMAF software. jMAF and several other rough set tools are available for free on the Internet: RSES (Rough Set Exploration System, http://logic.mimuw.edu.pl/~rses), ROSE (ROugh Set data Explorer, http://idss.cs.put.poznan.pl/site/rose.html), jMAF (java Multicriteria and Multi-attribute Analysis Framework, http://www.cs.put.poznan.pl/jblaszczynski/Site/jRS.html), and jRank (ranking generator using the Dominance-based Rough Set Approach, http://www.cs.put.poznan.pl/mszelag/Software/jRank/jRank.html) [28].

The rules inferred by DRSA can also use the terms 'at least' and 'at most' in their conditional and decisional parts. All these rules are expressed in natural language, which makes them simple to use both for understanding the studied phenomenon and for decision support [22]. This means that the proposed approach is also able to explain the reasons for a particular pollution situation, showing the real examples behind them (traceability of decisions), and to support management in preventing pollution damage by presenting the situations in which critical events are most probable. Moreover, parameters like the support (the number of objects which satisfy both the conditional part and the decisional part of the rule) and the confidence (the ratio between the support and the number of objects which satisfy the conditional part of the rule, whether or not they also satisfy the decisional part, expressed as a percentage) help decision-makers choose the most relevant rules.
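The support and confidence of a rule can be computed directly from their definitions. The sketch below is only an illustration: the observations, the field names, and the rule 'if wind speed is at most 2, then the pollution class is at least 3' are hypothetical and were not produced by jMAF or taken from the Melilli data.

```python
def rule_stats(objects, condition, decision):
    """Return (support, confidence in percent) of an 'if condition then decision' rule."""
    # Objects matching the conditional part, regardless of their class.
    matching = [o for o in objects if condition(o)]
    # Support: objects matching both the conditional and the decisional part.
    support = sum(1 for o in matching if decision(o))
    confidence = 100.0 * support / len(matching) if matching else 0.0
    return support, confidence


# Hypothetical observations (illustrative values only).
observations = [
    {"wind_speed": 1.0, "temperature": 30, "pollution_class": 3},
    {"wind_speed": 1.5, "temperature": 31, "pollution_class": 2},
    {"wind_speed": 4.0, "temperature": 29, "pollution_class": 1},
    {"wind_speed": 0.5, "temperature": 33, "pollution_class": 3},
]

# Hypothetical rule: if wind_speed <= 2 ('at most'), then class >= 3 ('at least').
support, confidence = rule_stats(
    observations,
    condition=lambda o: o["wind_speed"] <= 2.0,
    decision=lambda o: o["pollution_class"] >= 3,
)
```

In this toy table three observations match the conditional part and two of them also match the decisional part, so the rule has a support of 2 and a confidence of about 67%.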

We can summarize the main characteristics of the rough set approach as follows. With respect to input information (object description), both quantitative and qualitative data can be considered, even if they present some inconsistencies. With reference to output, information about the relevance of attributes and the quality of approximation can be acquired, and the final results are expressed in the form of 'if…, then…' decision rules, which decision-makers find easy to understand [29–31] and which use only the most relevant attributes/criteria (i.e. those of some reduct).
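To make the notions of quality of classification, reduct, and core more concrete, here is a brute-force sketch for the classical (indiscernibility-based) case. It assumes that an object counts as correctly classified when every object indiscernible from it has the same decision value, and it enumerates attribute subsets exhaustively, which is only practical for small tables. The table and all names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def quality(table, attributes, decision):
    """Share of objects lying in blocks with a single decision value."""
    blocks = defaultdict(list)
    for row in table:
        blocks[tuple(row[a] for a in attributes)].append(row[decision])
    consistent = sum(len(d) for d in blocks.values() if len(set(d)) == 1)
    return consistent / len(table)


def reducts(table, attributes, decision):
    """Minimal non-empty attribute subsets preserving the quality of the full table."""
    full = quality(table, attributes, decision)
    found = []
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            if any(set(r) <= set(subset) for r in found):
                continue  # contains a smaller reduct, so it is not minimal
            if quality(table, subset, decision) == full:
                found.append(subset)
    return found


def core(all_reducts):
    """Attributes shared by every reduct."""
    return set.intersection(*(set(r) for r in all_reducts)) if all_reducts else set()


# Hypothetical decision table (illustrative values only).
table = [
    {"temp": "high", "humidity": "low", "wind": "weak", "level": "high"},
    {"temp": "high", "humidity": "low", "wind": "strong", "level": "low"},
    {"temp": "low", "humidity": "high", "wind": "weak", "level": "low"},
    {"temp": "low", "humidity": "high", "wind": "strong", "level": "low"},
]
conditions = ["temp", "humidity", "wind"]

q = quality(table, conditions, "level")
found = reducts(table, conditions, "level")
indispensable = core(found)
```

In this toy table `temp` and `humidity` carry the same information, so each can replace the other in a reduct, whereas `wind` appears in every reduct and therefore forms the core.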

high environmental risk rate place' by the Law 349/86 and covers six surrounding towns (Augusta, Priolo, Melilli, Siracusa, Floridia, Solarino); the landscape is very varied and is formed by sandy hills, mountains, and plains near the coast [37, 38].

In this territory there are many chemical plants, energy production industries, and oil refineries, as well as members of a private organization, the industrial trust for environmental safety (CIPA, Consorzio Industriale Protezione Ambiente, i.e. Environmental Protection Industrial Consortium). In its operative center, CIPA assembles and works out different micro-pollution parameters and various meteorological variables, measured by 12 different monitoring stations. Data collection and processing is useful in statistical analysis and in upgrading air pollution management, in order to avoid the tested air exceeding the previously established quality thresholds [38–42].

This chapter studies the monitoring station in Melilli only, because there the data concerning air micro-pollution quantities and the weather conditions present when the pollution samples were taken are thoroughly collected. In fact, the Melilli monitoring station observes and stores the hourly quantities of some micro-pollutants, such as sulfur oxide, nitric oxide, non-methane hydrocarbons, ozone, and sulfonyl hydrogen, together with the meteorological conditions present at the moment of their observation, such as air temperature, relative humidity index, and wind direction and speed. Some previous studies show an evident correlation between these environmental variables and the quantity of air micro-pollution found in the samples. Because of the completeness of the data in the samples studied, the levels of four micro-pollutants (SOx, CH<sub>4</sub>, NMHC, NOx) in correlation with the meteorological variables previously mentioned [42, 43] are analyzed in this chapter.

Data recovered from the Melilli monitoring station during 2 weeks, more precisely 1 week in January and 1 in August 2010, have been studied, in order to observe how the analysis results differ across the seasons of the year. Daily available recorded 'objects', described both by meteorological variables (condition attributes) and by micro-pollution quantity (decision attribute), have been considered. More than 1000 data records have been analyzed, as an example to which the RSA could be applied.

The selected condition attributes/criteria (descriptors) considered in this analysis are the hour of observation (attribute), wind speed (criterion) and wind direction (attribute), air temperature (criterion), and the relative humidity index (criterion), whereas the levels of the aforementioned micro-pollutants are the decision classes. The descriptors have been chosen because in previous studies [36, 42] they appeared to be very important factors influencing, at a local level, the air micro-pollution quantity.

In spite of the fact that data samples used are restricted to a relatively short period of time (each one only 2 weeks), their analysis allowed us to obtain some interesting results, both from methodological and from operational points of view, which give an idea of the knowledge extraction (in terms of decision rules) from available data using the considered approach

**4. Results**

In the air pollution problem at hand, for example, we can consider different decision classes of pollution according to an increasing level of some micro-pollutants (SOx, NOx,…). Some meteorological variables present a monotonic relationship with the degree of pollution (conditional criteria, e.g. the air temperature, the degree of humidity) and others do not (conditional attributes, e.g. wind direction). It is therefore very important, from both the operational and methodological points of view, to distinguish attributes from criteria and to exploit each appropriately in the description of the objects and in the rule induction. We have to consider the indiscernibility relation with respect to the attributes, the dominance relation with respect to the criteria, and the assignment to ordered classes with respect to the decision.
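The combined use of indiscernibility on attributes and dominance on criteria can be sketched as follows. This is a minimal illustration, not the jMAF implementation: it assumes that every criterion is of the gain type (a larger value points to a higher pollution class), and the observations, field names, and class numbers are hypothetical.

```python
def dominates(a, b, attributes, criteria):
    """a dominates b: same values on attributes, at least as high on every criterion."""
    return (all(a[q] == b[q] for q in attributes)
            and all(a[q] >= b[q] for q in criteria))


def upward_union_approximations(objects, attributes, criteria, decision, t):
    """Lower and upper approximations of the upward union 'class t or better'."""
    at_least_t = {i for i, o in enumerate(objects) if o[decision] >= t}
    lower, upper = set(), set()
    for i, x in enumerate(objects):
        dominating = {j for j, y in enumerate(objects)
                      if dominates(y, x, attributes, criteria)}
        dominated = {j for j, y in enumerate(objects)
                     if dominates(x, y, attributes, criteria)}
        if dominating <= at_least_t:   # everything dominating x is in class t or better
            lower.add(i)
        if dominated & at_least_t:     # x dominates some object in class t or better
            upper.add(i)
    return lower, upper


# Hypothetical observations: wind direction is an attribute, the others are criteria.
observations = [
    {"wind_dir": "N", "temperature": 30, "humidity": 70, "level": 2},
    {"wind_dir": "N", "temperature": 25, "humidity": 60, "level": 3},
    {"wind_dir": "N", "temperature": 35, "humidity": 80, "level": 3},
    {"wind_dir": "S", "temperature": 20, "humidity": 50, "level": 1},
]

lower, upper = upward_union_approximations(
    observations,
    attributes=["wind_dir"],
    criteria=["temperature", "humidity"],
    decision="level",
    t=3,
)
boundary = upper - lower
```

Observation 0 dominates observation 1 but is assigned to a lower class, which violates the dominance principle, so both end up in the boundary region of 'level 3 or better'; only observation 2 belongs to that union without ambiguity.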

Greco et al. [26] proposed an approach for this kind of real-life multicriteria problem. The problem can be easily modeled by introducing appropriate thresholds to discretize the conditional attributes and to characterize the different levels of air pollution for the decision classes. With DRSA, no discretization is required for the criteria.
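A threshold-based discretization of this kind might look like the sketch below. The threshold values, the number of classes, and the four wind sectors are purely hypothetical choices made for illustration; the chapter does not state the values actually used.

```python
from bisect import bisect_right

# Hypothetical concentration thresholds (arbitrary units); not from the chapter.
LEVEL_THRESHOLDS = [50.0, 100.0, 200.0]


def pollution_class(concentration, thresholds=LEVEL_THRESHOLDS):
    """Map a concentration to an ordered decision class 1..len(thresholds)+1."""
    # A value equal to a threshold is assigned to the higher class.
    return bisect_right(thresholds, concentration) + 1


def wind_sector(direction_deg, sectors=("N", "E", "S", "W")):
    """Discretize a wind direction in degrees into equal sectors centred on N, E, S, W."""
    width = 360 / len(sectors)
    return sectors[int(((direction_deg + width / 2) % 360) // width)]
```

For example, with these assumed thresholds `pollution_class(30.0)` yields class 1 and `pollution_class(250.0)` yields class 4, while `wind_sector(350)` and `wind_sector(10)` both fall in the northern sector. Criteria such as air temperature would be passed to DRSA with their original numerical values.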

Consequently, rough sets can be applied very efficiently when uncertainty derives from the granularity of information. Granules of condition attributes/criteria (objects having the same descriptions or, respectively, belonging to the same dominating/dominated sets) are used to approximate granules of decision (assignments to decision classes).

The RSA is therefore very different from fuzzy sets, which mainly address the linguistic imprecision due to the use of natural language and whose membership function indicates the degree to which each object belongs to a particular class. Of course, the two approaches are not mutually exclusive; they can be used in a complementary way [32–34]. Using a terminology from image representation, we could say that rough sets are related to the number of pixels of an image (its resolution), while fuzzy sets represent the number of gray levels between black and white. At an operational level, the implementation of fuzzy sets always requires the definition of particular membership functions, one for each attribute, which are not easy to specify analytically. Therefore, both the classical rough set approach and fuzzy sets are sensitive to the specification of these values (discretization thresholds and membership functions, respectively), and sensitivity and robustness analyses, performed by moving the thresholds and other parameters, are useful and recommended [30, 35, 36]. This is not the case for DRSA, where no parameter has to be elicited: only some examples of decisions (from past experience or from expert knowledge) are needed to model the preferences of the decision-maker.
