**Semi-Automatic Semantic Data Classification Expert System to Produce Thematic Maps**

Luciene Stamato Delazari, André Luiz Alencar de Mendonça, João Vitor Meza Bravo, Mônica Cristina de Castro, Pâmela Andressa Lunelli, Marcio Augusto Reolon Schmidt and Maria Engracinda dos Santos Ferreira

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/51848

### **1. Introduction**

This chapter presents an expert system designed to classify semantic information in a geographic database, with the aim of assisting non-expert map-makers. Although GIScience has long discussed how to deal with ordinary users and their relationship with map production, especially given the popularization of GIS (*Geographic Information Systems*) software and web mapping technologies, there are still issues concerning map production and its quality. Some of these issues are related to data classification methods, knowledge about levels of measurement and, thus, to map symbolization itself. In Brazil, this subject can be of special interest to municipal and state government departments, NGOs and institutions which use maps for planning and for decision-making support. At least in part, problems seem to occur because GIS software is easy to use while employees lack training in cartography. In this context, an expert system seems to be a suitable way to ensure that ordinary users can make correct decisions in the map-making process.

Specifically in thematic map production, there are several potential ways to ensure that correct choices are suggested to users, ranging from long-term training to artificial intelligence techniques. In this context, Schmidt & Delazari [1] developed an expert system to classify data by comparing the text of class names with a file that contains a word classification, called the "system dictionary". Originally, this software was built to assist users at the Social Assistance Department of Paraná state, Brazil, in their activities of social data insertion and classification. The system has since evolved to a web environment and is publicly available, which reinforces the need for automatic assistance in order to avoid mistakes that could impair data analysis and, consequently, the decision-making based on this analytical process.


This chapter is divided into three main sections. The first addresses the motivations for creating the expert system, considering thematic map generation and issues related to non-specialist users and uses. The topics discussed in this section are: ways of providing user assistance in data classification and map symbolization according to map design rules; how users can achieve a reasonable understanding of GIS and spatial data, as trained users do, for correct use of geospatial tools; and, since there is no unique and permanent set of rules for thematic map-making, the main aspects to consider when developing user assistance for building good maps.

The second section describes the expert system's theoretical proposition, regarding users and the data to be mapped, and gives examples of rules to establish and implement in order to achieve proper data classification results. The use of IF-THEN rules is a noteworthy element of this case study; the rules were initially defined as static software code to support recognition of database entries. The algorithm proceeds by evaluating the level of measurement that best suits the data representation process. The data are then classified and stored in the knowledge base. Once all the classes are stored, rules indicate the most suitable color ramps, among those available, to match the data characteristics. To insert new information, the expert system automatically examines and tests the data type: numerical data are assigned a numerical level of measurement, while nominal and ordinal data are classified according to the knowledge base and the system dictionary. For non-numerical data (semantic classification), the choice of level of measurement is more complex due to its subjective nature and requires greater attention.
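As a rough illustration of the sequence just described (test the data type, assign a level of measurement, then pick a color scheme), the following Python sketch may help. It is not the Social Atlas code: the function names, the `known_orderings` lookup standing in for the knowledge base, and the mapping from level of measurement to color scheme family are all assumptions made for this example. The mapping reflects common cartographic practice rather than the system's actual rule set.

```python
def is_numeric(value):
    """Return True when the value can be read as a number."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def level_of_measurement(values, known_orderings):
    """Guess the level of measurement of an attribute column.

    known_orderings is a hypothetical stand-in for the knowledge base:
    a set of frozensets, each holding class names already known to be ordered.
    """
    if values and all(is_numeric(v) for v in values):
        return "numerical"
    classes = frozenset(str(v).strip().lower() for v in values)
    return "ordinal" if classes in known_orderings else "nominal"


# Assumed mapping: ordered data get a sequential ramp,
# unordered data get a set of distinct hues.
SCHEME_FOR_LEVEL = {
    "numerical": "sequential",
    "ordinal": "sequential",
    "nominal": "qualitative",
}


def suggest_scheme(values, known_orderings):
    """Return the guessed level of measurement and a matching scheme family."""
    level = level_of_measurement(values, known_orderings)
    return level, SCHEME_FOR_LEVEL[level]


# Hypothetical knowledge-base content and three example columns.
known = {frozenset({"low", "medium", "high"})}
print(suggest_scheme(["12", 7.5, "3"], known))
print(suggest_scheme(["High", "low", "medium", "low"], known))
print(suggest_scheme(["urban", "rural"], known))
```

In this sketch, non-numerical class names that are not yet known to the lookup fall back to the nominal level, which is the cautious choice when no order can be established.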

Lastly, the third section gives a project overview, demonstrating the code development for the current web environment and potential new functionality for this system, which was developed to assist users in building social maps. The results are presented together with a discussion of general aspects of system architecture, interface design and the expert system itself from a functional point of view, in particular how a system can guide users in such activities.

### **2. Background**

Originally, the objective of Social Atlas was to support the *Secretaria de Estado do Trabalho, Emprego e Promoção Social* (SETP). In Paraná state, Brazil, this bureau defines social assistance policies and oversees their execution, besides acting as a government manager that defines the allocation of financial resources. In 2000, SETP technicians needed to know how counties were organized in terms of Municipal Councils and Public Funds, in order to implement social laws that had recently been approved. For this reason, Delazari [2] started to work on an electronic atlas prototype called Social Atlas. The objective of Social Atlas was to be a tool for carrying out spatial analysis and generating maps by means of user interaction. It was based on the needs of this bureau, specifically in the context of LOAS (the Portuguese acronym for the Organic Law of Social Assistance) (Schmidt & Delazari [1]).


224 Decision Support Systems


The original system users were social scientists who had little or no knowledge of cartography or of any other methods to manipulate or represent spatial data. The proposed system must guide users through map generation tasks, avoiding any mistakes which might impair analysis. Since the research data include nominal and ordinal information collected for each county in Paraná, it is mandatory to implement functions that make it possible to choose between different options for data representation and that also allow users to apply different data classification methods.

In the visual analysis of geographic information, acquiring knowledge is possible if the graphic solutions defined for each map provide efficient visualization of the characteristics of geographic phenomena. Graphic solutions may represent the behavior of the spatial phenomenon and emphasize the characteristics that are important at each moment of the analysis. As stated by Fairbarn [3], "maybe the most important change in mapping, in the past ten years, was the appearance of a user who is also a map producer". Although producing map knowledge still seems to be a cartographer's responsibility, it is unrealistic to expect that every map built will have a cartographer as part of its genesis.

The main issue in this context is how to enable ordinary users to produce good-quality maps which respect cartographic design principles. In other words, map software should offer a set of tools to guide the choices made at every step of the map production process. There are two possible options: first, tutorials which guide users through the stages of map creation (Yufen [4]); or second, expert systems, which automate basic decisions about the mapping process (Wang & Ormeling; Artimo; Su; Zhan & Buttenfield [5-8]).

The choice of an expert system was influenced by factors related to ease of development, knowledge about software, the diversity of functions to be implemented and the availability of *software* resources. Apart from other, minor differences between expert systems (ES) and conventional systems, an ES can be relied on to simulate human reasoning, inference and judgment, and to derive conclusions and heuristics based on a specific domain of knowledge. This means that an ES is computer software that operates on symbolic objects (symbols) and the relationships between objects (Chee & Power [9]), while conventional software generates results through algorithms which manipulate numbers and characters. According to Hemingway et al. [10], the structure of an expert system has significant advantages over traditional software because, once the information is correctly inserted into the knowledge base, it may be updated, modified and supplemented. In general, an expert system can be conceived as a four-module system that acts as an information manager. These modules are a user interface, a set of rules, a knowledge base and the inference engine (Figure 1).


**Figure 1.** Basic structure of an expert system; Source: Adapted from Mendes [11].

The inference engine is an essential element of an expert system, since it works as the control mechanism that evaluates and applies rules. In the problem-solving process, these rules must be in accordance with the information existing in the working memory (Araki [12]). According to Russell & Norvig [13], automated inference engines can be grouped into four categories: theorem provers and logic programming languages; production systems; "frame" systems or semantic networks; and description logic systems. The inference engine uses forward chaining, a method which seeks to validate the assumptions in the rules and then to carry out the actions (consequences), rather than treating them only as a logical conclusion. Intermediate results are validated as assumptions, and deduced conclusions are stored in the working memory (Russell & Norvig [13]).

The rules library and the working memory form the so-called 'knowledge base', representing the knowledge captured from a human expert on the problem domain. When an issue is submitted to the system for evaluation, the rules library interacts with the user and the inference engine, allowing identification of the problem, of possible solutions to it and of the whole process that leads to conclusions. Much of the effort of developing an expert system lies in the elicitation of knowledge, i.e., how to capture and use human knowledge in a computer application.

Rule-based systems are feasible for problems whose solution process can be written in the form of 'IF-THEN' rules and which have no easy solution. According to Araki [12], when a rule-based system is created, it is necessary to consider the following:

**•** A set of facts to represent the initial working memory. This can be any relevant information related to the system's initial state;

**•** A set of rules, a library built to deal with the set of facts. This should include any action that should be within the scope of the problem;

**•** A condition stipulating that a solution was found or that no solution exists.

The set of facts describes the relevant characteristics of the phenomena from the expert's point of view. Sometimes, even the expert does not realize all the features that he or she uses to make a decision. At this step, qualitative research tools such as questionnaires and interviews help to identify nuances involved in the human expert's decision-making process and, from them, it is possible to select key facts in order to build the initial working memory.

To define a set of rules, structures can be designed in the form IF <condition> THEN <action>, where:

<condition> is a conditional proposition. It provides a test whose outcome depends on the current state of the knowledge base. Typically, it is possible to test the presence or absence of certain information;

<action> performs some action defined in a rule, and may even change the current state of the knowledge base by adding, modifying or removing units which are present in it.

Using an IF-THEN structure causes the system to examine the conditions of all ES rules and to determine the subset of rules whose conditions are satisfied by the current contents of the working memory. The choice of which rule to trigger is based on a conflict resolution strategy. When a rule is triggered, the actions specified in its THEN clause are carried out. These actions can modify the working memory, the rules library, or another specification included by the system programmer. The cycle of rule evaluation is then repeated, and actions continue until there are no more conditions to be met or an action terminates the program flow.
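The match, conflict-resolution and action cycle can be sketched in a few lines of Python. This is a minimal forward-chaining loop written for illustration only, not the engine used in Social Atlas. The `Rule` class, the priority-based conflict resolution, the fact strings and the `"halt"` fact are assumptions of this example, and the actions here only add facts, whereas the text above also allows them to modify or remove information.

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int                            # used only for conflict resolution
    condition: Callable[[Set[str]], bool]    # IF part: tests the working memory
    action: Callable[[Set[str]], Set[str]]   # THEN part: returns facts to add


def forward_chain(rules, facts, max_cycles=100):
    """Run the match / resolve / act cycle until nothing new can be derived."""
    memory = set(facts)
    fired = []
    for _ in range(max_cycles):
        # Match: collect rules whose condition holds and that would add a new fact.
        conflict_set = []
        for rule in rules:
            if rule.condition(memory):
                new_facts = rule.action(memory) - memory
                if new_facts:
                    conflict_set.append((rule, new_facts))
        if not conflict_set:
            break  # no more conditions to be met
        # Resolve: assumed strategy, the highest priority wins.
        rule, new_facts = max(conflict_set, key=lambda pair: pair[0].priority)
        # Act: update the working memory.
        memory |= new_facts
        fired.append(rule.name)
        if "halt" in new_facts:
            break  # an action terminated the program flow
    return memory, fired


# Hypothetical rules loosely modelled on the classification task.
RULES = [
    Rule("numerical", 2,
         lambda m: "values_are_numbers" in m,
         lambda m: {"level:numerical"}),
    Rule("ordinal", 2,
         lambda m: "class_names_have_order" in m,
         lambda m: {"level:ordinal"}),
    Rule("sequential_ramp", 1,
         lambda m: "level:numerical" in m or "level:ordinal" in m,
         lambda m: {"ramp:sequential", "halt"}),
]

memory, fired = forward_chain(RULES, {"class_names_have_order"})
print(fired)
```

Starting from the single fact `class_names_have_order`, the first cycle fires the `ordinal` rule and the second fires `sequential_ramp`, whose `"halt"` fact ends the loop.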

In cartography, expert systems can have a wide field of applications. Understanding the basic concepts of data classification, levels of measurement and visual variables in the process of map design is a major problem for casual GIS users. It is not unusual to find maps with continuous color ramps representing discrete data, map projection problems, complex symbols that carry no important information, and "noisy" visualization, all of which make interpretation almost impossible (Schmidt & Delazari [1]).

According to Schmidt & Delazari [1], there is a need to develop research on how to assist users in designing maps with GIS tools. The scientific literature presents some case studies on software and on specific use and user issues. For automatic visualization, other researchers (Casner; Roth et al.; Senay & Ignatus [14-16]) investigated how to eliminate the need to specify, design and develop different visualizations for GIS software outputs, allowing users to focus their attention on determining and describing the information to be represented. Other initiatives are CommonGIS (Fraunhofer [17]) and GeoDa™ (Anselin et al. [18]). These systems focus on human-computer interaction (HCI) through interactive training assistance and an exploratory data analysis (EDA) assistance tool. CommonGIS adapts the interface as users explore the tool and acquire knowledge about the system. The exploration is guided by an expert system that gives users hints and options. GeoDa™ emphasizes data mining and spatial statistics, and includes functionality ranging from simple mapping to exploratory data analysis, using an interactive environment and combining maps with statistical graphics (Anselin et al. [18]).

ized into three themes, or major groups, which separate information in terms of its charac‐ teristics. There are 26 different types of information in these three themes. Each one has its own classification and number of classes defined by the original map design from Delazari [2] and its condition is related to defining the data's level of measurement and knowledge

Semi-Automatic Semantic Data Classification Expert System to Produce Thematic Maps

http://dx.doi.org/10.5772/51848

229

The rules library, unlike other expert systems, is embedded in the software, and the parame‐ ters of the rules are updated by use, and can be accessed by the user. The set of rules tries to identify which kind of data has been inserted. Through application of production rules, de‐ scribed as IF-THEN, the expert system tests the type of data to insert new information. Nu‐ merical data is classified by its numerical level. Social Atlas does not distinguish between the interval or ratio level of the measurement, because, in this case, the map design does not

Nominal and ordinal data can be stored in the same knowledge base but the rules to deal with them are quite different and their functioning is based on ordering elements from the knowledge base. Therefore, any feature that indicates order, associated with class names, has to be searched for in the knowledge base. When dealing with semantic classification, the choice between one or another level of measurement demands attention, because correct or‐ der is a subjective concept. Data indicating temporal or any kind of order can be considered as ordinal data and needs to be evaluated in detail. For example, for LOAS implementation it is important to know County Council's creation data. Classes are "first semester of 1995", "before 1995", "second semester of 1995", "after 1995" and "no available data". The expert system searches the whole class, e.g. first semester of 1995, for possible ordering of catego‐

If it is not possible to define them, category names are broken into a list of words, using a 'word-wrap' function. The position of each word in the sentence is stored as an index of words. Then the list of words is compared to the dictionary and the ES tries to classify the first word of each class, and then the second word of each class, and so on, trying to estab‐ lish a hierarchical relationship between class names. This dictionary keeps words in a more generic sense, giving local and global ordering based on the index order of the words that compound the class names. Inside the dictionary there are words and prepositions like "un‐ til", "before", "between", "among" and "after", and they also work as a specific working

The dictionary functions as a full resource for carrying out efficient classifications. In this context, if a previous classification has been deleted and a new one with a similar name is inserted, the system is able to estimate a possible order for the new classification. The stop‐ ping condition, in this case, is defined if an order is or is not associated with the class names, or words associated with them. In this way, when ordering is found, for all categories, the rule library will choose the required visual schemes (color ramps, in the case of choropleth maps) to represent geographic phenomena. Also, the knowledge base must assemble the

base rank.

consider it (Schmidt & Delazari [1]).

ries (Schmidt & Delazari [1]).

new classification.

memory, keeping all the words used for classification.

Yet according to Schmidt & Delazari [1], if casual users do not understand map design con‐ cepts properly, it is important to determine how to help them to classify attributes and sym‐ bolize maps according to the principles of cartography. At the same time, it seems to be essential for cartographers to ensure that these users will achieve a minimum level of under‐ standing of the correct use of GIS. If there are no map design principles for the digital envi‐ ronment, mainly because cartographers do not know exactly what can be adopted from traditional map design theory, what should be taken into account to make these principles feasible for ordinary users? In this context, the expert system application seems to be a plau‐ sible solution, regarding the characteristics discussed above.

#### **2.1. The Social Atlas expert system**

The expert system was built as an automated information manager placed between the data‐ base and the representation device, implemented with MapObjects (ESRI) (Figure 2). The ES controls the data flow in two situations: insertion of new data and carrying out SQL queries. When new information is added, the system goes to the knowledge base in order to try to find a similar configuration, concerning class names. If not all can be found, the rules library breaks it down into isolated words to try to find any kind of order in the dictionary. The same procedure occurs when an SQL is inserted in the system but, in this case, the informa‐ tion existing in the knowledge base is filtered by the rules and presented on a thematic map.

**Figure 2.** Expert system information flow

Initial memory, or initial facts, is the name given to information located in the database. This is collected by technicians of the SETP bureau for implementation of LOAS. Data is organ‐ ized into three themes, or major groups, which separate information in terms of its charac‐ teristics. There are 26 different types of information in these three themes. Each one has its own classification and number of classes defined by the original map design from Delazari [2] and its condition is related to defining the data's level of measurement and knowledge base rank.


The rules library, unlike in other expert systems, is embedded in the software; the parameters of the rules are updated through use and can be accessed by the user. The set of rules tries to identify which kind of data has been inserted. Through the application of production rules, described as IF-THEN statements, the expert system tests the type of data before inserting new information. Numerical data is classified by its numerical level. Social Atlas does not distinguish between the interval and ratio levels of measurement because, in this case, the map design does not depend on that distinction (Schmidt & Delazari [1]).
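The IF-THEN test on the type of inserted data can be pictured with a minimal sketch. Python is used here only for illustration; the function names and the decimal-comma handling are assumptions, not part of the Social Atlas code.

```python
def is_number(text: str) -> bool:
    """Return True when the text parses as a number (decimal comma tolerated)."""
    try:
        float(text.strip().replace(",", "."))
        return True
    except ValueError:
        return False


def detect_data_type(values: list[str]) -> str:
    """Apply two production rules to the values of a new attribute."""
    # Rule 1: IF every value is a number THEN the data is numerical.
    # Interval and ratio levels are deliberately not told apart.
    if values and all(is_number(v) for v in values):
        return "numerical"
    # Rule 2: otherwise the data is handed to the nominal/ordinal rules.
    return "categorical"


print(detect_data_type(["12", "3.5", "7,25"]))
print(detect_data_type(["before 1995", "after 1995"]))
```

In this sketch, any attribute that fails the numerical rule falls through to the semantic classification described next.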

Nominal and ordinal data can be stored in the same knowledge base, but the rules that deal with them are quite different and rely on ordering elements from the knowledge base. Therefore, any feature associated with class names that indicates order has to be searched for in the knowledge base. In semantic classification, the choice between one level of measurement and another demands attention, because correct order is a subjective concept. Data indicating temporal order, or any other kind of order, can be considered ordinal data and needs to be evaluated in detail. For example, for LOAS implementation it is important to know the creation date of each County Council. The classes are "first semester of 1995", "before 1995", "second semester of 1995", "after 1995" and "no available data". The expert system first searches using the whole class name, e.g. "first semester of 1995", for a possible ordering of categories (Schmidt & Delazari [1]).

If it is not possible to define an order in this way, category names are broken into a list of words, using a 'word-wrap' function. The position of each word in the sentence is stored as an index of words. Then the list of words is compared to the dictionary and the ES tries to classify the first word of each class, then the second word of each class, and so on, trying to establish a hierarchical relationship between class names. The dictionary keeps words in a more generic sense, giving local and global ordering based on the index order of the words that compose the class names. The dictionary contains words and prepositions like "until", "before", "between", "among" and "after", and it also works as a specific working memory, keeping all the words used for classification.
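The word-by-word comparison can be sketched as follows. This is a simplified illustration in Python, not the authors' implementation: the dictionary entries and their ranks are hypothetical, and the sketch compares the ranks of known words in sentence order rather than reproducing the full local and global ordering.

```python
# Hypothetical dictionary: word -> rank. The real system dictionary is not
# reproduced in the chapter; these entries are assumptions for illustration.
DICTIONARY = {"before": 0, "first": 1, "second": 2, "after": 3}


def word_index(class_name: str) -> list[tuple[int, str]]:
    """Split a class name into (position, word) pairs."""
    return list(enumerate(class_name.lower().split()))


def order_classes(class_names, dictionary=DICTIONARY):
    """Return the class names in ascending order, or None when no order is found."""
    keyed = []
    for name in class_names:
        # Keep the rank of every word the dictionary knows, in sentence order.
        ranks = [dictionary[w] for _, w in word_index(name) if w in dictionary]
        if not ranks:
            return None  # no ordering word found: the user has to be asked
        keyed.append((ranks, name))
    keyed.sort(key=lambda pair: pair[0])
    return [name for _, name in keyed]


classes = ["first semester of 1995", "before 1995",
           "second semester of 1995", "after 1995"]
print(order_classes(classes))
print(order_classes(classes + ["no available data"]))
```

Under these assumed ranks the four dated classes are sorted from "before 1995" to "after 1995", while adding "no available data" makes the sketch give up and return `None`, which stands in for the case where user intervention is requested.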

The dictionary functions as a full resource for carrying out efficient classifications. If a previous classification has been deleted and a new one with a similar name is inserted, the system is able to estimate a possible order for the new classification. The stopping condition, in this case, is whether or not an order is associated with the class names, or with the words that form them. When an ordering is found for all categories, the rules library chooses the required visual schemes (color ramps, in the case of choropleth maps) to represent geographic phenomena. The knowledge base must also incorporate the new classification.

However, as the system can be relatively weak in the early stages of professional use, due to the uncertainty of the initial memory and dictionary, some cases of failure or partial success may occur. In these cases, the system asks the user whether the order is correct. The system then stores the user's classification, feeds the dictionary and carries out the map symbolization. As specialists keep supplying the knowledge base with more information, the vocabulary becomes more extensive, the system can deal with more complex situations, and user confirmation becomes unnecessary. Thus, the system becomes a more powerful tool, especially when experienced technicians build the knowledge base and dictionary and distribute them along with the Social Atlas.


**Figure 3.** Original expert system interface; Source: Schmidt & Delazari [1]


As a last point about the ES, there is a special level of access that allows users to edit the knowledge base and the database in general. Different modifications can be made to the database; these modify the final representation and can also change the knowledge base or even the dictionary. Modifying the database is a common task, performed in order to update the available data. In this case, any information deleted or inserted passes through the Expert System. This step is necessary to update the knowledge base and the dictionary, and to keep the ES and the database synchronized.

The interaction with the ES occurs inside the Social Atlas interface (Figure 3). The dialog box is accessed from the Edition of Social Atlas menu. All other steps of the expert system run under this dialog box, and users do not come into direct contact with the data or its classification. Themes are shown in the Themes Dialog box. Data, i.e., the column name, is supplied in the Class Dialog box on the left. In this dialog box the class names are supplied and appear on the right-hand side. Clicking on the 'confirm' button makes the system carry on with classification, as described previously. In the event of failure, the right-hand side buttons are enabled and a message box pops up asking the user for intervention. Users sometimes need to store additional information as text. This can be done in the 'Additional Information' field at the bottom of the dialog box.

### **3. New interface design and code development**

For cartographers and professional mapmakers, it can be hard to think about code development and its relationship to the map production process itself. There are several varieties of GIS software which can help to store and process spatial data to produce high quality geographic representations. However, it cannot be denied that web environments are changing the way map use and users are understood and considered in cartographic activities. Since the web is the home of interactivity, one unavoidable issue is the way in which casual and ordinary users rely on geographic data to produce thematic maps. Another major issue is how cartographers can act in this environment to ensure that these users are able to rely on these self-produced maps to take decisions and to analyze geographic phenomena efficiently.

Web applications, just like offline software, depend on programming languages. Although websites are usually easier to design and to get working, because of common server specifications and widely known browser architecture (which includes client-side features that are constantly evolving), there are some critical issues in developing map applications for the web. Thus, if a cartographer wants to help any user on the world-wide web to make good maps by developing an automatic or expert system for it, there is a need to first understand and analyze how web architecture can be handled. This section presents one way to address this issue by describing how the current version of the "Atlas Social do Paraná" (Delazari [2]) works; it is available at http://www.cartografia.ufpr.br/atlas/english<sup>1</sup>. Since the development process was carried out exclusively by cartographers and not by system analysts, the proposed solution is perhaps not as elegant as it could be, but the discussions it raises can be useful for interactive map designers and are currently defined as the main focus of this implementation.


Describing how an automatic system is developed can serve as the starting point for many related projects. The case study presented here is the adoption of an automatic system inside the already existing "Atlas Social do Paraná". As previously described, the Atlas comprises a large amount of social data for the last two decades, which makes it a powerful tool, not only for government planners or public administrators, but also for ordinary citizens, all of whom may use the atlas to gain a full understanding of the social perspectives of Paraná state in Brazil.

Since the Atlas was developed initially as an offline product, the web version has to manage the following issues:

**a.** How to make the Atlas database easy to update and query by data producers who are not experts in either cartography or informatics;

**b.** How to make the Atlas usable in the web, combining cartographic aspects and interface design in order to best suit the audience needs;

**c.** How to make the Atlas structure serviceable to provide the user with the ability to produce maps on demand, using his own data and preserving the representation quality.

<sup>1</sup> All php codes are available for download in the same page.

| Functionality | Version |
|---|---|
| **System functionality** | |
| Choosing thematic data | U, S |
| Choosing area units for mapping | U, S |
| Choosing level of measurement | S |
| Choosing number of classes | U, S |
| Choosing method for data classification | S |
| Choosing color ramp | U, S |
| Storing user's choices | S |
| Displaying thematic data table with statistics | U, S |
| Uploading own data to build maps | U, S |
| **Webmap interface functionality** | |
| Zooming and panning | U, S |
| Next and back zoom buttons | U, S |
| Printing maps module | U, S |
| Searching location by text | U, S |
| Finding address | U, S |
| Where am I | U, S |
| Querying by click | U, S |
| Legend support | U, S |
| Scale | U, S |
| Latitude and longitude location | U, S |
| Measuring areas and distances | U, S |

**Table 1.** Interface functionality - 'U' indicates its presence on User version and 'S' indicates its presence on Specialist version

The expert system described in the last section also had to be redesigned in order to deal with new database organization and its interface was rebuilt to consider the use of two groups of web users: specialists and general users. In the first group are the users who will build the expert database by choosing variables to construct their maps and evaluate them. The second group is those who will take advantage of this database to build their own maps, whether using data from the Atlas or using any other spatial data with attributes. Also, the system interface needed to be redrawn in order to consider the browsers' actual style of navigation and the limitation of using only one window and less than 90% of the display area for map application in general.

Many of the answers to the issues presented were discussed during the process of implementing the Atlas. However, the interface design and website functionality were the first set of decisions to be taken, and guided further design on the database, server architecture and web services, used to make spatial data representations available on the internet. The list of interface functionalities can be divided into two steps: the system functionality itself and the webmap interface (Table 1). The initial effort on this new version was designed to build only choropleth maps, but its structure would deal with other mapping techniques, on demand.

#### **3.1. Database and server structure**

The first step in designing the current database consisted of defining the former entities and relationships, implemented in the PostgreSQL DBMS. The existing data structure, comprising the spatial data and the attribute data to be represented in maps, was first considered from the point of view of common GIS software architecture: producing feature data by surveying, transforming it into spatial files and joining relational tables with area units to produce map symbology on specific themes. Problems arise when there are changes to spatial data, such as when a new municipality or new administration area is created, or when data producers need to load new data or to rectify existing data, since a form specially built for updating data is required, apart from performance and software compiling issues.

Thus, the next step was to define the database structure for spatial data. The new version of the atlas introduces a spatial database paradigm, with spatial data organized as tables with associated geometry information and foreign keys, so that they can be related to any thematic data added to the database. The use of *PostGIS*, the spatial extension that gives *PostgreSQL* support for spatial data, was mandatory in the design of the database structure. Spatial data were built separately from attribute data and divided into types of area units. Since official data from the Atlas has to be from Paraná state only, there are only two spatial subsets: one for municipalities and associated data, and the other for census sectors, the official area units of the Brazilian census. One advantage of this kind of database organization is that associated data like regions, micro-regions and macro-regions can be used as area units (Figure 4), using dissolve operations by means of SQL queries<sup>2</sup> and using names as attribute fields when joining attributes with spatial area units. Because the same official Brazilian geocode is used for census sectors and for municipalities, joining the attribute information is an effective way of updating the database and maintaining consistency between spatial and attribute data.
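A dissolve of this kind can be sketched by building the SQL text for a grouped `ST_Union` query. The sketch below is in Python for illustration only; the table name `municipalities` and the column name `mesoregion` are hypothetical, and only `ST_Union` and the `the_geom` column come from the chapter's footnote.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def dissolve_query(table: str, group_field: str, geom_field: str = "the_geom") -> str:
    """Build a PostGIS query that merges area units sharing the same group value."""
    for name in (table, group_field, geom_field):
        # Identifiers cannot be bound as query parameters, so validate them.
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return (
        f"SELECT {group_field}, ST_Union({geom_field}) AS geom "
        f"FROM {table} GROUP BY {group_field};"
    )


# Hypothetical table and column names, used only to show the query shape.
print(dissolve_query("municipalities", "mesoregion"))
```

Grouping by a column that holds the region name yields one merged geometry per region, which is what allows regions to serve as area units without storing their geometries separately.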

<sup>2</sup> Select ST\_UNION(the\_geom) … group by 'mesoregion'

The system architecture's first criterion was the use of open source and free software where possible. It was also decided that the spatial database must be available remotely, together with the instance for the spatial data server and the web server itself, in the same physical server. Based on this, the architecture (Figure 5) was defined with an *Apache* and *Tomcat* webserver, along with instances of *Geoserver* as the mapserver, *PostgreSQL* as the DBMS, PHP to process server-side data, to produce XML (SLD) symbology and to query the database, and Javascript libraries (*jquery*, *Ext*, *GeoExt* and *Openlayers*) to deal with client-side functionality, like div display, map zoom and map legends. Last, *Curl* was used to establish the communication between PHP and the *Geoserver* REST API, which facilitates transactions between web servers, making it possible to perform map server administration tasks remotely.



**Figure 4.** Example of database table's relationship for spatial features and attributes

The PHP language is used to process server-side data and is necessary to the expert system code, since it handles database access together with the creation of rules that build symbols based on users' input. The PHP code is part of the web page code itself, since a ".php" file can also handle *html* and *javascript* code. PHP is a well-documented and easy-to-use language, which offsets its lack of consistency and predictability. PHP connects to the database using the pgsql extension, and a piece of code (Figure 5) can connect to the database, set the default encoding to prevent accents from being displayed incorrectly, and make a simple query to return the average for some column, with the resulting row stored in a PHP array. For security reasons, it seems important to keep the connection information (host, name of database, user and password) in a separate file, which must be included<sup>3</sup> in the main PHP file.
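The connect, query and fetch sequence can be illustrated with a small runnable sketch. It is not the authors' PHP code: it is written in Python and uses an in-memory SQLite database as a stand-in for PostgreSQL so that it runs without a server. The table, columns, values and the `DB_SETTINGS` name are all hypothetical.

```python
import sqlite3

# Hypothetical connection settings. In the chapter these are kept in a
# separate included file; here an in-memory SQLite database stands in for
# the PostgreSQL server.
DB_SETTINGS = {"database": ":memory:"}


def average_of(conn: sqlite3.Connection, table: str, column: str) -> float:
    """Return the average of one column, mirroring the simple query in the text."""
    # Identifiers are quoted; they come from code here, never from user input.
    row = conn.execute(f'SELECT AVG("{column}") FROM "{table}"').fetchone()
    return row[0]  # the row is a one-element tuple


conn = sqlite3.connect(DB_SETTINGS["database"])
conn.execute("CREATE TABLE indicators (geocode TEXT, rate REAL)")
# Made-up rows, present only so that the query has something to average.
conn.executemany(
    "INSERT INTO indicators VALUES (?, ?)",
    [("0000001", 10.0), ("0000002", 20.0)],
)
print(average_of(conn, "indicators", "rate"))
```

Keeping the settings in one dictionary mirrors the idea of isolating credentials from the main script, so that they can be loaded from a file that is not served to the browser.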

<sup>3</sup> Include names\_and\_passwords\_file.php; //line included in the beginning of PHP code

Another server-side task that must be taken into account involves publishing spatial data on the web. To accomplish this, the server must be compliant with OGC (Open Geospatial Consortium) specifications and capable of publishing data on the internet through web services. The one chosen was the *Geoserver* mapserver, a *Java*-based tool to publish spatial data, which was installed using a WAR file and configured to optimize performance in a production environment [19]. Since there is no need for vector analysis or satellite imagery, the choice was to publish data using WMS requests to the map server. This web service displays a set of image tiles that frame the spatial data and its symbology in the user's client. It also uses a simple set of *xml* rules to construct symbols, which can be used to obtain on-demand data classification and symbol choice.

**Figure 5.** Piece of PHP code showing the connection and simple query to the database

#### **3.2. On-demand map symbology**


Another server-side task that must be taken into account involves publishing spatial data on the web. To accomplish this, the server must be an OGC (Open Geospatial Consortium) The SLD (Styled Layer Descriptor) is an XML-based file format used to transmit symbols, according to OGC specification on WMS symbology. Traditionally, this file contains param‐ eters which are used together with features stored in a geospatial server, comprising sym‐ bolization information, such as color to be used in a point symbol or the thickness value for lines. These parameters are applied to one or more layers stored in the map server and dis‐ played in a static environment, i.e., once generated there is no possibility to change these pa‐ rameters except by generating a new SLD file for each different map. The SLD construction is often accomplished through direct insertion of the algorithm on the server, usually by means of writing it from a GIS software. The concept of on-demand map symbology is pre‐ sented here as an algorithm which already possesses variables to build the SLD file to be ap‐ plied to a map according to users' input, being the file stored in a server folder and read by WMS requisition.

#### *3.2.1. SLD file structure*

in the

The file structure of the SLD specification is based on rules, which shape the set of styles that symbolize a feature. In choropleth maps, for example, each rule makes it possible to generate a different color value for an area feature, resulting from a data classification algorithm. To apply color values to spatial data features, the system's database provides a color table (Figure 6) based on the suggestions of the "Colorbrewer" software (ColorBrewer [20]). Several possible color ramps are stored in the database, and their names are accessed by the PHP code and offered to the user as a list of options. The fields of the color table are also designed to make the process of data classification easier. For each classification with 'x' classes, there is a set of 'x' different *html* colors. The field "num\_color", which varies together with the "html\_color" value, since the colors change as the number of classes changes, ranges from '1', applied to features contained in the first class, to 'x'.
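The rule-based structure described above can be illustrated with a minimal sketch. It is written in Python for compactness rather than in the system's PHP, and it is not the authors' code: the layer name, attribute name, class limits and color values are hypothetical, and the choice of `PropertyIsBetween` filters with one `PolygonSymbolizer` per rule is an assumption about how a choropleth SLD could be laid out.

```python
from xml.etree import ElementTree as ET
from xml.sax.saxutils import escape

SLD_NS = "http://www.opengis.net/sld"
OGC_NS = "http://www.opengis.net/ogc"


def build_rule(attribute, lower, upper, html_color):
    """Return one SLD <Rule> that fills polygons whose value lies in [lower, upper]."""
    return (
        "<Rule>"
        f"<Title>{lower:g} - {upper:g}</Title>"
        "<ogc:Filter><ogc:PropertyIsBetween>"
        f"<ogc:PropertyName>{escape(attribute)}</ogc:PropertyName>"
        f"<ogc:LowerBoundary><ogc:Literal>{lower}</ogc:Literal></ogc:LowerBoundary>"
        f"<ogc:UpperBoundary><ogc:Literal>{upper}</ogc:Literal></ogc:UpperBoundary>"
        "</ogc:PropertyIsBetween></ogc:Filter>"
        "<PolygonSymbolizer><Fill>"
        f'<CssParameter name="fill">{escape(html_color)}</CssParameter>'
        "</Fill></PolygonSymbolizer>"
        "</Rule>"
    )


def build_sld(layer, attribute, breaks, colors):
    """Concatenate one rule per class; breaks must have len(colors) + 1 entries."""
    if len(breaks) != len(colors) + 1:
        raise ValueError("need exactly one more break than colors")
    rules = "".join(
        build_rule(attribute, breaks[i], breaks[i + 1], colors[i])
        for i in range(len(colors))
    )
    return (
        f'<StyledLayerDescriptor version="1.0.0" xmlns="{SLD_NS}" xmlns:ogc="{OGC_NS}">'
        f"<NamedLayer><Name>{escape(layer)}</Name>"
        "<UserStyle><IsDefault>1</IsDefault>"
        f"<FeatureTypeStyle>{rules}</FeatureTypeStyle>"
        "</UserStyle></NamedLayer></StyledLayerDescriptor>"
    )


# Hypothetical three-class ramp (num_color 1..3) and hypothetical class limits.
ramp = ["#deebf7", "#9ecae1", "#3182bd"]
sld = build_sld("municipalities", "population", [0, 10, 20, 30], ramp)
root = ET.fromstring(sld)  # raises ParseError if the XML is malformed
rule_count = len(root.findall(f".//{{{SLD_NS}}}Rule"))
```

Each class contributes exactly one rule, so the number of rules always equals the number of colors taken from the ramp. Note that `PropertyIsBetween` includes both limits, so a value lying exactly on a shared limit matches two adjacent rules; a production style would need to decide which class owns it.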


**Figure 7.** Example for PHP code on Equal Interval Data Classification method


To build the map and construct the SLD file, users have to choose the number of classes, the level of measurement and, if the level of measurement is numeric, the data classification method to be used. These variables are merged into the *xml* declarations throughout the PHP file with a string concatenation operator, and are used in the process of data classification (Figure 7). This process defines the final SLD file, which is written on the server and made the default for the WMS layer by setting the parameter *<IsDefault>1</IsDefault>*. Thus, the map server just serves the WMS layer. The symbology is prepared on the server by PHP code and is accessed directly by the *openlayers* map library to build the map on the client side. The SLD file is overwritten automatically whenever the server executes a new classification method. The iterative process is based on users' choices stored in PHP variables, which access database views. The database can be fed to create as many classes or symbol properties as users' data require.
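As an illustration of the classification step, the following is a minimal Python sketch of the equal interval method named in Figure 7. It is not a transcription of the PHP code in that figure, which is not reproduced in the text; the function names and sample values are hypothetical, and assigning a value on a class limit to the upper class is an assumption.

```python
def equal_interval_breaks(values, num_classes):
    """Return num_classes + 1 class limits of equal width from min to max."""
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    if not values:
        raise ValueError("values must not be empty")
    low, high = min(values), max(values)
    width = (high - low) / num_classes
    breaks = [low + i * width for i in range(num_classes)]
    breaks.append(high)  # use the exact maximum to avoid rounding drift
    return breaks


def class_index(value, breaks):
    """Return the 1-based class of value, matching num_color 1..x."""
    last = len(breaks) - 1
    for i in range(1, last):
        if value < breaks[i]:
            return i
    return last  # the top class also takes the maximum value


values = [2.0, 4.0, 7.0, 10.0, 12.0]  # hypothetical theme values
breaks = equal_interval_breaks(values, 5)
classes = [class_index(v, breaks) for v in values]
```

The 1-based class index is chosen so that it can be used directly to look up the color whose "num\_color" equals the class number.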


**Figure 6.** Part of color table in the database

#### **3.3. Functionality and interface design**

User interaction with a computer system always occurs through an interface. Map-related systems raise several interface issues, especially for web use (Nivala [21]), since their interfaces must deal with issues related both to computational interfaces and to map users and map use. In addition, new technologies allow data organization to be done dynamically, in a way that reduces the number of decisions and interactions required from users when generating a cartographic representation. This can make the interface easier to use and also raise performance for functional approaches (de Mendonça & Delazari [22]).


The interface for specialists and atlas users was planned to be clean and to offer only essential functionality. However, since there was a desire to implement an expert system with a knowledge base fed by specialists, the decision was to make slight changes to the functionality in order to make use of specialist knowledge. Thus, specialist users can identify themselves with a login role in order to contribute to the knowledge database on data classification and symbolization decisions. Users can then take advantage of specialists' choices on similar data to make their own decisions. The interface functionality for these two distinct groups of users can be divided into map-interface functions and the web interface itself, accessed by the login roles "specialist" and "user".


Inspired by the 'openstreetmap' project interface, the main interface for "Atlas Social do Paraná" (Figure 8) incorporates on its server a *javascript* map library, *openlayers*, integrated with a window and *css* management *javascript* library, *jquery*. Together they make it possible to call predetermined functions to display maps and their symbology and to hide or show functional windows on the interface, in which users can input data or make choices about the map production process through forms. Making the forms (only one allowed per tab) and the table associated with the chosen theme available on the same page as the map was therefore considered a major decision in interface design. Based on this, a small and simple flow that expresses the common use of the website was devised, to be followed by both casual users (Figure 9a) and specialists (Figure 9b).

**Figure 8.** General Initial Atlas interface

#### **3.4. Storing user's own data**

One of the main functions of the new Atlas is the ability to read and store users' data in the database. Every Atlas page offers the option of uploading the user's spatial data through an *html div* powered by *jquery* (Figure 10). To make this possible, the web server must be configured to accept uploaded data. In the current Atlas architecture, it is also important to ensure that the PHP installation is able to save temporary uploaded files in a suitable server folder and to allow *Curl* to call the *Geoserver* REST API in order to write the new layer to the map server. All upload and store steps are called by PHP files, which are included in the main upload form action page. The algorithm that handles user data upload (Figure 11) keeps the user's data available in the database for 4 hours, after which a trigger is activated to clean all newly inserted data. After displaying the user's data, the system uses it as the default table for every subsequent query.
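The 4-hour retention rule can be sketched as follows. This is only an illustration in Python of the decision the cleanup has to make; in the system described above the cleanup is performed by a database trigger, and the registry structure, table names and timestamps used here are hypothetical.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(hours=4)  # retention period stated in the text


def split_expired(uploads, now, retention=RETENTION):
    """Split {table_name: uploaded_at} into (kept, expired) lists of table names."""
    kept, expired = [], []
    for table, uploaded_at in sorted(uploads.items()):
        if now - uploaded_at >= retention:
            expired.append(table)
        else:
            kept.append(table)
    return kept, expired


# Hypothetical upload registry; table names are illustrative only.
now = datetime(2012, 5, 1, 12, 0)
uploads = {
    "user_layer_a": datetime(2012, 5, 1, 7, 30),   # 4 h 30 min old
    "user_layer_b": datetime(2012, 5, 1, 10, 15),  # 1 h 45 min old
}
kept, expired = split_expired(uploads, now)
```

Tables listed in `expired` would be the ones dropped from the database, along with their corresponding layers in the map server.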

**Figure 9.** (a - left): Expected interface use flow; (b - right): Expected flow for specialists

#### **3.5. Expert system functionality**


The expert system comprises a set of PHP pages which access the database in order to compare specialists' decisions with measurements of the data itself. A table in the database stores every decision on level of measurement, data classification method, number of classes and color ramp made by a specialist logged into the website. By comparing what specialists choose with the data characteristics, the system can learn recurring patterns. To learn, in this case, is to store these patterns in the consolidated table of specialists' decisions and to make this knowledge base available to ordinary users. This will often occur when common users upload data similar to the data previously used by specialists.


**Figure 10.** Detail on the form used to upload *shapefiles*

In order to make this knowledge functional, some metrics are defined for any kind of data, whether uploaded by a user or part of the Atlas original data. These metrics can be divided among the four possible decisions made by a specialist, plus the final evaluation of the produced map. For a decision to be stored in the consolidated decisions table, at least three different specialists must make the same decision, based on exactly matching relevant data characteristics, and give the built map the most positive feedback allowed by the system. In the first experimental build, the expert system can learn behavior related to two decisions, as follows.

**Figure 11.** Schema for the algorithm that uploads and stores user's spatial data

#### *3.5.1. Level of measurement*

For this system, the level of measurement of data is simplified into numeric, ordinal and nominal types. To classify data into one of these three types, the expert system first needs to identify the name of the column that stores what has been mapped and the distinct values found in this column. Second, it is important to identify what type of data is being classified, in terms of the database's data types. The system makes the following assumptions:

**•** Numerical data are always composed of numeric, integer, float or similar formats;

**•** Nominal and ordinal data are always text, character varying or other similar formats;

**•** Numerical data can be stored as text or in a similar format, since when the data is exploded, at least 2 characters in the sequence match. When this case occurs, the system should be asked to convert text into numbers in the user's own database;

**•** Ordinal data must be compared to the database field corresponding to the knowledge base for the ordinal text group;

**•** All other cases must be considered as nominal data.
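The assumptions about level of measurement can be sketched in Python as a small decision function. This is a simplified illustration, not the system's PHP implementation: the list of ordinal terms stands in for the knowledge-base field mentioned above and is hypothetical, and the test for numbers stored as text (every distinct value parses as a number, with a decimal comma accepted) is an assumption that replaces the character-matching test described in the text.

```python
# Hypothetical stand-in for the knowledge-base field holding ordinal terms.
ORDINAL_TERMS = {"low", "medium", "high", "very low", "very high"}

NUMERIC_DB_TYPES = {"numeric", "integer", "float", "double precision", "real"}
TEXT_DB_TYPES = {"text", "character varying", "character"}


def looks_numeric(text):
    """True if the text parses as a number (simplified test, see lead-in)."""
    try:
        float(text.replace(",", "."))
        return True
    except ValueError:
        return False


def level_of_measurement(db_type, distinct_values):
    """Return 'numeric', 'ordinal' or 'nominal' for a mapped column."""
    db_type = db_type.lower()
    if db_type in NUMERIC_DB_TYPES:
        return "numeric"
    values = [str(v).strip() for v in distinct_values]
    if db_type in TEXT_DB_TYPES and values:
        if all(looks_numeric(v) for v in values):
            return "numeric"  # numbers stored as text; conversion should be requested
        if all(v.lower() in ORDINAL_TERMS for v in values):
            return "ordinal"
    return "nominal"  # all other cases
```

For example, `level_of_measurement("text", ["Low", "High"])` falls through the numeric checks and is matched against the ordinal terms, while a column of place names ends up in the nominal default.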


#### *3.5.2. Data classification method*

For numerical data only, the expert system computes statistics for the theme data: standard deviation, variance, mean and deviation from the mean (Figure 12). These are all taken in order to measure kurtosis (Dent [23]), which is a metric of the flatness of a distribution. The following assumptions are considered:


**4. Conclusion and further work**

The development of an expert system should consider the particularities of its subject. In the case of maps, use and users are mandatory issues to be taken into account when designing data storage and analysis, and also the way to interact with them. The presented ES can manage not only the LOAS data, users and framework, but is now designed to cover an unpredictable range of uses, since users can upload and analyze their own data. The system's interface was also carefully discussed in order to present to users the most practical and simple way to interact with complex map design decisions. Usability pre-tests have been carried out, and current feedback is positive.

Currently, users have the option to upload their own data, both geometry and attributes. There is work in progress for the system to recognize whether the mapping method is suitable for the data characteristics. One aspect that can be discussed, regarding the analysis of numeric data types, is the choice of mapping technique for relative or absolute data. Here, absolute data are considered to be those not related to any other data, e.g. a count of people; relative data are related to other data and can be related to area units, e.g. population density. Despite the importance of this classification, there is no formal way to discover whether data are absolute or relative. A possible solution for this problem could be to ask the user a set of questions in order to verify this information. After this questionnaire, the system would suggest the most suitable method for data classification and, consequently, the choice of mapping technique.

After testing this first version of the online Atlas, it is intended to develop additional functionality in order to improve the expert system concept that has been started with this research, as well as the interface use experience. The main objective is to make this system a reference, not only for LOAS technicians but also for ordinary internet users who need to get their data symbolized according to map design expertise.



**Figure 12.** Example of generated theme table and associated statistics
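The statistics listed in section 3.5.2 can be computed with a few lines of Python, sketched below for illustration. The text does not give the exact kurtosis formula used by the system, so the moment-based definition here (fourth central moment divided by the squared population variance, which equals 3 for a normal distribution) is an assumption, as are the function name and the sample values.

```python
import math


def theme_statistics(values):
    """Return mean, variance, standard deviation and kurtosis of the values."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    deviations = [v - mean for v in values]  # deviation from the mean
    variance = sum(d ** 2 for d in deviations) / n  # population variance
    if variance == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    fourth_moment = sum(d ** 4 for d in deviations) / n
    return {
        "mean": mean,
        "variance": variance,
        "std_dev": math.sqrt(variance),
        "kurtosis": fourth_moment / variance ** 2,  # 3.0 for a normal distribution
    }


stats = theme_statistics([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical theme values
```

Under this definition, values below 3 indicate a distribution flatter than the normal curve and values above 3 a more peaked one.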

#### *3.5.3. Feedback on map quality*

This metric consists simply of asking the specialist: "How would you classify the quality of the representation generated?" Answers range from 0 to 10, and only maps with the highest grades have their decisions considered for the knowledge base. Using this metric ensures that the consolidated table contains only the best representations, according to specialists.
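The consolidation rule from section 3.5, combined with this feedback grade, can be sketched in Python as follows. This is an illustrative reading of the rule, not the system's code: the record keys, the string used to summarize the data characteristics, the specialist names and the method labels are all hypothetical, and treating a grade of 10 as "the most positive feedback allowed by the system" is an assumption.

```python
from collections import defaultdict

MIN_SPECIALISTS = 3  # "at least three different specialists"
TOP_GRADE = 10       # assumed to be the most positive feedback on the 0-10 scale


def consolidate(decisions):
    """Return the sorted (data_signature, decision) pairs that reach consensus.

    Each record is a dict with the hypothetical keys 'specialist',
    'data_signature', 'decision' and 'grade'.
    """
    supporters = defaultdict(set)
    for d in decisions:
        if d["grade"] == TOP_GRADE:
            key = (d["data_signature"], d["decision"])
            supporters[key].add(d["specialist"])  # a set counts each specialist once
    return sorted(k for k, who in supporters.items() if len(who) >= MIN_SPECIALISTS)


# Hypothetical decision log.
log = [
    {"specialist": "ana", "data_signature": "numeric/flat", "decision": "equal interval", "grade": 10},
    {"specialist": "bruno", "data_signature": "numeric/flat", "decision": "equal interval", "grade": 10},
    {"specialist": "carla", "data_signature": "numeric/flat", "decision": "equal interval", "grade": 10},
    {"specialist": "ana", "data_signature": "numeric/flat", "decision": "quantile", "grade": 10},
    {"specialist": "ana", "data_signature": "numeric/flat", "decision": "quantile", "grade": 10},
    {"specialist": "davi", "data_signature": "numeric/flat", "decision": "quantile", "grade": 7},
]
consolidated = consolidate(log)
```

Only the first decision is consolidated: the second one is supported by a single specialist with the top grade, and repeated entries by the same specialist do not count twice.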

Based on the results, there are important remarks about the proposed interface design and code implementation. First, user testing is part of the development process, and only such testing can ensure the acceptability and usability of the interface. Nevertheless, the proposed framework provides clear step-by-step guidance that allows users to produce thematic maps. First informal pre-tests show that the interface has no usability gaps and that the expert system suggestions seem to be a desirable aid, especially for those unfamiliar with cartography and map production environments. Second, there are known limitations in this first web version. In production environments, this software must allow DBMS configuration for multiple database access. The SLD specification also needs to be improved to support more complex mapping techniques. Last, more research is needed on which information about the data could be analyzed in order to define more adequate criteria for the system's decisions.
