**Least Squares Method and Empirical Modeling: A Case Study in a Mexican Manufacturing Firm**

Raúl Hernández-Molinar, Roberto Sarmiento-Rebeles and César F. Méndez-Barrios

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/63151

#### **Abstract**

Empirical modeling (EM) has been a useful approach for analyzing problems across many fields of knowledge. This type of modeling is particularly helpful when, for a number of reasons, parametric models cannot be constructed. Based on different methodologies and approaches (e.g., the Least Squares Method, LSM), EM allows the analyst to obtain an initial understanding of the relationships that exist among the variables that belong to a particular system or process.

In some cases, the results from empirical models can be used to make decisions about those variables, with the intent of resolving a given problem. This investigation describes the application of EM to the estimation of shipping costs in a Mexican manufacturing firm. The results show that, overall, transportation costs estimated with an empirical model tend to be lower than costs calculated by a previous model. This demonstrates the practical utility that EM-based results can have in a real-life setting.

**Keywords:** empirical modeling, exploratory data analysis, least squares, linearization, transportation logistics

#### **1. Introduction**

It is well known that researchers can use empirical modeling (EM) to gain a better understanding of a particular problem. This type of modeling can be improved by the expert input of analysts. When investigating a particular system or process, it is always preferable to perform both exploratory/initial and confirmatory analyses of the available data and information. Nevertheless, in some cases, it is not possible to do the latter. This means that, oftentimes, professionals in positions of authority have to make decisions about important variables and problems based solely on the results from initial/exploratory models.

© 2016 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This chapter describes the application of EM to investigate the variables associated with shipping costs in a Mexican manufacturing firm. The objective was to obtain a model that would offer a better idea of the variables and dynamics that determine those costs. To this end, the Mexican company formed a research team tasked with a complete and detailed analysis of the problem.

Using a Least Squares Method (LSM) approach, the team proposed a new model capable of estimating the transportation costs of containers shipped in vessels from Europe to a port in Mexico. Using the proposed model, the firm's management was able to compare the actual costs incurred under a previous model (formulated by the provider of the shipping service) with the estimated costs based on the new model.

The results show that, in general, cost estimates from the new model tend to be lower than those of the previous model. These results allowed the Mexican firm to start new negotiations about their shipping costs with the provider of the transportation service.

#### **2. Empirical modeling: an overview**

The main objective of this section is to review the concept of EM and some related concepts employed when an investigator begins to explore the available information. Another important objective is to suggest the use of a linear model as a resource to clarify and propose a fitted empirical model, based on the observation of the data, once a suitable transformation of the variables has been carried out.

Reference [1] comments that empirical models are guided exclusively by data: analysts attempt to find a model that reflects trends in the data in order to make predictions, rather than to explain behavior. In particular, [1] underlines the potential utility of statistical approaches/tools (e.g., regression analysis) when doing EM. As is known, an empirical model can aid researchers in acquiring an initial idea of the relationship between two or more variables that are representative of a particular system or process. In spite of its inherent limitations, the results obtained using empirical models can sometimes help researchers when decisions need to be made with respect to the variables that intervene in the system/process under study.

Empirical knowledge can be understood as those instances when new information/knowledge is acquired by practical/experiential means. While this type of knowledge is undoubtedly valid and useful, it should be noted that in some cases, the conjectures/conclusions we make about observed data and results are based on the analyst's own experience and interpretation. This means that sometimes, impartiality and scientific rigor in the analysis of data and results might be difficult to achieve. Consequently, inconsistencies between the real-life problem and the model proposed by the analyst can be found. It is important to consider, as reference [2] suggests, that when modeling is applied to any logistics system, flexibility must be considered.

44 Empirical Modeling and Its Applications

This being said, influential thinkers and intellectuals have vigorously debated the topic of whether full certainty can be achieved with respect to the validity and representativeness of a model. For example, reference [3] argues that empirical knowledge plays an integral role in the development of so-called "scientific knowledge." This is because scientists have the opportunity to explore and confirm particular ideas/conjectures on the basis of their own empirical findings.

Under a scientific and formal context, Exploratory Data Analysis (EDA) based on empirical information requires probability and statistical concepts. However, reference [4] mentions that there comes a point at which exploratory data analysis must be distinguished from confirmatory data analysis, the latter comprising confirmatory nonparametric statistical data analysis, or modeling, and confirmatory parametric statistical data analysis.

It is clear that when there is no information to propose a parametric model, an exploratory analysis using empirical knowledge to obtain an initial model and solution can be justified. After this step, the analyst can judge, based on his/her expertise, whether the initial model is an adequate representation of the relationships that exist among the different variables that are part of the problem under study. Consistent with this, reference [5] also says that when it is not possible to justify the behavior of the data, an empirical model can be utilized to obtain an initial idea of the nature of the problem of interest.

Generally speaking, EM uses nonparametric data analysis to explore trends or behaviors within the available data. It is assumed that models based on well-defined parameters and distribution functions cannot be formulated due to incomplete data/information. This type of modeling also assumes that variables belong to sample spaces where uncertainty is present.

EM can be used to represent real-life problems that require nonanalytical methods. Examples of areas/fields where EM has proven useful include industry, science, technology, engineering, medicine, biology, and management. It should also be said that more powerful computers are of immense aid when researchers use EM, especially in those situations where high uncertainty exists.

Given the uncertainty and incompleteness associated with empirical models (along with the sometimes necessary expert input of the analysts in the definition of a model), it is evident that results and information derived from these models cannot be generalized. Adding to what has been discussed already, reference [6] notes that "Exploratory data analysis seemed new to most readers or auditors, but to me it was really a somewhat more organized form—with better or unfamiliar graphical devices—of what subject-matter analysts were accustomed to do".

We now sum up some of the salient characteristics and benefits of EM: it is mainly based on observed empirical data. However, it can also include the expert judgment/opinion of analysts. The data involved in the empirical model belongs exclusively to the realm of the system or the process that is being investigated. This means that there is no input from variables, parameters, or principles that fall outside the scope of the problem under study. Empirical models are capable of generating feasible solutions that can be helpful when investigating a particular problem. This in turn can guide analysts when decisions have to be made with respect to the variables associated with the problem of interest.

In addition, two appendices review issues related to the modeling process and outline the general numerical method that uses least squares as the criterion for selecting an empirical model.

### **3. Case Study: estimation of the total cost of transportation to create a future budget**

#### **3.1. Background**

This section discusses the case of a firm that has operations in Mexico (hereafter referred to as MF, "Mexican firm"). We now briefly describe the problem at hand: every month a sea shipment is dispatched from Europe to a port in Mexico by an affiliate of MF. Each shipment contains items that are needed for the daily operations of MF. Originally, the cost of each shipment was based on a model calculated by the company that provides the transportation service to MF. We will refer to this model as OM ("old model"). The shipping costs can vary according to the quantity of items that are being transported in the different containers included in the vessel. The combinations of items (and their respective quantities) that are transported in any shipment/container are determined according to MF's forecasted needs.

As part of their cost-saving initiatives, MF decided to investigate whether their transportation costs could be reduced. In particular, they decided to come up with their own cost-projection model to compare its estimates with those provided by OM. In this way, a more realistic estimation of their shipping costs could be obtained. To accomplish their objective, they decided to utilize historical empirical data to calculate a new model ("NM") that would provide a more accurate idea of the monthly costs associated with each shipment. Evidently, more accurate cost estimates can result in better budgeting decisions and their associated benefits.

To accomplish their objective, MF's top management made the decision to conduct a detailed analysis of the situation. A research team tasked with proposing a model that would be an adequate representation of the problem was formed. One of the first and most important activities of the team was the conceptualization and understanding of the different variables upon which the monthly transportation budget depends. It was observed that the cost of a given sea shipment is a function of at least one hundred variables. These variables include the value of goods, number of pallets, sea freight charges, unitary cost, and volume of the shipped items, among others.

A key step in the research process was making sure that the data pertaining to the above variables were reliable and representative of the problem to be modeled. Reference [7] warns about the importance of distinguishing between the forecast and the planning of the variables under study. For example, MF had information about a number of variables that were not relevant to the problem (e.g., information about items that were being shipped from the USA). This meant that the database had to be cleaned in great detail. Once the database was deemed reliable, the research team began to analyze the potential relationships among the set of variables of interest. Evidently, the dependent variable in the modeling process (transportation/shipping cost, TC) has to be a function of a group of independent variables such as the ones described in the previous paragraph. It needs to be specified that the main unit of analysis is the container in which the different items are transported by sea. A maritime cargo shipment usually carries several containers.


The research team examined a number of different types of models (e.g., linear, quadratic, and exponential) that could best fit the relationship between TC and its determinants [8]. After different tests and analyses, it was found that a linear model represented this relationship best. In particular, a linear model using the LSM was proposed. As is known, this method offers a best-fit model that minimizes the sum of the squared differences (errors) between the real observations and the values predicted by the model. The well-known general model is defined as follows:

$$
\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \dots + \hat{\beta}_k X_{ki} \tag{1}
$$
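As an illustration of how a model of the form of Eq. (1) can be fitted, the following minimal sketch uses NumPy's `numpy.linalg.lstsq`. The data, the two explanatory variables, and the coefficient values are invented for the example and are not taken from MF's records; the chapter does not state which software the research team used.

```python
import numpy as np

# Hypothetical data: 6 containers (rows) and 2 explanatory variables (columns),
# e.g. number of pallets and volume. These numbers are invented for illustration.
X = np.array([
    [10.0, 2.0],
    [12.0, 3.0],
    [8.0, 1.5],
    [15.0, 4.0],
    [11.0, 2.5],
    [9.0, 2.0],
])

# Assumed "true" coefficients [beta_0, beta_1, beta_2] used to generate the responses.
true_beta = np.array([500.0, 40.0, 120.0])

# Prepend a column of ones so that beta_0 acts as the intercept.
A = np.column_stack([np.ones(len(X)), X])

# Noise-free responses generated from the assumed coefficients.
y = A @ true_beta

# Least squares estimate: the beta that minimizes ||y - A @ beta||^2.
beta_hat, _, rank, _ = np.linalg.lstsq(A, y, rcond=None)

# Fitted values, i.e. the left-hand side of Eq. (1).
y_hat = A @ beta_hat
print(beta_hat)
```

Because the responses here are generated without noise, the estimated coefficients coincide with the assumed ones up to floating-point error; with real shipment data the fit would only approximate the observations.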

With respect to the defining function for this problem, the research team made the decision that the final set of independent variables should be the result of all those items that appeared at least once in the historical records. In other words, if an item was recorded as being shipped and received at least once, the research team decided to include it in the general model for TC. The proposed Least Squares Model has TC as the dependent variable that is a function of potentially more than one hundred independent variables.

As will be made clear later, the quantity of independent variables to include in the model to calculate TC for a given shipment and containers will depend on previous records of shipped items. Put differently, records could suggest that TC be defined by, for example, 80 items in one month, while 70 items could be used to estimate TC in the next month.
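The selection rule described above (keep every item that appears at least once in the historical records) can be sketched as follows. The function name `active_items` and the small quantity table are hypothetical and serve only to illustrate the idea.

```python
import numpy as np

def active_items(quantities):
    """Return the column indices of items shipped at least once.

    `quantities` has one row per past shipment and one column per item.
    """
    quantities = np.asarray(quantities, dtype=float)
    return np.flatnonzero((quantities > 0).any(axis=0))

# Hypothetical records: rows are past shipments, columns are items.
history = np.array([
    [3, 0, 0, 5],
    [0, 0, 2, 1],
    [4, 0, 0, 0],
])

cols = active_items(history)   # the second item was never shipped, so it is excluded
X = history[:, cols]           # independent variables retained for the model
print(cols, X.shape)
```

Applying the same function to the records of a different period may return a different set of columns, which is why the number of independent variables can change from one month to the next.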

#### **3.2. A comparison between OM and NM estimates of shipping costs**

We now illustrate the differences between the costs estimated using the model originally proposed by the transportation company (OM) and the model resulting from the analysis by MF's research team (NM). The results in **Table 1** are based on data provided by MF. More specifically, the costs under the OM column reflect historical records (i.e., they are costs pertaining to completed shipments). The calculations in the NM column reflect the estimated costs had this model been used for a particular completed shipment.

#### *3.2.1. Using the LSM to estimate the total cost based on shipping part costs*

It is clear that a linearization process is useful when several first-order variables participate in a model [1]. In the present study, at least one hundred variables can interact to define the total cost of transporting a shipment.

In this case, several variables (more than one hundred) were considered to estimate the cost per shipment, for instance, value of goods, number of pallets, sea freight charges, volume, and unitary cost. After a careful selection process based on historical information and the expertise of the personnel, a matrix containing the shipment identifier and the cost of each of the parts is created. Using the historical information, a vector with the *β<sub>i</sub>* coefficients is estimated using the LSM, and these coefficients are used to estimate the cost assigned to each shipment.
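The construction of the shipment-by-part matrix mentioned above could look like the following sketch. The record layout (shipment identifier, part identifier, part cost) and all values are assumptions made for illustration; the chapter does not describe the actual structure of MF's database.

```python
import numpy as np

# Hypothetical long-format records: (shipment id, part id, part cost).
records = [
    ("S1", "P1", 100.0), ("S1", "P2", 50.0),
    ("S2", "P1", 80.0), ("S2", "P3", 30.0),
    ("S3", "P2", 60.0), ("S3", "P3", 45.0),
]

# Stable ordering of shipments (rows) and parts (columns).
shipments = sorted({s for s, _, _ in records})
parts = sorted({p for _, p, _ in records})
row = {s: i for i, s in enumerate(shipments)}
col = {p: j for j, p in enumerate(parts)}

# One row per shipment and one column per part;
# parts absent from a shipment keep a cost of zero.
X = np.zeros((len(shipments), len(parts)))
for s, p, cost in records:
    X[row[s], col[p]] += cost

print(shipments, parts)
print(X)
```

A matrix built this way, together with the vector of observed total costs per shipment, is the input that a least squares routine needs in order to estimate the coefficient vector.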

The LSM determines the best fit by minimizing the sum of the squared differences between the observed responses and those predicted by the model. A detailed explanation of the method can be found in references [9–12].

We know it is possible to predict the *Y* values by using the estimated model parameter values. We also know that the values can be generated from the following model:

$$
\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \dots + \hat{\beta}_k X_{ki} \tag{2}
$$

The sum of the squared deviations between the observed values of *Y* and the corresponding values predicted by the estimated regression model is:

$$
\sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2 = \sum_{i=1}^{n} \left( Y_i - \left( \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \dots + \hat{\beta}_k X_{ki} \right) \right)^2 \tag{3}
$$

We need to recall that the least squares solution consists of finding the values of the estimators

$$
\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k, \tag{4}
$$

which are called least squares estimators. The minimum sum of squares is called the residual sum of squares, also known as the sum of squares of the error. Based on the estimated values, the estimated budget is defined for each shipment.
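For reference, the minimization described above has a well-known closed form in matrix notation. The symbols used here are introduced only for this illustration and do not appear in the chapter: **X** denotes the *n* × (*k* + 1) matrix whose first column is all ones and whose remaining columns hold the observed values of the independent variables, **Y** is the vector of observed responses, and the formula assumes that **X**ᵀ**X** is invertible.

$$
S(\boldsymbol{\beta}) = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})^{\top} (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})
$$

Setting the gradient of *S* with respect to **β** to zero gives the normal equations and, under the invertibility assumption, the estimator:

$$
\mathbf{X}^{\top}\mathbf{X}\,\hat{\boldsymbol{\beta}} = \mathbf{X}^{\top}\mathbf{Y}
\quad\Longrightarrow\quad
\hat{\boldsymbol{\beta}} = \left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\mathbf{Y}
$$

When the independent variables are strongly correlated, or when there are more variables than observations, **X**ᵀ**X** may be singular or ill-conditioned, and a numerical least squares routine is usually preferable to forming the inverse explicitly.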

#### **3.3. Constructing the estimated budget using an empirical model**

Based on the linear model generated, an empirical model to forecast a budget that accounts for the total cost is proposed. The coefficient estimates used to determine the shipment cost per part in the corresponding container are generated using the Least Squares estimation method. **Table 1** shows an example of the estimation for 11 containers.

**Table 2** shows the estimated values generated with the LSM for each shipment freight. It is evident that the cost associated with the land freight is constant. The estimated cost values were determined using a multiple linear model that considers several factors chosen by experienced personnel in the company. The empirical model that suggests the budget for the future is shown in **Figure 3**.



**Table 1.** A comparison between historical records of shipping costs (OM column) and estimated costs using NM for 11 containers.

**Figure 1** also illustrates the calculations made in **Table 1**.


**Figure 1.** Real and estimated budget.

From the 11 comparisons between OM and NM estimates, it can be observed that the net difference is negative in four instances. However, the cumulative net difference shows that overall, NM offers a lower estimate of the shipping costs (savings of \$9,916.96 in the total budget). This suggests that from MF's perspective, their proposed model (NM) could be used to obtain lower estimates of their transportation costs. This overall difference is made clear once the LSM estimates of both OM and NM are calculated.
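The kind of comparison summarized above can be sketched in a few lines. The cost figures below are invented and are not the values in **Table 1**, and the sign convention (net difference = OM cost minus NM cost, so that a negative value means NM is higher) is an assumption consistent with the description in the text.

```python
# Hypothetical costs (USD) for four containers; NOT the values from Table 1.
om_costs = [12000.0, 9500.0, 11000.0, 10250.0]   # historical costs under OM
nm_costs = [11400.0, 9800.0, 10100.0, 10000.0]   # estimated costs under NM

# Net difference per container: positive means NM is lower than OM (assumed convention).
net_diff = [om - nm for om, nm in zip(om_costs, nm_costs)]

# Number of containers for which NM is higher than OM (negative net difference).
negative_cases = sum(1 for d in net_diff if d < 0)

# Cumulative net difference over all containers.
total_saving = sum(net_diff)

print(net_diff, negative_cases, total_saving)
```

A positive cumulative value despite some negative individual differences mirrors the pattern reported for the 11 containers: NM can be higher in particular cases and still lower overall.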

**Figure 2** illustrates the difference between these estimates. These two linear models have been estimated based on the OM and NM values. It is clear that NM estimates are, in general, lower than OM's. As was said before, this suggests that from MF's perspective, the use of NM's calculations would benefit them in the long run.

**Figure 2.** Comparison between real and estimated budget.

To check the validity of the proposed model (NM), we can observe that, in most cases, the goodness of the model is associated with residual values that are well balanced above and below a reference axis. This indicates that there is no systematic overestimation or underestimation of the predicted values.
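One simple way to inspect whether residuals are balanced around zero is sketched below. The function `residual_balance` and the sample values are hypothetical; the chapter does not specify how the research team performed this check, so this is only one possible reading of it.

```python
import numpy as np

def residual_balance(observed, predicted):
    """Summarize how residuals are distributed around the zero axis."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = observed - predicted
    return {
        "n_above": int((residuals > 0).sum()),   # model underestimates these cases
        "n_below": int((residuals < 0).sum()),   # model overestimates these cases
        "mean_residual": float(residuals.mean()),
    }

# Invented observed and predicted values for four shipments.
observed = [10.0, 12.0, 9.0, 11.0]
predicted = [9.5, 12.5, 9.5, 10.5]

summary = residual_balance(observed, predicted)
print(summary)
```

Similar counts above and below the axis, together with a mean residual close to zero, are consistent with the absence of systematic bias, although they do not by themselves prove that the model is adequate.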

#### **4. Conclusion and further research**

This case shows that EM can help in the forecasting process. Modeling is a very common tool, given the complexity and accuracy required in transportation problems, as mentioned in references [2,14–19]. The described case also shows that the selection of the model is very important in any planning activity.

Although some specialized programs are able to generate the proposed models automatically, it has been made clear that when information is unavailable or practically unknown, EM is an option that can help in the generation of structure, method, and formal knowledge. It is important to recall that the main objective of this approach is to find the best model to represent the relationship between the variables under study, and EM is useful for doing so.

The proposed empirical model is driving decisions in the corporation, and it has been implemented with success. There is still interest in improving the criteria used to upgrade the multiple linear models that estimate the containers' cost, but until now this proposal has given good results. Although this is a novel and simple approach, it is worth mentioning that the combination of available data with the experience of personnel has been helpful for decision-makers.

The LSM is used as an algorithm to generate estimates for a new model that MF has considered sufficient and pertinent to produce significant savings. The case study has helped to identify the relevant data for studying and estimating the relationships involved in assigning the shipping cost, based also on the experience and knowledge of the company's experts. The method helped in the construction of an empirical model supported by a linearization process and has brought about significant changes in the planning process of each monthly budget.

The model proposed in this research to forecast the shipment freight budget has provided successful results, which should lead to better profits and sustainable growth. Nevertheless, the research team continues to use other exploratory data techniques to improve it. It is expected that, in the near future, other options offering better forecasts of the shipment freight budget can be released. Further studies can also be conducted using parametric models generated with statistical tools, or through a deeper analysis using polynomials to suggest more effective transformations.

### **Appendix A**

#### **A.1. The modeling process**

A model can be conceptualized as a mathematical description generated from knowledge, experience, and expert opinion, and based on previously recorded data. As references [8,13] indicate, the data help identify the geometric or physical tendency of a potential model and the characteristic values that represent its relevant parameters. An appropriate model should fit the data adequately while remaining simple enough to be practical, and its construction depends on the quality of the information used.

In general, the modeling process requires the consideration of the following issues:

**•** To carry out a detailed analysis of the obtained results that support the resulting alternatives.

**•** To construct a detailed report indicating the way that the solution must be applied.

During the analysis of the modeling process, the main idea is to develop a predictive model that helps propose a better solution and, consequently, suggests an improvement in the indicators of the system. To do this, it is convenient to identify trends or feasible models that can be used as a reference during the process.

Another important aspect of modeling is to guarantee that the data are representative of the problem under study. This requires a deep analysis of the relationships between the variables or specific sources, and a clear statement of the intended use of the resulting empirical model.

One advantage of empirical models is that they lead to a suitable answer most of the time and do not require very formal information. This is useful when a solution must be implemented promptly, because the empirical model is based only on the available information.

However, there is some confusion about the merits of theoretical models versus empirical models. It is not possible to declare that one type of model is better than the other, because this depends on the specific context in which they are applied. Empirical models are useful when a theoretical model is not available. The objective is to model the scenarios with the best performance in order to solve a given problem or run a simulation.

It is very common to apply empirical models when certain phenomena are not characterized by theoretical models, such as those related to climate, air, environmental contamination, shipping, the lifetime of active products, trends in friction mechanisms, and so on.

Sometimes data are difficult or very expensive to use because they take a long time to obtain or are unavailable for particular reasons. When this occurs, EM is a practical option for creating scenarios that simulate the behavior of the variables of interest.

Many scientific, social, or engineering observations are generated through experimentation or by observing the situation under study. Records of these values are stored in a database. The information is then analyzed and reported using several types of plots of the associated points.

With the available information, investigators can apply different methods to propose formulas (equations) that formally represent the behavior of the data. In most cases, the fitting process involves determining a function, possibly using transformed data, that is fitted to the observed values.

With this approach, the results are likely to be similar to those that a sample of the process would show. Based on this, the researcher can promptly represent the tendency of the variables under study.

#### **A.2. Description of the modeling process**

In general, the modeling process can be described in several steps. Interested readers can also consult reference [1]:

**7.** The fitted models are compared against the corresponding phenomena; for example, a well-validated model can be expected to satisfy the property of unbiasedness. If there are other models, they can be analyzed at this point. If a model proves inappropriate in Steps 3 or 4, other feasible models can be sought. More than one model may be usable, and it is of interest to propose them.

**8. Selection of the model** The valid models are chosen and analyzed. Criteria are defined to select the best option, using, for example, the results of the tests or a comparison with other related models. It is important to consider that the proposed models can be used in the future.

**9. Implementation of the proposed model(s)** The analysis of results based on the selected model helps simulate several scenarios that are useful for generating the final reports. A polishing process is suggested in this step.

If more or different data are collected, or if the context changes, the modeling process can be iterated. In general, the steps mentioned above can be summarized as illustrated in **Figure A.1**.

**Figure A.1.** The Modeling Process.

### **Appendix B**

#### **B.1. Types of empirical models**

It is common to employ theoretical frameworks (based on mathematical and statistical concepts) [1,8,13] to construct models, using a database to determine constant values. These constants are the characteristic values (parameters) of a defined model.

This process is sometimes called model fitting. Even when the model does not fit the observed data well, it may be accepted, assuming the presence of some error, provided that it is useful for explaining the tendency of the situation under study. Models obtained through this type of process are called analytical models.

In EM, the data (observations) come from samples, experiments, or simple observation of the reality under study. This leads to a search for trends or additional knowledge, oriented toward explaining the behavior of certain dependent variables.

In other words, the main idea in EM is to quickly capture the main tendency of the information and use it to propose a model that helps make decisions contributing to the solution of a specific problem. **Table B.1** shows some typical and useful transformations for creating empirical models.


**Table B.1.** Types of linear transformations.
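As an illustration of the kind of transformation involved, the following standard forms can each be rewritten as a straight line in suitably transformed variables. These particular forms are assumed here for illustration and are not necessarily the entries of Table B.1; the logarithmic transformations require positive values of the quantities being transformed.

$$
\begin{array}{lll}
\text{Power:} & y = a x^{b} & \ln y = \ln a + b \ln x \\
\text{Exponential:} & y = a e^{b x} & \ln y = \ln a + b x \\
\text{Reciprocal:} & y = \dfrac{1}{a + b x} & \dfrac{1}{y} = a + b x \\
\text{Logarithmic:} & y = a + b \ln x & \text{already linear in } \ln x
\end{array}
$$

In each case, the parameters of the straight line can be estimated with the LSM and then mapped back to the parameters of the original form.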

#### **B.2. Linearization process in Empirical Modeling**

To propose the best model based on the observations (data values), it is very helpful in EM to linearize the data set, that is, to transform the data and fit a simple (linearized) model, treating *x* as a continuous variable. Some models can be linearized using the obtained data.

If a function has one of these forms, it can be linearized by transforming the model so that it relates to a linear model [1,8]. For example, if *y* = *ax*<sup>*b*</sup>, then ln(*y*) = ln(*a*) + *b* ln(*x*). So if *y* is a power function of *x*, ln(*y*) is a linear function of ln(*x*).
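The power-function case can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the case study: the function names `fit_line` and `fit_power_law` and the sample data are hypothetical, and all *x* and *y* values are assumed to be strictly positive so that the logarithms exist.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = c0 + c1 * x; returns (c0, c1)."""
    m = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denom = m * sum_xx - sum_x * sum_x
    if denom == 0:
        raise ValueError("x values must not all be equal")
    c1 = (m * sum_xy - sum_x * sum_y) / denom
    c0 = (sum_y - c1 * sum_x) / m
    return c0, c1


def fit_power_law(xs, ys):
    """Fit y = a * x**b by fitting a straight line to (ln x, ln y)."""
    if any(x <= 0 for x in xs) or any(y <= 0 for y in ys):
        raise ValueError("power-law linearization needs positive x and y")
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    ln_a, b = fit_line(log_x, log_y)  # ln y = ln a + b * ln x
    return math.exp(ln_a), b


# Hypothetical, noise-free data generated from y = 3 * x**0.5.
xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0 * x ** 0.5 for x in xs]
a, b = fit_power_law(xs, ys)
print(a, b)
```

Note that fitting in the transformed space minimizes the squared residuals of ln(*y*), not of *y* itself, so with noisy data the estimates can differ from those of a direct nonlinear fit.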

In modeling certain situations, there is special interest in aspects associated with the nature of the values of the variables under study. Sometimes the linearization is carried out for a set of variables interacting simultaneously, using a numerical algorithm. One such algorithm is the LSM, which is based on minimizing the sum of the squared residuals.

**Figure B.1** shows the form of a model using *x* as the predictor variable and *y* as the explained (response) variable.

**Figure B.1.** Several examples of functions.

#### **B.3. Parameters computation when using LSM**

To compute the parameters *a*, *b*, *c*, …, shown in **Table B.1**, the following general procedure can be adopted. Consider the function


$$f\left(x, p\right) = \sum_{j=0}^{n} a_j x^j \tag{5}$$

along with the following cost function

$$J\left(p\right) = \sum_{i=1}^{m} \left(y_i - f\left(x_i, p\right)\right)^2 \tag{6}$$

where *p* is the vector of parameters to be determined, i.e., *p* = [*a*<sub>0</sub> *a*<sub>1</sub> … *a*<sub>*n*</sub>]<sup>*T*</sup>. The best fit to the data set (*x*<sub>*i*</sub>, *y*<sub>*i*</sub>), *i* = 1, …, *m*, corresponds to the parameter vector *p* that minimizes the cost function *J*, and we know from calculus that such a vector must satisfy:

$$\nabla_p J(p) = 0 \tag{7}$$

Based on this:


$$-\sum_{i=1}^{m} x_i^{j} y_i + \sum_{k=0}^{n} a_k \sum_{i=1}^{m} x_i^{j+k} = 0, \quad \text{for } j = 0, 1, \ldots, n \tag{8}$$

Since this holds for each *j* ∈ {0, 1, ⋯, *n*}, the equations can be arranged in a convenient matrix form:

$$
\begin{bmatrix}
m & \sum_{i=1}^{m} x_i & \sum_{i=1}^{m} x_i^{2} & \cdots & \sum_{i=1}^{m} x_i^{n} \\
\sum_{i=1}^{m} x_i & \sum_{i=1}^{m} x_i^{2} & \sum_{i=1}^{m} x_i^{3} & \cdots & \sum_{i=1}^{m} x_i^{n+1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\sum_{i=1}^{m} x_i^{n} & \sum_{i=1}^{m} x_i^{n+1} & \sum_{i=1}^{m} x_i^{n+2} & \cdots & \sum_{i=1}^{m} x_i^{2n}
\end{bmatrix}
\begin{bmatrix}
a_0 \\
a_1 \\
\vdots \\
a_n
\end{bmatrix} =
\begin{bmatrix}
\sum_{i=1}^{m} y_i \\
\sum_{i=1}^{m} x_i y_i \\
\vdots \\
\sum_{i=1}^{m} x_i^{n} y_i
\end{bmatrix} \tag{9}
$$

*Remark*

It is clear from the above expression that the equations can be written as a system of linear equations *Ap* = *B*. To solve it for a large number of parameters, several numerical methods can be applied (e.g., Gauss–Seidel and Jacobi, among others).
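The construction of *A* and *B* in Eq. (9) and an iterative solution can be sketched as follows. This is a minimal sketch under stated assumptions rather than a definitive implementation: the function names `normal_equations` and `gauss_seidel`, the tolerance, the iteration limit, and the sample data are all hypothetical. When the data contain at least *n* + 1 distinct *x* values, *A* is symmetric positive definite, which is a sufficient condition for the Gauss–Seidel iteration to converge; the same is not guaranteed for Jacobi.

```python
def normal_equations(xs, ys, n):
    """Build A and B of Eq. (9) for a polynomial of degree n."""
    # power_sums[k] holds the sum of x_i**k for k = 0, ..., 2n.
    power_sums = [sum(x ** k for x in xs) for k in range(2 * n + 1)]
    A = [[power_sums[j + k] for k in range(n + 1)] for j in range(n + 1)]
    B = [sum((x ** j) * y for x, y in zip(xs, ys)) for j in range(n + 1)]
    return A, B


def gauss_seidel(A, B, tol=1e-12, max_iter=100_000):
    """Solve A p = B iteratively, starting from p = 0."""
    size = len(B)
    p = [0.0] * size
    for _ in range(max_iter):
        largest_change = 0.0
        for j in range(size):
            # Use the most recent values of the other unknowns.
            others = sum(A[j][k] * p[k] for k in range(size) if k != j)
            new_value = (B[j] - others) / A[j][j]
            largest_change = max(largest_change, abs(new_value - p[j]))
            p[j] = new_value
        if largest_change < tol:
            return p
    raise RuntimeError("Gauss-Seidel did not converge")


# Hypothetical, noise-free data generated from y = 1 + 2 * x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
A, B = normal_equations(xs, ys, 1)
p = gauss_seidel(A, B)
print(A, B, p)
```

For higher polynomial degrees the matrix *A* becomes increasingly ill-conditioned, so the iteration may converge slowly; in that situation a direct solver is often a reasonable alternative.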

#### **Author details**

Raúl Hernández-Molinar\*, Roberto Sarmiento-Rebeles and César F. Méndez-Barrios

\*Address all correspondence to: raul.hernandez@uaslp.mx

Autonomous University of San Luis Potosí, San Luis Potosí, México

#### **References**

[1] Brian Albright. (2010). *Mathematical Modeling with Excel*. Massachusetts: Jones and Bartlett Publishers.

[2] Barad M. and Sapir D. (2003). Flexibility in logistics systems-modeling and performance evaluation. *International Journal of Production Economics*, 85, 155–170.

[3] Russell Bertrand. (1961). *The Basic Writings*. London: George Allen & Unwin Ltd.

[4] Parzen Emmanuel. (1979). Nonparametric statistical data modeling. *Journal of the American Statistical Association*, 74, 365, 105–120.

[5] Fisher R. A. (1922). On the mathematical foundations of theoretical statistics. *Philosophical Transactions of the Royal Society A*, 222, 309–368.

[6] Tukey J. W. (1993). *Exploratory Data Analysis: Past, Present, and Future*. Princeton: Princeton University.

[7] Oliva R. and Watson N. (2009). Managing functional biases in organizational forecasts: A case study of consensus forecasting in supply chain planning. *Production and Operations Management*, 18, 138–151.

[8] Thompson J. R. (2011). *Empirical Model Building: Data, Models, and Reality*, 2nd Edition. Hoboken: Wiley.

[9] Draper N. R. and Smith H. (1981). *Applied Regression Analysis*. New York: Wiley.

[10] Sen A. and Srivastava M. (1994). *Regression Analysis, Theory, Methods, and Applications*. 2nd Ed. New York: Springer Texts in Statistics.

[11] Weisberg S. (1980). *Applied Linear Regression*. New York: Wiley.

[12] Levenberg K. (1944). A method for the solution of certain nonlinear problems in least squares. *Quarterly of Applied Mathematics*, 2, 164–168.

[13] Gilat A. and Subramaniam V. (2010). *Numerical Methods for Engineers and Scientists*. Columbus: Wiley.

[14] Habib Mamun. (2010). Supply chain management: theory and its future perspectives. *International Journal of Business, Management and Social Sciences*, 1, 79–87.

