#### 3.1. Determining the consumption profiles

To determine consumer profiles dynamically, we first considered a series of algorithms based on classification and clustering techniques. To implement and test the model, we used a data set of hourly electricity consumption recorded in different US cities between January 1, 2014 and December 31, 2014. Each record contains values for the following consumption types: heating, cooling, ventilation, indoor lighting, outdoor lighting, water heating, household equipment (washing machine and refrigerator) and other interior devices (TV and computer). The data were imported into Oracle Database 11g R2, into the LOAD\_PROFILE\_T table, which holds approximately 1,900,000 hourly records for 212 consumers. We analyzed the distribution of electricity consumption across value ranges, consumption types and time periods, as shown in Figure 1.

The analysis shows that the total consumption curve has the same shape as the curves for heating and interior equipment, which makes these consumption types significant attributes for the total consumption value.

Figure 1. Data set statistics.

Because the data were stored in Oracle Database, we used data mining algorithms in Oracle SQL Developer. For the first method, we applied support vector machine (SVM) classification and built six profiles (classes). The profiles with the most cases (over 30,000) have the highest accuracy (about 90%), which can be considered a good result for classification. Analyzing the classes, we observed that the profiles are very sensitive to changes in consumer behavior, because the classes with a small number of items recorded the highest prediction errors.

To eliminate these shortcomings, we applied a second solution that determines profiles dynamically using clustering methods. To build the profiles, we applied the K-means method. Similarity within a cluster is measured by the variance (the sum of the squared differences between the main element of the cluster and each element); the best clusters are those with a small variance. We analyzed the confidence level for each cluster and found it to be high, in most cases over 85%. Regarding the clustering rules, we noticed that they do not take into account attributes such as water heating, fans, cooling, household equipment or indoor/outdoor lighting, but only heating and total consumption (the most important attributes). This may be because we chose a small number of clusters compared with the size of the data set. In conclusion, the lower the number of clusters, the more consumers in each group and the less sensitive the group is to changes in consumer behavior.
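The variance criterion described above can be illustrated with a minimal sketch. This is not the Oracle implementation used in the study: the function names, the naive initialization (the first k points) and the sample (heating, total consumption) readings are all hypothetical and chosen only for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def within_cluster_ss(members, center):
    """Sum of squared differences between the cluster center and each member."""
    return sum(squared_distance(p, center) for p in members)

def assign(points, centers):
    """Put every point into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)),
                      key=lambda i: squared_distance(p, centers[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, k, iterations=20):
    """Plain K-means; the first k points serve as initial centers (an assumption)."""
    centers = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = assign(points, centers)
        centers = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centers[i]          # keep the old center if a cluster is empty
            for i, members in enumerate(clusters)
        ]
    return centers, assign(points, centers)

# Hypothetical (heating, total consumption) readings for two groups of consumers.
readings = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.9, 8.1)]
centers, clusters = kmeans(readings, k=2)
scores = [within_cluster_ss(m, c) for m, c in zip(clusters, centers)]
```

A small value in `scores` corresponds to a compact cluster, which is the sense in which the text calls a low-variance cluster the best one.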

To divide the obtained profiles into smaller groups and establish consumption patterns, we chose another clustering method. We refined the K-means results by applying the O-cluster method (orthogonal partitioning clustering). This method is proprietary to Oracle Corporation [37] and uses a recursive grouping algorithm based on orthogonal partitioning of the data. On top of the six profiles determined by K-means, we built 10 subclusters representing consumption patterns for each profile on hourly intervals. Analyzing the training rules and the weight of each consumption category in each cluster, we noticed that the clusters have a varied composition: each one identifies a primary profile determined by the K-means method and one or more consumption patterns determined by the O-cluster method. For example, we considered the distribution of the consumption patterns of a consumer in profile P5 over 24 h. Figure 2 shows profile P5 split into 10 patterns (T1, …, T10) for a detailed perspective on electricity consumption.

Figure 2. Profile P5 patterns with O-cluster.

devices. Thus, consumers can analyze their consumption for heating, cooling, washing, lighting and other home appliances and they can schedule it based on ToUT. In Section 5, we proposed an informatics solution that provides friendly user interfaces and integrates methods for consumption optimization and micro-generation forecasting for electricity consumers.

3. Forecasting the electricity consumption


124 Advanced Applications for Artificial Neural Networks


The patterns built with O-cluster refine the clusters and give a better understanding of the consumption behavior of smaller groups of consumers, which allows the ToUT to be adjusted for these groups. The consumption patterns also shape the consumer's dynamic behavior within 24 h more accurately, the profiles being in fact an approximation of the hourly variation of consumption. The deviations of the actual consumption from the average consumption of the profile are small, which again validates the clustering model.

As an alternative to clustering methods, we also applied a third method based on artificial neural networks (ANN). In Matlab R2015a, we imported the data from the LOAD\_PROFILE\_T table in Oracle Database and organized the input vectors as x(t) ∈ R<sup>n</sup>, with n = 13 components for the consumption types (heating, ventilation, indoor lighting, etc.), where t represents the time interval (hours) between January 1, 2014 and December 31, 2014.

We developed a self-organizing map (SOM) algorithm, setting the following parameters for the neural network:

• SOM architecture: 2D with 2 × 3 neurons/layer (dimensions) = [2 3];

• number of steps for initially processing the input space (coverSteps) = 100;

• initial neighbor (initNeighbor) = 2;

• network topology (topologyFcn) = 'hextop' and

• distance between neurons (distanceFcn) = 'linkdist'.

The network is initialized with random values for each neuron. We used the trainbu training function, which adjusts the weights and biases after each iteration. We plotted the results and observed the distribution of the input set in Figure 3.

From the consumption curves corresponding to the six clusters, a clear delimitation between profiles P2 and P5 can be observed. A difference of approximately 30% in the evening consumption peak is also observed between P6 and P1, P3, P4 (Figure 4).

Following the analysis of the results, we noticed a correct and efficient grouping of the consumer profiles using the self-organizing neural networks.

Figure 3. Distances among clusters distribution.

Figure 4. Profiles obtained with SOM.
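To make the SOM settings in this section more concrete, the following is a minimal sketch of a 2 × 3 self-organizing map. It is not the Matlab network used in the study: it assumes a rectangular grid instead of 'hextop', online updates instead of the batch trainbu function, and a hypothetical learning rate; only the grid size, the 100 steps and the initial neighborhood of 2 are taken from the text. All function names and sample vectors are illustrative.

```python
import math
import random

def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest to sample x."""
    return min(range(len(weights)),
               key=lambda j: sum((w - xi) ** 2 for w, xi in zip(weights[j], x)))

def train_som(samples, rows=2, cols=3, epochs=100, init_neighbor=2, lr=0.5, seed=0):
    """Train a small SOM; returns one weight vector per grid neuron."""
    rng = random.Random(seed)
    dim = len(samples[0])
    positions = [(r, c) for r in range(rows) for c in range(cols)]
    # Random initial weights for each neuron, as in the text.
    weights = [[rng.random() for _ in range(dim)] for _ in positions]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        radius = max(init_neighbor * frac, 0.5)   # neighborhood shrinks over time
        rate = lr * frac                          # learning rate decays to zero
        for x in samples:
            bmu = best_matching_unit(weights, x)
            for j, (r, c) in enumerate(positions):
                # Number of grid links between neuron j and the winner (rectangular grid).
                links = abs(r - positions[bmu][0]) + abs(c - positions[bmu][1])
                if links <= radius:
                    influence = math.exp(-(links ** 2) / (2 * radius ** 2))
                    weights[j] = [w + rate * influence * (xi - w)
                                  for w, xi in zip(weights[j], x)]
    return weights

# Hypothetical normalized consumption vectors (two components for brevity).
samples = [(0.1, 0.2), (0.15, 0.1), (0.9, 0.8), (0.85, 0.95)]
weights = train_som(samples)
profile_of = [best_matching_unit(weights, s) for s in samples]
```

Each entry of `profile_of` is the index of the neuron that represents the sample, so with six neurons the map yields at most six profiles, matching the six clusters discussed in the text.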

A short comparison of the results obtained with the three analyzed methods is summarized in Table 1.

From the analysis, we can conclude that clustering is the optimal method for determining dynamic consumption profiles that capture a series of consumption patterns, whereas self-organizing maps are the most efficient method for determining clearly delimited profiles.

#### 3.2. Consumption forecasting solution with ANN

Analyzing the consumption data set for the 212 consumers over 4–6 weeks, we observed a regular pattern on working days (Monday to Friday) and some differences on weekends and holidays. Therefore, for load forecasting aggregated hourly at the grid operator or electricity supplier level for a typical day of the week, an autoregressive model can be considered. In this section, we compare two approaches to forecasting electricity consumption: statistical methods based on ARIMA and autoregressive artificial neural networks.

Table 1. Comparison of the profiles obtained by SVM, K-means and O-cluster and SOM.

Autoregressive moving-average (ARMA) models are suitable for stationary series, but most series are non-stationary: their mean and variance are not constant over time. The ARMA model was adapted for non-stationary time series that become stationary by differencing, the resulting models being called autoregressive integrated moving average models, ARIMA(p, d, q). The ARIMA(p, d, q) model consists of three parts: the autoregressive part (AR), where p is the autoregression order; the integrated part (I), where d is the order of differencing required to make the series stationary; and the moving average part (MA), where q is the order of the moving average. Unlike autoregression, the moving average describes phenomena with certain irregularities. The moving average is described by the following equation:

$$Y_t = c + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \dots + \theta_q e_{t-q} + e_t \tag{1}$$

where Y<sub>t</sub> is the consumption, c is a constant coefficient, θ<sub>1</sub>, …, θ<sub>q</sub> are the parameters of the moving average and e<sub>t</sub> represents the time series error.
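As a small illustration of the two ingredients described above, the sketch below applies differencing of order d and evaluates the moving average of Eq. (1) for a given error sequence. The function names and the numeric values are hypothetical; errors before the start of the series are assumed to be zero.

```python
def difference(series, d=1):
    """Apply first-order differencing d times (the 'I' part of ARIMA)."""
    values = list(series)
    for _ in range(d):
        values = [b - a for a, b in zip(values, values[1:])]
    return values

def ma_series(c, thetas, errors):
    """Evaluate Y_t = c + theta_1*e_{t-1} + ... + theta_q*e_{t-q} + e_t.

    Errors with a negative time index are treated as zero (an assumption).
    """
    q = len(thetas)
    values = []
    for t, e_t in enumerate(errors):
        lagged = sum(thetas[i] * errors[t - 1 - i]
                     for i in range(q) if t - 1 - i >= 0)
        values.append(c + lagged + e_t)
    return values

# A quadratic trend becomes constant after two rounds of differencing.
trend = [1, 4, 9, 16]
once = difference(trend, d=1)
twice = difference(trend, d=2)

# MA(1) with c = 1.0 and theta_1 = 0.5 on three hypothetical error terms.
y = ma_series(1.0, [0.5], [1.0, 2.0, 3.0])
```

The differencing example shows why a non-stationary series with a trend can become stationary after d steps, which is the role of the d parameter in ARIMA(p, d, q).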

To evaluate the results and compare the forecast accuracy of the different variants of the ARIMA model, we used the mean squared error (MSE) and the mean absolute percentage error (MAPE).
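The two error measures can be written down in a few lines. The sketch below uses the standard definitions of MSE and MAPE; the function names and the sample values are hypothetical, and the MAPE version assumes that no actual value is zero.

```python
def mse(actual, forecast):
    """Mean squared error between actual and forecast values."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical hourly consumption values and their forecasts.
actual = [100.0, 200.0]
forecast = [110.0, 180.0]
error_mse = mse(actual, forecast)
error_mape = mape(actual, forecast)
```

MAPE is scale-free, which makes it convenient for comparing models across consumers with different consumption levels, whereas MSE penalizes large deviations more heavily.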

Data from the LOAD\_PROFILE\_T table were imported into SAS Enterprise Guide 7.1. Starting from the input data set, we applied the autoregressive integrated moving average models. Table 2 presents the MAPE for the first-order AR model, MA(1), ARMA(1,1) and ARIMA(1,1,1).

| Model | MAPE [%] |
| --- | --- |
| AR(1) | 7.29 |
| MA(1) | 24.45 |
| ARMA(1,1) | 29.05 |
| ARIMA(1,1,1) | 24.97 |

Table 2. ARIMA models' results.

Table 2 shows that the MAPE is lowest for the autoregressive model, which therefore gives the most accurate electricity consumption forecast (about 93%). The accuracy of the other forecasts is over 70%. In all analyses, the degree of correlation indicates a moderate or weak inverse dependence.

In addition to ARIMA models, we applied autoregressive neural networks in Matlab. We built the LOAD\_PROFILE\_HOURLY virtual table based on the LOAD\_PROFILE\_T table and the LOAD\_PROFILE\_SOM\_6 table, which includes the six consumption profiles previously determined by the self-organizing maps. For the simulation we considered a single profile, P6, with the largest number of consumers (6197).

Due to the structure of the input data and the fact that there is an autoregressive component of electricity consumption during a typical week, we built a nonlinear autoregressive neural network (narnet). We configured the ANN parameters as follows:

• hiddenSizes: number of neurons in the hidden layer;

• feedbackDelays: number of delays;

• trainFcn: training function.

We considered 50 neurons in the hidden layer and a single input y(t), the total consumption, determined according to the formula:

$$y(t) = f(y(t-1), \dots, y(t-d))\tag{2}$$

where d represents the number of previous records used as delays. For the first iteration of the model we considered d = 5, and for the second iteration, which gave better results, d = 10. The architecture of the network is shown in Figure 5.

For the hidden layer, we used a bipolar sigmoid activation function, and for the output layer a linear activation function. As for the training algorithm, Matlab provides the following algorithms: the Levenberg-Marquardt (LM) algorithm (trainlm), the Bayesian Regularization (BR) algorithm (trainbr) and the Scaled Conjugate Gradient (SCG) algorithm (trainscg). We developed the autoregressive neural network and compared the results obtained with the three training algorithms. The performance of the network is very good: with the BR training algorithm, the mean squared error (MSE) reached 0.0046 at epoch 936, and the correlation coefficient R between the predicted and actual values is 0.996 (Figure 6).

Figure 5. The architecture of the autoregressive neural network.
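The autoregressive formulation y(t) = f(y(t-1), …, y(t-d)) can be sketched as follows. This is not the Matlab narnet implementation: it only shows how the delayed inputs might be assembled from a series and how a single hidden layer with a bipolar sigmoid (here tanh) and a linear output would map them to a prediction. The function names, the tiny network size and the weights are hypothetical, and no training step is shown.

```python
import math

def make_lagged(series, d):
    """Build (inputs, targets) so that inputs[i] = [y(t-1), ..., y(t-d)] and targets[i] = y(t)."""
    inputs, targets = [], []
    for t in range(d, len(series)):
        inputs.append([series[t - k] for k in range(1, d + 1)])
        targets.append(series[t])
    return inputs, targets

def nar_forward(lags, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: tanh hidden layer followed by a linear output neuron."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, lags)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Hypothetical hourly totals and d = 2 delays (the text uses d = 5 and d = 10).
series = [1, 2, 3, 4, 5]
inputs, targets = make_lagged(series, d=2)

# A hypothetical network with one hidden neuron and zero hidden weights,
# so the prediction reduces to the output bias.
prediction = nar_forward([0.5, 0.2], [[0.0, 0.0]], [0.0], [1.0], 0.3)
```

With d delays, the first d values of the series have no complete input window, so the training set has len(series) − d rows.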

Figure 6. Results for R coefficient for BR algorithm.

From the error histogram (Figure 7), it can be observed that the errors are between −0.13 and +0.12, which can be considered an acceptable distribution.

We trained the network using the three algorithms (LM, BR and SCG). The best results were recorded with the Bayesian Regularization algorithm, although the Levenberg-Marquardt algorithm also gave good results, with better training performance.

In Table 3, the results obtained with autoregressive neural networks are compared with stochastic methods (ARMA, ARIMA and AR).

The accuracy of the ANN algorithms is better (about 95%) than that of the stochastic models. The Levenberg-Marquardt and Bayesian Regularization algorithms are also superior in terms of MSE, recording the lowest values. The R coefficient and the error distribution of the neural network algorithms are better than those of the AR, MA, ARMA and ARIMA models.

Figure 7. Errors histogram.


Table 3. Autoregressive neural networks versus stochastic methods.
