3. Migrating from machine learning algorithms to deep learning

3.1. First attempts in building predictive systems for HHG experiments

The particular cases of HHG experiments that were envisaged refer to the interaction of ultrashort and intense laser pulses with overdense plasmas (plasmas with density higher than the critical density). At the most basic level, this mechanism can be understood as the reflection of the incident laser and of its subsequently created harmonics on the oscillating plasma surface (the oscillating mirror model, OMM [110]). Since the plasma density is higher than the critical one, the laser cannot penetrate the plasma and is therefore reflected at its surface. This surface is not flat and it exhibits an oscillatory movement due to the laser-induced heating mechanisms. While it is true that the yielded spectra depend strongly on the initial conditions—laser intensity, pulse duration, incidence angle, plasma density—the key factor is in fact the optimization of the resonance absorption, as this fundamental process may account for up to 30% of the laser energy being absorbed by the plasma. In practice, the incident electromagnetic wave excites a plasma electron wave of the same frequency, and the second harmonic results from the mixing of the plasma electron wave with the electromagnetic laser pump, hence its frequency is double that of the incident wave. Although the second harmonic is mainly reflected, part of it can propagate inside the plasma and excite a wave of the same frequency, which in turn, by mixing with the incident laser pump, yields the third-order harmonic. Moreover, it was also demonstrated that there is a correlation between the nonlinear, ponderomotively driven plasma surface motion and the production of energetic electrons [111, 112]. A pronounced asymmetry of longitudinal oscillations in a steep density profile is known to lead to wave breaking, which in turn causes fractions of electrons to be irreversibly accelerated into the target. This kinetic process results in further absorption of energy from the laser.
Furthermore, the accelerated fast electrons can themselves drive Langmuir waves, in the overdense region as well as in the ramps that form in front of the target, eventually leading to the generation of harmonics. This mechanism, namely coherent wave excitation (CWE) [113], is mainly responsible for HHG at moderate intensities. A further increase in laser intensity improves the prospects for efficient surface high-order harmonic generation and, in principle, with relativistic lasers, high-harmonic intensities may even exceed the intensity of the focused pulse by several orders of magnitude.
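For reference, the overdense condition above can be made quantitative: at the critical density the plasma frequency equals the laser frequency, giving the standard relation (a textbook result, not taken from the chapter's data):

```latex
% plasma frequency equals the laser frequency at the critical surface
n_c = \frac{\varepsilon_0\, m_e\, \omega_0^2}{e^2}
    \;\approx\; \frac{1.1 \times 10^{21}}{\lambda_0^2\,[\mu\mathrm{m}]}\ \mathrm{cm}^{-3},
\qquad \omega_0 = \frac{2\pi c}{\lambda_0}
```

For λ0 = 800 nm this gives nc ≈ 1.7 × 10^21 cm^-3; a plasma is overdense when ne > nc, which is consistent with the 4nc and 8nc densities used in the interaction scenarios later in this section.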

Within a query, a MapReduce stage is followed by other stages. Tez checks the dependences between them and dispatches the independent ones to be executed in parallel. Another optimization decision concerns performing map joins instead of shuffle joins: map joins minimize data movement and leverage subsequent localized execution, because the hash map on every node is integrated into a global in-memory table and only this table is streamed, hence joins become faster. A compromise has to be made, though, by provisioning larger Tez containers (much larger than the YARN ones) and by allocating one CPU and a few GB of memory per container. The performance of Hive queries can also be improved by enabling compression at the various stages, from table creation to intermediate data and final output. For these purposes, a conversion to the ORC file format was performed, as these files yield 78% compression compared to the initial text ones. Therefore, a search through 1 TB of data now incurs only about 5 seconds of latency.
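To make the settings above concrete, the sketch below assembles the corresponding Hive session statements as plain strings so they can be issued through any Hive client. The property names are standard Hive/Tez configuration keys; the table and column names are hypothetical placeholders, not those of the actual cluster.

```python
# Hive session settings mirroring the optimizations described in the text.
# Property names are standard Hive/Tez configuration keys; the table and
# column names below are hypothetical placeholders.

def hive_session_settings(container_mb=4096):
    """Return SET statements enabling Tez, map joins and compression."""
    return [
        "SET hive.execution.engine=tez",             # run query DAGs on Tez, not plain MapReduce
        "SET hive.auto.convert.join=true",           # prefer map joins over shuffle joins
        "SET hive.exec.compress.intermediate=true",  # compress data between stages
        "SET hive.exec.compress.output=true",        # compress the final query output
        f"SET hive.tez.container.size={container_mb}",  # provision larger Tez containers (MB)
    ]

def orc_table_ddl(name, columns):
    """DDL storing a table in the ORC columnar format with compression."""
    cols = ", ".join(f"{c} {t}" for c, t in columns)
    return (f"CREATE TABLE {name} ({cols}) STORED AS ORC "
            "TBLPROPERTIES ('orc.compress'='ZLIB')")

settings = hive_session_settings()
ddl = orc_table_ddl("hhg_runs", [("laser_intensity", "DOUBLE"),
                                 ("harmonic_order", "INT")])
```

The statements can then be executed one by one over a standard Hive connection before running the analysis queries.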

Finally, to a reasonable extent, data-intensive workloads also benefit from in-memory processing. Tez allows speculative executions to be attempted on faster nodes according to the Longest Approximate Time to End (LATE) strategy. These approaches were found to result in an overall speed improvement of between one and one and a half orders of magnitude. In the case of iterative jobs, such as cost-based function optimizations, a latency reduction of up to 20 times was obtained.

This subsection has so far discussed just the underlying infrastructure used for building the predictive systems for optimizing laser-plasma interaction experiments, focusing not so much on the hardware as on the tools and tricks deployed for making the big data processing run faster and on fewer resources. However, some attention must also be given to the conceptual design of the predictive systems. This is displayed in Figure 2.




94 Machine Learning - Advanced Techniques and Emerging Applications



The goal of developing and deploying predictive modeling for HHG experiments was to estimate the maximum order of the highest observable harmonic, along with the intensity, duration and wavelength of the various high harmonics and their conversion efficiency, given a particular laser interacting with a particular kind of plasma. The available data set consisted mainly of simulation data obtained by running various PIC codes, but also of experimental data collected from the published scientific literature. Initially the data set amounted to 2 TB but over time it reached about 5 TB, so the latest predictions using deep learning took full advantage of the 5 TB.

The first attempts at predictive modeling for high-order harmonic generation experiments [100, 101] involved, on one hand, commodity hardware with lower performance than the currently used cloud, without any GPUs, and, on the other, an earlier version of Hadoop, installed and configured without any of the optimizations introduced in the meantime. This combination implied, first of all, long running times—up to several hours—just for MapReduce, and further ones for the machine learning algorithms implemented with Mahout. Each additional TB of data was yet another challenge for the system and its available resources. Supervised learning was an obvious choice; consequently the most popular of the universal function approximators [114], the MLP, was chosen as a starting point due to its versatility. Using the well-known backpropagation algorithm (BKP) [115, 116] for error minimization during training, the MLP solves problems stochastically, being able to provide approximate solutions even for extremely complex tasks. The high degree of connectivity between the nodes and the increased nonlinearity of this neural network make its generalization ability among the best, coping rather well even with noisy and missing data. However, this comes at the expense of significant running times in the training phase. While increasing the number of hidden layers is likely to improve overall performance, potentially revealing key features embedded in the data, adding too many of them was beyond the old system's capabilities, so bottlenecks were reached very quickly.
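For reference, the BKP weight update can be stated compactly; with an MSE cost E, sigmoidal activations σ and learning rate η, the error δ is propagated backwards layer by layer (standard textbook form):

```latex
% MSE cost over the network output a^{(L)} and target y
E = \tfrac{1}{2}\,\lVert a^{(L)} - y \rVert^2
% output-layer error (\odot = element-wise product)
\delta^{(L)} = \big(a^{(L)} - y\big) \odot \sigma'\!\big(z^{(L)}\big)
% propagation through hidden layer l
\delta^{(l)} = \Big( W^{(l+1)\top} \delta^{(l+1)} \Big) \odot \sigma'\!\big(z^{(l)}\big)
% gradient and steepest-descent update
\frac{\partial E}{\partial W^{(l)}} = \delta^{(l)}\, a^{(l-1)\top},
\qquad W^{(l)} \leftarrow W^{(l)} - \eta\, \frac{\partial E}{\partial W^{(l)}}
```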

The training set's input values are the laser intensity, laser wavelength, pulse duration, polarization, incidence angle and the type of plasma (introduced as ionization degree and elemental Z number) and its initial density. The desired output values in the training set are the maximum order of the highest observable harmonic, intensity values for different harmonics (including the highest one), the harmonics' wavelengths and durations, as well as their conversion efficiencies. About 85% of the entire data formed the training set while the rest served as a test set, and these percentages were adhered to throughout, up to the latest deep learning implementations. Multiple MLP topologies were tested, with different types and numbers of neurons, different numbers of hidden layers, and batch or incremental training with various optimization algorithms. Deciding upon the number of neurons in the input layer depends mainly on the number of parameters that define a laser-plasma interaction scenario. The number of neurons in the output layer is generally a function of the yields that need to be classified or predicted. The number of hidden layers and the number of neurons within a layer were empirically determined. Hence, three of the investigated MLPs—henceforth labeled MLP1, MLP2 and MLP3, respectively—were found to exhibit satisfactory behavior in terms of accuracy. However, the running hours were discouraging, especially since, according to the results, it was obvious that an upgrade towards more hidden layers and more neural units was imminent. MLP1 has an input layer consisting of 8 Adaline neurons, two hidden layers, each with 12 sigmoidal neurons, and an output layer of 5 sigmoidal units. It was trained with batch training, while the cost function was defined in terms of mean squared error (MSE) and optimized with Steepest Descent. MLP2 has three hidden layers, each with 10 sigmoidal neurons. The second difference from MLP1 is that its cost function was optimized with resilient backpropagation. Finally, MLP3 has two hidden layers, each with 11 sigmoidal units, and it deploys the Levenberg-Marquardt algorithm for finding the global minimum of the cost function.

For two HHG scenarios, Table 1 displays the prediction results obtained with each of the three MLPs. Within the first scenario, the laser's parameters are as follows: I = 2 × 10^18 W/cm^2, λ0 = 800 nm, p polarization, pulse duration τ0 = 150 fs, incidence angle α = 45°, interacting with an aluminum overdense plasma of electronic density ne = 4nc = 6.875 × 10^21 cm^-3. For the second scenario, the laser parameters are: I = 10^19 W/cm^2, λ0 = 800 nm, p polarization, pulse duration τ0 = 100 fs, incidence angle with the plasma surface α = 60°, while the aluminum plasma has a density of ne = 8nc = 1.375 × 10^22 cm^-3. The obtained predictions were in good agreement with PIC simulations as well as the literature data. However, it is easy to notice that the predicted intensities of the highest observable harmonic are lower in comparison to both theory and PIC results. This is caused by several factors, one of them being the heterogeneity of the available interaction data and the fact that the sets were minimally processed for cleaning during the "machine learning stages". As the collected information originates from multiple sources, the errors affecting the recorded values have different distribution functions. Furthermore, for a particular interaction scenario, we may have several experimentally determined values for the intensity of the highest observable harmonic and several numerical results. This constitutes redundant data, its principal negative effect being overfitting. For the MLP-based predictive modeling, all the redundant data was kept as it was, without any merging or advanced filtering. Overfitting is known to produce unrealistic predictions in MLPs even with noise-free data, let alone with redundancy or sparsity. On the other hand, for certain scenarios, there was no available reference. Hence, the problem of missing information was solved by running a modified version of LPIC++ and recording the corresponding yields. In spite of having applied sampling and some filtering in order to assemble equilibrated training sets, a certain degree of incipient overfitting was detected in the case of MLP1 and MLP2, thus some relative underestimation or overestimation was to be expected.
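As an illustration, below is a minimal NumPy sketch of the MLP1 topology described above (8 inputs, two hidden layers of 12 sigmoidal units, 5 sigmoidal outputs, batch training of an MSE cost by Steepest Descent). The synthetic data merely stands in for the real interaction records; learning rate and epoch count are illustrative choices.

```python
import numpy as np

# Minimal sketch of the MLP1 topology: 8 inputs -> two hidden layers of 12
# sigmoidal units -> 5 sigmoidal outputs, batch steepest descent on an MSE
# cost. The synthetic data below merely stands in for the interaction records.
rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

sizes = [8, 12, 12, 5]
W = [rng.normal(0.0, 0.5, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
b = [np.zeros((m, 1)) for m in sizes[1:]]

def forward(X):
    """Activations for a batch X of shape (features, samples)."""
    acts = [X]
    for Wl, bl in zip(W, b):
        acts.append(sigmoid(Wl @ acts[-1] + bl))
    return acts

def train_batch(X, Y, eta=0.5, epochs=200):
    """Batch steepest descent on the MSE cost; returns the loss history."""
    losses = []
    for _ in range(epochs):
        acts = forward(X)
        losses.append(float(np.mean((acts[-1] - Y) ** 2)))
        delta = (acts[-1] - Y) * acts[-1] * (1.0 - acts[-1])  # output-layer error
        for l in range(len(W) - 1, -1, -1):
            gW = delta @ acts[l].T / X.shape[1]
            gb = delta.mean(axis=1, keepdims=True)
            if l > 0:  # backpropagate before updating this layer's weights
                delta = (W[l].T @ delta) * acts[l] * (1.0 - acts[l])
            W[l] -= eta * gW
            b[l] -= eta * gb
    return losses

X = rng.random((8, 64))   # 8 normalized input parameters, 64 samples
Y = rng.random((5, 64))   # 5 normalized target yields
losses = train_batch(X, Y)
```

In the actual systems the inputs and targets would be the normalized laser/plasma parameters and harmonic yields; MLP2 and MLP3 differ only in the hidden-layer layout and the optimizer.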

Overcoming Challenges in Predictive Modeling of Laser-Plasma Interaction Scenarios. The Sinuous Route from…
http://dx.doi.org/10.5772/intechopen.72844

Scenario 1   Max. ord.   Intensity (W/cm^2)   Duration (fs)   Wavelength (nm)   Conv. efficiency
Lit. data    50          2 × 10^11            20              16                10^-7
PIC data     58          2.1 × 10^11          19              13.8              10^-7
MLP1         54          10^11                21              14.4              10^-7
MLP2         56          10^11                20              14.8              10^-7
MLP3         52          10^11                19              15.4              10^-7
DNN1         52          2.18 × 10^11         21              16.3              10^-7
DNN2         51          2.1 × 10^11          19              15                10^-7
EL1          50.6        2.02 × 10^11         20              15.8              10^-7
EL2          50          1.98 × 10^11         20              16                10^-7
CNN1         51          2.12 × 10^11         20              16.2              10^-7
CNN2         50.2        2.04 × 10^11         20              15.4              10^-7
CNN3         50          2 × 10^11            20              16.06             10^-7
EL4          50          2.06 × 10^11         20              16                10^-7
EL5          50.4        1.96 × 10^11         20              16.02             10^-7

Scenario 2   Max. ord.   Intensity (W/cm^2)   Duration (fs)   Wavelength (nm)   Conv. efficiency
Lit. data    72          2 × 10^11            12              11                10^-7
PIC data     76          2.1 × 10^11          11              10.5              10^-7
MLP1         74          1.5 × 10^11          12              10.8              10^-7
MLP2         76          10^11                11              10.5              10^-7
MLP3         72          2 × 10^11            12              11                10^-7
DNN1         74          2.16 × 10^11         13              11.5              10^-7
DNN2         73          2.08 × 10^11         11              10                10^-7
EL1          72.4        2 × 10^11            12              10.8              10^-7
EL2          72          2 × 10^11            12              11                10^-7
CNN1         73          2.06 × 10^11         11              12                10^-7
CNN2         72.08       2 × 10^11            12              11                10^-7
CNN3         72          2 × 10^11            12              11                10^-7
EL4          72          2 × 10^11            12              11                10^-7
EL5          72.05       2.08 × 10^11         13              10.6              10^-7

Table 1. Predictive modeling of HHG Scenarios 1 and 2: comparative results for the highest observable harmonic.


Another aspect to be noted is that all the MLPs discussed in this chapter feature hidden and output layers of sigmoidal units, and this is the most important factor responsible for underestimation. The sigmoid activation function has a non-zero mean and is prone to causing non-zero values in the Hessian matrix of the objective function, hence modifying the global minimum of the latter. A high number of sigmoidal neurons in a network strongly influences the weight adjustment during training; specifically, the corresponding weights in the last layers tend to take very small values (close to zero) and this saturation can last a very long time. To a good extent, the effect was mitigated by using a random initialization of weights, not only at the very beginning but also during the training process. Specifically, after observing a persistent saturation situation for a number of epochs, I performed some adjustments by adding small random values to the stagnating weights. This was found to improve the MLP's estimations on one hand and to increase the predicted values on the other. Perhaps this was also one of the causes of the overestimation of certain parameters. A slightly better and more stable behavior was observed in the case of MLP3, which required far fewer additions of random values to the weights. Compared with the other two, the errors during training were smaller, the convergence faster and the predicted values for the high order harmonics were, in general, closer to the literature data, owing to the Levenberg-Marquardt algorithm, which is known to improve the overall convergence speed due to its combination of Newton's method and Steepest Descent.
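The re-randomization procedure described above can be sketched as follows. This is a minimal sketch: the saturation threshold, noise scale and patience window are hypothetical choices, not the values used in the original experiments.

```python
import numpy as np

# Sketch of the re-randomization procedure: weights that have stayed near
# zero for `patience` epochs receive a small random kick. The threshold and
# noise scale are hypothetical choices, not values from the original runs.
def perturb_stagnating_weights(W, history, eps=1e-3, noise_scale=1e-2,
                               patience=10, rng=None):
    """Return a copy of W with small random values added to stagnating weights."""
    rng = rng or np.random.default_rng()
    if len(history) < patience:
        return W
    recent = np.stack(history[-patience:])
    stagnant = np.all(np.abs(recent) < eps, axis=0)  # near zero the whole window
    W = W.copy()
    W[stagnant] += rng.normal(0.0, noise_scale, size=int(stagnant.sum()))
    return W

# Example: two weights stuck near zero for 10 epochs get perturbed,
# while the healthy ones are left untouched.
W0 = np.array([[1e-5, 0.5], [0.3, -1e-5]])
perturbed = perturb_stagnating_weights(W0, [W0] * 10, rng=np.random.default_rng(1))
```

Such a routine would be called every few epochs on the last layers' weight matrices, where the saturation was observed.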

As stated above, over the course of the interaction, the laser heats the plasma through various mechanisms. Inherently, some of the electrons acquire a lot of energy and become "hot", having temperatures much higher than the plasma temperature. The percentage of hot electrons is very low but, in spite of this, their effects are not always negligible and, for certain experiments, even damaging. For an HHG experiment, a high percentage of hot electrons can disturb the oscillations of the plasma surface, a situation that affects the reflection of the laser, the CWE mechanism and consequently the HHG. For instance, a strong Brunel effect [111] leads to more thermal electrons. Consequently, it is important to have an accurate estimation of electron temperatures within the plasma along with the corresponding fractions of particles. For this purpose, another MLP (MLP4) was designed, since the previous three gave only modest evaluations. Input values in the training set incorporate, apart from the previously stated ones, the plasma's initial electronic temperature. The desired output values are electron temperatures accompanied by the estimated percentages of electrons that have these temperatures and the corresponding time moments. The best performing topology was found to be an MLP with 9 Adaline neurons in the input layer, 2 hidden layers, each with 11 sigmoidal units, and an output layer with 3 neurons, also sigmoidal. The training was performed incrementally, with the cost function defined in terms of MSE and optimized with Levenberg-Marquardt. For the same interaction conditions discussed above plus two additional cases (for Scenario 1, the incidence angle was modified to 30° from the normal to the plasma surface, this constituting Scenario 3, while for the same parameters in Scenario 2, the incidence angle was changed to 45°, this being labeled Scenario 4), prediction results are shown in four graphs below.

Figures 3 and 4 display comparatively the percentage of electrons estimated to have a temperature above 10 keV at different time moments and above 100 keV, respectively. Figure 3 refers to Scenarios 1 and 3, while Figure 4 concerns the second and the fourth. The procedures of random initialization and adjustment (during training) of weights were also applied in an attempt to improve MLP4's performance. However, it is the belief of this author that the combination of the network's topology, the sampling of available interaction data, the random additions and the incremental training has led to some significant overestimations of the percentages of electrons (some 10%) in certain cases, as values reported in the literature are smaller.
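The Levenberg-Marquardt update used for MLP3 and MLP4 interpolates between the Gauss-Newton (Newton-like) step and Steepest Descent through a damping parameter. Below is a generic sketch of one damped update on a toy least-squares problem, not the actual MLP training code; the problem and parameter names are illustrative.

```python
import numpy as np

# Generic sketch of one Levenberg-Marquardt update for a least-squares
# problem: solve (J^T J + mu I) dp = -J^T r. Small mu approaches the
# Gauss-Newton (Newton-like) step; large mu approaches steepest descent.
def lm_step(params, residual_fn, jacobian_fn, mu=1e-3):
    r = residual_fn(params)
    J = jacobian_fn(params)
    A = J.T @ J + mu * np.eye(params.size)
    return params + np.linalg.solve(A, -J.T @ r)

# Toy problem (purely illustrative): fit y = a*x + b to exact data.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
residual = lambda p: p[0] * x + p[1] - y
jacobian = lambda p: np.stack([x, np.ones_like(x)], axis=1)  # constant here

p = np.zeros(2)
for _ in range(20):
    p = lm_step(p, residual, jacobian)
```

In practice mu is adapted during training: decreased when a step reduces the cost and increased when it does not.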

Figure 3. The variation in the percentage of electrons that exceed 10 and 100 keV, for interaction conditions consistent with Scenario 1 and Scenario 3. (a) Refers to the percentage of electrons that exceed 10 keV in temperature, while (b) refers to those exceeding 100 keV.

Figure 4. The variation in the percentage of electrons that exceed 10 and 100 keV, for interaction conditions consistent with Scenario 2 and Scenario 4. (a) Refers to the percentage of electrons that exceed 10 keV in temperature, while (b) refers to those exceeding 100 keV.



Prior to migrating towards deep learning, some trials were made with an unsupervised network, namely a SOM. The same training sets were used, just that the data was organized differently: one entry in the training set consists of a 5 × 10 matrix. The matrix's columns stand for:

plasma (ionization degree, initial electronic temperature, initial plasma density, final plasma density, maximum plasma density), laser (intensity, wavelength, pulse duration, polarization, incidence angle) and 8 columns characterizing 8 different high order harmonics including the highest one (order, intensity, wavelength, duration, conversion efficiency). Several topologies were tested; however, just one of them yielded satisfactory results, namely a 2D network. The neurons' positions in the map were optimized based on Euclidean distance minimization and the competitive learning principle [117, 118]. SOM1 has a total of 16 × 21 nodes, arranged on a regular rectangular grid, with 16 nodes for mapping the harmonics' intensity and 21 for the orders of the harmonics. While a color code was employed for the duration of pulses, the wavelengths and conversion efficiencies were derived computationally and written in an additional text file accompanying the map. The large number of nodes in this network is the consequence of the need for a better visualization of the final results. However, this weighs considerably in terms of the number of training epochs and computation time, and it was found that the SOM required far more resources than the MLPs and took longer to train. In principle, it would be ideal to add more units and some algorithmic improvements, along with the elimination of the accompanying text file and the associated computationally derived values. This basically means a SOM with more than two dimensions, which, at the time, was nearly impossible to implement. Hence, I desisted from pursuing the development of predictive modeling using unsupervised learning. For exemplification, MLP performances in predicting high order harmonics and their features—for the interaction conditions described in Scenario 1—are displayed in Table 2, together with the SOM's and the results obtained from PIC simulations. The agreement between the forecasts of the MLPs and the ones of the SOM is quite good, the values

Overcoming Challenges in Predictive Modeling of Laser-Plasma Interaction Scenarios. The Sinuous Route from…

http://dx.doi.org/10.5772/intechopen.72844


3.2. Deep learning: Towards improved predictive systems for HHG experiments

In view of building better predictive systems, and even recommender systems, for optimized laser-plasma interaction experiments, hardware upgrades were made first. Apart from adding an extra cluster node, replacing the storage hard drives in all computers with higher-capacity ones and adding an extra 8 GB of RAM to each of them, a total of four GeForce GTX Titan cards were attached to the cluster, one per node. At the most basic level, deep learning networks can be viewed as modified MLPs that contain a high number of units and layers and are algorithmically more complex than the classical MLPs. Hence, the GPUs provide support for the heavy computations. The Docker engine was installed on the GPU nodes along with the necessary Nvidia drivers and nvidia-docker. A Docker image containing Theano, TensorFlow, Keras, Caffe, cuDNN and, of course, CUDA 8.0 and Ubuntu 14.04 was downloaded from GitHub, built and deployed as a container on the GPUs. All the deep learning based predictive modeling systems described in this chapter were discovered (structurally), trained, built and tested using these libraries; the optimal ones were implemented and deployed on the Hadoop cluster. The containerization of GPU applications provides important benefits such as reproducible builds, ease of deployment and isolation of individual devices running across heterogeneous driver/toolkit environments, with only the Nvidia drivers needing to be installed on the host. The images are agnostic of the Nvidia driver, the required character devices and driver files being mounted when starting the container on the target machine.
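As a rough sketch, an image of the kind described above could be defined along the following lines. The base image tag and package list are assumptions for illustration; the image actually used was pulled prebuilt from GitHub:

```dockerfile
# Illustrative sketch only; base image tag and versions are assumed.
FROM nvidia/cuda:8.0-cudnn5-devel-ubuntu14.04

# Python scientific stack plus the deep learning libraries named in the text.
RUN apt-get update && apt-get install -y python-pip python-dev && \
    pip install theano tensorflow-gpu keras
# Caffe would be built from source in a similar fashion.

# nvidia-docker mounts the host's character devices and driver files at run
# time, which is what keeps the image agnostic of the installed driver.
```

Started through nvidia-docker, such a container sees the GPUs directly while remaining reproducible across the four heterogeneous cluster nodes.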

The deep learning based predictive modeling systems were, at first, designated for the same HHG experiments. However, the data lake increasingly incorporates other related interaction data. It is expected that more available information on what happens during various experiments performed in similar conditions will help to better understand the physics of the interaction and, consequently, to foresee what phenomena might occur. The huge data sets needed for training, after having been subject to MapReduce, have to be transferred to the GPU nodes. While the GPU memory system provides a higher bandwidth compared to the CPU memory system, transferring data between the main memory and the GPU memory is very slow: copying via DMA to and from the GPU over the PCIe bus involves expensive context switches that reduce the available bandwidth considerably. This is why directives such as "gmp shared" and "gmp
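The transfer-cost argument above can be made concrete with a toy cost model: each copy pays a fixed setup cost on top of the bandwidth-limited transfer time, so a few large batches beat many small ones. The latency and bandwidth figures below are assumptions for illustration, not measurements from the cluster:

```python
# Toy model of amortizing host-to-GPU copy overhead over the PCIe bus.
# Latency and bandwidth values are assumed, for illustration only.
import numpy as np

TRANSFER_LATENCY = 10e-6  # assumed fixed cost per copy [s] (setup, context switch)
BANDWIDTH = 8e9           # assumed effective PCIe bandwidth [bytes/s]

def transfer_cost(n_bytes, n_copies):
    """Total time to move n_bytes split into n_copies equal transfers."""
    return n_copies * TRANSFER_LATENCY + n_bytes / BANDWIDTH

data = np.zeros((1_000_000, 10), dtype=np.float32)  # ~40 MB of training rows

many_small = transfer_cost(data.nbytes, n_copies=10_000)  # row-by-row copies
few_large = transfer_cost(data.nbytes, n_copies=4)        # large batched copies
```

With these assumed figures the per-copy setup cost dominates the row-by-row variant, which is precisely why training data is staged to the GPU nodes in large batches after the MapReduce step.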




Comparative results for harmonics of orders 10, 20, 30, 40 and 50.

Table 2. Predictive modeling of HHG Scenario 1 using a SOM.

