3.2 Quantum neural reinforcement learning, robotics, and quantum adaptive computation

Quantum robotics requires the development of quantum adaptive algorithms that allow a robot to process alternatives and select appropriate actions using quantum rules [6, 13–15], that is, to incorporate decision-making in quantum AI. In this context, there are two major types of artificial agents that one may consider:

• classical agents that implement classical actions but whose cognitive substrate is quantum computational;

• quantum agents that implement quantum operations on a quantum target.

The first type of agent can be addressed as a classical robot dealing with problems at a classical level but whose computational substrate is run, via cloud access, on a quantum computer, pointing toward a possible future where quantum computation is incorporated in different robotic systems by way of the internet of things and cloud-based access to quantum devices.

The second type of agent corresponds to quantum software robots (quantum bots) that are implemented within a quantum computer and can be used for the adaptive management of target quantum registers and for more complex adaptive computation [6, 13–15].

This second type of agent forms the basis for AI solutions aimed at intelligent quantum computing systems, with applications in quantum internet technologies and, possibly, quantum adaptive error correction.

This latter point (quantum adaptive error correction) must draw on empirical implementation in physical devices, since it is this implementation that may ultimately test the best adaptive algorithms for quantum error correction. A basic direction, in this case, is echo strengthening, aimed at diminishing the echoes coming from alternatives that do not fall within an intended computation.

We do not address this last point here; instead, we illustrate the implementation of the first type of agent in the context of an adaptive computation on a classical gamble, namely, optimal action selection in a classical gambling problem through quantum neural reinforcement learning (QNRL).

In this case, the artificial agent is dealing with a classical problem and implementing its decision processing on a QUANN: the agent has an action set described by $2^d$ binary strings; following an evolutionary computation framework, we use d-length genetic codes to address actions, so that each action's code is composed of d loci, each with two alleles, 0 and 1.
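For concreteness, a short Python sketch of this encoding (the value of d is an arbitrary illustration):

```python
from itertools import product

d = 3  # illustrative code length
# Each action code has d loci, each locus carrying one of two alleles, 0 or 1.
action_codes = list(product((0, 1), repeat=d))
print(len(action_codes))                  # 2**d = 8 alternative actions
print(action_codes[0], action_codes[-1])  # (0, 0, 0) and (1, 1, 1)
```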

Now, given each alternative action, the agent is offered a classical gamble on a measurable space $(\Omega, \wp)$, where $\wp$ is a sigma-algebra of subsets of $\Omega$ and $\Omega = \{w\_0, w\_1, \ldots, w\_{N-1}\}$ is the set of rewards for the gamble, which we consider, in this example, to be discrete, although the results also apply to continuous reward spaces and (classical) probability distributions.

Now, for each action genetic code $\mathbf{s} \in \mathcal{A}\_2^d$ there is a corresponding gamble probability measure $P\_\mathbf{s}$ that is offered to the agent, so that the conditional expected value for the reward $w$ can be calculated as:

$$E[\boldsymbol{w}|\mathbf{s}] = \sum\_{n=0}^{N-1} \boldsymbol{w}\_n P\_\mathbf{s}[\boldsymbol{w}\_n] \tag{28}$$

The goal for the agent is to select the action that maximizes this conditional expected reward, that is:

$$\mathbf{s}^\* = \underset{\mathbf{s}}{\text{arg }\max} \, E[w|\mathbf{s}] \tag{29}$$
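Before turning to the quantum implementation, Eqs. (28) and (29) can be checked classically; the sketch below uses the four-action probability profile from the experiments reported at the end of this section, with rewards $\{-1, 1\}$:

```python
from itertools import product

rewards = [-1, 1]
# P_s[w = -1] and P_s[w = 1] for each genetic code s (the Figure 5 profile).
P = {(0, 0): [0.4, 0.6], (0, 1): [0.6, 0.4],
     (1, 0): [0.2, 0.8], (1, 1): [0.1, 0.9]}

def expected_reward(s):
    """Conditional expected reward E[w|s] of Eq. (28)."""
    return sum(w * p for w, p in zip(rewards, P[s]))

actions = list(product((0, 1), repeat=2))
s_star = max(actions, key=expected_reward)  # the argmax of Eq. (29)
print(s_star, expected_reward(s_star))      # (1, 1) 0.8
```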

To solve the optimization problem in Eq. (29), we use a variant of QNRL that applies modular networked learning [16]: instead of a single neural network for a single problem, we expand the cognitive architecture and work with a modular system of neural networks.

Modular networked learning (MNL) was addressed in [16] and applied to financial market prediction; there, instead of a single problem and a single target, one uses an expanded cognitive architecture to work on multiple targets, with a module assigned to each target and possible links between the modules used to map links between subproblems of a more complex problem.

For modular neural networks, the resulting cognitive architecture resembles an artificial brain with specialized "brain regions" devoted to different tasks and connections between different neural modules corresponding to connections between different brain regions. In the present case, the agent's "artificial brain" (as shown in Figure 4) is comprised of three "brain regions" connected with each other for a specific functionality, where the first module (first brain region) corresponds to the action exploration region, the second module (second brain region) corresponds to the reward processing region, and the third module to the decision region.
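As a concrete register layout, a minimal Qiskit sketch might allocate one d-qubit register per brain region (the register names and the value of d are illustrative assumptions):

```python
from qiskit import QuantumRegister, ClassicalRegister, QuantumCircuit

d = 2  # illustrative code length: 2**d = 4 candidate actions
explore = QuantumRegister(d, 'explore')  # first brain region: action exploration
reward = QuantumRegister(d, 'reward')    # second brain region: reward processing
decide = QuantumRegister(d, 'decide')    # third brain region: decision
out = ClassicalRegister(d, 'out')        # readout of the decision region
brain = QuantumCircuit(explore, reward, decide, out)
```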

The connections between the modules follow the hierarchical process associated with the quantum reinforcement learning required for each action; Figure 4 expresses this relation. The reinforcement learning, in this case, is a form of quantum search, implemented on the above modular structure, that proceeds in two stages: the exploration stage and the exploitation stage.

In the exploration stage, the agent's first brain region, taking advantage of quantum superposition, explores with equal weights, in parallel, each alternative initial action, and the second brain region processes the conditional expected rewards; this last processing is based on optimizing quantum circuits [6], where the unitary operator for the second brain region incorporates the optimization itself.

Figure 4. Modular structure for the reward maximization problem.

The second brain region will work as a form of oracle in the remaining adaptive computation and allows the agent's artificial brain to implement an optimal expected reward-seeking dynamics.

Now, in the second phase of the exploration stage, the synaptic connections from the first to the second brain region are activated, leading to a quantum entangled dynamics between the two brain regions, where the first region acts as the control (input layer) and the second as the target (output layer).

Thus, at the end of the exploration stage, the first two brain regions exhibit an entangled dynamics. This is a basic point of quantum strategic cognition, in the sense that the processing of the alternative courses of action is not localized in a specific neuron or neurons, but rather it leads to quantum correlations between different brain regions; these connections allow the artificial brain to efficiently select the best course of action, from the evaluation of the alternatives and rewards.

In the exploitation stage, the synaptic connections from the first brain region (the action exploration region) to the third brain region (the decision region) are activated first, so that the decision region is first processing the explored alternative actions, becoming entangled with the action exploration region; then, the synaptic connections between the reward processing region and the decision region are activated for the conditional expected reward processing by the decision module. In this way, the decision module makes the transition to the optimal action, consulting the "oracle" (which is the reward processing module) only once.

The artificial brain thus takes advantage of quantum entanglement in order to adaptively output the optimal action. Formalizing this dynamics, the artificial brain is initialized in a nonfiring probe and response dynamics so that the initial density is:

$$
\hat{\rho}\_{\mathbf{0}} = |\mathbf{0}\rangle\langle\mathbf{0}|^{\otimes 3d} \tag{30}
$$


Now, we denote by $\mathbf{s}\_k^\*$ the k-th bit in the string $\mathbf{s}^\*$, and use the following notation for the maximization in Eq. (29) evaluated at the k-th bit:

$$\mathbf{s}\_k^\* = \underset{\mathbf{s},k}{\arg\max} \, E[w|\mathbf{s}] \tag{31}$$

Using this notation, the first phase of the exploration stage is given by the unitary operator:

$$\hat{U}\_1 = \hat{U}\_{WH}^{\otimes d} \otimes\_{k=1}^{d} \left( \cos\left( \frac{\underset{\mathbf{s},k}{\arg\max}\, E[w|\mathbf{s}]}{2}\pi \right)\hat{1} - i\sin\left( \frac{\underset{\mathbf{s},k}{\arg\max}\, E[w|\mathbf{s}]}{2}\pi \right)\hat{\sigma}\_2 \right) \otimes \hat{1}^{\otimes d} \tag{32}$$

The operator incorporates the optimization dynamics into the conditional quantum gates' parameters themselves. Since we have:

$$\left( \cos\left( \frac{\underset{\mathbf{s},k}{\arg\max}\, E[w|\mathbf{s}]}{2}\pi \right)\hat{1} - i\sin\left( \frac{\underset{\mathbf{s},k}{\arg\max}\, E[w|\mathbf{s}]}{2}\pi \right)\hat{\sigma}\_2 \right)|0\rangle = \cos\left( \frac{\underset{\mathbf{s},k}{\arg\max}\, E[w|\mathbf{s}]}{2}\pi \right)|0\rangle + \sin\left( \frac{\underset{\mathbf{s},k}{\arg\max}\, E[w|\mathbf{s}]}{2}\pi \right)|1\rangle = |\mathbf{s}\_k^\*\rangle \tag{33}$$

after the first phase of the exploration stage, the resulting density is given by:

$$\hat{\rho}\_1 = \hat{U}\_1 \hat{\rho}\_0 \hat{U}\_1^\dagger = |+\rangle\langle+|^{\otimes d} \otimes |\mathbf{s}^\*\rangle\langle\mathbf{s}^\*| \otimes |\mathbf{0}\rangle\langle\mathbf{0}|^{\otimes d} \tag{34}$$

Thus, the neural field is probing, for the first brain region, each alternative neural pattern (each alternative action) with equal weight; the response dynamics also comes, for the first brain region, from each alternative neural pattern with equal weight. This means that the echoes for the first brain region are independent of the echoes for the remaining brain regions and show an equal intensity associated with each alternative neural pattern.

On the other hand, for the second brain region, the neural field exhibits a reward-seeking dynamics that is adaptive with respect to the optimal action; that is, the probing dynamics is directed toward the optimal action and the response dynamics also comes from the optimal action, so that, due to the adaptive unitary propagation, the second brain region is projecting over the optimum value, and this is the only echo that it gets.

The third brain region still has a projective dynamics toward the nonfiring neural activity.
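Since $\mathbf{s}\_k^\* \in \{0, 1\}$, the conditional gate in Eqs. (32) and (33) coincides with a $R_y(\mathbf{s}\_k^\*\,\pi)$ rotation, so the first phase can be sketched in Qiskit as follows (d and the optimal bits are illustrative assumptions; $\hat{U}_{WH}$ is the Walsh-Hadamard transform):

```python
import numpy as np
from qiskit import QuantumCircuit

d = 2
s_star = [1, 1]  # illustrative optimal bits, from the classical argmax of Eq. (29)

u1 = QuantumCircuit(3 * d)
for k in range(d):
    u1.h(k)  # Walsh-Hadamard on the exploration region: uniform superposition
    # cos(s*_k π/2)·1 - i·sin(s*_k π/2)·σ2 equals the rotation RY(s*_k · π),
    # which takes |0> to |s*_k> on the k-th reward-region qubit.
    u1.ry(s_star[k] * np.pi, d + k)
# The decision region (qubits 2d .. 3d-1) is left untouched, as in Eq. (32).
```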

Now, for the second phase of the exploration stage, we have the operator:

$$\hat{U}\_2 = \left[ \sum\_{\mathbf{s} \in \mathcal{A}\_2^d} |\mathbf{s}\rangle\langle\mathbf{s}| \otimes\_{k=1}^d \left( (\mathbf{1} - s\_k)\hat{\mathbf{1}} + s\_k \hat{\sigma}\_1 \right) \right] \otimes \hat{\mathbf{1}}^{\otimes d} \tag{35}$$

which leads to the density after the second phase of the exploration stage:

$$\hat{\rho}\_2 = \hat{U}\_2 \hat{\rho}\_1 \hat{U}\_2^\dagger = \sum\_{\mathbf{r}, \mathbf{s} \in \mathcal{A}\_2^d} \frac{|\mathbf{r}\rangle\langle\mathbf{s}| \otimes\_{k=1}^d |\mathbf{s}\_k^\* \oplus r\_k\rangle\langle\mathbf{s}\_k^\* \oplus s\_k|}{2^d} \otimes |\mathbf{0}\rangle\langle\mathbf{0}|^{\otimes d} \tag{36}$$

Thus, after the second phase, the first and second brain regions exhibit an entangled probe and response dynamics, where the neural field, for the second brain region, is effectively computing both the rewards and the explored actions.
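In gate terms, the operator of Eq. (35) acts locus by locus as a CNOT from the exploration region onto the reward region; a minimal sketch (the register sizes are illustrative):

```python
from qiskit import QuantumCircuit

d = 2
u2 = QuantumCircuit(3 * d)
# For each locus k, (1 - s_k)·1 + s_k·σ1 flips the k-th reward qubit
# exactly when the k-th exploration qubit is |1>: a transversal CNOT.
for k in range(d):
    u2.cx(k, d + k)
```

The operators of Eqs. (37) and (39) below share this transversal-CNOT structure, with the decision region as target.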

Next comes the exploitation stage with the neural processing for the decision module (the third brain region).

The first step of the exploitation stage is the processing of the initially explored actions, by way of the operator:

$$\hat{U}\_3 = \sum\_{\mathbf{s} \in \mathcal{A}\_2^d} |\mathbf{s}\rangle\langle\mathbf{s}| \otimes \hat{\mathbf{1}}^{\otimes d} \otimes\_{k=1}^d \left( (\mathbf{1} - s\_k)\hat{\mathbf{1}} + s\_k \hat{\sigma}\_1 \right) \tag{37}$$

which leads to the density:

$$\hat{\rho}\_{3} = \hat{U}\_{3}\hat{\rho}\_{2}\hat{U}\_{3}^{\dagger} = \sum\_{\mathbf{r}, \mathbf{s}\in\mathcal{A}\_{2}^{d}} \frac{|\mathbf{r}\rangle\langle\mathbf{s}|\otimes\_{k=1}^{d}\big|\mathbf{s}\_{k}^{\*}\oplus r\_{k}\big\rangle\langle\mathbf{s}\_{k}^{\*}\oplus s\_{k}| \otimes |\mathbf{r}\rangle\langle\mathbf{s}|}{2^{d}}\tag{38}$$

That is, the probe and response dynamics for the third brain region are correlated and coincident with the probe and response dynamics for the first brain region, so that the third brain region is effectively computing the initially explored actions.

Now, the second step for the third brain region results from the activation of the synaptic links with the second brain region, leading to the conditional unitary operator:

$$\hat{U}\_4 = \sum\_{\mathbf{s} \in \mathcal{A}\_2^d} \hat{\mathbf{1}}^{\otimes d} \otimes |\mathbf{s}\rangle\langle\mathbf{s}| \otimes\_{k=1}^d \left( (\mathbf{1} - s\_k)\hat{\mathbf{1}} + s\_k \hat{\sigma}\_1 \right) \tag{39}$$

Under this operator, we get the final density:

$$\hat{\rho}\_{4} = \hat{U}\_{4}\hat{\rho}\_{3}\hat{U}\_{4}^{\dagger} = \sum\_{\mathbf{r}, \mathbf{s}\in\mathcal{A}\_{2}^{d}} \frac{|\mathbf{r}\rangle\langle\mathbf{s}| \otimes\_{k=1}^{d} \left( \big|\mathbf{s}\_{k}^{\*}\oplus r\_{k}\big\rangle\langle\mathbf{s}\_{k}^{\*}\oplus s\_{k}| \otimes \big|r\_{k}\oplus\big(\mathbf{s}\_{k}^{\*}\oplus r\_{k}\big)\big\rangle\langle s\_{k}\oplus\big(\mathbf{s}\_{k}^{\*}\oplus s\_{k}\big)| \right)}{2^{d}} \tag{40}$$

Since we have the Boolean equality $p \oplus (q \oplus p) = q$, the above density can be simplified, so that the neural field's probe and response dynamics for the third brain region projects over the optimal action:

$$\hat{\rho}\_4 = \hat{U}\_4 \hat{\rho}\_3 \hat{U}\_4^\dagger = \left( \sum\_{\mathbf{r}, \mathbf{s} \in \mathcal{A}\_2^d} \frac{|\mathbf{r}\rangle\langle\mathbf{s}| \otimes\_{k=1}^d |\mathbf{s}\_k^\* \oplus r\_k\rangle\langle\mathbf{s}\_k^\* \oplus s\_k|}{2^d} \right) \otimes |\mathbf{s}^\*\rangle\langle\mathbf{s}^\*| \tag{41}$$

The third brain region's computation takes advantage of the entangled dynamics between the first and second brain regions to learn the optimal action. For the final density, while the first and second brain regions exhibit an entangled probe and response dynamics, the third brain region is always projecting over the optimum.
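The step from Eq. (40) to Eq. (41) rests on the Boolean identity $p \oplus (q \oplus p) = q$, which a two-line check confirms exhaustively:

```python
# XOR cancellation used to simplify Eq. (40) into Eq. (41): p ⊕ (q ⊕ p) = q.
assert all(p ^ (q ^ p) == q for p in (0, 1) for q in (0, 1))
```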

It is important to stress how QNRL takes advantage of quantum entanglement: the neural field for the third brain region follows each alternative action and then the reward processing dynamics, finding the optimum along all these alternative paths, so that the optimal action is always selected by the agent.
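Putting the four operators together gives a minimal end-to-end sketch (not the appendix implementation), here assuming d = 1 and the two-action gamble analyzed next, for which the classical argmax gives s* = 1:

```python
import numpy as np
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

d = 1
s_star = [1]  # from the classical argmax: P_1[w=1] = 0.6 > P_0[w=1] = 0.4

qc = QuantumCircuit(3 * d)
for k in range(d):
    qc.h(k)                          # U1: explore both actions in superposition
    qc.ry(s_star[k] * np.pi, d + k)  # U1: encode the optimum in the reward region
for k in range(d):
    qc.cx(k, d + k)                  # U2: entangle exploration and reward regions
for k in range(d):
    qc.cx(k, 2 * d + k)              # U3: copy the explored action to the decision region
for k in range(d):
    qc.cx(d + k, 2 * d + k)          # U4: decision qubit becomes r ⊕ (s* ⊕ r) = s*

# Noiselessly, the decision region always reads s*, as in Eq. (41).
probs = Statevector(qc).probabilities([2 * d + k for k in range(d)])
print(probs)  # [0., 1.]: action 1 is selected with probability 1
```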

As an example of the above problem, let us consider the case where the reward set is $\Omega = \{-1, 1\}$ and there are two possible actions, 0 and 1, leading, respectively, to the classical probability measures $P\_0$ and $P\_1$, with $P\_0[w=1] = 0.4$ and $P\_1[w=1] = 0.6$; then, we get the probabilities of selection for each gamble and device shown in Table 1, for 8192 repeated experiments.

| Device | Action 0 | Action 1 |
| --- | --- | --- |
| QASM | 0 | 1 |
| Tenerife | 0.222 | 0.778 |
| Melbourne | 0.353 | 0.647 |

Table 1. Results for two alternative actions using the QASM simulator, the Tenerife device (ibmqx4), and the Melbourne device (ibmq\_16\_melbourne); in each case, 8192 shots were used, with $P\_0[w=1] = 0.4$ and $P\_1[w=1] = 0.6$.

As expected, the QASM simulator always selects the action 1, which is the best performing action by the conditional expected payoff criterion. The Tenerife device selects the correct action with a proportion of 0.778, while the Melbourne device selects the correct action with a proportion of 0.647. If, instead of the above gamble profile, we had $P\_0[w=1] = 0.6$ and $P\_1[w=1] = 0.4$, the optimal choice would be the action 0; in this case, as shown in Table 2, the QASM simulator, again, selects the correct action each time. The Tenerife device, in turn, selects the correct action with a 0.857 frequency and the Melbourne device with a 0.814 frequency.

| Device | Action 0 | Action 1 |
| --- | --- | --- |
| QASM | 1 | 0 |
| Tenerife | 0.857 | 0.143 |
| Melbourne | 0.814 | 0.186 |

Table 2. Results for two actions using the QASM simulator, the Tenerife device (ibmqx4), and the Melbourne device (ibmq\_16\_melbourne); in each case, 8192 shots were used, with $P\_0[w=1] = 0.6$ and $P\_1[w=1] = 0.4$.

In Figure 5, we show the Melbourne device's results<sup>4</sup> when we have four actions for the same rewards profile, with the probabilities $P\_{00}[w=1] = 0.6$, $P\_{01}[w=1] = 0.4$, $P\_{10}[w=1] = 0.8$, and $P\_{11}[w=1] = 0.9$, still setting the rewards to $\Omega = \{-1, 1\}$.

In this case, if we run the experiment on the QASM backend, with 8192 shots, we get the action encoded by the string 11 with relative frequency equal to 1, which is the optimal action. If we run the experiment with the same number of shots on the Melbourne device, then, as shown in Figure 5, the output 11 is still the dominant action, however with a proportion of 0.370, the second dominant action being nonresidual and occurring, with a value of 0.309, for the output 10.

Figure 5. Results for four actions using the Melbourne device ("ibmq\_16\_melbourne"), with 8192 shots used, and probability profiles given by: $P\_{00}[w=1] = 0.6$, $P\_{01}[w=1] = 0.4$, $P\_{10}[w=1] = 0.8$, and $P\_{11}[w=1] = 0.9$.

Therefore, the first qubit tends to be measured with the right pattern with a proportion of 0.679 (0.309 + 0.370); the probability of the second qubit being correct given that the first is correct is only about 0.545 (0.370/0.679). This suggests that the deviation may be due to the entanglement with the environment significantly deviating the second qubit from the correct pattern.

The above algorithm was implemented using Qiskit and Python's object-oriented programming (OOP); the code, shown in the appendix, exemplifies how OOP can be integrated with quantum computation for implementing quantum AI on any cloud-accessible quantum device.

<sup>4</sup> We can only use the Melbourne device since the Tenerife device does not have the required capacity in terms of number of quantum registers.
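While the appendix contains the full implementation, a compressed illustration of the OOP pattern (the class name and interface are assumptions, not the appendix code, and the run is on the statevector simulator rather than a real backend) could look as follows:

```python
import numpy as np
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

class QNRLAgent:
    """Hypothetical wrapper illustrating the OOP pattern; the chapter's
    appendix code differs in detail and targets real IBM Q backends."""

    def __init__(self, d, s_star):
        self.d = d
        self.s_star = list(s_star)  # optimal bits from the classical argmax

    def circuit(self):
        d, qc = self.d, QuantumCircuit(3 * self.d)
        for k in range(d):
            qc.h(k)                               # U1
            qc.ry(self.s_star[k] * np.pi, d + k)  # U1
            qc.cx(k, d + k)                       # U2
            qc.cx(k, 2 * d + k)                   # U3
            qc.cx(d + k, 2 * d + k)               # U4
        return qc

    def decide(self):
        probs = Statevector(self.circuit()).probabilities(
            list(range(2 * self.d, 3 * self.d)))
        return int(np.argmax(probs))  # little-endian index of the decision pattern

print(QNRLAgent(2, [1, 1]).decide())  # 3, i.e. the code 11
```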
