### **8.1 Meta-learning: learning to learn fast**

Meta-learning, also known as "learning to learn", aims to design models that can learn new skills or adapt to new environments rapidly from only a few training examples. There are three common approaches: 1) learn an efficient distance metric (metric-based); 2) use a (recurrent) network with external or internal memory (model-based); 3) optimize the model parameters explicitly for fast learning (optimization-based).

We expect a good meta-learning model to be capable of adapting or generalizing well to new tasks and new environments that were never encountered during training. The adaptation process, essentially a mini learning session, happens at test time but with only limited exposure to the new task configuration. Eventually, the adapted model can complete the new tasks. This is why meta-learning is also known as learning to learn.
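To make the adaptation step concrete, below is a minimal sketch of such a test-time "mini learning session", assuming a PyTorch-style setup: a copy of the meta-trained model is fine-tuned for a handful of gradient steps on the few labeled examples of the new task. The function name `adapt`, the choice of optimizer, and the step count are illustrative assumptions, not prescribed by the chapter.

```python
import copy
import torch
import torch.nn as nn

def adapt(meta_model, support_x, support_y, steps=5, lr=1e-2):
    """Briefly fine-tune a copy of the meta-trained model on a new task's
    few labeled examples (the "mini learning session" at test time)."""
    model = copy.deepcopy(meta_model)          # leave the meta-parameters untouched
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):                     # limited exposure: only a few steps
        loss = loss_fn(model(support_x), support_y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model                               # the adapted model handles the new task
```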

### **8.2 Defining the meta-learning problem**

A good meta-learning model should be trained over a variety of learning tasks and optimized for the best performance on a distribution of tasks, including potentially unseen tasks. Each task is associated with a dataset $\mathcal{D}$, containing both feature vectors and true labels. The optimal model parameters are:

$$\theta^* = \arg\min_{\theta} \mathbb{E}_{\mathcal{D} \sim p(\mathcal{D})} \left[ \mathcal{L}_\theta(\mathcal{D}) \right] \tag{3}$$

This looks very similar to an ordinary learning task, except that *one dataset* is treated as *one data sample*.
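As a rough illustration of Eq. (3), the sketch below (assuming a PyTorch setup) approximates the expectation over $p(\mathcal{D})$ by repeatedly sampling a dataset and taking a gradient step on its loss. Here `sample_dataset` is a hypothetical stand-in for the task distribution, and the linear model is only a placeholder.

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 2)                       # placeholder meta-model (theta)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

def sample_dataset():
    """Hypothetical sampler for one dataset D ~ p(D): features and labels."""
    x = torch.randn(8, 16)                     # 8 examples, 16 features
    y = torch.randint(0, 2, (8,))              # binary labels
    return x, y

for step in range(100):                        # Monte Carlo estimate of E_{D ~ p(D)}
    x, y = sample_dataset()                    # one dataset is one data sample
    loss = loss_fn(model(x), y)                # L_theta(D)
    opt.zero_grad()
    loss.backward()
    opt.step()                                 # move theta toward theta*
```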

*Few-shot classification* is an instantiation of meta-learning in the field of supervised learning. The dataset $\mathcal{D}$ is often split into two parts, a support set *S* for learning and a prediction set *B* for training or testing, $\mathcal{D} = \langle S, B \rangle$. Often we consider a *K*-shot *N*-class classification task: the support set contains *K* labeled examples for each of *N* classes.
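The episode construction can be sketched in a few lines of plain Python: sample *N* classes, then *K* support examples and a few held-out query examples per class. The `pool` of labeled examples and the split sizes below are toy assumptions for illustration only.

```python
import random

def sample_episode(pool, n_classes, k_shot, n_query):
    """Build one K-shot N-class episode: a support set S and a prediction set B."""
    classes = random.sample(list(pool), n_classes)           # pick N classes
    support, query = [], []
    for label in classes:
        examples = random.sample(pool[label], k_shot + n_query)
        support += [(x, label) for x in examples[:k_shot]]   # S: K per class
        query += [(x, label) for x in examples[k_shot:]]     # B: held-out queries
    return support, query

# Example: a 4-shot 2-class episode (as in Figure 4) from a toy pool.
pool = {c: [f"{c}_{i}" for i in range(10)] for c in ["cat", "otter", "dog"]}
S, B = sample_episode(pool, n_classes=2, k_shot=4, n_query=2)
```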

**Figure 4** shows an example of a 4-shot 2-class image classification task.

**Figure 4.** *An example of 4-shot 2-class image classification (image thumbnails are from Pinterest).*


From Section 4.2, we understand that crystallized intelligence (*Gc*) is an outgrowth of fluid intelligence (*Gf*). Thus the performance of crystallized intelligence is influenced by a trace of fluid intelligence, even though (*Gc*) and (*Gf*) are two separate, distinct components in putative tests of intelligence. Also, from **Figure 2**, we understand that the components of reasoning (*Gf*) and acquired knowledge (*Gc*) are derived from the top-level mental (neural) energy g. Hence, to make a very 'crude approximation' of the three-stratum C–H–C theory (see **Figure 3**), we adopt a deep meta-learning approach in which we integrate the power of deep learning into meta-learning. Both (*Gf*) and (*Gc*) are derived from the top-level mental (neural) energy, as shown in **Figure 2**, and we try to follow the three-layer hierarchy to derive the broad and narrow abilities needed to perform the specific task of a given job. We use the term 'crude approximation' because the top level of **Figure 2** or **Figure 3** can never be reached by the present state of the art in artificial neural networks. In particular, "mood" at the top level of **Figure 3** is a biological phenomenon that generates sufficient mental (neural) energy inside the brain under favorable mental conditions. Hence, under such circumstances we assume that sufficient neural (mental) energy is generated for the C–H–C theory to perform lower-level cognitive processes like crystallized and fluid intelligence.
