**3. Basics of PCFG**

In this section, we explain the basic concepts of PCFG.

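A context-free grammar (CFG) *G* is defined by four components, *G* = {N, T, R, B}, whose meanings are listed below.

- N: finite set of non-terminal symbols
- T: finite set of terminal symbols
- R: finite set of production rules
- B: start symbol (B ∈ N)
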
It is important to note that the terms "non-terminal" and "terminal" in CFG differ in meaning from those in GP: in symbolic regression problems, for example, not only the variables *x* and *y* but also sin and + are treated as terminals in CFG. In CFG, sentences are generated by applying production rules to non-terminal symbols; a production rule generally has the form

$$A \to \alpha \quad (A \in \mathcal{N},\ \alpha \in (\mathcal{N} \cup \mathcal{T})^{*}). \tag{1}$$
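As a minimal sketch of this definition, the following Python snippet stores a grammar as its four components and checks that every rule has the form of Eq. (1). The class name `CFG`, its field names, and the toy grammar are illustrative assumptions, not part of the original formulation. The sketch also assumes the usual CFG conventions that B ∈ N and that N and T are disjoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """Hypothetical container for a context-free grammar G = {N, T, R, B}."""
    nonterminals: frozenset  # N
    terminals: frozenset     # T
    rules: tuple             # R: pairs (A, alpha), with alpha a tuple of symbols
    start: str               # B

    def is_well_formed(self) -> bool:
        """Check B in N, N and T disjoint, and every rule A -> alpha obeys Eq. (1)."""
        if self.start not in self.nonterminals:
            return False
        if self.nonterminals & self.terminals:
            return False
        symbols = self.nonterminals | self.terminals
        return all(
            lhs in self.nonterminals and all(s in symbols for s in rhs)
            for lhs, rhs in self.rules
        )


# Toy grammar (an assumption for illustration): S -> a S | a
toy = CFG(
    nonterminals=frozenset({"S"}),
    terminals=frozenset({"a"}),
    rules=(("S", ("a", "S")), ("S", ("a",))),
    start="S",
)
print(toy.is_well_formed())
```

Representing each right-hand side as a tuple keeps the empty string (an empty tuple) a legal value of α, as allowed by the Kleene star in Eq. (1).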

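3. A branch node is an element of N.

4. If the children of *A* ∈ N are *α*<sub>1</sub>*α*<sub>2</sub> ··· *α*<sub>*k*</sub> (*α*<sub>*i*</sub> ∈ (N ∪ T)) from the left, the production rule *A* → *α*<sub>1</sub>*α*<sub>2</sub> ··· *α*<sub>*k*</sub> is an element of R.

We next explain CFG with an example. We now consider a univariate function *f*(*x*) composed of sin, cos, exp, log and arithmetic operators (+, −, × and ÷). A grammar *G*<sub>reg</sub> can be defined as follows.

$$\mathcal{N} = \{\langle\mathit{expr}\rangle, \langle\mathit{op}_2\rangle, \langle\mathit{op}_1\rangle, \langle\mathit{var}\rangle, \langle\mathit{const}\rangle\}, \quad \mathcal{T} = \{+, -, \times, \div, \sin, \cos, \exp, \log, x, C\}.$$

We define the following production rules.

$$\begin{aligned}
\langle\mathit{expr}\rangle &\to \langle\mathit{op}_2\rangle\ \langle\mathit{expr}\rangle\ \langle\mathit{expr}\rangle \\
\langle\mathit{expr}\rangle &\to \langle\mathit{op}_1\rangle\ \langle\mathit{expr}\rangle \\
\langle\mathit{expr}\rangle &\to \langle\mathit{var}\rangle \\
\langle\mathit{expr}\rangle &\to \langle\mathit{const}\rangle \\
\langle\mathit{op}_2\rangle &\to + \\
\langle\mathit{op}_2\rangle &\to - \\
\langle\mathit{op}_2\rangle &\to \times \\
\langle\mathit{op}_2\rangle &\to \div \\
\langle\mathit{op}_1\rangle &\to \sin \\
\langle\mathit{op}_1\rangle &\to \cos \\
\langle\mathit{op}_1\rangle &\to \exp \\
\langle\mathit{op}_1\rangle &\to \log \\
\langle\mathit{var}\rangle &\to x \\
\langle\mathit{const}\rangle &\to C \quad \text{(constant)}
\end{aligned}$$

*G*<sub>reg</sub> derives univariate functions by applying the production rules. Suppose we have the following derivation:

$$\begin{aligned}
\langle\mathit{expr}\rangle &\to \langle\mathit{op}_2\rangle\ \langle\mathit{expr}\rangle\ \langle\mathit{expr}\rangle \\
&\to +\ \langle\mathit{expr}\rangle\ \langle\mathit{expr}\rangle \\
&\to +\ \langle\mathit{op}_2\rangle\ \langle\mathit{expr}\rangle\ \langle\mathit{expr}\rangle\ \langle\mathit{expr}\rangle \\
&\to +\ +\ \langle\mathit{expr}\rangle\ \langle\mathit{expr}\rangle\ \langle\mathit{expr}\rangle \\
&\to +\ +\ \langle\mathit{op}_1\rangle\ \langle\mathit{expr}\rangle\ \langle\mathit{expr}\rangle\ \langle\mathit{expr}\rangle \\
&\to +\ +\ \log\ \langle\mathit{expr}\rangle\ \langle\mathit{expr}\rangle\ \langle\mathit{expr}\rangle \\
&\to +\ +\ \log\ \langle\mathit{var}\rangle\ \langle\mathit{expr}\rangle\ \langle\mathit{expr}\rangle \\
&\to +\ +\ \log\ x\ \langle\mathit{expr}\rangle\ \langle\mathit{expr}\rangle \\
&\to +\ +\ \log\ x\ \langle\mathit{var}\rangle\ \langle\mathit{expr}\rangle \\
&\to +\ +\ \log\ x\ x\ \langle\mathit{expr}\rangle \\
&\to +\ +\ \log\ x\ x\ \langle\mathit{const}\rangle \\
&\to +\ +\ \log\ x\ x\ C.
\end{aligned}$$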
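The derivation of + + log *x x C* can be replayed mechanically. The sketch below encodes the production rules of *G*<sub>reg</sub> as a Python dictionary and always rewrites the leftmost non-terminal, which is the order used in the derivation in this section. The names `RULES`, `derive_leftmost`, and `choices`, the angle-bracket strings, and the ASCII spellings `-`, `*` and `/` for −, × and ÷ are illustrative choices, not notation from the text.

```python
# Production rules of G_reg, keyed by non-terminal; each right-hand side is a tuple.
RULES = {
    "<expr>": [("<op2>", "<expr>", "<expr>"), ("<op1>", "<expr>"), ("<var>",), ("<const>",)],
    "<op2>": [("+",), ("-",), ("*",), ("/",)],
    "<op1>": [("sin",), ("cos",), ("exp",), ("log",)],
    "<var>": [("x",)],
    "<const>": [("C",)],
}


def derive_leftmost(choices, start="<expr>"):
    """Apply each chosen right-hand side to the leftmost non-terminal, one per step.

    Returns the list of sentential forms, beginning with the start symbol.
    """
    form = [start]
    steps = [list(form)]
    for rhs in choices:
        # Locate the leftmost symbol that is still a non-terminal.
        i = next(k for k, s in enumerate(form) if s in RULES)
        if rhs not in RULES[form[i]]:
            raise ValueError(f"{form[i]} -> {' '.join(rhs)} is not a rule of the grammar")
        form[i:i + 1] = rhs  # replace the non-terminal by the rule's right-hand side
        steps.append(list(form))
    return steps


# The right-hand sides applied in the derivation, in order.
choices = [
    ("<op2>", "<expr>", "<expr>"), ("+",),
    ("<op2>", "<expr>", "<expr>"), ("+",),
    ("<op1>", "<expr>"), ("log",),
    ("<var>",), ("x",),
    ("<var>",), ("x",),
    ("<const>",), ("C",),
]
for form in derive_leftmost(choices):
    print(" ".join(form))
```

The final sentence is in prefix notation, so + + log *x x C* reads as (log *x* + *x*) + *C*.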
