### **3.1 Methodology**

**Figure 1** depicts the methodology of our proposed approach. The approach works as a black box that receives an image of a fixed size and produces the 2D anatomical key points of every person in the image. After the needed preprocessing, the image is passed through a feed-forward convolutional neural network. The architecture has two separate branches that run simultaneously.


**Figure 1.**

*Schematic diagram of the multistage architecture. Two parallel branches feed forward through the network: the heat maps predict the approximate joint locations, while the part association vectors predict associations and orientations.*

### **3.2 Part detection using heat-maps**

The heat maps produced by the convolutional neural network are highly reliable supporting features. The heat maps are a set of matrices that store the network's confidence that a pixel contains a body joint. There are as many as 16 matrices, one for each of the true body joints. Each heat map specifies the probability that a particular joint exists at a particular pixel location, which supports the prediction of the joint location. A visual representation of the heat maps gives an intuition for the presence of a body joint: a darker shade or a sharper peak represents a high probability of a joint. Several peaks indicate a crowded image, with one peak per person (**Figure 2**).

Calculating the confidence map, or heat map, *C*<sup>∗</sup><sub>*jk*</sub> for each joint requires some prior information for comparison. Let *x*<sub>*jk*</sub> be the empirical position of body joint *j* of person *k*. The confidence map at any position *m* can be created from the empirical position *x*<sub>*jk*</sub>. The value of the confidence map at location *m* in *C*<sup>∗</sup><sub>*jk*</sub> is given by

$$\mathbf{C}_{jk}^{*}(m) = \exp\left(-\frac{\Delta^2}{\sigma^2}\right) \tag{1}$$

where σ controls the spread around the mean and *Δ* is the absolute difference between *x*<sub>*jk*</sub> and *m*.
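As a minimal sketch of Eq. (1), the following builds the confidence map of a single joint on a small pixel grid. The function name `confidence_map`, the grid size, and the value of `sigma` are illustrative assumptions, not values taken from the chapter, and *Δ* is taken here to be the Euclidean distance between the pixel and the joint.

```python
import math


def confidence_map(height, width, joint_xy, sigma):
    """Return a height x width grid with C(m) = exp(-Delta^2 / sigma^2).

    joint_xy is the (x, y) position of one joint of one person; Delta is
    assumed to be the Euclidean distance from pixel m to that position.
    """
    jx, jy = joint_xy
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            dist_sq = (x - jx) ** 2 + (y - jy) ** 2  # Delta squared
            row.append(math.exp(-dist_sq / sigma ** 2))
        grid.append(row)
    return grid


# Hypothetical example: a 5 x 5 image with a joint at x=2, y=3.
heat = confidence_map(height=5, width=5, joint_xy=(2, 3), sigma=1.0)
print(heat[3][2])  # value at the joint itself: 1.0
```

The map peaks at the joint position and decays with distance, which is why the sign of the exponent must be negative.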

*Articulated Human Pose Estimation Using Greedy Approach DOI: http://dx.doi.org/10.5772/intechopen.99354*

**Figure 2.**

*Illustration of one segment of the pipeline, i.e., predicting heat maps through the neural network. It gives a confidence metric for the presence of a particular body part at a given pixel.*

The network aggregates the individual confidence maps of all people into a final confidence map for each joint, taking the maximum over the individual maps at every position.

$$\mathbf{C}_{j}^{*}(m) = \max_{k} \mathbf{C}_{jk}^{*}(m) \tag{2}$$

These confidence maps are only rough approximations, but we need a single location for each joint, which we extract from the hot spot. In the final aggregated confidence map we therefore keep the peak value and suppress the rest.
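The aggregation and the peak extraction can be sketched as follows. The helper names `aggregate_max` and `find_peaks`, the 8-neighbour window, and the threshold of 0.5 are assumptions made for illustration; the chapter does not specify how the suppression is implemented.

```python
def aggregate_max(maps):
    """Pixel-wise maximum over the per-person maps of one joint (Eq. 2)."""
    height, width = len(maps[0]), len(maps[0][0])
    return [[max(m[y][x] for m in maps) for x in range(width)]
            for y in range(height)]


def find_peaks(heat, threshold=0.5):
    """Return (x, y, score) for pixels above threshold that are strictly
    greater than all of their (up to 8) neighbours; the rest is suppressed."""
    height, width = len(heat), len(heat[0])
    peaks = []
    for y in range(height):
        for x in range(width):
            value = heat[y][x]
            if value <= threshold:
                continue
            neighbours = [heat[ny][nx]
                          for ny in range(max(0, y - 1), min(height, y + 2))
                          for nx in range(max(0, x - 1), min(width, x + 2))
                          if (ny, nx) != (y, x)]
            if all(value > n for n in neighbours):
                peaks.append((x, y, value))
    return peaks


# Hypothetical maps of the same joint for two people.
person_a = [[0.0, 0.1, 0.0, 0.0, 0.0],
            [0.1, 0.9, 0.1, 0.0, 0.0],
            [0.0, 0.1, 0.0, 0.0, 0.0]]
person_b = [[0.0, 0.0, 0.0, 0.1, 0.0],
            [0.0, 0.0, 0.1, 0.8, 0.1],
            [0.0, 0.0, 0.0, 0.1, 0.0]]

combined = aggregate_max([person_a, person_b])
peaks = find_peaks(combined)
print(peaks)  # [(1, 1, 0.9), (3, 1, 0.8)]
```

Taking the maximum rather than the average keeps the two nearby peaks distinct, so each person still contributes one part candidate.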

### **3.3 Greedy part association vector**

The problem that arises when detecting the pose is that, even if we have all the anatomical key points, we still need to associate them. The hot spots, or key points, themselves carry no information about how they are connected. One way to approach this problem is to use the geometric midpoint of the line between two parts. This approach, however, suffers when the image is crowded, as it tends to give false associations. The false associations stem from a limitation of the approach: it encodes only the position of the pair and not its orientation, and it reduces the base of support to a single point. To address this issue, we implement a greedy approach known as the greedy part association vector (GPAV), which preserves the position along with the orientation across the entire area of support of the pair. Greedy part association vectors are 2D vector fields that provide information regarding the position and the orientation of the pairs. They form a set of coupled pairs, with one component representing the x axis and the other the y axis. There are around 38 GPAVs per pair, and they are indexed numerically (**Figure 3**).

Consider a limb *j* with two end points at *x*<sub>1</sub> and *x*<sub>2</sub> for the *k*th person in the image. The limb has many points between *x*<sub>1</sub> and *x*<sub>2</sub>. The greedy part association vector at any point *c* between *x*<sub>1</sub> and *x*<sub>2</sub> for the *k*th person, denoted *G*<sup>∗</sup><sub>*j*,*k*</sub>, can be calculated as

$$\mathbf{G}_{j,k}^{*}(c) = \begin{cases} \hat{\mathbf{c}} & \text{if } c \text{ is on limb } j \text{ of person } k, \\ \mathbf{0} & \text{otherwise,} \end{cases} \tag{3}$$

where *ĉ* is a unit vector along the direction of the limb, given by

$$\hat{\mathbf{c}} = \frac{x_2 - x_1}{\lVert x_2 - x_1 \rVert_2} \tag{4}$$

**Figure 3.**

*Illustration of the other segment of the pipeline: the greedy part association vectors preserve the position along with the orientation, and the joints are finally associated through greedy parsing.*

The empirical value of the final greedy part association vector is the average of the GPAVs of all the people in the image.

$$\mathbf{G}_{j}^{*}(c) = \frac{\sum_{k} \mathbf{G}_{j,k}^{*}(c)}{n_{j}(c)} \tag{5}$$

where *G*<sup>∗</sup><sub>*j*,*k*</sub> is the greedy part association vector at any point and *n*<sub>*j*</sub>(*c*) is the total number of non-zero vectors at the same point *c* among all people.
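The construction in Eqs. (3) to (5) can be sketched as below. The chapter only states that the vector is non-zero when *c* is "on" the limb, so the `limb_width` tolerance used here to decide membership is an assumption, as are the function names `limb_field` and `average_fields`.

```python
import math


def limb_field(height, width, x1, x2, limb_width=1.0):
    """Per-person field: the unit vector from x1 to x2 on the limb,
    (0, 0) elsewhere (Eqs. 3 and 4). Points are (x, y) tuples."""
    field = [[(0.0, 0.0)] * width for _ in range(height)]
    dx, dy = x2[0] - x1[0], x2[1] - x1[1]
    length = math.hypot(dx, dy)
    if length == 0.0:  # degenerate limb: no direction to encode
        return field
    ux, uy = dx / length, dy / length
    for y in range(height):
        for x in range(width):
            px, py = x - x1[0], y - x1[1]
            along = px * ux + py * uy        # projection onto the limb
            across = abs(px * uy - py * ux)  # perpendicular distance
            if 0.0 <= along <= length and across <= limb_width:
                field[y][x] = (ux, uy)
    return field


def average_fields(fields):
    """Average the non-zero vectors at each pixel over all people (Eq. 5)."""
    height, width = len(fields[0]), len(fields[0][0])
    out = [[(0.0, 0.0)] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            vectors = [f[y][x] for f in fields if f[y][x] != (0.0, 0.0)]
            if vectors:
                n = len(vectors)  # n_j(c)
                out[y][x] = (sum(v[0] for v in vectors) / n,
                             sum(v[1] for v in vectors) / n)
    return out


# Hypothetical example: the same limb type of two people crossing at (3, 2).
person_1 = limb_field(5, 7, x1=(1, 2), x2=(5, 2), limb_width=0.5)  # horizontal
person_2 = limb_field(5, 7, x1=(3, 0), x2=(3, 4), limb_width=0.5)  # vertical
gpav = average_fields([person_1, person_2])
print(gpav[2][3])  # both limbs contribute: (0.5, 0.5)
print(gpav[2][1])  # only the horizontal limb: (1.0, 0.0)
```

Dividing by the number of non-zero vectors rather than by the number of people keeps the field at full strength where only one limb is present.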

### **3.4 Multi person pose estimation**

After obtaining the part candidates using non-maximum suppression, we need to associate those body parts to form pairs. For each body part there are *n* part candidates for association. On an abstract level, one part can form an association with every possible part candidate, forming a complete graph (**Figure 4**).

For example, suppose we have detected a set of plausible neck candidates and a set of right hip candidates. For each neck candidate there is a possible connection with each of the right hip candidates, giving a complete bipartite graph with the part candidates as nodes and the possible connections as edges. We need to keep only the optimal associations, which gives rise to an N-dimensional matching problem that is itself NP-hard. To solve this optimal matching problem, we need to assign a weight to each possible connection. This is where the greedy part association vectors come into the pipeline: the weights are assigned using the aggregated greedy part association vector.

**Figure 4.**

*(a–d) Solving the assignment problem of associating body joints to form the right pairs. Weights are assigned to each possible connection with the help of the greedy part association vectors.*

To measure the association between two detected part candidates, we integrate the predicted greedy part association vector found in the previous section along the line segment joining them. This integral assigns a score to each possible connection, and the scores are stored in a complete bipartite graph. We need to find the directional orientation of the limb with respect to the detected part candidates. Suppose we have two detected part candidates, *t*<sub>1</sub> and *t*<sub>2</sub>, and the predicted part association vector *G*<sub>*j*</sub>. An integral over the curve gives a measure of confidence in their association.

$$E = \int_{m=0}^{m=1} \mathbf{G}_{j}(c(m)) \cdot \hat{\mathbf{d}} \, dm \tag{6}$$

where *G*<sub>*j*</sub>(*c*(*m*)) is the greedy part association vector at the point *c*(*m*) on the segment between the two candidates, and *d̂* is a unit vector along the direction from *t*<sub>1</sub> to *t*<sub>2</sub>.
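In practice an integral such as Eq. (6) is usually approximated by sampling. The sketch below assumes that *c*(*m*) interpolates linearly between the two candidates, *c*(*m*) = (1 − *m*) *t*<sub>1</sub> + *m* *t*<sub>2</sub>, and that the field is read at the nearest pixel; the function name `association_score` and the number of samples are likewise illustrative choices, not details given in the chapter.

```python
import math


def association_score(field, t1, t2, num_samples=10):
    """Approximate E = integral of G_j(c(m)) . d_hat dm over m in [0, 1].

    field[y][x] holds the (gx, gy) vector at a pixel; t1 and t2 are (x, y)
    part candidates. num_samples must be at least 2.
    """
    dx, dy = t2[0] - t1[0], t2[1] - t1[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return 0.0
    ux, uy = dx / length, dy / length  # unit vector d_hat
    total = 0.0
    for i in range(num_samples):
        m = i / (num_samples - 1)
        x = round(t1[0] + m * dx)  # nearest pixel to c(m)
        y = round(t1[1] + m * dy)
        gx, gy = field[y][x]
        total += gx * ux + gy * uy  # dot product with d_hat
    return total / num_samples


# Hypothetical 3 x 5 field pointing in +x along the middle row.
field = [[(1.0, 0.0) if y == 1 else (0.0, 0.0) for x in range(5)]
         for y in range(3)]

print(association_score(field, (0, 1), (4, 1)))  # aligned with the field: 1.0
print(association_score(field, (0, 0), (0, 2)))  # perpendicular: 0.0
print(association_score(field, (4, 1), (0, 1)))  # opposite direction: -1.0
```

The score is high only when the candidate pair lies on the field and points the same way, which is how orientation helps reject false associations in crowded images.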

After assigning weights to the edges, our aim is to find, for each pair of joints, the edges with the maximum weight. For this we choose the most intuitive approach. First, we sort the scores in descending order. Second, we select the connection with the maximum score. Third, we move to the next possible connection: if neither of its parts has already been assigned, it becomes a final connection. We repeat the third step until all connections have been considered.
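The greedy selection described above can be sketched as follows. The function name `greedy_match` and the dictionary layout of the scores are assumptions made for illustration.

```python
def greedy_match(scores):
    """Greedily pick connections in descending score order.

    scores maps (a, b) -> weight, where a indexes a candidate of the first
    part (e.g. neck) and b a candidate of the second part (e.g. right hip).
    Each candidate is used in at most one final connection.
    """
    connections = []
    used_a, used_b = set(), set()
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for (a, b), score in ranked:
        if a in used_a or b in used_b:
            continue  # one of the parts is already assigned
        connections.append((a, b, score))
        used_a.add(a)
        used_b.add(b)
    return connections


# Hypothetical scores for two neck and two right hip candidates.
scores = {(0, 0): 0.9, (0, 1): 0.2, (1, 0): 0.8, (1, 1): 0.7}
print(greedy_match(scores))  # [(0, 0, 0.9), (1, 1, 0.7)]
```

A greedy pass is not guaranteed to find the globally optimal matching in general, but it avoids the cost of solving the full matching problem.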

The final step involves merging all the detected part candidates with optimal scores to form a complete 2D stick figure of the human structure. One way to approach this is to assume initially that each pair of part candidates belongs to a unique person in the image. That way we have a set of humans {*H*<sub>1</sub>, *H*<sub>2</sub>, *H*<sub>3</sub>, …, *H*<sub>*k*</sub>}, where *k* is the total number of final connections. Each human in the set contains a pair of body parts. We represent each part as a tuple of an index, a coordinate in the x direction, and a coordinate in the y direction: *H*<sub>*i*</sub> = {(*m*<sub>*idx*</sub>, *m*<sub>*x*</sub>, *m*<sub>*y*</sub>), (*n*<sub>*idx*</sub>, *n*<sub>*x*</sub>, *n*<sub>*y*</sub>)}. Then comes the merging: if two human sets share any index and coordinates, they share a body part. We merge the two sets and delete the second. We repeat these steps for all of the sets until no two humans share a part, ultimately giving the human structure.
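A minimal sketch of this merging step is given below. It assumes that the index in each tuple uniquely identifies a detected part candidate across the whole image, so two sets share a body part exactly when they share an index; the function name `merge_humans` and the sample coordinates are hypothetical.

```python
def merge_humans(pairs):
    """Merge pairwise connections into per-person sets of parts.

    pairs is a list of ((m_idx, m_x, m_y), (n_idx, n_x, n_y)) tuples.
    Two sets are merged whenever they share a part index.
    """
    humans = [set(pair) for pair in pairs]  # start with one human per pair
    merged = True
    while merged:
        merged = False
        for i in range(len(humans)):
            for j in range(i + 1, len(humans)):
                ids_i = {part[0] for part in humans[i]}
                ids_j = {part[0] for part in humans[j]}
                if ids_i & ids_j:           # shared body part
                    humans[i] |= humans[j]  # merge the two sets
                    del humans[j]           # delete the other
                    merged = True
                    break
            if merged:
                break  # restart the scan after every merge
    return humans


# Hypothetical connections: parts 0-1-2 form one person, parts 3-4 another.
pairs = [((0, 10, 5), (1, 8, 7)),
         ((1, 8, 7), (2, 7, 12)),
         ((3, 30, 6), (4, 28, 8))]
humans = merge_humans(pairs)
print(len(humans))  # 2
```

Restarting the scan after each merge is simple rather than fast; it is adequate for the handful of connections found in a typical image.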
