**1. Introduction**

Human pose estimation is a challenging area of artificial intelligence that draws on in-depth knowledge of computer vision, calculus, graph theory, and biology. The work begins by capturing an image through a camera and detecting the humans in it, a computer vision problem known as object detection. In practice, detecting an object in an image [1] and estimating its posture [2, 3] are two distinct problems, and the latter is considerably more challenging. Images often contain occluded objects and people in close proximity, and the resulting occlusion and spatial interference make the task even harder. One way of solving this problem is to run a single-person detector and estimate the pose of each detected person, an approach known as top-down parsing [4–9]. This approach relies on preconceived assumptions and lacks robustness: it commits to early decisions, which makes it hard to recover when those decisions fail. In addition, its running time is proportional to the number of people in the image, which makes it poorly suited to practical use.

In contrast, the bottom-up approach appears to perform well compared with its counterpart. However, earlier bottom-up methods did not reduce the computational complexity, as they could not sustain these benefits consistently. For instance, the pioneering work of Insafutdinov et al. proposed a bottom-up approach that simultaneously detects joints and labels them as part candidates [10], and then associates them with individual people. Solving the resulting combinatorial optimization problem over a complete graph is, however, itself NP-hard. A later approach built on this with stronger joint detectors based on ResNet [11] and image-based ranking; it significantly improved the runtime but still takes on the order of minutes per image. That approach also requires a separate logistic regression for precise regression. Having reviewed these approaches and their shortcomings in the image processing and object detection literature, this chapter introduces an efficient approach for human pose estimation.

### **1.1 Contribution of the work**

The highlights of this chapter are an improvement on current state-of-the-art results and a new approach to solving this problem. We present a bottom-up parsing technique that uses a non-parametric representation of features for localizing the anatomical key points of individuals. We further introduce a multistage architecture with two parallel branches: one branch estimates the body joints via hotspots, while the other captures the orientations of the joints through vectors. The proposed approach localizes the anatomical key points and associates them using a greedy parsing technique built on what we call greedy part association vectors. These 2D vectors encode not only the position but also the directional orientation of body parts. The approach also decouples the running time from the number of people in the image. It achieves competitive performance on some of the best-known public benchmarks, and the model maintains its accuracy while providing real-time performance.
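To make the idea of associating key points through 2D vectors more concrete, the following is a minimal illustrative sketch, not the chapter's actual implementation. It assumes the orientation branch can be queried as a function returning a 2D vector at any image location, scores a candidate limb by how well that field aligns with the segment between two joint candidates, and then pairs candidates greedily. The names `association_score`, `greedy_match`, and `toy_field`, the sampling count, and the score threshold are all hypothetical choices made for this example.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]
# Assumed interface: the orientation branch, queried at (x, y), returns a 2D vector.
Field = Callable[[float, float], Tuple[float, float]]


def association_score(field: Field, a: Point, b: Point, samples: int = 10) -> float:
    """Average dot product between the field and the unit vector from a to b,
    sampled at evenly spaced points on the segment a -> b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return 0.0
    ux, uy = dx / length, dy / length
    total = 0.0
    for i in range(samples):
        t = i / (samples - 1) if samples > 1 else 0.5
        vx, vy = field(a[0] + t * dx, a[1] + t * dy)
        total += vx * ux + vy * uy
    return total / samples


def greedy_match(starts: List[Point], ends: List[Point], field: Field,
                 min_score: float = 0.0) -> List[Tuple[int, int]]:
    """Pair start and end joint candidates greedily, best score first,
    using each candidate at most once."""
    scored = [
        (association_score(field, a, b), i, j)
        for i, a in enumerate(starts)
        for j, b in enumerate(ends)
    ]
    scored.sort(reverse=True)
    used_starts, used_ends, pairs = set(), set(), []
    for score, i, j in scored:
        if score <= min_score:
            break  # remaining candidates are no better than the threshold
        if i in used_starts or j in used_ends:
            continue
        pairs.append((i, j))
        used_starts.add(i)
        used_ends.add(j)
    return pairs


def toy_field(x: float, y: float) -> Tuple[float, float]:
    """Hypothetical field: two vertical limbs near x = 0 and x = 10."""
    if abs(x) < 1.0 or abs(x - 10.0) < 1.0:
        return (0.0, 1.0)
    return (0.0, 0.0)


# Two people standing side by side; the end candidates are listed in swapped order.
starts = [(0.0, 0.0), (10.0, 0.0)]
ends = [(10.0, 5.0), (0.0, 5.0)]
pairs = greedy_match(starts, ends, toy_field)
```

In this toy setting each start joint is paired with the end joint directly above it, because the diagonal candidates cross regions where the field is zero and therefore score poorly. The greedy pass avoids solving a global matching problem over the complete graph, which is the design choice the text motivates.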

The rest of this chapter is organized as follows: Section 2 discusses related work, Section 3 explains the proposed methodology in detail along with its algorithms, Section 4 discusses the results, and Section 5 concludes the chapter with future work.
