*4.3.2 Example: generating virtual legs based on arm movement using VHNN*

A neural network is trained to generate the kinetic status of the hips, knees, and feet from the kinetic status of the shoulders, elbows, and arms captured by 4D sensors [90]. As illustrated in **Figure 11 (a)**–**(d)**, four network architectures are investigated in this research: (a) multilayer perceptron (MLP); (b) denoising autoencoder (a classical autoencoder architecture); (c) visible and hierarchical neural network with two subsystems (VHNN2); and (d) VHNN with four subsystems (VHNN4). VHNN splits the input tensor and feeds the pieces into multiple smaller, parallelized autoencoders, so the data for each joint can be processed in parallel by its own autoencoder. These parallelized autoencoder pipelines are simplified stacked autoencoders, which allows each to be optimized for a specific, key sub-task rather than one large task. A video playlist of the generation of virtual legs based on VHNN may be found at [92].
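
The split-and-parallelize idea can be sketched in a few lines of NumPy. The layout below (two subsystems, nine arm features in and nine leg features out per side, tanh activations, random untrained weights) is a hypothetical illustration of the data flow, not the trained network from [90]:

```python
import numpy as np

rng = np.random.default_rng(0)

def tiny_autoencoder(dim_in, dim_hidden, dim_out):
    """Return (encoder, decoder) weights for one subsystem (untrained)."""
    w_enc = rng.normal(scale=0.1, size=(dim_in, dim_hidden))
    w_dec = rng.normal(scale=0.1, size=(dim_hidden, dim_out))
    return w_enc, w_dec

def subsystem_forward(x, weights):
    w_enc, w_dec = weights
    h = np.tanh(x @ w_enc)       # encode arm-joint features
    return np.tanh(h @ w_dec)    # decode into leg-joint kinetics

# Hypothetical layout: 2 arms x 9 features (shoulder/elbow/wrist xyz) in,
# 2 legs x 9 features (hip/knee/foot xyz) out, split into 2 subsystems.
n_subsystems = 2
arm_features = rng.normal(size=(1, 18))                # one time step
chunks = np.split(arm_features, n_subsystems, axis=1)  # split the input tensor
systems = [tiny_autoencoder(9, 4, 9) for _ in range(n_subsystems)]

# Each chunk flows through its own small autoencoder (parallelizable).
leg_parts = [subsystem_forward(c, w) for c, w in zip(chunks, systems)]
virtual_legs = np.concatenate(leg_parts, axis=1)
print(virtual_legs.shape)
```

Because each subsystem sees only its own slice of the input, the subsystems can be trained and evaluated independently, which is the source of the parallelism discussed below.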

As illustrated in **Figure 9**, the generated kinetics of virtual limbs can be corrected using time-series models such as ARIMA.
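
As a minimal stand-in for a full ARIMA model, the sketch below corrects a noisy generated trace with exponential smoothing, which corresponds to forecasting under an ARIMA(0,1,1) model; the knee-angle values are hypothetical:

```python
import numpy as np

def ewma_correct(series, alpha=0.5):
    """Exponential smoothing (equivalent to ARIMA(0,1,1) forecasting):
    damps isolated spikes while still tracking the underlying trend."""
    corrected = np.empty_like(series, dtype=float)
    corrected[0] = series[0]
    for t in range(1, len(series)):
        corrected[t] = alpha * series[t] + (1 - alpha) * corrected[t - 1]
    return corrected

# Hypothetical knee-angle trace (degrees) with one spike artifact at t = 3.
raw = np.array([10.0, 10.5, 11.0, 30.0, 11.5, 12.0])
smooth = ewma_correct(raw)
print(smooth.round(2))
```

A production pipeline would fit a full ARIMA model (e.g. via `statsmodels`) to the joint trajectories; this one-pass smoother only illustrates the correction step.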

As illustrated in **Table 1**, the proposed VHNN architecture achieves overall superior results compared with previous work. Training time is reduced relative to earlier autoencoder architectures because the parallelization of simpler autoencoders eases optimization: each autoencoder trains on a specific gesture within the whole movement, rather than on one large task. In addition, VHNN does not exhibit the data-hungry tendencies of state-of-the-art models, allowing it to be trained on small amounts of data.

Lower ground-truth error can be seen for VHNN-AE-2 than for VHNN-AE-4. This is because the training data contain none of the anomalies that real-time data can exhibit. While VHNN-AE-2 with single correlation performs better when tested against ground-truth data, VHNN-AE-4 with double correlation performs better in real time, since a patient may not follow the Tai-Chi movements correctly. The added complexity of the VHNN-AE-4 architecture increases noise in the output, and hence its ground-truth error, but it enables better tolerance of patient errors. This additional noise could be reduced through larger training datasets, more sophisticated pre- and post-processing of the data, and improved NN architectures.

#### **Figure 11.**

*Generation of virtual legs from moving arms using various architectures: (a) MLP; (b) denoising autoencoder (a classical architecture); (c) two-thread (subsystem) visible and hierarchical autoencoder neural network (VHNN-AE-2); (d) four-thread VHNN (VHNN-AE-4) (notes: LL-LA indicates the virtual left leg induced by the left arm; RL-RA indicates the virtual right leg induced by the right arm) (online video: [92]).*

**Table 1.**

*Time performance of virtual-legs generation using the visible and hierarchical autoencoder neural network (VHNN), which is derived from human anatomy (Intel Core i9-7900X, 1× NVIDIA GTX 1080 Ti, 64 GB RAM; MLP does not employ a GPU).*
