**Acknowledgements**

This work was funded in part through a fundamental research collaboration partnership between Sorbonne Université, CNRS, Institut ∂'Alembert and Facebook Reality Labs. This work was also funded in part by the RASPUTIN project (ANR-18-CE38-0004, https://rasputin.lam.jussieu.fr) and an associated "Innov'up Faisabilité" grant from the Région Île-de-France. Portions of this work were carried out in the context of the Sonicom project, which has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 101017743.
