
[00:00] Introduction to EgoAllo system
[00:38] Challenges in egocentric motion estimation
[01:20] Importance of spatial/temporal invariance
[02:11] Comparison of conditioning parameterizations
[02:57] Integration of hand observations
[03:50] Global alignment phase
[04:28] Guidance losses in sampling
[05:03] Handling longer sequences
[05:35] Evaluation results
[06:30] System limitations and future work
[07:13] Implications for other egocentric tasks
[08:05] Advantages of diffusion models
[09:07] Use of synthetic datasets
[09:53] Promising research directions
[10:43] Impact on future motion capture systems
[11:41] Comparison to traditional methods
[12:31] Improved hand estimation accuracy
[13:25] Impact of SLAM inaccuracies
[14:09] Use of the Levenberg-Marquardt optimizer
[15:14] Adapting to complex environments
Authors: Brent Yi, Vickie Ye, Maya Zheng, Lea Müller, Georgios Pavlakos, Yi Ma, Jitendra Malik, Angjoo Kanazawa
Affiliations: UC Berkeley, UT Austin
Abstract: We present EgoAllo, a system for human motion estimation from a head-mounted device. Using only egocentric SLAM poses and images, EgoAllo guides sampling from a conditional diffusion model to estimate 3D body pose, height, and hand parameters that capture the wearer's actions in the allocentric coordinate frame of the scene. To achieve this, our key insight is in representation: we propose spatial and temporal invariance criteria for improving model performance, from which we derive a head motion conditioning parameterization that improves estimation by up to 18%. We also show how the bodies estimated by our system can improve the hands: the resulting kinematic and temporal constraints result in over 40% lower hand estimation errors compared to noisy monocular estimates.
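
The head-motion conditioning described in the abstract (and discussed at [01:20] and [02:11] above) can be illustrated with a small sketch. The code below is not the paper's parameterization or API; it only shows, assuming a gravity-aligned (z-up) world frame and 4x4 SE(3) head poses, how per-step conditioning features can be made invariant to the scene's origin and yaw while keeping absolute height observable. All function names are illustrative.

```python
# Minimal sketch (not the paper's exact parameterization): head-motion
# conditioning features that are invariant to world origin and yaw,
# assuming a gravity-aligned (z-up) world frame and (T, 4, 4) head poses.
import numpy as np

def yaw_rotation(R: np.ndarray) -> np.ndarray:
    """Rotation about +z matching the heading of R (illustrative helper)."""
    heading = R[:, 0]  # direction the head's forward/x axis points in the world
    yaw = np.arctan2(heading[1], heading[0])
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def invariant_head_conditioning(T_world_head: np.ndarray) -> np.ndarray:
    """Map a (T, 4, 4) head-pose trajectory to per-step features that are
    unchanged by translating the trajectory in x/y or rotating it about gravity."""
    feats = []
    for t in range(1, len(T_world_head)):
        T_prev, T_curr = T_world_head[t - 1], T_world_head[t]
        # Canonical frame: previous head position projected to floor height 0,
        # keeping only its yaw, so roll/pitch and absolute height remain visible.
        T_canon = np.eye(4)
        T_canon[:3, :3] = yaw_rotation(T_prev[:3, :3])
        T_canon[:3, 3] = np.array([T_prev[0, 3], T_prev[1, 3], 0.0])
        # Current head pose expressed in the canonical frame.
        T_rel = np.linalg.inv(T_canon) @ T_curr
        feats.append(np.concatenate([T_rel[:3, :3].reshape(-1), T_rel[:3, 3]]))
    return np.stack(feats)  # (T-1, 12) conditioning features for the diffusion model
```

Because each feature depends only on consecutive pose pairs expressed relative to a gravity-aligned canonical frame, the same motion produces the same conditioning regardless of where or when it happens in the scene, which is the spatial/temporal invariance idea the abstract refers to.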
Project page: https://egoallo.github.io/
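
For the guidance losses discussed at [04:28], the sketch below shows generically how a differentiable loss can steer sampling from a diffusion model. This is standard gradient guidance with a deterministic DDIM-style update, not EgoAllo's implementation; `denoiser`, `guidance_loss`, `cond`, and the noise schedule are placeholders, and the system's Levenberg-Marquardt-based alignment mentioned at [14:09] is not shown here.

```python
# Generic sketch of guidance-steered diffusion sampling (PyTorch).
# `denoiser`, `guidance_loss`, `cond`, and all shapes are placeholders,
# not EgoAllo's actual interfaces.
import torch

def guided_sample(denoiser, guidance_loss, cond, alphas_cumprod, shape,
                  guidance_weight=1.0):
    """Deterministic DDIM-style sampling where every step also nudges the
    sample down the gradient of a guidance loss evaluated on the predicted
    clean motion (e.g. agreement with hand observations)."""
    x = torch.randn(shape)  # start from Gaussian noise
    for t in reversed(range(len(alphas_cumprod))):
        a_bar = alphas_cumprod[t]
        a_bar_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)

        with torch.enable_grad():
            x_in = x.detach().requires_grad_(True)
            # The network predicts the clean sample x0 from noisy x_t + conditioning.
            x0_pred = denoiser(x_in, t, cond)
            grad = torch.autograd.grad(guidance_loss(x0_pred), x_in)[0]

        x0_pred = x0_pred.detach()
        # Implied noise, then the DDIM update toward the next (less noisy) step.
        eps = (x - a_bar.sqrt() * x0_pred) / (1.0 - a_bar).sqrt()
        x = a_bar_prev.sqrt() * x0_pred + (1.0 - a_bar_prev).sqrt() * eps
        # Guidance nudge: move the sample toward lower guidance loss.
        x = x - guidance_weight * grad
    return x
```

The key design point is that the guidance loss is applied to the denoiser's prediction of the clean sample at every step, so observation constraints (such as hand detections) can shape the final motion without retraining the diffusion model.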