Manuel Kaufmann

I am a computer scientist with a background in human-centric 3D Computer Vision. I previously worked as a tech lead in the executive team of the ETH AI Center and as a postdoc at the AIT Lab at ETH Zurich, Department of Computer Science. I obtained my PhD in the AIT Lab under the supervision of Prof. Otmar Hilliges, co-advised by Prof. Markus Gross. I completed my MSc at ETH Zurich in the AIT Lab and earned my BSc in Computer Science from the University of Basel under the supervision of Prof. Thomas Vetter.

I taught Machine Perception at ETH Zurich. All lecture materials, including video recordings from 2025, are publicly available.

Check out aitviewer, an open-source interactive viewer for sequences of 3D data that we created at the AIT Lab.

Scholar  |  LinkedIn  |  GitHub

Research

I am interested in multimodal vision-language models, human-centric multimodal data acquisition pipelines, 3D human pose estimation from wearable sensors (e.g., IMUs and electromagnetic sensors) and single or multi-view images, 3D reconstruction of human avatars from images and video, and human motion modelling. For a selection of publications, see below.

Learning Vision-Language Alignment in Unified LLMs with 24 Text Tokens per Image

Nicola Irmiger, Yixuan Xu, Raphael Kreft, Aram Davtyan, Manuel Kaufmann, Imanol Schlag

IWSDS, 2026

Adapting a pre-trained LLM for flexible multimodal understanding by compressing images into 24 discrete tokens and training with a two-stage next-token prediction approach.

paper

RHINO: Reconstructing Human Interactions with Novel Objects from Monocular Videos

Lixin Xue, Chengwei Zheng, Georgios Paschalidis, Chen Guo, Manuel Kaufmann, Juan Zarate, Dimitrios Tzionas

CVPR, 2026

A framework that reconstructs a human, an unseen object, and the static scene in a common world frame from monocular RGB video, without requiring known 3D shapes or cameras.

project page | arXiv

Gaussian Wardrobe: Compositional 3D Gaussian Avatars for Free-Form Virtual Try-on

Zhiyi Chen*, Hsuan-I Ho*, Tianjian Jiang, Jie Song, Manuel Kaufmann, Chen Guo

3DV, 2026

A compositional 3D Gaussian avatar representation that enables free-form virtual try-on of clothing from multiple sources.

project page | arXiv | video | code

PriorAvatar: Efficient and Robust Avatar Creation from Monocular Video Using Learned Priors

Tianjian Jiang, Hsuan-I Ho, Manuel Kaufmann, Jie Song

SIGGRAPH Asia (TOG), 2025

Robust and efficient monocular avatar reconstruction using a 3D human prior (a multi-person feature codebook of shapes and appearances) learned from scans to guide 3D Gaussian fitting.

paper

PHD: Personalized 3D Human Body Fitting with Point Diffusion

Hsuan-I Ho, Chen Guo, Po-Chen Wu, Ivan Shugurov, Chengcheng Tang, Abhay Mittal, Sizhe An, Manuel Kaufmann*, Linguang Zhang*

ICCV, 2025

Personalized 3D human body fitting using point diffusion for improved accuracy in body pose and shape reconstruction.

project page | arXiv | video

ODHSR: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos

Zetong Zhang, Manuel Kaufmann, Lixin Xue, Jie Song, Martin Oswald

CVPR, 2025

An online Gaussian Splatting-based system for dense 3D reconstruction of both humans and their surrounding scenes from monocular video input.

project page | video

EgoHDM: An Online Egocentric-Inertial Human Motion Capture, Localization, and Dense Mapping System

Bonan Liu*, Handi Yin*, Manuel Kaufmann, Jinhao He, Sammy Christen, Jie Song, Pan Hui

SIGGRAPH Asia (TOG), 2024

An egocentric-inertial system that jointly performs human motion capture, localization, and dense scene mapping online.

project page | arXiv | video | code

ReLoo: Reconstructing Humans Dressed in Loose Garments from Monocular Video in the Wild

Chen Guo*, Tianjian Jiang*, Manuel Kaufmann, Chengwei Zheng, Julien Valentin, Jie Song, Otmar Hilliges

ECCV, 2024

Reconstructing humans wearing loose-fitting garments from in-the-wild monocular videos, handling challenging clothing dynamics.

project page | arXiv | video | code

WorldPose: A World Cup Dataset for Global 3D Human Pose Estimation

Tianjian Jiang, Johsan Billingham, Sebastian Müksch, Juan Zarate, Nicolas Evans, Martin Oswald, Marc Pollefeys, Otmar Hilliges, Manuel Kaufmann, Jie Song

ECCV, 2024

A large-scale dataset captured from 2022 FIFA World Cup broadcasts for benchmarking global 3D human pose estimation in sports scenarios.

project page | paper | video | dataset

HSR: Holistic 3D Human-Scene Reconstruction from Monocular Videos

Lixin Xue, Chen Guo, Chengwei Zheng, Fangjinhua Wang, Tianjian Jiang, Hsuan-I Ho, Manuel Kaufmann, Jie Song, Otmar Hilliges

ECCV, 2024

A holistic approach to jointly reconstructing humans and their surrounding 3D scene from a single monocular video.

project page | paper | video | code

MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild

Zeren Jiang*, Chen Guo*, Manuel Kaufmann, Tianjian Jiang, Julien Valentin, Otmar Hilliges, Jie Song

CVPR, 2024  (Oral Presentation, Top 3.3%)

Reconstructing multiple interacting people from a single monocular in-the-wild video, handling occlusions and close contact.

project page | arXiv | video | dataset | code

EMDB: The Electromagnetic Database of Global 3D Human Pose and Shape in the Wild

Manuel Kaufmann, Jie Song, Chen Guo, Kaiyue Shen, Tianjian Jiang, Chengcheng Tang, Juan Zarate, Otmar Hilliges

ICCV, 2023

A new dataset of global 3D human pose and shape in-the-wild, captured using electromagnetic sensors for accurate ground truth.

project page | paper | video | dataset | code

ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation

Zicong Fan, Omid Taheri, Dimitrios Tzionas, Muhammed Kocabas, Manuel Kaufmann, Michael Black, Otmar Hilliges

CVPR, 2023

A large-scale dataset capturing dexterous bimanual manipulation of articulated objects, useful for hand-object interaction research.

project page | paper | video | code

X-Avatar: Expressive Human Avatars

Kaiyue Shen*, Chen Guo*, Manuel Kaufmann, Juan Zarate, Julien Valentin, Jie Song, Otmar Hilliges

CVPR, 2023

Expressive human avatars that capture body pose, facial expressions, and hand gestures in a unified model.

project page | arXiv | video | code

Hi4D: 4D Instance Segmentation of Close Human Interaction

Yifei Yin, Chen Guo, Manuel Kaufmann, Juan Zarate, Jie Song, Otmar Hilliges

CVPR, 2023

A dataset and method for 4D instance segmentation of two people engaged in close physical interaction.

project page | arXiv | video | dataset | code

A Spatio-temporal Transformer for 3D Human Motion Prediction

Emre Aksan, Manuel Kaufmann, Peng Cao, Otmar Hilliges

3DV, 2021

A spatio-temporal transformer architecture that jointly models spatial and temporal dependencies for 3D human motion prediction.

project page | paper | video | code

EM-POSE: 3D Human Pose Estimation from Sparse Electromagnetic Trackers

Manuel Kaufmann, Yi Zhao, Chengcheng Tang, Lingling Tao, Christopher Twigg, Jie Song, Robert Wang, Otmar Hilliges

ICCV, 2021

3D human pose estimation from sparse wireless electromagnetic trackers.

project page | paper | video | dataset | code

Convolutional Autoencoders for Human Motion Infilling

Manuel Kaufmann, Emre Aksan, Jie Song, Fabrizio Pece, Remo Ziegler, Otmar Hilliges

3DV, 2020

Convolutional autoencoders for filling in missing frames in human motion sequences, enabling cleanup of noisy or incomplete motion capture data.

project page | arXiv | video | code

Structured Prediction Helps 3D Human Motion Modelling

Emre Aksan*, Manuel Kaufmann*, Otmar Hilliges

ICCV, 2019

Incorporating the skeletal structure of the human body into the output layer improves 3D human motion modelling.

project page | paper | video | code

Deep Inertial Poser: Learning to Reconstruct Human Pose from Sparse Inertial Measurements in Real Time

Yinghao Huang*, Manuel Kaufmann*, Emre Aksan, Michael Black, Otmar Hilliges, Gerard Pons-Moll

ACM Transactions on Graphics (Proc. SIGGRAPH Asia), 2018

The first learning-based method for real-time 3D human pose reconstruction from a sparse set of only six IMU sensors.

project page | paper | video | dataset | code

Academic Service

Reviewing

2026: CVPR, ECCV
2025: CVPR, ICCV, SIGGRAPH
2024: CVPR, ECCV, SIGGRAPH, SIGGRAPH Asia
2023: CVPR, ICCV, 3DV
2022: CVPR, ECCV, 3DV, SIGGRAPH Asia, ACM UIST
2021: ICCV, SIGGRAPH, SIGGRAPH Asia
2020: ACM UIST, CVPR, ECCV, Eurographics, WACV
2019: Biocybernetics and Biomedical Engineering

Area Chair

2025: 3DV

Outstanding Reviewer Awards

2025: CVPR
2022: ECCV

Design inspired by Jon Barron's website. Thanks for providing the source code.