JustShape: Exploring Co-Speech Gestures for Multimodal LLM-Powered 3D Parametric Modeling

Abstract

This paper explores how speech and co-speech gestures can be combined for 3D parametric modeling. JustShape uses a multimodal large language model to interpret gesture-accompanied speech and translate user intent into executable parametric modeling operations. The work introduces a more natural and expressive interaction paradigm for 3D modeling and demonstrates the potential of multimodal LLMs for spatial intent understanding and parameter extraction.
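
The abstract does not describe an implementation, but the pipeline it names (gesture-accompanied speech in, an executable parametric operation out) can be sketched. In the hypothetical Python sketch below, `call_multimodal_llm`, `GestureFrame`, and the JSON operation schema are illustrative assumptions, not the authors' actual interfaces.

```python
import json
from dataclasses import dataclass

@dataclass
class GestureFrame:
    """One sampled hand pose accompanying the utterance (hypothetical schema)."""
    timestamp_s: float
    hand: str                             # "left" or "right"
    position: tuple[float, float, float]  # wrist position in world space

def call_multimodal_llm(prompt: str) -> str:
    """Hypothetical stand-in for a multimodal LLM call; returns a canned reply
    so the sketch runs end to end. A real system would send the prompt (plus
    gesture keypoints or video) to an actual model API."""
    return json.dumps({"op": "extrude", "face": "top", "distance_cm": 12.0})

def speech_and_gesture_to_operation(transcript: str,
                                    gestures: list[GestureFrame]) -> dict:
    """Fuse the spoken command with the gesture trace and ask the model for
    one executable parametric modeling operation, expressed as JSON."""
    gesture_desc = "; ".join(
        f"t={g.timestamp_s:.2f}s {g.hand} hand at {g.position}" for g in gestures
    )
    prompt = (
        "You translate a designer's speech and co-speech gestures into a single "
        "parametric modeling operation. Reply with JSON only.\n"
        f"Speech: {transcript!r}\n"
        f"Gesture trace: {gesture_desc}"
    )
    return json.loads(call_multimodal_llm(prompt))

if __name__ == "__main__":
    frames = [
        GestureFrame(0.0, "right", (0.10, 1.20, 0.30)),
        GestureFrame(0.5, "right", (0.10, 1.32, 0.30)),  # hand moves upward
    ]
    op = speech_and_gesture_to_operation("pull this face up about this much", frames)
    print(op)  # e.g. {'op': 'extrude', 'face': 'top', 'distance_cm': 12.0}
```

The deictic command ("pull this face up about this much") is ambiguous on its own; the gesture trace supplies the referent and the magnitude, which is the fusion the system relies on.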

Publication
In Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI '26)