This paper explores how speech and co-speech gestures can be combined for 3D parametric modeling. The proposed system uses a multimodal large language model (LLM) to interpret gesture-accompanied speech and translate user intent into executable parametric modeling operations. The work introduces a more natural and expressive interaction paradigm for 3D modeling and demonstrates the potential of multimodal LLMs for spatial intent understanding and parameter extraction.
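
As a rough illustration of the pipeline this summary describes (not the authors' implementation), the sketch below shows how a speech transcript and a gesture trajectory might be fused into a single prompt for a multimodal LLM, and how a structured reply could be parsed into an executable modeling operation. All names here (`call_llm`, `ModelingOp`, the JSON reply schema) are hypothetical, and the LLM call is stubbed so the example runs standalone.

```python
import json
from dataclasses import dataclass


@dataclass
class ModelingOp:
    """A single executable parametric modeling operation (hypothetical schema)."""
    name: str    # e.g. "extrude", "fillet", "revolve"
    params: dict  # parameters extracted by the LLM, e.g. {"depth_mm": 40}


PROMPT_TEMPLATE = """You are a CAD assistant. Given a spoken command and the
accompanying hand-gesture trajectory, reply with ONE JSON object of the form
{{"op": "<operation>", "params": {{...}}}}.

Speech: "{speech}"
Gesture trajectory (x, y, z samples, meters): {gesture}
"""


def call_llm(prompt: str) -> str:
    """Placeholder for a real multimodal LLM call (assumption, not a real API).
    Returns a canned JSON reply so the sketch is self-contained."""
    return '{"op": "extrude", "params": {"profile": "sketch_1", "depth_mm": 40}}'


def interpret(speech: str, gesture: list[tuple[float, float, float]]) -> ModelingOp:
    """Fuse speech and gesture into one prompt, then parse the structured reply."""
    prompt = PROMPT_TEMPLATE.format(speech=speech, gesture=gesture)
    reply = call_llm(prompt)
    data = json.loads(reply)
    return ModelingOp(name=data["op"], params=data["params"])


if __name__ == "__main__":
    # "Pull this face up about this much" paired with an upward hand motion.
    op = interpret(
        speech="pull this face up about this much",
        gesture=[(0.0, 0.0, 0.0), (0.0, 0.0, 0.02), (0.0, 0.0, 0.04)],
    )
    print(op)  # ModelingOp(name='extrude', params={'profile': 'sketch_1', 'depth_mm': 40})
```

In a real system the parsed `ModelingOp` would be dispatched to a parametric CAD kernel, and the gesture trace would likely inform the numeric parameters (e.g. mapping the hand's displacement to the extrusion depth) rather than serving only as prompt context.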