This work addresses 4D scene editing with a training-free, text-driven approach for modifying dynamic scenes. Leveraging a multimodal diffusion transformer, the method produces temporally consistent edits from user instructions. Unlike conventional image editing methods, which operate on individual frames, it extends controllable generation into the spatiotemporal domain, offering both flexibility and precision in dynamic scene manipulation.