Free-Form Motion Control: A Synthetic Video Generation Dataset with Controllable Camera and Object Motions

¹Fudan University  ²DAMO Academy, Alibaba Group  ³Nanyang Technological University

Figure 1. Generation pipeline of videos in the proposed Synthetic Dataset for Free-Form Motion Control (SynFMC). The figure presents an example of generating a synthetic video with three objects: (1) First, an HDRI environment map and objects matching the environment are selected as the assets. (2) Then, the motion types of the objects and the camera are selected for trajectory generation. (3) The center region shows the resulting 3D animation sequence used for rendering. The rendered video and annotations are shown in the last row.


Abstract

Controlling the movements of dynamic objects and the camera within generated videos is a meaningful yet challenging task. Because datasets with comprehensive motion annotations are lacking, existing algorithms cannot control the motions of the camera and objects simultaneously, which limits controllability over the generated content. To address this issue and facilitate research in this field, we introduce a Synthetic Dataset for Free-Form Motion Control (SynFMC). The proposed SynFMC dataset includes diverse objects and environments and covers various motion patterns generated according to specific rules, simulating common and complex real-world scenarios. The complete 6D pose information helps models learn to disentangle the motion effects of objects from those of the camera in a video. To validate the effectiveness and generalization of SynFMC, we further propose a method, Free-Form Motion Control (FMC). FMC enables independent or simultaneous control of object and camera movements, producing high-fidelity videos. Moreover, it is compatible with various personalized text-to-image (T2I) models for different content styles. Extensive experiments demonstrate that the proposed FMC outperforms previous methods across multiple scenarios.

1. Visualization of SynFMC

Environment Categories

⭐ The environments in SynFMC span five types: ground, near ground, sky, water surface, and underwater.

Ground
0442a954 d321dde4
Near Ground
02221fb0 bbe97d18
Sky
002b4dce 26ed56e6
Water Surface
c791ddbb e5e9eb29
Underwater
c791ddbb e5e9eb29


Scene Categories

⭐ SynFMC contains 26K videos divided into four groups: 6K static single-object, 6K static multi-object, 8K dynamic single-object, and 6K dynamic multi-object. Static means fixed object locations in world space while the camera remains movable.

Static Single-Object
26ed56e6
Dynamic Single-Object
c791ddbb
Static Multi-Object
e5e9eb29
Dynamic Multi-Object
c791ddbb
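
The definition of a static scene above (objects fixed in world space, camera free to move) can be illustrated with a small sketch. It assumes a camera-to-world convention for camera poses, which this page does not specify, and tracks only object position; all function and variable names are hypothetical. Even though the object never moves in world space, its position in the camera frame changes from frame to frame, which is the effect that 6D pose annotations let a model attribute to the camera rather than to the object.

```python
import math

def rot_y(theta):
    """3x3 rotation about the y axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def world_to_camera(point_w, cam_rot, cam_pos):
    """Express a world-space point in the camera frame.

    Assumption: cam_rot (3x3) and cam_pos (3,) are the camera-to-world
    rotation and translation, so the inverse is p_c = R^T (p_w - t).
    """
    d = [point_w[i] - cam_pos[i] for i in range(3)]
    # Multiplying by the transpose of a rotation matrix inverts the rotation.
    return [sum(cam_rot[r][c] * d[r] for r in range(3)) for c in range(3)]

# A static object: one fixed world position for every frame.
static_object = [0.0, 0.0, 5.0]

# A moving camera: (rotation, position) per frame.
camera_track = [
    (rot_y(0.0), [0.0, 0.0, 0.0]),          # frame 0: at the origin
    (rot_y(0.0), [1.0, 0.0, 0.0]),          # frame 1: translated along x
    (rot_y(math.pi / 2), [0.0, 0.0, 0.0]),  # frame 2: rotated 90 degrees about y
]

# The object's camera-frame position differs in every frame.
relative = [world_to_camera(static_object, R, t) for R, t in camera_track]
```

The same inverse transform applies to orientation (compose the transposed camera rotation with the object rotation), so a full 6D pose in camera coordinates follows the same pattern.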


Auxiliary Annotation of SynFMC

⭐ Besides 6D poses of objects and the camera, SynFMC also provides auxiliary annotations, including instance segmentation maps, depth maps, and descriptions of both visual content and motion.

0442a954
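
As a rough illustration of how the annotations listed above might be grouped per video, the sketch below defines a container with one field per annotation type. The class name, field names, and the use of 4x4 matrices and file paths are assumptions made for illustration only; they do not describe the released file format.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed layout: a 6D pose stored as a 4x4 homogeneous transform.
Pose = List[List[float]]

@dataclass
class ClipAnnotation:
    """Hypothetical per-video record; not the dataset's actual schema."""
    camera_poses: List[Pose]                   # one pose per frame
    object_poses: Dict[str, List[Pose]]        # object id -> one pose per frame
    instance_mask_paths: List[str] = field(default_factory=list)
    depth_map_paths: List[str] = field(default_factory=list)
    content_caption: str = ""                  # description of visual content
    motion_caption: str = ""                   # description of motion

    def num_frames(self) -> int:
        return len(self.camera_poses)

    def is_consistent(self) -> bool:
        """Check that every object track covers every frame."""
        n = self.num_frames()
        return all(len(track) == n for track in self.object_poses.values())

identity: Pose = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
clip = ClipAnnotation(
    camera_poses=[identity, identity],
    object_poses={"balloon": [identity, identity]},
    content_caption="a balloon floating over the road",
)
```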

2. Architecture of FMC

⭐ The following figure presents the architecture of FMC, where the Object Motion Controller (OMC) perceives the orientation and size of objects in the camera coordinate system by accepting 6D poses.

Figure 2. The architecture of FMC. In the first stage, we randomly sample images from the synthetic videos and update the parameters of the injected Domain Adapter. Next, the modules of the Camera Motion Controller (CMC) are learned. The CMC consists of two parts, the Camera Encoder and the Camera Adapter, where the Camera Adapter is introduced into the temporal modules. Finally, we train the Object Encoder of the Object Motion Controller (OMC). It receives the object pose features, which are repeated over the corresponding object region. We use a Gaussian blur kernel centered at the object centroid to avoid the need for precise masks. The output is then multiplied by the coarse masks to modulate the features in the main branch.
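
The coarse-mask step in the caption can be sketched as follows. This is one plausible reading, not the authors' implementation: it assumes the coarse mask is a Gaussian weight map centered at the object centroid and that a single pose feature vector is repeated at every spatial location before being weighted. The function names, the `sigma` parameter, and the nested-list tensor layout are hypothetical.

```python
import math

def gaussian_mask(height, width, centroid, sigma):
    """Soft H x W mask that peaks at the centroid (row, col) and decays smoothly."""
    cy, cx = centroid
    return [
        [
            math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]

def masked_pose_features(pose_feature, mask):
    """Repeat a pose feature vector at every location, weighted by the mask.

    Returns a nested list with shape [channels][height][width].
    """
    return [[[f * w for w in row] for row in mask] for f in pose_feature]

# A 5x5 feature map with the object centroid in the middle.
mask = gaussian_mask(5, 5, centroid=(2, 2), sigma=1.0)
features = masked_pose_features([2.0, -1.0], mask)
```

Because the weight falls off smoothly with distance from the centroid, only an approximate object location is needed rather than a pixel-accurate segmentation mask.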

3. Results of FMC

Independent Control of Camera / Object

⭐ The first two examples show independent control of the camera; the last two show independent control of the object:
canyon rim with a view of red rocks
02221fb0 02221fb0
cactus in the garden
002b4dce 002b4dce
a balloon floating over the road
bbe97d18 bbe97d18
a butterfly flying over the ground
bbe97d18 bbe97d18

Simultaneous Control of Camera & Object

⭐ The first two examples are from static single-object scenes; the last two are from dynamic single-object scenes:
a yellow mushroom on the road
0442a954 0442a954
a cat in the grass covered with leaves
0442a954 0442a954
a butterfly is flying over the ground
0442a954 0442a954
a balloon floating in the cloudy sky
0442a954 0442a954



⭐ The first two examples are from static multi-object scenes; the last two are from dynamic multi-object scenes:
a deer and a man in the grass
0442a954 0442a954
two birds in meadow
0442a954 0442a954
two UFOs are flying over the city
0442a954 0442a954
a shark and a yellow fish are swimming in the sea
0442a954 0442a954

BibTeX

Please consider citing SynFMC if it helps your research.
@article{SynFMC,
        title={{Free-Form Motion Control}: A Synthetic Video Generation Dataset with Controllable Camera and Object Motions},
        author={Shuai, Xincheng and Ding, Henghui and Qin, Zhenyuan and Luo, Hao and Ma, Xingjun and Tao, Dacheng},
        journal={arXiv},
        year={2025}
      }