Disentangled Scale Control for Robotic Policies
MIT CSAIL · Leslie Kaelbling Lab · 2025
Abstract
We develop compact, interpretable latent representations for fine-grained robotic manipulation. Our approach learns parameterizations in which a small number of latent dimensions continuously control policy scales (e.g., door-opening angle, motion speed) while remaining smooth and interpretable. Because the learned representation supports gradual adjustments rather than discrete switches, humans can understand and manipulate scale-related parameters when adapting policies to downstream tasks.
Figure 1: Overview of the disentangled scale control framework. A beta-VAE architecture preserves the spatial and temporal structure of 6-DoF manipulation trajectories while learning interpretable scale factors.
Method
Our framework targets three key properties of a policy parameterization:
- Compactness: A low-dimensional representation that provides easy, meaningful specifications for humans and efficient sampling for machines.
- Smoothness: Each parameter is a distinct, continuous factor of variation, so policy scales can be adjusted gradually.
- Interpretability: The representation is disentangled, with scale-related parameters explicitly represented so that humans can understand and manipulate them.
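Smoothness and interpretability are commonly checked with a latent traversal: hold a latent code fixed, sweep a single dimension, and decode each point. The sketch below assumes only a generic `decode` callable; `traverse_latent` and `toy_decode` are hypothetical illustrations, not the project's model. In a well-disentangled representation, only the targeted scale (here a stand-in for door-opening angle) should change as the sweep proceeds.

```python
import numpy as np

def traverse_latent(decode, z_base, dim, values):
    """Decode copies of z_base with one latent dimension swept over `values`.

    Hypothetical helper: with a smooth, disentangled latent space the decoded
    trajectories should vary gradually in a single scale factor.
    """
    z_base = np.asarray(z_base, dtype=float)
    outputs = []
    for v in values:
        z = z_base.copy()
        z[dim] = v                      # overwrite only the chosen latent
        outputs.append(decode(z))
    return np.stack(outputs)            # shape: (len(values), *trajectory_shape)

# Hypothetical toy decoder (assumption): latent 0 scales a fixed reach profile,
# latent 1 sets a constant offset on a second pose coordinate.
T = 50
profile = np.sin(np.linspace(0.0, np.pi / 2, T))   # rises from 0 to 1

def toy_decode(z):
    traj = np.zeros((T, 6))             # (time, 6-DoF pose)
    traj[:, 0] = z[0] * profile         # e.g., opening angle grows with z[0]
    traj[:, 1] = z[1]
    return traj

sweep = traverse_latent(toy_decode, [0.0, 0.5], dim=0,
                        values=np.linspace(0.0, 1.0, 5))
```

Here the final value of the first pose coordinate grows monotonically with the swept latent while the other coordinate stays fixed, which is the behavior a traversal is meant to reveal.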
We built a novel beta-VAE architecture whose convolutional layers preserve the spatial and temporal structure of 6-DoF manipulation trajectories. The training objective combines a pairwise ranking loss with a masked KL divergence so that the latent space captures continuous policy scales.
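To illustrate why convolutions suit this data, here is a minimal forward-pass sketch in NumPy of a convolutional encoder over a trajectory stored as a `(T, 6)` array. Convolving along time with shared kernels keeps local temporal ordering, and flattening (rather than pooling) keeps each feature's position in time. The layer count, kernel size, hidden width, latent size, and random weights are all assumptions; training and the decoder are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w, b):
    """'Valid' 1-D convolution along time. x: (T, C_in), w: (K, C_in, C_out)."""
    k = w.shape[0]
    windows = np.stack([x[t:t + k] for t in range(x.shape[0] - k + 1)])  # (T-K+1, K, C_in)
    return np.einsum("tkc,kco->to", windows, w) + b

def init_encoder(T=50, dof=6, hidden=16, latent=4, k=5):
    # Hypothetical sizes; the page does not specify the real architecture.
    t_out = T - 2 * (k - 1)             # sequence length after two valid convs
    s = 0.1
    return {
        "w1": s * rng.standard_normal((k, dof, hidden)), "b1": np.zeros(hidden),
        "w2": s * rng.standard_normal((k, hidden, hidden)), "b2": np.zeros(hidden),
        "w_mu": s * rng.standard_normal((t_out * hidden, latent)),
        "w_lv": s * rng.standard_normal((t_out * hidden, latent)),
    }

def encode(params, traj):
    h = np.maximum(conv1d(traj, params["w1"], params["b1"]), 0.0)  # ReLU
    h = np.maximum(conv1d(h, params["w2"], params["b2"]), 0.0)
    flat = h.reshape(-1)                # flattening keeps the time ordering
    return flat @ params["w_mu"], flat @ params["w_lv"]  # mean, log-variance

def reparameterize(mu, logvar):
    # Standard VAE reparameterization: z = mu + sigma * eps, eps ~ N(0, I)
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

params = init_encoder()
traj = rng.standard_normal((50, 6))     # placeholder 6-DoF trajectory
mu, logvar = encode(params, traj)
z = reparameterize(mu, logvar)
```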
Figure 2: Beta-VAE architecture with a custom loss combining pairwise ranking and masked KL divergence.
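The exact loss is not spelled out here, so the following NumPy sketch shows one plausible reading. The KL term of a beta-VAE is computed per latent dimension and multiplied by a mask (assumed here to exempt the supervised scale dimension from the prior), and a hinge-style pairwise ranking loss asks the scale latent to respect ordered pairs `(i, j)` in which trajectory `i` has a larger scale than `j`. The mask semantics, the hinge form, `margin`, `beta`, and `gamma` are all assumptions.

```python
import numpy as np

def kl_per_dim(mu, logvar):
    # KL(N(mu, sigma^2) || N(0, 1)) for each dimension of a diagonal Gaussian
    return 0.5 * (mu ** 2 + np.exp(logvar) - logvar - 1.0)

def masked_kl(mu, logvar, mask):
    # mask (assumption): 1 where the KL penalty applies, 0 where it is switched off
    return float(np.sum(mask * kl_per_dim(mu, logvar), axis=-1).mean())

def pairwise_ranking_loss(z_scale, pairs, margin=0.1):
    # Hinge loss (assumption): penalize pairs where z_scale[i] does not exceed
    # z_scale[j] by at least `margin`, for each (i, j) with scale_i > scale_j.
    if not pairs:
        return 0.0
    losses = [max(0.0, margin - (z_scale[i] - z_scale[j])) for i, j in pairs]
    return float(np.mean(losses))

def total_loss(recon_err, mu, logvar, mask, scale_dim, pairs, beta=4.0, gamma=1.0):
    # mu, logvar: (batch, latent); the scale latent is column `scale_dim`.
    return (recon_err
            + beta * masked_kl(mu, logvar, mask)
            + gamma * pairwise_ranking_loss(mu[:, scale_dim], pairs))

# Usage: two samples; latent 0 is the (hypothetical) scale dimension.
mu = np.array([[0.0, 0.0], [1.0, 0.0]])
logvar = np.zeros_like(mu)
loss = total_loss(0.0, mu, logvar, mask=np.array([0.0, 1.0]),
                  scale_dim=0, pairs=[(1, 0)])
```

In this example the scale latent already orders the pair correctly by more than the margin, and the mask removes the KL penalty on that dimension, so both extra terms are zero.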
Results
Key achievements of this work include:
- Novel beta-VAE architecture preserving the spatial and temporal structure of 6-DoF manipulation trajectories
- Custom loss combining a pairwise ranking loss and masked KL divergence to capture continuous policy scales
- Trajectory collection pipeline using MetaWorld simulation for model validation
- Inverse kinematics visualization tool using a Random Forest for real-time policy evaluation
- LLM-assisted scale perception module to automate labeling and enable generalized policy learning
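Pairwise ranking supervision needs ordered pairs rather than absolute values. A minimal sketch, assuming each trajectory has already received a scalar scale label (for example from an automated labeling step, whose interface is not described here): every pair whose labels differ by more than `min_gap` becomes an ordered pair `(i, j)` with label `i` larger. `ranking_pairs`, `min_gap`, and the example labels are hypothetical.

```python
from itertools import combinations

def ranking_pairs(scale_labels, min_gap=0.0):
    """Convert per-trajectory scalar scale labels into ordered pairs (i, j)
    meaning trajectory i has a larger scale than trajectory j.
    Pairs whose labels differ by min_gap or less are skipped as ambiguous."""
    pairs = []
    for i, j in combinations(range(len(scale_labels)), 2):
        diff = scale_labels[i] - scale_labels[j]
        if diff > min_gap:
            pairs.append((i, j))
        elif -diff > min_gap:
            pairs.append((j, i))
    return pairs

# Hypothetical door-opening angles (degrees) for three trajectories.
labels = [30.0, 90.0, 60.0]
print(ranking_pairs(labels))                # [(1, 0), (2, 0), (1, 2)]
print(ranking_pairs(labels, min_gap=40.0))  # [(1, 0)]
```

Skipping near-ties with a small `min_gap` is one way to limit the effect of noisy automatic labels; whether the project does this is not stated.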
Figure 3: Visualization of the learned latent space showing continuous scale control across manipulation tasks.
Citation
@article{tang2025scale,
title = {Disentangled Scale Control for Robotic Policies},
author = {Tang, Yuer and Mao, Jiayuan},
year = {2025},
institution = {MIT CSAIL}
}