Disentangled Scale Control for Robotic Policies

MIT CSAIL · Leslie Kaelbling Lab · 2025

Paper (coming soon) · Code (coming soon)

Abstract

We develop compact, interpretable latent representations for fine-grained robotic manipulation. Our approach learns a parameterization in which a small number of latent dimensions continuously control policy scales (e.g., door-opening angle, motion speed) while remaining smooth and interpretable. Because the representation supports gradual adjustment rather than discrete switching, humans can understand and manipulate scale-related parameters when adapting a policy to downstream tasks.

Method overview figure — coming soon

Figure 1: Overview of the disentangled scale control framework. A beta-VAE architecture preserves spatial and temporal structure of 6-DoF manipulation trajectories while learning interpretable scale factors.

Method

Our framework addresses three key properties for policy parameterization:

Compactness: A low-dimensional representation that provides easy, meaningful specifications for humans and efficient sampling for machines.
Smoothness: Each parameter is a distinct, continuous factor of variation, so policy scales can be adjusted gradually rather than switched between discrete settings.
Interpretability: The representation is disentangled, with scale-related parameters explicitly represented so that humans can understand and manipulate them.
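
To illustrate what smooth, disentangled control means in practice, the following is a minimal sketch of a latent traversal: one latent dimension is swept over a range while the others stay fixed. The helper names (`linspace`, `traverse_latent`, `toy_decoder`) and the linear mapping from latent value to door angle are hypothetical stand-ins for illustration only; the actual decoder is a learned network and is not shown here.

```python
def linspace(lo, hi, n):
    """Return n evenly spaced values from lo to hi inclusive (n >= 2)."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

def traverse_latent(z, dim, values):
    """Return copies of latent vector z with z[dim] replaced by each value."""
    out = []
    for v in values:
        z_new = list(z)   # copy so the other dimensions stay untouched
        z_new[dim] = v
        out.append(z_new)
    return out

def toy_decoder(z):
    """HYPOTHETICAL stand-in for a trained decoder.

    Assumes dimension 0 encodes door-opening scale and maps it linearly
    to an angle in degrees. A real decoder would output a full trajectory.
    """
    return 45.0 + 30.0 * z[0]

z_base = [0.0, 0.7, -0.2]                      # arbitrary example latent code
sweep = traverse_latent(z_base, dim=0, values=linspace(-1.0, 1.0, 5))
angles = [toy_decoder(z) for z in sweep]
print(angles)
```

In a disentangled representation, a sweep like this should change only the targeted scale (here, the toy door angle) monotonically and leave the rest of the behavior unchanged.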

We build a beta-VAE with convolutional layers that preserves the spatial and temporal structure of 6-DoF manipulation trajectories. The training objective combines a pairwise ranking loss with a masked KL divergence so that the latent space captures continuous policy scales.
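
As a rough sketch of how such an objective could be assembled, the code below combines a reconstruction term, a masked KL term, and a hinge-style pairwise ranking term for a pair of trajectories. Everything specific here is an assumption made for illustration: the hinge form of the ranking loss, the choice to exempt the scale dimension from the KL penalty, the function names, and the values of `beta`, `margin`, and `rank_weight`. The page does not specify these details, and plain Python is used only to keep the example framework-free.

```python
import math

def kl_per_dim(mu, logvar):
    """KL(N(mu, exp(logvar)) || N(0, 1)), computed per latent dimension."""
    return [0.5 * (math.exp(lv) + m * m - 1.0 - lv) for m, lv in zip(mu, logvar)]

def masked_kl(mu, logvar, mask):
    """Sum per-dimension KL terms weighted by a 0/1 mask.

    ASSUMPTION: the mask turns the KL penalty off for selected dimensions;
    which dimensions are masked in the actual method is not stated.
    """
    return sum(w * k for w, k in zip(mask, kl_per_dim(mu, logvar)))

def pairwise_ranking_loss(z_larger, z_smaller, margin=1.0):
    """Hinge loss: zero once z_larger exceeds z_smaller by at least margin."""
    return max(0.0, margin - (z_larger - z_smaller))

def pair_objective(recon_a, recon_b, mu_a, logvar_a, mu_b, logvar_b,
                   mask, scale_dim=0, beta=4.0, rank_weight=1.0):
    """Loss for a pair where trajectory a is labeled larger in scale than b."""
    kl = masked_kl(mu_a, logvar_a, mask) + masked_kl(mu_b, logvar_b, mask)
    rank = pairwise_ranking_loss(mu_a[scale_dim], mu_b[scale_dim])
    return recon_a + recon_b + beta * kl + rank_weight * rank

# Illustrative numbers only: dimension 0 is the (assumed) scale dimension
# and is excluded from the KL penalty by the mask.
mask = [0, 1, 1]
loss = pair_objective(
    recon_a=0.2, recon_b=0.3,
    mu_a=[1.5, 0.0, 0.0], logvar_a=[0.0, 0.0, 0.0],
    mu_b=[-0.5, 0.0, 0.0], logvar_b=[0.0, 0.0, 0.0],
    mask=mask,
)
print(loss)
```

The ranking term only needs ordered pairs (which trajectory has the larger scale), not absolute scale labels, which is one reason a pairwise formulation is convenient for this kind of supervision.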

Architecture diagram — coming soon

Figure 2: Beta-VAE architecture with custom loss combining pairwise ranking and masked KL divergence.

Results

Key achievements of this work include:

Novel beta-VAE architecture preserving spatial and temporal structure of 6-DoF manipulation trajectories
Custom loss combining pairwise ranking and masked KL divergence to capture continuous policy scales
Trajectory collection pipeline using MetaWorld simulation for model validation
Inverse kinematics visualization tool using a Random Forest model for real-time policy evaluation
LLM-assisted scale perception module to automate labeling and enable generalized policy learning

Results visualization — coming soon

Figure 3: Visualization of learned latent space showing continuous scale control across manipulation tasks.

Interactive Demo

Interactive demo — coming soon

Citation

@article{tang2025scale,
  title     = {Disentangled Scale Control for Robotic Policies},
  author    = {Tang, Yuer and Mao, Jiayuan},
  year      = {2025},
  institution = {MIT CSAIL}
}