Disentangled Scale Control for Robotic Policies

Yuer Tang, Jiayuan Mao

MIT CSAIL · Leslie Kaelbling Lab · 2025
Paper (coming soon) · Code (coming soon)

Abstract

We develop compact, interpretable latent representations for fine-grained robotic manipulation. Our approach learns a parameterization in which a small number of latent dimensions continuously control policy scales (e.g., door-opening angle, motion speed) while remaining smooth and interpretable. Because the representation supports gradual adjustments rather than discrete switches, humans can understand and tune scale-related parameters when adapting a policy to downstream tasks.

Method overview figure — coming soon

Figure 1: Overview of the disentangled scale control framework. A beta-VAE architecture preserves spatial and temporal structure of 6-DoF manipulation trajectories while learning interpretable scale factors.

Method

Our framework addresses three key properties for policy parameterization:

  • Compactness: A low-dimensional representation that provides easy, meaningful specifications for humans and efficient sampling for machines.
  • Smoothness: Each parameter is a clear, continuous factor of variation, allowing continuous control over policy scales.
  • Interpretability: The representation is disentangled, with scale-related parameters explicitly represented so that humans can understand and manipulate them.
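To make the smoothness property concrete, the following is a minimal sketch of a latent traversal: one latent dimension is swept while the others are held fixed, and the decoded output is checked for a gradual, ordered response. The names `traverse_scale`, `is_monotonic`, and `toy_decoder` are hypothetical, and the linear `toy_decoder` is only a stand-in for a trained decoder, which would output a full 6-DoF trajectory rather than a single door angle.

```python
def traverse_scale(decode, z, scale_dim, values):
    """Decode copies of z that differ only in the chosen scale dimension."""
    outputs = []
    for v in values:
        z_new = list(z)          # copy so the caller's latent vector is untouched
        z_new[scale_dim] = v     # overwrite only the scale dimension
        outputs.append(decode(z_new))
    return outputs


def is_monotonic(xs):
    """True if the sequence never decreases."""
    return all(b >= a for a, b in zip(xs, xs[1:]))


def toy_decoder(z):
    """Hypothetical stand-in decoder (assumption, not the trained model):
    maps the latent vector to a final door angle in degrees, driven only by z[0]."""
    return 45.0 + 30.0 * z[0]


# Sweep the assumed scale dimension (index 0) and keep the other latents fixed.
z = [0.0, 0.3, -1.2]
angles = traverse_scale(toy_decoder, z, scale_dim=0,
                        values=[-1.0, -0.5, 0.0, 0.5, 1.0])
print(angles, is_monotonic(angles))
```

With a trained model, the same sweep would be applied to its decoder, and a scalar summary of each decoded trajectory (for example, the final opening angle) would take the place of the toy output.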

We build a novel beta-VAE architecture with convolutional layers that preserves the spatial and temporal structure of 6-DoF manipulation trajectories. The training objective combines a pairwise ranking loss with a masked KL divergence term so that the latent space captures continuous policy scales.
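The sketch below shows one way such an objective could be assembled for a pair of trajectories. It is an illustration under stated assumptions, not the project's implementation: the hinge form of the ranking loss, the choice to exclude the scale dimension from the KL term, and the parameters `scale_dim`, `beta`, `rank_weight`, and `margin` are all hypothetical.

```python
import math


def masked_kl(mu, logvar, mask):
    """KL(N(mu, exp(logvar)) || N(0, 1)), summed over dimensions where mask is 1."""
    total = 0.0
    for m, lv, keep in zip(mu, logvar, mask):
        if keep:
            total += 0.5 * (math.exp(lv) + m * m - 1.0 - lv)
    return total


def pairwise_ranking_loss(z_a, z_b, label, margin=0.1):
    """Hinge ranking loss on two scalar latents (assumed form).
    label = +1 if sample a has the larger scale, -1 if it has the smaller one."""
    return max(0.0, margin - label * (z_a - z_b))


def pair_objective(recon_a, recon_b, mu_a, logvar_a, mu_b, logvar_b, label,
                   scale_dim=0, beta=4.0, rank_weight=1.0):
    """Reconstruction + beta * masked KL + weighted ranking loss for one pair.
    Assumption: the scale dimension is masked out of the KL term so that
    only the ranking loss shapes it."""
    mask = [0 if i == scale_dim else 1 for i in range(len(mu_a))]
    kl = masked_kl(mu_a, logvar_a, mask) + masked_kl(mu_b, logvar_b, mask)
    rank = pairwise_ranking_loss(mu_a[scale_dim], mu_b[scale_dim], label)
    return recon_a + recon_b + beta * kl + rank_weight * rank


# Two encoded trajectories whose scale latents are 2.0 and 1.0.
correct_order = pair_objective(0.0, 0.0, [2.0, 0.0], [0.0, 0.0],
                               [1.0, 0.0], [0.0, 0.0], label=+1)
wrong_order = pair_objective(0.0, 0.0, [2.0, 0.0], [0.0, 0.0],
                             [1.0, 0.0], [0.0, 0.0], label=-1)
print(correct_order, wrong_order)
```

In this sketch, a pair whose scale latents are already ordered consistently with the label incurs no ranking penalty, while a misordered pair is penalized in proportion to the violation plus the margin.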

Architecture diagram — coming soon

Figure 2: Beta-VAE architecture with custom loss combining pairwise ranking and masked KL divergence.

Results

The main contributions of this work are:

  • Novel beta-VAE architecture preserving spatial and temporal structure of 6-DoF manipulation trajectories
  • Custom loss combining pairwise ranking and masked KL divergence to capture continuous policy scales
  • Trajectory collection pipeline using MetaWorld simulation for model validation
  • Inverse kinematics visualization tool using Random Forest for real-time policy evaluation
  • LLM-assisted scale perception module to automate labeling and enable generalized policy learning

Results visualization — coming soon

Figure 3: Visualization of learned latent space showing continuous scale control across manipulation tasks.

Interactive Demo

Interactive demo — coming soon

Citation

@article{tang2025scale,
  title       = {Disentangled Scale Control for Robotic Policies},
  author      = {Tang, Yuer and Mao, Jiayuan},
  year        = {2025},
  institution = {MIT CSAIL}
}