RoboScribe

Language-guided policy generation for robosuite manipulation tasks.

Yuer Tang, Bree Chen

UCLA


Motivation

Researchers waste hours writing scripted robot policies, wrestling with simulator APIs, observation keys, and controller bugs instead of doing research. RoboScribe fixes this. Tell it what the robot should do in plain English. It builds a step-by-step plan for you to review, then writes, tests, and debugs the code automatically until the policy works.

System overview

You describe the task. The system figures out how to do it. Here's what happens under the hood.

RoboScribe pipeline overview — task description through policy iteration

Figure 1. Two stages. First, the system detects your environment and proposes a plan — you review it. Then it writes the policy, runs it in simulation, reads what went wrong, and tries again.

Diagnosis and feedback in the agent loop

Figure 2. When a policy fails, the system combines trajectory data, optional vision analysis, and your feedback to figure out what went wrong — then fixes it.
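
The loop in Figures 1 and 2 can be sketched as follows. This is a minimal illustration, not RoboScribe's actual implementation: `generate`, `rollout`, and `diagnose` are hypothetical callables standing in for the LLM call, the simulation run, and the diagnosis step, and the attempt limit is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Attempt:
    code: str            # policy source produced for this attempt
    success_rate: float  # fraction of successful episodes, 0.0 to 1.0
    diagnosis: str       # empty when the attempt reached the target


def iterate_policy(
    task: str,
    generate: Callable[[str, Optional[str]], str],  # hypothetical: (task, last diagnosis) -> code
    rollout: Callable[[str], float],                # hypothetical: code -> success rate
    diagnose: Callable[[str, float], str],          # hypothetical: (code, rate) -> diagnosis text
    target: float = 1.0,
    max_attempts: int = 5,                          # assumed limit
) -> List[Attempt]:
    """Generate, test, and revise until the target rate or the attempt limit is reached."""
    history: List[Attempt] = []
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        code = generate(task, feedback)
        rate = rollout(code)
        if rate >= target:
            history.append(Attempt(code, rate, ""))
            break
        feedback = diagnose(code, rate)  # fed into the next generation step
        history.append(Attempt(code, rate, feedback))
    return history


# Stub components: the first draft fails, the revision succeeds.
def fake_generate(task: str, feedback: Optional[str]) -> str:
    return "v2" if feedback else "v1"


def fake_rollout(code: str) -> float:
    return 1.0 if code == "v2" else 0.0


def fake_diagnose(code: str, rate: float) -> str:
    return "eef not aligning with cube positions"


history = iterate_policy("Stack cube A on cube B", fake_generate, fake_rollout, fake_diagnose)
```

With these stubs, `history` holds two attempts: a failed first draft with its diagnosis, then a successful revision. Passing the diagnosis back into `generate` is what lets the second draft differ from the first.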

Demo

Full walkthrough from task input to working policy.

Full UI run: task input, phase review, policy iteration, success.

How it works

Five steps from idea to working policy.

1. Describe task & set up

Type what you want the robot to do.

Task description and setup.

2. Confirm environment

The system picks the right simulation environment. You confirm.

Environment confirmation.

3. Phase review

See the proposed plan. Edit it or approve it.

Phase plan with control logic and human approval.
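
A phase plan of this kind could be represented as an ordered list of phases, each with a goal and an exit condition, that is marked approved only by the reviewer. The sketch below is illustrative: the `Phase` and `PhasePlan` names, their fields, and the example phases are assumptions, not RoboScribe's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Phase:
    name: str       # short label, e.g. "approach"
    goal: str       # what the controller should do in this phase
    exit_when: str  # human-readable condition for moving to the next phase


@dataclass
class PhasePlan:
    task: str
    phases: List[Phase] = field(default_factory=list)
    approved: bool = False  # set by the reviewer, never by the planner

    def approve(self) -> None:
        """Record human approval; an empty plan cannot be approved."""
        if not self.phases:
            raise ValueError("cannot approve an empty plan")
        self.approved = True

    def summary(self) -> str:
        """Render the plan as numbered lines for review."""
        return "\n".join(
            f"{i}. {p.name}: {p.goal} (until {p.exit_when})"
            for i, p in enumerate(self.phases, start=1)
        )


plan = PhasePlan(
    task="Pick up a cube from the table",
    phases=[
        Phase("approach", "move gripper above the cube", "gripper is over the cube"),
        Phase("descend", "lower gripper to the cube", "gripper is at cube height"),
        Phase("grasp", "close the gripper", "cube is held"),
        Phase("lift", "raise the cube", "cube is above the table"),
    ],
)
print(plan.summary())
plan.approve()
```

Keeping `approved` as an explicit field makes it easy for later stages to refuse to generate code from a plan nobody has reviewed.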

4. Policy generation & simulation

Code is generated, tested in simulation, and revised automatically.

Code generation.
Simulation rollout.
Diagnostics and feedback.
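
For reference, a scripted policy for Lift might look like the phase-based controller below. This is a hand-written sketch under assumptions, not output from RoboScribe: it assumes robosuite's `robot0_eef_pos` and `cube_pos` observation keys and a 7-dimensional OSC_POSE action (position delta, rotation delta, gripper), and the gain, tolerances, hover height, and grasp duration are made up for illustration.

```python
from typing import Dict, List, Sequence

GAIN = 5.0        # assumed proportional gain on position error
XY_TOL = 0.01     # assumed horizontal alignment tolerance, metres
Z_TOL = 0.01      # assumed vertical tolerance, metres
HOVER = 0.08      # assumed hover height above the cube, metres
GRASP_STEPS = 15  # assumed number of steps to wait while the gripper closes


def _clip(v: float) -> float:
    """Clamp a command to the normalized action range [-1, 1]."""
    return max(-1.0, min(1.0, v))


class LiftPolicy:
    """Approach above the cube, descend, close the gripper, then lift."""

    def __init__(self) -> None:
        self.phase = "approach"
        self.grasp_count = 0

    def act(self, obs: Dict[str, Sequence[float]]) -> List[float]:
        eef, cube = obs["robot0_eef_pos"], obs["cube_pos"]
        dx, dy, dz = (cube[i] - eef[i] for i in range(3))
        grip = -1.0  # assumed convention: -1 opens the gripper, +1 closes it
        move = [0.0, 0.0, 0.0]

        if self.phase == "approach":
            move = [dx, dy, dz + HOVER]
            if abs(dx) < XY_TOL and abs(dy) < XY_TOL and abs(dz + HOVER) < Z_TOL:
                self.phase = "descend"
        elif self.phase == "descend":
            move = [dx, dy, dz]
            if abs(dz) < Z_TOL:
                self.phase = "grasp"
        elif self.phase == "grasp":
            grip = 1.0
            self.grasp_count += 1
            if self.grasp_count >= GRASP_STEPS:
                self.phase = "lift"
        else:  # lift: keep the gripper closed and move straight up
            grip = 1.0
            move = [0.0, 0.0, 0.2]

        position = [_clip(GAIN * m) for m in move]
        return position + [0.0, 0.0, 0.0, grip]  # no rotation command
```

In a rollout, `act` would be called once per simulation step with the latest observation dictionary. The explicit `phase` attribute also makes failures easier to diagnose, since a stuck policy can be identified by the phase it never leaves.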

5. Results

Download the working policy. Watch the replay.

Success summary and metrics.
Replay of successful run.
Generated policy code.
Diagnosis and feedback report.

Results

We tested on four robosuite tasks, from simple picks to contact-heavy manipulation. All runs used Qwen with a Panda robot and no human feedback during iteration.

Task rollouts

Lift (100%)
Stack (100%)
NutAssembly (90%)
Door (in progress)

Per-environment performance

| Task | Geometric Precision | First Attempt | Iteration Helps? | Best Rate |
|---|---|---|---|---|
| Lift | Low (XYZ translation) | 100% | N/A (succeeds immediately) | 100% |
| Stack | Medium (two-object) | 0–100% | Yes (0%→100%) | 100% |
| NutAssembly | High (quaternion math) | 0–60% (variable) | No (regresses) | 40% |
| Door | High (rotation + pull) | 0% | No (stuck) | 0% |

Iteration trajectory — Stack (0%→100%)

Stack went from 0% to 100% in one revision. The system spotted an alignment error and fixed it.

| Attempt | Success | Diagnosis | Agent Action |
|---|---|---|---|
| 1 | 0% (0/10) | "eef not aligning with cube positions" | |
| 2 | 100% (10/10) | None (success) | Fixed approach alignment logic |
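
The success figures in this table are counts over 10 episodes. A minimal sketch of how such a rate could be computed and formatted is shown below; `run_episode` is a hypothetical callable that returns `True` when an episode succeeds, and seeding episodes with 0 through 9 is an assumption.

```python
from typing import Callable, Tuple


def evaluate(run_episode: Callable[[int], bool], episodes: int = 10) -> Tuple[int, int]:
    """Run `episodes` rollouts with seeds 0..episodes-1 and count the successes."""
    successes = sum(1 for seed in range(episodes) if run_episode(seed))
    return successes, episodes


def format_rate(successes: int, episodes: int) -> str:
    """Format a result in the table's style, e.g. '100% (10/10)'."""
    pct = round(100 * successes / episodes)
    return f"{pct}% ({successes}/{episodes})"


print(format_rate(*evaluate(lambda seed: False)))  # 0% (0/10)
print(format_rate(*evaluate(lambda seed: True)))   # 100% (10/10)
```

Reporting the raw count alongside the percentage makes it clear how many episodes each rate is based on.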

Key findings

Supported environments

Currently supported robosuite tasks. Each card shows a real rollout from a generated policy.

Lift
Pick up a cube from the table
100% success on the 1st attempt

Stack
Stack cube A (red) on top of cube B (green)
100% success on the 1st attempt

NutAssemblySquare
Pick up a square nut and place it on a peg
90% success; requires domain constraints

Door
Open a door by turning the handle
In progress; benefits from human feedback

Run it locally

Install

# Python backend (includes all dependencies)
pip install -e "roboscribe/[sim]"

# React frontend
cd roboscribe/src/roboscribe/frontend && npm install

Run

# Terminal 1 — Backend
cd roboscribe/src/roboscribe
uvicorn roboscribe.server.main:app --host 0.0.0.0 --port 8000

# Terminal 2 — Frontend
cd roboscribe/src/roboscribe/frontend
npm run build && npm run preview

Open http://localhost:4173. To preview this documentation site, run cd docs && python3 -m http.server 8080, then open http://localhost:8080.

LLM backends

| Provider | Env variable | Notes |
|---|---|---|
| Qwen | DASHSCOPE_API_KEY | Free tier |
| OpenAI | OPENAI_API_KEY | GPT-4o, vision diagnosis |
| Anthropic | ANTHROPIC_API_KEY | Claude |
| DeepSeek | DEEPSEEK_API_KEY | Budget-friendly |
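
Since each provider is selected through its own API key variable, a small helper can report which backends are configured before launching the server. The sketch below uses the variable names from the table; the function name and the idea of auto-detecting providers are assumptions, not RoboScribe's documented behavior.

```python
import os
from typing import Dict, List, Mapping, Optional

# Provider -> environment variable, taken from the table above.
PROVIDER_ENV: Dict[str, str] = {
    "Qwen": "DASHSCOPE_API_KEY",
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "DeepSeek": "DEEPSEEK_API_KEY",
}


def configured_providers(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the providers whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_ENV.items() if env.get(var, "").strip()]


# An explicit mapping is passed here so the example does not depend on the real environment.
print(configured_providers({"OPENAI_API_KEY": "sk-example", "DEEPSEEK_API_KEY": ""}))  # ['OpenAI']
```

Calling `configured_providers()` with no argument checks the current process environment instead.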