RoboScribe

Language-guided policy generation for robosuite manipulation tasks.

UCLA

Motivation

Researchers waste hours writing scripted robot policies, wrestling with simulator APIs, observation keys, and controller bugs instead of doing research. RoboScribe fixes this. Tell it what the robot should do in plain English. It builds a step-by-step plan for you to review, then writes, tests, and debugs the code automatically until the policy works.

System overview

You describe the task. The system figures out how to do it. Here's what happens under the hood.

RoboScribe pipeline overview — task description through policy iteration

Figure 1. Two stages. First, the system detects your environment and proposes a plan — you review it. Then it writes the policy, runs it in simulation, reads what went wrong, and tries again.

Diagnosis and feedback in the agent loop

Figure 2. When a policy fails, the system combines trajectory data, optional vision analysis, and your feedback to figure out what went wrong — then fixes it.
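
To make the trajectory part of that diagnosis concrete, here is a minimal sketch of one possible check. It is not RoboScribe's actual diagnosis code: the function name, the tolerance, and the output format are all assumptions made for illustration. It measures how close the end effector got to a target in the XY plane and folds in optional user feedback.

```python
import numpy as np


def diagnose_alignment(eef_positions, target_position, tolerance=0.02, user_feedback=None):
    """Hypothetical trajectory check: did the end effector ever line up with the target?

    eef_positions: sequence of (x, y, z) end-effector positions, one per step.
    target_position: (x, y, z) of the object to reach.
    tolerance: assumed XY alignment threshold in meters.
    """
    eef = np.asarray(eef_positions, dtype=float)        # shape (T, 3)
    target = np.asarray(target_position, dtype=float)   # shape (3,)

    # Horizontal distance to the target at every step of the rollout.
    xy_error = np.linalg.norm(eef[:, :2] - target[:2], axis=1)
    closest_step = int(np.argmin(xy_error))
    min_error = float(xy_error[closest_step])

    findings = []
    if min_error > tolerance:
        findings.append(
            f"eef never aligned with target in XY "
            f"(closest {min_error:.3f} m at step {closest_step})"
        )
    if user_feedback:
        findings.append(f"user feedback: {user_feedback}")

    return {"min_xy_error": min_error, "closest_step": closest_step, "findings": findings}
```

A report like this could be passed back to the LLM as plain text alongside the failing policy.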

Demo

Full walkthrough from task input to working policy.

Full UI run: task input, phase review, policy iteration, success.

How it works

Five steps from idea to working policy.

Describe task & set up

Type what you want the robot to do.

Task description UI — Task description and setup.

Confirm environment

The system picks the right simulation environment. You confirm.

Environment confirmation UI — Environment confirmation.

Phase review

See the proposed plan. Edit it or approve it.

Phase plan review — Phase plan with control logic and human approval.
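
For a sense of what a phase plan can turn into, here is a sketch of a phase-based scripted policy for a Lift-style task. It is an illustration under assumptions, not output from RoboScribe: the class name, phase names, gains, and thresholds are hypothetical. The observation keys `robot0_eef_pos` and `cube_pos` and the 7-dimensional action (position delta, rotation delta, gripper) are assumed to match robosuite's Lift task with an OSC pose controller; check them against your environment.

```python
import numpy as np


class LiftPhasePolicy:
    """Hypothetical four-phase policy: approach, descend, grasp, lift."""

    def __init__(self, hover_height=0.10, tol=0.01, gain=5.0, grasp_steps=10):
        self.hover_height = hover_height  # meters above the cube while approaching
        self.tol = tol                    # distance at which a phase counts as done
        self.gain = gain                  # proportional gain on the position error
        self.grasp_steps = grasp_steps    # steps to hold still while the gripper closes
        self.phase = "approach"
        self._grasp_count = 0

    def _move(self, delta, gripper):
        # Assumed layout: [dx, dy, dz, rot_x, rot_y, rot_z, gripper], gripper -1 open, +1 closed.
        action = np.zeros(7)
        action[:3] = np.clip(self.gain * np.asarray(delta, dtype=float), -1.0, 1.0)
        action[6] = gripper
        return action

    def act(self, obs):
        eef = np.asarray(obs["robot0_eef_pos"], dtype=float)
        cube = np.asarray(obs["cube_pos"], dtype=float)

        if self.phase == "approach":
            target = cube + np.array([0.0, 0.0, self.hover_height])
            if np.linalg.norm(target - eef) < self.tol:
                self.phase = "descend"
            return self._move(target - eef, -1.0)

        if self.phase == "descend":
            if np.linalg.norm(cube - eef) < self.tol:
                self.phase = "grasp"
            return self._move(cube - eef, -1.0)

        if self.phase == "grasp":
            self._grasp_count += 1
            if self._grasp_count >= self.grasp_steps:
                self.phase = "lift"
            return self._move(np.zeros(3), 1.0)

        # "lift": move straight up with the gripper closed.
        return self._move([0.0, 0.0, 0.2], 1.0)
```

Keeping each phase as its own branch makes the plan easy to review and edit before any code runs in simulation.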

Policy generation & simulation

Code is generated, tested in simulation, and revised automatically.
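
The generate, test, revise cycle can be sketched as a small loop. This is a minimal illustration, not the project's implementation: `generate`, `evaluate`, and `diagnose` are hypothetical callables standing in for the LLM call, the simulation rollouts, and the diagnosis step, and the defaults (5 attempts, 10 episodes, 100% target) are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Attempt:
    index: int                 # 1-based attempt number
    success_rate: float        # fraction of successful episodes
    diagnosis: Optional[str]   # None when the attempt met the target


def iterate_policy(
    generate: Callable[[Optional[str]], str],
    evaluate: Callable[[str, int], List[bool]],
    diagnose: Callable[[str, List[bool]], str],
    max_attempts: int = 5,
    episodes: int = 10,
    target_rate: float = 1.0,
) -> Tuple[str, List[Attempt]]:
    """Regenerate the policy until it meets the target rate or attempts run out."""
    history: List[Attempt] = []
    feedback: Optional[str] = None
    code = ""
    for index in range(1, max_attempts + 1):
        code = generate(feedback)              # first call gets no feedback
        outcomes = evaluate(code, episodes)    # one bool per simulated episode
        rate = sum(outcomes) / len(outcomes)
        if rate >= target_rate:
            history.append(Attempt(index, rate, None))
            break
        feedback = diagnose(code, outcomes)    # fed into the next generation
        history.append(Attempt(index, rate, feedback))
    return code, history
```

The returned history has the same shape as an attempt log: attempt number, success rate, and the diagnosis that drove the next revision.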

Results

Download the working policy. Watch the replay.

Diagnosis report — Diagnosis and feedback report.

Results

We tested on four robosuite tasks, from simple picks to contact-heavy manipulation. All runs used Qwen with a Panda robot and no human feedback during iteration.

Task rollouts

NutAssembly trajectory — NutAssembly (90%)

Per-environment performance

Task	Geometric Precision	First Attempt	Iteration Helps?	Best Rate
Lift	Low (XYZ translation)	100%	N/A (succeeds immediately)	100%
Stack	Medium (two-object)	0–100%	Yes (0%→100%)	100%
NutAssembly	High (quaternion math)	0–60% (variable)	No (regresses)	40%
Door	High (rotation + pull)	0%	No (stuck)	0%

Iteration trajectory — Stack (0%→100%)

Stack went from 0% to 100% in one revision. The system spotted an alignment error and fixed it.

Attempt	Success	Diagnosis	Agent Action
1	0% (0/10)	"eef not aligning with cube positions"	—
2	100% (10/10)	— (success)	Fixed approach alignment logic

Key findings

Simple tasks (Lift, Stack) reach 100% success: Lift on the first attempt, Stack after at most one revision. Hard tasks involving rotation and contact (NutAssembly, Door) remain challenging for LLMs.
In our runs, the diagnostic system identified what went wrong. The bottleneck is writing the fix, not finding the bug.
Iteration helps on medium tasks (Stack: 0% to 100% in one revision). On hard tasks, retrying doesn't help because the LLM makes the same kind of mistake each time.
The best place for human input is the phase plan, not debugging code after the fact.

Supported Environments

Currently supported robosuite tasks. Each card shows a real rollout from a generated policy.

Lift

Pick up a cube from the table

100% success — 1st attempt

Stack

Stack cube A (red) on top of cube B (green)

100% success — 1st attempt

NutAssemblySquare

Pick up a square nut and place it on a peg

90% success — requires domain constraints

Door

Open a door by turning the handle

In progress — benefits from human feedback

Run it locally

Install

# Python backend (includes all dependencies)
pip install -e "roboscribe/[sim]"

# React frontend
cd roboscribe/src/roboscribe/frontend && npm install

Run

# Terminal 1 — Backend
cd roboscribe/src/roboscribe
uvicorn roboscribe.server.main:app --host 0.0.0.0 --port 8000

# Terminal 2 — Frontend
cd roboscribe/src/roboscribe/frontend
npm run build && npm run preview

Open http://localhost:4173. To preview this documentation site, run `cd docs && python3 -m http.server 8080` and open http://localhost:8080.

LLM backends

Provider	Env variable	Notes
Qwen	`DASHSCOPE_API_KEY`	Free tier
OpenAI	`OPENAI_API_KEY`	GPT-4o, vision diagnosis
Anthropic	`ANTHROPIC_API_KEY`	Claude
DeepSeek	`DEEPSEEK_API_KEY`	Budget-friendly
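
A quick way to see which backends your shell is configured for is to check these variables before starting the server. The sketch below is a standalone helper, not part of RoboScribe: the function name and the lowercase provider labels are assumptions, and only the variable names come from the table above.

```python
import os

# Environment variable per provider, copied from the table above.
BACKEND_ENV_VARS = {
    "qwen": "DASHSCOPE_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}


def available_backends(env=None):
    """Return the providers whose API key is set and non-empty.

    env: optional mapping to inspect instead of os.environ (handy for testing).
    """
    env = os.environ if env is None else env
    return [name for name, var in BACKEND_ENV_VARS.items() if env.get(var)]


if __name__ == "__main__":
    found = available_backends()
    print("Configured backends:", ", ".join(found) if found else "none")
```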