Overview
π0.5 is a vision-language-action (VLA) model from Physical Intelligence, jointly trained on robot demonstration data and large-scale multimodal data. It can perform long-horizon tasks in real-world open environments it has not seen before. This page focuses on ws1, i.e. world_size=1: a single rank with no distributed setup. Everything on this page targets this single-card configuration. The entry point is PI05WS1Scheduler.
Architecture
PhyAI decomposes pi0.5 inference into four cooperating components. The source files live under phyai/src/phyai/models/pi05:
main_pi05.py
scheduler_ws1_pi05.py
model_runner_pi05.py
modeling_pi05.py
configuration_pi05.py
img_preprocess_pi05.py
tokenization_pi05.py
scheduler.setup() and scheduler.step().
Running pi0.5
Construct the engine
The plugin name is "pi05". The engine handles setup, weight loading, and graph capture in one shot. max_batch_size fixes the captured-graph batch dimension. Choose it based on the largest batch you will submit; smaller batches are padded internally.
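To make the idea of padding to a fixed batch dimension concrete, here is a minimal plain-Python sketch. It is an illustration of the general technique only, not PhyAI's internal code; `pad_batch` and `unpad_outputs` are hypothetical helper names, and the filler value of 0.0 is an assumption.

```python
from typing import Any, List, Sequence, Tuple


def pad_batch(
    rows: Sequence[Sequence[float]],
    max_batch_size: int,
    pad_value: float = 0.0,
) -> Tuple[List[List[float]], int]:
    """Pad `rows` with filler rows up to `max_batch_size`.

    Returns the padded batch and the number of real rows, so that the
    caller can discard the filler outputs afterwards.
    """
    real = len(rows)
    if not 1 <= real <= max_batch_size:
        raise ValueError(f"batch size {real} is outside [1, {max_batch_size}]")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same length")
    padded = [list(row) for row in rows]
    # Append filler rows until the batch has exactly max_batch_size rows.
    padded.extend([pad_value] * width for _ in range(max_batch_size - real))
    return padded, real


def unpad_outputs(outputs: Sequence[Any], real: int) -> List[Any]:
    """Keep only the outputs that correspond to real (non-filler) rows."""
    return list(outputs[:real])
```

With max_batch_size=4, a batch of two rows would be padded to four rows, and only the first two outputs would be kept. The cost of the filler rows is why the guidance above is to size max_batch_size to the largest batch you actually submit.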
Build a request
PI05Request carries the per-step inference inputs:

| Field | Shape | Notes |
|---|---|---|
| pixel_values | (B, 3, 3, H, W) | 3 cameras × 3 channels per robot, H = W = image_size |
| input_ids | (B, tokenizer_max_length) int64 | Right-padded with zeros |
| lang_lens | (B,) int64 | Real (un-padded) length per sample |
| noise | (B, chunk_size, max_action_dim) or None | Optional; when None, the scheduler samples a fresh Gaussian internally |
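The relationship between input_ids and lang_lens can be sketched in plain Python. The helper below right-pads variable-length token id lists with zeros and records each real length. `pad_token_ids` is a hypothetical name, and the token ids in the usage note are made up; this is a sketch of the layout described in the table, not the tokenizer's actual code.

```python
from typing import List, Sequence, Tuple


def pad_token_ids(
    sequences: Sequence[Sequence[int]],
    tokenizer_max_length: int,
    pad_id: int = 0,
) -> Tuple[List[List[int]], List[int]]:
    """Right-pad each token id sequence to `tokenizer_max_length`.

    Returns (input_ids, lang_lens):
      input_ids has shape (B, tokenizer_max_length) as nested lists,
      lang_lens has shape (B,) and holds each un-padded length.
    """
    input_ids: List[List[int]] = []
    lang_lens: List[int] = []
    for seq in sequences:
        n = len(seq)
        if n > tokenizer_max_length:
            raise ValueError(
                f"sequence of length {n} exceeds tokenizer_max_length="
                f"{tokenizer_max_length}"
            )
        # Real tokens first, then zeros on the right.
        input_ids.append(list(seq) + [pad_id] * (tokenizer_max_length - n))
        lang_lens.append(n)
    return input_ids, lang_lens
```

For example, pad_token_ids([[5, 9, 2], [7]], tokenizer_max_length=4) yields input_ids [[5, 9, 2, 0], [7, 0, 0, 0]] and lang_lens [3, 1]. If the engine expects PyTorch tensors (an assumption here), the nested lists could then be converted with torch.tensor(input_ids, dtype=torch.int64) and moved to the engine's device.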
B can be any value in [1, max_batch_size]. Build the tensors on the engine’s device; the scheduler validates shapes and raises immediately on mismatch.
Step the engine
End-to-end example
examples/pi05/run_pi05.py exercises the full path with deterministic dummy inputs at max_batch_size ∈ {1, 4} and includes a multi-batch equivalence check. To run it:
A successful run reports a PASS line for the equivalence check. Replace the path after --checkpoint with your local checkpoint path.
Current limitations
- Single GPU only. Tensor parallel, continuous batching, and preemption are all out of scope for PI05WS1Scheduler.
- max_batch_size is fixed at engine construction. To change it, you must tear down and rebuild the engine.
- The vision tower replays sequentially per real robot; it doesn’t batch along the camera dimension.

