# Single-GPU Inference for PI0.5

> How PhyAI runs pi0.5 inference on a single GPU

export const ModelCard = ({title, subtitle, icon, rows = {}}) => {
  const entries = Object.entries(rows);
  const renderValue = value => {
    if (value === null || value === undefined) {
      return <span className="text-sm text-zinc-400 dark:text-zinc-600">—</span>;
    }
    if (Array.isArray(value)) {
      return <div className="flex flex-wrap gap-1.5">
                    {value.map((v, i) => <span key={i} className="inline-flex items-center px-2 py-0.5 rounded-md text-[11.5px] font-medium bg-[#003399]/[0.06] text-[#003399] ring-1 ring-inset ring-[#003399]/15 dark:bg-[#60A5FA]/[0.10] dark:text-[#60A5FA] dark:ring-[#60A5FA]/20">
                            {v}
                        </span>)}
                </div>;
    }
    if (typeof value === "string" || typeof value === "number") {
      return <span className="text-sm text-zinc-800 dark:text-zinc-100 break-words">
                    {value}
                </span>;
    }
    return value;
  };
  const hasHeader = title || subtitle || icon;
  return <div className="not-prose my-6 overflow-hidden rounded-xl bg-white dark:bg-zinc-900 ring-1 ring-zinc-200 dark:ring-zinc-800 shadow-[0_1px_2px_rgb(15_23_42_/_0.04),0_4px_16px_-4px_rgb(15_23_42_/_0.06)] dark:shadow-[0_1px_0_rgb(255_255_255_/_0.04)_inset,0_8px_24px_-8px_rgb(0_0_0_/_0.5)]">
            {hasHeader && <div className="flex items-center gap-3.5 px-5 py-4 bg-zinc-50/60 dark:bg-zinc-800/20 border-b border-zinc-200/80 dark:border-zinc-800/80">
                    {icon && <div className="flex h-10 w-10 shrink-0 items-center justify-center rounded-[10px] bg-gradient-to-br from-[#003399] to-[#2563EB] text-white text-lg font-semibold ring-1 ring-inset ring-white/10 shadow-[0_1px_2px_rgb(0_51_153_/_0.25),0_3px_6px_-2px_rgb(0_51_153_/_0.18)]">
                            {icon}
                        </div>}
                    <div className="min-w-0">
                        {title && <div className="text-[15px] font-semibold tracking-tight text-zinc-900 dark:text-zinc-50">
                                {title}
                            </div>}
                        {subtitle && <div className="mt-0.5 text-xs text-zinc-500 dark:text-zinc-400">
                                {subtitle}
                            </div>}
                    </div>
                </div>}

            <div>
                {entries.map(([key, value], i) => <div key={key} className={`flex items-stretch ${i < entries.length - 1 ? "border-b border-zinc-100 dark:border-zinc-800/60" : ""}`}>
                        <div className="w-44 shrink-0 flex items-center px-5 py-3 text-[13px] font-medium text-zinc-500 dark:text-zinc-400">
                            {key}
                        </div>
                        <div className="flex-1 flex items-center px-5 py-3 min-w-0">
                            {renderValue(value)}
                        </div>
                    </div>)}
            </div>
        </div>;
};

<ModelCard
  title="pi0.5 (pi05_base)"
  subtitle="Vision-Language-Action · Single-GPU Inference"
  icon="π"
  rows={{
"Model Type": "VLA",
"Weights": <a href="https://huggingface.co/lerobot/pi05_base" target="_blank" rel="noreferrer" className="text-sm text-[#003399] dark:text-[#60A5FA] underline underline-offset-2 hover:opacity-80 break-all">huggingface.co/lerobot/pi05_base</a>,
"Tags": ["VLA", "flow-matching", "PaliGemma", "SigLIP", "single-GPU"],
"Image Input": "3-camera RGB · 224×224",
"Tokenizer Length": "200",
"Entry Point": <code className="px-2 py-0.5 rounded bg-[#003399]/10 dark:bg-[#60A5FA]/15 text-[#003399] dark:text-[#60A5FA] text-xs font-mono">PI05WS1Scheduler</code>,
"Param Precision": "bf16",
"Paper": <a href="https://www.pi.website/blog/pi05" target="_blank" rel="noreferrer" className="text-sm text-[#003399] dark:text-[#60A5FA] underline underline-offset-2 hover:opacity-80 break-all">pi.website/blog/pi05</a>,
}}
/>

# Overview

π0.5 is a vision-language-action (VLA) model from Physical Intelligence, jointly trained on robot demonstration data and large-scale multimodal data. It can perform long-horizon tasks in real-world open environments that it has not seen during training.

This page covers `ws1`, that is, `world_size=1`: a single rank with no distributed setup. The entry point for this single-card configuration is `PI05WS1Scheduler`.

<img src="https://mintcdn.com/phyai/5pJPpWhhmXXF8M7E/images/models/pi05/inference_pipeline.svg?fit=max&auto=format&n=5pJPpWhhmXXF8M7E&q=85&s=be595475f922eb6d5be530d4ecf2af05" alt="PI0.5 model execution pipeline" width="960" height="480" data-path="images/models/pi05/inference_pipeline.svg" />

# Architecture

PhyAI's <Tooltip headline="Engine + plugin" tip="Engine.__init__ runs a fixed sequence (config, CUDA, dist, parallel, linear), then resolves the plugin by name and calls its setup(). Each plugin declares an Entry subclass + an EntryArgs subclass.">engine + plugin contract</Tooltip> decomposes pi0.5 inference into four cooperating components. The plugin's source lives in the following files:

<Tree>
  <Tree.Folder name="phyai/src/phyai/models/pi05" defaultOpen>
    <Tree.File name="main_pi05.py" />

    <Tree.File name="scheduler_ws1_pi05.py" />

    <Tree.File name="model_runner_pi05.py" />

    <Tree.File name="modeling_pi05.py" />

    <Tree.File name="configuration_pi05.py" />

    <Tree.File name="img_preprocess_pi05.py" />

    <Tree.File name="tokenization_pi05.py" />
  </Tree.Folder>
</Tree>

The diagram below illustrates how PhyAI's three model runners cooperate with the scheduler, and how the engine bootstrap hands off to `scheduler.setup()` and `scheduler.step()`.

<img src="https://mintcdn.com/phyai/kTrpPyWbAvy9VX8Q/images/models/pi05/engine_lifecycle.svg?fit=max&auto=format&n=kTrpPyWbAvy9VX8Q&q=85&s=a3709c417b43965fc4806ec5548d35ff" alt="PhyAI Engine ↔ Scheduler ↔ 3 Runners lifecycle" width="960" height="760" data-path="images/models/pi05/engine_lifecycle.svg" />
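
The handoff described above can be pictured with a toy sketch. Everything in it (`ToyEngine`, `ToyScheduler`, `ToyArgs`, `PLUGINS`) is hypothetical and written only for illustration. It is not PhyAI's code, and it assumes nothing beyond the lifecycle already described: resolve the plugin by name, call `setup()` once, then forward each `step()`.

```python
from dataclasses import dataclass


@dataclass
class ToyArgs:
    """Hypothetical stand-in for a plugin's EntryArgs subclass."""
    max_batch_size: int = 1


class ToyScheduler:
    """Hypothetical stand-in for a plugin's Entry subclass."""

    def __init__(self, args: ToyArgs):
        self.args = args
        self.ready = False

    def setup(self) -> None:
        # A real scheduler would load weights and capture graphs here.
        self.ready = True

    def step(self, request: list) -> list:
        if not self.ready:
            raise RuntimeError("setup() must run before step()")
        if len(request) > self.args.max_batch_size:
            raise ValueError("batch exceeds max_batch_size")
        return [f"action-for-{item}" for item in request]


# Hypothetical registry mapping a plugin name to its entry class.
PLUGINS = {"toy": ToyScheduler}


class ToyEngine:
    def __init__(self, plugin: str, plugin_args: ToyArgs):
        # Resolve the plugin by name, then hand off to its setup().
        self.scheduler = PLUGINS[plugin](plugin_args)
        self.scheduler.setup()

    def step(self, request: list) -> list:
        return self.scheduler.step(request)


engine = ToyEngine("toy", ToyArgs(max_batch_size=2))
result = engine.step(["a", "b"])
print(result)  # ['action-for-a', 'action-for-b']
```

The point of the pattern is that the caller only ever talks to the engine; which scheduler runs behind it is decided by the plugin name.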

# Running pi0.5

<Steps>
  <Step title="Get the weights">
    Prepare a `pi05_base` safetensors checkpoint. You can download it from Hugging Face:

    ```
    https://huggingface.co/lerobot/pi05_base
    ```
  </Step>

  <Step title="Construct the engine">
    The plugin name is `"pi05"`. The engine handles setup, weight loading, and graph capture in one shot.

    ```python theme={null}
    import torch
    from pathlib import Path
    from phyai.engine import Engine, EngineArgs
    from phyai.engine_config import DeviceConfig, EngineConfig, RuntimeConfig
    from phyai.models.pi05.main_pi05 import PI05Args

    engine = Engine(
        EngineArgs(
            plugin="pi05",
            plugin_args=PI05Args(
                checkpoint_dir=Path("/path/to/pi05_base/"),
                max_batch_size=4,
            ),
            config=EngineConfig(
                device=DeviceConfig(target="cuda", params_dtype=torch.bfloat16),
                runtime=RuntimeConfig(use_cuda_graph=True),
            ),
        )
    )
    ```

    `max_batch_size` fixes the batch dimension of the captured graph. Pick it based on the largest batch you'll submit; smaller batches are padded internally.
    <Warning>The CUDA graph batch bucketing optimization is not enabled when `world_size=1`.</Warning>
  </Step>

  <Step title="Build a request">
    `PI05Request` carries the per-step inference inputs:

    | Field          | Shape                                       | Notes                                                                    |
    | -------------- | ------------------------------------------- | ------------------------------------------------------------------------ |
    | `pixel_values` | `(B, 3, 3, H, W)`                           | 3 cameras × 3 channels per robot, `H = W = image_size`                   |
    | `input_ids`    | `(B, tokenizer_max_length)` int64           | Right-padded with zeros                                                  |
    | `lang_lens`    | `(B,)` int64                                | Real (un-padded) length per sample                                       |
    | `noise`        | `(B, chunk_size, max_action_dim)` or `None` | Optional; when `None`, the scheduler samples a fresh Gaussian internally |

    `B` can be any value in `[1, max_batch_size]`. Build the tensors on the engine's device; the scheduler validates shapes and raises immediately on mismatch.
  </Step>

  <Step title="Step the engine">
    ```python theme={null}
    actions = engine.step(request)  # (actual_B, chunk_size, max_action_dim)
    ```

    Padding is sliced off before the result is returned, so the leading dimension of the tensor equals the real batch size.
  </Step>

  <Step title="Close the engine">
    ```python theme={null}
    engine.close()
    ```

    This releases the scheduler's buffers and tears down the captured CUDA graphs.
  </Step>
</Steps>
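
The pad-then-slice behaviour mentioned in the steps above can be sketched without any framework. This is only an illustration: `pad_batch`, `run_fixed_batch`, and `toy_model` are hypothetical names, and the filler value is an assumption, since this page does not say what PhyAI writes into the padded rows.

```python
def pad_batch(samples, max_batch_size, filler):
    """Pad samples up to max_batch_size; return the padded list and real size."""
    actual = len(samples)
    if not 1 <= actual <= max_batch_size:
        raise ValueError(f"batch size {actual} is outside [1, {max_batch_size}]")
    return list(samples) + [filler] * (max_batch_size - actual), actual


def run_fixed_batch(samples, max_batch_size, model):
    """Run a fixed-size model on a smaller batch and drop the padded rows."""
    padded, actual = pad_batch(samples, max_batch_size, filler=0.0)
    outputs = model(padded)   # the model always sees max_batch_size items
    return outputs[:actual]   # keep only rows that came from real samples


seen_sizes = []


def toy_model(batch):
    # Placeholder for a fixed-shape computation such as a captured graph.
    seen_sizes.append(len(batch))
    return [x * 2 for x in batch]


outputs = run_fixed_batch([1.0, 2.0], max_batch_size=4, model=toy_model)
print(outputs)     # [2.0, 4.0]
print(seen_sizes)  # [4]
```

The caller submits two samples and gets two results back, while the fixed-shape computation still runs at the full batch size.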

# End-to-end example

`examples/pi05/run_pi05.py` exercises the full path with deterministic dummy inputs at `max_batch_size ∈ {1, 4}` and includes a multi-batch equivalence check. To run it:

```bash theme={null}
uv run python examples/pi05/run_pi05.py --checkpoint /path/to/pi0.5
```

The script prints per-phase latency stats (mean / median / std / min / max over 3 warmups + 30 timed runs) and a `PASS` line for the equivalence check. Replace the path after `--checkpoint` with your local checkpoint directory.
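
The warmup-then-measure pattern behind such stats can be sketched as follows. This is not the script's actual code: `time_calls` is a hypothetical helper, the workload is a placeholder, and the sample standard deviation is an assumption. When timing GPU work you would also need to synchronize the device (for example with `torch.cuda.synchronize()`) before reading the clock.

```python
import statistics
import time


def time_calls(fn, warmups=3, runs=30):
    """Call fn `warmups` times untimed, then time `runs` calls in milliseconds."""
    for _ in range(warmups):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "std": statistics.stdev(samples),  # sample standard deviation
        "min": min(samples),
        "max": max(samples),
        "n": len(samples),
    }


# Placeholder workload standing in for one engine step.
stats = time_calls(lambda: sum(range(10_000)))
print({key: round(value, 4) for key, value in stats.items()})
```

Discarding the warmup calls keeps one-time costs such as lazy initialization out of the reported numbers.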

# Current limitations

* Single GPU only. Tensor parallel, continuous batching, and preemption are all out of scope for `PI05WS1Scheduler`.
* `max_batch_size` is fixed at engine construction. To change it, you must tear down and rebuild the engine.
* The vision tower replays sequentially per real robot — it doesn't batch along the camera dimension.
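
Because `max_batch_size` is fixed, a workload larger than that has to be split on the caller's side. A minimal sketch, assuming a hypothetical helper `split_batches` that only computes index ranges; slicing the actual request tensors and calling `engine.step` once per chunk is left to the caller.

```python
def split_batches(total, max_batch_size):
    """Yield (start, stop) index ranges of at most max_batch_size samples."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    for start in range(0, total, max_batch_size):
        yield start, min(start + max_batch_size, total)


# 10 samples with max_batch_size=4 give chunks of 4, 4, and 2.
chunks = list(split_batches(10, 4))
print(chunks)  # [(0, 4), (4, 8), (8, 10)]
```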

# Full example

```python theme={null}
from pathlib import Path

import torch

from phyai.engine import Engine, EngineArgs
from phyai.engine_config import DeviceConfig, EngineConfig, RuntimeConfig
from phyai.models.pi05.configuration_pi05 import PI05Config
from phyai.models.pi05.main_pi05 import PI05Args
from phyai.models.pi05.scheduler_ws1_pi05 import PI05Request
from phyai.utils import load_config

CHECKPOINT_DIR = Path("/path/to/pi05_base/")  # change to your local checkpoint folder
BATCH_SIZE = 1

cfg = load_config(CHECKPOINT_DIR, PI05Config)
device = torch.device("cuda")
dtype = torch.bfloat16

# 1. Construct the Engine — runs setup, weight loading, and CUDA graph capture in one shot.
engine = Engine(
    EngineArgs(
        plugin="pi05",
        plugin_args=PI05Args(
            checkpoint_dir=CHECKPOINT_DIR,
            max_batch_size=BATCH_SIZE,
        ),
        config=EngineConfig(
            device=DeviceConfig(target="cuda", params_dtype=dtype),
            runtime=RuntimeConfig(use_cuda_graph=True),
        ),
    )
)
try:
    # 2. Build a dummy request: random pixels + single-token prompt.
    input_ids = torch.zeros(
        BATCH_SIZE, cfg.tokenizer_max_length, dtype=torch.int64, device=device
    )
    input_ids[:, 0] = 2  # any non-pad token id
    request = PI05Request(
        pixel_values=torch.rand(
            BATCH_SIZE,
            3,
            3,
            cfg.vision.image_size,
            cfg.vision.image_size,
            dtype=dtype,
            device=device,
        ),
        input_ids=input_ids,
        lang_lens=torch.ones(BATCH_SIZE, dtype=torch.int64, device=device),
    )

    # 3. Run one inference step.
    actions = engine.step(request)
    print(f"action chunk shape={tuple(actions.shape)}, dtype={actions.dtype}")
finally:
    # 4. Release scheduler buffers and tear down captured CUDA graphs.
    engine.close()
```
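
The dummy request above uses a single-token prompt. For real prompts, the same `input_ids` / `lang_lens` layout applies: each row is right-padded with zeros, and `lang_lens` records the un-padded length. The sketch below shows that layout with plain lists. `pad_prompts` is a hypothetical helper (not part of PhyAI) and the token ids are made up; in practice you would take ids from the tokenizer, pad to `cfg.tokenizer_max_length`, and convert both results to int64 tensors on the engine's device.

```python
def pad_prompts(token_lists, max_length, pad_id=0):
    """Right-pad each token list to max_length and record its real length."""
    input_ids, lang_lens = [], []
    for tokens in token_lists:
        if len(tokens) > max_length:
            raise ValueError(
                f"prompt has {len(tokens)} tokens, limit is {max_length}"
            )
        lang_lens.append(len(tokens))
        input_ids.append(list(tokens) + [pad_id] * (max_length - len(tokens)))
    return input_ids, lang_lens


# Two made-up prompts, padded to a short length for readability.
input_ids, lang_lens = pad_prompts([[2, 17, 9], [2, 5]], max_length=6)
print(input_ids)  # [[2, 17, 9, 0, 0, 0], [2, 5, 0, 0, 0, 0]]
print(lang_lens)  # [3, 2]
```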
