Agent Device: Giving Your AI Coding Agent Eyes and Hands on a Real Phone

AI coding agents have gotten frighteningly good at writing application code. Ask one to add a sign-in screen to your React Native app and it will happily produce components, hooks, and navigation wiring in seconds. There is just one problem: the agent has never actually seen the app run. It reasons about source code, not about pixels on a device. So when the keyboard covers the submit button, or a tap target is mysteriously dead, the agent has no idea — it only knows what the code says should happen.

Agent Device is built to fix exactly that gap. Distributed as the agent-device CLI from the team at Callstack, it gives an AI agent a real feedback loop on actual devices: open an app, inspect the current screen, interact with visible elements, and collect debugging evidence — all through one command-line tool. If you have used Vercel's agent-browser for the web, the mental model is identical, just pointed at mobile, TV, and desktop apps instead of web pages. It works with native iOS and Android apps as well as anything built with Expo, Flutter, or React Native, as long as the target runs on a supported device, simulator, emulator, or desktop environment.

Why Agents Need a Device, Not Just a Codebase

The pitch is simple: agents should verify what actually happens on a device, not just reason about code. That sounds obvious, but the design choices underneath are what make it usable inside an agent loop rather than a flaky bolt-on.

The headline feature is token efficiency. Instead of dumping full screenshots and verbose logs on every single step (which would torch an agent's context window), Agent Device hands back compact structured snapshots, semantic element references, and evidence captured only when you ask for it. The capabilities break down into a few clean buckets:

Inspect the real UI through structured accessibility snapshots, interactive refs like @e3, selectors, and even React Native component trees.
Interact by opening apps, tapping, typing, scrolling, performing gestures, waiting, asserting state, handling alerts, and closing sessions.
Capture evidence with screenshots, videos, logs, traces, network traffic, performance samples, crash context, and React profiles.
Replay workflows by recording .ad scripts for local runs, CI, and repeatable end-to-end checks — with strict Maestro YAML export when a flow needs to run in Maestro.
Run across platforms including iOS Simulator, Android Emulator, physical devices, tvOS, Android TV, macOS, and Linux desktop targets.

That last point stands out. A single CLI that drives iOS, Android, TV, and desktop apps is rare, and it means one agent workflow can cover an entire product surface without juggling a different tool per platform.

Getting It Onto Your Machine

Agent Device ships as a global CLI. Installation is a one-liner, though the platform you want to drive determines which native toolchain you need installed alongside it.

# npm
npm install -g agent-device@latest

# yarn (Yarn Classic 1.x only; "yarn global" was removed in Yarn 2 and later)
yarn global add agent-device@latest

Once installed, confirm it is alive and read the workflow help — which the project explicitly calls the source of truth for agents:

agent-device --version
agent-device help workflow

A few prerequisites depend on your target. You will need Node.js 22 or newer across the board. For iOS, tvOS, and macOS targets you need Xcode. For Android you need the Android SDK plus ADB. And for macOS desktop automation you will need to grant the Accessibility permission. The runtime footprint of the package itself is tiny — it ships with essentially one dependency (yaml) — so almost all of the setup weight comes from the native platform tooling you already have if you build mobile apps.
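
Before handing the terminal to an agent, it can help to check those prerequisites up front. The following is a minimal sketch, not part of Agent Device: node_version_ok and preflight are hypothetical helper names, and the target labels passed to preflight are this sketch's own. It only uses tools named above (node, xcodebuild, adb).

```shell
# Preflight sketch for the prerequisites listed above.
# node_version_ok and preflight are hypothetical helpers, not agent-device commands.

# Succeeds when a version string such as "v22.3.0" has a major version of 22 or higher.
node_version_ok() {
  nv_major="${1#v}"            # drop a leading "v" if present
  nv_major="${nv_major%%.*}"   # keep the text before the first dot
  case "$nv_major" in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  [ "$nv_major" -ge 22 ]
}

# Prints one line per requirement for the given target and returns the
# number of missing requirements (0 means everything was found).
# The target names accepted here are this sketch's own labels.
preflight() {
  pf_missing=0
  if command -v node >/dev/null 2>&1 && node_version_ok "$(node --version)"; then
    echo "ok: Node.js $(node --version)"
  else
    echo "missing: Node.js 22 or newer"
    pf_missing=$((pf_missing + 1))
  fi
  case "$1" in
    ios|tvos|macos)
      # xcodebuild -version fails when only the Command Line Tools are installed.
      if xcodebuild -version >/dev/null 2>&1; then
        echo "ok: Xcode"
      else
        echo "missing: Xcode"
        pf_missing=$((pf_missing + 1))
      fi
      ;;
    android)
      if command -v adb >/dev/null 2>&1; then
        echo "ok: adb"
      else
        echo "missing: Android SDK platform-tools (adb)"
        pf_missing=$((pf_missing + 1))
      fi
      ;;
  esac
  return "$pf_missing"
}
```

Calling preflight ios or preflight android before opening a session surfaces a missing toolchain early. The macOS Accessibility permission is not checked here, since it is granted through system settings rather than detected from a command.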

A First Round Trip Through a Session

The core loop is delightfully readable, which matters when an agent is the one driving it. You discover an app, open a session, snapshot the screen to get actionable references, act on them, capture proof, and close out.

# Find an app to drive
agent-device apps --platform ios
agent-device apps --platform android

# Start a session against a simulator
agent-device open SampleApp --platform ios

# Inspect the current screen. -i returns interactive elements only.
agent-device snapshot -i
# @e1 [heading]    "Settings"
# @e2 [button]     "Sign In"
# @e3 [text-field] "Email"

# Act on a ref, capture evidence, then clean up
agent-device fill @e3 "test@example.com"
agent-device screenshot ./artifacts/settings.png
agent-device close

The key idea is the ref model. Every snapshot assigns short references like @e1, @e2, and @e3 to elements on the current screen. Refs from the latest snapshot are immediately actionable, so the agent taps and types against semantic handles rather than guessing at screen coordinates. The catch worth internalizing: refs are scoped to the snapshot that produced them. After you scroll or navigate to a new screen, the old refs may no longer point where you think, so take a fresh snapshot before acting again. It is the same discipline that keeps web agents grounded in real DOM state, applied to a phone.
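
One way to enforce that discipline in a script is to look up the ref by its label immediately before acting, so the ref always comes from a fresh snapshot. The sketch below rests on two assumptions: that snapshot -i prints lines shaped like the sample output above (@ref, role in brackets, label in double quotes) on standard output, and that the helper names ref_for_label and fill_by_label are hypothetical, not part of the CLI.

```shell
# Name of the CLI binary; overridable so the helpers can be exercised with a stub.
AGENT_DEVICE_BIN="${AGENT_DEVICE_BIN:-agent-device}"

# Reads snapshot text on stdin and prints the ref of the first line whose
# quoted label equals $1. Exits non-zero when no line matches.
# Assumes lines shaped like: @e3 [text-field] "Email"
ref_for_label() {
  awk -F'"' -v label="$1" '
    $2 == label {
      split($1, parts, " ")   # parts[1] is the ref, for example @e3
      print parts[1]
      found = 1
      exit
    }
    END { exit !found }
  '
}

# Takes a fresh snapshot, resolves the label to a ref, then fills that element.
# Hypothetical convenience wrapper around the snapshot and fill commands.
fill_by_label() {
  fb_ref="$("$AGENT_DEVICE_BIN" snapshot -i | ref_for_label "$1")" || return 1
  "$AGENT_DEVICE_BIN" fill "$fb_ref" "$2"
}
```

With this in place, fill_by_label "Email" "test@example.com" never reuses a ref from an earlier screen. If the real snapshot output differs from the sample, only the parsing in ref_for_label needs to change.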

Wiring It Into Your Agent

Agent Device is meant to be run straight from an agent's terminal — Cursor, Codex, Claude Code, Windsurf, or any other agent that can shell out. Because the installed CLI help is treated as the canonical instruction set, an agent can orient itself by reading the help topics rather than relying on stale memorized commands.

# The recommended entry point for any agent
agent-device help workflow

# Topic-specific help when a task gets more involved
agent-device help debugging
agent-device help replay

Beyond raw CLI invocation, Agent Device also exposes MCP (Model Context Protocol) support, so it can be registered as a set of direct tools inside MCP-aware clients instead of being invoked through a shell. That gives you a choice, depending on how your agent stack is wired: let the agent call the CLI as a terminal command, or surface the operations as first-class tools through MCP. Either way, the workflow you saw above (discover, open, snapshot, act, capture, close) stays the same.
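
Since the installed help is treated as the canonical instruction set, one option for the shell-out route is to save the help topics to files that the agent reads at the start of a task. This is a rough sketch under stated assumptions: save_help_topics is a hypothetical helper, the output directory is arbitrary, and the only CLI call used is the help <topic> form shown above.

```shell
# Name of the CLI binary; overridable so the helper can be exercised with a stub.
AGENT_DEVICE_BIN="${AGENT_DEVICE_BIN:-agent-device}"

# Writes each requested help topic to <dir>/<topic>.txt.
# Usage: save_help_topics <dir> <topic>...
save_help_topics() {
  sh_dir="$1"
  shift
  mkdir -p "$sh_dir" || return 1
  for sh_topic in "$@"; do
    # Stop at the first topic the CLI refuses, so a stale topic name is noticed.
    "$AGENT_DEVICE_BIN" help "$sh_topic" > "$sh_dir/$sh_topic.txt" || return 1
  done
}
```

Running save_help_topics ./agent-context workflow debugging replay after each upgrade keeps the saved instructions in step with the installed version instead of whatever the agent memorized.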

From Throwaway Exploration to Durable CI Checks

The feature that turns Agent Device from a debugging toy into something you can lean on is replay. As an agent (or you) pokes around an app, useful interactions can be recorded into .ad script files. Those scripts can be replayed locally, run in CI, or kept around as repeatable end-to-end checks. The exploratory session you ran once becomes a regression test you run forever.

And when a flow needs to live in a more conventional runner, Agent Device offers strict Maestro YAML export. This is the "explore now, codify later" path: you let an agent discover the happy path interactively, then export it to Maestro so it slots into an existing test suite. That framing is important — Agent Device is explicit that it is not trying to replace Appium, Detox, or Maestro. Those are traditional automation frameworks written for human-authored test suites. Agent Device is the live-verification layer for the agentic loop, optimized for an AI that needs to inspect state, interact semantically, capture evidence, and only then promote the good runs into durable checks. It complements your test framework rather than competing with it.

For continuous integration, there is an EAS workflow template to crib from, and the documentation points at cloud and remote-execution options (including Linux runners and managed devices) for teams that want device verification running on every pull request. The tool is already in use by teams at Callstack, JPMorgan Chase, Expensify, Shopify, and others, which is a reasonable signal that the model holds up beyond a demo.

Knowing What You're Stepping Into

A note on maturity is in order. As of version 0.17.6, Agent Device is firmly pre-1.0 and moving fast: it went from its first release to seventeen minor versions in roughly four months, often shipping several releases a week. That velocity is great for momentum and bad for stability guarantees. Commands and flags can shift between minor versions, so pin a specific version in CI rather than tracking @latest there.
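
Pinning can be as small as swapping @latest for an exact version and failing the job if something else ends up installed. The sketch below uses 0.17.6 only because it is the version mentioned here. version_is_pinned is a hypothetical helper, and because the exact output format of agent-device --version is an assumption, it matches the pinned version as a substring rather than comparing whole strings.

```shell
# Pinned version for CI. Replace the default with the version you have validated.
AGENT_DEVICE_VERSION="${AGENT_DEVICE_VERSION:-0.17.6}"

# Succeeds when the reported version string contains the pinned version.
# Substring matching is deliberately loose because the --version output
# format is assumed, not confirmed.
version_is_pinned() {
  case "$1" in
    *"$AGENT_DEVICE_VERSION"*) return 0 ;;
    *) return 1 ;;
  esac
}

# CI usage sketch, commented out so that sourcing this file has no side effects:
# npm install -g "agent-device@${AGENT_DEVICE_VERSION}"
# version_is_pinned "$(agent-device --version)" || {
#   echo "unexpected agent-device version" >&2
#   exit 1
# }
```

Keeping the version in one variable makes an upgrade a one-line change that can be reviewed alongside any script updates it requires.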

You should also remember that Agent Device sits on top of real platform tooling. It runs session-aware commands through platform backends — XCTest for iOS and tvOS, ADB plus an Android snapshot helper for Android, a local helper for macOS, and AT-SPI for Linux. That means the usual native prerequisites apply, and a flaky simulator or a missing Xcode component is still a flaky simulator. Agent Device makes the device legible to an agent; it does not make device setup disappear.

The Takeaway

For a long time, the agentic development loop had a blind spot exactly where it mattered most: the moment code becomes a running app. An agent could write the feature and write the test, but it could not look at the screen and say "the button is below the fold" or "this tap does nothing." Agent Device fills that spot with a token-conscious, cross-platform CLI that lets the agent open the app, read the UI through semantic refs, act on it, gather just enough evidence, and turn the good runs into replayable checks. If you are building React Native, Expo, Flutter, or native mobile apps with an AI agent in the loop, it is well worth wiring up. Just pin your version, keep your Xcode and Android SDK in good order, and let the agent finally see what it is building.