agent-click
Computer use CLI for AI Agents.
macOS — available now
Windows — coming soon
Linux — coming soon
agent-click lets you control desktop apps from the terminal: click buttons, type into fields, and read what's on screen, all from a single CLI.
Built for AI agents. An agent can snapshot the screen, decide what to click, and act while you sit back and watch.
npm install -g agent-click
Quick Start: Install and try it in 2 minutes
Commands: Everything agent-click can do
Snapshots: See what's on screen, then act on it
Selectors: Find buttons, fields, anything
Workflows: Chain steps into reusable scripts
AI Agents: How agents use agent-click

How it works
agent-click reads the accessibility tree — the same structure screen readers use. It sees every button, text field, and menu item in any app. You point, it acts.
1. Snapshot
Capture every interactive element. Each gets a ref.
$ agent-click snapshot -a Calculator -i -c
[@e1] button "All Clear"
[@e5] button "7"
[@e8] button "Multiply"
[@e11] button "6"
[@e20] button "Equals"
2. Act
Use refs to click, type, or read.
$ agent-click click @e5 && agent-click click @e8 && agent-click click @e11 && agent-click click @e20
$ agent-click text -a Calculator
42
3. Re-snapshot
UI changed? Snapshot again for fresh refs.
$ agent-click snapshot -a Calculator -i -c
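Because refs can change whenever the UI changes, a script may prefer to look elements up by label in a fresh snapshot each time rather than hard-coding refs like @e5. The sketch below shows one way to do that. It assumes the snapshot prints one element per line in the form [@ref] role "label", as in the Calculator example above; the real output may differ. The helper names ref_for and click_plan and the sample_snapshot variable are hypothetical and not part of agent-click. The functions only print commands, so nothing is clicked until you choose to run the output.

```shell
# Sample text copied from the Calculator snapshot example, one element per line.
# Assumption: real snapshot output follows this same line format.
sample_snapshot='[@e1] button "All Clear"
[@e5] button "7"
[@e8] button "Multiply"
[@e11] button "6"
[@e20] button "Equals"'

# ref_for LABEL
# Reads snapshot text on stdin and prints the ref (for example @e5) of the
# first element whose quoted label equals LABEL exactly. Prints nothing if
# no element matches.
ref_for() {
  awk -v want="$1" '
    match($0, /\[@[A-Za-z0-9]+\]/) {
      ref  = substr($0, RSTART + 1, RLENGTH - 2)   # text between [ and ]
      rest = substr($0, RSTART + RLENGTH)          # everything after the ref
      if (match(rest, /"[^"]*"/)) {
        label = substr(rest, RSTART + 1, RLENGTH - 2)
        if (label == want) { print ref; exit }
      }
    }
  '
}

# click_plan SNAPSHOT_TEXT LABEL...
# Prints one "agent-click click @ref" line per label, in order.
# Returns 1 (and prints nothing further) if a label is missing.
click_plan() {
  snapshot_text=$1
  shift
  for label in "$@"; do
    ref=$(printf '%s\n' "$snapshot_text" | ref_for "$label")
    if [ -z "$ref" ]; then
      echo "no element labelled \"$label\" in snapshot" >&2
      return 1
    fi
    printf 'agent-click click %s\n' "$ref"
  done
}
```

With the sample text, click_plan "$sample_snapshot" 7 Multiply 6 Equals prints four click commands targeting @e5, @e8, @e11, and @e20, matching the chain in step 2. Against a live app, the first argument could instead be the output of agent-click snapshot -a Calculator -i -c, provided that output really has the assumed line format.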
What can you do with it?
Anything you'd do by clicking around:
Open Maps and search for the Colosseum
Send a Slack message to a teammate
Fill out a form in a browser
Multiply numbers in Calculator
Read the price of a flight from a booking site
Scrape data from a desktop app into a spreadsheet
Click through a setup wizard automatically
Automate a multi-step workflow with a YAML file