agent-click

Computer use CLI for AI Agents.

macOS — available now
Windows — coming soon
Linux — coming soon

agent-click lets you control desktop apps from the terminal: click buttons, type into fields, and read what's on screen, all from a single CLI.

Built for AI agents. An agent can snapshot the screen, decide what to click, and act while you sit back and watch.

npm install -g agent-click

How it works

agent-click reads the accessibility tree, the same structure screen readers use. It sees the buttons, text fields, and menu items that an app exposes there. You point, it acts.

1. Snapshot

Capture every interactive element. Each gets a ref.
$ agent-click snapshot -a Calculator -i -c
[@e1] button "All Clear"   [@e5] button "7"
[@e8] button "Multiply"    [@e11] button "6"
[@e20] button "Equals"
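
Because each snapshot entry pairs a ref with a role and a quoted label, a script can look up a ref by label instead of hard-coding it. The following is a minimal sketch, assuming the output format shown above (one or more `[@eN] role "label"` entries per line). `ref_for` is a hypothetical helper written for illustration, not part of agent-click.

```shell
# ref_for LABEL: read snapshot text on stdin and print the ref of the first
# element whose quoted label equals LABEL exactly.
# Assumption: entries look like  [@e5] button "7"  as in the sample output.
ref_for() {
  # Split each line into one entry per line, then compare labels in awk.
  grep -oE '\[@e[0-9]+\][^"]*"[^"]*"' |
    awk -v label="$1" '
      !found {
        ref = $1
        gsub(/\[|\]/, "", ref)          # strip the square brackets around the ref
        match($0, /"[^"]*"/)            # locate the quoted label
        if (substr($0, RSTART + 1, RLENGTH - 2) == label) {
          print ref
          found = 1
        }
      }'
}
```

With the Calculator snapshot above, `agent-click snapshot -a Calculator -i -c | ref_for "Multiply"` would print `@e8`.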

2. Act

Use refs to click, type, or read.
$ agent-click click @e5 && agent-click click @e8 && agent-click click @e11 && agent-click click @e20
$ agent-click text -a Calculator
42
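
For longer sequences, the `&&` chain can be wrapped in a small loop that also reports which ref failed. This is a sketch only: `click_all` is a hypothetical helper, and it assumes `agent-click click` exits with a non-zero status when a click fails, which the text above does not confirm.

```shell
# click_all REF...: click each ref in order and stop at the first failure.
# Assumption: `agent-click click` returns non-zero when the click fails.
click_all() {
  for ref in "$@"; do
    agent-click click "$ref" || {
      echo "click failed at $ref" >&2
      return 1
    }
  done
}
```

The Calculator example then becomes `click_all @e5 @e8 @e11 @e20`, followed by `agent-click text -a Calculator` to read the result.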

3. Re-snapshot

UI changed? Snapshot again for fresh refs.
$ agent-click snapshot -a Calculator -i -c
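
One way to avoid stale refs is to never cache them: take a fresh snapshot before each action and resolve the target by its label. The sketch below assumes the snapshot output format shown in step 1. `click_label` is a hypothetical helper, not an agent-click command.

```shell
# click_label APP LABEL: take a fresh snapshot of APP, find the first element
# whose quoted label is LABEL, and click it. Refs are looked up every time.
# Assumption: snapshot entries look like  [@e8] button "Multiply".
click_label() {
  ref=$(agent-click snapshot -a "$1" -i -c |
    grep -oE '\[@e[0-9]+\][^"]*"[^"]*"' |   # one entry per line
    grep -F "\"$2\"" |                      # keep entries with this exact label
    head -n 1 |
    sed -E 's/^\[(@e[0-9]+)\].*/\1/')       # keep only the ref
  [ -n "$ref" ] || { echo "no element labeled \"$2\" in $1" >&2; return 1; }
  agent-click click "$ref"
}
```

For example, `click_label Calculator "Multiply"` snapshots Calculator, resolves the current ref for the Multiply button, and clicks it. The trade-off is one extra snapshot per action in exchange for refs that always match the current UI.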

What can you do with it?

Anything you'd do by clicking around:

Open Maps and search for the Colosseum
Send a Slack message to a teammate
Fill out a form in a browser
Multiply numbers in Calculator
Read the price of a flight from a booking site
Scrape data from a desktop app into a spreadsheet
Click through a setup wizard automatically
Automate a multi-step workflow with a YAML file