Background Operation

What works headless and what needs window focus.

Background-first

agent-click is designed to run actions without bringing apps to the foreground. Most operations work while the app is minimized or behind other windows.

Native apps (macOS)

Action              Headless?  Method
Click (single)      Yes        AXPress
Type with selector  Yes        AXSetValue
Read text           Yes        Accessibility tree
Snapshot            Yes        Accessibility tree
Get value           Yes        Accessibility tree
Move window         Yes        AXSetPosition
Resize window       Yes        AXSetSize
Scroll to element   Yes        AXScrollToVisible
Key press           No         CGEvent (needs focus)
Scroll direction    No         CGEvent (needs focus)
Double-click        No         Mouse simulation
Right-click         No         Mouse simulation
Drag                No         Mouse simulation
Screenshot          No         Needs visible window

Electron apps (CDP)

Action      Headless?  Method
Click       Yes        JS element.click()
Type        Yes        Input.insertText
Key press   Yes        Input.dispatchKeyEvent
Scroll      Yes        JS scrollBy()
Read text   Yes        document.body.innerText
Snapshot    Yes        DOM walker via Runtime.evaluate
Screenshot  No         Needs visible window
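
A script that drives agent-click may want to know in advance whether an action will need the app brought forward. The sketch below encodes the two tables above as a lookup. The action labels and the `needs_foreground` helper are hypothetical names chosen for illustration; they are not agent-click commands or part of its API.

```python
from enum import Enum


class Backend(Enum):
    NATIVE = "native"      # macOS accessibility API
    ELECTRON = "electron"  # Chrome DevTools Protocol (CDP)


# Actions the tables mark as headless, keyed by backend.
# The labels are illustrative, not agent-click command names.
HEADLESS = {
    Backend.NATIVE: {
        "click", "type_with_selector", "read_text", "snapshot",
        "get_value", "move_window", "resize_window", "scroll_to_element",
    },
    Backend.ELECTRON: {
        "click", "type", "key_press", "scroll", "read_text", "snapshot",
    },
}


def needs_foreground(backend: Backend, action: str) -> bool:
    """Return True when the action is not listed as headless for this backend.

    Unknown actions are treated conservatively as needing the foreground.
    """
    return action not in HEADLESS[backend]
```

For example, `needs_foreground(Backend.NATIVE, "key_press")` is true, while the same action on `Backend.ELECTRON` is not, which matches the Key press rows in the two tables.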

How AXPress works

When you run agent-click click @e5, agent-click sends an AXPress action to the element through the macOS accessibility API. This is the same mechanism VoiceOver uses: it doesn't move the mouse cursor or activate the window. The app handles the press in its event loop without becoming frontmost.

If AXPress isn't supported for a particular element (rare), agent-click falls back to coordinate-based mouse clicking, which does require the window to be visible.
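
The fallback order described above can be sketched as follows. This is a minimal illustration, not agent-click's actual implementation: the three callables (`ax_press`, `mouse_click`, `window_visible`) are hypothetical stand-ins for the real accessibility and mouse-simulation calls, and raising an error when the window is hidden is an assumption about how a caller might want the failure reported.

```python
from typing import Callable


def click(
    ax_press: Callable[[], bool],
    mouse_click: Callable[[], None],
    window_visible: Callable[[], bool],
) -> str:
    """Try the headless AXPress path first, then fall back to the mouse.

    ax_press returns True if the element accepted the AXPress action.
    Returns the name of the path that was used.
    """
    if ax_press():
        return "axpress"  # no cursor movement, no window activation
    # Coordinate-based clicking only works on a visible window.
    if not window_visible():
        raise RuntimeError("AXPress unsupported and window is not visible")
    mouse_click()
    return "mouse"
```

Keeping the mouse path behind the visibility check means a background run fails loudly instead of clicking whatever happens to sit at those screen coordinates.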

Why directional scroll and key presses need focus

macOS routes keyboard and scroll events to the frontmost application. There's no accessibility equivalent of AXPress for keyboard input. The OS security model prevents injecting keystrokes into background apps.

For Electron apps, this limitation doesn't apply: CDP dispatches events directly to the Chromium process over a WebSocket, bypassing the OS event system entirely.
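
To make the CDP path concrete, the sketch below builds the JSON messages that a client would send over the WebSocket for the Type and Key press rows in the Electron table. `Input.insertText` and `Input.dispatchKeyEvent` are real CDP methods; the helper names, the keyDown/keyUp pairing, and the decision to only build (not send) the messages are assumptions made for illustration, and agent-click may construct its messages differently.

```python
import itertools
import json
from typing import List

# CDP requires each command to carry a unique integer id.
_ids = itertools.count(1)


def cdp_message(method: str, params: dict) -> str:
    """Serialize one CDP command as a JSON string."""
    return json.dumps({"id": next(_ids), "method": method, "params": params})


def insert_text(text: str) -> str:
    """Build an Input.insertText command that types text into the focused element."""
    return cdp_message("Input.insertText", {"text": text})


def key_press(key: str) -> List[str]:
    """Build a keyDown/keyUp pair of Input.dispatchKeyEvent commands."""
    return [
        cdp_message("Input.dispatchKeyEvent", {"type": event_type, "key": key})
        for event_type in ("keyDown", "keyUp")
    ]
```

Because these messages go straight to the Chromium process, nothing in them depends on which application macOS considers frontmost.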