Auto-Detecting What Your AI Agent Should Do Based on App Context
Most AI agent interfaces require you to specify what you want done: open the agent, pick a skill or workflow, provide the input. This is backwards.
A desktop agent already knows which app you are using and what is on your screen. It should surface the right automation automatically.
Context-Aware Skills
When you are in Mail, the relevant skills are:
- Draft a reply
- Summarize the email thread
- Forward with a summary
- Create a task from the email
When you are in a browser on HubSpot, the relevant skills are:
- Update the deal stage
- Add call notes
- Set a follow-up
- Export contact information
The agent detects the active app through the accessibility tree, identifies what is on screen, and surfaces the most likely action. You confirm with one tap or voice command instead of navigating a menu.
Implementation
The accessibility API tells you:
- Which app is in the foreground (AXFocusedApplication)
- What window is active
- What UI elements are visible
- What content is displayed
Map these signals to a skill library:
- Mail + compose window → draft skills
- Safari + HubSpot URL → CRM skills
- Finder + Downloads folder → file organization skills
- Terminal + git repo → development skills
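The mapping can be sketched as an ordered rule table: the first predicate that matches the current context wins. This is a minimal illustration, not Fazm's actual implementation; the `Context` fields and skill names are hypothetical, and in practice they would be populated from the accessibility tree signals listed above.

```python
from dataclasses import dataclass

@dataclass
class Context:
    # Hypothetical fields extracted from the accessibility tree.
    app: str
    url: str = ""
    window_title: str = ""
    path: str = ""

# Ordered (predicate, skills) rules; first match wins.
RULES = [
    (lambda c: c.app == "Mail" and "compose" in c.window_title.lower(),
     ["draft_reply", "summarize_thread"]),
    (lambda c: c.app == "Safari" and "hubspot.com" in c.url,
     ["update_deal_stage", "add_call_notes", "set_follow_up"]),
    (lambda c: c.app == "Finder" and c.path.endswith("/Downloads"),
     ["organize_files"]),
    (lambda c: c.app == "Terminal",
     ["git_helpers"]),
]

def skills_for(ctx: Context) -> list[str]:
    """Return the skills to surface for the current context, or [] if none match."""
    for predicate, skills in RULES:
        if predicate(ctx):
            return skills
    return []
```

A rule table keeps detection transparent: each surfaced skill can be traced back to the exact context signal that triggered it, which matters when you later decide which skills are safe to auto-execute.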
Destructive Command Detection
Not all auto-detected skills should auto-execute. The same context awareness that surfaces skills should also flag dangerous ones:
- "Delete all" buttons detected → require explicit confirmation
- Email send actions → preview before executing
- File deletion → show what will be removed
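A simple gate over the action's UI label is enough to separate "execute immediately" from "ask first". The pattern list and return values below are illustrative assumptions, not a complete safety policy:

```python
# Substrings that suggest a destructive or irreversible action (assumed list).
DESTRUCTIVE_PATTERNS = ("delete", "remove", "send", "discard", "empty trash")

def risk_gate(action_label: str) -> str:
    """Classify an action label scraped from a UI element.

    Returns "confirm" for destructive-looking actions (require explicit
    user confirmation, show a preview) and "auto" for everything else.
    """
    label = action_label.lower()
    if any(pattern in label for pattern in DESTRUCTIVE_PATTERNS):
        return "confirm"
    return "auto"
```

Note that string matching alone is a coarse filter; a real agent would combine it with the element's accessibility role and the consequence of the action (e.g. a file count for deletions) before deciding how much friction to add.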
Context awareness cuts both ways. Use it to accelerate safe actions and slow down dangerous ones.
Fazm uses accessibility APIs for context-aware skill detection. Open source on GitHub. Discussed in r/ClaudeAI.