Vision
3 articles about vision.
Accessibility APIs vs OCR - Two Approaches to Desktop Agent Vision
· 2 min read
Desktop agents need to see and understand what is on screen. Accessibility APIs give you the UI tree directly while OCR reads pixels. Each approach has real trade-offs in speed, reliability, and information quality.
accessibility-api, ocr, desktop-agent, vision, automation
Using a Desktop AI Agent to Identify Fonts from Screenshots
· 3 min read
A practical use case for desktop AI agents: identifying fonts from screenshots by combining screen capture with vision models for instant typography analysis.
desktop-agent, fonts, screenshots, design, automation, vision
DOM Understanding Is More Reliable Than Screenshot Vision for Browser Agents
· 2 min read
Vision models guess what's on screen. DOM parsing knows exactly what elements exist, their states, and their relationships. For browser automation, structured data wins.
dom, screenshot, vision, browser-agent, reliability