Vision
3 articles about vision.
Accessibility APIs vs. OCR: Two Approaches to Desktop Agent Vision
2 min read
Desktop agents need to see and understand what is on screen. Accessibility APIs give you the UI tree directly, while OCR reads pixels. Each approach has real…
accessibility-api, ocr, desktop-agent, vision, automation, desktopagents
Using a Desktop AI Agent to Identify Fonts from Screenshots
3 min read
A practical use case for desktop AI agents: identifying fonts from screenshots by combining screen capture with vision models for instant typography analysis.
desktop-agent, fonts, screenshots, design, automation, vision
DOM Understanding Is More Reliable Than Screenshot Vision for Browser Agents
2 min read
Vision models guess what's on screen. DOM parsing knows exactly what elements exist, their states, and their relationships. For browser automation…
dom, screenshot, vision, browser-agent, reliability