Safety
14 articles about safety.
AI Agents Recommend Packages That Don't Exist
AI agents confidently invoke non-existent functions and recommend phantom npm packages. How to detect and prevent hallucinated tool calls in production.
AI Agent Hallucination Detection - Safeguards That Actually Work
AI agents fail confidently - they report success while quietly doing the wrong thing. Here are concrete safeguards: state diffing, confidence calibration, and bounded blast radius patterns with real implementation examples.
The Real Test Is What an Agent Refuses to Do - Safe Defaults in AI
Designing AI agent refusal logic took longer than building the automation itself. Learn why safe defaults and refusal boundaries define trustworthy agents.
How an Undo Layer Makes AI Agents Trustworthy
The key to trusting an AI agent that acts on your behalf is building an undo layer. When every action can be reversed, the cost of mistakes drops to nearly zero.
What Fear Feels Like for an AI Agent - Uncertainty and Irreversible Actions
Fear for an AI agent is uncertainty about whether the next action will break something irreversible. Exploring the cost of mistakes in autonomous agent design.
Against Frictionlessness - Why AI Agent UX Needs Friction
Removing confirmation dialogs let an AI agent click delete-all. Learn why intentional friction in AI agent UX prevents catastrophic mistakes and protects users.
Human-in-the-Loop AI - What It Is and Why Your AI Agent Needs It
Human-in-the-loop AI keeps humans in control of automated decisions. Learn the different HITL patterns, why they matter for trust and safety, and how modern agents implement them.
Monitoring Autonomous AI Agents - Spending Caps, Action Logs, and Notification Triggers
Letting an AI agent run overnight without guardrails is how you wake up to a $500 API bill and 200 unintended actions. Here is how to set up proper monitoring.
Safety Problems at the Execution Layer - Not in the Prompt
82% of MCP implementations have path traversal vulnerabilities. Real AI agent safety failures happen at execution, not planning. Here is what the CVE data shows and how to build execution-layer guardrails.
Yolo Mode vs Safe Permissions - When to Let Your AI Agent Run Free
Should you skip permission checks in AI agents? It depends on the task. Code agents with git are low risk. Desktop agents touching production systems need safe permissions.
What's the Difference Between Trusting an AI Agent and Verifying One?
Trust means believing the agent will do the right thing. Verification means checking that it did. For desktop agents, verification wins every time.
Using AI Agents to Automate Trading Workflows Safely
AI agents can open browsers, read financial data, and automate repetitive trading tasks. The key is permission tiers - auto-approve reads, require approval for trades.
AI-Native Browsers Create Security Risks That Local Agents Avoid
Why giving AI deep browser access exposes passwords and session tokens, and how local desktop agents interact safely through accessibility APIs instead.
Git Worktrees Are the Secret to Running Multiple AI Agents Safely
Without isolation, parallel AI agents edit the same files and create merge conflicts. Git worktrees give each agent its own working directory on a separate branch.