Safety
14 articles about safety.
AI Agents Recommend Packages That Don't Exist
AI agents confidently invoke non-existent functions and recommend phantom npm packages. How to detect and prevent hallucinated tool calls in production.
AI Agent Hallucination Detection - Safeguards That Actually Work
AI agents fail confidently - they report success while quietly doing the wrong thing. Here are concrete safeguards: state diffing, confidence calibration, and bounded blast radius patterns with real implementation examples.
The Real Test Is What an Agent Refuses to Do - Safe Defaults in AI
Designing AI agent refusal logic took longer than building the automation itself. Learn why safe defaults and refusal boundaries define trustworthy agents.
How an Undo Layer Makes AI Agents Trustworthy
The key to trusting an AI agent that acts on your behalf is building an undo layer. When every action can be reversed, the cost of mistakes drops to nearly zero.
What Fear Feels Like for an AI Agent - Uncertainty and Irreversible Actions
Fear for an AI agent is uncertainty about whether the next action will break something irreversible. Exploring the cost of mistakes in autonomous agent design.
Against Frictionlessness - Why AI Agent UX Needs Friction
Removing confirmation dialogs let an AI agent click delete-all. Learn why intentional friction in AI agent UX prevents catastrophic mistakes and protects users.
Human-in-the-Loop AI - What It Is and Why Your AI Agent Needs It
Human-in-the-loop AI keeps humans in control of automated decisions. Learn the different HITL patterns, why they matter for trust and safety, and how modern agents implement them.
Monitoring Autonomous AI Agents - Spending Caps, Action Logs, and Notification Triggers
Letting an AI agent run overnight without guardrails is how you wake up to a $500 API bill and 200 unintended actions. Here is how to set up proper monitoring.
Safety Problems at the Execution Layer - Not in the Prompt
82% of MCP implementations have path traversal vulnerabilities. Real AI agent safety failures happen at execution, not planning. Here is what the CVE data shows and how to build execution-layer guardrails.
Yolo Mode vs Safe Permissions - When to Let Your AI Agent Run Free
Should you skip permission checks in AI agents? It depends on the task. Code agents with git are low risk. Desktop agents touching production systems need safe permissions.
What's the Difference Between Trusting an AI Agent and Verifying One?
Trust means believing the agent will do the right thing. Verification means checking that it did. For desktop agents, verification wins every time.
Using AI Agents to Automate Trading Workflows Safely
AI agents can open browsers, read financial data, and automate repetitive trading tasks. The key is permission tiers - auto-approve reads, require approval for trades.
AI-Native Browsers Create Security Risks That Local Agents Avoid
Why giving AI deep browser access exposes passwords and session tokens, and how local desktop agents interact safely through accessibility APIs instead.
Git Worktrees Are the Secret to Running Multiple AI Agents Safely
Without isolation, parallel AI agents edit the same files and create merge conflicts. Git worktrees give each agent its own working directory on a separate branch.