Mapping AI Agent Permissions in Cloud with Graph-Based Inventories

Matthew Diakonov

When you have three AI agents running in your cloud, you can track permissions in a spreadsheet. When you have thirty, you need a graph. Tools like Cartography build a graph-based inventory of your entire cloud infrastructure - and that graph becomes essential when AI agents start accumulating permissions nobody tracks.

The Cloud Native Computing Foundation accepted Cartography as a sandbox project, a sign of how widespread this problem is. Cartography was originally built at Lyft to answer questions like "what can this service account reach?" - questions that become urgent once autonomous agents start acquiring credentials at scale.

The Permission Creep Problem

AI agents need access to work. The email agent needs Gmail API access. The deployment agent needs AWS credentials. The monitoring agent needs read access to production logs. Over time, these permissions accumulate.

Nobody revokes the temporary S3 access the data agent needed for that one migration six months ago. The IAM role created for a proof-of-concept agent stays active. The overly broad "AmazonS3FullAccess" policy that was "just for testing" becomes permanent. Research from the Cloud Security Alliance found that in agentic deployments, every agent is effectively a privileged identity, and every integration is a potential liability unless managed like one.

The problem compounds because AI agents often create sub-agents and delegate credentials dynamically. An orchestrator agent might spin up five worker agents for a task and grant each one a copy of its own credentials. After the task, the orchestrator terminates - but the delegated credentials may not be revoked.

How Graph Inventories Work

Cartography pulls data from AWS, GCP, Azure, and SaaS APIs on a schedule. It builds a Neo4j graph where nodes are resources (S3 buckets, RDS databases, Lambda functions, IAM roles) and edges are access relationships. Add your AI agents as custom nodes and you can query exactly what each agent can reach.
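To make the node-and-edge model concrete, here is a minimal in-memory sketch in plain Python. It does not use Cartography or Neo4j. The `Graph` class, the node names, and the `HAS_ROLE` / `ALLOWS` relationship types are illustrative assumptions that mirror the queries in this post, not Cartography's own schema.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    # node id -> property dict (including a "label" such as "AIAgent")
    nodes: dict = field(default_factory=dict)
    # list of (source id, relationship type, target id)
    edges: list = field(default_factory=list)

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = {"label": label, **props}

    def add_edge(self, src, rel, dst):
        self.edges.append((src, rel, dst))

    def neighbors(self, src, rel):
        # Targets of all edges of type `rel` that leave `src`
        return [d for s, r, d in self.edges if s == src and r == rel]


def agents_reaching_production_rds(graph):
    """Walk agent -> role -> resource and keep production RDS instances."""
    hits = []
    for agent_id, props in graph.nodes.items():
        if props["label"] != "AIAgent":
            continue
        for role in graph.neighbors(agent_id, "HAS_ROLE"):
            for res in graph.neighbors(role, "ALLOWS"):
                node = graph.nodes[res]
                if node["label"] == "RDSInstance" and node.get("environment") == "production":
                    hits.append((agent_id, role, res))
    return hits


# Hypothetical inventory: two agents, two roles, two databases
g = Graph()
g.add_node("email-agent", "AIAgent")
g.add_node("data-agent", "AIAgent")
g.add_node("role/data-rw", "AWSRole")
g.add_node("role/mail-ro", "AWSRole")
g.add_node("orders-db", "RDSInstance", environment="production")
g.add_node("scratch-db", "RDSInstance", environment="staging")
g.add_edge("data-agent", "HAS_ROLE", "role/data-rw")
g.add_edge("email-agent", "HAS_ROLE", "role/mail-ro")
g.add_edge("role/data-rw", "ALLOWS", "orders-db")
g.add_edge("role/mail-ro", "ALLOWS", "scratch-db")

print(agents_reaching_production_rds(g))
```

A graph database does the same walk for you, with indexes and a query language instead of nested loops, which is what makes it practical at thirty agents and thousands of resources.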

The schema for AWS is extensive: Cartography maps ACM, API Gateway, Bedrock, CloudWatch, CodeBuild, Config, Cognito, EC2, ECS, ECR, EFS, Elasticsearch, EKS, DynamoDB, Glue, GuardDuty, IAM, Inspector, KMS, Lambda, RDS, Redshift, Route53, S3, SageMaker, Secrets Manager, Security Hub, SNS, SQS, SSM, and STS. GCP coverage includes Cloud Functions, Cloud Run, Cloud SQL, Compute, DNS, IAM, KMS, Secret Manager, Storage, and GKE.

A practical query to find agents with production database access:

// Assumes HAS_ROLE, ALLOWS, and the environment property come from your own ingestion and tagging
MATCH (agent:AIAgent)-[:HAS_ROLE]->(role:AWSRole)-[:ALLOWS]->(resource:RDSInstance)
WHERE resource.environment = 'production'
RETURN agent.name, role.arn, resource.id, agent.last_used
ORDER BY agent.last_used DESC

You can also query for agents that hold access to sensitive resources but have not run recently - a sign that those permissions should be revoked:

MATCH (agent:AIAgent)-[:HAS_ACCESS]->(resource)
WHERE agent.last_used < datetime() - duration('P90D')
  AND resource.sensitivity = 'high'
RETURN agent.name, resource.name, agent.last_used
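The staleness comparison above only behaves as expected if `last_used` is stored as a temporal value. Comparing a plain string against `datetime()` should evaluate to null in Cypher, so the row would be silently dropped. Below is a sketch of normalizing the field before ingestion. The record shape and the choice to treat naive timestamps as UTC are assumptions to adjust for your own agent management system.

```python
from datetime import datetime, timezone


def normalize_last_used(agents):
    """Return copies of agent records with last_used as a timezone-aware datetime."""
    normalized = []
    for agent in agents:
        record = dict(agent)  # do not mutate the caller's data
        raw = record.get("last_used")
        if isinstance(raw, str):
            parsed = datetime.fromisoformat(raw)
            if parsed.tzinfo is None:
                # Assumption: timestamps without an offset are UTC
                parsed = parsed.replace(tzinfo=timezone.utc)
            record["last_used"] = parsed
        normalized.append(record)
    return normalized


# Hypothetical records as they might come from an agent registry
agents = normalize_last_used([
    {"id": "agent-1", "name": "data-agent", "last_used": "2024-01-15T09:30:00"},
    {"id": "agent-2", "name": "email-agent", "last_used": "2024-03-02T18:00:00+00:00"},
])
print(agents[0]["last_used"].isoformat())
```

Passing timezone-aware `datetime` objects as query parameters lets the Neo4j Python driver store them as temporal values rather than strings.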

Blast Radius Analysis

The real power is answering "what if this agent is compromised?" Traverse the graph from the compromised agent node and you see every resource it can touch, every service it can call, and every other agent it can impersonate.

Security graphs from tools like Wiz accelerate this investigation by showing dependencies and blast radius immediately - what models are affected, what data they can access, and which identities or services are involved. What used to take hours of IAM policy auditing becomes a 30-second graph traversal.

A blast radius query for a compromised agent:

// Find everything reachable from a compromised agent, with the shortest hop count per resource
MATCH path = (agent:AIAgent {name: "deployment-agent"})-[:HAS_ROLE|ALLOWS|HAS_ACCESS*1..5]->(resource)
WHERE resource.type IN ['S3Bucket', 'RDSInstance', 'SecretManagerSecret', 'IAMRole']
RETURN resource.name, resource.type, resource.environment, min(length(path)) AS hops
ORDER BY hops ASC

This tells you the immediate access (1 hop), the access through assumed roles (2-3 hops), and the full transitive blast radius (4-5 hops). For a deployment agent that "only needs to push containers," you might discover it can also assume a role that reads production secrets three hops away.
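The hop counts are easier to reason about with a small breadth-first search. This sketch runs on a hand-written edge list rather than on Neo4j, and every node name in it is hypothetical. It reports the minimum number of hops to each reachable node, capped at five to match the `*1..5` bound in the query.

```python
from collections import deque


def blast_radius(edges, start, max_hops=5):
    """Return {node: minimum hops from start} for every node within max_hops."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in adjacency.get(node, []):
            if nxt not in hops:  # first visit in BFS is the shortest path
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    del hops[start]  # the agent itself is not part of its own blast radius
    return hops


# Hypothetical access edges for a deployment agent
edges = [
    ("deployment-agent", "role/deployer"),
    ("role/deployer", "ecr/app-images"),
    ("role/deployer", "role/secrets-reader"),
    ("role/secrets-reader", "secret/prod-db-password"),
]
radius = blast_radius(edges, "deployment-agent")
for resource, n in sorted(radius.items(), key=lambda kv: kv[1]):
    print(n, resource)
```

In this toy graph the production secret turns up at three hops, through a role the deployer role can assume, which is exactly the kind of path a flat permission list hides.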

Integrating Agent Nodes

Cartography lets you add custom nodes by writing a Python module that pulls from your agent management system. The structure is straightforward:

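# cartography/intel/agents.py
def sync_ai_agents(neo4j_session, agents_data, update_tag):
    """Upsert one AIAgent node per record in agents_data.

    agents_data is a list of dicts with id, name, model, environment,
    and last_used keys; update_tag marks which sync run touched the node.
    """
    ingest_query = """
    UNWIND $Agents AS agent
        MERGE (a:AIAgent {id: agent.id})
        ON CREATE SET a.firstseen = timestamp()
        SET a.name = agent.name,
            a.model = agent.model,
            a.environment = agent.environment,
            a.last_used = agent.last_used,
            a.lastupdated = $UpdateTag
    """
    # Keyword arguments are passed to the query as parameters
    neo4j_session.run(ingest_query, Agents=agents_data, UpdateTag=update_tag)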

Then write a relationship sync that connects agents to their IAM roles. Because those roles already exist in Cartography's IAM graph, each agent gains a path to whatever AWS resources its roles can access.
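A relationship sync could look roughly like the sketch below. It is not taken from Cartography itself. It assumes the `AIAgent` nodes have already been ingested, that role nodes carry an `arn` property, and that your agent management system can supply `agent_id` / `role_arn` pairs; those field names and the `RecordingSession` stand-in are hypothetical.

```python
def sync_agent_roles(neo4j_session, agent_roles, update_tag):
    """Link existing AIAgent nodes to existing AWSRole nodes with HAS_ROLE."""
    link_query = """
    UNWIND $AgentRoles AS ar
        MATCH (a:AIAgent {id: ar.agent_id})
        MATCH (r:AWSRole {arn: ar.role_arn})
        MERGE (a)-[rel:HAS_ROLE]->(r)
        ON CREATE SET rel.firstseen = timestamp()
        SET rel.lastupdated = $UpdateTag
    """
    neo4j_session.run(link_query, AgentRoles=agent_roles, UpdateTag=update_tag)


class RecordingSession:
    """Dry-run stand-in for a Neo4j session: records calls instead of executing them."""

    def __init__(self):
        self.calls = []

    def run(self, query, **params):
        self.calls.append((query, params))


# Dry run with a made-up role ARN; swap in a real session to write to the graph
session = RecordingSession()
sync_agent_roles(
    session,
    [{"agent_id": "agent-1", "role_arn": "arn:aws:iam::123456789012:role/deployer"}],
    update_tag=1700000000,
)
print(len(session.calls), "query recorded")
```

Using `MATCH` for both endpoints means a pair that names an unknown agent or role is skipped rather than creating an empty placeholder node. Stamping `lastupdated` on the relationship makes it possible to clean up links that a later sync no longer reports.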

Practical Least-Privilege Enforcement

The graph makes it easy to find violations of the principle of least privilege. Run this query weekly:

// Agents with write access to production but last used > 30 days ago
MATCH (agent:AIAgent)-[:HAS_ROLE]->(role:AWSRole)-[:ALLOWS]->(resource)
WHERE resource.environment = 'production'
  AND 'write' IN role.permissions
  AND agent.last_used < datetime() - duration('P30D')
RETURN agent.name, role.arn, count(resource) as resource_count

Feed this into a ticket queue or a Slack alert. Agents with stale production write access are candidates for permission reduction - downgrade to read-only or revoke entirely.
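As one way to wire this up, the sketch below turns the query results into a message body of the kind a Slack incoming webhook accepts (a JSON object with a `text` field). The row keys `agent_name`, `role_arn`, and `resource_count` are assumed aliases for the returned columns, and the sample rows are invented. Sending the payload is left out.

```python
import json


def build_stale_access_alert(rows, max_rows=10):
    """Build a webhook payload listing agents with stale production write access."""
    if not rows:
        return None  # nothing to report, so send nothing
    # Show the roles that touch the most resources first
    ranked = sorted(rows, key=lambda r: r["resource_count"], reverse=True)
    lines = [f"{len(rows)} agent role(s) with stale production write access:"]
    for row in ranked[:max_rows]:
        lines.append(
            f"- {row['agent_name']} via {row['role_arn']} "
            f"({row['resource_count']} resources)"
        )
    if len(ranked) > max_rows:
        lines.append(f"...and {len(ranked) - max_rows} more")
    return {"text": "\n".join(lines)}


# Invented sample rows standing in for the weekly query output
rows = [
    {"agent_name": "report-agent", "role_arn": "arn:aws:iam::123456789012:role/report-rw", "resource_count": 3},
    {"agent_name": "data-agent", "role_arn": "arn:aws:iam::123456789012:role/data-rw", "resource_count": 12},
]
payload = build_stale_access_alert(rows)
print(json.dumps(payload, indent=2))
```

Returning `None` for an empty result keeps quiet weeks quiet, so the alert channel stays worth reading.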

The Databricks AI Security Framework (DASF v3.0) now explicitly calls out agentic AI permission management as a distinct control category, recommending identity privilege graphs as the primary mechanism for visibility.

Start Before You Scale

The time to set up permission graphing is before you have a problem. Adding it after a security incident means you are mapping access patterns while also fighting a fire.

A Cartography sync job takes about 2 hours to set up for a typical AWS account. The Neo4j instance can run on a small EC2 instance or locally for smaller deployments. Run it on a daily schedule and you have a continuously updated map of every agent's access across your cloud infrastructure.

Build the graph now while your agent count is small and the relationships are still understandable. At thirty agents with overlapping permissions, the graph will show you things a spreadsheet could never reveal.
