
IAM, AI, and Cloud Security: Unlocking Scale and Battling New Threats with Stephen Kuenzli

Stephen Kuenzli of k9 Security explains how to scale IAM with self-serve patterns, why literal least privilege fails, and how AI agents with MCP servers transform security operations.

Who can delete the production database? That question — unanswered in most organizations — is what drove Stephen Kuenzli to dedicate his career to solving IAM at scale. A decade later, the problem has only grown more complex as AI agents, MCP servers, and vibe coding multiply the volume of access decisions organizations must make.

We spoke with Stephen Kuenzli, founder of k9 Security and author of Effective IAM for AWS, on the Scale to Zero podcast. Stephen helps cloud teams scale governance within their existing workflows, bringing deep expertise from years of leading cloud migrations and solving the hardest IAM challenges at scale.

You can read the complete transcript of the episode here.

What are the biggest misconceptions about scaling IAM?

Stephen identifies two critical misconceptions that prevent organizations from scaling IAM effectively:

Misconception 1: Centralized security teams can sit in the delivery path without blocking teams. The ratios do not work. There are roughly 50 application engineers for every cloud security specialist, and that specialist may not even sit in the security team. If the specialist is responsible for reviewing 2–5 changes per day and does not respond within four business hours, they block delivery, create stress, and slow the entire organization.

Misconception 2: Being too literal about least privilege. Leaders often aspire to least privilege without defining what it means at scale. AWS alone has over 17,000 individual permissions. Attempting to “artisanally craft” bespoke policies for each principal and resource is humanly impossible. Stephen calls this “CodeGolf” — removing every last permission — and argues it misses the point.

The practical alternative: use coarser abstractions like “can this principal administer the resource, read data, write data, or delete data?” You can achieve 90% of the risk reduction with these higher-level controls, then play CodeGolf only with your most critical resources.
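To make the four coarse questions concrete, here is a minimal Python sketch that rolls individual IAM actions up into capabilities. The `Capability` enum, the small `ACTION_CAPABILITIES` table of S3 actions, and the `capabilities_granted` helper are hypothetical illustrations, not k9 Security's implementation; a usable mapping would have to cover far more of the 17,000+ actions.

```python
"""Sketch: roll fine-grained IAM actions up into coarse access capabilities.

The capability names mirror the questions in the text. The action table is a
small hypothetical sample, not a complete or authoritative mapping.
"""
import fnmatch
from enum import Enum


class Capability(Enum):
    ADMINISTER = "administer-resource"
    READ_DATA = "read-data"
    WRITE_DATA = "write-data"
    DELETE_DATA = "delete-data"


# Hypothetical sample covering a few S3 actions only.
ACTION_CAPABILITIES: dict[str, Capability] = {
    "s3:PutBucketPolicy": Capability.ADMINISTER,
    "s3:PutEncryptionConfiguration": Capability.ADMINISTER,
    "s3:GetObject": Capability.READ_DATA,
    "s3:GetObjectVersion": Capability.READ_DATA,
    "s3:PutObject": Capability.WRITE_DATA,
    "s3:DeleteObject": Capability.DELETE_DATA,
    "s3:DeleteObjectVersion": Capability.DELETE_DATA,
}


def capabilities_granted(allowed_actions: list[str]) -> set[Capability]:
    """Return the coarse capabilities implied by a policy's allowed actions.

    Wildcards such as "s3:Get*" are matched against the sample table. IAM
    action names are case-insensitive, so both sides are lowercased.
    """
    granted: set[Capability] = set()
    for pattern in allowed_actions:
        for action, capability in ACTION_CAPABILITIES.items():
            if fnmatch.fnmatchcase(action.lower(), pattern.lower()):
                granted.add(capability)
    return granted
```

For example, `capabilities_granted(["s3:Put*"])` reports both `ADMINISTER` and `WRITE_DATA`, surfacing that a wildcard that looks write-only also lets the principal change the bucket policy. Reviewing a handful of capabilities per principal is far easier than reviewing long lists of action strings.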

How do you make IAM self-serve without losing control?

Stephen’s approach: 95% of security changes should be self-serve from pre-built, vetted components. Security specialists should not be in the operational path of reviewing changes before production.

The implementation pattern:

  • Build reference architectures on sound security patterns. For example, implement a data perimeter where each application gets its own KMS key. All application data is encrypted with that key, and access is controlled via key policy — even when multiple applications share an account.
  • Codify these patterns as infrastructure-as-code libraries. Application teams express intent (“I want S3 and DynamoDB”) and the underlying components handle encryption, key management, and policy generation automatically.
  • Use policy generators (Stephen’s are freely available on GitHub for CDK and Terraform) that translate human-readable intent into correct, least-privilege policies. It took an expert weeks to get these policies right once — now they are reusable components.
  • Create a security guild or center of excellence where practitioners collaborate, discuss gaps, and share solutions. When something is not available as a pre-built component, the guild provides transparent planning — “this is a two-week effort, not a two-hour one.”
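The data perimeter and policy-generator ideas can be sketched together. The Python below is a hypothetical generator, not one of Stephen's published CDK or Terraform libraries: an application team declares who administers its KMS key and which roles belong to the application, and `key_policy` emits a key policy. The action groups follow the statements in AWS's documented default key policy; the role ARNs in the usage are placeholders.

```python
"""Sketch: generate a per-application KMS key policy from declared intent.

Hypothetical helper for illustration; not k9 Security's published generators.
"""
import json

# Key administration actions, following AWS's documented default key policy.
# Note: "kms:Create*" also matches kms:CreateGrant.
KEY_ADMIN_ACTIONS = [
    "kms:Create*", "kms:Describe*", "kms:Enable*", "kms:List*", "kms:Put*",
    "kms:Update*", "kms:Revoke*", "kms:Disable*", "kms:Get*", "kms:Delete*",
    "kms:TagResource", "kms:UntagResource",
    "kms:ScheduleKeyDeletion", "kms:CancelKeyDeletion",
]
# Cryptographic actions the application needs to read and write its own data.
KEY_USE_ACTIONS = [
    "kms:Encrypt", "kms:Decrypt", "kms:ReEncrypt*",
    "kms:GenerateDataKey*", "kms:DescribeKey",
]


def _allow(sid: str, principal_arns: list[str], actions: list[str]) -> dict:
    return {
        "Sid": sid,
        "Effect": "Allow",
        "Principal": {"AWS": sorted(principal_arns)},
        "Action": list(actions),
        "Resource": "*",  # inside a key policy, "*" refers to this key only
    }


def key_policy(admin_arns: list[str], app_arns: list[str]) -> dict:
    """Build a key policy: admins manage the key; only the app's roles use it.

    Include the principal that applies the policy in admin_arns, or KMS's
    policy lockout safety check will reject it.
    """
    if not admin_arns or not app_arns:
        raise ValueError("need at least one key administrator and one application role")
    return {
        "Version": "2012-10-17",
        "Statement": [
            _allow("AllowKeyAdministration", admin_arns, KEY_ADMIN_ACTIONS),
            _allow("AllowApplicationDataAccess", app_arns, KEY_USE_ACTIONS),
        ],
    }


if __name__ == "__main__":
    policy = key_policy(
        admin_arns=["arn:aws:iam::111122223333:role/platform-key-admin"],
        app_arns=["arn:aws:iam::111122223333:role/orders-service"],
    )
    print(json.dumps(policy, indent=2))
```

Because this policy omits the account-root statement that AWS's default key policy includes, IAM policies alone cannot grant use of the key: a second application in the same account cannot decrypt the data unless its role is listed. A hardened version might list admin actions explicitly, since `kms:Create*` lets key administrators create grants.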

What is the first thing to check in any AWS account?

Stephen’s immediate focus when working with clients: who has IAM administrative access? Specifically, who can create roles, change policies, or detach policies?

The consistent finding: 3–5 excess IAM administrators in production accounts. Not from malicious intent — from incidents where permissions were granted to “get things working again” and never revoked.
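One way to answer "who can create roles, change policies, or detach policies?" is to ask the IAM policy simulator about every role. This is a minimal sketch with boto3, assuming credentials that can call `iam:ListRoles` and `iam:SimulatePrincipalPolicy`. The `IAM_ADMIN_ACTIONS` list is an assumption about what counts as IAM admin, and simulation results approximate rather than guarantee what a real request would do.

```python
"""Sketch: find IAM roles that can create roles or change/detach policies.

IAM_ADMIN_ACTIONS is an assumed definition of "IAM admin"; extend as needed.
"""

IAM_ADMIN_ACTIONS = [
    "iam:CreateRole",
    "iam:PutRolePolicy",
    "iam:AttachRolePolicy",
    "iam:DetachRolePolicy",
    "iam:CreatePolicyVersion",
]


def allowed_admin_actions(evaluation_results: list[dict]) -> list[str]:
    """Return the actions the simulator reported as allowed, sorted."""
    return sorted(
        result["EvalActionName"]
        for result in evaluation_results
        if result["EvalDecision"] == "allowed"
    )


def find_iam_admin_roles() -> dict[str, list[str]]:
    """Map each role ARN to the IAM admin actions it is allowed to perform."""
    import boto3  # imported here so allowed_admin_actions works without boto3

    iam = boto3.client("iam")
    admins: dict[str, list[str]] = {}
    for page in iam.get_paginator("list_roles").paginate():
        for role in page["Roles"]:
            response = iam.simulate_principal_policy(
                PolicySourceArn=role["Arn"],
                ActionNames=IAM_ADMIN_ACTIONS,
            )
            allowed = allowed_admin_actions(response["EvaluationResults"])
            if allowed:
                admins[role["Arn"]] = allowed
    return admins
```

Calling `find_iam_admin_roles()` returns role ARNs mapped to the admin actions each may perform; anything beyond the few roles you expect is a candidate excess administrator. `simulate_principal_policy` also accepts user and group ARNs, so the same loop can cover IAM users.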

The danger scenario: Application A runs at the edge, taking internet traffic. It has IAM admin privileges on an ECS cluster shared with Application B. If Application A is breached, the attacker can pivot to all of Application B’s data, even though Application B itself was never directly compromised.

Other critical red flags:

  • Stale API access keys that have existed for months or years. These lead to breaches. Move to IAM roles (via identity provider SSO for people, instance roles for applications) or at minimum implement key rotation.
  • Applications running with permissions they do not need. Test with the same permissions you will have in production — your security policies are as much a part of the application definition as the code itself.
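Stale keys can be found from the IAM API. The sketch below lists every IAM user's access keys with boto3 and flags active keys older than a threshold. The 90-day `MAX_KEY_AGE` is an assumed policy value, not a recommendation from the interview, and IAM's credential report is an alternative source for the same data.

```python
"""Sketch: flag IAM user access keys that have been active for too long.

MAX_KEY_AGE is an assumed policy value; adjust it to your own standard.
"""
from datetime import datetime, timedelta, timezone

MAX_KEY_AGE = timedelta(days=90)  # hypothetical threshold


def find_stale_keys(keys: list[dict], now: datetime,
                    max_age: timedelta = MAX_KEY_AGE) -> list[dict]:
    """Return active keys whose CreateDate is older than max_age.

    Each entry mirrors an AccessKeyMetadata item from IAM ListAccessKeys:
    UserName, AccessKeyId, Status, and a timezone-aware CreateDate.
    """
    return [
        key for key in keys
        if key["Status"] == "Active" and now - key["CreateDate"] > max_age
    ]


def load_access_keys() -> list[dict]:
    """Collect access key metadata for every IAM user in the account."""
    import boto3  # imported here so find_stale_keys can be tested offline

    iam = boto3.client("iam")
    keys: list[dict] = []
    for page in iam.get_paginator("list_users").paginate():
        for user in page["Users"]:
            key_pages = iam.get_paginator("list_access_keys").paginate(
                UserName=user["UserName"]
            )
            for key_page in key_pages:
                keys.extend(key_page["AccessKeyMetadata"])
    return keys


def report_stale_keys() -> None:
    """Print each stale key with its age in days."""
    now = datetime.now(timezone.utc)
    for key in find_stale_keys(load_access_keys(), now):
        age_days = (now - key["CreateDate"]).days
        print(f'{key["UserName"]} {key["AccessKeyId"]}: {age_days} days old')
```

Keeping the age rule in `find_stale_keys`, separate from the AWS calls, lets it be tested without credentials. Each flagged key is a candidate for replacement with a role, or at minimum for rotation.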

How are AI agents and MCP servers changing security operations?

Stephen sees a fundamental shift: the cost of analyzing incoming security data is about to drop precipitously. AI agents with access to MCP servers can integrate previously siloed data sources — Security Hub, JIRA, CSPMs — into cohesive stories.

What becomes possible:

  • Automated research and analysis. Agents pull findings from Security Hub, deduplicate them, cross-reference with JIRA tickets, and prepare a cohesive package for human decision-making.
  • Intelligent routing. If an agent knows that security issues in a specific account were recently resolved by a particular team, it can route new issues to that team automatically. The bar is low — anything better than fully manual is a win.
  • Decision support with reasoning. LLMs bring reasoning capability that was missing from software for decades. They can apply severity labels consistently based on context, reducing dependence on individual analyst judgment.
  • Risk quantification. With access to private information (firmographics, revenue, industry context), agents may help quantify risk in meaningful ways — building literal business cases for prioritizing fixes.
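The research and routing steps above can be sketched in plain Python, independent of any agent framework. Findings use a few fields from the AWS Security Finding Format (`AwsAccountId`, `GeneratorId`, `Resources`); the ticket records standing in for JIRA history, the dedup key, and the `security-triage` fallback queue are all hypothetical.

```python
"""Sketch: deduplicate Security Hub findings and route them to owning teams.

Findings use a few ASFF fields. Ticket records, the dedup key, and the
fallback queue name are hypothetical stand-ins for JIRA data and local policy.
"""
from collections import defaultdict


def dedup_key(finding: dict) -> tuple[str, str, str]:
    """Treat the same check firing on the same resource in one account as one issue."""
    resources = finding.get("Resources") or [{}]
    return (finding["AwsAccountId"], finding["GeneratorId"], resources[0].get("Id", ""))


def deduplicate(findings: list[dict]) -> list[dict]:
    """Keep the first finding seen for each dedup key, preserving order."""
    unique: dict[tuple[str, str, str], dict] = {}
    for finding in findings:
        unique.setdefault(dedup_key(finding), finding)
    return list(unique.values())


def latest_resolver(tickets: list[dict]) -> dict[str, str]:
    """Map account ID to the team on its most recently resolved ticket.

    Each ticket is a hypothetical record with "account", "team", and a
    sortable "resolved_at" (for example an ISO 8601 string).
    """
    latest: dict[str, str] = {}
    for ticket in sorted(tickets, key=lambda t: t["resolved_at"]):
        latest[ticket["account"]] = ticket["team"]  # later tickets overwrite earlier ones
    return latest


def route(findings: list[dict], resolver_by_account: dict[str, str],
          fallback: str = "security-triage") -> dict[str, list[dict]]:
    """Group deduplicated findings into per-team queues."""
    queues: dict[str, list[dict]] = defaultdict(list)
    for finding in deduplicate(findings):
        team = resolver_by_account.get(finding["AwsAccountId"], fallback)
        queues[team].append(finding)
    return dict(queues)
```

An agent would feed these functions data fetched through MCP tools, then add its own reasoning, such as a suggested severity, before a human makes the decision.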

Stephen built an MCP server for Security Hub in about four hours. The actual code was trivial — most time was spent on configuration. When he asked Claude to identify the most critical issues, it did a “very good job” of filtering and prioritizing.
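For a sense of how small such a server can be, here is a sketch using the official MCP Python SDK's `FastMCP` class and boto3's Security Hub client. It is not Stephen's server: the server name, the `critical_findings` tool, and the choice of filters (critical, active, new) are assumptions, and it needs the `mcp` and `boto3` packages plus credentials for an account with Security Hub enabled.

```python
"""Sketch: a tiny MCP server exposing critical Security Hub findings.

Assumes the official MCP Python SDK (package "mcp") and boto3. Server and
tool names are hypothetical.
"""

# Security Hub GetFindings filters: critical, still active, not yet triaged.
CRITICAL_ACTIVE_FILTERS = {
    "SeverityLabel": [{"Value": "CRITICAL", "Comparison": "EQUALS"}],
    "RecordState": [{"Value": "ACTIVE", "Comparison": "EQUALS"}],
    "WorkflowStatus": [{"Value": "NEW", "Comparison": "EQUALS"}],
}


def summarize(finding: dict) -> dict:
    """Trim an ASFF finding to the fields an LLM needs to prioritize it."""
    resources = finding.get("Resources") or [{}]
    return {
        "title": finding.get("Title", ""),
        "severity": finding.get("Severity", {}).get("Label", "UNKNOWN"),
        "account": finding.get("AwsAccountId", ""),
        "resource": resources[0].get("Id", ""),
    }


def main() -> None:
    """Start the server over stdio; imports are local so summarize stays testable."""
    import boto3
    from mcp.server.fastmcp import FastMCP

    server = FastMCP("securityhub-findings")
    securityhub = boto3.client("securityhub")

    @server.tool()
    def critical_findings(limit: int = 20) -> list[dict]:
        """Return up to `limit` new, active, critical Security Hub findings."""
        response = securityhub.get_findings(
            Filters=CRITICAL_ACTIVE_FILTERS,
            MaxResults=max(1, min(limit, 100)),  # API allows 1 to 100 per page
        )
        return [summarize(f) for f in response["Findings"]]

    server.run()  # defaults to the stdio transport
```

To serve it, call `main()` from a script and register that script as a stdio server in an MCP client such as Claude Desktop. As Stephen found, most of the remaining effort is configuration: credentials, Region, and client setup.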

What challenges does AI create for security teams?

Vibe coding and AI-generated code increase the volume of changes flowing through delivery pipelines. Stephen’s advice:

  • Examine what happens at 2x, 3x, 5x volume. If your security processes have manual steps or limited scalability, they will break under increased throughput.
  • Do not take shortcuts on SDLC. Organizational goals and policies for secure delivery did not change just because code is generated faster. The same reviews, checks, and delivery processes apply.
  • POC new AI security tools carefully. There will be a wave of agents that review PRs and find issues. Start with one team, validate signal-to-noise ratio, then expand to three teams, then ten. Have security specialists feel the pain of low signal alongside application engineers.
  • Build empathy into tool rollout. Do not dump AI-generated findings into JIRA and walk away. Be responsible for the quality of what you are routing to application teams.
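Two of these checks lend themselves to quick arithmetic: whether manual review keeps up at 2x, 3x, and 5x volume, and whether an AI review tool's findings are good enough to expand from one team to three to ten. The sketch below uses precision (the share of findings that triage confirms as real) as a proxy for signal-to-noise. The daily volumes, review capacity, and 0.7 threshold are hypothetical inputs.

```python
"""Sketch: stress-test manual review at higher volume and gate a staged
rollout of an AI review tool on signal-to-noise. All numbers are hypothetical.
"""

ROLLOUT_STAGES = [1, 3, 10]  # teams per stage, as described in the text


def backlog_growth_per_day(changes_per_day: float, multiplier: float,
                           reviews_per_day: float) -> float:
    """Changes left unreviewed each day once volume is multiplied."""
    return max(0.0, changes_per_day * multiplier - reviews_per_day)


def precision(true_positives: int, false_positives: int) -> float:
    """Share of reported findings that triage confirmed as real issues."""
    total = true_positives + false_positives
    return true_positives / total if total else 0.0


def next_stage(current_teams: int, true_positives: int, false_positives: int,
               min_precision: float = 0.7) -> int:
    """Expand to the next stage only if signal-to-noise clears the bar."""
    if precision(true_positives, false_positives) < min_precision:
        return current_teams  # hold and tune before expanding
    later = [n for n in ROLLOUT_STAGES if n > current_teams]
    return later[0] if later else current_teams


if __name__ == "__main__":
    for m in (2, 3, 5):
        growth = backlog_growth_per_day(4, m, 5)
        print(f"{m}x volume: backlog grows by {growth:.0f} changes/day")
```

With an assumed baseline of 4 changes per day needing manual review and capacity for 5, the backlog grows by 3, 7, and 15 changes per day at 2x, 3x, and 5x: a process with spare capacity today falls behind at every multiplier.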

The opportunity is enormous, but it requires the same disciplined, incremental approach that works for any security capability: start small, measure, iterate, scale.
