Brad Geesaman, Principal Security Researcher at Ghost Security, brings deep expertise in cloud security, Kubernetes, and application security. As the creator of ReaperBot—an agentic AI system for web application security testing—Brad is at the forefront of exploring how AI can augment security workflows.
In this episode, Brad shares insights on the evolution from traditional automation to agentic AI, the challenges of non-deterministic systems, and practical strategies for adopting AI in AppSec. His perspective balances enthusiasm for AI’s potential with pragmatic caution about its limitations.
How is AI changing the secure coding paradigm?
AI is both lowering the barrier to entry for writing code and exacerbating existing challenges. Brad prefers the term “augmented code” or “co-pilot code” over “vibe coding”—it’s about getting help to structure projects, set up build systems, and overcome initial hurdles.
The pros:
- Lower barrier to entry: Beginners can get started faster with boilerplate generation
- Extended flow state: Developers stay in creative mode longer when AI handles repetitive tasks
- Faster iteration: Generate test harnesses and use cases that arrive 80-90% complete
The challenges:
- More code, more developers: More code being shipped faster means more to validate and secure
- Non-programmers coding: People without programming backgrounds can now generate and deploy code
- Increased security surface: Security teams must secure code from both expert and novice developers
The key insight: AI helps developers stay in flow state by automating boilerplate, but it also means security teams face a larger volume of code to assess and secure.
Why is standardization critical for agentic AI in AppSec?
Brad’s experience building ReaperBot highlighted a painful reality: 80% of development time went to building integrations between tools and frameworks, not the core functionality.
The problem with custom integrations:
- Not interoperable: Highly specific to one model and framework
- Not portable: Can’t easily switch models or let others use different frameworks
- Duplicated effort: Every team rebuilds the same integration layer
Model Context Protocol (MCP) as the Solution
MCP puts an abstraction layer between LLMs and external services, creating a standard interface. This decouples consumers (LLMs) from producers (tools and services).
Benefits:
- Vendor flexibility: Produce an MCP endpoint once, consume from any LLM
- Minimal friction: Standard discovery and tool usage across frameworks
- Marketplace potential: Enables interoperability and service models
Current challenges:
- Security and IAM: Once MCP servers aren’t running locally, identity and access management become critical
- Standardization risk: The danger of creating “the 16th standard” instead of true convergence
Brad hopes the industry coalesces around MCP rather than fragmenting into competing standards, as this would unlock significant interoperability and marketplace opportunities.
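The decoupling described above can be illustrated with a minimal sketch. This is not the MCP SDK or its wire protocol. `Tool`, `ToolRegistry`, and `probe_host` are hypothetical names invented here to show the idea of one standard discovery and invocation interface sitting between consumers and producers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Tool:
    """A tool description a consumer can discover without knowing its internals."""
    name: str
    description: str
    handler: Callable[[dict], dict]


class ToolRegistry:
    """Hypothetical stand-in for a standard interface between LLMs and services."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def list_tools(self) -> List[dict]:
        # Discovery: consumers see only names and descriptions.
        return [{"name": t.name, "description": t.description}
                for t in self._tools.values()]

    def call(self, name: str, arguments: dict) -> dict:
        # Invocation: every consumer calls every tool the same way.
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name].handler(arguments)


def probe_host(arguments: dict) -> dict:
    # Placeholder producer: a real tool would perform a network probe here.
    return {"host": arguments["host"], "alive": True}


registry = ToolRegistry()
registry.register(Tool("probe_host", "Check whether a host responds", probe_host))
```

Any consumer that knows only `list_tools` and `call` can use any registered producer, which is the property that lets the model or framework be swapped without rewriting the integration.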
What does “using AI to secure AI” actually mean?
Brad frames this through the lens of scale and determinism:
Traditional systems: Deterministic inputs → deterministic outputs, manageable at human scale
AI-amplified systems: Variable inputs → variable outputs, operating at superhuman scale
The challenge: How do you validate that inputs are appropriate and outputs are safe when operating at speeds beyond human capacity?
The solution: Use LLMs to judge LLMs.
Common patterns include:
- LLM as judge: One model evaluates another’s outputs
- Adversarial LLMs: Multiple models with different biases check the same output and either agree or disagree
- Odd-number voting: Three or five models vote on correctness
Is it perfect? No. But it’s the only way to keep up with the scale and nuance of AI-generated content without millions of human reviewers.
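Odd-number voting can be sketched in a few lines. The judges below are deterministic stand-ins so the example runs offline; in practice each would be a call to a different model. The function and judge names are assumptions made for illustration.

```python
from typing import Callable, Sequence

Judge = Callable[[str], bool]  # returns True when the judge approves the output


def majority_verdict(candidate: str, judges: Sequence[Judge]) -> bool:
    """Approve the candidate only if more than half of an odd-sized panel approves."""
    if len(judges) % 2 == 0:
        raise ValueError("use an odd number of judges so a tie is impossible")
    approvals = sum(1 for judge in judges if judge(candidate))
    return approvals > len(judges) // 2


# Stand-in judges: each one would be a separate model call in a real system.
def no_script_tag(text: str) -> bool:
    return "<script" not in text.lower()


def short_enough(text: str) -> bool:
    return len(text) <= 200


def no_private_key(text: str) -> bool:
    return "BEGIN PRIVATE KEY" not in text


JUDGES = [no_script_tag, short_enough, no_private_key]
```

With three judges, an output passes when at least two approve. Binary verdicts plus an odd panel guarantee a decision on every input.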
How can non-deterministic AI systems be considered reliable?
This is a fundamental question for AI adoption in security. Brad’s answer: It depends on your tolerance and use case.
Reliability is always on a spectrum—rarely 100%, even with traditional systems. The key questions:
- How much unreliability can you tolerate?
- Does your AI setup land within that tolerance window?
When AI Makes Sense
- Variable inputs with variable outputs: Summarizing 10,000 words to 100 words
- Nuanced classification: Bucketing environments as dev, staging, or prod based on context
- Scale beyond human capacity: Processing thousands of decisions per day
When AI Doesn’t Make Sense
- Deterministic requirements: If you have deterministic inputs, use deterministic methods
- Extremely high precision needs: Getting to near-perfect accuracy requires significant time and money
- Undefined tolerance: If you expect perfection, AI is a non-starter
Brad’s advice: Define what you’re able to tolerate, then shoot for that. Don’t aim for perfection—you’ll never get there.
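One way to make the tolerance window concrete is to measure accuracy on labeled examples and compare it with the bar you set. The sketch below uses a keyword classifier as a stand-in for the AI step. The classifier, the sample data, and the thresholds are all illustrative assumptions.

```python
from typing import Callable, Sequence, Tuple


def measured_accuracy(classify: Callable[[str], str],
                      labeled: Sequence[Tuple[str, str]]) -> float:
    """Fraction of labeled examples the classifier gets right."""
    if not labeled:
        raise ValueError("need at least one labeled example")
    correct = sum(1 for text, expected in labeled if classify(text) == expected)
    return correct / len(labeled)


def within_tolerance(accuracy: float, required: float) -> bool:
    """True when measured accuracy meets the bar that was defined up front."""
    return accuracy >= required


def keyword_classifier(name: str) -> str:
    # Stand-in for an LLM that buckets environments as dev, staging, or prod.
    lowered = name.lower()
    if "prod" in lowered:
        return "prod"
    if "stag" in lowered:
        return "staging"
    return "dev"


SAMPLES = [
    ("api-prod-1", "prod"),
    ("staging-db", "staging"),
    ("dev-box", "dev"),
    ("payments-live", "prod"),  # misclassified as dev by the stand-in
]
```

Here the stand-in scores 3 of 4, so it lands inside a 70% tolerance and outside a 90% one. The same check applies unchanged when the classifier is a real model.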
How should organizations adopt agentic AI for AppSec?
Brad strongly advises against starting with agentic AI. Instead, build into it progressively.
Start with Workflows
Think of two types of AI use:
- Defined workflows: Step-by-step processes with clear decision points
- Agentic AI: Self-led, non-deterministic systems that adapt to variable inputs
The progression:
- Break down workflows: Like making a PB&J sandwich—get bread, get peanut butter, open jar, get knife
- Identify AI-suitable steps: Find specific steps best suited for LLMs (e.g., classification tasks)
- Use deterministic methods elsewhere: Feed inputs reliably into the AI step, use traditional methods for everything else
- Add agents sparingly: Only when a step has multiple pathways that can’t be deterministically solved
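The progression above can be sketched as a workflow where only one step is handed to a model. Everything here is a hypothetical illustration: `normalize` and `run_workflow` are invented names, and `classify` is whatever AI-backed function you plug in.

```python
from typing import Callable, Dict, Iterable, List

Classifier = Callable[[str], str]  # the single AI-suited step: text -> label


def normalize(findings: Iterable[str]) -> List[str]:
    """Deterministic step: trim, lowercase, and de-duplicate while keeping order."""
    seen = set()
    result = []
    for finding in findings:
        cleaned = finding.strip().lower()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


def run_workflow(findings: Iterable[str], classify: Classifier) -> Dict[str, List[str]]:
    """Feed clean, predictable inputs into the classifier, then bucket by label."""
    buckets: Dict[str, List[str]] = {}
    for finding in normalize(findings):
        label = classify(finding)  # the only non-deterministic step
        buckets.setdefault(label, []).append(finding)
    return buckets
```

Because the surrounding steps are ordinary code, the model only ever sees normalized input, and its output is consumed by logic that can be tested conventionally.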
When to Use Agents
Agents make sense when:
- A workflow step has 3-5 possible pathways
- You can’t deterministically decide which path to take
- You can define the goal clearly
- You have tools the agent can use to reach that goal
Think of it as dropping in a junior team member with 1,000 examples of similar tasks. They iterate toward the goal using available tools.
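A minimal agent loop under those conditions might look like the following sketch. The policy function stands in for the model's choice of the next tool, and the tools return canned data. All names and values are assumptions for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, object]
Tool = Callable[[State], State]
Policy = Callable[[State, List[str]], Optional[str]]  # picks the next tool name


def run_agent(state: State,
              tools: Dict[str, Tool],
              choose_tool: Policy,
              goal_reached: Callable[[State], bool],
              max_steps: int = 5) -> Tuple[State, List[str]]:
    """Iterate toward a clearly defined goal using the available tools."""
    history: List[str] = []
    for _ in range(max_steps):  # a step budget keeps the loop bounded
        if goal_reached(state):
            break
        name = choose_tool(state, sorted(tools))
        if name is None or name not in tools:
            break  # the policy gave up or named an unknown tool
        state = tools[name](state)
        history.append(name)
    return state, history


def resolve(state: State) -> State:
    # Placeholder tool: returns a documentation-range IP instead of a real lookup.
    return {**state, "ip": "192.0.2.10"}


def probe(state: State) -> State:
    # Placeholder tool: pretends the host answered.
    return {**state, "alive": True}


def simple_policy(state: State, available: List[str]) -> Optional[str]:
    # Stand-in for the model deciding which pathway to take next.
    if "ip" not in state:
        return "resolve"
    if "alive" not in state:
        return "probe"
    return None
```

The goal check, the tool set, and the step budget are all explicit, which matches the conditions listed above: a clear goal, usable tools, and a choice of path that is left to the agent.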
What is ReaperBot and how does it work?
ReaperBot is a team of agents that interfaces with Reaper, Ghost Security’s API-driven proxy tool. It automates the workflow from “find live hosts” to “identify broken object level authorization (BOLA) vulnerabilities.”
Architecture
Orchestrator agent: Runs on the strongest model (o3-mini); it takes user input, breaks it into steps, and delegates to sub-agents
Sub-agents:
- Discoverer agent: Handles benign tasks (domain lookup, host probing, subdomain finding)
- Tester agent: Tests specific endpoints for BOLA vulnerabilities
- BOLA agent: Analyzes parameters and tests variations (e.g., guessing different account IDs)
The Workflow
- User provides a goal (e.g., “test ghostbank.net for BOLA”)
- Orchestrator breaks it down into steps
- Discoverer finds live hosts and catalogs requests
- Tester identifies candidates for BOLA testing
- BOLA agent tests variations and produces a report
The system iterates, corrects errors, and walks itself through the plan—just like a human pen tester would, but automated.
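To show what "testing variations" means for BOLA, here is a small self-contained sketch. It is not ReaperBot's code. It runs against an in-memory fake API, and every name in it (`find_bola_candidates`, `nearby_ids`, `OWNERS`, the two fetch functions) is hypothetical.

```python
from typing import Callable, Iterable, List

# fetch(account_id, as_user) -> HTTP-like status code
Fetch = Callable[[int, str], int]

OWNERS = {101: "alice", 102: "bob", 103: "carol"}


def vulnerable_fetch(account_id: int, as_user: str) -> int:
    # Missing ownership check: any authenticated user can read any account.
    return 200 if account_id in OWNERS else 404


def fixed_fetch(account_id: int, as_user: str) -> int:
    # Correct behavior: only the owner gets a 200.
    if account_id not in OWNERS:
        return 404
    return 200 if OWNERS[account_id] == as_user else 403


def nearby_ids(own_id: int, spread: int = 2) -> List[int]:
    """Guess neighboring IDs, excluding the caller's own."""
    return [i for i in range(own_id - spread, own_id + spread + 1)
            if i > 0 and i != own_id]


def find_bola_candidates(fetch: Fetch, as_user: str, own_id: int,
                         candidate_ids: Iterable[int]) -> List[int]:
    """Return IDs other than own_id that the user can still read."""
    return [cid for cid in candidate_ids
            if cid != own_id and fetch(cid, as_user) == 200]
```

Against the vulnerable fake, Bob (account 102) can read accounts 101 and 103; against the fixed one, the list is empty. The results are candidates for human review, not confirmed findings.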
Purpose
ReaperBot is experimental and educational:
- Show, don’t tell: Demonstrate what’s possible with agentic AI
- Conversation starter: Help teams understand where automation makes sense
- R&D platform: Explore what Ghost Security will build into their platform
It’s open source and designed to be readable—written in Python with clear prompts so others can learn and extend it.
What are the advantages and disadvantages of manual vs. automated testing?
Manual Testing
Pros:
- Absolute control over actions, order, and results
- Expertise-driven decision making
- Ability to adapt to unexpected findings
Cons:
- Time-limited (e.g., 16 hours for two web apps)
- Doesn’t scale
- Repetitive tasks create toil
Automated/Agentic Testing
Pros:
- Handles boilerplate and table-stakes items
- Surfaces candidates for human review
- Operates at scale
Cons:
- Less deterministic
- May miss nuanced vulnerabilities
- Requires careful oversight
Brad’s approach: Use automation for discovery, fingerprinting, and enumeration. Use AI assistance for payload crafting and iteration. But maintain human control, and don’t attempt fully automated pen testing yet.
The goal is to surface candidates and keep humans in the flow, not replace human expertise entirely.
How do you build trust in agentic AI systems?
Trust must be earned, just like onboarding a new team member.
The Progression
- Start small: Automate one part of a workflow
- Audit everything: Transparency and logging are critical
- Evaluate performance: Run thousands of iterations to build confidence
- Expand gradually: Add more automation as trust builds
Think of it like adding a new SOC analyst:
- Start with tier-one alerts, not advanced persistent threats
- Review all their work initially
- Build trust through demonstrated accuracy over time
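The "audit everything" step can be as simple as wrapping each automated step so its input and output are recorded. The sketch below appends JSON lines to an in-memory list; `audited` is a hypothetical helper, and a real system would write to durable storage.

```python
import json
import time
from typing import Callable, List


def audited(step_name: str, func: Callable[[str], str],
            log: List[str]) -> Callable[[str], str]:
    """Wrap a step so every call leaves a reviewable record."""
    def wrapper(payload: str) -> str:
        result = func(payload)
        log.append(json.dumps({
            "step": step_name,
            "input": payload,
            "output": result,
            "ts": time.time(),  # when the decision was made
        }, sort_keys=True))
        return result
    return wrapper
```

Usage: `triage = audited("triage", my_triage_function, audit_log)`, where `my_triage_function` is whatever step you are automating. Reviewing those records is how the "review all their work initially" stage works in practice.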
Living Systems
Agentic AI systems are living, breathing systems—not “set it and forget it”:
- Models change: New versions come out every few weeks
- Prompts evolve: Feedback loops require prompt adjustments
- Continuous validation: Keep the system in its performing window
The benefit: You can shift to newer, cheaper, faster models as they emerge—if you design for portability rather than tight coupling to one provider.
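Designing for portability usually means the rest of the system depends on a small interface, not on one provider's client. A minimal sketch follows, with two fake providers standing in for real ones. `TextModel`, `ProviderA`, `ProviderB`, and `summarize` are hypothetical names.

```python
from typing import Protocol


class TextModel(Protocol):
    """The only surface the workflow is allowed to depend on."""
    def complete(self, prompt: str) -> str: ...


class ProviderA:
    def complete(self, prompt: str) -> str:
        # A real adapter would call provider A's API here.
        return f"A:{prompt}"


class ProviderB:
    def complete(self, prompt: str) -> str:
        # A real adapter would call provider B's API here.
        return f"B:{prompt}"


def summarize(model: TextModel, text: str) -> str:
    # Workflow code never names a provider, so swapping models is a one-line change.
    return model.complete(f"Summarize: {text}")
```

Swapping models still calls for re-running your evaluations, since a new model can behave differently on the same prompt.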
What elements of AppSec benefit least from AI?
Brad identifies two critical downsides:
1. Cost and Complexity
- Token costs: Running LLMs at scale isn’t free
- Complexity budget: AI adds complexity to workflows, not simplicity
- Trade-off: You’re trading human toil for system complexity and cost
The question: Is the reduction in human burnout worth the added complexity?
2. Loss of Craftsmanship
This is the less-discussed downside. If AI solves most problems and generates most code:
- Commoditization: Boilerplate becomes automatic, reducing the art of software
- Reduced curiosity: Why explore better ways if the LLM says “this is best”?
- Less precision: Technically correct implementations may not be the most clever or efficient
Brad’s recommendation: Use AI as an exoskeleton to empower you, not as a remote-controlled robot that removes the fun and challenge of building software.
What special considerations should operations teams have?
When choosing AI systems to eliminate toil, Brad emphasizes three key considerations:
1. Match the Problem to the Tool
Don’t do “LLMs for LLMs’ sake” (like “Kubernetes for Kubernetes’ sake”). Use AI when:
- The problem operates at scale beyond human capacity
- Non-determinism is required to handle nuances
- Current tooling can’t handle all the edge cases
2. Define Acceptable Accuracy
Be clear about your tolerance:
- “Better than a human over 1,000 iterations” is achievable
- “Better than a human over 10 iterations” is much harder
- Set the bar clearly so everyone understands intent and purpose
3. Fix Downstream Processes
Critical insight: LLMs amplify weaknesses in existing processes.
If you speed up one step 100x, you’ll push strain onto the next part of your workflow. Example:
- Triage 100 critical findings → 50 tickets to dev team
- Add AI to triage all mediums → 300 more tickets to dev team
- Dev team is now overwhelmed (300 extra tickets is 6x the original 50, for 7x the total workload)
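The arithmetic in this example is worth spelling out, using the ticket counts given above. `downstream_load` is a hypothetical helper name.

```python
from typing import Dict


def downstream_load(baseline_tickets: int, added_tickets: int) -> Dict[str, float]:
    """Compare the downstream team's load before and after speeding up triage."""
    if baseline_tickets <= 0:
        raise ValueError("baseline must be positive")
    total = baseline_tickets + added_tickets
    return {
        "total": total,
        "added_ratio": added_tickets / baseline_tickets,  # extra work vs. before
        "total_ratio": total / baseline_tickets,          # new load vs. before
    }


load = downstream_load(50, 300)
# load == {"total": 350, "added_ratio": 6.0, "total_ratio": 7.0}
```

The added work is six times the original, and the dev team's total queue is seven times what it was.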
Brad’s rule: Spend at least half your budget on process improvements to support the AI-enhanced step. Don’t just slap AI on one part and call it done.