Test Your Agents Like You Test Your Code

Your agent works great in development. But will it work in production? With Agent CI, you can find out before you deploy.

In 5 minutes, you'll have automated testing running on every pull request. Here's how:

Step 1: Connect Your Repository

Install Agent CI on Your GitHub Organization

  1. Go to agent-ci.com and click "Login with GitHub"
  2. Authorize Agent CI to verify your GitHub identity (you'll see a GitHub authorization page)
  3. Install the Agent CI GitHub App:
    • You'll be redirected to GitHub's app installation page
    • Choose your organization from the dropdown
    • Select repository access:
      • Choose "All repositories" for organization-wide access
      • Or choose "Only select repositories" and pick specific repos
    • Click "Install" to complete the GitHub App installation

Note: Only organization owners can install GitHub Apps. If you're not an owner, ask an organization owner to complete this step.

Set Up Agent CI

  1. Create your organization in Agent CI (you'll be redirected back after GitHub installation)
  2. Click "Create New Application" and select your repository from the dropdown
  3. Confirm permissions: Agent CI shows you exactly what access it has to your selected repositories

That's it. Agent CI is now monitoring your repository for pull requests and can run evaluations.

Step 2: Write Your First Test

Create a folder called .agentci in your repository root and add this file:

.agentci/basic-test.toml

[eval]
description = "Basic agent test"
type = "accuracy"
targets.agents = ["*"]

[[eval.cases]]
prompt = "What is 2 + 2?"
output = "4"

[[eval.cases]]
prompt = "Hello!"
output = "{{*}}"  # Matches any response

Commit this file on a new branch and push it. That's your first agent evaluation!

Step 3: See It Work

Open a pull request that includes your new .agentci/basic-test.toml file. Within seconds, a check like this appears on the pull request:

✅ Agent CI: Evaluation complete
   └── ✅ basic-test (2/2 test cases passed)

That's it! You now have automated agent testing on every pull request.
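
To make the "2/2 test cases passed" count concrete, here is a sketch of how expected outputs such as "4" and "{{*}}" could be compared with agent responses. The matching rule is an assumption (the wildcard accepts anything, everything else is compared as trimmed text); Agent CI's actual comparison logic is not described in this guide. The names case_passes, summarize, and toy_agent are hypothetical.

```python
WILDCARD = "{{*}}"  # the match-anything marker used in basic-test.toml

# The two cases from basic-test.toml, written as plain dictionaries.
CASES = [
    {"prompt": "What is 2 + 2?", "output": "4"},
    {"prompt": "Hello!", "output": WILDCARD},
]


def case_passes(expected: str, actual: str) -> bool:
    """Assumed rule: the wildcard accepts any response; otherwise compare trimmed text."""
    if expected == WILDCARD:
        return True
    return expected.strip() == actual.strip()


def summarize(name: str, cases: list[dict], agent) -> str:
    """Run every case through the agent and format a pass-count line."""
    passed = sum(case_passes(case["output"], agent(case["prompt"])) for case in cases)
    return f"{name} ({passed}/{len(cases)} test cases passed)"


def toy_agent(prompt: str) -> str:
    """Stand-in for a real agent, used only to exercise the sketch."""
    return "4" if "2 + 2" in prompt else "Hi there!"


if __name__ == "__main__":
    print(summarize("basic-test", CASES, toy_agent))
```

Under this assumed rule, an agent that answers "2 + 2 = 4" would fail the first case, which is worth keeping in mind when choosing between exact expected outputs and the wildcard.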

Why This Works So Well

Configuration files in Git = Separation of concerns

  • Your agent code stays clean
  • Tests are versioned with your code
  • Changes are auditable and reviewable
  • Teams can collaborate on test criteria

Minimal setup, maximum power

  • No servers to manage
  • No complex deployment configurations
  • Automatic environment creation
  • Native GitHub integration

Level Up: More Evaluation Types

Once you're comfortable with basic accuracy tests, try these:

Performance testing:

[eval]
type = "performance"
[eval.thresholds]
max_latency_ms = 2000
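
As an illustration of what a latency threshold means, the sketch below times a single agent call and compares it with max_latency_ms. It is a local approximation under stated assumptions: the agent is any Python callable that takes a prompt, latency is wall-clock time for one call, and the helper names are hypothetical. How Agent CI itself measures latency is not specified here.

```python
import time

MAX_LATENCY_MS = 2000  # mirrors max_latency_ms in the config above


def measure_latency_ms(agent, prompt: str) -> float:
    """Return the wall-clock time of one agent call, in milliseconds."""
    start = time.perf_counter()
    agent(prompt)
    return (time.perf_counter() - start) * 1000.0


def within_threshold(agent, prompt: str, max_latency_ms: float = MAX_LATENCY_MS) -> bool:
    """True when a single call finishes within the allowed latency."""
    return measure_latency_ms(agent, prompt) <= max_latency_ms


def fast_agent(prompt: str) -> str:
    """Stand-in agent that takes roughly 10 ms to respond."""
    time.sleep(0.01)
    return "ok"


if __name__ == "__main__":
    print(within_threshold(fast_agent, "Hello!"))
```

A single measurement is noisy, so a real check would more likely run several calls and compare a percentile or an average against the threshold.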

Safety checks:

[eval]
type = "safety"
[eval.templates]
prompt_injection = true
harmful_content = true

Consistency verification:

[eval]
type = "consistency"
iterations = 5  # Run 5 times, check variance
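
One way to picture a consistency check is to send the same prompt several times and measure how often the responses agree. The sketch below uses an agreement ratio (the share of runs that match the most common response) as a simple stand-in for "variance". That metric is an assumption chosen for illustration, not necessarily what Agent CI computes, and the function and agent names are hypothetical.

```python
from collections import Counter
from itertools import cycle


def consistency_score(agent, prompt: str, iterations: int = 5) -> float:
    """Fraction of runs whose response equals the most common response."""
    responses = [agent(prompt) for _ in range(iterations)]
    most_common_count = Counter(responses).most_common(1)[0][1]
    return most_common_count / iterations


def make_flaky_agent():
    """Stand-in agent that returns a different answer on every third call."""
    answers = cycle(["A", "A", "B"])
    return lambda prompt: next(answers)


if __name__ == "__main__":
    print(consistency_score(lambda prompt: "4", "What is 2 + 2?"))
    print(consistency_score(make_flaky_agent(), "What is 2 + 2?"))
```

A score of 1.0 means every run produced the same response; lower scores indicate that the agent's answer changes between runs.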

Next Steps

Add more tests as you need them:

  • Performance thresholds
  • Safety checks against prompt injection
  • Consistency across multiple runs

Invite your team:

  • Everyone can see evaluation results on PRs
  • Changes to tests go through code review
  • Full audit trail of who changed what

Scale with confidence:

  • Production monitoring runs automatically
  • Performance trends tracked over time
  • Regressions caught before users see them

The 5-Minute Foundation

In just 5 minutes, you've built something powerful:

  • Automated testing for AI agents
  • Confidence in your deployments
  • A foundation that scales with your team

Your agents are no longer black boxes. They're systematically tested, continuously monitored, and ready for production.