Test Your Agents Like You Test Your Code
Your agent works great in development. But will it work in production? With Agent CI, you can find out before you deploy.
In 5 minutes, you'll have automated testing running on every pull request. Here's how:
Step 1: Connect Your Repository
Install Agent CI on Your GitHub Organization
- Go to agent-ci.com and click "Login with GitHub"
- Authorize Agent CI to verify your GitHub identity (you'll see a GitHub authorization page)
- Install the Agent CI GitHub App:
  - You'll be redirected to GitHub's app installation page
  - Choose your organization from the dropdown
  - Select repository access:
    - Choose "All repositories" for organization-wide access
    - Or choose "Only select repositories" and pick specific repos
  - Click "Install" to complete the GitHub App installation
Note: Only organization owners can install GitHub Apps. If you're not an owner, ask an organization owner to complete this step.
Set Up Agent CI
- Create your organization in Agent CI (you'll be redirected back after GitHub installation)
- Click "Create New Application" and select your repository from the dropdown
- Confirm permissions: Agent CI will show you exactly what access it has to your selected repositories
That's it. Agent CI is now monitoring your repository for pull requests and can run evaluations.
Step 2: Write Your First Test
Create a folder called .agentci in your repository root and add this file:
.agentci/basic-test.toml
[eval]
description = "Basic agent test"
type = "accuracy"
targets.agents = ["*"]

[[eval.cases]]
prompt = "What is 2 + 2?"
output = "4"

[[eval.cases]]
prompt = "Hello!"
output = "{{*}}" # Matches any response
Commit and push this file. That's your first agent evaluation!
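Exact-match outputs and the * wildcard get you started quickly, but you can also point a test at one specific agent and give it several cases. Here's a slightly fuller sketch using only the fields shown above; the agent name support-bot and the file name are placeholders, and it assumes the same matching behavior as the 2 + 2 case:
.agentci/support-bot.toml
[eval]
description = "Smoke tests for the support agent"
type = "accuracy"
targets.agents = ["support-bot"] # placeholder; use your own agent's name

[[eval.cases]]
prompt = "What is the capital of France?"
output = "Paris"

[[eval.cases]]
prompt = "Thanks, that solved it!"
output = "{{*}}" # any response passes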
Step 3: See It Work
Open a pull request with your new .agentci/basic-test.toml file. Within seconds, you'll see:
✅ Agent CI: Evaluation complete
└── ✅ basic-test (2/2 test cases passed)
That's it! You now have automated agent testing on every pull request.
Why This Works So Well
Configuration files in Git = Separation of concerns
- Your agent code stays clean
- Tests are versioned with your code
- Changes are auditable and reviewable
- Teams can collaborate on test criteria
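Concretely, the evaluation files sit in the same tree as the agent they test, so a pull request that changes the agent can update its tests in the same diff. A hypothetical layout (everything except .agentci/basic-test.toml is a placeholder):
your-repo/
├── .agentci/
│   ├── basic-test.toml    # the accuracy test from Step 2
│   └── latency.toml       # a performance test you might add later
├── src/
│   └── agent.py           # your agent code, untouched by the tests
└── README.md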
Zero setup, maximum power
- No servers to manage
- No complex deployment configurations
- Automatic environment creation
- Native GitHub integration
Level Up: More Evaluation Types
Once you're comfortable with basic accuracy tests, try these:
Performance testing:
[eval]
type = "performance"
[eval.thresholds]
max_latency_ms = 2000
Safety checks:
[eval]
type = "safety"
[eval.templates]
prompt_injection = true
harmful_content = true
Consistency verification:
[eval]
type = "consistency"
iterations = 5 # Run 5 times, check variance
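Each snippet above shows only the keys that change. Presumably they combine with the description and targets fields from Step 2 into a complete file; here's a sketch of a performance check under that assumption (the file name and the two-second limit are just examples):
.agentci/latency.toml
[eval]
description = "Responses come back in under two seconds"
type = "performance"
targets.agents = ["*"]

[eval.thresholds]
max_latency_ms = 2000
Commit it next to basic-test.toml and open a pull request to see the new check alongside your accuracy results.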
Next Steps
Add more tests as you need them:
- Performance thresholds
- Safety checks against prompt injection
- Consistency across multiple runs
Invite your team:
- Everyone can see evaluation results on PRs
- Changes to tests go through code review
- Full audit trail of who changed what
Scale with confidence:
- Production monitoring runs automatically
- Performance trends tracked over time
- Regressions caught before users see them
The 5-Minute Foundation
In just 5 minutes, you've built something powerful:
- Automated testing for AI agents
- Confidence in your deployments
- A foundation that scales with your team
Your agents are no longer black boxes. They're systematically tested, continuously monitored, and ready for production.