Evaluations

  • TOML Configuration Overview
    Complete reference for Agent CI's TOML configuration format. Learn how to define evaluations, configure targets, and set up test cases in files under .agentci/evals/.

  • Accuracy Evaluations
    Test agent outputs with exact matching, substring containment, regex patterns, semantic similarity, and schema validation. Essential for deterministic tasks and API responses.

  • Performance Evaluations
    Measure response latency, token usage, and resource consumption with configurable thresholds. Ensure your agents meet production performance and cost requirements.

  • Safety Evaluations
    Test agent defenses against prompt injection, harmful content, SQL injection, PII exposure, and jailbreak attempts. Includes built-in templates for common attack vectors.

  • Consistency Evaluations
    Test output variance and behavioral reliability across multiple runs. Ensure deterministic behavior and detect regressions in agent consistency over time.

  • LLM Evaluations
    Use an LLM as a judge to assess output quality. Measure helpfulness, clarity, appropriateness, and completeness with configurable scoring criteria.
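
To make the accuracy and performance categories above more concrete, here is a minimal Python sketch of what exact matching, substring containment, regex matching, and a latency threshold check amount to conceptually. It is not Agent CI's implementation or API: the function names (`exact_match`, `contains`, `regex_match`, `within_latency`) and the whitespace-stripping behavior are assumptions made for illustration only.

```python
import re


def exact_match(output: str, expected: str) -> bool:
    """Pass only if the output equals the expected text (ignoring outer whitespace)."""
    return output.strip() == expected.strip()


def contains(output: str, substring: str) -> bool:
    """Pass if the expected substring appears anywhere in the output."""
    return substring in output


def regex_match(output: str, pattern: str) -> bool:
    """Pass if the regular expression matches somewhere in the output."""
    return re.search(pattern, output) is not None


def within_latency(latency_ms: float, max_ms: float) -> bool:
    """Pass if the measured latency does not exceed the configured threshold."""
    return latency_ms <= max_ms


# Hypothetical agent response used only to exercise the checks.
response = "Order #4821 has shipped.\n"

results = {
    "exact": exact_match(response, "Order #4821 has shipped."),
    "contains": contains(response, "shipped"),
    "regex": regex_match(response, r"Order #\d+"),
    "latency": within_latency(latency_ms=850.0, max_ms=1000.0),
}
print(results)
```

Exact matching suits fully deterministic outputs, while substring and regex checks tolerate wording that varies around a fixed key fact such as an order number.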