Pytest Integration

Turn any Evalytic metric into a pytest assertion.

The evalytic.testing module exposes two helpers, assert_test and assert_metric. Each runs the appropriate Evalytic runner for a given test case and raises an AssertionError when any score falls below its threshold. The semantics match evaly gate --metric-threshold, so the same quality bar applies in both local unit tests and CI gates.

Quickstart

# tests/test_rag_quality.py
from evalytic.testing import assert_test
from evalytic.text.types import RAGTestCase, RetrievedChunk


def test_rag_quality():
    case = RAGTestCase(
        query="What does Evalytic evaluate?",
        response="Evalytic evaluates images, text, RAG, and agent runs.",
        contexts=[RetrievedChunk(text="Evalytic is an evaluation SDK for AI outputs.")],
    )
    assert_test(case, metrics={
        "faithfulness": 0.8,
        "hallucination": 0.9,
        "contextual_relevancy": 0.75,
    })

Run it like any other pytest test: pytest tests/test_rag_quality.py -v. When any metric falls below its threshold, pytest prints the failing metric ids, their actual scores, and the thresholds that were missed. All failures are reported together so one run surfaces every regression.
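The "all failures reported together" behavior can be illustrated with a minimal, self-contained sketch. The helper name, score values, and message format below are invented for illustration; they are not Evalytic's internals:

```python
# Minimal sketch of the threshold semantics described above: every metric
# that misses its threshold is collected, and one AssertionError reports
# them all together. Illustrative only, not Evalytic's actual code.
def check_thresholds(scores: dict[str, float], thresholds: dict[str, float]) -> None:
    failures = [
        f"{metric}: score {scores[metric]:.2f} < threshold {threshold:.2f}"
        for metric, threshold in thresholds.items()
        if scores[metric] < threshold
    ]
    if failures:
        raise AssertionError("Metric thresholds not met:\n" + "\n".join(failures))


# Example scores a runner might produce (values invented):
scores = {"faithfulness": 0.72, "hallucination": 0.95, "contextual_relevancy": 0.60}
thresholds = {"faithfulness": 0.8, "hallucination": 0.9, "contextual_relevancy": 0.75}
# check_thresholds(scores, thresholds) would raise, listing both
# faithfulness and contextual_relevancy in a single AssertionError.
```

Because everything surfaces in one exception, a single test run shows every regression rather than stopping at the first miss.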

Supported Test Cases

| Test case | Runner | Typical metrics |
| --- | --- | --- |
| RAGTestCase | evaluate_rag | faithfulness, answer_relevancy, contextual_relevancy, hallucination, context_precision, context_recall |
| TextTestCase | evaluate_text | factual_correctness, semantic_similarity, g_eval, bleu, rouge, exact_match, levenshtein, string_presence |
| AgentTestCase | evaluate_agent | tool_call_accuracy, goal_accuracy, step_efficiency |

Single Metric Assertions

Use assert_metric when you only care about one score:

from evalytic.testing import assert_metric

def test_no_hallucinations(rag_case):
    assert_metric(rag_case, "hallucination", threshold=0.95)
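The single-metric form follows the same rule as the multi-metric one: fail iff the score is below the threshold. A hedged sketch of that check (the function and message format are illustrative stand-ins, not Evalytic code):

```python
# Illustrative single-metric threshold check, mirroring what the docs
# describe for assert_metric. Not Evalytic's actual implementation.
def single_metric_check(metric_id: str, score: float, threshold: float) -> None:
    if score < threshold:
        raise AssertionError(
            f"{metric_id}: score {score:.2f} < threshold {threshold:.2f}"
        )
```

Scores at or above the threshold pass; anything below raises, which pytest reports like any other failed assertion.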

Judge Overrides and Consensus

Both helpers forward the same judge options that the CLI and Python runners accept. Use a custom judge, a consensus panel, or a local judge endpoint without leaving your test:

assert_test(
    case,
    metrics={"faithfulness": 0.85},
    judges=["gemini-2.5-flash", "gpt-5.2", "claude-sonnet-4-6"],
)
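The source does not show how Evalytic aggregates a consensus panel's scores. One common scheme is to average the per-judge scores before applying the threshold; the sketch below illustrates that scheme under that stated assumption:

```python
# Assumption: consensus = arithmetic mean of each judge's score.
# Evalytic's actual aggregation may differ; this is illustrative only.
def consensus_score(per_judge_scores: dict[str, float]) -> float:
    return sum(per_judge_scores.values()) / len(per_judge_scores)


# Hypothetical per-judge faithfulness scores for one test case:
panel = {"gemini-2.5-flash": 0.90, "gpt-5.2": 0.80, "claude-sonnet-4-6": 0.85}
# consensus_score(panel) -> 0.85, which would pass a 0.85 threshold.
```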

Inspecting the Report

assert_test returns the underlying MetricEvalReport after a passing run. Use it to make additional assertions or to attach the full report as a CI artifact:

report = assert_test(case, metrics={"faithfulness": 0.8})
assert report.metric_averages()["faithfulness"] >= 0.9, "stretch goal"
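The shape of MetricEvalReport is not shown in this section, so the class below is a hypothetical stand-in that illustrates the metric_averages() pattern: averaging each metric's score across all evaluated cases.

```python
from statistics import mean


class FakeReport:
    """Hypothetical stand-in for MetricEvalReport, for illustration only."""

    def __init__(self, per_case_scores: list[dict[str, float]]):
        self.per_case_scores = per_case_scores

    def metric_averages(self) -> dict[str, float]:
        # Average each metric's score across all evaluated cases.
        metrics = self.per_case_scores[0].keys()
        return {m: mean(case[m] for case in self.per_case_scores) for m in metrics}


report = FakeReport([{"faithfulness": 0.90}, {"faithfulness": 0.95}])
# report.metric_averages() -> {"faithfulness": 0.925}
```

The same follow-up assertion shown above then works against whatever averages the real report exposes.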

No pytest plugin required: assert_test is a plain Python helper that raises AssertionError. Pytest's built-in assertion reporting handles the rest, so there are no entry points to install or configure.