Pytest Integration
Turn any Evalytic metric into a pytest assertion.
The evalytic.testing module exposes two helpers, assert_test and
assert_metric, that run the appropriate Evalytic runner for a given test case and fail
the pytest assertion when any score falls below its threshold. The semantics match
evaly gate --metric-threshold, so the same quality bar applies in both local unit tests and CI gates.
Quickstart
# tests/test_rag_quality.py
from evalytic.testing import assert_test
from evalytic.text.types import RAGTestCase, RetrievedChunk

def test_rag_quality():
    case = RAGTestCase(
        query="What does Evalytic evaluate?",
        response="Evalytic evaluates images, text, RAG, and agent runs.",
        contexts=[RetrievedChunk(text="Evalytic is an evaluation SDK for AI outputs.")],
    )
    assert_test(case, metrics={
        "faithfulness": 0.8,
        "hallucination": 0.9,
        "contextual_relevancy": 0.75,
    })
Run it like any other pytest test: pytest tests/test_rag_quality.py -v. When any metric
falls below its threshold, pytest prints the failing metric ids, their actual scores, and the
thresholds that were missed. All failures are reported together so one run surfaces every regression.
Supported Test Cases
| Test case | Runner | Typical metrics |
|---|---|---|
| RAGTestCase | evaluate_rag | faithfulness, answer_relevancy, contextual_relevancy, hallucination, context_precision, context_recall |
| TextTestCase | evaluate_text | factual_correctness, semantic_similarity, g_eval, bleu, rouge, exact_match, levenshtein, string_presence |
| AgentTestCase | evaluate_agent | tool_call_accuracy, goal_accuracy, step_efficiency |
Single Metric Assertions
Use assert_metric when you only care about one score:
from evalytic.testing import assert_metric

def test_no_hallucinations(rag_case):
    assert_metric(rag_case, "hallucination", threshold=0.95)
Judge Overrides and Consensus
Both helpers forward the same judge options that the CLI and Python runners accept. Use a custom judge, a consensus panel, or a local judge endpoint without leaving your test:
assert_test(
    case,
    metrics={"faithfulness": 0.85},
    judges=["gemini-2.5-flash", "gpt-5.2", "claude-sonnet-4-6"],
)
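How the panel combines per-judge scores depends on the configured consensus strategy; a simple mean over judge scores is one plausible scheme. The helper below is a hypothetical illustration, not Evalytic's actual aggregation:

```python
from statistics import mean

# Hypothetical consensus aggregation (an assumption, not Evalytic's strategy):
# each judge scores the metric independently and the panel score is the mean.
def consensus_score(judge_scores: dict[str, float]) -> float:
    return mean(judge_scores.values())

consensus_score({"gemini-2.5-flash": 0.9, "gpt-5.2": 0.8, "claude-sonnet-4-6": 0.85})
# -> 0.85
```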
Inspecting the Report
assert_test returns the underlying MetricEvalReport after a passing run.
Use it to make additional assertions or to attach the full report as a CI artifact:
report = assert_test(case, metrics={"faithfulness": 0.8})
assert report.metric_averages()["faithfulness"] >= 0.9, "stretch goal"
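To attach the report as a CI artifact, serializing it to JSON at a path your CI system collects is usually enough. The snippet below uses a plain dict as a stand-in for the report's data, since MetricEvalReport's exact serialization API is not shown here; substitute the real report's accessor in your tests.

```python
import json
from pathlib import Path

# Stand-in for the report contents -- in a real test this would come from
# the MetricEvalReport returned by assert_test (assumed shape, for illustration).
report_data = {"metric_averages": {"faithfulness": 0.93}}

# Write the report where your CI artifact-collection step can pick it up.
artifact = Path("eval-report.json")
artifact.write_text(json.dumps(report_data, indent=2))
```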
assert_test is a plain Python helper that
raises AssertionError. Pytest's built-in assertion reporting handles the rest, so there
are no entry points to install or configure.