Installation

Install Evalytic with pip and add the extras you need. Extras can be combined in a single requirement, for example evalytic[generation,embeddings].

Requirements

  • Python 3.10+ (tested on 3.10, 3.11, 3.12, 3.13)
  • pip (or any PEP 517-compatible installer)
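To confirm that an interpreter meets the 3.10 minimum before installing, a short standard-library check like the one below can help. This is a minimal sketch; python_is_supported and MIN_PYTHON are illustrative names and are not part of Evalytic.

```python
import sys

# Minimum version taken from the requirements list above.
MIN_PYTHON = (3, 10)


def python_is_supported(version=None):
    """Return True when a (major, minor, ...) version meets the minimum."""
    if version is None:
        version = sys.version_info
    # Compare only major and minor so patch levels do not matter.
    return tuple(version[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if python_is_supported() else "too old for Evalytic"
    print(f"Python {current}: {status}")
```

Run it with the same interpreter you plan to install into, since a machine may have several Python versions side by side.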

Install Variants

Core

pip install evalytic

Includes the CLI, shared judge/provider support for LLM and VLM judges, Rich terminal output, JSON/HTML reports, and the core commands for visual, RAG, text, and agent evaluation. ~5 MB download.

With Generation

pip install "evalytic[generation]"

Adds fal-client for fal.ai image generation. Parel's built-in generation uses httpx, which is already a core Evalytic dependency, so Parel-only benchmark runs do not need this extra.

With Metrics

pip install "evalytic[metrics]"

Adds CLIP Score, LPIPS, ArcFace, and NIMA local metrics for visual workflows. Once installed, CLIP (text2img), LPIPS (img2img), and NIMA are auto-enabled in evaly bench. ~2 GB download.

Large download: the [metrics] extra installs PyTorch and model weights. To skip local metrics once the extra is installed, pass --no-metrics.

With OCR

pip install "evalytic[ocr]"

Adds OCR accuracy scoring via pytesseract for text-in-image workflows. pytesseract is only a wrapper, so the Tesseract binary must also be installed on the system (for example, brew install tesseract on macOS).
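Because the OCR extra depends on a system-level Tesseract install, it can be useful to confirm that the binary is visible on PATH. The sketch below uses only the standard library. find_tesseract is an illustrative helper, not an Evalytic function, and it assumes the executable uses its default name, tesseract.

```python
import shutil


def find_tesseract(command="tesseract"):
    """Return the full path of the Tesseract executable, or None if it is not on PATH."""
    return shutil.which(command)


if __name__ == "__main__":
    path = find_tesseract()
    if path:
        print(f"Tesseract found at {path}")
    else:
        print("Tesseract not found on PATH; install it before using the [ocr] extra")
```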

With Embeddings

pip install "evalytic[embeddings]"

Adds sentence-transformers for local embeddings. Recommended for answer_relevancy and semantic_similarity so those metrics work locally without a separate embeddings API.

Everything

pip install "evalytic[all]"

Installs generation, metrics, OCR, and embeddings extras in one go.
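After installing, it may not be obvious which extras are present in a given environment. The sketch below probes for one package per extra without importing it. The import names in EXTRA_MODULES are assumptions based on the packages named above (fal-client, PyTorch, pytesseract, sentence-transformers), and installed_extras is a hypothetical helper rather than an Evalytic API.

```python
from importlib.util import find_spec

# Assumed import names for the packages each extra is described as adding.
EXTRA_MODULES = {
    "generation": "fal_client",
    "metrics": "torch",
    "ocr": "pytesseract",
    "embeddings": "sentence_transformers",
}


def installed_extras(modules=None):
    """Map each extra name to True if its marker module can be located."""
    if modules is None:
        modules = EXTRA_MODULES
    # find_spec returns None for a top-level module that is not installed.
    return {extra: find_spec(name) is not None for extra, name in modules.items()}


if __name__ == "__main__":
    for extra, present in installed_extras().items():
        print(f"[{extra}]: {'installed' if present else 'missing'}")
```

A "missing" result only means the marker package was not found; it does not say anything about model weights or system binaries such as Tesseract.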

Use-Case Matrix

| Workflow | Recommended Install | Notes |
| --- | --- | --- |
| fal.ai visual benchmark with evaly bench | evalytic[generation] | Adds fal-client for fal.ai model APIs. |
| Parel visual benchmark with evaly bench | evalytic | Core install is enough. Set PAREL_API_KEY. |
| Visual benchmark + local metrics | evalytic[all] | Includes generation, CLIP/LPIPS/NIMA/ArcFace, OCR, and embeddings. |
| RAG evaluation with local answer relevancy | evalytic[embeddings] | Recommended so answer_relevancy works locally. |
| Text evaluation with semantic similarity | evalytic[embeddings] | semantic_similarity uses embeddings; deterministic metrics stay in core. |
| Agent evaluation | evalytic | Core install is enough. Embeddings can improve goal_accuracy when an expected output is provided. |

Virtual Environment

We recommend using a virtual environment:

python3 -m venv .venv
source .venv/bin/activate  # macOS/Linux (on Windows: .venv\Scripts\activate)
pip install evalytic

Verify Installation

evaly --help
evaly bench --help
evaly rag eval --help
evaly text eval --help
evaly agent eval --help
evaly compare --help
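The same checks can be scripted, for example as a smoke test in CI. The sketch below runs each help command listed above and reports whether it exited cleanly. It assumes that --help exits with status 0, which is conventional but not stated on this page; runs_cleanly is an illustrative helper.

```python
import subprocess

# Commands from the verification step above.
HELP_COMMANDS = [
    ["evaly", "--help"],
    ["evaly", "bench", "--help"],
    ["evaly", "rag", "eval", "--help"],
    ["evaly", "text", "eval", "--help"],
    ["evaly", "agent", "eval", "--help"],
    ["evaly", "compare", "--help"],
]


def runs_cleanly(argv, timeout=60):
    """Return True if the command starts and exits with status 0."""
    try:
        result = subprocess.run(argv, capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers a missing executable (FileNotFoundError).
        return False
    return result.returncode == 0


if __name__ == "__main__":
    for argv in HELP_COMMANDS:
        status = "ok" if runs_cleanly(argv) else "FAILED"
        print(f"{' '.join(argv)}: {status}")
```

A FAILED line for every command usually means the evaly entry point is not on PATH, for instance because the virtual environment is not activated.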

CLI Aliases

Two entry points are available:

  • evaly — primary command name
  • evalytic — full-name alias (same functionality)

Development Install

To contribute to Evalytic or run the test suite:

git clone https://github.com/evalytic/evalytic.git
cd evalytic
pip install -e ".[dev]"
pytest -v