Configuration

Configure Evalytic with evalytic.toml, environment variables, and CLI flags.

Precedence Order

Configuration is resolved in this order (highest priority first):

  1. CLI flags — e.g. --judge openai/gpt-5.2
  2. Environment variables — e.g. GEMINI_API_KEY=...
  3. .env file — auto-loaded from current directory
  4. evalytic.toml — project config file
  5. Defaults — built-in defaults
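
The layering above can be pictured as a lookup over ordered sources. This is a minimal illustration of the precedence rule, not Evalytic's actual implementation:

```python
# Hypothetical sketch: resolve a setting by checking sources in
# priority order; the first source that defines the key wins.
def resolve(key, cli_flags, env, dotenv, toml_config, defaults):
    for layer in (cli_flags, env, dotenv, toml_config, defaults):
        if key in layer:
            return layer[key]
    return None

# The CLI flag wins even though the same key exists in lower layers.
judge = resolve(
    "judge",
    cli_flags={"judge": "openai/gpt-5.2"},
    env={},
    dotenv={},
    toml_config={"judge": "gemini-2.5-flash"},
    defaults={"judge": "gemini-2.5-flash"},
)
```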

evalytic.toml

Create an evalytic.toml in your project root. The easiest way is the interactive wizard:

evaly init

Evalytic searches for config files in:

  1. ./evalytic.toml (current directory)
  2. ~/.evalytic/config.toml (user home)

Full example

# evalytic.toml

[keys]
fal = "fal_key_xxx"
gemini = "gemini_key_xxx"
openai = "sk-xxx"
anthropic = "sk-ant-xxx"

[bench]
judge = "gemini-2.5-flash"
concurrency = 4
dimensions = ["visual_quality", "prompt_adherence"]
image_size = "landscape_16_9"
seed = 42
output_dir = "./reports"

[bench.metrics]
clip_threshold = 0.18
clip_weight = 0.20
lpips_threshold = 0.40
lpips_weight = 0.20

[keys] Section

API keys defined here are set as environment variables when Evalytic loads:

| Config Key | Environment Variable | Used By |
|---|---|---|
| fal | FAL_KEY | fal.ai image generation |
| gemini | GEMINI_API_KEY | Gemini judge |
| openai | OPENAI_API_KEY | OpenAI judge |
| anthropic | ANTHROPIC_API_KEY | Anthropic judge |

Security: Don't commit evalytic.toml with API keys to version control. Add it to .gitignore, or use environment variables / .env instead.

[bench] Section

Default settings for the evaly bench command:

| Key | Type | Default | Description |
|---|---|---|---|
| judge | string | "gemini-2.5-flash" | Default VLM judge (single mode) |
| judges | string[] | — | Multi-judge consensus mode (2-3 judges). Overrides judge when set. |
| models | string[] | — | Default models for evaly bench (avoids -m flag) |
| prompts | string | — | Default prompts file path or inline prompt |
| concurrency | int | 4 | Max parallel generation requests |
| dimensions | string[] | auto | Default dimensions to score |
| image_size | string | — | Default image size |
| seed | int | — | Fixed seed for reproducibility |
| output_dir | string | — | Default output directory. Each run creates a timestamped subfolder with reports and error log. |

[bench.metrics] Section

Thresholds and weights for local CLIP/LPIPS metrics:

| Key | Type | Default | Description |
|---|---|---|---|
| clip_threshold | float | 0.18 | CLIP score flag threshold |
| clip_weight | float | 0.20 | CLIP weight in overall score |
| lpips_threshold | float | 0.40 | LPIPS flag threshold |
| lpips_weight | float | 0.20 | LPIPS weight in overall score |
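
One plausible way these values interact is shown below. The exact scoring formula and flag names are assumptions for illustration only (the docs above don't specify them); all scores are treated as normalized to 0-1:

```python
# Hypothetical illustration of thresholds (flagging) and weights (blending).
# Not Evalytic's documented formula.
def apply_metrics(clip_score, lpips_score, judge_score,
                  clip_threshold=0.18, clip_weight=0.20,
                  lpips_threshold=0.40, lpips_weight=0.20):
    flags = []
    if clip_score < clip_threshold:
        flags.append("low_clip")      # weak prompt-image alignment
    if lpips_score > lpips_threshold:
        flags.append("high_lpips")    # large perceptual distance
    # Remaining weight goes to the judge score; lower LPIPS is better,
    # so it is inverted before blending.
    judge_weight = 1.0 - clip_weight - lpips_weight
    overall = (judge_weight * judge_score
               + clip_weight * clip_score
               + lpips_weight * (1.0 - lpips_score))
    return overall, flags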

.env File

Evalytic auto-loads .env from the current directory using python-dotenv:

# .env
FAL_KEY=fal_key_xxx
GEMINI_API_KEY=gemini_key_xxx
OPENAI_API_KEY=sk-xxx
ANTHROPIC_API_KEY=sk-ant-xxx
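
For reference, python-dotenv's default behavior is roughly the following stdlib-only sketch (simplified; the real library also handles quoting, interpolation, and export prefixes):

```python
import os

# Simplified sketch of load_dotenv(): read KEY=VALUE lines and set any
# variables that are not already present in the environment.
def load_dotenv_sketch(path=".env"):
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```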

Environment Variables

| Variable | Description |
|---|---|
| FAL_KEY | fal.ai API key for image generation |
| GEMINI_API_KEY | Google Gemini API key for judge |
| OPENAI_API_KEY | OpenAI API key for judge |
| ANTHROPIC_API_KEY | Anthropic API key for judge |

Example Configurations

Minimal (Gemini)

[keys]
fal = "fal_key_xxx"
gemini = "gemini_key_xxx"

CI/CD with GPT-5.2 judge

[bench]
judge = "openai/gpt-5.2"
concurrency = 2
dimensions = ["visual_quality", "prompt_adherence", "text_rendering"]

Local development with Ollama

[keys]
fal = "fal_key_xxx"

[bench]
judge = "ollama/qwen2.5-vl:7b"
seed = 42

Consensus mode (multi-judge)

# Consensus scoring: 2 primary judges + optional tiebreaker
[keys]
fal = "fal_key_xxx"
gemini = "gemini_key_xxx"
openai = "sk-xxx"

[bench]
judges = ["gemini-2.5-flash", "gpt-5.2"]

# Or with explicit tiebreaker (3rd judge)
# judges = ["gemini-2.5-flash", "gpt-5.2", "claude-haiku-4-5"]
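
One way two primary judges plus a tiebreaker could combine is sketched below. The disagreement margin and averaging rule here are assumptions for illustration; the docs don't specify Evalytic's actual consensus algorithm:

```python
# Hypothetical consensus rule: average the two primary judges unless they
# disagree beyond `margin`, in which case the tiebreaker picks a side.
def consensus(score_a, score_b, tiebreaker_score=None, margin=2.0):
    if abs(score_a - score_b) <= margin or tiebreaker_score is None:
        return (score_a + score_b) / 2
    # Judges disagree: keep the primary judge closest to the tiebreaker.
    closest = min((score_a, score_b), key=lambda s: abs(s - tiebreaker_score))
    return (closest + tiebreaker_score) / 2
```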

Default models and prompts

# Saves you from typing -m and -p every time
[keys]
fal = "fal_key_xxx"
gemini = "gemini_key_xxx"

[bench]
models = ["flux-schnell", "flux-dev"]
prompts = "prompts.json"

With this config, evaly bench -y is all you need — models and prompts are loaded from the config file.

Inspect Configuration

Use evaly config show to see the active configuration, which keys are loaded, and where they came from:

evaly config show