# Configuration

Configure Evalytic with `evalytic.toml`, environment variables, and CLI flags.
## Precedence Order

Configuration is resolved in this order (highest priority first):

1. CLI flags — `--judge openai/gpt-5.2`
2. Environment variables — `GEMINI_API_KEY=...`
3. `.env` file — auto-loaded from the current directory
4. `evalytic.toml` — project config file
5. Defaults — built-in defaults
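The resolution above amounts to a first-match lookup across layers, from highest to lowest priority. A minimal sketch of that idea (hypothetical code for illustration, not Evalytic's actual implementation):

```python
def resolve(key, cli_flags, env, dotenv, toml_cfg, defaults=None):
    """Return the first value found, scanning the highest-priority layer first."""
    for layer in (cli_flags, env, dotenv, toml_cfg, defaults or {}):
        if key in layer:
            return layer[key]
    return None

# A CLI flag beats the same key in evalytic.toml:
judge = resolve(
    "judge",
    cli_flags={"judge": "openai/gpt-5.2"},
    env={},
    dotenv={},
    toml_cfg={"judge": "gemini-2.5-flash"},
)
```

With the flag absent, the same lookup would fall through to the `evalytic.toml` value.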
## evalytic.toml

Create an `evalytic.toml` in your project root. The easiest way is the interactive wizard:

```shell
evaly init
```

Evalytic searches for config files in:

- `./evalytic.toml` (current directory)
- `~/.evalytic/config.toml` (user home)
### Full example

```toml
# evalytic.toml

[keys]
fal = "fal_key_xxx"
gemini = "gemini_key_xxx"
openai = "sk-xxx"
anthropic = "sk-ant-xxx"

[bench]
judge = "gemini-2.5-flash"
concurrency = 4
dimensions = ["visual_quality", "prompt_adherence"]
image_size = "landscape_16_9"
seed = 42
output_dir = "./reports"

# Weight VLM dimensions (default: equal)
[bench.dimension_weights]
input_fidelity = 0.5
visual_quality = 0.1

[bench.metrics]
clip_threshold = 0.18
clip_weight = 0.20
clip_range = [0.20, 0.40]
lpips_threshold = 0.40
lpips_weight = 0.20
lpips_range = [0.40, 0.95]
face_range = [0.60, 0.95]

# Override model cost or settings
[bench.model_overrides.flux-kontext]
cost = 0.06

[bench.model_overrides.my-custom-model]
endpoint = "fal-ai/my-custom/v1"
pipeline = "img2img"
cost = 0.04
image_field = "image_urls"
```
## `[keys]` Section

API keys defined here are set as environment variables when Evalytic loads:

| Config Key | Environment Variable | Used By |
|---|---|---|
| `fal` | `FAL_KEY` | fal.ai image generation + `fal/*` judges |
| `gemini` | `GEMINI_API_KEY` | Gemini judge |
| `openai` | `OPENAI_API_KEY` | OpenAI judge |
| `anthropic` | `ANTHROPIC_API_KEY` | Anthropic judge |
Never commit an `evalytic.toml` containing API keys to version control. Add it to `.gitignore`, or use environment variables / `.env` instead.
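Conceptually, the `[keys]` table is just mapped onto the environment variables above when the config loads. A rough sketch of that mapping (hypothetical helper names, not Evalytic's API; in practice the target would be `os.environ`):

```python
KEY_ENV_MAP = {
    "fal": "FAL_KEY",
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}

def export_keys(keys_section: dict, env: dict) -> None:
    # setdefault: a variable already present in the environment wins over
    # the TOML value, matching the precedence order (env vars rank above
    # evalytic.toml)
    for key, value in keys_section.items():
        env.setdefault(KEY_ENV_MAP[key], value)

env = {}
export_keys({"fal": "fal_key_xxx", "gemini": "gemini_key_xxx"}, env)
# env now holds FAL_KEY and GEMINI_API_KEY
```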
## `[bench]` Section

Default settings for the `evaly bench` command:

| Key | Type | Default | Description |
|---|---|---|---|
| `judge` | string | `"gemini-2.5-flash"` | Default VLM judge (single mode) |
| `judges` | string[] | — | Multi-judge consensus mode (2-3 judges). Overrides `judge` when set. |
| `models` | string[] | — | Default models for `evaly bench` (avoids the `-m` flag) |
| `prompts` | string | — | Default prompts file path or inline prompt |
| `concurrency` | int | 4 | Max parallel generation requests |
| `dimensions` | string[] | auto | Default dimensions to score |
| `image_size` | string | — | Default image size |
| `seed` | int | — | Fixed seed for reproducibility |
| `output_dir` | string | — | Default output directory. Each run creates a timestamped subfolder with reports and error log. |
## `[bench.dimension_weights]` Section

Customize how VLM dimensions contribute to the overall score. By default, all dimensions are weighted equally (1/n). When you specify weights, unspecified dimensions share the remaining weight equally. Weights are normalized to sum to 1.0.

```toml
# E-commerce: product shape matters most
[bench.dimension_weights]
input_fidelity = 0.5
visual_quality = 0.1
# Remaining 0.4 split equally among other active dimensions
```
The same weights can be passed on the CLI with `--dim-weights '{"input_fidelity": 0.5}'`. CLI flags override TOML values.
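The weighting rule described above (unspecified dimensions split the remainder equally, then everything is normalized to sum to 1.0) can be sketched as follows. This illustrates the documented behavior; it is not Evalytic's internal code:

```python
def resolve_weights(active_dims, overrides):
    specified = sum(overrides.get(d, 0.0) for d in active_dims)
    unspecified = [d for d in active_dims if d not in overrides]
    # unspecified dimensions split the remaining weight equally
    share = (1.0 - specified) / len(unspecified) if unspecified else 0.0
    weights = {d: overrides.get(d, share) for d in active_dims}
    # normalize so the weights sum to exactly 1.0
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}

weights = resolve_weights(
    ["input_fidelity", "visual_quality", "prompt_adherence"],
    {"input_fidelity": 0.5, "visual_quality": 0.1},
)
# prompt_adherence receives the remaining 0.4
```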
## `[bench.metrics]` Section

Thresholds, weights, and normalization ranges for local metrics. Sharpness is always available (no torch required); CLIP/LPIPS/face require `evalytic[metrics]`.

| Key | Type | Default | Description |
|---|---|---|---|
| `clip_threshold` | float | 0.18 | CLIP score flag threshold |
| `clip_weight` | float | 0.20 | CLIP weight in the overall score |
| `clip_range` | float[2] | [0.18, 0.35] | CLIP normalization range [min, max] for mapping to 0–5 |
| `lpips_threshold` | float | 0.40 | LPIPS flag threshold |
| `lpips_weight` | float | 0.20 | LPIPS weight in the overall score |
| `lpips_range` | float[2] | [0.40, 0.95] | LPIPS normalization range [min, max] |
| `face_range` | float[2] | [0.60, 0.95] | Face similarity normalization range [min, max] |
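A `*_range = [min, max]` pair maps a raw metric value onto the 0–5 scale by linear interpolation, clamping values outside the range. A plausible sketch of that mapping (an assumption for illustration; Evalytic's exact rounding, and any inversion for distance-style metrics like LPIPS where lower is better, may differ):

```python
def to_score(value, rng):
    lo, hi = rng
    t = (value - lo) / (hi - lo)   # linear position inside [min, max]
    t = min(max(t, 0.0), 1.0)      # clamp values outside the range
    return t * 5.0                 # map onto the 0-5 scale

# With the default CLIP range [0.18, 0.35]:
top = to_score(0.35, (0.18, 0.35))   # at the max -> 5.0
mid = to_score(0.265, (0.18, 0.35))  # halfway -> 2.5
low = to_score(0.10, (0.18, 0.35))   # below the min -> clamped to 0.0
```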
## `[bench.model_overrides]` Section

Override cost or settings for any model. Useful when fal.ai prices change or you're using a custom endpoint. Overrides take priority over both the built-in registry and auto-detected pricing.

```toml
# Override cost for an existing model
[bench.model_overrides.flux-kontext]
cost = 0.06

# Register a custom model
[bench.model_overrides.my-custom-model]
endpoint = "fal-ai/my-custom/v1"
pipeline = "img2img"
cost = 0.04
image_field = "image_urls"
```

| Key | Type | Description |
|---|---|---|
| `endpoint` | string | fal.ai endpoint path |
| `pipeline` | string | `"text2img"` or `"img2img"` |
| `cost` | float | USD per image (overrides auto-detect) |
| `image_field` | string | `"image_url"` or `"image_urls"` |
Cost precedence: `model_overrides` > fal.ai live pricing (auto-detected) > built-in registry defaults. Run `evaly bench --list-models` to see current prices.
## .env File

Evalytic auto-loads `.env` from the current directory using python-dotenv:

```shell
# .env
FAL_KEY=fal_key_xxx
GEMINI_API_KEY=gemini_key_xxx
OPENAI_API_KEY=sk-xxx
ANTHROPIC_API_KEY=sk-ant-xxx
```
## Environment Variables

| Variable | Description |
|---|---|
| `FAL_KEY` | fal.ai API key for image generation + `fal/*` judges |
| `GEMINI_API_KEY` | Google Gemini API key for the judge |
| `OPENAI_API_KEY` | OpenAI API key for the judge |
| `ANTHROPIC_API_KEY` | Anthropic API key for the judge |
## Example Configurations

### Single key (fal.ai only)

```toml
# One key for both generation and judging
[keys]
fal = "fal_key_xxx"

[bench]
judge = "fal/gemini-2.5-flash"
```

### Two keys (fal.ai + Gemini)

```toml
[keys]
fal = "fal_key_xxx"
gemini = "gemini_key_xxx"
```

### CI/CD with GPT-5.2 judge

```toml
[bench]
judge = "openai/gpt-5.2"
concurrency = 2
dimensions = ["visual_quality", "prompt_adherence", "text_rendering"]
```

### Local development with Ollama

```toml
[keys]
fal = "fal_key_xxx"

[bench]
judge = "ollama/qwen2.5-vl:7b"
seed = 42
```

### Consensus mode (multi-judge)

```toml
# Consensus via fal.ai — single key, multiple judges
[keys]
fal = "fal_key_xxx"

[bench]
judges = ["fal/gemini-2.5-flash", "fal/gpt-5.2"]

# Or with separate API keys per provider
# [keys]
# gemini = "gemini_key_xxx"
# openai = "sk-xxx"
# [bench]
# judges = ["gemini-2.5-flash", "gpt-5.2"]
```

### Default models and prompts

```toml
# Saves you from typing -m and -p every time
[keys]
fal = "fal_key_xxx"
gemini = "gemini_key_xxx"

[bench]
models = ["flux-schnell", "flux-dev"]
prompts = "prompts.json"
```

With this config, `evaly bench -y` is all you need — models and prompts are loaded from the config file.
## Inspect Configuration

Use `evaly config show` to see the active configuration, which keys are loaded, and where they came from:

```shell
evaly config show
```