LITMUX(1)

NAME

litmux - Unit tests for AI. Test prompts, compare models, save money.

SYNOPSIS

$ pip install litmux


DESCRIPTION

⚡ Unit tests for AI. Test prompts, compare models, save money.

README

Litmux

Unit tests for AI. Test prompts, compare models, catch regressions.

pip install litmux && litmux init && litmux run

Why

Every team shipping AI features hits the same three problems:

  1. No testing standard. REST has Postman, frontends have Cypress. LLM calls have manual spot-checking.
  2. Prompt regression is invisible. A one-word change can silently break 15% of edge cases.
  3. Model selection is vibes. "We use GPT-4o because it's good" — but is it $15k/month better than Gemini Flash?

Litmux gives you a YAML config, pass/fail assertions, and a cost report. That's it.


Quick Start

pip install litmux

cp .env.example .env

Add at least one: OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY, HF_TOKEN
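The resulting .env might look like the following sketch (the key values are placeholders, not real credentials):

```
# .env — set whichever providers you use; values below are placeholders
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
```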

litmux init   # scaffold a project
litmux run    # run tests against all configured models

No database, no cloud account, no Docker.


Core Commands

litmux run — unit tests for prompts

# litmux.yaml
models:
  - model: gpt-4o-mini
  - model: claude-haiku-4-5-20251001

tests:
  - name: summarize_earnings
    prompt: prompts/summarize.txt
    inputs:
      text: "Revenue grew 15% to $4.2 billion..."
    assert:
      - type: contains
        value: "revenue"
      - type: cost-less-than
        value: 0.01
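With a config like the one above, a single test can be run by name using the -t flag listed under All Commands:

```
litmux run -t summarize_earnings
```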

litmux eval — bulk evaluation against datasets

evals:
  - name: ticket_classifier
    prompt: prompts/classify.txt
    dataset: datasets/support_tickets.csv
    input_mapping:
      ticket: text
    expected: expected_category
    assert:
      - type: json-valid
    judge:
      criteria: "Did the model correctly classify the ticket?"
      threshold: 7.0

litmux generate — AI-generated test datasets

litmux generate \
  --prompt prompts/classify.txt \
  --seed datasets/sample_tickets.csv \
  --n 50 \
  --output datasets/support_tickets.csv

litmux cost — cost projection across models

litmux cost --volume 50000

Finds the cheapest model that passes your tests.
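As a rough illustration of the arithmetic behind a cost projection (the per-token prices and model names below are hypothetical placeholders, not litmux's actual pricing table):

```python
# Back-of-the-envelope monthly cost: volume * per-call token usage *
# per-token price. Prices and model names are hypothetical placeholders.
PRICES_PER_1M_TOKENS = {            # (input, output) USD per 1M tokens
    "model-a": (0.15, 0.60),
    "model-b": (2.50, 10.00),
}

def monthly_cost(model, volume, in_tokens, out_tokens):
    """Projected monthly USD cost for `volume` calls per month."""
    p_in, p_out = PRICES_PER_1M_TOKENS[model]
    per_call = (in_tokens * p_in + out_tokens * p_out) / 1_000_000
    return volume * per_call

# 50,000 calls/month, ~800 input and ~200 output tokens per call
for m in PRICES_PER_1M_TOKENS:
    print(f"{m}: ${monthly_cost(m, 50_000, 800, 200):,.2f}/month")
```

At this (hypothetical) price gap, the cheaper model saves over 90% per month, which is the comparison `litmux cost` automates against your pass/fail results.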

litmux compare — side-by-side model outputs

litmux compare

Cloud (Optional, Free)

Sync results to a hosted dashboard for history, trends, and team visibility.

litmux login       # one-time browser auth
litmux run         # results auto-sync
litmux dashboard   # open app.litmux.dev

The CLI works fully offline. Cloud is opt-in.


Assertion Types

Type                 Description
contains             Output contains substring
not-contains         Output does not contain substring
regex                Output matches regex pattern
json-valid           Output is valid JSON
json-schema          Output has required JSON keys
cost-less-than       Cost below threshold (USD)
latency-less-than    Latency below threshold (ms)
llm-judge            LLM scores output 1–10 against criteria
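Illustrative semantics for a few of these checks — a sketch of the general idea, not litmux's implementation (details such as case sensitivity may differ):

```python
import json
import re

def check_contains(output: str, value: str) -> bool:
    # Substring match; shown case-insensitive here (an assumption).
    return value.lower() in output.lower()

def check_regex(output: str, pattern: str) -> bool:
    # Passes if the pattern matches anywhere in the output.
    return re.search(pattern, output) is not None

def check_json_valid(output: str) -> bool:
    # Passes if the output parses as JSON.
    try:
        json.loads(output)
        return True
    except ValueError:
        return False

print(check_contains("Revenue grew 15% to $4.2 billion", "revenue"))  # True
print(check_json_valid('{"label": "billing"}'))                       # True
```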

CI/CD

# .github/workflows/litmux.yml
- run: litmux run --ci
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
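A complete workflow file might look like the following sketch; the job name, trigger, and setup steps are illustrative assumptions — only the `litmux run --ci` step and the secret come from this README:

```yaml
# .github/workflows/litmux.yml — illustrative full file
name: litmux
on: [pull_request]
jobs:
  prompt-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install litmux
      - run: litmux run --ci
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```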

Configuration

models:
  - provider: openai | anthropic | google | huggingface
    model: string
    temperature: 0.0
    max_tokens: 1024

defaultTest:
  assert:
    - type: cost-less-than
      value: 0.01

tests:
  - name: string
    prompt: path/to/prompt.txt
    inputs: { variable: "value" }
    assert:
      - type: contains
        value: "expected"

evals:
  - name: string
    prompt: path/to/prompt.txt
    dataset: path/to/data.csv
    input_mapping: { prompt_var: csv_column }
    expected: csv_column
    assert: [...]
    judge:
      criteria: "..."
      threshold: 7.0

Environment Variables

Variable                         Purpose
OPENAI_API_KEY                   OpenAI models, LLM judge, dataset generation
ANTHROPIC_API_KEY                Anthropic models
GOOGLE_API_KEY                   Google models
HF_TOKEN                         HuggingFace models
LITMUX_NO_CACHE                  Set to 1 to skip the response cache
LITMUX_API_URL                   Override cloud API endpoint (default: https://api.litmux.dev)
LITMUX_API_URL_ALLOW_INSECURE    Set to 1 to allow non-HTTPS LITMUX_API_URL (local dev only)
LITMUX_DASHBOARD_URL             Override dashboard URL (default: https://app.litmux.dev)
LITMUX_JUDGE_MODEL               LLM model used for llm-judge assertions (default: gpt-4o-mini)
LITMUX_CLOUD_ENABLED             Set to 1 to opt in to Litmux Cloud (private beta)
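These can also be set per-invocation in the usual shell way, e.g. to bypass the response cache for a single run:

```
LITMUX_NO_CACHE=1 litmux run
```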

All Commands

litmux run                    Run all tests
litmux run -t <name>          Run a specific test
litmux run --ci               CI output (markdown)
litmux eval                   Run all evals
litmux eval --limit 10        Evaluate first N rows
litmux generate ...           Generate a test dataset
litmux compare                Side-by-side model outputs
litmux cost -v 50000          Project monthly cost
litmux cache                  View / clear response cache
litmux init                   Scaffold a new project
litmux version                Show version

Cloud (private beta — join the waitlist at https://litmux.dev)

litmux login                  Authenticate with Litmux Cloud
litmux logout                 Remove saved credentials
litmux history                Recent runs from cloud
litmux dashboard              Open the dashboard


Examples

See examples/ for three ready-to-run projects:

  • 01-quickstart — minimal single-model test
  • 02-multi-model — compare across providers
  • 03-generate-and-eval — AI-generated dataset + LLM judge

License

MIT

SEE ALSO

clihub

clihub                              4/18/2026                         LITMUX(1)