CLI Reference
Global Flags
| Flag | Description |
|---|---|
-v, --verbose | Enable debug-level logging |
skival run
Execute an eval suite.
skival run <suite.yaml> [flags]Flags
| Flag | Default | Description |
|---|---|---|
--samples N | 1 | Number of runs per treatment |
-p, --parallel N | 0 | Max concurrent samples (0 or 1 = sequential) |
--results-dir <path> | Save results to disk for later reporting | |
--treatments <names> | Comma-separated list of treatment names to run | |
--evals <ids> | Comma-separated list of eval IDs to run | |
--format <type> | markdown | Output format: markdown, json, or html |
--timeout <secs> | Timeout in seconds for all evals (overrides suite/eval-level timeouts) |
Examples
Run all evals:
skival run suite.yamlRun specific evals with 5 samples:
skival run suite.yaml --evals fizzbuzz,sorting --samples 5Run only the control treatment and save results:
skival run suite.yaml --treatments control --results-dir ./resultsRun with 4 concurrent samples:
skival run suite.yaml --samples 10 --parallel 4Override timeout to 2 minutes for all evals:
skival run suite.yaml --timeout 120skival report
Regenerate reports from previously saved results.
skival report <results-dir> [flags]Flags
| Flag | Default | Description |
|---|---|---|
--format <type> | markdown | Output format: markdown, json, or html |
Example
skival report ./results --format jsonskival validate
Parse and validate a suite file without executing it. Reports the suite structure including version, description, eval count, and treatment configuration.
skival validate <suite.yaml>Example
skival validate suite.yamlOutput:
Suite is valid
Version: 1
Description: My eval suite
Evals: 3
...skival compare
Compare two result directories and produce a diff report showing how treatments changed between runs. Useful for seeing what improved, regressed, or stayed the same after tweaking skills or prompts.
skival compare <baseline-dir> <candidate-dir> [flags]Flags
| Flag | Default | Description |
|---|---|---|
--format <type> | markdown | Output format: markdown, json, or html |
Examples
Compare two runs:
skival compare results/run-1 results/run-2Output as JSON for programmatic consumption:
skival compare results/run-1 results/run-2 --format jsonOutput
The report shows per-eval, per-treatment deltas for:
- Pass rate — percentage point change (e.g.
+50pp ↑) - Median cost — absolute USD and percentage change (e.g.
-$0.0200 (-40.0%) ↓) - Median duration — absolute and percentage change (e.g.
-2.0s (-20.0%) ↓)
Treatments that exist in only one run are labeled added or removed rather than causing errors.
Returns a non-zero exit code if either directory is missing or invalid.
skival version
Print the current skival version.
skival version