Running Tests
Commands
semtest provides four commands: `run` for executing tests, `init` for scaffolding a new setup, `list` for showing available model keys, and `uninstall` for printing removal instructions.
semtest run
Basic usage

```shell
# Run all test files in the configured directory
semtest run

# Run specific test files
semtest run auth-middleware.spec.md api-routes.test.md

# Run all specs in a subdirectory
semtest run api/

# Run with full paths
semtest run semtests/auth-middleware.spec.md
```

When file or directory arguments are provided, they are resolved against the current working directory first, then against the configured root directory.
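The lookup order described above can be sketched as follows. This is only an illustration of the documented behaviour, not semtest's actual implementation: `resolveTestPath`, its parameters, and the injected `exists` check are hypothetical names, and the sketch assumes the root directory is given relative to the working directory.

```typescript
import * as path from "node:path";

// Hypothetical helper: try the argument relative to cwd first,
// then relative to the configured root directory.
function resolveTestPath(
  arg: string,
  cwd: string,
  rootDir: string, // assumed to be relative to cwd (or absolute)
  exists: (candidate: string) => boolean, // e.g. fs.existsSync in real use
): string | undefined {
  const candidates = [
    path.resolve(cwd, arg),
    path.resolve(cwd, rootDir, arg),
  ];
  // The first candidate that exists wins; undefined means "not found".
  return candidates.find(exists);
}
```

Passing the existence check in as a function keeps the sketch free of file system access, so the ordering can be exercised in isolation.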
CLI flags
| Flag | Type | Default | Description |
|---|---|---|---|
| `--timestamp` | boolean | false | Generate a timestamped copy of the Markdown report |
| `--include-passing` | boolean | false | Include passing tests in the Markdown report |
| `--strict` | boolean | false | Exit code 2 if validation issues are found |
| `--skip-validation` | boolean | false | Skip post-run validation entirely |
| `--debug` | boolean | false | Log raw LLM output to `{output}/debug/` |
| `--timeout <ms>` | number | 0 | Timeout per test in milliseconds (0 = no timeout) |
| `--junit` | boolean | false | Generate JUnit XML report |
| `--tag <tags>` | string | (none) | Comma-separated tag filter: only run tests with matching tags |
| `--repeat <n>` | number | 1 | Run each test N times (stops on first failure) |
| `--bail` | boolean | false | Stop after the first failing test file |
| `--maxfail <n>` | number | (none) | Stop after N failing test files |
| `-t, --testNamePattern <pattern>` | string | (none) | Regex filter on test name or file path |
| `--skip-permissions-if-possible` | boolean | false | Skip tool permission prompts where supported |
| `--verbose` | boolean | false | Show detailed per-test output |
Config file options
All CLI flags can also be set in `semtest.config.ts`:

```typescript
import { defineConfig } from "@westopp/semtest";

export default defineConfig({
  output: "semtest-results/",
  testMatch: ["**/*.spec.md", "**/*.test.md"],
  testPathIgnorePatterns: ["node_modules", "dist", ".git", "vendor"],
  llm: "claude-code-sonnet-4-6",
  strict: true,
  debug: true,
  timestamp: true,
  includePassing: false,
  timeout: 60000,
  junit: true,
  repeat: 1,
  bail: false,
  verbose: false,
  skipPermissionsIfPossible: true,
});
```

Flag precedence
CLI flags override config values, and frontmatter overrides both:

```
frontmatter > CLI flag > config file > schema default
```

For example, if the config has `llm: "claude-code-sonnet-4-6"` but a spec file has `llm: gemini-2.5-pro` in its frontmatter, that test uses Gemini. Frontmatter overrides apply to `llm`, `timeout`, and `skipPermissionsIfPossible`.
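One way to picture this precedence chain is as a "first defined value wins" lookup across the four layers. The sketch below is an assumption about how such a merge could be written, not semtest's internal code; `Settings`, `Layer`, and `resolveSettings` are hypothetical names, and only the three options that frontmatter can override are modelled.

```typescript
// Hypothetical types covering only the frontmatter-overridable options.
interface Settings {
  llm: string;
  timeout: number;
  skipPermissionsIfPossible: boolean;
}
type Layer = Partial<Settings>;

// frontmatter > CLI flag > config file > schema default.
// `??` only falls through on undefined/null, so an explicit 0 or false is kept.
function resolveSettings(
  frontmatter: Layer,
  cli: Layer,
  config: Layer,
  defaults: Settings,
): Settings {
  return {
    llm: frontmatter.llm ?? cli.llm ?? config.llm ?? defaults.llm,
    timeout:
      frontmatter.timeout ?? cli.timeout ?? config.timeout ?? defaults.timeout,
    skipPermissionsIfPossible:
      frontmatter.skipPermissionsIfPossible ??
      cli.skipPermissionsIfPossible ??
      config.skipPermissionsIfPossible ??
      defaults.skipPermissionsIfPossible,
  };
}
```

Using `??` rather than `||` matters here: a frontmatter `timeout: 0` (no time limit) should still override a non-zero config value.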
Exit codes
| Code | Meaning | When |
|---|---|---|
| 0 | Pass | All tests passed |
| 1 | Fail | At least one test failed (but no errors) |
| 2 | Error | LLM subprocess error, parse error, or `--strict` with validation issues |
Precedence: error (2) > fail (1) > pass (0)
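The table and precedence rule above amount to taking the most severe outcome across the run. A minimal sketch, assuming each test result can be reduced to one of three outcomes (the `Outcome` type and `exitCodeFor` function are hypothetical names, not part of semtest's API):

```typescript
type Outcome = "pass" | "fail" | "error";

// error (2) > fail (1) > pass (0).
// `strictValidationIssues` models "--strict with validation issues".
function exitCodeFor(
  outcomes: Outcome[],
  strictValidationIssues = false,
): 0 | 1 | 2 {
  if (strictValidationIssues || outcomes.some((o) => o === "error")) return 2;
  if (outcomes.some((o) => o === "fail")) return 1;
  return 0;
}
```

In CI this means a script can treat exit code 1 as "tests ran and some failed" and exit code 2 as "the run itself could not be trusted".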
Debug mode
When `--debug` is enabled:

- A `debug/` directory is created inside the output directory (cleared on each run)
- For each test file, a JSON file is written containing all retry attempts
- Each attempt includes the raw `stdout`, `stderr`, and `exitCode` from the LLM CLI

```shell
semtest run --debug
```

This is useful for diagnosing unexpected LLM responses or retry behaviour.
Timeout
When `--timeout <ms>` is set (or `timeout` in config or frontmatter), each LLM subprocess is given a time limit. If the subprocess exceeds the limit:

- `SIGTERM` is sent to the process
- After 5 seconds, if still running, `SIGKILL` is sent
- The test result is marked as an error with a timeout message

```shell
semtest run --timeout 60000  # 60 second timeout per test
```

A timeout of 0 (the default) means no time limit.
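The escalation from `SIGTERM` to `SIGKILL` described above is a common pattern for stopping subprocesses. The sketch below shows one way it could be done with Node's `child_process` module; it is an illustration under stated assumptions rather than semtest's own code. `runWithTimeout`, `TimedResult`, and the `killGraceMs` parameter are hypothetical names, with the grace period defaulting to the 5 seconds mentioned above.

```typescript
import { spawn } from "node:child_process";

interface TimedResult {
  exitCode: number | null; // null when the process was ended by a signal
  timedOut: boolean;
}

// Hypothetical helper: run a command, sending SIGTERM after `timeoutMs`
// and SIGKILL after a further `killGraceMs` if it is still running.
function runWithTimeout(
  command: string,
  args: string[],
  timeoutMs: number, // 0 means no time limit
  killGraceMs = 5000,
): Promise<TimedResult> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: "ignore" });
    let timedOut = false;
    let termTimer: ReturnType<typeof setTimeout> | undefined;
    let killTimer: ReturnType<typeof setTimeout> | undefined;

    if (timeoutMs > 0) {
      termTimer = setTimeout(() => {
        timedOut = true;
        child.kill("SIGTERM");
        killTimer = setTimeout(() => child.kill("SIGKILL"), killGraceMs);
      }, timeoutMs);
    }

    const clearTimers = () => {
      if (termTimer) clearTimeout(termTimer);
      if (killTimer) clearTimeout(killTimer);
    };

    child.on("error", (err) => {
      clearTimers();
      reject(err);
    });
    child.on("exit", (code) => {
      clearTimers();
      resolve({ exitCode: code, timedOut });
    });
  });
}
```

A caller would then treat `timedOut: true` as an error result and attach a timeout message, matching the third step in the list.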
Tag filtering
Use `--tag` to run only tests whose frontmatter tags match:

```shell
semtest run --tag api,critical
```

This runs only tests that have at least one of the specified tags in their frontmatter. Tags in frontmatter can be YAML arrays or comma-separated strings.
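The "at least one matching tag" rule can be sketched as a set intersection check after normalising both tag formats. This is a hedged illustration, not semtest's implementation: `normalizeTags` and `matchesTagFilter` are hypothetical names, and trimming whitespace and matching case-sensitively are assumptions the text above does not confirm.

```typescript
// Accept tags as a YAML array (already parsed) or a comma-separated string.
function normalizeTags(raw: string | string[] | undefined): string[] {
  if (raw === undefined) return [];
  const parts = Array.isArray(raw) ? raw : raw.split(",");
  // Assumption: surrounding whitespace is ignored and empty entries dropped.
  return parts.map((tag) => tag.trim()).filter((tag) => tag.length > 0);
}

// A test is selected when it shares at least one tag with the filter.
function matchesTagFilter(
  testTags: string | string[] | undefined,
  filter: string, // the comma-separated value passed to --tag
): boolean {
  const wanted = new Set(normalizeTags(filter));
  return normalizeTags(testTags).some((tag) => wanted.has(tag));
}
```

With `--tag api,critical`, a test tagged `["api", "slow"]` would be selected, while one tagged only `ui` (or with no tags) would be skipped.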
Repeat and bail
```shell
# Run each test 3 times to check for flakiness
semtest run --repeat 3

# Stop after the first failing test file
semtest run --bail

# Stop after 3 failing test files
semtest run --maxfail 3
```

`--bail` and `--maxfail` cannot be used together.
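The stopping rules above can be modelled with a single failure limit, treating `--bail` as a limit of one failing file. That equivalence is an assumption drawn from the flag descriptions, and the names below (`StopOptions`, `failureLimit`, `runFiles`, `runRepeated`) are hypothetical; the sketch is not semtest's actual runner.

```typescript
interface StopOptions {
  bail?: boolean;
  maxfail?: number;
}

// Number of failing test files after which the run stops (undefined = no limit).
function failureLimit(opts: StopOptions): number | undefined {
  if (opts.bail && opts.maxfail !== undefined) {
    throw new Error("--bail and --maxfail cannot be used together");
  }
  return opts.bail ? 1 : opts.maxfail;
}

// Runs files in order and returns the ones that were actually executed.
function runFiles(
  files: string[],
  runFile: (file: string) => boolean, // true = file passed
  opts: StopOptions,
): string[] {
  const limit = failureLimit(opts);
  const executed: string[] = [];
  let failures = 0;
  for (const file of files) {
    executed.push(file);
    if (!runFile(file)) failures += 1;
    if (limit !== undefined && failures >= limit) break;
  }
  return executed;
}

// --repeat: run a test up to `times` times, stopping on the first failure.
function runRepeated(times: number, runOnce: () => boolean): boolean {
  for (let i = 0; i < times; i++) {
    if (!runOnce()) return false;
  }
  return true;
}
```

Rejecting the flag combination up front, before any test runs, avoids having to decide which of the two limits should win.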
semtest init
Scaffolds a new semtest setup in the current directory:

```shell
semtest init
```

This creates:

| File | Content |
|---|---|
| `semtest.config.ts` | Default config using `defineConfig()` with `claude-code-sonnet-4-6` as the model |

If the config file already exists, it is skipped with a message. This command is safe to run in an existing project; it won't overwrite anything.
semtest list
Displays all available model keys, grouped by tool:

```shell
semtest list
```

Output:

```
Claude Code
  claude-code-opus-4-6      claude-opus-4-6
  claude-code-sonnet-4-6    claude-sonnet-4-6
  ...

Gemini CLI
  gemini-2.5-pro            gemini-2.5-pro
  gemini-2.5-flash          gemini-2.5-flash
  ...
```

Use `--json` for machine-readable output:

```shell
semtest list --json
```

semtest uninstall
Prints instructions for removing semtest based on how it was installed:

```shell
semtest uninstall
```