Running Tests

Commands

semtest provides four commands: `run` for executing tests, `init` for scaffolding a new setup, `list` for showing available model keys, and `uninstall` for printing removal instructions.

`semtest run`

Basic usage

# Run all test files in the configured directory
semtest run

# Run specific test files
semtest run auth-middleware.spec.md api-routes.test.md

# Run all specs in a subdirectory
semtest run api/

# Run with full paths
semtest run semtests/auth-middleware.spec.md

When file or directory arguments are provided, each one is resolved against the current working directory first, then against the configured root directory.
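The lookup order can be sketched as follows. This is a minimal illustration, not semtest's actual implementation. The helper names (`joinPath`, `resolveTestArg`), the injected `exists` check, and the assumption that the root directory is given relative to the working directory are all hypothetical.

```typescript
// Hypothetical helper types and names, used for illustration only.
type Exists = (path: string) => boolean;

// Join two path segments with a single "/" (kept deliberately minimal).
function joinPath(base: string, rel: string): string {
  return `${base.replace(/\/+$/, "")}/${rel.replace(/^\.?\/+/, "")}`;
}

// Try the argument relative to cwd first, then relative to the configured root.
// Assumption: `root` is itself relative to `cwd`.
function resolveTestArg(
  arg: string,
  cwd: string,
  root: string,
  exists: Exists,
): string | undefined {
  const candidates = [joinPath(cwd, arg), joinPath(joinPath(cwd, root), arg)];
  return candidates.find(exists);
}

// Example: a project whose specs live under semtests/.
const knownFiles = new Set(["/repo/semtests/auth-middleware.spec.md"]);
const fileExists: Exists = (p) => knownFiles.has(p);

// Found through the root-directory fallback.
const fromRoot = resolveTestArg("auth-middleware.spec.md", "/repo", "semtests", fileExists);
// Found directly relative to cwd.
const fromCwd = resolveTestArg("semtests/auth-middleware.spec.md", "/repo", "semtests", fileExists);
```

Under these assumptions both calls resolve to the same file, which is why the short form and the full-path form in the examples above behave the same.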

CLI flags

Flag	Type	Default	Description
`--timestamp`	boolean	`false`	Generate a timestamped copy of the Markdown report
`--include-passing`	boolean	`false`	Include passing tests in the Markdown report
`--strict`	boolean	`false`	Exit code 2 if validation issues are found
`--skip-validation`	boolean	`false`	Skip post-run validation entirely
`--debug`	boolean	`false`	Log raw LLM output to `{output}/debug/`
`--timeout <ms>`	number	`0`	Timeout per test in milliseconds (0 = no timeout)
`--junit`	boolean	`false`	Generate JUnit XML report
`--tag <tags>`	string	(none)	Comma-separated tag filter — only run tests with matching tags
`--repeat <n>`	number	`1`	Run each test N times (stops on first failure)
`--bail`	boolean	`false`	Stop after the first failing test file
`--maxfail <n>`	number	(none)	Stop after N failing test files
`-t, --testNamePattern <pattern>`	string	(none)	Regex filter on test name or file path
`--skip-permissions-if-possible`	boolean	`false`	Skip tool permission prompts where supported
`--verbose`	boolean	`false`	Show detailed per-test output

Config file options

All CLI flags can also be set in `semtest.config.ts`:

import { defineConfig } from "@westopp/semtest";

export default defineConfig({
  output: "semtest-results/",
  testMatch: ["**/*.spec.md", "**/*.test.md"],
  testPathIgnorePatterns: ["node_modules", "dist", ".git", "vendor"],
  llm: "claude-code-sonnet-4-6",
  strict: true,
  debug: true,
  timestamp: true,
  includePassing: false,
  timeout: 60000,
  junit: true,
  repeat: 1,
  bail: false,
  verbose: false,
  skipPermissionsIfPossible: true,
});

Flag precedence

CLI flags override config values, and frontmatter overrides both:

frontmatter > CLI flag > config file > schema default

For example, if the config has `llm: "claude-code-sonnet-4-6"` but a spec file has `llm: gemini-2.5-pro` in its frontmatter, that test uses Gemini. Frontmatter overrides apply to `llm`, `timeout`, and `skipPermissionsIfPossible`.
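The precedence chain amounts to "take the first layer that defines a value". A minimal sketch, assuming each layer is an optional value; the `Layers` interface and `resolveOption` function are hypothetical names, and the `30000` config value and the schema-default string are placeholders:

```typescript
// Hypothetical shape: one optional value per layer, highest precedence first.
interface Layers<T> {
  frontmatter?: T;
  cli?: T;
  config?: T;
  schemaDefault: T;
}

// Return the first defined value in precedence order:
// frontmatter > CLI flag > config file > schema default.
function resolveOption<T>(layers: Layers<T>): T {
  return layers.frontmatter ?? layers.cli ?? layers.config ?? layers.schemaDefault;
}

// Frontmatter wins over the config value.
const llm = resolveOption({
  frontmatter: "gemini-2.5-pro",
  config: "claude-code-sonnet-4-6",
  schemaDefault: "<schema default>", // placeholder, the real default is not shown here
});

// No frontmatter value, so the CLI flag wins over the config value.
const timeout = resolveOption({ cli: 60000, config: 30000, schemaDefault: 0 });
```

Using `??` rather than `||` matters here: a timeout of `0` or a flag set to `false` is a defined value and should not fall through to a lower layer.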

Exit codes

Code	Meaning	When
`0`	Pass	All tests passed
`1`	Fail	At least one test failed (but no errors)
`2`	Error	LLM subprocess error, parse error, or `--strict` with validation issues

Precedence: error (2) > fail (1) > pass (0)
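As an illustration of that precedence, the sketch below derives an exit code from a list of per-test outcomes. The `Outcome` type, the `exitCode` function, and the `validationIssues` count are hypothetical names chosen for this example, not semtest's internals.

```typescript
// Hypothetical per-test outcome.
type Outcome = "pass" | "fail" | "error";

interface RunFlags {
  strict: boolean;          // mirrors --strict
  validationIssues: number; // number of post-run validation issues found
}

// Error (2) takes precedence over fail (1), which takes precedence over pass (0).
function exitCode(outcomes: Outcome[], flags: RunFlags): 0 | 1 | 2 {
  const strictViolation = flags.strict && flags.validationIssues > 0;
  if (strictViolation || outcomes.some((o) => o === "error")) return 2;
  if (outcomes.some((o) => o === "fail")) return 1;
  return 0;
}
```

For example, a run with one failed test and one errored test would exit with `2`, and a fully passing run with validation issues would exit with `2` only when `--strict` is set.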

Debug mode

When `--debug` is enabled:

- A `debug/` directory is created inside the output directory (cleared on each run)
- For each test file, a JSON file is written containing all retry attempts
- Each attempt includes the raw `stdout`, `stderr`, and `exitCode` from the LLM CLI

semtest run --debug

This is useful for diagnosing unexpected LLM responses or retry behaviour.

Timeout

When `--timeout <ms>` is set (or `timeout` in config or frontmatter), each LLM subprocess is given a time limit. If the subprocess exceeds the limit:

1. SIGTERM is sent to the process
2. After 5 seconds, if still running, SIGKILL is sent
3. The test result is marked as an error with a timeout message

semtest run --timeout 60000  # 60 second timeout per test

A timeout of 0 (the default) means no time limit.
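The escalation from SIGTERM to SIGKILL can be sketched as below. This is an illustration under stated assumptions rather than semtest's code: `Killable`, `Schedule`, `armTimeout`, and `GRACE_MS` are hypothetical names, and the scheduler is injectable so the logic can be exercised without real timers. Node's `ChildProcess` should be structurally compatible with `Killable`.

```typescript
// Minimal structural view of a child process (hypothetical interface).
interface Killable {
  kill(signal: "SIGTERM" | "SIGKILL"): boolean;
  exitCode: number | null;   // null while the process is still running
  signalCode: string | null; // null unless the process was ended by a signal
}

// Scheduler abstraction so the logic does not depend on real timers.
type Schedule = (fn: () => void, ms: number) => void;

const GRACE_MS = 5000; // documented delay between SIGTERM and SIGKILL

// Arm a time limit for `proc`. A timeout of 0 means no limit, so nothing is scheduled.
// `onTimeout` is called once, when the limit is exceeded (for example, to mark
// the test result as an error with a timeout message).
function armTimeout(
  proc: Killable,
  timeoutMs: number,
  onTimeout: () => void,
  schedule: Schedule = (fn, ms) => { setTimeout(fn, ms); },
): void {
  if (timeoutMs <= 0) return;
  const running = () => proc.exitCode === null && proc.signalCode === null;
  schedule(() => {
    if (!running()) return; // finished within the limit
    onTimeout();
    proc.kill("SIGTERM");
    schedule(() => {
      // Escalate only if the process ignored SIGTERM.
      if (running()) proc.kill("SIGKILL");
    }, GRACE_MS);
  }, timeoutMs);
}
```

A production version would also cancel the pending timer when the process exits on its own; the sketch only guards against signalling a process that has already finished.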

Tag filtering

Use `--tag` to run only tests whose frontmatter tags match:

semtest run --tag api,critical

This runs only tests that have at least one of the specified tags in their frontmatter. Tags in frontmatter can be YAML arrays or comma-separated strings.
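The matching rule can be sketched in a few lines. The function names are hypothetical, and exact, case-sensitive comparison with surrounding whitespace trimmed is an assumption; the source only states that at least one tag must match and that frontmatter tags may be arrays or comma-separated strings.

```typescript
// Frontmatter tags may be a YAML array, a comma-separated string, or absent.
type FrontmatterTags = string[] | string | undefined;

// Normalize either representation into a clean list of tag names.
function normalizeTags(tags: FrontmatterTags): string[] {
  const list = Array.isArray(tags) ? tags : (tags ?? "").split(",");
  return list.map((t) => t.trim()).filter((t) => t.length > 0);
}

// A test matches when it carries at least one of the requested tags.
// Assumption: comparison is exact and case-sensitive.
function matchesTagFilter(filter: string, tags: FrontmatterTags): boolean {
  const wanted = new Set(normalizeTags(filter));
  return normalizeTags(tags).some((t) => wanted.has(t));
}
```

With `--tag api,critical`, a test tagged `[api, slow]` and a test tagged `"critical, slow"` would both run, while an untagged test would be skipped.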

Repeat and bail

# Run each test 3 times to check for flakiness
semtest run --repeat 3

# Stop on first failure
semtest run --bail

# Stop after 3 failures
semtest run --maxfail 3

`--bail` and `--maxfail` cannot be used together.
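One way to picture the two flags is as a single failure limit. The sketch below is illustrative only: `StopOptions`, `failureLimit`, and `runFiles` are hypothetical names, treating `--bail` as a limit of 1 is an assumption drawn from the flag descriptions, and the error message is made up for the example.

```typescript
// Hypothetical options mirroring --bail and --maxfail.
interface StopOptions {
  bail?: boolean;
  maxfail?: number;
}

// Translate the two flags into one failure limit (Infinity = never stop early).
// Assumption: --bail behaves like a failure limit of 1.
function failureLimit({ bail, maxfail }: StopOptions): number {
  if (bail && maxfail !== undefined) {
    throw new Error("--bail and --maxfail cannot be used together");
  }
  if (bail) return 1;
  return maxfail ?? Infinity;
}

// Run test files in order and stop once the failure limit is reached.
// `runFile` returns true when the file passed. Returns the files actually run.
function runFiles(
  files: string[],
  runFile: (file: string) => boolean,
  opts: StopOptions,
): string[] {
  const limit = failureLimit(opts);
  const executed: string[] = [];
  let failures = 0;
  for (const file of files) {
    executed.push(file);
    if (!runFile(file)) failures += 1;
    if (failures >= limit) break;
  }
  return executed;
}
```

Rejecting the combination up front keeps the stopping rule unambiguous, since the two flags would otherwise specify competing limits.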

`semtest init`

Scaffolds a new semtest setup in the current directory:

semtest init

This creates:

File	Content
`semtest.config.ts`	Default config using `defineConfig()` with `claude-code-sonnet-4-6` as the model

If the config file already exists, it is skipped with a message. This command is safe to run in an existing project — it won’t overwrite anything.

`semtest list`

Displays all available model keys, grouped by tool:

semtest list

Output:

Claude Code
  claude-code-opus-4-6                 claude-opus-4-6
  claude-code-sonnet-4-6               claude-sonnet-4-6
  ...

Gemini CLI
  gemini-2.5-pro                       gemini-2.5-pro
  gemini-2.5-flash                     gemini-2.5-flash
  ...

Use `--json` for machine-readable output:

semtest list --json

`semtest uninstall`

Prints instructions for removing semtest based on how it was installed:

semtest uninstall