
Running Tests

semtest provides four commands: run for executing tests, init for scaffolding a new setup, list for showing available model keys, and uninstall for printing removal instructions.

# Run all test files in the configured directory
semtest run
# Run specific test files
semtest run auth-middleware.spec.md api-routes.test.md
# Run all specs in a subdirectory
semtest run api/
# Run with full paths
semtest run semtests/auth-middleware.spec.md

When file or directory arguments are provided, they are resolved against the current working directory first, then against the configured root directory.
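The lookup order described above can be sketched as follows. This is a minimal illustration, not semtest's actual resolver: `resolveTestArg`, its parameters, and the injectable `exists` check are hypothetical names introduced here, and the sketch assumes an argument that matches in neither location is simply left unresolved.

```typescript
import * as path from "node:path";
import { existsSync } from "node:fs";

// Hypothetical helper (not part of semtest's API): returns the first
// existing candidate, or undefined when the argument matches nothing.
function resolveTestArg(
  arg: string,
  cwd: string,
  rootDir: string,
  exists: (candidate: string) => boolean = existsSync,
): string | undefined {
  // Documented order: current working directory first, then the configured root.
  const candidates = [path.resolve(cwd, arg), path.resolve(rootDir, arg)];
  return candidates.find((candidate) => exists(candidate));
}
```

Passing `exists` as a parameter keeps the sketch testable without touching the file system; by default it falls back to `existsSync`.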

| Flag | Type | Default | Description |
| --- | --- | --- | --- |
| --timestamp | boolean | false | Generate a timestamped copy of the Markdown report |
| --include-passing | boolean | false | Include passing tests in the Markdown report |
| --strict | boolean | false | Exit code 2 if validation issues are found |
| --skip-validation | boolean | false | Skip post-run validation entirely |
| --debug | boolean | false | Log raw LLM output to {output}/debug/ |
| --timeout <ms> | number | 0 | Timeout per test in milliseconds (0 = no timeout) |
| --junit | boolean | false | Generate JUnit XML report |
| --tag <tags> | string | (none) | Comma-separated tag filter; only run tests with matching tags |
| --repeat <n> | number | 1 | Run each test N times (stops on first failure) |
| --bail | boolean | false | Stop after the first failing test file |
| --maxfail <n> | number | (none) | Stop after N failing test files |
| -t, --testNamePattern <pattern> | string | (none) | Regex filter on test name or file path |
| --skip-permissions-if-possible | boolean | false | Skip tool permission prompts where supported |
| --verbose | boolean | false | Show detailed per-test output |

All CLI flags can also be set in semtest.config.ts:

import { defineConfig } from "@westopp/semtest";

export default defineConfig({
  output: "semtest-results/",
  testMatch: ["**/*.spec.md", "**/*.test.md"],
  testPathIgnorePatterns: ["node_modules", "dist", ".git", "vendor"],
  llm: "claude-code-sonnet-4-6",
  strict: true,
  debug: true,
  timestamp: true,
  includePassing: false,
  timeout: 60000,
  junit: true,
  repeat: 1,
  bail: false,
  verbose: false,
  skipPermissionsIfPossible: true,
});

CLI flags override config values, and frontmatter overrides both:

frontmatter > CLI flag > config file > schema default

For example, if the config has llm: "claude-code-sonnet-4-6" but a spec file has llm: gemini-2.5-pro in its frontmatter, that test uses Gemini. Frontmatter overrides apply to: llm, timeout, and skipPermissionsIfPossible.
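The precedence chain can be illustrated with a small merge function. This is a sketch under stated assumptions rather than semtest's own code: `LayerOptions`, `schemaDefaults`, and `resolveOptions` are hypothetical names, and the default `llm` value is assumed here only because `semtest init` writes that model into the generated config. The `timeout` and `skipPermissionsIfPossible` defaults come from the flag table.

```typescript
// Hypothetical option layer: only the keys the text says frontmatter may override.
interface LayerOptions {
  llm?: string;
  timeout?: number;
  skipPermissionsIfPossible?: boolean;
}

// Assumed schema defaults (the llm value is an assumption, see the lead-in).
const schemaDefaults: Required<LayerOptions> = {
  llm: "claude-code-sonnet-4-6",
  timeout: 0,
  skipPermissionsIfPossible: false,
};

// frontmatter > CLI flag > config file > schema default.
// `??` only skips undefined/null, so an explicit 0 or false still wins.
function resolveOptions(
  config: LayerOptions,
  cli: LayerOptions,
  frontmatter: LayerOptions,
): Required<LayerOptions> {
  return {
    llm: frontmatter.llm ?? cli.llm ?? config.llm ?? schemaDefaults.llm,
    timeout:
      frontmatter.timeout ?? cli.timeout ?? config.timeout ?? schemaDefaults.timeout,
    skipPermissionsIfPossible:
      frontmatter.skipPermissionsIfPossible ??
      cli.skipPermissionsIfPossible ??
      config.skipPermissionsIfPossible ??
      schemaDefaults.skipPermissionsIfPossible,
  };
}
```

With a config of `{ llm: "claude-code-sonnet-4-6", timeout: 60000 }`, a CLI layer of `{ timeout: 30000 }`, and frontmatter of `{ llm: "gemini-2.5-pro" }`, the sketch resolves to the Gemini model with a 30000 ms timeout.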

| Code | Meaning | When |
| --- | --- | --- |
| 0 | Pass | All tests passed |
| 1 | Fail | At least one test failed (but no errors) |
| 2 | Error | LLM subprocess error, parse error, or --strict with validation issues |

Precedence: error (2) > fail (1) > pass (0)
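As a rough sketch of how this precedence collapses many test results into one exit code (the `Status` type and `exitCodeFor` function are hypothetical names, not semtest's internals):

```typescript
// Hypothetical per-test outcome.
type Status = "pass" | "fail" | "error";

// Highest-severity outcome wins: error (2) > fail (1) > pass (0).
// `strictValidationIssues` models --strict finding validation issues.
function exitCodeFor(statuses: Status[], strictValidationIssues = false): 0 | 1 | 2 {
  if (strictValidationIssues || statuses.some((s) => s === "error")) return 2;
  if (statuses.some((s) => s === "fail")) return 1;
  return 0;
}
```

A run with both failures and errors therefore exits with 2, which lets CI scripts distinguish "tests failed" from "the run itself broke".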

When --debug is enabled:

  1. A debug/ directory is created inside the output directory (cleared on each run)
  2. For each test file, a JSON file is written containing all retry attempts
  3. Each attempt includes the raw stdout, stderr, and exitCode from the LLM CLI

semtest run --debug

For auth-middleware.spec.md with the output directory semtest-results/, this writes:

semtest-results/debug/auth-middleware.spec.md.json

This is useful for diagnosing unexpected LLM responses or retry behaviour.
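When inspecting a debug file, a short script can summarise the attempts. The exact JSON layout is not documented here, so the shape below is an assumption: `DebugFile` with a top-level `attempts` array is hypothetical, and only the `stdout`, `stderr`, and `exitCode` fields are taken from the description above.

```typescript
// Fields named in the docs; the surrounding file shape is an assumption.
interface DebugAttempt {
  stdout: string;
  stderr: string;
  exitCode: number | null;
}

// Hypothetical wrapper: assumes attempts are stored under an `attempts` key.
interface DebugFile {
  attempts: DebugAttempt[];
}

// One human-readable line per retry attempt.
function summarizeAttempts(file: DebugFile): string[] {
  return file.attempts.map(
    (attempt, index) =>
      `attempt ${index + 1}: exit ${attempt.exitCode}, ` +
      `${attempt.stdout.length} stdout chars, ${attempt.stderr.length} stderr chars`,
  );
}
```

If the real file uses a different layout, only the `DebugFile` interface and the property access in `summarizeAttempts` would need adjusting.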

When --timeout <ms> is set (or timeout in config or frontmatter), each LLM subprocess is given a time limit. If the subprocess exceeds the limit:

  1. SIGTERM is sent to the process
  2. After 5 seconds, if still running, SIGKILL is sent
  3. The test result is marked as an error with a timeout message

semtest run --timeout 60000 # 60 second timeout per test

A timeout of 0 (the default) means no time limit.
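The SIGTERM-then-SIGKILL escalation can be sketched with Node's `child_process` module. This illustrates the pattern only and is not semtest's implementation: `runWithTimeout`, `TimedResult`, and the configurable `graceMs` parameter are hypothetical, with the 5 second grace period from the steps above used as the default.

```typescript
import { spawn } from "node:child_process";

// Hypothetical result shape for this sketch.
interface TimedResult {
  timedOut: boolean;
  exitCode: number | null;
  signal: string | null;
}

function runWithTimeout(
  command: string,
  args: string[],
  timeoutMs: number,
  graceMs = 5000, // grace period between SIGTERM and SIGKILL
): Promise<TimedResult> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: "ignore" });
    let timedOut = false;
    let termTimer: ReturnType<typeof setTimeout> | undefined;
    let killTimer: ReturnType<typeof setTimeout> | undefined;

    // A timeout of 0 means no time limit, so no timers are armed.
    if (timeoutMs > 0) {
      termTimer = setTimeout(() => {
        timedOut = true;
        child.kill("SIGTERM"); // step 1: ask the process to stop
        // step 2: force-kill if it is still running after the grace period
        killTimer = setTimeout(() => child.kill("SIGKILL"), graceMs);
      }, timeoutMs);
    }

    const clearTimers = () => {
      clearTimeout(termTimer);
      clearTimeout(killTimer);
    };

    child.on("error", (err) => {
      clearTimers();
      reject(err);
    });
    child.on("exit", (exitCode, signal) => {
      clearTimers();
      // step 3: the caller can mark the test as an error when timedOut is true
      resolve({ timedOut, exitCode, signal });
    });
  });
}
```

Clearing both timers on exit matters: without it, a process that stops promptly after SIGTERM would still have a pending SIGKILL timer keeping the event loop alive.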

Use --tag to run only tests whose frontmatter tags match:

semtest run --tag api,critical

This runs only tests that have at least one of the specified tags in their frontmatter. Tags in frontmatter can be YAML arrays or comma-separated strings.
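The matching rule can be sketched as below. `normalizeTags` and `matchesTagFilter` are hypothetical names, and two details are assumptions not stated above: tags are compared case-sensitively, and an empty filter matches every test.

```typescript
// Accepts either form allowed in frontmatter: a YAML array or a
// comma-separated string. Whitespace and empty entries are dropped.
function normalizeTags(raw: string | string[] | undefined): string[] {
  if (raw === undefined) return [];
  const parts = Array.isArray(raw) ? raw : raw.split(",");
  return parts.map((tag) => tag.trim()).filter((tag) => tag.length > 0);
}

// True when the test carries at least one of the requested tags.
function matchesTagFilter(
  testTags: string | string[] | undefined,
  filter: string,
): boolean {
  const wanted = normalizeTags(filter);
  if (wanted.length === 0) return true; // assumption: no filter means run everything
  const present = new Set(normalizeTags(testTags));
  return wanted.some((tag) => present.has(tag));
}
```

With `--tag api,critical`, a test tagged `["api", "slow"]` matches, while a test tagged `"ui, slow"` does not.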

# Run each test 3 times to check for flakiness
semtest run --repeat 3
# Stop on first failure
semtest run --bail
# Stop after 3 failures
semtest run --maxfail 3

--bail and --maxfail cannot be used together.
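One way to picture how the two flags relate is to treat `--bail` as a failure limit of 1. The sketch below does that; `StopOptions`, `failureLimit`, and `runUntilLimit` are hypothetical names, `runFile` stands in for executing a test file, and `--repeat` is left out to keep the example short.

```typescript
// Hypothetical subset of the run options.
interface StopOptions {
  bail?: boolean;
  maxfail?: number;
}

// Number of failing files after which the run stops.
function failureLimit(options: StopOptions): number {
  if (options.bail && options.maxfail !== undefined) {
    throw new Error("--bail and --maxfail cannot be used together");
  }
  if (options.bail) return 1; // stop after the first failing file
  return options.maxfail ?? Infinity; // no limit unless --maxfail is set
}

// `runFile` is a stand-in that returns true when the file passes.
function runUntilLimit(
  files: string[],
  runFile: (file: string) => boolean,
  options: StopOptions,
): { ran: string[]; failed: string[] } {
  const limit = failureLimit(options);
  const ran: string[] = [];
  const failed: string[] = [];
  for (const file of files) {
    ran.push(file);
    if (!runFile(file)) failed.push(file);
    if (failed.length >= limit) break; // remaining files are skipped
  }
  return { ran, failed };
}
```

Validating the flag combination before any file runs means a misconfigured invocation fails immediately instead of after the first test.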

semtest init scaffolds a new semtest setup in the current directory:

semtest init

This creates:

| File | Content |
| --- | --- |
| semtest.config.ts | Default config using defineConfig() with claude-code-sonnet-4-6 as the model |

If the config file already exists, it is skipped with a message, so the command is safe to run in an existing project: it won’t overwrite anything.

semtest list displays all available model keys, grouped by tool:

semtest list

Output:

Claude Code
claude-code-opus-4-6 claude-opus-4-6
claude-code-sonnet-4-6 claude-sonnet-4-6
...
Gemini CLI
gemini-2.5-pro gemini-2.5-pro
gemini-2.5-flash gemini-2.5-flash
...

Use --json for machine-readable output:

semtest list --json

semtest uninstall prints instructions for removing semtest based on how it was installed:

semtest uninstall