
# Running Tests

semtest provides two commands: `run` for executing tests and `init` for scaffolding a new setup.

```sh
# Run all spec files in the configured directory
semtest run

# Run specific test files
semtest run auth-middleware.spec.md api-routes.spec.txt

# Run all specs in a subdirectory
semtest run api/

# Run with full paths
semtest run semantic-tests/auth-middleware.spec.md
```

When file or directory arguments are provided, they’re resolved against cwd first, then against the configured tests directory.
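That lookup order can be sketched as follows. This is an illustration of the described behaviour, not semtest's actual code; `resolveSpec` is a hypothetical helper, and the `exists` predicate is injected so the logic runs without touching the filesystem:

```typescript
import { resolve } from "node:path";

// Sketch of the lookup order: try the argument relative to cwd first,
// then relative to the configured tests directory.
function resolveSpec(
  arg: string,
  cwd: string,
  testsDir: string,
  exists: (p: string) => boolean,
): string | null {
  const fromCwd = resolve(cwd, arg);
  if (exists(fromCwd)) return fromCwd;

  const fromTests = resolve(cwd, testsDir, arg);
  if (exists(fromTests)) return fromTests;

  return null; // not found in either location
}
```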

| Flag | Type | Default | Description |
| --- | --- | --- | --- |
| `--timestamp` | boolean | `false` | Generate a timestamped copy of the Markdown report |
| `--include-passing` | boolean | `false` | Include passing tests in the Markdown report |
| `--strict` | boolean | `false` | Exit code 2 if validation issues are found |
| `--skip-validation` | boolean | `false` | Skip post-run validation entirely |
| `--extensions <exts>` | string | (all files) | Comma-separated file extensions (e.g. `.md,.txt`) |
| `--debug` | boolean | `false` | Log raw LLM output to `{output}/debug/` |
| `--timeout <ms>` | number | `0` | Timeout per test in milliseconds (`0` = no timeout) |
| `--junit` | boolean | `false` | Generate JUnit XML report |

All CLI flags can also be set in `semtest.config.ts`:

```ts
import { defineConfig } from "@westopp/semtest";

export default defineConfig({
  tests: "semantic-tests/",
  output: "semantic-test-results/",
  llm: {
    runner: "claude",
    capability: "balanced",
  },
  strict: true,
  debug: true,
  timestamp: true,
  includePassing: false,
  extensions: [".md", ".txt"],
  timeout: 60000,
  junit: true,
});
```

CLI flags always override config file values:

CLI flag > config file > schema default

For example, if the config has `strict: true` but you run `semtest run` without `--strict`, strict mode is still enabled. But if you explicitly pass a flag, it wins.

| Code | Meaning | When |
| --- | --- | --- |
| `0` | Pass | All tests passed |
| `1` | Fail | At least one test failed (but no errors) |
| `2` | Error | LLM subprocess error, parse error, or `--strict` with validation issues |

Precedence: error (2) > fail (1) > pass (0)
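That precedence can be expressed as a fold over all test outcomes (a sketch of the rule above, not semtest's internals):

```typescript
type Outcome = "pass" | "fail" | "error";

// Any error anywhere yields 2; otherwise any failure yields 1; otherwise 0.
function exitCode(results: Outcome[]): 0 | 1 | 2 {
  if (results.includes("error")) return 2;
  if (results.includes("fail")) return 1;
  return 0;
}
```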

When --debug is enabled:

  1. A debug/ directory is created inside the output directory (cleared on each run)
  2. For each test file, a JSON file is written containing all retry attempts
  3. Each attempt includes the raw stdout, stderr, and exitCode from the LLM CLI
For example, `semtest run --debug` writes one JSON file per spec, such as `semantic-test-results/debug/auth-middleware.spec.md.json`.

This is useful for diagnosing unexpected LLM responses or retry behaviour.
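The fields listed above suggest a debug-file structure along these lines. This is a hypothetical sketch inferred from the description; semtest's actual schema may name or nest things differently:

```typescript
// Hypothetical shape of one debug JSON file (illustrative only).
interface DebugAttempt {
  stdout: string;          // raw stdout from the LLM CLI
  stderr: string;          // raw stderr from the LLM CLI
  exitCode: number | null; // null if the process was killed by a signal
}

interface DebugFile {
  attempts: DebugAttempt[]; // one entry per retry attempt
}

const example: DebugFile = {
  attempts: [{ stdout: '{"result": "pass"}', stderr: "", exitCode: 0 }],
};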

When `--timeout <ms>` is set (or `timeout` in config), each LLM subprocess is given a time limit. If the subprocess exceeds the limit:

  1. SIGTERM is sent to the process
  2. After 5 seconds, if still running, SIGKILL is sent
  3. The test result is marked as an error with a timeout message
```sh
semtest run --timeout 60000   # 60 second timeout per test
```

A timeout of 0 (the default) means no time limit.
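The escalation steps above can be sketched in Node.js. This is an illustration of the described SIGTERM-then-SIGKILL behaviour, not semtest's actual implementation; `runWithTimeout` is a hypothetical helper:

```typescript
import { spawn } from "node:child_process";

// Run a command with a time limit; 0 means no limit (matching the default).
// On timeout: send SIGTERM, then SIGKILL 5 seconds later if still running.
function runWithTimeout(
  cmd: string,
  args: string[],
  timeoutMs: number,
): Promise<{ timedOut: boolean; code: number | null }> {
  return new Promise((resolve) => {
    const child = spawn(cmd, args);
    let timedOut = false;

    const timer =
      timeoutMs > 0
        ? setTimeout(() => {
            timedOut = true;
            child.kill("SIGTERM"); // step 1: ask politely
            setTimeout(() => child.kill("SIGKILL"), 5000).unref(); // step 2: force
          }, timeoutMs)
        : null;

    child.on("exit", (code) => {
      if (timer) clearTimeout(timer);
      resolve({ timedOut, code });
    });
  });
}
```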

The `init` command scaffolds a new semtest setup in the current directory:

```sh
semtest init
```

This creates:

| File/Directory | Content |
| --- | --- |
| `semtest.config.ts` | Default config using `defineConfig()` with Claude as the runner |
| `semantic-tests/` | Test directory |
| `semantic-tests/example.spec.md` | Example test file with heading, expectation, and behaviour sections |

If any of these already exist, they are skipped with a message. This command is safe to run in an existing project — it won’t overwrite anything.
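The skip-if-exists behaviour amounts to something like the following (a hypothetical sketch; `scaffold` is not semtest's API):

```typescript
import { existsSync, writeFileSync, mkdtempSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

// Create a file only if it does not already exist; never overwrite.
function scaffold(file: string, content: string): "created" | "skipped" {
  if (existsSync(file)) return "skipped";
  writeFileSync(file, content);
  return "created";
}

// In a fresh temp directory the first call creates the file.
const dir = mkdtempSync(join(tmpdir(), "semtest-demo-"));
scaffold(join(dir, "semtest.config.ts"), "export default {};"); // "created" on a fresh directory
```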