
Writing Tests

Semantic tests are files that describe expected behaviour of a codebase in natural language. Instead of asserting on specific function outputs, they describe what should be true about the code, and an LLM evaluates whether the codebase satisfies those expectations.

Test files must follow the `.spec.md` or `.test.md` naming convention to be discovered automatically:

  • auth-middleware.spec.md
  • config-schema.test.md
  • api-routes.spec.md

Files that don't match the `.spec.md` or `.test.md` pattern are ignored during directory discovery. If you pass a non-matching file explicitly via `semtest run myfile.md`, it will still run, but with a warning.
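
For example, with the file names used above:

```sh
# Matches the naming convention, so it runs without any warning
semtest run auth-middleware.spec.md

# Doesn't match .spec.md or .test.md: it still runs, but a warning is printed
semtest run myfile.md
```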

Test files are discovered using the `testMatch` glob patterns in your config (default: `["**/*.spec.md", "**/*.test.md"]`). The `testPathIgnorePatterns` config (default: `["node_modules", "dist", ".git", "vendor"]`) excludes directories from discovery. The output directory is also automatically excluded.
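
For reference, here are those defaults written out as they might appear in a config file; the file name and exact format are not specified on this page, so treat this purely as a sketch:

```json
{
  "testMatch": ["**/*.spec.md", "**/*.test.md"],
  "testPathIgnorePatterns": ["node_modules", "dist", ".git", "vendor"]
}
```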

Subdirectories create test groups for organised reporting:

```
semtests/
├── model-registry.spec.md
├── config-schema.spec.md
├── api/
│   ├── routes.spec.md
│   └── middleware.test.md
└── infra/
    └── ci-pipeline.spec.md
```

In this layout, `api/routes.spec.md` and `api/middleware.test.md` are grouped under "api", while `infra/ci-pipeline.spec.md` is grouped under "infra". Top-level files have no group.

A semantic test file should have:

  1. A heading with an ID — used by the LLM to identify the test scenario
  2. An expectation section — what the code should do
  3. A behaviour section — specific details to verify
```md
# Project Structure - id: project-structure

## Expectation

The project should follow a well-organized directory layout with all
TypeScript source code under `src/`, organized into logical subdirectories
for each concern: `config`, `discovery`, `frontmatter`, `prompt`, `parser`,
`runner`, `report`, `output`, `validation`, and `utils`.

## Behaviour

The `src/` directory should contain subdirectories for each module:
`config/` (Zod schema and config loader), `discovery/` (test file discovery),
`frontmatter/` (YAML frontmatter extraction), `prompt/` (LLM prompt
construction), `parser/` (JSON result parsing), `runner/` (model registry
and execution), `report/` (Markdown and JSON report generation),
`output/` (terminal progress and summary), `validation/` (result validation),
and `utils/` (process and filesystem helpers).

# Model Registry - id: model-registry

## Expectation

The runner module should implement a model registry where each supported
LLM CLI tool has factory functions that produce ModelConfig entries,
with a flat registry mapping model key strings to their configurations.

## Behaviour

The registry in `src/runner/registry.ts` should contain entries for
multiple LLM tools (Claude Code, Gemini, Codex, Aider, OpenCode, etc.).
Each entry should implement the ModelConfig interface with `command()`,
`parseOutput()`, `tool`, and optional `model` fields. The Claude Code
entries should pass the prompt as a positional argument, while other
tools should use stdin.
```

Spec files support YAML frontmatter for per-test overrides. Frontmatter is placed at the top of the file between `---` delimiters:

```md
---
tags: api, critical
timeout: 120000
llm: gemini-2.5-pro
skipPermissionsIfPossible: true
---

# API Route Authentication
...
```
| Field | Type | Description |
| --- | --- | --- |
| `tags` | `string[]` or CSV string | Tags for `--tag` filtering; accepts YAML arrays or comma-separated strings |
| `timeout` | `number` | Per-test timeout in milliseconds; overrides the config and CLI `--timeout` |
| `llm` | model key string | Model key override; run `semtest list` to see valid keys |
| `skipPermissionsIfPossible` | `boolean` | Overrides the global permission-bypass setting |

Tags can be written as either a YAML array or a comma-separated string — both produce the same result:

```yaml
# YAML array
tags:
  - api
  - auth

# Comma-separated string
tags: api, auth
```

Frontmatter overrides take precedence over CLI flags and config file values for `llm`, `timeout`, and `skipPermissionsIfPossible`:

```
frontmatter > CLI flag > config file > default
```

For example, if the CLI passes `--timeout 60000` but a spec file's frontmatter sets `timeout: 120000`, that test runs with the 120-second timeout.

Finally, some guidelines for writing effective specs (a combined example follows this list):
  • Be specific — mention exact file paths, function names, and expected types when possible
  • Use IDs — include id: my-id in headings so the LLM extracts consistent identifiers
  • One concern per file — each file should test one aspect of the codebase
  • Describe observable facts — focus on what can be verified by reading the code (file existence, exports, types, patterns)
  • Avoid implementation details — describe what the code does, not how it does it internally (unless that’s what you’re testing)
  • Use the naming convention — name files with .spec.md or .test.md so they are discovered automatically
  • Use tags for organisation — add tags frontmatter to group tests for selective execution with --tag
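
Putting these guidelines together, a complete spec might look like this (the file paths, tags, and route details are illustrative only):

```md
---
tags: api, critical
---

# API Route Authentication - id: api-route-auth

## Expectation

Every route under `src/api/routes/` should be protected by the
authentication middleware, except the explicitly public health-check route.

## Behaviour

Each route module in `src/api/routes/` should import and apply the
middleware exported from `src/api/middleware/auth.ts`. The health-check
route (`src/api/routes/health.ts`) should be registered without it.
```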