
Writing Tests

Semantic tests are files that describe expected behaviour of a codebase in natural language. Instead of asserting on specific function outputs, they describe what should be true about the code, and an LLM evaluates whether the codebase satisfies those expectations.

Test files must follow the `.spec.md` or `.test.md` naming convention to be discovered automatically:

  • auth-middleware.spec.md
  • config-schema.test.md
  • api-routes.spec.md

Files that don't match the `.spec.md` or `.test.md` pattern are ignored during directory discovery. If you pass a non-matching file explicitly via `semtest run myfile.md`, it will still run, but with a warning.
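
For example, with the file names used above:

```sh
# Matches the naming convention, so it runs without any warning
semtest run auth-middleware.spec.md

# Doesn't match .spec.md or .test.md: it still runs, but a warning is printed
semtest run myfile.md
```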

Test files are discovered using the `testMatch` glob patterns in your config (default: `["**/*.spec.md", "**/*.test.md"]`). The `testPathIgnorePatterns` config (default: `["node_modules", "dist", ".git", "vendor"]`) excludes directories from discovery. The output directory is also automatically excluded.
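
For reference, here are those defaults written out as they might appear in a config file; the file name and exact format are not specified on this page, so treat this purely as a sketch:

```json
{
  "testMatch": ["**/*.spec.md", "**/*.test.md"],
  "testPathIgnorePatterns": ["node_modules", "dist", ".git", "vendor"]
}
```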

Subdirectories create test groups for organised reporting:

```
semtests/
├── model-registry.spec.md
├── config-schema.spec.md
├── api/
│   ├── routes.spec.md
│   └── middleware.test.md
└── infra/
    └── ci-pipeline.spec.md
```

In this layout, `api/routes.spec.md` and `api/middleware.test.md` are grouped under "api", while `infra/ci-pipeline.spec.md` is grouped under "infra". Top-level files have no group.

A semantic test file should have:

  1. A heading with an ID — used by the LLM to identify the test scenario
  2. An expectation section — what the code should do
  3. A behaviour section — specific details to verify
```md
# Project Structure - id: project-structure

## Expectation

The project should follow a well-organized directory layout with all
TypeScript source code under `src/`, organized into logical subdirectories
for each concern: `config`, `discovery`, `frontmatter`, `prompt`, `parser`,
`runner`, `report`, `output`, `validation`, and `utils`.

## Behaviour

The `src/` directory should contain subdirectories for each module:
`config/` (Zod schema and config loader), `discovery/` (test file discovery),
`frontmatter/` (YAML frontmatter extraction), `prompt/` (LLM prompt
construction), `parser/` (JSON result parsing), `runner/` (model registry
and execution), `report/` (Markdown and JSON report generation),
`output/` (terminal progress and summary), `validation/` (result validation),
and `utils/` (process and filesystem helpers).

# Model Registry - id: model-registry

## Expectation

The runner module should implement a model registry where each supported
LLM CLI tool has factory functions that produce ModelConfig entries,
with a flat registry mapping model key strings to their configurations.

## Behaviour

The registry in `src/runner/registry.ts` should contain entries for
multiple LLM tools (Claude Code, Gemini, Codex, Aider, OpenCode, etc.).
Each entry should implement the ModelConfig interface with `command()`,
`parseOutput()`, `tool`, and optional `model` fields. The Claude Code
entries should pass the prompt as a positional argument, while other
tools should use stdin.
```

Spec files support YAML frontmatter for per-test overrides. Frontmatter is placed at the top of the file between `---` delimiters:

```md
---
tags: api, critical
timeout: 120000
llm: gemini-2.5-pro
skipPermissionsIfPossible: true
---

# API Route Authentication
...
```
| Field | Type | Description |
| --- | --- | --- |
| `tags` | `string[]` or CSV string | Tags for `--tag` filtering; accepts YAML arrays or comma-separated strings |
| `timeout` | `number` | Per-test timeout in milliseconds; overrides the config and CLI `--timeout` |
| `llm` | model key string | Model key override; run `semtest list` to see valid keys |
| `skipPermissionsIfPossible` | `boolean` | Overrides the global permission-bypass setting |

Tags can be written as either a YAML array or a comma-separated string — both produce the same result:

```yaml
# YAML array
tags:
  - api
  - auth

# Comma-separated string
tags: api, auth
```

Frontmatter overrides take precedence over CLI flags and config file values for `llm`, `timeout`, and `skipPermissionsIfPossible`:

```
frontmatter > CLI flag > config file > default
```

For example, if the CLI passes `--timeout 60000` but a spec file's frontmatter sets `timeout: 120000`, that test runs with the 120-second timeout.

Finally, some guidelines for writing effective specs (a combined example follows this list):
  • Be specific — mention exact file paths, function names, and expected types when possible
  • Use IDs — include id: my-id in headings so the LLM extracts consistent identifiers
  • One concern per file — each file should test one aspect of the codebase
  • Describe observable facts — focus on what can be verified by reading the code (file existence, exports, types, patterns)
  • Avoid implementation details — describe what the code does, not how it does it internally (unless that’s what you’re testing)
  • Use the naming convention — name files with .spec.md or .test.md so they are discovered automatically
  • Use tags for organisation — add tags frontmatter to group tests for selective execution with --tag
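
Putting these guidelines together, a complete spec might look like this (the file paths, tags, and route details are illustrative only):

```md
---
tags: api, critical
---

# API Route Authentication - id: api-route-auth

## Expectation

Every route under `src/api/routes/` should be protected by the
authentication middleware, except the explicitly public health-check route.

## Behaviour

Each route module in `src/api/routes/` should import and apply the
middleware exported from `src/api/middleware/auth.ts`. The health-check
route (`src/api/routes/health.ts`) should be registered without it.
```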