Technical Overview

semtest is a pipeline-based CLI tool. Data flows forward through discrete stages, each transforming its input into what the next stage needs. The mental model: load config, find test files, construct prompts, invoke LLMs, parse responses, validate results, generate reports, print a summary. Each stage is a separate module with clear inputs and outputs: the config module produces a `SemtestConfig`, discovery produces `SemanticTest[]`, the runner produces `RunResult`, and so on. This makes the codebase easy to navigate. To understand how prompts work, read `src/prompt/builder.ts`; to understand how LLM output is parsed, read `src/parser/result.ts`.
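To make the "typed stages" idea concrete, here is a minimal sketch of two stages composed into one pipeline. It is an illustration only, not semtest's code: the `Stage` type, the `pipe` helper, the stand-in `discover` and `run` functions, and the fields on each interface are all hypothetical and heavily simplified.

```typescript
// Simplified, hypothetical shapes; the real definitions live in the modules named above.
interface SemtestConfig { testMatch: string[] }
interface SemanticTest { name: string; path: string; content: string }
interface RunResult { status: "pass" | "fail" | "error"; total: number }

// A stage maps the previous stage's output to the next stage's input.
type Stage<In, Out> = (input: In) => Out;

// Stand-in for discovery: pretends every pattern matched exactly one file.
const discover: Stage<SemtestConfig, SemanticTest[]> = (config) =>
  config.testMatch.map((pattern, i) => ({
    name: `test-${i + 1}`,
    path: pattern,
    content: "",
  }));

// Stand-in for the runner: reports a passing run without invoking any LLM.
const run: Stage<SemanticTest[], RunResult> = (tests) => ({
  status: "pass",
  total: tests.length,
});

// Compose two stages; the compiler rejects stages whose boundary types do not match.
function pipe<A, B, C>(first: Stage<A, B>, second: Stage<B, C>): Stage<A, C> {
  return (input) => second(first(input));
}

const pipeline = pipe(discover, run);
const result = pipeline({ testMatch: ["a.md", "b.md"] });
```

Because each stage only sees the type produced by the stage before it, a change to one module's output shape surfaces as a compile error at the next boundary rather than at runtime.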

Execution flow

When a user runs `semtest run`, the following pipeline executes:

1. CLI parses flags and arguments via Commander
2. Config Loader finds and validates `semtest.config.ts` using jiti + Zod
3. Discovery scans for test files matching `testMatch` glob patterns (or resolves specific file paths and directories)
4. Execute Loop iterates over each test sequentially:
   - Frontmatter overrides (llm, timeout, skipPermissions) are resolved per test
   - Prompt Builder constructs the full LLM prompt from the test file content
   - Model Registry resolves the model key to a `ModelConfig` with command builder and output parser
   - Process Spawner invokes the LLM CLI as a child process
   - Parser extracts JSON results from the raw LLM output (with fallback strategies)
   - Retries up to 3 times on empty responses
   - Supports repeat runs (stop on first failure) and bail/maxfail for early exit
5. Validation checks for duplicate IDs and invalid results
6. Reports are generated (Markdown, JSON, and optionally JUnit XML) and written to the output directory
7. Terminal Output prints a summary with colour-coded results
8. Exit code is determined: 0 = pass, 1 = fail, 2 = error
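The loop, retry, and exit-code steps above can be sketched roughly as follows. This is a minimal, synchronous sketch under stated assumptions, not semtest's implementation. `Attempt`, `Invoke`, `runOne`, `runAll`, and `exitCode` are hypothetical names. "Retries up to 3 times" is read as three retries after the first attempt. A response that is still empty after all retries is treated as an error, and an error is assumed to outrank a failure when choosing the exit code. Only bail is modelled; repeat runs and maxfail are left out, and the real spawner presumably runs asynchronously.

```typescript
type Status = "pass" | "fail" | "error";

// Hypothetical result of one LLM invocation: raw output plus a parsed verdict.
interface Attempt { raw: string; passed: boolean }

// Stands in for prompt building, process spawning, and parsing combined.
type Invoke = (testPath: string) => Attempt;

// Run one test, retrying while the response is empty (assumed: 1 try + 3 retries).
function runOne(testPath: string, invoke: Invoke, maxRetries = 3): Status {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const { raw, passed } = invoke(testPath);
    if (raw.trim() === "") continue; // empty response: try again
    return passed ? "pass" : "fail";
  }
  return "error"; // assumption: persistent empty output counts as an error
}

// Run tests one at a time; with bail enabled, stop after the first non-pass.
function runAll(paths: string[], invoke: Invoke, bail: boolean): Status[] {
  const statuses: Status[] = [];
  for (const path of paths) {
    const status = runOne(path, invoke);
    statuses.push(status);
    if (bail && status !== "pass") break;
  }
  return statuses;
}

// Map statuses to the documented exit codes: 0 = pass, 1 = fail, 2 = error.
function exitCode(statuses: Status[]): 0 | 1 | 2 {
  if (statuses.some((s) => s === "error")) return 2; // assumed precedence
  if (statuses.some((s) => s === "fail")) return 1;
  return 0;
}
```

Passing `invoke` in as a parameter keeps the loop testable without spawning a real LLM CLI; a stub that returns an empty string for its first calls is enough to exercise the retry path.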

Module diagram

The diagram below shows how modules connect. Dashed lines indicate optional or feedback paths.

Legend

| Colour | Category | Modules |
| --- | --- | --- |
| Blue | Entry point | CLI |
| Amber | Core pipeline | Config, Discovery, Frontmatter, Execute Loop, Prompt Builder, Model Registry, Process Spawner, Parser, Validation |
| Green | Output layer | MD Report, JSON Report, JUnit Report, Terminal Output, Debug Output |

Key types

The most important types flow through the pipeline:

| Type | Module | Purpose |
| --- | --- | --- |
| `SemtestConfig` | config/schema | Validated configuration object |
| `SemanticTest` | discovery/tests | Discovered test file with name, path, content, frontmatter, tags, and optional group |
| `FrontmatterData` | frontmatter/parser | Parsed YAML frontmatter fields (tags, timeout, llm, skipPermissionsIfPossible) |
| `ModelConfig` | runner/registry | Command builder + output parser for a specific model key |
| `ModelKey` | runner/registry | Union type of all registered model key strings |
| `CommandSpec` | runner/registry | Command + args + optional stdin for a CLI invocation |
| `TestResult` | parser/result | Parsed result for a single test scenario |
| `TestRunResult` | runner/execute | Result tied to its source file |
| `RunResult` | runner/execute | Full run output with summary and status |
| `CIResult` | report/json | JSON report shape consumed by CI |
| `ValidationResult` | validation/results | Validation issues found post-run |
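As a rough illustration of how a few of these types might fit together, the sketch below writes out plausible shapes for the discovery and registry types. The field names for `SemanticTest`, `FrontmatterData`, and `CommandSpec` follow the table; everything else is an assumption, including the field types, the `buildCommand` and `parseOutput` property names, the `resolveModel` helper, and the made-up `example-model` entry with its `example-llm` command and `--print` flag. Consult the listed modules for the actual definitions.

```typescript
// Field names follow the table; field types are illustrative guesses.
interface FrontmatterData {
  tags?: string[];
  timeout?: number; // assumed numeric; the unit is not stated in the source
  llm?: string; // assumed to hold a model key
  skipPermissionsIfPossible?: boolean;
}

interface SemanticTest {
  name: string;
  path: string;
  content: string;
  frontmatter: FrontmatterData;
  tags: string[];
  group?: string;
}

interface CommandSpec {
  command: string;
  args: string[];
  stdin?: string;
}

// "Command builder + output parser"; the property names here are assumed.
interface ModelConfig {
  buildCommand: (prompt: string) => CommandSpec;
  parseOutput: (raw: string) => string;
}

// Hypothetical registry with one made-up entry (not a real CLI or flag).
const registry = {
  "example-model": {
    buildCommand: (prompt: string): CommandSpec => ({
      command: "example-llm",
      args: ["--print"],
      stdin: prompt,
    }),
    parseOutput: (raw: string): string => raw.trim(),
  },
};

// A union of all registered key strings, derived from the registry itself.
type ModelKey = keyof typeof registry;

// Resolve a key to its config; unknown keys are rejected at compile time.
function resolveModel(key: ModelKey): ModelConfig {
  return registry[key];
}
```

Deriving `ModelKey` from the registry object is one way to keep the union in sync with the registered models: adding an entry extends the type automatically.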