Technical Overview

semtest is a pipeline-based CLI tool. Data flows forward through discrete stages, each transforming it into the input the next stage needs. The mental model: load config, find test files, construct prompts, invoke LLMs, parse responses, validate results, generate reports, print a summary. Each stage is a separate module with clear inputs and outputs — the config module produces a SemtestConfig, discovery produces SemanticTest[], the runner produces RunResult, and so on. This makes the codebase easy to navigate: if you want to understand how prompts work, you read src/prompt/builder.ts. If you want to understand how LLM output is parsed, you read src/parser/result.ts.
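Here is a minimal sketch of that stage-to-stage contract. It assumes heavily simplified versions of SemtestConfig, SemanticTest and RunResult, and their fields here are illustrative, not the real definitions. The stage function types and `runPipeline` are hypothetical names. Each stage's output type is the next stage's input type, so a stage wired in the wrong order fails to compile.

```typescript
// Simplified, hypothetical stand-ins for semtest's real types; field names are assumptions.
interface SemtestConfig {
  testMatch: string[];
  outputDir: string;
}

interface SemanticTest {
  name: string;
  path: string;
  content: string;
}

interface RunResult {
  status: "pass" | "fail" | "error";
  results: { test: string; passed: boolean }[];
}

// Hypothetical stage signatures: each output type is exactly the next stage's input type.
type LoadConfig = (cwd: string) => SemtestConfig;
type Discover = (config: SemtestConfig) => SemanticTest[];
type Execute = (tests: SemanticTest[]) => RunResult;

// Wires the stages together; passing a stage with mismatched types is a compile error.
function runPipeline(
  cwd: string,
  loadConfig: LoadConfig,
  discover: Discover,
  execute: Execute,
): RunResult {
  const config = loadConfig(cwd);
  const tests = discover(config);
  return execute(tests);
}

// Usage with stub stages (the paths and glob are made up for illustration).
const result = runPipeline(
  "/repo",
  () => ({ testMatch: ["tests/**/*.md"], outputDir: "semtest-results" }),
  () => [{ name: "login", path: "tests/login.md", content: "..." }],
  (tests) => ({
    status: "pass",
    results: tests.map((t) => ({ test: t.name, passed: true })),
  }),
);
```

Because each stage is injected as a plain function, any one of them can be replaced by a stub in isolation, which mirrors how each module can be read on its own.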

When a user runs semtest run, the following pipeline executes:

  1. CLI parses flags and arguments via Commander
  2. Config Loader finds semtest.config.ts, loads it with jiti, and validates it with Zod
  3. Discovery scans for test files matching testMatch glob patterns (or resolves specific file paths and directories)
  4. Execute Loop iterates over each test sequentially:
    • Frontmatter overrides (llm, timeout, skipPermissions) are resolved per test
    • Prompt Builder constructs the full LLM prompt from the test file content
    • Model Registry resolves the model key to a ModelConfig with command builder and output parser
    • Process Spawner invokes the LLM CLI as a child process
    • Parser extracts JSON results from the raw LLM output (with fallback strategies)
    • Retries up to 3 times on empty responses
    • Supports repeat runs (stop on first failure) and bail/maxfail for early exit
  5. Validation checks for duplicate IDs and invalid results
  6. Reports are generated (Markdown, JSON, and optionally JUnit XML) and written to the output directory
  7. Terminal Output prints a summary with colour-coded results
  8. Exit code is determined: 0 = pass, 1 = fail, 2 = error
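The retry, early-exit and exit-code rules in steps 4 and 8 can be sketched as follows. This sketch rests on several assumptions and is not semtest's runner. `invoke` stands in for the Process Spawner and is synchronous for brevity, and the PASS/FAIL text check stands in for the Parser. "Retries up to 3 times" is read as one attempt plus three retries, bail is treated as `maxFail = 1`, errors count toward `maxFail`, and an error outranks a failure when choosing the exit code.

```typescript
type Status = "pass" | "fail" | "error";

// Simplified, hypothetical version of TestRunResult.
interface TestRunResult {
  file: string;
  status: Status;
}

// Assumption: "retries up to 3 times" means one initial attempt plus up to 3 retries.
const MAX_RETRIES = 3;

// Re-invokes the model while it returns an empty (whitespace-only) response.
function invokeWithRetry(invoke: () => string): string {
  let output = "";
  for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
    output = invoke();
    if (output.trim() !== "") return output;
  }
  return output; // still empty after every retry
}

// Runs tests one at a time and stops once `maxFail` non-passing tests have been seen.
// `bail` would correspond to maxFail = 1 (assumption).
function executeLoop(
  files: string[],
  runOne: (file: string) => string,
  maxFail = Infinity,
): TestRunResult[] {
  const results: TestRunResult[] = [];
  let failures = 0;
  for (const file of files) {
    const raw = invokeWithRetry(() => runOne(file));
    // Placeholder verdict; the real runner parses JSON results from `raw`.
    const status: Status =
      raw.trim() === "" ? "error" : raw.includes("PASS") ? "pass" : "fail";
    results.push({ file, status });
    if (status !== "pass" && ++failures >= maxFail) break;
  }
  return results;
}

// Exit code contract from step 8: 0 = pass, 1 = fail, 2 = error.
function exitCode(results: TestRunResult[]): number {
  if (results.some((r) => r.status === "error")) return 2;
  if (results.some((r) => r.status === "fail")) return 1;
  return 0;
}
```

In this sketch the loop stays sequential on purpose: one child process runs at a time, so counting failures for bail/maxfail needs no coordination.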

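Steps 4 and 5 mention fallback parsing and post-run validation without spelling either out. The sketch below shows one plausible shape. `extractJson` tries the whole output first, then a fenced json code block, then the outermost `[...]` span. `validateResults` then flags malformed entries and duplicate IDs. The three strategies, the array-of-`{ id, passed }` result shape and the `ValidationIssue` type are all assumptions made for illustration. They are not semtest's actual parser or ValidationResult.

```typescript
// Hypothetical minimal shape of one parsed scenario result.
interface TestResult {
  id: string;
  passed: boolean;
}

// Hypothetical issue record; semtest's real ValidationResult may look different.
interface ValidationIssue {
  index: number;
  message: string;
}

// Tries progressively looser ways of locating JSON in raw LLM output (assumed strategies).
function extractJson(raw: string): unknown {
  const candidates: string[] = [raw.trim()]; // 1. the whole output is JSON
  const fenced = raw.match(/`{3}(?:json)?\s*([\s\S]*?)`{3}/); // 2. a fenced code block
  if (fenced?.[1]) candidates.push(fenced[1].trim());
  const start = raw.indexOf("["); // 3. first "[" through last "]"
  const end = raw.lastIndexOf("]");
  if (start !== -1 && end > start) candidates.push(raw.slice(start, end + 1));

  for (const candidate of candidates) {
    try {
      return JSON.parse(candidate);
    } catch {
      // Not valid JSON; fall through to the next strategy.
    }
  }
  return undefined;
}

// Flags entries that are not { id: string, passed: boolean } and repeated IDs.
function validateResults(items: unknown[]): ValidationIssue[] {
  const issues: ValidationIssue[] = [];
  const seen = new Set<string>();
  items.forEach((item, index) => {
    const r = item as Partial<TestResult> | null;
    if (r === null || typeof r !== "object" || typeof r.id !== "string" || typeof r.passed !== "boolean") {
      issues.push({ index, message: "invalid result shape" });
      return;
    }
    if (seen.has(r.id)) issues.push({ index, message: `duplicate id "${r.id}"` });
    seen.add(r.id);
  });
  return issues;
}
```

Ordering the strategies from strictest to loosest means well-formed output is never reinterpreted by a more permissive heuristic.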
The diagram below shows how modules connect. Dashed lines indicate optional or feedback paths.

[Diagram: module graph from the CLI (cli.ts) entry point through the core pipeline to the output layer]

Colour key:
  • Blue (entry point): CLI
  • Amber (core pipeline): Config, Discovery, Frontmatter, Execute Loop, Prompt Builder, Model Registry, Process Spawner, Parser, Validation
  • Green (output layer): MD Report, JSON Report, JUnit Report, Terminal Output, Debug Output
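The green output-layer modules render the same run outcome in different formats. As an example of the optional JUnit XML report, here is a minimal sketch. It uses the common testsuites/testsuite/testcase layout that CI systems read. The `CaseResult` input type and `toJUnitXml` name are hypothetical, and semtest's real report may emit different elements or attributes.

```typescript
// Hypothetical per-test input; the real report is built from RunResult.
interface CaseResult {
  name: string;
  file: string;
  status: "pass" | "fail" | "error";
  message?: string;
}

// Escapes the five XML special characters for attribute values.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Renders one <testsuite>; failed tests get <failure>, errored tests get <error>.
function toJUnitXml(suite: string, cases: CaseResult[]): string {
  const failures = cases.filter((c) => c.status === "fail").length;
  const errors = cases.filter((c) => c.status === "error").length;
  const testcases = cases.map((c) => {
    const open = `    <testcase name="${escapeXml(c.name)}" classname="${escapeXml(c.file)}"`;
    if (c.status === "pass") return `${open}/>`;
    const tag = c.status === "fail" ? "failure" : "error";
    return `${open}>\n      <${tag} message="${escapeXml(c.message ?? "")}"/>\n    </testcase>`;
  });
  return [
    `<?xml version="1.0" encoding="UTF-8"?>`,
    `<testsuites>`,
    `  <testsuite name="${escapeXml(suite)}" tests="${cases.length}" failures="${failures}" errors="${errors}">`,
    ...testcases,
    `  </testsuite>`,
    `</testsuites>`,
  ].join("\n");
}
```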

The most important types that flow through the pipeline are:

| Type | Module | Purpose |
|---|---|---|
| SemtestConfig | config/schema | Validated configuration object |
| SemanticTest | discovery/tests | Discovered test file with name, path, content, frontmatter, tags, and optional group |
| FrontmatterData | frontmatter/parser | Parsed YAML frontmatter fields (tags, timeout, llm, skipPermissionsIfPossible) |
| ModelConfig | runner/registry | Command builder + output parser for a specific model key |
| ModelKey | runner/registry | Union type of all registered model key strings |
| CommandSpec | runner/registry | Command + args + optional stdin for a CLI invocation |
| TestResult | parser/result | Parsed result for a single test scenario |
| TestRunResult | runner/execute | Result tied to its source file |
| RunResult | runner/execute | Full run output with summary and status |
| CIResult | report/json | JSON report shape consumed by CI |
| ValidationResult | validation/results | Validation issues found post-run |
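The three registry types (ModelConfig, ModelKey, CommandSpec) might fit together roughly as sketched below. The model keys, command names and flags are invented placeholders, since semtest's real registry entries are not described here. `modelForTest` encodes an assumed precedence in which a test's frontmatter `llm` overrides the configured default. The `satisfies` check (TypeScript 4.9+) validates each entry against ModelConfig while keeping the literal keys, so ModelKey can be derived as a union of the registered key strings.

```typescript
// Command + args + optional stdin for one CLI invocation.
interface CommandSpec {
  command: string;
  args: string[];
  stdin?: string;
}

// Builds the invocation for a prompt and extracts the model's text from stdout.
interface ModelConfig {
  buildCommand: (prompt: string, opts: { skipPermissions: boolean }) => CommandSpec;
  parseOutput: (stdout: string) => string;
}

// Placeholder entries: "model-a"/"model-b", their commands and flags are invented.
const registry = {
  "model-a": {
    // Sends the prompt on stdin.
    buildCommand: (prompt: string, opts: { skipPermissions: boolean }): CommandSpec => ({
      command: "model-a-cli",
      args: opts.skipPermissions ? ["--skip-permissions"] : [],
      stdin: prompt,
    }),
    parseOutput: (stdout: string) => stdout.trim(),
  },
  "model-b": {
    // Passes the prompt as an argument instead; no stdin.
    buildCommand: (prompt: string): CommandSpec => ({
      command: "model-b-cli",
      args: ["--prompt", prompt],
    }),
    parseOutput: (stdout: string) => stdout.trim(),
  },
} satisfies Record<string, ModelConfig>;

// Union of registered keys: "model-a" | "model-b".
type ModelKey = keyof typeof registry;

// Own-property check, so inherited names like "toString" are rejected.
function isModelKey(key: string): key is ModelKey {
  return Object.prototype.hasOwnProperty.call(registry, key);
}

function resolveModel(key: string): ModelConfig {
  if (!isModelKey(key)) {
    throw new Error(`Unknown model key "${key}"; expected one of: ${Object.keys(registry).join(", ")}`);
  }
  return registry[key];
}

// Assumed precedence: frontmatter `llm` overrides the config default.
function modelForTest(frontmatterLlm: string | undefined, defaultLlm: string): ModelConfig {
  return resolveModel(frontmatterLlm ?? defaultLlm);
}
```

Deriving ModelKey from the registry object means adding an entry automatically extends the union, so the key list and the configs cannot drift apart.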