How it works

decoct runs a configurable pipeline of passes over YAML, JSON, and INI input. Each pass transforms the document in-place, removing noise and highlighting what an LLM actually needs to see.

Secret redaction

The strip-secrets pass always runs first — non-negotiably. It combines Shannon entropy analysis, regex patterns (AWS keys, Azure connection strings, PEM blocks, GitHub tokens, etc.), and path-based rules (*.password, *.api_key) to find secrets and replace them with [REDACTED]. This happens before any other processing, and before any data can reach an LLM.
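
The entropy side of this can be sketched in a few lines. This is an illustration of the technique, not decoct's actual implementation; the function name and any threshold you'd pair it with are assumptions:

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character, from the character frequency distribution."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A repeated or dictionary-like value scores low (`shannon_entropy("password")` is about 2.75 bits), while a random-looking credential scores much higher, which is why entropy works as a first-pass secret signal alongside the regex and path rules.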

The audit trail records what was redacted and why, but never the actual value.

Platform auto-detection

decoct auto-detects the platform when no --schema or --profile is specified. Detection is content-based and covers eight platforms: Docker Compose, Kubernetes, Ansible playbooks, cloud-init, Terraform state, GitHub Actions, Traefik, and Prometheus. Docker Compose files are identified by a services dict, Terraform state by terraform_version + resources keys, Kubernetes by apiVersion + kind, and so on.
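
The key-based heuristics above can be sketched as follows. This is a simplified illustration covering three of the eight platforms; decoct's real detector and its return values may differ:

```python
def detect_platform(doc):
    """Content-based detection from top-level keys (subset of the heuristics)."""
    if "apiVersion" in doc and "kind" in doc:
        return "kubernetes"
    if "terraform_version" in doc and "resources" in doc:
        return "terraform-state"
    if isinstance(doc.get("services"), dict):
        return "docker-compose"
    return None
```

Note that the more specific key combinations are checked first, since a Kubernetes manifest could in principle also contain a `services` key.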

When detected, the matching bundled schema is applied automatically. decoct compress docker-compose.yml just works — no flags needed.

Platform default stripping

With a --schema file (or an auto-detected one), decoct removes values that match known platform defaults. decoct ships with 25 bundled schemas covering 1,494 platform defaults:

  • Container & Orchestration — Docker Compose (35), Kubernetes (50)
  • Configuration Management — Ansible (132), cloud-init (55), sshd-config (35)
  • Infrastructure as Code — Terraform state, AWS CloudFormation (56), Azure ARM (65), GCP (42)
  • CI/CD — GitHub Actions (8), GitLab CI (25), ArgoCD (14)
  • Databases — PostgreSQL (169), MariaDB/MySQL (76), MongoDB (15), Redis (61), Kafka (63)
  • Observability — Prometheus (62), Grafana (162), OpenTelemetry (19), Fluent Bit (75)
  • Networking — Traefik (57)
  • Identity — Keycloak (78), Entra ID (44), Intune (96)

Eight platforms support auto-detection — no --schema flag needed.

Schemas carry a confidence level — authoritative, high, medium, or low — and the skip_low_confidence option controls stripping aggressiveness.
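
As an illustration of default stripping on a hypothetical Compose service (the values shown are genuine Docker Compose defaults; the annotations mark what the pass would remove):

```yaml
services:
  web:
    image: nginx:1.27     # non-default -> kept
    restart: "no"         # matches the platform default -> stripped
    init: false           # matches the platform default -> stripped
    ports: ["8080:80"]    # non-default -> kept
```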

Design standard enforcement

Assertions are structured representations of your design standards — the kind of thing that normally lives in a wiki page or an engineer's head. Each assertion has an id, a human-readable description, a severity (must/should/may), a rationale, and optionally a match condition.
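
A hypothetical assertion using the fields described above (the rule id and exact file layout here are illustrative, not taken from a bundled assertion file):

```yaml
- id: OPS-DOCKER-IMG-TAG        # hypothetical rule id
  description: Services must pin images to an explicit tag
  severity: must
  rationale: Unpinned images make deployments non-reproducible
  match:
    path: services.*.image
    type: regex
    pattern: ':.+$'
```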

With --assertions, the pipeline strips values that conform to must-severity rules and annotates deviations with # [!] comments. The result is a document that only contains what differs from your standards — exactly what an LLM needs to reason about.

Match conditions support 7 types: exact values, regex patterns, numeric ranges, list membership, negation, exists (presence/absence checks), and none (LLM-context-only). Path patterns use dot notation with * (single segment) and ** (any depth) wildcards.
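
The wildcard semantics can be sketched by translating a path pattern into a regex. This is an illustrative re-implementation, not decoct's matcher:

```python
import re

def compile_path(pattern):
    """Translate a dot-notation path pattern into a regex.

    '*' matches exactly one path segment; '**' matches one or more
    segments at any depth.
    """
    parts = []
    for seg in pattern.split("."):
        if seg == "**":
            parts.append(r".+")
        elif seg == "*":
            parts.append(r"[^.]+")
        else:
            parts.append(re.escape(seg))
    return re.compile("^" + r"\.".join(parts) + "$")

def path_match(pattern, path):
    return bool(compile_path(pattern).match(path))
```

So `services.*.image` matches `services.web.image` but not `services.web.build.image`, while `**.password` matches a `password` key at any depth.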

The exists match type was the single biggest improvement for real-world detection — services missing healthcheck or container_name entirely were invisible before, because the matcher had no value to match against. Now absent required keys are detected and annotated as [!] missing.
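
An illustrative fragment of annotated output for a missing-key deviation (the comment wording and rule reference here are hypothetical):

```yaml
services:
  web:
    image: nginx:1.27
    # [!] missing: healthcheck (services must define a healthcheck)
```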

Assertions without a match field still exist in the system — they're loaded as context for LLM-based analysis in later phases.

Schemas

Schemas define what a platform looks like by default — the defaults map, plus drop_patterns (fields to always remove, such as UUIDs) and system_managed (fields generated by the system, such as timestamps). Each schema declares its confidence level (authoritative, high, medium, or low), which can gate whether defaults are stripped or merely annotated.
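
The field names below come from the description above; the overall layout is a sketch, not decoct's exact schema format:

```yaml
platform: docker-compose
confidence: high
defaults:
  services.*.restart: "no"
  services.*.init: false
drop_patterns:
  - "**.uuid"
system_managed:
  - "**.created_at"
```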

Profiles

Profiles bundle a schema reference, assertion file references, and per-pass configuration into a single file. Instead of passing --schema, --assertions, and individual pass options, you pass --profile docker.yaml and everything is configured. Profiles support pass-specific configuration like custom secret paths, entropy thresholds, field keep/drop patterns, and confidence filtering.
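
A sketch of what a profile might contain (the layout and key names here are illustrative; skip_low_confidence and entropy thresholds are options mentioned elsewhere in this document):

```yaml
schema: docker-compose
assertions:
  - docker-deployment-standards
passes:
  strip-secrets:
    entropy_threshold: 4.0
  strip-defaults:
    skip_low_confidence: true
```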

Input format support

The pipeline accepts YAML, JSON, and INI/key-value files. JSON is parsed and converted to the internal representation. INI files (.ini, .conf, .cfg, .cnf, .properties) are handled with automatic section detection and type coercion. All passes operate identically regardless of input format. Output is always YAML.
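
The section-detection and type-coercion step for INI input can be sketched with the standard library. This is an illustration of the idea, not decoct's parser:

```python
import configparser

def coerce(value):
    """Best-effort coercion of INI string values to bool/int/float."""
    if value.lower() in ("true", "false"):
        return value.lower() == "true"
    try:
        return int(value)
    except ValueError:
        pass
    try:
        return float(value)
    except ValueError:
        return value

def load_ini(text):
    """Parse INI text into a nested dict keyed by section."""
    cp = configparser.ConfigParser()
    cp.read_string(text)
    return {s: {k: coerce(v) for k, v in cp[s].items()} for s in cp.sections()}
```

The resulting nested dict is the same shape a YAML or JSON document parses into, which is what lets every pass run unchanged regardless of input format.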

Directory and recursive mode

The CLI accepts directory arguments. Pass a directory to compress all .yaml, .yml, and .json files in it. Add --recursive to descend into subdirectories. Multi-file runs show per-file stats and aggregate totals.

Bundled schemas and profiles

Schemas, assertions, and profiles ship inside the package. Short names resolve to bundled files — no external downloads or configuration required:

decoct compress config.yaml --schema docker-compose
decoct compress config.yaml --schema cloud-init
decoct compress config.yaml --profile docker-compose

The bundled Docker Compose profile combines the schema (35 defaults), deployment standards assertions (12 rules from OPS-DOCKER-001), and a full 9-pass pipeline.

Class-based reconstitution

The emit-classes pass adds @class header comments that document every default value stripped by the pipeline. These comments group stripped defaults into named classes by path prefix, allowing an LLM reading the compressed output to reconstruct the full configuration without access to the original schema.
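
A sketch of what a @class header might look like on compressed output (the exact comment format is illustrative):

```yaml
# @class compose-service-defaults applies to services.*:
#   restart: "no"
#   init: false
services:
  web:
    image: nginx:1.27
```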

LLM-assisted learning

decoct schema learn derives platform schemas from example configuration files and/or vendor documentation using Claude. decoct assertion learn derives design standard assertions from standards documents, examples, or a corpus of configuration files. Both support merging into existing files for iterative refinement.

Corpus inference mode (--corpus) analyses patterns across multiple configuration files to discover implicit standards — values that are consistent across all files become assertions automatically.

Requires pip install decoct[llm] and the ANTHROPIC_API_KEY environment variable.
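
The core of corpus inference — flatten each file to dot-paths and keep only values identical everywhere — can be sketched like this (an illustration of the idea, not decoct's implementation):

```python
def infer_consistent(configs):
    """Return dot-path -> value for values identical across every config."""
    def flatten(d, prefix=""):
        for key, value in d.items():
            path = prefix + key
            if isinstance(value, dict):
                yield from flatten(value, path + ".")
            else:
                yield path, value

    flat = [dict(flatten(c)) for c in configs]
    shared = set(flat[0]).intersection(*(set(f) for f in flat[1:]))
    return {p: flat[0][p] for p in shared
            if all(f[p] == flat[0][p] for f in flat)}
```

Each surviving path/value pair is a candidate assertion: a convention the corpus already follows, even if nobody wrote it down.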

Integration

decoct fits into existing workflows: CI/CD pipelines (GitHub Actions, GitLab CI), pre-commit hooks for secret detection, shell pipelines with kubectl and terraform show, and MCP tool servers for LLM agents. The deterministic pipeline and stdin/stdout design make it composable with any toolchain.

Token counting

decoct uses tiktoken for accurate token counting. Use --stats for a before/after summary, --stats-only to skip the YAML output, or --show-removed for a per-pass breakdown of what was stripped. Supports cl100k_base (GPT-4, Claude) and o200k_base (GPT-4o) encodings.

Pipeline architecture

Passes declare ordering constraints (run_after, run_before) and are topologically sorted. The pipeline framework validates there are no cycles and handles timing and statistics collection. The fixed order ensures secrets are always stripped before any other processing, and deviation annotation always happens after conformance checking.
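
The constraint-to-order step is a standard topological sort (Kahn's algorithm). This sketch shows the idea with hypothetical pass names; it is not decoct's framework code:

```python
from collections import defaultdict, deque

def sort_passes(passes):
    """Order passes by run_after/run_before constraints; raise on cycles."""
    edges = defaultdict(set)                  # edge a -> b: a runs before b
    indeg = {name: 0 for name in passes}
    for name, spec in passes.items():
        for dep in spec.get("run_after", []):
            if name not in edges[dep]:
                edges[dep].add(name)
                indeg[name] += 1
        for succ in spec.get("run_before", []):
            if succ not in edges[name]:
                edges[name].add(succ)
                indeg[succ] += 1
    queue = deque(sorted(n for n, d in indeg.items() if d == 0))
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in sorted(edges[n]):
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    if len(order) != len(passes):
        raise ValueError("cycle detected in pass ordering")
    return order
```

If the constraint graph contains a cycle, some pass never reaches in-degree zero and the sort fails, which is the validation the text refers to.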

Library API

Beyond the CLI, decoct exposes a Python API for embedding in your own tooling. Run individual passes or construct a full pipeline programmatically. The document is modified in-place using ruamel.yaml's round-trip loader, preserving key ordering and structure.