How it works
decoct runs a configurable pipeline of passes over YAML, JSON, and INI input.
Each pass transforms the document in place, removing noise and
highlighting what an LLM actually needs to see.
Secret redaction
The strip-secrets pass always runs first — non-negotiably.
It uses Shannon entropy analysis, regex patterns (AWS keys, Azure
connection strings, PEM blocks, GitHub tokens, etc.), and path-based
rules (*.password, *.api_key) to find and
replace secrets with [REDACTED]. This happens before any
other processing and before any data could reach an LLM.
The audit trail records what was redacted and why, but never the
actual value.
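As an illustration, the three detection strategies can be sketched like this; the patterns, entropy threshold, and function names here are assumptions for the sketch, not decoct's actual implementation:

```python
import math
import re

# Illustrative versions of the three strategies the strip-secrets pass
# combines; none of these names or thresholds come from decoct itself.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID
    re.compile(r"ghp_[A-Za-z0-9]{36}"),                 # GitHub token
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM block
]
SECRET_PATH_SUFFIXES = ("password", "api_key")

def shannon_entropy(s: str) -> float:
    """Bits per character; random tokens score high, English text low."""
    if not s:
        return 0.0
    freq = {c: s.count(c) / len(s) for c in set(s)}
    return -sum(p * math.log2(p) for p in freq.values())

def looks_secret(path: str, value: str) -> bool:
    if path.rsplit(".", 1)[-1] in SECRET_PATH_SUFFIXES:  # path-based rule
        return True
    if any(p.search(value) for p in SECRET_PATTERNS):    # regex rule
        return True
    return len(value) >= 20 and shannon_entropy(value) > 4.5  # entropy rule

print(looks_secret("db.password", "hunter2"))        # caught by path rule
print(looks_secret("env.TOKEN", "ghp_" + "a" * 36))  # caught by regex rule
```

A matching value would then be replaced with [REDACTED] and logged to the audit trail without the value itself.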
Platform auto-detection
decoct auto-detects the platform when no --schema or
--profile is specified. Detection is content-based and
covers eight platforms: Docker Compose, Kubernetes, Ansible playbooks,
cloud-init, Terraform state, GitHub Actions, Traefik, and Prometheus.
Docker Compose files are identified by a services dict,
Terraform state by terraform_version +
resources keys, Kubernetes by apiVersion +
kind, and so on.
When detected, the matching bundled schema is applied automatically.
decoct compress docker-compose.yml just works — no
flags needed.
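The detection rules above can be sketched as a content check on the parsed document; `detect` and its return values are hypothetical names, not decoct's API:

```python
# Sketch of content-based platform detection using the marker keys the
# text describes; real detection covers eight platforms and more checks.
def detect(doc: dict):
    if isinstance(doc.get("services"), dict):
        return "docker-compose"
    if "terraform_version" in doc and "resources" in doc:
        return "terraform-state"
    if "apiVersion" in doc and "kind" in doc:
        return "kubernetes"
    return None  # fall through to the remaining platform checks

print(detect({"services": {"web": {"image": "nginx"}}}))  # docker-compose
print(detect({"apiVersion": "v1", "kind": "Pod"}))        # kubernetes
```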
Platform default stripping
With a --schema file (or auto-detected), decoct removes
values that match known platform defaults. decoct ships with 25
bundled schemas covering 1,494 platform defaults:
- Container & Orchestration — Docker Compose (35), Kubernetes (50)
- Configuration Management — Ansible (132), cloud-init (55), sshd-config (35)
- Infrastructure as Code — Terraform state, AWS CloudFormation (56), Azure ARM (65), GCP (42)
- CI/CD — GitHub Actions (8), GitLab CI (25), ArgoCD (14)
- Databases — PostgreSQL (169), MariaDB/MySQL (76), MongoDB (15), Redis (61), Kafka (63)
- Observability — Prometheus (62), Grafana (162), OpenTelemetry (19), Fluent Bit (75)
- Networking — Traefik (57)
- Identity — Keycloak (78), Entra ID (44), Intune (96)
Eight platforms support auto-detection — no --schema flag needed.
Schemas carry a confidence level — authoritative,
high, medium, or low — and
the skip_low_confidence option controls stripping
aggressiveness.
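A minimal sketch of default stripping, assuming a flat path-to-default map with single-segment wildcards; the real schema format and matching rules may differ:

```python
# Hypothetical defaults map: path pattern -> platform default value.
DEFAULTS = {
    "services.*.restart": "no",
    "services.*.init": False,
}

def strip_defaults(doc, defaults, prefix=""):
    """Walk the document and drop leaf values equal to the default."""
    for key in list(doc):
        path = f"{prefix}{key}"
        value = doc[key]
        if isinstance(value, dict):
            strip_defaults(value, defaults, f"{path}.")
            if not value:  # prune mappings emptied by stripping
                del doc[key]
        else:
            for pattern, default in defaults.items():
                parts, pats = path.split("."), pattern.split(".")
                if len(parts) == len(pats) and all(
                    p == "*" or p == s for p, s in zip(pats, parts)
                ) and value == default:
                    del doc[key]
                    break

doc = {"services": {"web": {"image": "nginx", "restart": "no"}}}
strip_defaults(doc, DEFAULTS)
print(doc)  # {'services': {'web': {'image': 'nginx'}}}
```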
Design standard enforcement
Assertions are structured representations of your design standards —
the kind of thing that normally lives in a wiki page or an engineer's
head. Each assertion has an id, a human-readable
description, a severity (must/should/may), a
rationale, and optionally a match condition.
With --assertions, the pipeline strips values that
conform to must-severity rules and annotates deviations
with # [!] comments. The result is a document that only
contains what differs from your standards — exactly what an LLM
needs to reason about.
Match conditions support seven types: exact values, regex patterns,
numeric ranges, list membership, negation, exists
(presence/absence checks), and none (LLM-context-only). Path
patterns use dot notation with * (single segment)
and ** (any depth) wildcards.
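The wildcard semantics can be sketched as a recursive segment matcher; this illustrates the rules above, not decoct's internal matcher:

```python
# Dot-notation path matching with * (one segment) and ** (any depth).
def match(pattern: str, path: str) -> bool:
    return _match(pattern.split("."), path.split("."))

def _match(pats, parts):
    if not pats:
        return not parts
    head, rest = pats[0], pats[1:]
    if head == "**":
        # ** may consume zero or more path segments
        return any(_match(rest, parts[i:]) for i in range(len(parts) + 1))
    if not parts:
        return False
    return (head == "*" or head == parts[0]) and _match(rest, parts[1:])

print(match("services.*.image", "services.web.image"))      # True
print(match("**.password", "db.admin.password"))            # True
print(match("services.*.image", "services.web.env.image"))  # False
```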
The exists match type was the single biggest
improvement for real-world detection: services missing
healthcheck or container_name entirely
were previously invisible, because a value-based matcher has no
node to visit when the key is absent. Missing required keys are
now detected and annotated as [!] missing.
Assertions without a match field still exist in the
system — they're loaded as context for LLM-based analysis in later
phases.
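Putting the pieces together, a hypothetical assertion file might look like this; the five documented fields come from the text above, but the exact file layout is an assumption:

```yaml
assertions:
  - id: OPS-DOCKER-010          # illustrative id, not a real rule
    description: Every service defines a healthcheck
    severity: must
    rationale: Orchestrators need health signals to restart safely
    match:
      path: services.*.healthcheck
      type: exists
  - id: OPS-DOCKER-011
    description: Images are pinned to a tag
    severity: should
    rationale: Unpinned images make deploys non-reproducible
    match:
      path: services.*.image
      type: regex
      value: '.+:.+'
```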
Schemas
Schemas define what a platform looks like by default — the
defaults map, plus drop_patterns (fields
to always remove like UUIDs) and system_managed (fields
generated by the system like timestamps). Each schema declares its
confidence level (authoritative,
high, medium, or low), which
can gate whether defaults are stripped or merely annotated.
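A hypothetical schema file illustrating those three sections plus the confidence declaration; the exact layout is an assumption:

```yaml
platform: docker-compose
confidence: high
defaults:
  services.*.restart: "no"     # value stripped when it matches
drop_patterns:
  - "**.uuid"                  # always removed
system_managed:
  - "**.created_at"            # generated by the system
```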
Profiles
Profiles bundle a schema reference, assertion file references, and
per-pass configuration into a single file. Instead of passing
--schema, --assertions, and individual
pass options, you pass --profile docker.yaml and
everything is configured. Profiles support pass-specific configuration
like custom secret paths, entropy thresholds, field keep/drop
patterns, and confidence filtering.
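A hypothetical profile tying these together might look like this; key names other than skip_low_confidence are assumptions:

```yaml
schema: docker-compose
assertions:
  - deployment-standards.yaml
passes:
  strip-secrets:
    entropy_threshold: 4.5
    extra_paths:
      - "**.webhook_url"
  strip-defaults:
    skip_low_confidence: true
```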
Input format support
The pipeline accepts YAML, JSON, and INI/key-value files. JSON is
parsed and converted to the internal representation. INI files
(.ini, .conf, .cfg,
.cnf, .properties) are handled with
automatic section detection and type coercion. All passes operate
identically regardless of input format. Output is always YAML.
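The INI handling can be sketched with the standard library's configparser; the coercion rules shown are an assumption about the behaviour described:

```python
import configparser

def coerce(raw: str):
    """Best-effort type coercion: int, then float, then bool, else str."""
    for caster in (int, float):
        try:
            return caster(raw)
        except ValueError:
            pass
    if raw.lower() in ("true", "false"):
        return raw.lower() == "true"
    return raw

def load_ini(text: str) -> dict:
    cp = configparser.ConfigParser()
    cp.read_string(text)
    return {s: {k: coerce(v) for k, v in cp[s].items()} for s in cp.sections()}

doc = load_ini("[server]\nport = 8080\ndebug = false\nname = api\n")
print(doc)  # {'server': {'port': 8080, 'debug': False, 'name': 'api'}}
```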
Directory and recursive mode
The CLI accepts directory arguments. Pass a directory to compress
all .yaml, .yml, and .json
files in it. Add --recursive to descend into
subdirectories. Multi-file runs show per-file stats and aggregate
totals.
Bundled schemas and profiles
Schemas, assertions, and profiles ship inside the package. Short
names resolve to bundled files — no external downloads or
configuration required:
decoct compress config.yaml --schema docker-compose
decoct compress config.yaml --schema cloud-init
decoct compress config.yaml --profile docker-compose
The bundled Docker Compose profile combines the schema (35
defaults), deployment standards assertions (12 rules from
OPS-DOCKER-001), and a full 9-pass pipeline.
Class-based reconstitution
The emit-classes pass adds @class header comments that document every default value stripped by the pipeline. These comments group stripped defaults into named classes by path prefix, allowing an LLM reading the compressed output to reconstruct the full configuration without access to the original schema.
LLM-assisted learning
decoct schema learn derives platform schemas from example configuration files and/or vendor documentation using Claude. decoct assertion learn derives design standard assertions from standards documents, examples, or a corpus of configuration files. Both support merging into existing files for iterative refinement.
Corpus inference mode (--corpus) analyses patterns across multiple configuration files to discover implicit standards — values that are consistent across all files become assertions automatically.
Requires pip install decoct[llm] and the ANTHROPIC_API_KEY environment variable.
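The corpus inference idea reduces to a simple invariant check, sketched here without any LLM involvement; the function names are illustrative, not decoct's API:

```python
# A value that is identical at the same path across every file in the
# corpus becomes a candidate assertion.
def flatten(doc, prefix=""):
    for k, v in doc.items():
        path = f"{prefix}{k}"
        if isinstance(v, dict):
            yield from flatten(v, f"{path}.")
        else:
            yield path, v

def infer_standards(corpus):
    flats = [dict(flatten(doc)) for doc in corpus]
    common = set.intersection(*(set(f) for f in flats))
    return {
        p: flats[0][p]
        for p in common
        if all(f[p] == flats[0][p] for f in flats)
    }

corpus = [
    {"service": {"restart": "unless-stopped", "image": "a:1"}},
    {"service": {"restart": "unless-stopped", "image": "b:2"}},
]
print(infer_standards(corpus))  # {'service.restart': 'unless-stopped'}
```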
Integration
decoct fits into existing workflows: CI/CD pipelines (GitHub Actions, GitLab CI), pre-commit hooks for secret detection, shell pipelines with kubectl and terraform show, and MCP tool servers for LLM agents. The deterministic pipeline and stdin/stdout design make it composable with any toolchain.
Token counting
decoct uses tiktoken
for accurate token counting. Use --stats for a
before/after summary, --stats-only to skip the YAML
output, or --show-removed for a per-pass breakdown
of what was stripped. Supports cl100k_base (GPT-4,
Claude) and o200k_base (GPT-4o) encodings.
Pipeline architecture
Passes declare ordering constraints (run_after,
run_before) and are topologically sorted. The pipeline
framework validates there are no cycles and handles timing and
statistics collection. The fixed order ensures secrets are always
stripped before any other processing, and deviation annotation always
happens after conformance checking.
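The ordering mechanism maps directly onto the standard library's topological sorter, which also rejects cycles; the pass names other than strip-secrets and emit-classes are assumptions (a run_before constraint would simply add the reversed edge):

```python
from graphlib import TopologicalSorter

# Hypothetical run_after graph: pass -> passes that must run earlier.
run_after = {
    "strip-defaults": {"strip-secrets"},
    "check-assertions": {"strip-defaults"},
    "annotate-deviations": {"check-assertions"},
    "emit-classes": {"strip-defaults"},
}

# static_order() raises CycleError if the constraints are cyclic.
order = list(TopologicalSorter(run_after).static_order())
assert order[0] == "strip-secrets"  # secrets always stripped first
print(order)
```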
Library API
Beyond the CLI, decoct exposes a Python API for embedding in your
own tooling. Run individual passes or construct a full pipeline
programmatically. The document is modified in-place using
ruamel.yaml's
round-trip loader, preserving key ordering and structure.
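The in-place pass model can be sketched without decoct installed; this shows the shape of a pipeline of mutating passes, not the real API:

```python
from typing import Callable

# A pass is a callable that mutates the parsed document in place.
Pass = Callable[[dict], None]

def drop_nulls(doc: dict) -> None:
    """Example pass: remove keys whose value is None."""
    for k in [k for k, v in doc.items() if v is None]:
        del doc[k]

def run_pipeline(doc: dict, passes: list) -> dict:
    for p in passes:
        p(doc)  # in-place mutation, as with decoct's passes
    return doc

doc = {"image": "nginx", "command": None}
run_pipeline(doc, [drop_nulls])
print(doc)  # {'image': 'nginx'}
```

With ruamel.yaml's round-trip loader in place of plain dicts, the same mutation model preserves comments and key order through the pipeline.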