12 — Testing¶
Testing in k8s-lab is the operator's contract for "did my change keep the canonical flow green?" There are three layers and they are run in this order whenever a role, a chart, or a module is touched:
- `make lint` — yamllint, ansible-lint, `terraform fmt -check`, `helm lint`. Cheap, deterministic, runs in seconds.
- Per-role Molecule scenarios — one scenario per Ansible role under `tests/molecule/`, driven by Molecule's delegated driver on top of a single shared Vagrant + libvirt VM. Each scenario runs `prepare` → `converge` → `idempotence` → `verify` → `destroy` against the live VM.
- `make test-local-e2e` — the `e2e-local` Molecule scenario, which executes the full canonical flow (substrate → bootstrap → pivot → cleanup → workload, plus Gates A/B at every cluster).
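In command form, the three layers are (the per-role pattern target is detailed in §5; `<scenario>` is a placeholder):

```sh
make lint                                        # layer 1: static checks
make -C tests/molecule <scenario>-vagrant-test   # layer 2: one role's Molecule scenario
make test-local-e2e                              # layer 3: full canonical flow
```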
The harness is the only end-to-end driver shipped in the repo.
There is no CI; the "test before commit" rule (plan §2.11a,
user-memory feedback_test_before_commit) is enforced by running
these layers locally on the developer's machine.
The full specification lives in plan §9 (local development and
testing) and §9.5 (shared inventory architecture).
1. Lint¶
make lint is the entry point for static checks. It chains four
sub-targets in the order they typically catch the most bugs:
| Sub-target | What it runs | Scope |
|---|---|---|
| `make lint-yaml` | `yamllint -c .yamllint .` | every `*.yml`/`*.yaml` in the repo against `.yamllint` |
| `make lint-ansible` | `ansible-lint roles/` from `ansible/` | every role under `ansible/roles/` |
| `make lint-terraform` | `terraform -chdir=<d> fmt -check -recursive` for every TF directory under `terraform/` and `tests/fixtures/terraform/` | formatting only — not `terraform validate` |
| `make lint-helm` | `helm lint <chart>` for every `charts/*/` with a `Chart.yaml` | the five wrapper charts (capi-cluster-class, capi-workload-cluster, cni-calico, metallb, metallb-config) |
When to run: before every commit, and again before opening any
PR. Lint failures are not optional and not gated on changed files —
the contract is "the entire repo is lint-clean at HEAD". The Python
side (yamllint, ansible-lint) requires the project venv to be
active; terraform and helm come from the system PATH.
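A typical pre-commit invocation (the `.venv` path is an assumption; any venv carrying yamllint and ansible-lint works):

```sh
source .venv/bin/activate   # venv path is an assumption; provides yamllint + ansible-lint
make lint                   # chains lint-yaml, lint-ansible, lint-terraform, lint-helm in order
```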
2. Molecule scenarios¶
Every Ansible role under ansible/roles/ has a matching scenario
directory under tests/molecule/. The naming contract (plan §9.5.2):
The scenario directory name (kebab-case) equals the role directory name (snake_case → kebab-case), e.g. role `bootstrap_clusterctl` ⇄ scenario directory `bootstrap-clusterctl`. The scenario's `scenario.name` field in `molecule.yml` keeps the snake_case form so that `MOLECULE_SCENARIO_NAME` matches the role directory consumed by `shared/converge.yml` (see §3 below).
The currently wired scenarios, in canonical-flow order:
| # | Scenario directory | Role exercised | Phase |
|---|---|---|---|
| 1 | `base-system` | `base_system` | host substrate |
| 2 | `binary-fetch` | `binary_fetch` | host substrate |
| 3 | `lxd-host` | `lxd_host` | LXD substrate |
| 4 | `lxd-project` | `lxd_project` | LXD substrate |
| 5 | `lxd-storage-pools` | `lxd_storage_pools` | LXD substrate |
| 6 | `lxd-network-int-managed` | `lxd_network_int_managed` | LXD substrate |
| 7 | `lxd-profiles` | `lxd_profiles` | LXD substrate |
| 8 | `lxd-bootstrap-instance` | `lxd_bootstrap_instance` | bootstrap |
| 9 | `bootstrap-k3s` | `bootstrap_k3s` | bootstrap |
| 10 | `bootstrap-clusterctl` | `bootstrap_clusterctl` | bootstrap |
| 11 | `bootstrap-capn-secret` | `bootstrap_capn_secret` | bootstrap |
| 12 | `export-artifacts` | `export_artifacts` | bootstrap (handoff) |
| 13 | `cleanup-bootstrap` | `cleanup_bootstrap` | post-pivot |
| 14 | `e2e-local` | (composite — full canonical flow) | end-to-end |
The curated SCENARIOS list in tests/molecule/Makefile is the source
of truth; make -C tests/molecule harness-smoke enumerates and
sanity-checks it.
The pivot_clusterctl_move role is exercised only inside the
e2e-local composite scenario; it has no standalone Molecule
scenario by design (pivot is an end-to-end behaviour, not a
substrate primitive).
3. Driver model — delegated¶
Molecule supports several drivers (Docker, Podman, EC2…); k8s-lab
uses the delegated driver, which is also the default since
Molecule 6.x. The contract is a clean split (plan §9.1):
- The harness owns create/destroy. We bring the Vagrant + libvirt VM up via `make -C tests/vagrant/debian13 up` and tear it down via `make -C tests/vagrant/debian13 destroy`. The `scripts/molecule_run.py` wrapper invokes those Make targets and exports `K8SLAB_HOST_{ADDR,USER,PORT,KEY}` from `vagrant ssh-config` before exec'ing `molecule` (sketched below).
- Molecule owns prepare/converge/idempotence/verify. Each scenario plays its own `prepare.yml` + the shared `converge.yml` (or its own `converge.yml` for integration scenarios) + a `verify.yml`.
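A rough shell equivalent of that export step (illustrative only; the real logic lives in scripts/molecule_run.py, and the Vagrant machine name `host` is inferred from the libvirt id path in §3.1):

```sh
cd tests/vagrant/debian13
eval "$(vagrant ssh-config host | awk '
  $1 == "HostName"     { print "export K8SLAB_HOST_ADDR=" $2 }
  $1 == "User"         { print "export K8SLAB_HOST_USER=" $2 }
  $1 == "Port"         { print "export K8SLAB_HOST_PORT=" $2 }
  $1 == "IdentityFile" { print "export K8SLAB_HOST_KEY="  $2 }')"
```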
In molecule.yml the wiring is plain:
driver:
name: default
options:
managed: false # we own create/destroy ourselves
platforms:
- name: k8slab-host
groups:
- k8slab_host
managed: false is what tells Molecule "do not generate a
create/destroy playbook from a driver template — just use the
inventory the harness gives you".
3.1. Why delegated¶
The decisive reason is VM economy. A naive Molecule run with the
Vagrant driver builds and destroys a VM per scenario; a full
canonical-flow check (14 scenarios) would take hours of vagrant
up/destroy cycles on a laptop. The delegated split lets us share
one Vagrant VM across all scenarios, which keeps full-cycle
runs in minutes rather than hours.
The scripts/molecule_run.py wrapper additionally self-heals when
the live VM identity changes (a vagrant destroy or host reboot
between two molecule invocations): it compares
tests/vagrant/debian13/.vagrant/machines/host/libvirt/id against
.artifacts/harness-vm-id and wipes any stale
~/.ansible/tmp/molecule.* scenario state when the UUIDs differ.
This is what makes make -C tests/molecule <scenario>-vagrant-test
"just work" against a fresh VM with no manual reset step.
4. Shared inventory architecture (§9.5)¶
Production playbooks read one host_vars file per host. The
harness mirrors this: every Molecule scenario sees the same
substrate host_vars so a scenario can never silently diverge from
the others. The Step 8 incident (plan §9.5) — export-artifacts
forgot the lxd_bootstrap_instance_devices proxy spec, the role
reconciled it to {}, and runner-side kubectl then could not
reach bootstrap — was the trigger for this consolidation.
4.1. Layout¶
tests/molecule/
├── shared/
│ ├── inventory/group_vars/k8slab_host.yml ← single substrate file
│ ├── tasks/ ← shared prepare snippets
│ ├── converge.yml ← role-shim for per-role scenarios
│ └── verify.yml ← shared verify helpers
├── <scenario>/
│ ├── molecule.yml ← inventory.links + scenario meta
│ ├── prepare.yml
│ ├── verify.yml
│ └── host_vars/k8slab-host.yml ← optional, scenario-local override
└── Makefile
4.2. The single substrate file¶
tests/molecule/shared/inventory/group_vars/k8slab_host.yml carries:
- Connection coordinates via env-lookups — `ansible_host: "{{ lookup('env', 'K8SLAB_HOST_ADDR') }}"` and so on, populated by `scripts/molecule_run.py` from `vagrant ssh-config`.
- `k8s_lab_*` globals (plan §8) — version pins, network CIDRs, cluster identity, topology counts.
- Substrate host_vars per role — `lxd_host_*`, `lxd_storage_pools_pools`, `lxd_bootstrap_instance_devices` (the LXD proxy publishing bootstrap k3s on `:16443`), `bootstrap_k3s_wait_*`, `bootstrap_clusterctl_*`, `base_system_btrfs_pool_required: true` (prod-like default), `export_artifacts_root` and `_mgmt_api_server_url`.
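An abridged sketch of the file's shape (the `ansible_host` lookup is quoted from this section; the USER/PORT/KEY lookups are assumed to follow the same pattern):

```yaml
# tests/molecule/shared/inventory/group_vars/k8slab_host.yml (abridged sketch)
ansible_host: "{{ lookup('env', 'K8SLAB_HOST_ADDR') }}"
ansible_user: "{{ lookup('env', 'K8SLAB_HOST_USER') }}"                  # assumed analogous
ansible_port: "{{ lookup('env', 'K8SLAB_HOST_PORT') }}"                  # assumed analogous
ansible_ssh_private_key_file: "{{ lookup('env', 'K8SLAB_HOST_KEY') }}"   # assumed analogous

base_system_btrfs_pool_required: true   # prod-like default; overridden per scenario (§4.4)
```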
4.3. Per-scenario molecule.yml¶
Each scenario file is small (~65 lines) and only carries the links + scenario meta:
provisioner:
name: ansible
inventory:
links:
group_vars: ../shared/inventory/group_vars
host_vars: host_vars # only if scenario-local file exists
playbooks:
prepare: prepare.yml
converge: ${MOLECULE_PROJECT_DIRECTORY}/shared/converge.yml
verify: verify.yml
scenario:
name: <role_name_in_snake_case>
The shared converge.yml reads MOLECULE_SCENARIO_NAME from the
environment (Molecule sets it for every play) and uses
include_role to pull in the matching role under ansible/roles/.
This is why the contract "scenario.name == role directory name" is
load-bearing — it is what wires the converge step.
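A minimal sketch of what that shim amounts to (illustrative; the real file is tests/molecule/shared/converge.yml and may differ in detail):

```yaml
# Sketch of tests/molecule/shared/converge.yml (illustrative)
- name: Converge
  hosts: k8slab_host
  tasks:
    - name: Include the role whose directory name matches the scenario name
      ansible.builtin.include_role:
        name: "{{ lookup('env', 'MOLECULE_SCENARIO_NAME') }}"
```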
A subtlety from plan §9.5.2: provisioner.inventory.host_vars
inside molecule.yml is silently dropped when
inventory.links is non-empty (Molecule's
provisioner/ansible.py:442 is all-or-nothing). That is why
scenario-local overrides MUST live in a real
<scenario>/host_vars/k8slab-host.yml file referenced through
inventory.links.host_vars: host_vars, not in molecule.yml.
4.4. Active scenario-local overrides¶
Per plan §9.5.3, the only override currently in use is
`base_system_btrfs_pool_required: false`, applied to nine scenarios
from binary-fetch onwards (binary-fetch, lxd-storage-pools,
lxd-network-int-managed, lxd-profiles, lxd-bootstrap-instance,
bootstrap-k3s, bootstrap-clusterctl, bootstrap-capn-secret,
export-artifacts). The reason: those roles either do not touch the
btrfs pool contract, or LXD already owns the disk from previous runs
and the base_system btrfs check is no longer applicable.
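Concretely, each of those scenarios carries a one-line file of this shape (path convention per §4.3):

```yaml
# tests/molecule/binary-fetch/host_vars/k8slab-host.yml
base_system_btrfs_pool_required: false
```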
base-system is the only scenario that exercises
base_system_btrfs_pool_required: true end-to-end (it inherits
the shared default with no override).
5. Running individual scenarios¶
tests/molecule/Makefile exposes a pattern target of the form
`<scenario>-<driver>-<action>`, built from three tokens:

- scenario — the kebab-case directory name from §2 above.
- driver — `delegated` (canonical) or `vagrant` (alias); both resolve to the same recipe. The driver token stays in the target even when only one driver is wired, so the invocation scheme is stable across the repo (plan §9.1).
- action — one of `create`, `prepare`, `converge`, `idempotence`, `verify`, `test`, `destroy`, `lint`. `test` is the full sequence dependency → create → prepare → converge → idempotence → verify → destroy.
Examples:
make test-local-harness # bring up VM
make -C tests/molecule bootstrap-clusterctl-vagrant-test # full cycle
make -C tests/molecule lxd-host-delegated-converge # converge only
make -C tests/molecule lxd-host-delegated-destroy # state only
make -C tests/molecule destroy-all # all scenario state
The shared converge.yml honours the meta-dependency chain in each
role's own meta/main.yml (user-memory
feedback_role_dependencies). Running
bootstrap-clusterctl-vagrant-test therefore pulls in base_system
→ binary_fetch → all lxd_* substrate roles →
lxd_bootstrap_instance → bootstrap_k3s → bootstrap_clusterctl
in dependency order. The shared VM caches everything earlier in the
chain, so the second scenario you run is fast.
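Each link in that chain is declared in the depending role's meta/main.yml. A sketch of the shape, assuming each role names its immediate predecessor (the real meta files are authoritative):

```yaml
# ansible/roles/bootstrap_clusterctl/meta/main.yml (illustrative shape)
dependencies:
  - role: bootstrap_k3s
```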
The harness-smoke target lists every wired scenario without
running any of them — useful when a scenario directory has been
moved or renamed and you want to know whether the Make pattern
still resolves it.
6. Running e2e-local¶
This is the master target. It:

1. brings up the Vagrant VM if it is not already running (`make -C tests/vagrant/debian13 up`);
2. runs `make -C tests/molecule e2e-local-vagrant-converge` — substrate → mgmt-1 helm install → CNI → MetalLB → Gate A/B → pivot → re-emit mgmt.kubeconfig → cleanup_bootstrap → workload helm install → CNI → MetalLB;
3. runs `make -C tests/molecule e2e-local-vagrant-verify` — workload Gate A/B, an external curl to the MetalLB VIP via ext6-ra-peer, workload Nodes Ready=True via the runner-side rewritten kubeconfig, and a CAPI snapshot of the self-hosted mgmt-1.
The full phase breakdown with timings is in
06-quickstart-local.md §3. Rough
runtimes:
- first run (cold image cache): 30–45 min;
- warm cache, fresh VM: ~20 min;
- warm cache, existing VM: ~15–18 min.
Failure halts at the first red gate. If Gate A/B on mgmt-1 fails,
pivot is never attempted — the run aborts before touching mgmt-1's
state. If pivot itself fails, cleanup_bootstrap is never run, so
the bootstrap LXC stays around for inspection (plan §3).
7. What verify.yml asserts¶
Every scenario carries a verify.yml. The assertion shape varies by phase:

- Substrate roles (base-system … lxd-profiles) — artefacts on disk (paths, modes, file content), systemd services enabled+active, LXD object state via `lxc list -f json` or `community.general.lxd_*` modules (see the sketch at the end of this section).
- Bootstrap roles — bootstrap-k3s checks the k3s API on the bootstrap LXC at `127.0.0.1:6443`; bootstrap-clusterctl asserts the four CAPI provider namespaces are populated and that cert-manager + CAPN Deployments are Ready; bootstrap-capn-secret checks the move-labelled identity Secret exists in every `k8s_lab_capn_identity_namespaces` namespace; export-artifacts checks that `.artifacts/mgmt.kubeconfig` (mode 0600) authenticates against the bootstrap API.
- e2e-local runs, in this order:
  - `helm test capi-workload-cluster -n capi-clusters` against mgmt-1 — chart-side cluster-ready hook;
  - `helm test cni-calico -n tigera-operator` on the workload — Gate B (CNI viability);
  - `helm test metallb-config -n metallb-system` on the workload — Gate A in-cluster;
  - an external HTTP GET from the Vagrant VM to the MetalLB IPv6 VIP — Gate A out-of-cluster, exercising the data path ext6-ra-peer → veth → br-ext6 → speaker leader's eth1 → kube-proxy → backend Pod;
  - workload Nodes Ready=True via the rewritten `.artifacts/clusters/lab-default.kubeconfig`, count = `k8s_lab_workload_controlplane_count + k8s_lab_workload_worker_count`;
  - a CAPI snapshot — Cluster, KubeadmControlPlane, MachineDeployment, Machine, LXCCluster, LXCMachine in `capi-clusters` against the self-hosted `mgmt.kubeconfig` — confirming the pivot landed cleanly.
Every step is idempotent: chart-side helm test Pods carry
`hook-delete-policy: before-hook-creation,hook-succeeded`.
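For the substrate shape, a hedged sketch of what such an assertion can look like (task names, the pool name, and the exact JSON query are illustrative, not lifted from a real verify.yml):

```yaml
- name: Verify
  hosts: k8slab_host
  tasks:
    - name: Read LXD storage pool state
      ansible.builtin.command: lxc storage list -f json
      register: lxd_pools
      changed_when: false

    - name: Assert the expected pool exists
      ansible.builtin.assert:
        that:
          - "'default' in (lxd_pools.stdout | from_json | map(attribute='name') | list)"
```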
8. Per-role Molecule sequence on a clean VM¶
When a role changes, a single <role>-vagrant-test is not
sufficient. Per user-memory rule
feedback_per_role_molecule_sequence, the obligatory check is to
run scenarios in dependency order on a freshly destroyed VM, without
destroying the VM between them, all the way through to
export_artifacts. export_artifacts is the credential-producing
role that downstream Phases 5+ depend on; running up to it
validates the artefact contract (mgmt.kubeconfig,
mgmt.auto.tfvars.json) even when the changed role itself is
upstream of it.
Recommended sequence — start with make destroy-vm, then run each
of the following with make -C tests/molecule <s>-vagrant-test in
order, breaking on first failure:
base-system → binary-fetch → lxd-host → lxd-project →
lxd-storage-pools → lxd-network-int-managed → lxd-profiles →
lxd-bootstrap-instance → bootstrap-k3s → bootstrap-clusterctl →
bootstrap-capn-secret → export-artifacts
If a step fails, the VM is left in the failed state for inspection —
make destroy-vm is the explicit reset.
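One way to drive that sequence (illustrative; the loop body is the §5 pattern target, and the break mirrors "breaking on first failure"):

```sh
make destroy-vm   # explicit reset; the wrapper re-creates the VM on the next run (§3.1)
for s in base-system binary-fetch lxd-host lxd-project lxd-storage-pools \
         lxd-network-int-managed lxd-profiles lxd-bootstrap-instance \
         bootstrap-k3s bootstrap-clusterctl bootstrap-capn-secret export-artifacts; do
  make -C tests/molecule "$s-vagrant-test" || break   # VM left in failed state for inspection
done
```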
9. Long-running tests in the background¶
Per user-memory rule feedback_background_tests_streaming,
test-local-e2e (30+ min) and the per-role chain above are
not to be run in the foreground with a 15-minute timeout
hanging over you. Run them in the background and stream the log:
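For example (the log path and the nohup approach are illustrative, not a repo convention):

```sh
nohup make test-local-e2e > /tmp/e2e-local.log 2>&1 &
tail -f /tmp/e2e-local.log
```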
For active provisioning (CAPN reconciling LXC machines, kubeadm
init on a new CP, …) rule feedback_active_provisioning_monitor
adds a second tail on the substrate side — open another terminal
and watch CAPN controller logs and CR status (e.g. kubectl -n
capn-system logs -l app.kubernetes.io/name=capi-provider-incus -f
pre-pivot; kubectl -n capi-clusters get cluster,machine,lxcmachine
-w post-pivot).
The point is to catch the actual error before the test pod hits its own timeout — the test stdout often goes silent for minutes while the substrate is the one failing.
10. Where the harness lives¶
| Concern | Path |
|---|---|
| Scenario directories | tests/molecule/<scenario>/ |
| Shared substrate host_vars | tests/molecule/shared/inventory/group_vars/k8slab_host.yml |
| Shared converge shim + prepare snippets | tests/molecule/shared/converge.yml, tests/molecule/shared/tasks/ |
| Per-role wrapper Makefile | tests/molecule/Makefile |
| Vagrant VM lifecycle + libvirt nets | tests/vagrant/debian13/{Makefile,Vagrantfile,libvirt-networks/} |
| Molecule wrapper + SSH helper | scripts/molecule_run.py, scripts/_harness.py |
| Top-level lint + e2e targets | repo-root Makefile |
Plan sources: §9 (local development), §9.4 (Make scheme),
§9.5 (shared inventory), §10.2 (test-local-e2e),
§13.x (per-role scenario specifics).
11. CI¶
There is no CI in this repository. Linters and Molecule are
intentionally local-only — the "test before commit" policy
(plan §2.11a, user-memory feedback_test_before_commit) puts
the responsibility on the developer running the harness on a real
VM before pushing. A consumer repository wrapping this code is
free to add CI on top of make lint + make test-local-e2e —
that is a consumer concern, not a Stage 1 contract.