# 03 — Stack
This chapter is the reference list of every external dependency this repository pins, with the upstream version, where the pin lives in the source tree, and a one-paragraph rationale per component.
The single source of truth for every pin is the verified version
log at §8a of
../plans/PLAN-stage1-common.md;
this chapter is a curated mirror of that table plus the rationale
required by §2.11. If the table here ever diverges from §8a, the
plan wins and this page must be regenerated.
For the why of the architecture itself (single-host model,
bootstrap-and-pivot, layer ownership), see
02-architecture.md. This chapter does not retell that flow; it only
enumerates the pieces.
## Summary table
| Component | Version | Where pinned | Notes |
|---|---|---|---|
| Kubernetes (workload + mgmt) | v1.35.0 | `k8s_lab_kubernetes_version` | Bounded by upstream CAPN simplestreams `kubeadm/<ver>` images; §8a deviation. |
| k3s (bootstrap cluster only) | v1.35.3+k3s1 | `k8s_lab_k3s_version` | Single binary, single node, host-network controllers. |
| kubectl | v1.35.3 | `k8s_lab_kubectl_version` | Fetched into `/opt/capi-lab/bin` by `binary_fetch`. |
| Cluster API (clusterctl + core) | v1.12.5 | `k8s_lab_clusterctl_version` | Drives `clusterctl init` on bootstrap, `clusterctl move` on pivot. |
| CAPN (cluster-api-provider-incus) | v0.8.5 | `k8s_lab_capn_provider_version` | LXD/Incus infrastructure provider. |
| LXD snap channel | 6/stable | `lxd_host_snap_channel` | Feature-stable track; deviates from Canonical LTS recommendation 5.21/stable per §2.11. |
| Calico (tigera-operator chart) | v3.31.5 | `k8s_lab_calico_chart_version` | Subchart dependency in `charts/cni-calico/`. |
| MetalLB chart | 0.15.3 | `k8s_lab_metallb_chart_version` | Subchart dependency in `charts/metallb/`. |
| Terraform hashicorp/helm provider | 3.1.1 | `k8s_lab_helm_provider_version` | Sole driver of CR-creating Helm releases. |
| ansible.posix collection | >=2.1.0 | `ansible/requirements.yml` | Lower bound only; no ceiling per §2.11. |
| community.general collection | >=12.6.0 | `ansible/requirements.yml` | Used by `lxd_*` roles. |
| community.crypto collection | >=3.2.0 | `ansible/requirements.yml` | CAPN restricted-cert generation. |
| kubernetes.core collection | >=6.4.0 | `ansible/requirements.yml` | `k8s_info` polling on bootstrap k3s and mgmt-1. |
| python3-kubernetes (Debian Trixie) | 30.1.0-2 | `tests/molecule/shared/tasks/prepare.yml` | Required on the executor node by `kubernetes.core`. |
## Host platform
- OS target: Debian-family Linux. The pinned reference is Debian 13 Trixie; both the production target host (per plan §2.1) and the local Vagrant VM run this build. Role preflight checks enforce the host distribution; see the role source for the exact assertion (a sketch of the shape follows this list).
- Package source: only the standard Debian APT repositories. Custom APT repositories on the host are forbidden (§2.2); every non-standard binary (`kubectl`, `clusterctl`, `k3s`) is fetched into `/opt/capi-lab/bin` by the `binary_fetch` role, version-pinned and checksum-verified.
- No host-level Kubernetes: no Docker, no kind, no host kubelet. The host runs LXD, the `br-ext6` Linux bridge, and nothing else Kubernetes-related.
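As a shape reference only, a minimal sketch of such a preflight assertion, assuming standard Ansible facts; the shipped role's actual task and message may differ:

```yaml
# Hedged sketch of a host preflight check; not the role's verbatim task.
- name: Assert the host is Debian 13 (Trixie)
  ansible.builtin.assert:
    that:
      - ansible_facts['distribution'] == 'Debian'
      - ansible_facts['distribution_major_version'] == '13'
    fail_msg: "Unsupported host OS; this substrate targets Debian 13 Trixie."
```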
## LXD substrate
- Distribution channel: snap, channel `6/stable` (`lxd_host_snap_channel`).
- Why snap, not deb: Canonical's official recommendation for installing LXD on Linux, including Debian, is the snap; the snap channel is the documented version-pinning mechanism (§2.1).
- Why `6/stable`, not `5.21/stable` LTS: §2.11 reads "latest stable" literally, which selects the feature-stable track. The §8a deviation note records this trade-off explicitly: regression risk is higher and CAPN has not declared explicit compatibility with LXD 6.x; if a Gate-blocking incompatibility surfaces, the fallback is `5.21/stable`, recorded back in the plan change log.
- Auto-refresh: held via `snap refresh --hold` by the `lxd_host` role to prevent unattended snap auto-updates from rolling LXD forward mid-deploy (§2.2, §12.5); a sketch follows this list.
- Project isolation: the entire lab lives inside one LXD project (`capi-lab`) so it does not collide with hand-managed LXC instances on the same host (§2.3). CAPN's API access is scoped by a project-restricted TLS certificate.
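A minimal sketch of the refresh hold, assuming a plain command task; the shipped `lxd_host` role may use a different module or idempotence guard:

```yaml
# Hedged sketch; `snap refresh --hold` needs snapd 2.58+, and the
# changed_when heuristic below is approximate, not the role's logic.
- name: Hold LXD snap auto-refresh
  ansible.builtin.command: snap refresh --hold lxd
  register: lxd_hold
  changed_when: "'held' in lxd_hold.stdout | lower"
```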
## Cluster API stack
| Piece | Version | Source |
|---|---|---|
| `clusterctl` (binary) | v1.12.5 | upstream kubernetes-sigs/cluster-api releases |
| CAPI core / CABPK / KCP controllers | v1.12.5 | installed by `clusterctl init` from its built-in registry |
| CAPN infrastructure provider | v0.8.5 | lxc/cluster-api-provider-incus |
- The `clusterctl` config used by the `bootstrap_clusterctl` role is rendered from `ansible/roles/bootstrap_clusterctl/templates/clusterctl.yaml.j2` and declares only the CAPN entry; the core, kubeadm-bootstrap, and kubeadm-control-plane providers come from `clusterctl`'s built-in registry (a sketch follows this list).
- Upstream docs:
    - Cluster API: https://cluster-api.sigs.k8s.io/
    - CAPN: https://capn.linuxcontainers.org/
- Image constraint (§8a). `k8s_lab_kubernetes_version` is bounded by the set of prebuilt `capi:kubeadm/<ver>` images published on the upstream CAPN simplestreams (https://images.linuxcontainers.org/capn/). The server only mints images for selected releases (typically `<minor>.0` plus rare patches). Setting `kubernetes.version` to a version with no image fails at the first LXCMachine creation with `Failed getting image: The requested image couldn't be found for fingerprint "kubeadm/<ver>"`. As of the §8a verification (2026-04-25), the available pins are v1.33.0, v1.33.5, v1.34.0, v1.35.0; v1.35.0 is the latest relevant pin. Upstream `dl.k8s.io/release/stable.txt` may be ahead, but those releases are irrelevant to this repository until CAPN publishes a matching image.
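A hedged sketch of what the rendered config might look like. The provider name and release-asset URL are assumptions; the template is the authority:

```yaml
# Hedged sketch of the rendered clusterctl config; only the CAPN entry
# is declared, everything else comes from the built-in registry.
providers:
  - name: "incus"   # assumed registry name for cluster-api-provider-incus
    type: "InfrastructureProvider"
    url: "https://github.com/lxc/cluster-api-provider-incus/releases/download/v0.8.5/infrastructure-components.yaml"
```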
## Bootstrap cluster runtime
- Implementation: k3s `v1.35.3+k3s1`, single binary, single node, inside the transient `capi-bootstrap-0` LXC instance.
- Why k3s, not kind or kubeadm:
    - `kind` requires Docker on the host, which violates §2.2 (no custom binaries on the host beyond the LXD snap and the fetched `/opt/capi-lab/bin` toolchain) and is also inconsistent with the "all Kubernetes nodes are LXC containers" model.
    - `kubeadm` for a single throwaway node is far heavier than a single-binary k3s server, both in disk and in cold-start time.
    - k3s is a fully compliant lightweight Kubernetes distribution shipped as a single binary; `k3s server` exposes the `--tls-san`, `--disable=servicelb`, `--disable=traefik` flags and a config file that map cleanly onto `bootstrap_k3s` defaults (§2.4; a config sketch follows this list).
- Lifetime: transient. Lives only between `bootstrap_clusterctl` and `cleanup_bootstrap`. Helm releases on bootstrap k3s are not migrated by `clusterctl move` (only CAPI CRs are), which is why no workload Cluster CR is ever created on bootstrap (02-architecture.md §3.3).
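For orientation, a minimal sketch of a k3s config file carrying those flags; the SAN value is a placeholder, not the `bootstrap_k3s` default:

```yaml
# /etc/rancher/k3s/config.yaml — hedged sketch, equivalent to
# `k3s server --tls-san ... --disable servicelb --disable traefik`.
tls-san:
  - "capi-bootstrap-0"   # hypothetical SAN; the role supplies its own
disable:
  - servicelb
  - traefik
```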
## Workload cluster runtime
- Bootstrapper: kubeadm, driven through CAPI's Kubeadm Bootstrap Provider (CABPK) and Kubeadm Control Plane Provider (KCP), with CAPN as the infrastructure backend.
- Container image: the prebuilt CAPN `capi:kubeadm/v1.35.0` unprivileged-kubeadm LXC image from the upstream simplestreams. These images are built specifically for the unprivileged container path and are required from CAPN `v1.32.4` onward (§2.8, §12.2).
- LXC mode: unprivileged only. A substrate invariant. The privileged path is closed by design and will not be added as an opt-in (§2.8). If a feature does not work in unprivileged LXC, the fix is to change the CNI, narrow scope, or move to VM-based nodes outside this repo's scope.
- CAPN profile: the project ships its own `lxd_profiles` (`capi-controlplane` and `capi-worker`), built on the CAPN Canonical LXD unprivileged kubeadm baseline: `linux.kernel_modules`, `security.nesting=true`, `security.idmap.isolated=true` where applicable, a host `/boot` bind-mount to `/usr/lib/ostree-boot`, and `snapd`/`apparmor` systemd units disabled inside the guest (§2.8). A profile sketch follows this list.
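A hedged sketch of the profile shape. The module list and device name are illustrative; the shipped `lxd_profiles` definitions are the authority:

```yaml
# Hedged sketch of the capi-controlplane profile, not its verbatim
# contents; kernel module set is an assumption.
name: capi-controlplane
config:
  linux.kernel_modules: ip_tables,ip6_tables,nf_nat,overlay,br_netfilter
  security.nesting: "true"
  security.idmap.isolated: "true"
devices:
  boot:               # host /boot bind-mounted into the guest
    type: disk
    source: /boot
    path: /usr/lib/ostree-boot
```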
## CNI
- Choice: Calico, delivered by the upstream `tigera-operator` Helm chart wrapped by `charts/cni-calico/`.
- Versions: chart `v3.31.5`, `appVersion v3.31.5` (subchart pinned in lockstep; see `charts/cni-calico/Chart.yaml`).
- Topology: dual-stack IPv4 + IPv6, `natOutgoing: Enabled` for both families. The IPv6 SNAT is the canonical case of the bootstrap → self-hosted network-surface asymmetry documented in 02-architecture.md §3.4: invisible pre-pivot, required post-pivot.
- Why an upstream wrapper, not direct chart install: the wrapper disables the operator's default `Installation` CR (`installation.enabled=false`) and ships its own with all substrate-required fields hardcoded (per the memory rule "chart-required values are hardcoded"). The upstream chart's optional knobs that are still legitimately tunable are exposed via the wrapper's `values.yaml`. A sketch of the shipped CR follows this list.
- Acceptance: Gate B (§6, 02-architecture.md §8.1) ships as a chart-side `helm.sh/hook: test` Pod inside the same chart, so a CNI bring-up failure fails `terraform apply` immediately.
- Why not Cilium or kube-router: out of scope for Stage 1. Multi-CNI runtime toggles are a non-goal (01-overview.md); a CNI swap is a deliberate design step (a new wrapper chart), not a feature flag.
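A hedged sketch of the wrapper-shipped `Installation` CR. The pool CIDRs are placeholders, not the chart's hardcoded values; only the dual-stack `natOutgoing: Enabled` shape is taken from this page:

```yaml
# Hedged sketch; real CIDRs are substrate-supplied, not these.
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default          # the operator requires this exact name
spec:
  calicoNetwork:
    ipPools:
      - cidr: 10.244.0.0/16      # hypothetical IPv4 pod CIDR
        natOutgoing: Enabled
      - cidr: fd00:10:96::/48    # hypothetical IPv6 pod CIDR
        natOutgoing: Enabled
```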
## Load balancing
- Choice: MetalLB in L2 mode, IPv6 VIP only, announced on worker `eth1` (`br-ext6`).
- Versions: chart `v0.15.3`, `appVersion v0.15.3` (subchart pinned in lockstep; see `charts/metallb/Chart.yaml`).
- Delivery split:
    - `charts/metallb/`: wrapper around the upstream chart, installs CRDs + controller + speaker, with substrate-required toggles hardcoded (`crds.enabled=true`, `frrk8s.enabled=false`, `speaker.frr.enabled=false`).
    - `charts/metallb-config/`: owns the `IPAddressPool` and `L2Advertisement` CRs and the Gate A acceptance hook. Installed as the second Helm release in the pair so CRDs are already registered when its CRs are applied.
- Why L2, not BGP: there is no BGP infrastructure on a single bare-metal host. L2 mode requires only that the upstream segment carries NDP correctly, which Gate A asserts. BGP-based Pod IPv6 routing is in the Stage 2 backlog (01-overview.md) and not part of this v1.0 contract.
- L2 advertisement contract (§5.5; a CR sketch follows this list):
    - `IPAddressPool` is sourced from the operator-supplied external IPv6 range;
    - `L2Advertisement.spec.interfaces: [eth1]`;
    - `L2Advertisement.spec.nodeSelectors` selects nodes that actually have the external NIC. `interfaces` alone does not affect leader election in L2 mode, so the node selector is mandatory.
- Acceptance: Gate A (§6, 02-architecture.md §8.2) is a dual: a chart-side `helm.sh/hook: test` Pod plus a verify-side external curl from the Vagrant VM (or a probe in production). A broken L2 segment fails `terraform apply` before MetalLB starts pretending to serve VIPs (§12.1).
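A hedged sketch of the two CRs `charts/metallb-config/` owns. The pool range and node label are placeholders; only the `interfaces` + `nodeSelectors` contract comes from §5.5:

```yaml
# Hedged sketch; addresses and labels are illustrative, not the chart's.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: ext6-pool
  namespace: metallb-system
spec:
  addresses:
    - 2001:db8:42:100::100-2001:db8:42:100::1ff   # operator-supplied range
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: ext6-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - ext6-pool
  interfaces:
    - eth1
  nodeSelectors:          # mandatory: interfaces alone does not gate election
    - matchLabels:
        node-role.kubernetes.io/worker: ""   # hypothetical label
```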
## Helm provider
- Provider: `hashicorp/helm`, `~> 3.1.1`.
- Why Terraform-driven Helm, not raw `helm` calls:
    - 02-architecture.md §5 fixes Helm as the only mutation channel for Kubernetes objects. Terraform's `helm_release` is the declarative front-end that lets every release be planned, diffed, and destroyed by `terraform plan` / `terraform destroy` rather than by an ad-hoc shell script.
    - `null_resource` + `local-exec helm test` inside the same module binds Gate A and Gate B helm tests to the lifecycle of the release that publishes them, so a Gate failure fails the apply in one tool (§6).
- Why version 3.x: the 3.x line of the provider supports the modern Helm v3 storage format (`sh.helm.release.v1.<release>.v<n>` Secrets) and is the current upstream feature-stable track per §2.11.
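The gate binding is easiest to see from the chart side. A minimal sketch of a `helm.sh/hook: test` Pod of the kind Gates A and B ship; the name, image, and probe target are hypothetical, the real charts define their own:

```yaml
# Hedged sketch of a chart-side gate test. `helm test` runs this Pod;
# a non-zero exit fails the null_resource and thus the terraform apply.
apiVersion: v1
kind: Pod
metadata:
  name: gate-a-test
  annotations:
    "helm.sh/hook": test
spec:
  restartPolicy: Never
  containers:
    - name: probe
      image: curlimages/curl:8.11.0   # hypothetical probe image
      args: ["--fail", "--silent", "http://gate-a-echo.default.svc"]
```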
## Ansible collections
Pinned in `ansible/requirements.yml`:

```yaml
collections:
  - name: ansible.posix
    version: ">=2.1.0"
  - name: community.general
    version: ">=12.6.0"
  - name: community.crypto
    version: ">=3.2.0"
  - name: kubernetes.core
    version: ">=6.4.0"
```
| Collection | Used by | Purpose |
|---|---|---|
| `ansible.posix` | shared role plumbing | sysctl, mount, firewalld helpers from POSIX-shaped tasks. |
| `community.general` | `lxd_*` roles, `base_system` | `lxd_*` modules, `snap` module, miscellaneous host tooling. |
| `community.crypto` | `bootstrap_capn_secret`, `lxd_host` | Generates the CAPN restricted TLS certificate / key. |
| `kubernetes.core` | `bootstrap_clusterctl`, `pivot_clusterctl_move` | `k8s_info` polling of CAPI CRs and Deployments while waiting for `clusterctl init` and `clusterctl move` to settle. The CAPN identity Secret is the single deliberate `kubernetes.core.k8s` create-side use, gated by 02-architecture.md §5.1. |
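A hedged sketch of that polling pattern, with a hypothetical kubeconfig path, target Deployment, and timeout; the shipped roles define their own:

```yaml
# Hedged sketch of k8s_info polling; names and paths are illustrative.
- name: Wait for the CAPI core controller to become available
  kubernetes.core.k8s_info:
    kubeconfig: /opt/capi-lab/kubeconfig/bootstrap.yaml   # hypothetical
    api_version: apps/v1
    kind: Deployment
    namespace: capi-system
    name: capi-controller-manager
  register: capi_deploy
  until: >-
    capi_deploy.resources | length > 0 and
    (capi_deploy.resources[0].status.availableReplicas | default(0)) > 0
  retries: 30
  delay: 10
```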
Bounds policy (§2.11). Every entry is a lower bound only, with no
`<X+1` ceiling, so the next major release is not pre-emptively
locked out. Bumps are driven from upstream releases
(`GET /repos/<owner>/<repo>/releases/latest`) and recorded inline in
the plan plus aggregated in §8a.
Executor-side dependency. `kubernetes.core` requires the
`kubernetes` PyPI package on the executor node. On Debian Trixie this
is supplied as the system package `python3-kubernetes` (version
30.1.0-2), installed by
`tests/molecule/shared/tasks/prepare.yml` so test scenarios get it
for free. A consumer repo deploying to a real host installs the
same package via its host-bootstrap role.
## Local harness
The local harness is the only end-to-end driver shipped in this repository.
- Hypervisor: Vagrant + libvirt (the `vagrant-libvirt` plugin), one Debian 13 VM, defined under `tests/vagrant/debian13/`.
- Test driver: Molecule, in delegated mode: the developer supplies create/destroy; Molecule runs `prepare`, `converge`, `idempotence`, `verify`. Delegated is Molecule's default driver and the documented place for custom-driven scenarios (§9.1).
- Make targets: the local harness is driven exclusively through `make`, never raw `vagrant` / `virsh` / `molecule` invocations: `make -C tests/vagrant/debian13 up`, `make -C tests/molecule <scenario>-delegated-{create,converge,verify,destroy}`, `make test-local-e2e`.
- External RA source: an in-VM `radvd` listening on a veth peer (`ext6-ra-peer`) attached to `br-ext6` announces `2001:db8:42:100::/64`. Delivered by `tests/molecule/shared/tasks/ext6-ra-source.yml` and applied in every scenario's `prepare`. The node-side RA reception baseline is identical between local and prod; only the RA source differs (02-architecture.md §4.5).
- Image policy: every consumer image must be cloud-init-capable because the RA reception baseline lands as cloud-init `write_files` on first boot (§2.10, 02-architecture.md §4.4); a sketch of that payload follows this list.
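A minimal sketch of how such a payload could look, assuming the baseline is a sysctl drop-in; the real path, interface, and settings come from the substrate, not from this page:

```yaml
#cloud-config
# Hedged sketch of the RA reception baseline as cloud-init user data.
write_files:
  - path: /etc/sysctl.d/90-ra-baseline.conf   # hypothetical path
    content: |
      # Accept Router Advertisements on the external NIC even while
      # the node forwards traffic (accept_ra=2).
      net.ipv6.conf.eth1.accept_ra = 2
runcmd:
  - sysctl --system
```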
## Verification cadence
§2.11 and §2.11b fix how pins are bumped:
- At every bump, verify upstream first. Stale defaults from the model's memory or from old documentation do not count. The agent must consult the upstream source (GitHub `releases/latest`, `snap info`, registry index) and record the verified value (a lookup sketch closes this section).
- Pin to the current upstream feature-stable. If vendor official guidance contradicts "the most recent upstream stable" (the LXD `5.21/stable` LTS vs `6/stable` case), the bump still takes the most recent feature-stable and records the trade-off in the §8a deviation section. The only exception is when upstream has only prereleases; then the previous stable is allowed temporarily, with an explicit §8a note and a follow-up to upgrade at the first stable release.
- No artificial upper bound. Lower bounds only in `requirements.yml`, provider blocks, and chart references. Any `>=X,<X+1` form is treated as a stale pin and removed at the moment of the bump.
- Inline + table. Every bump updates two places: the inline pin comment in §8 of the plan, and the §8a aggregated table. If the inline date diverges from the table, the inline pin is the truth and the table is regenerated at the next review.
- No standalone progress / changelog file. The plan files themselves carry the bump history; a separate `PLAN-*-progress.md` or `CHANGELOG.md` is forbidden by §2.11b.
The CI/lint stage (when it lands) is expected to flag stale pins against upstream automatically.
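For illustration only, a hedged sketch of the `releases/latest` lookup as an Ansible task; the actual bump workflow is agent-driven and no such task ships in the repo:

```yaml
# Hedged sketch; repository and variable names are illustrative, and
# unauthenticated GitHub API calls are rate-limited.
- name: Resolve latest cluster-api release tag
  ansible.builtin.uri:
    url: https://api.github.com/repos/kubernetes-sigs/cluster-api/releases/latest
    return_content: true
  register: capi_latest

- name: Show the verified tag
  ansible.builtin.debug:
    msg: "cluster-api latest stable: {{ (capi_latest.content | from_json).tag_name }}"
```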
## Where to read more
| Stack question | Source |
|---|---|
| What version of X is pinned today? | §8a of ../plans/PLAN-stage1-common.md. |
| Why is the bump policy this strict? | §2.11 of the plan. |
| Why this CNI / LB / runtime, not another? | §2.8, §5.5, §12.1, §12.2 of the plan. |
| What does each role / chart actually do? | 09-roles-reference.md, 10-modules-and-charts.md. |
| What is the deployment workflow? | 07-deployment-guide.md. |