04 — Repository structure

For developers about to edit code. Maps every shipped directory to its purpose, names the runtime-only artefacts, and states the naming and variable-prefix rules every new file must obey.

The architectural why lives in plan §7 (Repository), §2.5 (Repository boundary) and §2.6 (Ansible role contract).


Top-level tree

What is shipped in the repo (what you git clone):

k8s-lab/
├── Makefile                     # the only entry point for local workflows (§7, §10)
├── README.md                    # repo-level pointer to doc/
├── ansible/                     # 14 roles + collection requirements + ansible.cfg
├── charts/                      # 5 Helm charts — every K8s object lives here
├── clusterctl/                  # reserved; runtime clusterctl.yaml is rendered by role templates
├── doc/                         # this documentation set
├── plans/                       # PLAN-stage1-*.md (English) — source of truth for "why"
├── PLAN-stage1-*.md             # Russian originals at repo root
├── scripts/                     # local-harness Python + shell helpers
├── terraform/                   # exactly ONE module: workload_cluster (§16.4)
├── tests/                       # Molecule + Vagrant + libvirt harness, fixtures
├── .ansible-lint                # role lint config
├── .yamllint                    # repo-wide YAML lint config
└── .gitignore

What appears only at runtime (gitignored, never committed):

k8s-lab/
├── .artifacts/                  # mgmt.kubeconfig, mgmt.auto.tfvars.json, clusters/* (mode 0600)
├── ansible/collections/         # populated by `ansible-galaxy collection install -p`
├── tests/vagrant/debian13/.vagrant/      # Vagrant state
├── tests/molecule/<scenario>/.molecule/  # Molecule scenario state
└── tests/fixtures/terraform/**/.terraform/ + tfstate*  # local TF state

The runtime split is deliberate: the repo carries no environment data, no real kubeconfigs, no concrete tfvars. See plan §2.5 for the boundary contract.


ansible/

ansible/
├── ansible.cfg          # roles_path, collections_path, ssh_args, pipelining
├── requirements.yml     # collection deps with lower-bound pins (§2.11)
├── collections/         # gitignored — populated by `make deps`
└── roles/               # 14 roles, snake_case directory names — see table below

The 14 roles in canonical-flow order (full details in 09-roles-reference.md):

| Role | One-line purpose |
| --- | --- |
| base_system | Host packages, kernel modules, sysctls, /opt/capi-lab root, btrfs mount contract. |
| binary_fetch | Download + checksum-verify pinned kubectl, clusterctl, and k3s into /opt/capi-lab/bin. |
| lxd_host | Install LXD via snap, pin channel, refresh policy, create the br-ext6 host bridge. |
| lxd_project | Reconcile the capi-lab LXD project. |
| lxd_storage_pools | btrfs pool capi-fast on a dedicated block device. |
| lxd_network_int_managed | LXD-managed capi-int bridge (internal dual-stack plane). |
| lxd_profiles | CAPN unprivileged-kubeadm profiles capi-controlplane / capi-worker. |
| lxd_bootstrap_instance | Transient capi-bootstrap-0 LXC + LXD proxy device for runner reach. |
| bootstrap_k3s | Single-node k3s server inside the bootstrap LXC. |
| bootstrap_clusterctl | clusterctl init on bootstrap k3s — CAPI/CABPK/KCP/CAPN. |
| bootstrap_capn_secret | CAPN identity Secret (incus-identity). |
| export_artifacts | Emit .artifacts/mgmt.kubeconfig + mgmt.auto.tfvars.json. |
| pivot_clusterctl_move | clusterctl init on mgmt-1 + clusterctl move from bootstrap. |
| cleanup_bootstrap | Destroy capi-bootstrap-0. |

Role contract (plan §2.6):

  • tasks/main.yml is dispatcher-only (include_tasks per topic).
  • defaults/main.yml carries the role's public vars, prefixed with the role's snake_case directory name.
  • meta/main.yml declares dependencies: (no ordering via prepare.yml or pre_tasks).
  • Each role has its own README.md.

Test harness: tests/molecule/<role-in-kebab-case>/.
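A dispatcher-only tasks/main.yml might look like the sketch below; the topic file names (preflight.yml, install.yml, bridge.yml) are illustrative assumptions, not copied from the repo:

```yaml
# ansible/roles/lxd_host/tasks/main.yml (hypothetical sketch of the dispatcher-only contract)
# main.yml contains nothing but include_tasks, one per topic; all real work lives in the topic files.
- name: lxd-host | main | include preflight
  ansible.builtin.include_tasks: preflight.yml

- name: lxd-host | main | include install
  ansible.builtin.include_tasks: install.yml

- name: lxd-host | main | include bridge
  ansible.builtin.include_tasks: bridge.yml
```

Note the task names already follow the `<role> | <section> | <action>` convention described under Naming conventions.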


terraform/

terraform/
└── modules/
    └── workload_cluster/        # the ONLY TF module shipped here (§16.4)
        ├── main.tf              # 5 helm_release blocks + null_resource helm-tests
        ├── locals.tf
        ├── variables.tf
        ├── outputs.tf
        ├── providers.tf         # hashicorp/helm + hashicorp/kubernetes (read-side)
        └── versions.tf

No environment-specific TF roots exist in this repo — terraform/environments/, terraform/live/, prod/, dev/ are absent by design. The single module is consumed by:

  1. tests/fixtures/terraform/workload-clusters/lab-default/ (test consumer, the only TF root in this repo);
  2. private consumer repos for real sites, which wire their own tfvars, backends, and credentials.

Boundary rule: plan §2.5. Module inputs/outputs: 10-modules-and-charts.md.


charts/

charts/
├── capi-cluster-class/          # ClusterClass + Kubeadm*/LXC* templates (§16.2)
├── capi-workload-cluster/       # Cluster CR instance (§16.3)
├── cni-calico/                  # wrapper over projectcalico/tigera-operator + Gate B (§17.2)
├── metallb/                     # subchart wrapper over upstream metallb/metallb (§17.3)
└── metallb-config/              # IPAddressPool + L2Advertisement + Gate A (§17.3)

Per chart (schemas in 10-modules-and-charts.md):

| Chart | One-line purpose |
| --- | --- |
| capi-cluster-class | Topology template — ClusterClass, KubeadmControlPlaneTemplate, KubeadmConfigTemplate, LXC*MachineTemplate. Chart-version-as-CR-name pattern lives here. |
| capi-workload-cluster | The Cluster CR that references a ClusterClass — one release per workload. |
| cni-calico | Wraps tigera-operator as a subchart, sets Installation values, ships Gate B (CNI viability) helm-test Pod. |
| metallb | Thin wrapper over upstream metallb/metallb (CRDs + controller + speaker). |
| metallb-config | IPAddressPool + L2Advertisement bound to eth1, plus Gate A (external L2) helm-test Pod. |

Helm-first delivery is a closed contract — plan §2.9. No raw YAML lives outside charts/; Ansible never creates Kubernetes objects via kubernetes.core.k8s state=present; Terraform never uses kubernetes_manifest.


tests/

tests/
├── molecule/
│   ├── Makefile                     # scenario pattern rules (`<scenario>-delegated-*`)
│   ├── shared/
│   │   ├── converge.yml             # universal — reads MOLECULE_SCENARIO_NAME
│   │   ├── verify.yml
│   │   ├── inventory/group_vars/k8slab_host.yml   # the ONE substrate group_vars file (§9.5)
│   │   └── tasks/                   # prepare.yml, prepare-btrfs-pool.yml,
│   │                                # prepare-clean-disk.yml, ext6-ra-source.yml (in-VM radvd),
│   │                                # wait-services.yml
│   ├── base-system/                 # one scenario per role — kebab-case
│   ├── binary-fetch/  lxd-host/  lxd-project/  lxd-storage-pools/
│   ├── lxd-network-int-managed/  lxd-profiles/  lxd-bootstrap-instance/
│   ├── bootstrap-k3s/  bootstrap-clusterctl/  bootstrap-capn-secret/
│   ├── export-artifacts/  cleanup-bootstrap/
│   └── e2e-local/                   # full canonical flow (§3.1) in one scenario
├── vagrant/debian13/
│   ├── Vagrantfile                  # libvirt provider; the shared local VM
│   ├── inventory.py                 # ad-hoc inventory generator for the running VM
│   ├── Makefile                     # `up`, `destroy`, `ssh`
│   └── libvirt-networks/            # mgmt-nat.xml, probe-ext6.xml (latter dormant)
└── fixtures/terraform/workload-clusters/lab-default/   # the only TF root in this repo (§16.5)
    └── main.tf, variables.tf, providers.tf, outputs.tf

Three contracts make this layout work:

  1. Scenario name == role directory name (snake-to-kebab). tests/molecule/base-system/ targets ansible/roles/base_system/; shared/converge.yml reads MOLECULE_SCENARIO_NAME to pick the role. Plan §9.5.
  2. One substrate group_vars file. shared/inventory/group_vars/k8slab_host.yml holds every prod-like substrate value — uplink, storage pool spec, wait budgets, the LXD proxy device. A scenario adds <scenario>/host_vars/k8slab-host.yml only for a true override. Plan §9.5.1.
  3. tests/fixtures/ carries no implementation — only invokes reusable roles/modules/charts with synthetic inputs. Plan §2.7.
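The scenario-to-role dispatch from contract 1 might look like this sketch; the play layout and the variable name are assumptions, not the repo's actual file:

```yaml
# tests/molecule/shared/converge.yml (hypothetical sketch of the scenario-name dispatch)
- name: Converge
  hosts: all
  vars:
    # kebab-case scenario name injected by Molecule, e.g. "base-system"
    _scenario_name: "{{ lookup('ansible.builtin.env', 'MOLECULE_SCENARIO_NAME') }}"
  tasks:
    - name: converge | include the role matching this scenario (kebab back to snake)
      ansible.builtin.include_role:
        name: "{{ _scenario_name | replace('-', '_') }}"
```

One shared converge file plus this mapping is what lets every per-role scenario directory stay empty of playbook logic.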

Workflow: 12-testing.md. Entry points: top-level Makefile.


scripts/

scripts/
├── _harness.py                  # shared helpers: REPO_ROOT, vagrant ssh-config, run_make
├── molecule_run.py              # Molecule wrapper — brings up shared VM, exports K8SLAB_HOST_*
├── render_kubeconfig.py         # rewrite kubeconfig server URL into .artifacts/clusters/<name>.kubeconfig
├── export_bootstrap_facts.py    # emit .auto.tfvars.json from bootstrap cluster facts (§11.1)
└── wait_for_cluster.sh          # poll a kubeconfig until kube-apiserver returns Ready
  • _harness.py — shared helpers (REPO_ROOT, VAGRANT_DIR, PLATFORM_HOST_NAME = "k8slab-host", wrappers around vagrant ssh-config and make). Never invoked directly.
  • molecule_run.py — invoked by tests/molecule/Makefile pattern rules; brings up the shared VM, exports K8SLAB_HOST_*, then execs molecule <action> -s <scenario>.
  • render_kubeconfig.py — rewrite kubeconfig server URL into .artifacts/clusters/<cluster>.kubeconfig (mode 0600).
  • export_bootstrap_facts.py — emit .auto.tfvars.json from bootstrap facts. Plan §11.1.
  • wait_for_cluster.sh — kubectl get --raw=/readyz poll with a deadline.

Local-harness only; consumer repos do not need these.


clusterctl/

clusterctl/              # currently empty / reserved

bootstrap_clusterctl renders the pinned runtime clusterctl.yaml from ansible/roles/bootstrap_clusterctl/templates/clusterctl.yaml.j2 into /opt/capi-lab/etc/bootstrap_clusterctl/clusterctl.yaml. Versions track the §8 / §8a verified-version log in the plan.
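For orientation, a rendered clusterctl.yaml generally takes the shape below; the provider entries and the version in the URL are placeholders, not the repo's pinned values:

```yaml
# /opt/capi-lab/etc/bootstrap_clusterctl/clusterctl.yaml (illustrative shape only)
# Each entry pins one provider to a fixed release manifest; versions here are placeholders.
providers:
  - name: "cluster-api"
    type: "CoreProvider"
    url: "https://github.com/kubernetes-sigs/cluster-api/releases/v1.x.y/core-components.yaml"
```

The real pins come from the §8 / §8a verified-version log via the role's Jinja2 template.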


.artifacts/

Runtime-only directory — gitignored except .gitkeep. Plan §11.1.

.artifacts/                      # mode 0700
├── .gitkeep                     # tracked
├── mgmt.kubeconfig              # mode 0600 — active management cluster kubeconfig
├── mgmt.auto.tfvars.json        # mode 0600 — handoff bundle for §16.5 TF fixture
├── harness-vm-id                # current Vagrant VM id (cascade-clean tracking)
└── clusters/<name>.kubeconfig   # mode 0600 — per-workload debug kubeconfigs

Contract:

  • File mode 0600, directory mode 0700, owner = runner user.
  • mgmt.kubeconfig is the same file through the whole canonical flow (§3.1): first points at bootstrap k3s, then export_artifacts rewrites it in place after pivot to point at mgmt-1.
  • mgmt.auto.tfvars.json is the JSON handoff from Ansible to Terraform (consumed via -var-file=).
  • clusters/<name>.kubeconfig is produced by make workload-kubeconfig from the kubeconfig TF output.

Cleanup: make clean-mgmt-bundle, make clean-workload-kubeconfig, make clean-local.


What is NOT in this repo

Plan §2.5 is enforced by omission. You will not find:

| Not here | Lives in |
| --- | --- |
| inventories/prod/, inventories/local/ with real hosts | A private consumer repo. |
| host_vars/<real-fqdn>.yml with real IPs / ULA prefixes | A private consumer repo. |
| Plaintext or vault secrets, real LXD trust certificates | A private secrets store mounted into the consumer repo at runtime. |
| terraform/environments/<site>/ root modules | A private consumer repo, importing terraform/modules/workload_cluster/. |
| *.tfvars / *.tfvars.json for real sites | A private consumer repo (gitignored here by design). |
| make deploy TARGET=..., make destroy TARGET=... | The consumer repo's own Makefile. |
| *.deploy.yml / *.production.yml playbooks | A private consumer repo. |

Consumer-repo pattern: 07-deployment-guide.md.


Naming conventions

Each artefact type has exactly one naming scheme; mixing them silently is forbidden — a target like base_system-delegated-test (snake + kebab in one name) is rejected at review.

Ansible roles

  • Role directory in ansible/roles/ is snake_case: base_system/, lxd_storage_pools/, bootstrap_clusterctl/.
  • Role display-name in task / handler names is kebab-case: base-system, lxd-storage-pools.
  • Task names follow <role> | <section> | <action> with the role part in kebab-case: base-system | preflight | assert opt root present.
  • Handler names follow <role> | handlers | <action>: lxd-host | handlers | restart snap.lxd.daemon.

The asymmetry is deliberate: Ansible variable names must match the directory (snake_case-only); display names are kebab-case for parity with Make targets and scenario directories. Plan §2.6.3.

Molecule scenarios + Make targets

  • Scenario directory under tests/molecule/<name>/ is kebab-case: tests/molecule/base-system/, tests/molecule/lxd-storage-pools/.
  • Scenario name == role directory name with _ replaced by -: base_system (role) → base-system (scenario).
  • Make targets are kebab-case end-to-end: make -C tests/molecule base-system-delegated-test, make -C tests/molecule e2e-local-vagrant-converge.

The reverse mapping (scenario → role) is read at runtime from MOLECULE_SCENARIO_NAME by tests/molecule/shared/converge.yml. Plan §2.6.3, §9.5.2.

Helm charts

  • Chart directory in charts/ is kebab-case: charts/capi-cluster-class/, charts/cni-calico/.
  • Chart name in Chart.yaml matches the directory exactly.
  • Chart-version-as-CR-name pattern (plan §2.9, §12.10) appends Chart.Version with dots replaced by hyphens: capi-cluster-class-0-3-0.
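The dot-to-hyphen expansion can be sketched in a Helm template like this; the manifest file name and surrounding fields are assumptions, only the naming expression is the point:

```yaml
# charts/capi-cluster-class/templates/clusterclass.yaml (hypothetical sketch)
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  # "capi-cluster-class" + Chart.Version "0.3.0" renders as "capi-cluster-class-0-3-0"
  name: {{ printf "%s-%s" .Chart.Name (.Chart.Version | replace "." "-") }}
```

Embedding the chart version in the CR name is what lets two chart versions coexist as distinct cluster-scoped objects.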

Variable prefix rules

Ansible variable hygiene is a hard contract. Three categories; every variable must fall into exactly one.

Project globals: k8s_lab_*

Every project-wide variable carries the k8s_lab_ prefix. These are the stable inter-role interface — anything more than one role reads is a global by definition.

Examples (full list in 08-configuration-reference.md; schema in plan §8):

k8s_lab_opt_root: "/opt/capi-lab"
k8s_lab_project_name: "capi-lab"
k8s_lab_uplink_interface: "eth0"
k8s_lab_external_bridge_name: "br-ext6"
k8s_lab_internal_network_name: "capi-int"
k8s_lab_external_ipv6_prefix: "..."
k8s_lab_metallb_vip_range_v6: "..."
k8s_lab_lxd_host_address: "..."

The section fragment (storage, internal, external, images) is part of the flat variable name, not a YAML namespace: k8s_lab_storage_pool_name is one identifier, not k8s_lab.storage.pool_name.

Naked globals like opt_root, enabled, api_publish_port are banned — plan §2.6.5. They collide silently with vars inherited from wider inventory, make grep-by-name useless, and hide ownership.

Role public vars: <role_name>_*

Every variable a role exposes in defaults/main.yml carries the role's snake_case directory name as a prefix:

# ansible/roles/base_system/defaults/main.yml
base_system_enabled: true
base_system_btrfs_pool_required: true
base_system_extra_kernel_modules: [wireguard]

# ansible/roles/lxd_host/defaults/main.yml
lxd_host_snap_channel: "6/stable"
lxd_host_snap_refresh_mode: "hold"

Only the role itself reads these. A role MUST NOT read variables with another role's <other_role>_* prefix — that creates coupling invisible to either role's contract. Cross-role communication goes through k8s_lab_* globals or set_fact (next section). Plan §2.6.2, §2.6.5. Booleans are affirmatively named: <role>_enabled, <role>_flow_control_*. do_* / with_* / run_* are banned.

Role private vars: _<role_name>_*

Internal facts, derived values, registers, and helper vars carry a leading underscore plus the role prefix:

# ansible/roles/lxd_storage_pools/tasks/pools.yml
- ansible.builtin.uri: ...
  register: _lxd_storage_pools_pools_query_register

- ansible.builtin.set_fact:
    _lxd_storage_pools_pools_existing: "{{ ... }}"

Patterns: facts _<role>_<section>_<fact>; registers _<role>_<section>_<purpose>_register. The leading underscore says "internal — do not depend on this from outside the role". Generic register names like result, out, tmp are forbidden. Plan §2.6.2, §2.6.3.


Cross-role communication

A role needs a value computed by another. Exactly two supported mechanisms (plan §2.6.5):

  1. Project global with the k8s_lab_ prefix. For stable values in the substrate contract (paths, network names, addresses). Documented in plan §8 and 08-configuration-reference.md; sourced from inventory group_vars (production) or from tests/molecule/shared/inventory/group_vars/k8slab_host.yml (test harness).
  2. Runtime fact via set_fact named _<role>_<section>_<fact>, for values that only exist after a role has run. The producing role must be in the consumer's meta/main.yml dependencies:.

Anything else — naked globals, reading another role's <other_role>_* defaults, arranging order via pre_tasks / import_role to substitute meta-deps — is forbidden by plan §2.6.5. Violations create hidden dependencies that pass one Molecule scenario and break another.
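Mechanism 2 might look like the sketch below; the fact name and value are hypothetical, and only the pairing of set_fact with a meta dependency is the point:

```yaml
# producer: ansible/roles/lxd_storage_pools/tasks/pools.yml (fact name hypothetical)
- name: lxd-storage-pools | pools | publish resolved pool device
  ansible.builtin.set_fact:
    _lxd_storage_pools_pools_resolved_device: "/dev/vdb"  # illustrative value
---
# consumer: ansible/roles/lxd_profiles/meta/main.yml
# The meta dependency guarantees the producer has run before the fact is read.
dependencies:
  - role: lxd_storage_pools
```

Without the meta dependency the fact may simply not exist yet, which is exactly the "passes one Molecule scenario and breaks another" failure mode described above.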


The plan says why; this chapter says how it looks once you follow it. Mirror the patterns above when adding new code, and grep the existing layout for the closest analogue.