netbird-iac/PAIN_POINTS.md
2026-02-15 18:37:15 +02:00

6.8 KiB

NetBird GitOps - Remaining Pain Points

This document captures challenges discovered during the POC that need resolution before production use.

Context

Use case: ~100 operators, each with 2 devices (BlastPilot + BlastGS-Agent)
Workflow: Ticket-based onboarding; an engineer creates a PR, and merging it triggers setup key creation
Current pain: Manual setup key creation and peer renaming in the dashboard


Pain Point 1: Peer Naming After Enrollment

Problem

When a peer enrolls using a setup key, it appears in the NetBird dashboard with its hostname (e.g., DESKTOP-ABC123 or raspberrypi). These hostnames are:

  • Often generic and meaningless
  • Not controllable via IaC (the peer creates its own identity, a locally generated WireGuard keypair, and reports its own hostname at enrollment)
  • Confusing when managing ~200 devices (two per operator)

Desired state: Peer appears as pilot-ivanov or gs-unit-042 immediately after enrollment.

Root Cause

NetBird's architecture requires peers to self-enroll:

  1. Setup key defines which groups the peer joins
  2. Peer runs netbird up --setup-key <key>
  3. Peer generates WireGuard keypair locally
  4. Peer registers with the management server using its local hostname
  5. No API link between "which setup key was used" and "which peer enrolled"

Options

| Option | Description | Effort | Tradeoffs |
|--------|-------------|--------|-----------|
| A. Manual rename | Engineer renames the peer in the dashboard after enrollment | Zero | ~30 seconds per device, human in the loop |
| B. Polling service | Service watches for new peers, matches them by timing/IP, and renames them | Medium | More infrastructure, heuristic matching |
| C. Per-user tracking groups | Unique group per user; find the peer by group membership | High | Group sprawl, cleanup needed |
| D. Installer modification | Modify BlastPilot/BlastGS-Agent to set the hostname before enrollment | N/A | Blocked by the code freeze |
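To make Option C concrete, here is a minimal Terraform sketch. It reuses the netbird_group.pilots group from the one-off key example further down. The operator list (local.operators) and the track-<operator> naming are hypothetical. Each operator gets a tracking group, and their key adds every enrolled peer to it. A rename script can then look for "the peer in track-ivanov" instead of guessing by timing.

```hcl
# Hypothetical sketch of Option C: one tracking group per operator.
# The operator list and the "track-" naming scheme are illustrative only.
locals {
  operators = ["ivanov", "petrov"]
}

# One tracking group per operator, e.g. "track-ivanov".
resource "netbird_group" "tracking" {
  for_each = toset(local.operators)
  name     = "track-${each.key}"
}

# Each operator's key puts the enrolling peer into both the role group
# and that operator's tracking group.
resource "netbird_setup_key" "pilot" {
  for_each = toset(local.operators)
  name     = "pilot-${each.key}"
  type     = "one-off"
  auto_groups = [
    netbird_group.pilots.id,
    netbird_group.tracking[each.key].id,
  ]
  usage_limit = 1
  ephemeral   = false
}
```

The group sprawl in the table shows up directly here. Every operator adds one group, and that group has to be removed again when the operator is offboarded.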

Recommendation

Option A is acceptable for ~100 operators with a ticket-based workflow:

  • Ticket arrives -> engineer creates PR -> PR merges -> engineer sends setup key -> operator enrolls -> engineer renames peer (30 sec)
  • Total engineer time per device: ~5 minutes
  • No additional infrastructure

Option B is worth considering if:

  • Onboarding volume increases significantly
  • Full automation is required (no human in the loop)

Pain Point 2: Per-User vs Per-Role Setup Keys

Current State

Setup keys are defined per role in terraform/setup_keys.tf:

resource "netbird_setup_key" "gs_onboarding" {
  name        = "ground-station-onboarding"
  type        = "reusable"
  auto_groups = [netbird_group.ground_stations.id]
  ...
}

This means:

  • One reusable key per role
  • Key is shared across all operators of that role
  • No way to track "this key was issued to Ivanov"

Problems

  1. No audit trail - Can't answer "who enrolled device X?"
  2. Revocation is all-or-nothing - Revoking pilot-onboarding blocks enrollment for every pilot, not just one
  3. No usage attribution - Can't enforce "one device per operator"

Options

| Option | Description | Effort | Tradeoffs |
|--------|-------------|--------|-----------|
| A. Accept per-role keys | Current state; manual tracking in the ticket system | Zero | No IaC-level audit trail |
| B. Per-user setup keys | Create a key per onboarding request | Low | More keys to manage, cleanup needed |
| C. One-off keys | Each key has usage_limit = 1 | Low | Key cannot be reused after enrollment; good for audit |

Recommendation

Option C (one-off keys) provides the best tradeoff:

  • Create unique key per onboarding ticket
  • Key becomes invalid after its first use
  • Clear audit trail: key name links to ticket number
  • Easy to implement:
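# Example: ticket-based one-off key
resource "netbird_setup_key" "ticket_1234_pilot" {
  name        = "ticket-1234-pilot-ivanov" # links the key to ticket ACHILLES-1234
  type        = "one-off"                  # single use: invalid after the first enrollment
  auto_groups = [netbird_group.pilots.id]  # enrolled peer joins the pilots group
  usage_limit = 1                          # explicit; matches the one-off type
  ephemeral   = false                      # peer stays registered while offline
}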

Workflow:

  1. Ticket ACHILLES-1234: "Onboard pilot Ivanov"
  2. Engineer adds setup key ticket-1234-pilot-ivanov to Terraform
  3. PR merged, key created
  4. Engineer sends key to operator (see Pain Point 3)
  5. Operator enrolls with the key, which is then consumed
  6. After enrollment, engineer renames peer to pilot-ivanov
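If tickets arrive often, writing a new resource block per ticket gets repetitive. The following is a minimal sketch, assuming a map-driven layout that is not in the repo. Step 2 becomes a one-line map entry, and for_each creates one one-off key per entry. The locals onboarding_keys and role_groups are hypothetical. The two groups are the ones referenced elsewhere in this document.

```hcl
# Hypothetical layout: one map entry per onboarding ticket and device.
# Entry names and the "role" field are illustrative assumptions.
locals {
  onboarding_keys = {
    "ticket-1234-pilot-ivanov" = { role = "pilot" }
    "ticket-1234-gs-ivanov"    = { role = "gs" }
  }

  # Map each role to the group its peers should join.
  role_groups = {
    pilot = netbird_group.pilots.id
    gs    = netbird_group.ground_stations.id
  }
}

# One one-off key per map entry; the entry name doubles as the key name.
resource "netbird_setup_key" "onboarding" {
  for_each    = local.onboarding_keys
  name        = each.key
  type        = "one-off"
  auto_groups = [local.role_groups[each.value.role]]
  usage_limit = 1
  ephemeral   = false
}
```

Each key is addressed as netbird_setup_key.onboarding["ticket-1234-pilot-ivanov"]. As a result, terraform plan shows exactly which ticket a change belongs to, and deleting a map entry removes only that ticket's key.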

Pain Point 3: Secure Key Distribution

Problem

After CI/CD creates a setup key, how does it reach the operator?

Setup keys are sensitive:

  • Anyone with the key can enroll a device into the network
  • Keys may be reusable (depends on configuration)
  • Keys should be transmitted securely

Current State

Setup keys are output by Terraform:

terraform output -raw gs_setup_key

But:

  • Requires local Terraform access
  • No automated distribution mechanism
  • Keys are stored in plaintext in the state file, which is committed to git in the POC (unacceptable for production)
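One way to address the last point is sketched below, under stated assumptions: move state to a remote backend with encryption and access control, and mark key outputs as sensitive. The S3 backend is only an example, and the bucket name and region are placeholders. Any remote backend the team already trusts would work. The key attribute name on netbird_setup_key is also an assumption.

```hcl
# Sketch: keep state (and therefore setup keys) out of git.
# Bucket name and region are placeholders, not real infrastructure.
terraform {
  backend "s3" {
    bucket  = "example-netbird-iac-state" # hypothetical bucket
    key     = "netbird/terraform.tfstate"
    region  = "eu-central-1"
    encrypt = true # server-side encryption of the state object
  }
}

# "key" is assumed to be the provider's computed attribute holding the secret.
output "gs_setup_key" {
  value     = netbird_setup_key.gs_onboarding.key
  sensitive = true # redacted in plan/apply output only
}
```

Note that sensitive = true only redacts the value in plan and apply output. The value is still stored in plaintext in state, which is why the backend change matters more. The command terraform output -raw gs_setup_key still prints the key.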

Options

| Option | Description | Effort | Tradeoffs |
|--------|-------------|--------|-----------|
| A. Manual retrieval | Engineer runs terraform output locally | Zero | Requires CLI access, manual process |
| B. CI output to ticket | CI posts key to ticket system via API | Medium | Keys in ticket history (audit trail) |
| C. Secrets manager | Store keys in Vault/1Password, notify engineer | Medium | Another system to integrate |
| D. Encrypted email | CI encrypts key, emails to operator | High | Key management complexity |
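For Option C with HashiCorp Vault, a minimal sketch might look like the following. It assumes the vault provider is already configured and that a KV v2 engine is mounted at "secret". The secret path and the key attribute on netbird_setup_key are hypothetical.

```hcl
# Sketch of Option C: write each ticket's key to Vault KV v2.
# Mount path, secret path, and the "key" attribute are assumptions.
resource "vault_kv_secret_v2" "ticket_1234_pilot" {
  mount = "secret"                               # assumed KV v2 mount
  name  = "netbird/onboarding/ticket-1234-pilot" # one secret per ticket key
  data_json = jsonencode({
    setup_key = netbird_setup_key.ticket_1234_pilot.key
    ticket    = "ACHILLES-1234"
  })
}
```

The key still ends up in Terraform state, because data_json is stored there too. This option therefore improves distribution rather than state hygiene. The engineer would read the key with vault kv get -field=setup_key secret/netbird/onboarding/ticket-1234-pilot.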

Recommendation

Option A for now (consistent with manual rename):

  • Engineer retrieves key after CI completes
  • Engineer sends key to operator via secure channel (Signal, encrypted email)
  • Ticket updated with "key sent" status

Option B is worth implementing if:

  • Onboarding volume increases
  • Full automation is required
  • The ticket system supports secure "hidden fields"

Recommended Workflow

Given the constraints (code freeze, ~100 operators, ticket-based onboarding), the pragmatic workflow is:

1. Ticket created: "Onboard pilot Ivanov with BlastPilot + GS"

2. Engineer adds to Terraform:
   - ticket-1234-pilot (one-off, 7 days)
   - ticket-1234-gs (one-off, 7 days)

3. Engineer creates PR, gets review, merges

4. CI/CD applies changes, keys created

5. Engineer retrieves keys:
   terraform output -raw ticket_1234_pilot_key
   terraform output -raw ticket_1234_gs_key

6. Engineer sends keys to operator via secure channel

7. Operator enrolls both devices

8. Engineer renames peers in dashboard:
   DESKTOP-ABC123 -> pilot-ivanov
   raspberrypi -> gs-ivanov

9. Engineer closes ticket

Total engineer time: ~10 minutes per onboarding (pair of devices)
Automation level: Groups, policies, and key creation are automated; peer naming and key distribution remain manual
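Steps 2 and 5 of the workflow can be expressed as Terraform under stated assumptions. The sketch below defines the two keys for ticket 1234 with a 7-day lifetime, plus sensitive outputs named to match the terraform output command in step 5. The names expiry_seconds and key are assumed for the NetBird provider; check its documentation before relying on them.

```hcl
# Step 2 for ticket 1234: two one-off keys valid for 7 days.
# "expiry_seconds" and the computed "key" attribute are assumed names.
resource "netbird_setup_key" "ticket_1234_pilot" {
  name           = "ticket-1234-pilot"
  type           = "one-off"
  auto_groups    = [netbird_group.pilots.id]
  usage_limit    = 1
  ephemeral      = false
  expiry_seconds = 7 * 24 * 60 * 60 # 604800 s = 7 days
}

resource "netbird_setup_key" "ticket_1234_gs" {
  name           = "ticket-1234-gs"
  type           = "one-off"
  auto_groups    = [netbird_group.ground_stations.id]
  usage_limit    = 1
  ephemeral      = false
  expiry_seconds = 7 * 24 * 60 * 60
}

# Step 5 reads these with terraform output -raw <name>.
output "ticket_1234_pilot_key" {
  value     = netbird_setup_key.ticket_1234_pilot.key
  sensitive = true
}

output "ticket_1234_gs_key" {
  value     = netbird_setup_key.ticket_1234_gs.key
  sensitive = true
}
```

An unused key lapses on its own after 7 days. A stalled ticket therefore does not leave a valid enrollment key lying around.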


Future Improvements (If Needed)

  1. Webhook listener for peer enrollment events -> auto-rename based on timing correlation
  2. Ticket system integration for automated key distribution
  3. Custom installer that prompts for device name before enrollment
  4. Batch onboarding tool for multiple operators at once

These can be addressed incrementally as the operation scales.