NetBird GitOps - Remaining Pain Points
This document captures challenges discovered during the POC that need resolution before production use.
Context
Use case: 100+ operators, each with 2 devices (BlastPilot + BlastGS-Agent)
Workflow: Ticket-based onboarding, engineer creates PR, merge triggers setup key creation
Current pain: Manual setup key creation and peer renaming in dashboard
Pain Point 1: Peer Naming After Enrollment
Problem
When a peer enrolls using a setup key, it appears in the NetBird dashboard with its hostname (e.g., DESKTOP-ABC123 or raspberrypi). These hostnames are:
- Often generic and meaningless
- Not controllable via IaC (peer generates its own keypair locally)
- Confusing when managing 100+ devices
Desired state: Peer appears as pilot-ivanov or gs-unit-042 immediately after enrollment.
Root Cause
NetBird's architecture requires peers to self-enroll:
- Setup key defines which groups the peer joins
- Peer runs netbird up --setup-key <key>
- Peer generates WireGuard keypair locally
- Peer registers with management server using its local hostname
- No API link between "which setup key was used" and "which peer enrolled"
Options
| Option | Description | Effort | Tradeoffs |
|---|---|---|---|
| A. Manual rename | Engineer renames peer in dashboard after enrollment | Zero | 30 seconds per device, human in loop |
| B. Polling service | Service watches for new peers, matches by timing/IP, renames | Medium | More infrastructure, heuristic matching |
| C. Per-user tracking groups | Unique group per user, find peer by group membership | High | Group sprawl, cleanup needed |
| D. Installer modification | Modify BlastPilot/BlastGS-Agent to set hostname before enrollment | N/A | Code freeze constraint |
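For reference, Option C could look roughly like the sketch below. It is illustrative only: the group name track-pilot-ivanov and both resource names are hypothetical, and it assumes netbird_group accepts a name argument. The setup key arguments mirror the ones used in terraform/setup_keys.tf.

```hcl
# Hypothetical per-user tracking group (Option C); the name is illustrative.
resource "netbird_group" "track_pilot_ivanov" {
  name = "track-pilot-ivanov" # assumption: `name` is the group's only required argument
}

# The key places the enrolling peer in both the role group and the tracking
# group, so the peer can later be found by tracking-group membership.
resource "netbird_setup_key" "pilot_ivanov_tracked" {
  name = "pilot-ivanov-tracked"
  type = "one-off"
  auto_groups = [
    netbird_group.pilots.id,
    netbird_group.track_pilot_ivanov.id,
  ]
  usage_limit = 1
  ephemeral   = false
}
```

Every onboarded user adds one more group under this approach, which is the group sprawl noted in the table.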
Recommendation
Option A is acceptable for ~100 operators with ticket-based workflow:
- Ticket arrives -> engineer creates PR -> PR merges -> engineer sends setup key -> operator enrolls -> engineer renames peer (30 sec)
- Total engineer time per onboarding: ~5 minutes
- No additional infrastructure
Option B is worth considering if:
- Onboarding volume increases significantly
- Full automation is required (no human in loop)
Pain Point 2: Per-User vs Per-Role Setup Keys
Current State
Setup keys are defined per-role in terraform/setup_keys.tf:
resource "netbird_setup_key" "gs_onboarding" {
  name        = "ground-station-onboarding"
  type        = "reusable"
  auto_groups = [netbird_group.ground_stations.id]
  ...
}
This means:
- One reusable key per role
- Key is shared across all operators of that role
- No way to track "this key was issued to Ivanov"
Problems
- No audit trail - Can't answer "who enrolled device X?"
- Revocation is all-or-nothing - Revoking pilot-onboarding affects everyone
- No usage attribution - Can't enforce "one device per operator"
Options
| Option | Description | Effort | Tradeoffs |
|---|---|---|---|
| A. Accept per-role keys | Current state, manual tracking in ticket system | Zero | No IaC-level audit trail |
| B. Per-user setup keys | Create key per onboarding request | Low | More keys to manage, cleanup needed |
| C. One-off keys | Each key has usage_limit = 1 | Low | Key destroyed after use, good for audit |
Recommendation
Option C (one-off keys) provides the best tradeoff:
- Create unique key per onboarding ticket
- Key stops working after its first use (usage limit reached)
- Clear audit trail: key name links to ticket number
- Easy to implement:
# Example: ticket-based one-off key
resource "netbird_setup_key" "ticket_1234_pilot" {
  name        = "ticket-1234-pilot-ivanov"
  type        = "one-off"
  auto_groups = [netbird_group.pilots.id]
  usage_limit = 1
  ephemeral   = false
}
Workflow:
- Ticket ACHILLES-1234: "Onboard pilot Ivanov"
- Engineer adds setup key ticket-1234-pilot-ivanov to Terraform
- PR merged, key created
- Engineer sends key to operator (see Pain Point 3)
- Operator uses key, it's consumed
- After enrollment, engineer renames peer to pilot-ivanov
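If one hand-written resource per ticket becomes repetitive, the same one-off keys could be generated from a map with for_each. This is a minimal sketch, not the POC's actual layout: local.onboarding_keys, its group_id field, and the resource name onboarding are hypothetical, and only setup key arguments already shown in this document are used.

```hcl
# Hypothetical map of pending onboarding requests; each PR adds one entry.
# Map keys are static strings, so Terraform can resolve them at plan time.
locals {
  onboarding_keys = {
    "ticket-1234-pilot-ivanov" = { group_id = netbird_group.pilots.id }
    "ticket-1234-gs-ivanov"    = { group_id = netbird_group.ground_stations.id }
  }
}

# One one-off setup key per map entry.
resource "netbird_setup_key" "onboarding" {
  for_each = local.onboarding_keys

  name        = each.key
  type        = "one-off"
  auto_groups = [each.value.group_id]
  usage_limit = 1
  ephemeral   = false
}
```

With this layout an onboarding PR is a one-line diff, and removing an entry in a later PR would have Terraform delete the corresponding key, which could double as the cleanup step.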
Pain Point 3: Secure Key Distribution
Problem
After CI/CD creates a setup key, how does it reach the operator?
Setup keys are sensitive:
- Anyone with the key can enroll a device into the network
- Keys may be reusable (depends on configuration)
- Keys should be transmitted securely
Current State
Setup keys are output by Terraform:
terraform output -raw gs_setup_key
But:
- Requires local Terraform access
- No automated distribution mechanism
- Keys in state file (committed to git in POC - not ideal)
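The terraform output command above relies on an output block along these lines. This is a sketch under one assumption: that the provider exposes the generated secret as a key attribute on netbird_setup_key. The description text is illustrative.

```hcl
output "gs_setup_key" {
  description = "Setup key for ground-station onboarding"

  # Assumption: the generated secret is exposed as the `key` attribute.
  value = netbird_setup_key.gs_onboarding.key

  # Redacts the value in plan/apply output; `terraform output -raw` still prints it.
  sensitive = true
}
```

Marking the output as sensitive only affects what the CLI displays. The value is still written in plain text to the state file, so the state itself needs protecting (for example by keeping it out of git).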
Options
| Option | Description | Effort | Tradeoffs |
|---|---|---|---|
| A. Manual retrieval | Engineer runs terraform output locally | Zero | Requires CLI access, manual process |
| B. CI output to ticket | CI posts key to ticket system via API | Medium | Keys in ticket history (audit trail) |
| C. Secrets manager | Store keys in Vault/1Password, notify engineer | Medium | Another system to integrate |
| D. Encrypted email | CI encrypts key, emails to operator | High | Key management complexity |
Recommendation
Option A for now (consistent with manual rename):
- Engineer retrieves key after CI completes
- Engineer sends key to operator via secure channel (Signal, encrypted email)
- Ticket updated with "key sent" status
Option B is worth implementing if:
- Volume increases
- Full automation is required
- Ticket system has secure "hidden fields" feature
Summary: Recommended Workflow
Given the constraints (code freeze, ~100 operators, ticket-based), the pragmatic workflow is:
1. Ticket created: "Onboard pilot Ivanov with BlastPilot + GS"
2. Engineer adds to Terraform:
   - ticket-1234-pilot (one-off, 7 days)
   - ticket-1234-gs (one-off, 7 days)
3. Engineer creates PR, gets review, merges
4. CI/CD applies changes, keys created
5. Engineer retrieves keys:
   terraform output -raw ticket_1234_pilot_key
6. Engineer sends keys to operator via secure channel
7. Operator enrolls both devices
8. Engineer renames peers in dashboard:
   DESKTOP-ABC123 -> pilot-ivanov
   raspberrypi -> gs-ivanov
9. Engineer closes ticket
Total engineer time: ~10 minutes per onboarding (pair of devices)
Automation level: Groups, policies, key creation automated; naming and distribution manual
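Steps 2 and 5 could translate into Terraform roughly as follows for the pilot key; the GS key would be analogous. Treat this as a sketch under stated assumptions: the expiry_seconds argument and the key attribute are assumed to be what the provider offers for expiry and for the generated secret, and the local value name is hypothetical.

```hcl
locals {
  # 7 days expressed in seconds (hypothetical helper value).
  onboarding_key_ttl_seconds = 7 * 24 * 60 * 60
}

resource "netbird_setup_key" "ticket_1234_pilot" {
  name        = "ticket-1234-pilot"
  type        = "one-off"
  auto_groups = [netbird_group.pilots.id]
  usage_limit = 1
  ephemeral   = false

  # Assumption: key lifetime is set through an `expiry_seconds` argument.
  expiry_seconds = local.onboarding_key_ttl_seconds
}

# Read in step 5 with: terraform output -raw ticket_1234_pilot_key
output "ticket_1234_pilot_key" {
  # Assumption: the generated secret is exposed as the `key` attribute.
  value     = netbird_setup_key.ticket_1234_pilot.key
  sensitive = true
}
```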
Future Improvements (If Needed)
- Webhook listener for peer enrollment events -> auto-rename based on timing correlation
- Ticket system integration for automated key distribution
- Custom installer that prompts for device name before enrollment
- Batch onboarding tool for multiple operators at once
These can be addressed incrementally as the operation scales.