# NetBird GitOps - Remaining Pain Points

This document captures challenges discovered during the POC that need resolution before production use.

## Context

**Use case:** ~100+ operators, each with 2 devices (BlastPilot + BlastGS-Agent)

**Workflow:** Ticket-based onboarding, engineer creates PR, merge triggers setup key creation

**Current pain:** Manual setup key creation and peer renaming in dashboard

---

## Pain Point 1: Peer Naming After Enrollment

### Problem

When a peer enrolls using a setup key, it appears in the NetBird dashboard with its hostname (e.g., `DESKTOP-ABC123` or `raspberrypi`). These hostnames are:

- Often generic and meaningless
- Not controllable via IaC (the peer registers itself under its own local hostname)
- Confusing when managing 100+ devices

**Desired state:** Peer appears as `pilot-ivanov` or `gs-unit-042` immediately after enrollment.

### Root Cause

NetBird's architecture requires peers to self-enroll:

1. Setup key defines which groups the peer joins
2. Peer runs `netbird up --setup-key <SETUP_KEY>`
3. Peer generates WireGuard keypair locally
4. Peer registers with management server using its local hostname
5. **No API link between "which setup key was used" and "which peer enrolled"**

### Options

| Option | Description | Effort | Tradeoffs |
|--------|-------------|--------|-----------|
| **A. Manual rename** | Engineer renames peer in dashboard after enrollment | Zero | 30 seconds per device, human in loop |
| **B. Polling service** | Service watches for new peers, matches by timing/IP, renames | Medium | More infrastructure, heuristic matching |
| **C. Per-user tracking groups** | Unique group per user, find peer by group membership | High | Group sprawl, cleanup needed |
| **D. Installer modification** | Modify BlastPilot/BlastGS-Agent to set hostname before enrollment | N/A | Code freeze constraint |

### Recommendation

**Option A** is acceptable for ~100 operators with a ticket-based workflow:

- Ticket arrives -> engineer creates PR -> PR merges -> engineer sends setup key -> operator enrolls -> **engineer renames peer (30 sec)**
- Total engineer time per onboarding: ~5 minutes
- No additional infrastructure

**Option B** is worth considering if:

- Onboarding volume increases significantly
- Full automation is required (no human in loop)

---

## Pain Point 2: Per-User vs Per-Role Setup Keys

### Current State

Setup keys are defined per-role in `terraform/setup_keys.tf`:

```hcl
resource "netbird_setup_key" "gs_onboarding" {
  name        = "ground-station-onboarding"
  type        = "reusable"
  auto_groups = [netbird_group.ground_stations.id]
  ...
}
```

This means:

- One reusable key per role
- Key is shared across all operators of that role
- No way to track "this key was issued to Ivanov"

### Problems

1. **No audit trail** - Can't answer "who enrolled device X?"
2. **Revocation is all-or-nothing** - Revoking `pilot-onboarding` affects everyone
3. **No usage attribution** - Can't enforce "one device per operator"

### Options

| Option | Description | Effort | Tradeoffs |
|--------|-------------|--------|-----------|
| **A. Accept per-role keys** | Current state, manual tracking in ticket system | Zero | No IaC-level audit trail |
| **B. Per-user setup keys** | Create key per onboarding request | Low | More keys to manage, cleanup needed |
| **C. One-off keys** | Each key has `usage_limit = 1` | Low | Key consumed after first use, good for audit |

### Recommendation

**Option C (one-off keys)** provides the best tradeoff:

- Create unique key per onboarding ticket
- Key cannot be reused after its first enrollment
- Clear audit trail: key name links to ticket number
- Easy to implement:

```hcl
# Example: ticket-based one-off key
resource "netbird_setup_key" "ticket_1234_pilot" {
  name        = "ticket-1234-pilot-ivanov"
  type        = "one-off"
  auto_groups = [netbird_group.pilots.id]
  usage_limit = 1
  ephemeral   = false
}
```

**Workflow:**

1. Ticket ACHILLES-1234: "Onboard pilot Ivanov"
2. Engineer adds setup key `ticket-1234-pilot-ivanov` to Terraform
3. PR merged, key created
4. Engineer sends key to operator (see Pain Point 3)
5. Operator uses key, it's consumed
6. After enrollment, engineer renames peer to `pilot-ivanov`

---

## Pain Point 3: Secure Key Distribution

### Problem

After CI/CD creates a setup key, how does it reach the operator?

Setup keys are sensitive:

- Anyone with the key can enroll a device into the network
- Keys may be reusable (depends on configuration)
- Keys should be transmitted securely

### Current State

Setup keys are output by Terraform:

```bash
terraform output -raw gs_setup_key
```

But:

- Requires local Terraform access
- No automated distribution mechanism
- Keys are stored in the state file (committed to git in POC - not ideal)

### Options

| Option | Description | Effort | Tradeoffs |
|--------|-------------|--------|-----------|
| **A. Manual retrieval** | Engineer runs `terraform output` locally | Zero | Requires CLI access, manual process |
| **B. CI output to ticket** | CI posts key to ticket system via API | Medium | Keys in ticket history (audit trail) |
| **C. Secrets manager** | Store keys in Vault/1Password, notify engineer | Medium | Another system to integrate |
| **D. Encrypted email** | CI encrypts key, emails to operator | High | Key management complexity |

### Recommendation

**Option A** for now (consistent with manual rename):

- Engineer retrieves key after CI completes
- Engineer sends key to operator via secure channel (Signal, encrypted email)
- Ticket updated with "key sent" status

**Option B** is worth implementing if:

- Volume increases
- Full automation is wanted
- Ticket system has a secure "hidden fields" feature

---

## Summary: Recommended Workflow

Given the constraints (code freeze, ~100 operators, ticket-based), the pragmatic workflow is:

```
1. Ticket created: "Onboard pilot Ivanov with BlastPilot + GS"
2. Engineer adds to Terraform:
   - ticket-1234-pilot (one-off, 7 days)
   - ticket-1234-gs (one-off, 7 days)
3. Engineer creates PR, gets review, merges
4. CI/CD applies changes, keys created
5. Engineer retrieves keys: terraform output -raw ticket_1234_pilot_key
6. Engineer sends keys to operator via secure channel
7. Operator enrolls both devices
8. Engineer renames peers in dashboard:
   DESKTOP-ABC123 -> pilot-ivanov
   raspberrypi -> gs-ivanov
9. Engineer closes ticket
```

**Total engineer time:** ~10 minutes per onboarding (pair of devices)

**Automation level:** Groups, policies, key creation automated; naming and distribution manual

---

## Future Improvements (If Needed)

1. **Webhook listener** for peer enrollment events -> auto-rename based on timing correlation
2. **Ticket system integration** for automated key distribution
3. **Custom installer** that prompts for device name before enrollment
4. **Batch onboarding tool** for multiple operators at once

These can be addressed incrementally as the operation scales.
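
As a first step toward batch onboarding, the one-off keys from the recommended workflow could be declared in a single map rather than one resource block per ticket. The following is a minimal sketch, not a tested configuration. The local `onboarding_keys`, the resource label `onboarding`, and the output `onboarding_setup_keys` are hypothetical names. The `expiry_seconds` argument and the exported `key` attribute are assumptions that should be checked against the NetBird provider documentation for the version in use.

```hcl
# Sketch only: names below are illustrative, not part of the POC code.
locals {
  # One entry per device to onboard: setup key name => group the peer auto-joins.
  onboarding_keys = {
    "ticket-1234-pilot" = netbird_group.pilots.id
    "ticket-1234-gs"    = netbird_group.ground_stations.id
  }
}

resource "netbird_setup_key" "onboarding" {
  for_each = local.onboarding_keys

  name        = each.key
  type        = "one-off"
  auto_groups = [each.value]
  usage_limit = 1
  ephemeral   = false

  # Assumed argument name; 7 days, matching "(one-off, 7 days)" in the workflow.
  expiry_seconds = 7 * 24 * 60 * 60
}

output "onboarding_setup_keys" {
  description = "Generated setup keys, indexed by key name."
  # Assumes the resource exports the secret as `key`.
  value     = { for name, k in netbird_setup_key.onboarding : name => k.key }
  sensitive = true
}
```

With this layout, onboarding an operator becomes a two-line change to the map, which keeps PRs small and reviewable. Because the output is a map, it would be read with `terraform output -json onboarding_setup_keys` rather than `-raw`, which only accepts string, number, and boolean values. Removing an entry after enrollment destroys the corresponding key resource, which covers the cleanup concern noted for per-user keys.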