Prox ca546ff6d8
added netbird-watcher script
2026-02-15 19:11:39 +02:00

# NetBird GitOps - Pain Points Status
## Summary
| # | Pain Point | Status |
|---|------------|--------|
| 1 | Peer naming after enrollment | **SOLVED** - Watcher service |
| 2 | Per-user vs per-role setup keys | **SOLVED** - One-off keys per user |
| 3 | Secure key distribution | Documented workflow |
---
## Pain Point 1: Peer Naming After Enrollment - SOLVED
### Problem
When a peer enrolls using a setup key, it appears with its hostname (e.g., `DESKTOP-ABC123`), not a meaningful name.
### Solution
**Watcher service** automatically renames peers:
1. Setup key name = desired peer name (e.g., `pilot-ivanov`)
2. Operator enrolls -> peer appears as `DESKTOP-ABC123`
3. Watcher detects consumed key via API polling (every 30s)
4. Watcher finds the peer created within 60 seconds of the key's `last_used` timestamp
5. Watcher renames peer to match key name -> `pilot-ivanov`
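
Taken together, the five steps above form a polling loop. The block below is a minimal Python sketch of that loop. It is not the contents of `watcher/netbird_watcher.py`. Several details are assumptions to check against your NetBird version: the API base URL, the `Authorization: Token` header, and the set of peer fields sent back in the `PUT` body. `correlate` stands in for the key-to-peer matching described under *How correlation works*. `processed` is an in-memory set here, and the document does not say how the real watcher persists it.

```python
import json
import time
import urllib.request

# Assumptions (not taken from netbird_watcher.py): base URL, auth header format,
# and which peer settings the PUT endpoint expects alongside the new name.
API_URL = "https://api.netbird.io"  # replace with your self-hosted management URL
POLL_SECONDS = 30


def api(method, path, token, body=None):
    """Send one JSON request to the NetBird management API and decode the reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        API_URL + path,
        data=data,
        method=method,
        headers={
            "Authorization": f"Token {token}",
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def rename_body(peer, new_name):
    """Build a PUT /api/peers/{id} body that changes only the peer name.

    Assumption: the endpoint expects the peer's other settings too, so the
    current values are copied from the GET /api/peers item.
    """
    return {
        "name": new_name,
        "ssh_enabled": peer.get("ssh_enabled", False),
        "login_expiration_enabled": peer.get("login_expiration_enabled", False),
        "inactivity_expiration_enabled": peer.get("inactivity_expiration_enabled", False),
    }


def run(token, correlate, processed):
    """Poll forever; `correlate` returns (key_id, peer_id, new_name) tuples."""
    while True:
        keys = api("GET", "/api/setup-keys", token)
        peers = {p["id"]: p for p in api("GET", "/api/peers", token)}
        for key_id, peer_id, new_name in correlate(keys, list(peers.values()), processed):
            api("PUT", f"/api/peers/{peer_id}", token, rename_body(peers[peer_id], new_name))
            processed.add(key_id)  # in memory only; persist it to survive restarts
        time.sleep(POLL_SECONDS)
```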

**Implementation:** `watcher/netbird_watcher.py`

**Deployment:**
```bash
cd ansible/netbird-watcher
ansible-playbook -i poc-inventory.yml playbook.yml -e vault_netbird_token=<TOKEN>
```
**How correlation works:**
- Watcher polls `GET /api/setup-keys` for keys with `used_times > 0`
- Gets `last_used` timestamp from the key
- Fetches `GET /api/peers` and selects peers created within 60 seconds of that timestamp
- Renames matching peer via `PUT /api/peers/{id}`
- Marks key as processed to avoid re-processing
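
The correlation rules above can be written as a pure function, so they can be tested without a live API. The Python sketch below makes several assumptions. Setup-key items carry `id`, `name`, `used_times` and `last_used` as described. Timestamps are RFC 3339 strings in UTC with a `Z` suffix. The peer's creation time is read from a hypothetical `created_at` field, because the document does not name the peer attribute the watcher actually compares. As a design choice, the sketch renames only when exactly one peer falls inside the window. It does not guess between several candidates, so ambiguous cases are skipped rather than mislabeled.

```python
from datetime import datetime, timezone

WINDOW_SECONDS = 60  # correlation window described above


def parse_ts(value):
    """Parse a UTC RFC 3339 timestamp like '2026-02-15T17:11:39.123456789Z'."""
    value = value.rstrip("Z")
    if "." in value:
        # datetime only holds microseconds, so trim or pad the fraction to 6 digits
        whole, fraction = value.split(".", 1)
        value = f"{whole}.{fraction[:6].ljust(6, '0')}"
    return datetime.fromisoformat(value).replace(tzinfo=timezone.utc)


def find_renames(keys, peers, processed, window=WINDOW_SECONDS):
    """Return (key_id, peer_id, new_name) for each consumed, unprocessed key.

    `created_at` is a hypothetical peer field standing in for whatever
    creation timestamp the watcher correlates against.
    """
    renames = []
    for key in keys:
        if key["used_times"] <= 0 or key["id"] in processed:
            continue
        used_at = parse_ts(key["last_used"])
        matches = [
            peer
            for peer in peers
            if abs((parse_ts(peer["created_at"]) - used_at).total_seconds()) <= window
        ]
        # zero matches (peer not registered yet) or several (ambiguous): skip this poll
        if len(matches) == 1:
            renames.append((key["id"], matches[0]["id"], key["name"]))
    return renames
```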
---
## Pain Point 2: Per-User vs Per-Role Setup Keys - SOLVED
### Problem
Reusable per-role keys (e.g., `pilot-onboarding`) don't provide:
- Audit trail (who enrolled which device?)
- Individual revocation
- Usage attribution
### Solution
**One-off keys per user/device:**
```hcl
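resource "netbird_setup_key" "pilot_ivanov" {
  name        = "pilot-ivanov"
  type        = "one-off" # single use
  auto_groups = [netbird_group.pilots.id]
  usage_limit = 1
  ephemeral   = false # peer is kept after it goes offline
}

# Exposes the key to `terraform output -raw pilot_ivanov_key`.
# Assumes the provider returns the generated key in the sensitive `key` attribute.
output "pilot_ivanov_key" {
  value     = netbird_setup_key.pilot_ivanov.key
  sensitive = true
}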
```
**Benefits:**
- Key name = audit trail (linked to ticket/user)
- Key is consumed after single use
- Individual keys can be revoked before use
- Watcher uses key name as peer name automatically
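
The watcher copies the key name onto the peer, so it can help to check key names before a PR is merged. The Python sketch below is a hypothetical pre-merge check and is not part of the repository. It takes a list of key names and accepts only lowercase DNS-label characters (letters, digits, inner hyphens, at most 63 characters), as in `pilot-ivanov`, so the renamed peer gets a predictable name. It also flags duplicates, because two keys with the same name would produce two identically named peers and blur the audit trail.

```python
import re

# Hypothetical rule: a key name must be a lowercase DNS label (1-63 chars,
# letters/digits/hyphens, no leading or trailing hyphen), e.g. "pilot-ivanov".
KEY_NAME = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def check_key_names(names):
    """Return human-readable problems; an empty list means all names are usable."""
    problems = []
    seen = set()
    for name in names:
        if not KEY_NAME.fullmatch(name):
            problems.append(f"{name!r}: not a lowercase DNS label")
        if name in seen:
            problems.append(f"{name!r}: duplicate key name")
        seen.add(name)
    return problems
```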
---
## Pain Point 3: Secure Key Distribution
### Current Workflow
1. CI/CD creates setup key
2. Engineer retrieves key locally: `terraform output -raw pilot_ivanov_key`
3. Engineer sends key to operator via secure channel (Signal, encrypted email)
4. Operator uses key within expiry window
### Considerations
- Keys are sensitive - anyone with key can enroll a device
- One-off keys limit risk: once consumed, a key is dead, so a leaked copy is only dangerous until the operator enrolls
- Short expiry (7 days) limits how long an unused key stays valid
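
The considerations above reduce to a small predicate: a leaked key can enroll a device only while it is unrevoked, under its usage limit, and unexpired. The Python sketch below assumes setup-key objects with `revoked`, `usage_limit`, `used_times` and an already-parsed `expires` datetime. These field names are modeled on the API objects the watcher reads but are not confirmed here. The sketch also treats a `usage_limit` of 0 as unlimited, which is an assumption about reusable keys.

```python
from datetime import datetime, timedelta, timezone

KEY_LIFETIME = timedelta(days=7)  # the 7-day expiry above = 604,800 seconds


def key_still_usable(key, now):
    """True if whoever holds this key could still enroll a device with it."""
    if key["revoked"]:
        return False
    limit = key["usage_limit"]  # assumption: 0 means "no limit"
    if limit and key["used_times"] >= limit:
        return False
    return now < key["expires"]


if __name__ == "__main__":
    issued = datetime(2026, 2, 15, tzinfo=timezone.utc)
    key = {"revoked": False, "usage_limit": 1, "used_times": 0, "expires": issued + KEY_LIFETIME}
    print(key_still_usable(key, issued + timedelta(days=3)))  # prints True
```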
### Future Improvements (If Needed)
| Option | Description |
|--------|-------------|
| Ticket integration | CI posts key directly to ticket system |
| Secrets manager | Store in Vault/1Password, notify engineer |
| Self-service portal | Operator requests key, gets it directly |

For ~100 operators with a ticket-based workflow, manual retrieval is acceptable.

---
## Final Workflow
```
1. Ticket: "Onboard pilot Ivanov with BlastPilot"
2. Engineer adds to terraform/setup_keys.tf:
- netbird_setup_key.pilot_ivanov (one-off, 7 days)
3. Engineer creates PR -> CI shows plan
4. PR merged -> CI applies -> key created
5. Engineer retrieves: terraform output -raw pilot_ivanov_key
6. Engineer sends key to operator via Signal/email
7. Operator installs NetBird, enrolls with key
8. Watcher auto-renames peer to "pilot-ivanov"
9. Ticket closed
```
**Engineer time:** ~2 minutes (Terraform edit + key retrieval + send)

**Automation:** Groups, policies, keys, and peer naming are automated; only the key handoff to the operator (steps 5-6) is manual