diff --git a/PAIN_POINTS.md b/PAIN_POINTS.md index 35fabe2..9d69245 100644 --- a/PAIN_POINTS.md +++ b/PAIN_POINTS.md @@ -1,204 +1,128 @@ -# NetBird GitOps - Remaining Pain Points +# NetBird GitOps - Pain Points Status -This document captures challenges discovered during the POC that need resolution before production use. +## Summary -## Context - -**Use case:** ~100+ operators, each with 2 devices (BlastPilot + BlastGS-Agent) -**Workflow:** Ticket-based onboarding, engineer creates PR, merge triggers setup key creation -**Current pain:** Manual setup key creation and peer renaming in dashboard +| # | Pain Point | Status | +|---|------------|--------| +| 1 | Peer naming after enrollment | **SOLVED** - Watcher service | +| 2 | Per-user vs per-role setup keys | **SOLVED** - One-off keys per user | +| 3 | Secure key distribution | Documented workflow | --- -## Pain Point 1: Peer Naming After Enrollment +## Pain Point 1: Peer Naming After Enrollment - SOLVED ### Problem -When a peer enrolls using a setup key, it appears in the NetBird dashboard with its hostname (e.g., `DESKTOP-ABC123` or `raspberrypi`). These hostnames are: -- Often generic and meaningless -- Not controllable via IaC (peer generates its own keypair locally) -- Confusing when managing 100+ devices +When a peer enrolls using a setup key, it appears with its hostname (e.g., `DESKTOP-ABC123`), not a meaningful name. -**Desired state:** Peer appears as `pilot-ivanov` or `gs-unit-042` immediately after enrollment. +### Solution -### Root Cause +**Watcher service** automatically renames peers: -NetBird's architecture requires peers to self-enroll: -1. Setup key defines which groups the peer joins -2. Peer runs `netbird up --setup-key ` -3. Peer generates WireGuard keypair locally -4. Peer registers with management server using its local hostname -5. **No API link between "which setup key was used" and "which peer enrolled"** +1. Setup key name = desired peer name (e.g., `pilot-ivanov`) +2. Operator enrolls -> peer appears as `DESKTOP-ABC123` +3. Watcher detects consumed key via API polling (every 30s) +4. Watcher finds peer created around key usage time +5. Watcher renames peer to match key name -> `pilot-ivanov` -### Options +**Implementation:** `watcher/netbird_watcher.py` -| Option | Description | Effort | Tradeoffs | -|--------|-------------|--------|-----------| -| **A. Manual rename** | Engineer renames peer in dashboard after enrollment | Zero | 30 seconds per device, human in loop | -| **B. Polling service** | Service watches for new peers, matches by timing/IP, renames | Medium | More infrastructure, heuristic matching | -| **C. Per-user tracking groups** | Unique group per user, find peer by group membership | High | Group sprawl, cleanup needed | -| **D. 
Installer modification** | Modify BlastPilot/BlastGS-Agent to set hostname before enrollment | N/A | Code freeze constraint | +**Deployment:** +```bash +cd ansible/netbird-watcher +ansible-playbook -i poc-inventory.yml playbook.yml -e vault_netbird_token= +``` -### Recommendation - -**Option A** is acceptable for ~100 operators with ticket-based workflow: -- Ticket arrives -> engineer creates PR -> PR merges -> engineer sends setup key -> operator enrolls -> **engineer renames peer (30 sec)** -- Total engineer time per onboarding: ~5 minutes -- No additional infrastructure - -**Option B** worth considering if: -- Onboarding volume increases significantly -- Full automation is required (no human in loop) +**How correlation works:** +- Watcher polls `GET /api/setup-keys` for keys with `used_times > 0` +- Gets `last_used` timestamp from the key +- Polls `GET /api/peers` for peers created within 60 seconds of that timestamp +- Renames matching peer via `PUT /api/peers/{id}` +- Marks key as processed to avoid re-processing --- -## Pain Point 2: Per-User vs Per-Role Setup Keys +## Pain Point 2: Per-User vs Per-Role Setup Keys - SOLVED -### Current State +### Problem -Setup keys are defined per-role in `terraform/setup_keys.tf`: -```hcl -resource "netbird_setup_key" "gs_onboarding" { - name = "ground-station-onboarding" - type = "reusable" - auto_groups = [netbird_group.ground_stations.id] - ... -} -``` +Reusable per-role keys (e.g., `pilot-onboarding`) don't provide: +- Audit trail (who enrolled which device?) +- Individual revocation +- Usage attribution -This means: -- One reusable key per role -- Key is shared across all operators of that role -- No way to track "this key was issued to Ivanov" +### Solution -### Problems - -1. **No audit trail** - Can't answer "who enrolled device X?" -2. **Revocation is all-or-nothing** - Revoking `pilot-onboarding` affects everyone -3. **No usage attribution** - Can't enforce "one device per operator" - -### Options - -| Option | Description | Effort | Tradeoffs | -|--------|-------------|--------|-----------| -| **A. Accept per-role keys** | Current state, manual tracking in ticket system | Zero | No IaC-level audit trail | -| **B. Per-user setup keys** | Create key per onboarding request | Low | More keys to manage, cleanup needed | -| **C. One-off keys** | Each key has `usage_limit = 1` | Low | Key destroyed after use, good for audit | - -### Recommendation - -**Option C (one-off keys)** provides the best tradeoff: -- Create unique key per onboarding ticket -- Key auto-expires after first use -- Clear audit trail: key name links to ticket number -- Easy to implement: +**One-off keys per user/device:** ```hcl -# Example: ticket-based one-off key -resource "netbird_setup_key" "ticket_1234_pilot" { - name = "ticket-1234-pilot-ivanov" - type = "one-off" +resource "netbird_setup_key" "pilot_ivanov" { + name = "pilot-ivanov" + type = "one-off" # Single use auto_groups = [netbird_group.pilots.id] usage_limit = 1 ephemeral = false } ``` -**Workflow:** -1. Ticket ACHILLES-1234: "Onboard pilot Ivanov" -2. Engineer adds setup key `ticket-1234-pilot-ivanov` to Terraform -3. PR merged, key created -4. Engineer sends key to operator (see Pain Point 3) -5. Operator uses key, it's consumed -6. 
After enrollment, engineer renames peer to `pilot-ivanov` +**Benefits:** +- Key name = audit trail (linked to ticket/user) +- Key is consumed after single use +- Individual keys can be revoked before use +- Watcher uses key name as peer name automatically --- ## Pain Point 3: Secure Key Distribution -### Problem +### Current Workflow -After CI/CD creates a setup key, how does it reach the operator? +1. CI/CD creates setup key +2. Engineer retrieves key locally: `terraform output -raw pilot_ivanov_key` +3. Engineer sends key to operator via secure channel (Signal, encrypted email) +4. Operator uses key within expiry window -Setup keys are sensitive: -- Anyone with the key can enroll a device into the network -- Keys may be reusable (depends on configuration) -- Keys should be transmitted securely +### Considerations -### Current State +- Keys are sensitive - anyone with key can enroll a device +- One-off keys mitigate risk - single use, can't be reused if leaked +- Short expiry (7 days) limits exposure window -Setup keys are output by Terraform: -```bash -terraform output -raw gs_setup_key -``` +### Future Improvements (If Needed) -But: -- Requires local Terraform access -- No automated distribution mechanism -- Keys in state file (committed to git in POC - not ideal) +| Option | Description | +|--------|-------------| +| Ticket integration | CI posts key directly to ticket system | +| Secrets manager | Store in Vault/1Password, notify engineer | +| Self-service portal | Operator requests key, gets it directly | -### Options - -| Option | Description | Effort | Tradeoffs | -|--------|-------------|--------|-----------| -| **A. Manual retrieval** | Engineer runs `terraform output` locally | Zero | Requires CLI access, manual process | -| **B. CI output to ticket** | CI posts key to ticket system via API | Medium | Keys in ticket history (audit trail) | -| **C. Secrets manager** | Store keys in Vault/1Password, notify engineer | Medium | Another system to integrate | -| **D. Encrypted email** | CI encrypts key, emails to operator | High | Key management complexity | - -### Recommendation - -**Option A** for now (consistent with manual rename): -- Engineer retrieves key after CI completes -- Engineer sends key to operator via secure channel (Signal, encrypted email) -- Ticket updated with "key sent" status - -**Option B** worth implementing if: -- Volume increases -- Want full automation -- Ticket system has secure "hidden fields" feature +For ~100 operators with ticket-based workflow, manual retrieval is acceptable. --- -## Summary: Recommended Workflow - -Given the constraints (code freeze, ~100 operators, ticket-based), the pragmatic workflow is: +## Final Workflow ``` -1. Ticket created: "Onboard pilot Ivanov with BlastPilot + GS" +1. Ticket: "Onboard pilot Ivanov with BlastPilot" -2. Engineer adds to Terraform: - - ticket-1234-pilot (one-off, 7 days) - - ticket-1234-gs (one-off, 7 days) +2. Engineer adds to terraform/setup_keys.tf: + - netbird_setup_key.pilot_ivanov (one-off, 7 days) -3. Engineer creates PR, gets review, merges +3. Engineer creates PR -> CI shows plan -4. CI/CD applies changes, keys created +4. PR merged -> CI applies -> key created -5. Engineer retrieves keys: - terraform output -raw ticket_1234_pilot_key +5. Engineer retrieves: terraform output -raw pilot_ivanov_key -6. Engineer sends keys to operator via secure channel +6. Engineer sends key to operator via Signal/email -7. Operator enrolls both devices +7. Operator installs NetBird, enrolls with key -8. 
Engineer renames peers in dashboard: - DESKTOP-ABC123 -> pilot-ivanov - raspberrypi -> gs-ivanov +8. Watcher auto-renames peer to "pilot-ivanov" -9. Engineer closes ticket +9. Ticket closed ``` -**Total engineer time:** ~10 minutes per onboarding (pair of devices) -**Automation level:** Groups, policies, key creation automated; naming and distribution manual - ---- - -## Future Improvements (If Needed) - -1. **Webhook listener** for peer enrollment events -> auto-rename based on timing correlation -2. **Ticket system integration** for automated key distribution -3. **Custom installer** that prompts for device name before enrollment -4. **Batch onboarding tool** for multiple operators at once - -These can be addressed incrementally as the operation scales. +**Engineer time:** ~2 minutes (Terraform edit + key retrieval + send) +**Automation:** Full - groups, policies, keys, peer naming all automated diff --git a/README.md b/README.md index fd30ee9..c23b8e3 100644 --- a/README.md +++ b/README.md @@ -5,7 +5,7 @@ Proof-of-concept for managing NetBird VPN configuration via Infrastructure as Co ## Project Status: POC Complete **Start date:** 2026-02-15 -**Status:** Core functionality working, remaining pain points documented +**Status:** Full automation implemented including peer auto-naming ### What Works @@ -14,13 +14,7 @@ Proof-of-concept for managing NetBird VPN configuration via Infrastructure as Co - [x] Gitea Actions runner for CI/CD - [x] Terraform implementation - creates groups, policies, setup keys - [x] CI/CD pipeline - PR shows plan, merge-to-main applies changes - -### Remaining Pain Points - -See [PAIN_POINTS.md](./PAIN_POINTS.md) for detailed analysis of: -- Peer naming automation (no link between setup keys and enrolled peers) -- Per-user vs per-role setup keys -- Secure key distribution to operators +- [x] **Watcher service** - automatically renames peers based on setup key names --- @@ -29,185 +23,183 @@ See [PAIN_POINTS.md](./PAIN_POINTS.md) for detailed analysis of: ``` +-------------------+ PR/Merge +-------------------+ | Engineer | ----------------> | Gitea | -| (edits .tf) | | (gitea-poc.*) | -+-------------------+ +-------------------+ - | - | CI/CD +| (creates setup | | (CI/CD) | +| key: pilot-X) | +-------------------+ ++-------------------+ | + | terraform apply v - +-------------------+ - | Terraform | - | (in Actions) | - +-------------------+ - | - | API calls - v -+-------------------+ Enroll +-------------------+ -| Operators | ----------------> | NetBird | -| (use setup keys) | | (netbird-poc.*) | +-------------------+ +-------------------+ +| Watcher Service | <---- polls ----> | NetBird API | +| (auto-rename) | +-------------------+ ++-------------------+ ^ + | enrolls ++-------------------+ | +| Operator | -------------------------+ +| (uses setup key) | peer appears as "DESKTOP-XYZ" ++-------------------+ watcher renames to "pilot-X" ``` +## Complete Workflow + +1. **Ticket arrives:** "Onboard pilot Ivanov" +2. **Engineer adds to Terraform:** + ```hcl + resource "netbird_setup_key" "pilot_ivanov" { + name = "pilot-ivanov" # <-- This becomes the peer name + type = "one-off" + auto_groups = [netbird_group.pilots.id] + usage_limit = 1 + } + ``` +3. **Engineer creates PR** -> CI runs `terraform plan` +4. **PR merged** -> CI runs `terraform apply` -> setup key created +5. **Engineer retrieves key:** `terraform output -raw pilot_ivanov_key` +6. **Engineer sends key to operator** (via secure channel) +7. **Operator enrolls** -> peer appears as `DESKTOP-ABC123` +8. 
**Watcher detects** consumed key, renames peer to `pilot-ivanov` +9. **Done** - peer is correctly named, no manual intervention + +--- + ## Directory Structure ``` netbird-gitops-poc/ ├── ansible/ # Deployment playbooks │ ├── caddy/ # Shared reverse proxy -│ ├── gitea/ # Standalone Gitea (no OAuth) +│ ├── gitea/ # Standalone Gitea │ ├── gitea-runner/ # Gitea Actions runner -│ └── netbird/ # NetBird with embedded IdP -├── terraform/ # Terraform configuration (Gitea repo content) +│ ├── netbird/ # NetBird server +│ └── netbird-watcher/ # Peer renamer service +├── terraform/ # Terraform configuration │ ├── .gitea/workflows/ # CI/CD workflow -│ │ └── terraform.yml -│ ├── main.tf # Provider config -│ ├── variables.tf # Input variables -│ ├── groups.tf # Group resources -│ ├── policies.tf # Policy resources -│ ├── setup_keys.tf # Setup key resources -│ ├── outputs.tf # Output values -│ ├── terraform.tfstate # State (committed for POC) -│ ├── terraform.tfvars # Secrets (gitignored) -│ └── terraform.tfvars.example +│ ├── main.tf +│ ├── groups.tf +│ ├── policies.tf +│ ├── setup_keys.tf +│ └── outputs.tf +├── watcher/ # Watcher service source +│ ├── netbird_watcher.py +│ ├── netbird-watcher.service +│ └── README.md ├── README.md └── PAIN_POINTS.md ``` -## Quick Start +--- + +## Deployment ### Prerequisites - VPS with Docker - DNS records pointing to VPS - Ansible installed locally -- Terraform installed locally (for initial setup) +- Terraform installed locally -### 1. Deploy Infrastructure +### 1. Deploy Core Infrastructure ```bash -# 1. NetBird (generates secrets, needs vault password) +# NetBird cd ansible/netbird ./generate-vault.sh ansible-vault encrypt group_vars/vault.yml ansible-playbook -i poc-inventory.yml playbook-ssl.yml --ask-vault-pass -# 2. Gitea +# Gitea cd ../gitea ansible-playbook -i poc-inventory.yml playbook.yml -# 3. Caddy (reverse proxy for both) +# Caddy (reverse proxy) cd ../caddy ansible-playbook -i poc-inventory.yml playbook.yml -# 4. Gitea Runner (get token from Gitea Admin -> Actions -> Runners) +# Gitea Runner cd ../gitea-runner ansible-playbook -i poc-inventory.yml playbook.yml -e vault_gitea_runner_token= ``` -### 2. Initial Terraform Setup (Local) +### 2. Deploy Watcher Service + +```bash +cd ansible/netbird-watcher +ansible-playbook -i poc-inventory.yml playbook.yml -e vault_netbird_token= +``` + +### 3. Initialize Terraform ```bash cd terraform - -# Create tfvars with your NetBird PAT cp terraform.tfvars.example terraform.tfvars -# Edit terraform.tfvars with actual token - -# Initialize and apply +# Edit terraform.tfvars with NetBird PAT terraform init terraform apply ``` -### 3. Push to Gitea +### 4. Configure Gitea + +Push terraform directory to Gitea repo, configure secret `NETBIRD_TOKEN`. + +--- + +## Adding a New Operator + +1. Add setup key to `terraform/setup_keys.tf`: + ```hcl + resource "netbird_setup_key" "pilot_ivanov" { + name = "pilot-ivanov" + type = "one-off" + auto_groups = [netbird_group.pilots.id] + usage_limit = 1 + ephemeral = false + } + + output "pilot_ivanov_key" { + value = netbird_setup_key.pilot_ivanov.key + sensitive = true + } + ``` + +2. Commit, push, merge PR + +3. Retrieve key: + ```bash + terraform output -raw pilot_ivanov_key + ``` + +4. Send key to operator + +5. Operator enrolls -> watcher auto-renames peer + +--- + +## Monitoring + +### Watcher Service ```bash -cd terraform -git init -git add . 
-git commit -m "Initial Terraform config" -git remote add origin git@gitea-poc.networkmonitor.cc:admin/netbird-iac.git -git push -u origin main +# Status +systemctl status netbird-watcher + +# Logs +journalctl -u netbird-watcher -f + +# Processed keys +cat /var/lib/netbird-watcher/state.json ``` -### 4. Configure Gitea Secrets - -In Gitea repository Settings -> Actions -> Secrets: -- `NETBIRD_TOKEN`: Your NetBird PAT - -### 5. Make Changes via GitOps - -Edit Terraform files locally, push to create PR: - -```hcl -# groups.tf - add a new group -resource "netbird_group" "new_team" { - name = "new-team" -} -``` - -```bash -git checkout -b add-new-team -git add groups.tf -git commit -m "Add new-team group" -git push -u origin add-new-team -# Create PR in Gitea -> CI runs terraform plan -# Merge PR -> CI runs terraform apply -``` - ---- - -## CI/CD Workflow - -The `.gitea/workflows/terraform.yml` workflow: - -| Event | Action | -|-------|--------| -| Pull Request | `terraform plan` (preview changes) | -| Push to main | `terraform apply` (apply changes) | -| After apply | Commit updated state file | - -**State Management:** State is committed to git (acceptable for single-operator POC). For production, use a remote backend. - ---- - -## Key Discoveries - -### NetBird API Behavior - -1. **Peer IDs are not predictable** - Generated server-side at enrollment time -2. **No setup key -> peer link** - NetBird doesn't record which setup key enrolled a peer -3. **Peers self-enroll** - Cannot create peers via API (WireGuard keypair generated locally) -4. **Terraform URL format** - Use `https://domain.com` NOT `https://domain.com/api` - ---- - -## Credentials Reference (POC Only) - -| Service | Credential | Location | -|---------|------------|----------| -| NetBird PAT | `nbp_T3yD...` | Dashboard -> Team -> Service Users | -| Gitea | admin user | Created during setup | -| VPS | root | `observability-poc.networkmonitor.cc` | - -**Warning:** Rotate all credentials before any production use. - --- ## Cleanup ```bash # Destroy Terraform resources -cd terraform -terraform destroy +cd terraform && terraform destroy -# Stop VPS services +# Stop services on VPS ssh root@observability-poc.networkmonitor.cc +systemctl stop netbird-watcher cd /opt/caddy && docker compose down cd /opt/gitea && docker compose down cd /opt/netbird && docker compose down ``` - ---- - -## Next Steps - -See [PAIN_POINTS.md](./PAIN_POINTS.md) for remaining challenges to address before production use. diff --git a/ansible/netbird-watcher/group_vars/netbird_watcher_servers.yml b/ansible/netbird-watcher/group_vars/netbird_watcher_servers.yml new file mode 100644 index 0000000..803f368 --- /dev/null +++ b/ansible/netbird-watcher/group_vars/netbird_watcher_servers.yml @@ -0,0 +1,8 @@ +--- +# NetBird Watcher Configuration + +# NetBird management URL +netbird_url: "https://netbird-poc.networkmonitor.cc" + +# NetBird API token (set via -e vault_netbird_token=) +netbird_token: "{{ vault_netbird_token }}" diff --git a/ansible/netbird-watcher/playbook.yml b/ansible/netbird-watcher/playbook.yml new file mode 100644 index 0000000..3a38271 --- /dev/null +++ b/ansible/netbird-watcher/playbook.yml @@ -0,0 +1,117 @@ +--- +# ============================================================================= +# NetBird Watcher Deployment +# ============================================================================= +# Deploys the peer renamer watcher service. +# +# Prerequisites: +# 1. NetBird instance running +# 2. 
NetBird API token (PAT) +# +# Usage: +# ansible-playbook -i poc-inventory.yml playbook.yml -e vault_netbird_token= +# ============================================================================= + +- name: Deploy NetBird Watcher + hosts: netbird_watcher_servers + become: true + vars_files: + - group_vars/netbird_watcher_servers.yml + + pre_tasks: + - name: Validate token is provided + ansible.builtin.assert: + that: + - netbird_token is defined + - netbird_token | length > 0 + fail_msg: | + NetBird token not provided! + Run with: -e vault_netbird_token= + + tasks: + # ========================================================================= + # Create Directories + # ========================================================================= + - name: Create watcher directories + ansible.builtin.file: + path: "{{ item }}" + state: directory + mode: "0755" + loop: + - /opt/netbird-watcher + - /etc/netbird-watcher + - /var/lib/netbird-watcher + + # ========================================================================= + # Deploy Script + # ========================================================================= + - name: Copy watcher script + ansible.builtin.copy: + src: ../../watcher/netbird_watcher.py + dest: /opt/netbird-watcher/netbird_watcher.py + mode: "0755" + + # ========================================================================= + # Configure + # ========================================================================= + - name: Create config file + ansible.builtin.template: + src: templates/config.json.j2 + dest: /etc/netbird-watcher/config.json + mode: "0600" + notify: Restart netbird-watcher + + # ========================================================================= + # Systemd Service + # ========================================================================= + - name: Copy systemd service + ansible.builtin.copy: + src: ../../watcher/netbird-watcher.service + dest: /etc/systemd/system/netbird-watcher.service + mode: "0644" + notify: Restart netbird-watcher + + - name: Reload systemd + ansible.builtin.systemd: + daemon_reload: true + + - name: Start and enable watcher service + ansible.builtin.systemd: + name: netbird-watcher + state: started + enabled: true + + # ========================================================================= + # Verify + # ========================================================================= + - name: Wait for service to stabilize + ansible.builtin.pause: + seconds: 3 + + - name: Check service status + ansible.builtin.systemd: + name: netbird-watcher + register: watcher_status + + - name: Display deployment status + ansible.builtin.debug: + msg: | + ============================================ + NetBird Watcher Deployed! 
+ ============================================ + + Service status: {{ watcher_status.status.ActiveState }} + + View logs: + journalctl -u netbird-watcher -f + + State file: + /var/lib/netbird-watcher/state.json + + ============================================ + + handlers: + - name: Restart netbird-watcher + ansible.builtin.systemd: + name: netbird-watcher + state: restarted diff --git a/ansible/netbird-watcher/poc-inventory.yml b/ansible/netbird-watcher/poc-inventory.yml new file mode 100644 index 0000000..be3ae73 --- /dev/null +++ b/ansible/netbird-watcher/poc-inventory.yml @@ -0,0 +1,8 @@ +--- +all: + children: + netbird_watcher_servers: + hosts: + poc-vps: + ansible_host: observability-poc.networkmonitor.cc + ansible_user: root diff --git a/ansible/netbird-watcher/templates/config.json.j2 b/ansible/netbird-watcher/templates/config.json.j2 new file mode 100644 index 0000000..9aa1d59 --- /dev/null +++ b/ansible/netbird-watcher/templates/config.json.j2 @@ -0,0 +1,4 @@ +{ + "url": "{{ netbird_url }}", + "token": "{{ netbird_token }}" +} diff --git a/watcher/README.md b/watcher/README.md new file mode 100644 index 0000000..b21ec45 --- /dev/null +++ b/watcher/README.md @@ -0,0 +1,104 @@ +# NetBird Peer Renamer Watcher + +Automatically renames NetBird peers after enrollment based on setup key names. + +## How It Works + +1. Engineer creates setup key named after the desired peer name (e.g., `pilot-ivanov`) +2. Operator enrolls using the setup key +3. Peer appears with random hostname (e.g., `DESKTOP-ABC123`) +4. **Watcher detects the consumed setup key and renames peer to `pilot-ivanov`** + +## Logic + +The watcher polls NetBird API every 30 seconds: + +1. Fetches all setup keys +2. Finds keys with `used_times > 0` that haven't been processed +3. For each consumed key: + - Looks up `last_used` timestamp + - Finds peer created around that time (within 60 seconds) + - Renames peer to match setup key name +4. Marks key as processed to avoid re-processing + +## Installation + +### Via Ansible + +```bash +cd ansible/netbird-watcher +ansible-playbook -i poc-inventory.yml playbook.yml -e vault_netbird_token= +``` + +### Manual + +```bash +# Copy script +sudo cp netbird_watcher.py /opt/netbird-watcher/ +sudo chmod +x /opt/netbird-watcher/netbird_watcher.py + +# Create config +sudo mkdir -p /etc/netbird-watcher +sudo cat > /etc/netbird-watcher/config.json << EOF +{ + "url": "https://netbird-poc.networkmonitor.cc", + "token": "nbp_YOUR_TOKEN" +} +EOF +sudo chmod 600 /etc/netbird-watcher/config.json + +# Create state directory +sudo mkdir -p /var/lib/netbird-watcher + +# Install service +sudo cp netbird-watcher.service /etc/systemd/system/ +sudo systemctl daemon-reload +sudo systemctl enable --now netbird-watcher +``` + +## Usage + +### Check status + +```bash +systemctl status netbird-watcher +journalctl -u netbird-watcher -f +``` + +### Run manually (one-shot) + +```bash +./netbird_watcher.py \ + --url https://netbird-poc.networkmonitor.cc \ + --token nbp_xxx \ + --once \ + --verbose +``` + +### State file + +Processed keys are tracked in `/var/lib/netbird-watcher/state.json`: + +```json +{ + "processed_keys": ["key-id-1", "key-id-2"] +} +``` + +To reprocess a key, remove its ID from this file. + +## Troubleshooting + +### Peer not renamed + +1. Check if setup key was consumed: `used_times > 0` +2. Check watcher logs: `journalctl -u netbird-watcher` +3. Ensure peer enrolled within 60 seconds of key consumption +4. 
Check if key was already processed (in state.json) + +### Reset state + +```bash +sudo rm /var/lib/netbird-watcher/state.json +sudo systemctl restart netbird-watcher +``` diff --git a/watcher/config.json.example b/watcher/config.json.example new file mode 100644 index 0000000..d084e6c --- /dev/null +++ b/watcher/config.json.example @@ -0,0 +1,4 @@ +{ + "url": "https://netbird-poc.networkmonitor.cc", + "token": "nbp_YOUR_TOKEN_HERE" +} diff --git a/watcher/netbird-watcher.service b/watcher/netbird-watcher.service new file mode 100644 index 0000000..b9d6d37 --- /dev/null +++ b/watcher/netbird-watcher.service @@ -0,0 +1,19 @@ +[Unit] +Description=NetBird Peer Renamer Watcher +After=network.target + +[Service] +Type=simple +ExecStart=/opt/netbird-watcher/netbird_watcher.py --config /etc/netbird-watcher/config.json +Restart=always +RestartSec=10 + +# Security hardening +NoNewPrivileges=true +ProtectSystem=strict +ProtectHome=true +ReadWritePaths=/var/lib/netbird-watcher +PrivateTmp=true + +[Install] +WantedBy=multi-user.target diff --git a/watcher/netbird_watcher.py b/watcher/netbird_watcher.py new file mode 100644 index 0000000..323c968 --- /dev/null +++ b/watcher/netbird_watcher.py @@ -0,0 +1,348 @@ +#!/usr/bin/env python3 +""" +NetBird Peer Renamer Watcher + +Polls NetBird API for consumed setup keys and automatically renames +the enrolled peers to match the setup key name. + +Setup key name = desired peer name. + +Usage: + ./netbird_watcher.py --config /etc/netbird-watcher/config.json + ./netbird_watcher.py --url https://netbird.example.com --token nbp_xxx + +Environment variables (alternative to flags): + NETBIRD_URL - NetBird management URL + NETBIRD_TOKEN - NetBird API token (PAT) +""" + +import argparse +import json +import logging +import os +import sys +import time +from datetime import datetime, timezone +from pathlib import Path +from typing import Optional +from urllib.error import HTTPError, URLError +from urllib.request import Request, urlopen + +# ----------------------------------------------------------------------------- +# Configuration +# ----------------------------------------------------------------------------- + +DEFAULT_STATE_FILE = "/var/lib/netbird-watcher/state.json" +DEFAULT_POLL_INTERVAL = 30 # seconds +PEER_MATCH_WINDOW = 60 # seconds - how close peer creation must be to key usage + + +# ----------------------------------------------------------------------------- +# Logging +# ----------------------------------------------------------------------------- + +def setup_logging(verbose: bool = False) -> logging.Logger: + level = logging.DEBUG if verbose else logging.INFO + logging.basicConfig( + level=level, + format="%(asctime)s [%(levelname)s] %(message)s", + datefmt="%Y-%m-%d %H:%M:%S", + ) + return logging.getLogger("netbird-watcher") + + +# ----------------------------------------------------------------------------- +# State Management +# ----------------------------------------------------------------------------- + +class State: + """Tracks which setup keys have been processed.""" + + def __init__(self, state_file: str): + self.state_file = Path(state_file) + self.processed_keys: set[str] = set() + self._load() + + def _load(self): + if self.state_file.exists(): + try: + data = json.loads(self.state_file.read_text()) + self.processed_keys = set(data.get("processed_keys", [])) + except (json.JSONDecodeError, IOError) as e: + logging.warning(f"Failed to load state file: {e}") + self.processed_keys = set() + + def save(self): + 
self.state_file.parent.mkdir(parents=True, exist_ok=True) + data = {"processed_keys": list(self.processed_keys)} + self.state_file.write_text(json.dumps(data, indent=2)) + + def is_processed(self, key_id: str) -> bool: + return key_id in self.processed_keys + + def mark_processed(self, key_id: str): + self.processed_keys.add(key_id) + self.save() + + +# ----------------------------------------------------------------------------- +# NetBird API Client +# ----------------------------------------------------------------------------- + +class NetBirdAPI: + """Simple NetBird API client.""" + + def __init__(self, url: str, token: str): + self.base_url = url.rstrip("/") + self.token = token + self.logger = logging.getLogger("netbird-api") + + def _request(self, method: str, endpoint: str, data: Optional[dict] = None) -> dict: + url = f"{self.base_url}/api{endpoint}" + headers = { + "Authorization": f"Token {self.token}", + "Content-Type": "application/json", + } + + body = json.dumps(data).encode() if data else None + req = Request(url, data=body, headers=headers, method=method) + + try: + with urlopen(req, timeout=30) as resp: + return json.loads(resp.read().decode()) + except HTTPError as e: + self.logger.error(f"HTTP {e.code} for {method} {endpoint}: {e.read().decode()}") + raise + except URLError as e: + self.logger.error(f"URL error for {method} {endpoint}: {e}") + raise + + def get_setup_keys(self) -> list[dict]: + result = self._request("GET", "/setup-keys") + return result if isinstance(result, list) else [] + + def get_peers(self) -> list[dict]: + result = self._request("GET", "/peers") + return result if isinstance(result, list) else [] + + def rename_peer(self, peer_id: str, new_name: str) -> dict: + return self._request("PUT", f"/peers/{peer_id}", {"name": new_name}) + + +# ----------------------------------------------------------------------------- +# Watcher Logic +# ----------------------------------------------------------------------------- + +def parse_timestamp(ts: str) -> Optional[datetime]: + """Parse NetBird timestamp to datetime.""" + if not ts or ts.startswith("0001-01-01"): + return None + try: + # Handle various formats + ts = ts.replace("Z", "+00:00") + if "." in ts: + # Truncate nanoseconds to microseconds + parts = ts.split(".") + frac = parts[1] + tz_idx = frac.find("+") if "+" in frac else frac.find("-") if "-" in frac else len(frac) + frac_sec = frac[:tz_idx][:6] # max 6 digits for microseconds + tz_part = frac[tz_idx:] if tz_idx < len(frac) else "+00:00" + ts = f"{parts[0]}.{frac_sec}{tz_part}" + return datetime.fromisoformat(ts) + except ValueError: + return None + + +def find_matching_peer( + key_name: str, + key_last_used: datetime, + key_auto_groups: list[str], + peers: list[dict], + logger: logging.Logger, +) -> Optional[dict]: + """ + Find the peer that enrolled using this setup key. + + Strategy: + 1. Peer must be in one of the key's auto_groups + 2. Peer must have been created within PEER_MATCH_WINDOW of key usage + 3. Peer name should NOT already match key name (not yet renamed) + 4. 
Pick the closest match by creation time + """ + candidates = [] + + for peer in peers: + peer_name = peer.get("name", "") + peer_id = peer.get("id", "") + peer_groups = [g.get("id") for g in peer.get("groups", [])] + + # Skip if already renamed to target name + if peer_name == key_name: + logger.debug(f"Peer {peer_id} already named '{key_name}', skipping") + continue + + # Check group membership + if not any(g in peer_groups for g in key_auto_groups): + continue + + # Check creation time + # Note: NetBird peer object doesn't have 'created_at' directly accessible + # We use 'last_seen' or 'connected' status as proxy + # Actually, let's use the peer's 'connected' first time or 'last_seen' + + # Looking at NetBird API, peers have 'last_seen' but not always 'created_at' + # For newly enrolled peers, last_seen should be very recent + peer_last_seen = parse_timestamp(peer.get("last_seen", "")) + + if peer_last_seen: + time_diff = abs((peer_last_seen - key_last_used).total_seconds()) + if time_diff <= PEER_MATCH_WINDOW: + candidates.append((peer, time_diff)) + logger.debug( + f"Candidate peer: {peer_name} ({peer_id}), " + f"last_seen={peer_last_seen}, diff={time_diff:.1f}s" + ) + + if not candidates: + return None + + # Return closest match + candidates.sort(key=lambda x: x[1]) + return candidates[0][0] + + +def process_consumed_keys( + api: NetBirdAPI, + state: State, + logger: logging.Logger, +) -> int: + """ + Process all consumed setup keys and rename their peers. + Returns count of peers renamed. + """ + renamed_count = 0 + + try: + setup_keys = api.get_setup_keys() + peers = api.get_peers() + except Exception as e: + logger.error(f"Failed to fetch data from NetBird API: {e}") + return 0 + + for key in setup_keys: + key_id = key.get("id", "") + key_name = key.get("name", "") + used_times = key.get("used_times", 0) + last_used_str = key.get("last_used", "") + auto_groups = key.get("auto_groups", []) + + # Skip if not consumed + if used_times == 0: + continue + + # Skip if already processed + if state.is_processed(key_id): + continue + + # Skip system/default keys (optional - adjust pattern as needed) + if key_name.lower() in ("default", "all"): + state.mark_processed(key_id) + continue + + last_used = parse_timestamp(last_used_str) + if not last_used: + logger.warning(f"Key '{key_name}' ({key_id}) has invalid last_used: {last_used_str}") + state.mark_processed(key_id) + continue + + logger.info(f"Processing consumed key: '{key_name}' (used_times={used_times})") + + # Find matching peer + peer = find_matching_peer(key_name, last_used, auto_groups, peers, logger) + + if peer: + peer_id = peer.get("id", "") + old_name = peer.get("name", "unknown") + + if not peer_id: + logger.error(f"Peer has no ID, cannot rename") + state.mark_processed(key_id) + continue + + try: + api.rename_peer(peer_id, key_name) + logger.info(f"Renamed peer '{old_name}' -> '{key_name}' (id={peer_id})") + renamed_count += 1 + except Exception as e: + logger.error(f"Failed to rename peer {peer_id}: {e}") + else: + logger.warning( + f"No matching peer found for key '{key_name}'. " + f"Peer may not have enrolled yet or was already renamed." 
+ ) + + # Mark as processed regardless (to avoid infinite retries) + state.mark_processed(key_id) + + return renamed_count + + +# ----------------------------------------------------------------------------- +# Main +# ----------------------------------------------------------------------------- + +def main(): + parser = argparse.ArgumentParser(description="NetBird Peer Renamer Watcher") + parser.add_argument("--url", help="NetBird management URL") + parser.add_argument("--token", help="NetBird API token") + parser.add_argument("--config", help="Path to config JSON file") + parser.add_argument("--state-file", default=DEFAULT_STATE_FILE, help="Path to state file") + parser.add_argument("--interval", type=int, default=DEFAULT_POLL_INTERVAL, help="Poll interval in seconds") + parser.add_argument("--once", action="store_true", help="Run once and exit (for cron)") + parser.add_argument("--verbose", "-v", action="store_true", help="Verbose logging") + args = parser.parse_args() + + logger = setup_logging(args.verbose) + + # Load config + url = args.url or os.environ.get("NETBIRD_URL") + token = args.token or os.environ.get("NETBIRD_TOKEN") + + if args.config: + try: + config = json.loads(Path(args.config).read_text()) + url = url or config.get("url") + token = token or config.get("token") + except Exception as e: + logger.error(f"Failed to load config file: {e}") + sys.exit(1) + + if not url or not token: + logger.error("NetBird URL and token are required. Use --url/--token, env vars, or --config") + sys.exit(1) + + # Initialize + api = NetBirdAPI(url, token) + state = State(args.state_file) + + logger.info(f"NetBird Watcher started (url={url}, interval={args.interval}s)") + + if args.once: + # Single run mode (for cron) + count = process_consumed_keys(api, state, logger) + logger.info(f"Processed {count} peer(s)") + else: + # Daemon mode + while True: + try: + count = process_consumed_keys(api, state, logger) + if count > 0: + logger.info(f"Processed {count} peer(s) this cycle") + except Exception as e: + logger.exception(f"Error in processing cycle: {e}") + + time.sleep(args.interval) + + +if __name__ == "__main__": + main()