added netbird-watcher script
All checks were successful
Terraform / terraform (push) Successful in 7s

This commit is contained in:
Prox
2026-02-15 19:11:39 +02:00
parent ec0d96f6a0
commit ca546ff6d8
10 changed files with 803 additions and 275 deletions


@@ -1,204 +1,128 @@
# NetBird GitOps - Pain Points Status
This document captures challenges discovered during the POC and how each was resolved.
## Context
**Use case:** ~100+ operators, each with 2 devices (BlastPilot + BlastGS-Agent)
**Workflow:** Ticket-based onboarding, engineer creates PR, merge triggers setup key creation
**Current pain:** Manual setup key creation and peer renaming in dashboard
| # | Pain Point | Status |
|---|------------|--------|
| 1 | Peer naming after enrollment | **SOLVED** - Watcher service |
| 2 | Per-user vs per-role setup keys | **SOLVED** - One-off keys per user |
| 3 | Secure key distribution | Documented workflow |
---
## Pain Point 1: Peer Naming After Enrollment - SOLVED
### Problem
When a peer enrolls using a setup key, it appears with its hostname (e.g., `DESKTOP-ABC123`), not a meaningful name.
**Desired state:** Peer appears as `pilot-ivanov` or `gs-unit-042` immediately after enrollment.
### Solution

**Watcher service** automatically renames peers:

1. Setup key name = desired peer name (e.g., `pilot-ivanov`)
2. Operator enrolls -> peer appears as `DESKTOP-ABC123`
3. Watcher detects the consumed key via API polling (every 30s)
4. Watcher finds the peer created around the key usage time
5. Watcher renames the peer to match the key name -> `pilot-ivanov`
**Implementation:** `watcher/netbird_watcher.py`

**Deployment:**

```bash
cd ansible/netbird-watcher
ansible-playbook -i poc-inventory.yml playbook.yml -e vault_netbird_token=<TOKEN>
```
**How correlation works:**

- Watcher polls `GET /api/setup-keys` for keys with `used_times > 0`
- Gets the `last_used` timestamp from the key
- Polls `GET /api/peers` for peers created within 60 seconds of that timestamp
- Renames the matching peer via `PUT /api/peers/{id}`
- Marks the key as processed to avoid re-processing
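The correlation step can be sketched in a few lines. This is a minimal sketch assuming the list shapes returned by `GET /api/setup-keys` and `GET /api/peers`; the production logic, which also checks group membership and tracks processed keys, lives in `watcher/netbird_watcher.py`:

```python
from datetime import datetime

PEER_MATCH_WINDOW = 60  # seconds, same window the watcher uses

def parse_ts(ts: str) -> datetime:
    # NetBird returns RFC 3339 timestamps, e.g. "2026-02-15T10:00:00Z"
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def correlate(setup_keys: list[dict], peers: list[dict]) -> dict[str, str]:
    """Map each consumed setup key's name to the id of the peer that
    most plausibly enrolled with it (closest in time, not yet renamed)."""
    matches: dict[str, str] = {}
    for key in setup_keys:
        if key.get("used_times", 0) == 0:
            continue  # key not consumed yet
        used_at = parse_ts(key["last_used"])
        candidates = [
            (abs((parse_ts(p["last_seen"]) - used_at).total_seconds()), p)
            for p in peers
            if p.get("name") != key["name"]  # skip already-renamed peers
        ]
        candidates = [c for c in candidates if c[0] <= PEER_MATCH_WINDOW]
        if candidates:
            best = min(candidates, key=lambda c: c[0])
            matches[key["name"]] = best[1]["id"]
    return matches
```

The real watcher additionally requires the peer to be in one of the key's `auto_groups` before it trusts the timing match.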
---
## Pain Point 2: Per-User vs Per-Role Setup Keys - SOLVED
### Problem

Reusable per-role keys (e.g., `pilot-onboarding`) don't provide:

- An audit trail (who enrolled which device?)
- Individual revocation
- Usage attribution
### Solution

**One-off keys per user/device:**
```hcl
resource "netbird_setup_key" "pilot_ivanov" {
  name        = "pilot-ivanov"
  type        = "one-off"  # Single use
  auto_groups = [netbird_group.pilots.id]
  usage_limit = 1
  ephemeral   = false
}
```
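Since every onboarding adds the same pair of blocks (the key plus its sensitive output), a tiny generator keeps them uniform. This is a hypothetical convenience script, not part of the repo; `netbird_group.pilots.id` is the group reference from `groups.tf`:

```python
def one_off_key_hcl(user: str, group_ref: str = "netbird_group.pilots.id") -> str:
    """Render the per-user one-off setup key and its output block."""
    res = user.replace("-", "_")  # HCL identifiers can't contain dashes
    return (
        f'resource "netbird_setup_key" "{res}" {{\n'
        f'  name        = "{user}"\n'
        f'  type        = "one-off"\n'
        f'  auto_groups = [{group_ref}]\n'
        f'  usage_limit = 1\n'
        f'  ephemeral   = false\n'
        f'}}\n\n'
        f'output "{res}_key" {{\n'
        f'  value     = netbird_setup_key.{res}.key\n'
        f'  sensitive = true\n'
        f'}}\n'
    )

if __name__ == "__main__":
    print(one_off_key_hcl("pilot-ivanov"))
```

Paste the rendered blocks into `terraform/setup_keys.tf` (and `outputs.tf`, if you keep outputs separate) as part of the onboarding PR.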
**Benefits:**

- Key name = audit trail (linked to ticket/user)
- Key is consumed after a single use
- Individual keys can be revoked before use
- Watcher uses the key name as the peer name automatically
---
## Pain Point 3: Secure Key Distribution
### Current Workflow

1. CI/CD creates the setup key
2. Engineer retrieves the key locally: `terraform output -raw pilot_ivanov_key`
3. Engineer sends the key to the operator via a secure channel (Signal, encrypted email)
4. Operator uses the key within the expiry window
### Considerations

- Keys are sensitive - anyone with a key can enroll a device
- One-off keys mitigate the risk - single use, useless once consumed even if leaked
- Short expiry (7 days) limits the exposure window
### Future Improvements (If Needed)

| Option | Description |
|--------|-------------|
| Ticket integration | CI posts key directly to ticket system |
| Secrets manager | Store in Vault/1Password, notify engineer |
| Self-service portal | Operator requests key, gets it directly |
For ~100 operators with ticket-based workflow, manual retrieval is acceptable.
---
## Final Workflow

```
1. Ticket: "Onboard pilot Ivanov with BlastPilot"
2. Engineer adds to terraform/setup_keys.tf:
   - netbird_setup_key.pilot_ivanov (one-off, 7 days)
3. Engineer creates PR -> CI shows plan
4. PR merged -> CI applies -> key created
5. Engineer retrieves: terraform output -raw pilot_ivanov_key
6. Engineer sends key to operator via Signal/email
7. Operator installs NetBird, enrolls with key
8. Watcher auto-renames peer to "pilot-ivanov"
9. Ticket closed
```

**Engineer time:** ~2 minutes (Terraform edit + key retrieval + send)
**Automation:** Full - groups, policies, keys, peer naming all automated

README.md

@@ -5,7 +5,7 @@ Proof-of-concept for managing NetBird VPN configuration via Infrastructure as Co
## Project Status: POC Complete
**Start date:** 2026-02-15
**Status:** Full automation implemented including peer auto-naming
### What Works
@@ -14,13 +14,7 @@ Proof-of-concept for managing NetBird VPN configuration via Infrastructure as Co
- [x] Gitea Actions runner for CI/CD
- [x] Terraform implementation - creates groups, policies, setup keys
- [x] CI/CD pipeline - PR shows plan, merge-to-main applies changes
- [x] **Watcher service** - automatically renames peers based on setup key names
---
@@ -29,185 +23,183 @@ See [PAIN_POINTS.md](./PAIN_POINTS.md) for detailed analysis of:
```
+-------------------+   PR/Merge    +-------------------+
| Engineer          | ------------> | Gitea             |
| (edits .tf)       |               | (CI/CD)           |
+-------------------+               +-------------------+
                                       |
                                       | terraform apply
                                       | (creates setup
                                       |  key: pilot-X)
                                       v
+-------------------+     polls     +-------------------+
| Watcher Service   | <-----------> | NetBird API       |
| (auto-rename)     |               +-------------------+
+-------------------+                  ^
                                       | enrolls
+-------------------+                  |
| Operator          | -----------------+
| (uses setup key)  |  peer appears as "DESKTOP-XYZ"
+-------------------+  watcher renames to "pilot-X"
```
## Complete Workflow
1. **Ticket arrives:** "Onboard pilot Ivanov"
2. **Engineer adds to Terraform:**
```hcl
resource "netbird_setup_key" "pilot_ivanov" {
name = "pilot-ivanov" # <-- This becomes the peer name
type = "one-off"
auto_groups = [netbird_group.pilots.id]
usage_limit = 1
}
```
3. **Engineer creates PR** -> CI runs `terraform plan`
4. **PR merged** -> CI runs `terraform apply` -> setup key created
5. **Engineer retrieves key:** `terraform output -raw pilot_ivanov_key`
6. **Engineer sends key to operator** (via secure channel)
7. **Operator enrolls** -> peer appears as `DESKTOP-ABC123`
8. **Watcher detects** consumed key, renames peer to `pilot-ivanov`
9. **Done** - peer is correctly named, no manual intervention
---
## Directory Structure
```
netbird-gitops-poc/
├── ansible/                    # Deployment playbooks
│   ├── caddy/                  # Shared reverse proxy
│   ├── gitea/                  # Standalone Gitea
│   ├── gitea-runner/           # Gitea Actions runner
│   ├── netbird/                # NetBird server
│   └── netbird-watcher/        # Peer renamer service
├── terraform/                  # Terraform configuration
│   ├── .gitea/workflows/       # CI/CD workflow
│   ├── main.tf
│   ├── groups.tf
│   ├── policies.tf
│   ├── setup_keys.tf
│   └── outputs.tf
├── watcher/                    # Watcher service source
│   ├── netbird_watcher.py
│   ├── netbird-watcher.service
│   └── README.md
├── README.md
└── PAIN_POINTS.md
```
---
## Deployment
### Prerequisites
- VPS with Docker
- DNS records pointing to VPS
- Ansible installed locally
- Terraform installed locally
### 1. Deploy Core Infrastructure
```bash
# NetBird
cd ansible/netbird
./generate-vault.sh
ansible-vault encrypt group_vars/vault.yml
ansible-playbook -i poc-inventory.yml playbook-ssl.yml --ask-vault-pass
# Gitea
cd ../gitea
ansible-playbook -i poc-inventory.yml playbook.yml
# Caddy (reverse proxy)
cd ../caddy
ansible-playbook -i poc-inventory.yml playbook.yml
# Gitea Runner (get token from Gitea Admin -> Actions -> Runners)
cd ../gitea-runner
ansible-playbook -i poc-inventory.yml playbook.yml -e vault_gitea_runner_token=<TOKEN>
```
### 2. Deploy Watcher Service
```bash
cd ansible/netbird-watcher
ansible-playbook -i poc-inventory.yml playbook.yml -e vault_netbird_token=<TOKEN>
```
### 3. Initialize Terraform
```bash
cd terraform
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with NetBird PAT
terraform init
terraform apply
```
### 4. Configure Gitea
Push terraform directory to Gitea repo, configure secret `NETBIRD_TOKEN`.
---
## Adding a New Operator
1. Add setup key to `terraform/setup_keys.tf`:
```hcl
resource "netbird_setup_key" "pilot_ivanov" {
name = "pilot-ivanov"
type = "one-off"
auto_groups = [netbird_group.pilots.id]
usage_limit = 1
ephemeral = false
}
output "pilot_ivanov_key" {
value = netbird_setup_key.pilot_ivanov.key
sensitive = true
}
```
2. Commit, push, merge PR
3. Retrieve key:
```bash
terraform output -raw pilot_ivanov_key
```
4. Send key to operator
5. Operator enrolls -> watcher auto-renames peer
---
## Monitoring
### Watcher Service
```bash
# Status
systemctl status netbird-watcher
# Logs
journalctl -u netbird-watcher -f
# Processed keys
cat /var/lib/netbird-watcher/state.json
```
## Making Changes via GitOps
Edit Terraform files locally, push to create PR:
```hcl
# groups.tf - add a new group
resource "netbird_group" "new_team" {
name = "new-team"
}
```
```bash
git checkout -b add-new-team
git add groups.tf
git commit -m "Add new-team group"
git push -u origin add-new-team
# Create PR in Gitea -> CI runs terraform plan
# Merge PR -> CI runs terraform apply
```
---
## CI/CD Workflow
The `.gitea/workflows/terraform.yml` workflow:
| Event | Action |
|-------|--------|
| Pull Request | `terraform plan` (preview changes) |
| Push to main | `terraform apply` (apply changes) |
| After apply | Commit updated state file |
**State Management:** State is committed to git (acceptable for single-operator POC). For production, use a remote backend.
---
## Key Discoveries
### NetBird API Behavior
1. **Peer IDs are not predictable** - Generated server-side at enrollment time
2. **No setup key -> peer link** - NetBird doesn't record which setup key enrolled a peer
3. **Peers self-enroll** - Cannot create peers via API (WireGuard keypair generated locally)
4. **Terraform URL format** - Use `https://domain.com` NOT `https://domain.com/api`
---
## Credentials Reference (POC Only)
| Service | Credential | Location |
|---------|------------|----------|
| NetBird PAT | `nbp_T3yD...` | Dashboard -> Team -> Service Users |
| Gitea | admin user | Created during setup |
| VPS | root | `observability-poc.networkmonitor.cc` |
**Warning:** Rotate all credentials before any production use.
---
## Cleanup
```bash
# Destroy Terraform resources
cd terraform && terraform destroy

# Stop services on VPS
ssh root@observability-poc.networkmonitor.cc
systemctl stop netbird-watcher
cd /opt/caddy && docker compose down
cd /opt/gitea && docker compose down
cd /opt/netbird && docker compose down
```
---
## Next Steps
See [PAIN_POINTS.md](./PAIN_POINTS.md) for the status of each pain point. Before production use, rotate all credentials and move Terraform state to a remote backend.


@@ -0,0 +1,8 @@
---
# NetBird Watcher Configuration
# NetBird management URL
netbird_url: "https://netbird-poc.networkmonitor.cc"
# NetBird API token (set via -e vault_netbird_token=<TOKEN>)
netbird_token: "{{ vault_netbird_token }}"


@@ -0,0 +1,117 @@
---
# =============================================================================
# NetBird Watcher Deployment
# =============================================================================
# Deploys the peer renamer watcher service.
#
# Prerequisites:
# 1. NetBird instance running
# 2. NetBird API token (PAT)
#
# Usage:
# ansible-playbook -i poc-inventory.yml playbook.yml -e vault_netbird_token=<TOKEN>
# =============================================================================
- name: Deploy NetBird Watcher
hosts: netbird_watcher_servers
become: true
vars_files:
- group_vars/netbird_watcher_servers.yml
pre_tasks:
- name: Validate token is provided
ansible.builtin.assert:
that:
- netbird_token is defined
- netbird_token | length > 0
fail_msg: |
NetBird token not provided!
Run with: -e vault_netbird_token=<TOKEN>
tasks:
# =========================================================================
# Create Directories
# =========================================================================
- name: Create watcher directories
ansible.builtin.file:
path: "{{ item }}"
state: directory
mode: "0755"
loop:
- /opt/netbird-watcher
- /etc/netbird-watcher
- /var/lib/netbird-watcher
# =========================================================================
# Deploy Script
# =========================================================================
- name: Copy watcher script
ansible.builtin.copy:
src: ../../watcher/netbird_watcher.py
dest: /opt/netbird-watcher/netbird_watcher.py
mode: "0755"
# =========================================================================
# Configure
# =========================================================================
- name: Create config file
ansible.builtin.template:
src: templates/config.json.j2
dest: /etc/netbird-watcher/config.json
mode: "0600"
notify: Restart netbird-watcher
# =========================================================================
# Systemd Service
# =========================================================================
- name: Copy systemd service
ansible.builtin.copy:
src: ../../watcher/netbird-watcher.service
dest: /etc/systemd/system/netbird-watcher.service
mode: "0644"
notify: Restart netbird-watcher
- name: Reload systemd
ansible.builtin.systemd:
daemon_reload: true
- name: Start and enable watcher service
ansible.builtin.systemd:
name: netbird-watcher
state: started
enabled: true
# =========================================================================
# Verify
# =========================================================================
- name: Wait for service to stabilize
ansible.builtin.pause:
seconds: 3
- name: Check service status
ansible.builtin.systemd:
name: netbird-watcher
register: watcher_status
- name: Display deployment status
ansible.builtin.debug:
msg: |
============================================
NetBird Watcher Deployed!
============================================
Service status: {{ watcher_status.status.ActiveState }}
View logs:
journalctl -u netbird-watcher -f
State file:
/var/lib/netbird-watcher/state.json
============================================
handlers:
- name: Restart netbird-watcher
ansible.builtin.systemd:
name: netbird-watcher
state: restarted


@@ -0,0 +1,8 @@
---
all:
children:
netbird_watcher_servers:
hosts:
poc-vps:
ansible_host: observability-poc.networkmonitor.cc
ansible_user: root


@@ -0,0 +1,4 @@
{
"url": "{{ netbird_url }}",
"token": "{{ netbird_token }}"
}

watcher/README.md

@@ -0,0 +1,104 @@
# NetBird Peer Renamer Watcher
Automatically renames NetBird peers after enrollment based on setup key names.
## How It Works
1. Engineer creates setup key named after the desired peer name (e.g., `pilot-ivanov`)
2. Operator enrolls using the setup key
3. Peer appears with its machine hostname (e.g., `DESKTOP-ABC123`)
4. **Watcher detects the consumed setup key and renames peer to `pilot-ivanov`**
## Logic
The watcher polls NetBird API every 30 seconds:
1. Fetches all setup keys
2. Finds keys with `used_times > 0` that haven't been processed
3. For each consumed key:
- Looks up `last_used` timestamp
- Finds peer created around that time (within 60 seconds)
- Renames peer to match setup key name
4. Marks key as processed to avoid re-processing
## Installation
### Via Ansible
```bash
cd ansible/netbird-watcher
ansible-playbook -i poc-inventory.yml playbook.yml -e vault_netbird_token=<TOKEN>
```
### Manual
```bash
# Copy script
sudo cp netbird_watcher.py /opt/netbird-watcher/
sudo chmod +x /opt/netbird-watcher/netbird_watcher.py
# Create config
sudo mkdir -p /etc/netbird-watcher
sudo tee /etc/netbird-watcher/config.json > /dev/null << EOF
{
"url": "https://netbird-poc.networkmonitor.cc",
"token": "nbp_YOUR_TOKEN"
}
EOF
sudo chmod 600 /etc/netbird-watcher/config.json
# Create state directory
sudo mkdir -p /var/lib/netbird-watcher
# Install service
sudo cp netbird-watcher.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now netbird-watcher
```
## Usage
### Check status
```bash
systemctl status netbird-watcher
journalctl -u netbird-watcher -f
```
### Run manually (one-shot)
```bash
./netbird_watcher.py \
--url https://netbird-poc.networkmonitor.cc \
--token nbp_xxx \
--once \
--verbose
```
### State file
Processed keys are tracked in `/var/lib/netbird-watcher/state.json`:
```json
{
"processed_keys": ["key-id-1", "key-id-2"]
}
```
To reprocess a key, remove its ID from this file.
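Editing the file by hand is error-prone once it holds many entries; a small helper can drop a single ID safely. This is a hypothetical convenience, not shipped with the watcher; the path matches the default state file:

```python
import json
from pathlib import Path

STATE = Path("/var/lib/netbird-watcher/state.json")  # watcher's default state file

def reprocess(state_path: Path, key_id: str) -> bool:
    """Remove key_id from processed_keys; return True if it was present."""
    data = json.loads(state_path.read_text())
    keys = data.get("processed_keys", [])
    if key_id not in keys:
        return False
    keys.remove(key_id)
    state_path.write_text(json.dumps({"processed_keys": keys}, indent=2))
    return True

# Example: reprocess(STATE, "key-id-1"), then restart the service.
```

Restart the service afterwards (`systemctl restart netbird-watcher`) so the running process reloads its state.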
## Troubleshooting
### Peer not renamed
1. Check if setup key was consumed: `used_times > 0`
2. Check watcher logs: `journalctl -u netbird-watcher`
3. Ensure peer enrolled within 60 seconds of key consumption
4. Check if key was already processed (in state.json)
### Reset state
```bash
sudo rm /var/lib/netbird-watcher/state.json
sudo systemctl restart netbird-watcher
```


@@ -0,0 +1,4 @@
{
"url": "https://netbird-poc.networkmonitor.cc",
"token": "nbp_YOUR_TOKEN_HERE"
}


@@ -0,0 +1,19 @@
[Unit]
Description=NetBird Peer Renamer Watcher
After=network.target
[Service]
Type=simple
ExecStart=/opt/netbird-watcher/netbird_watcher.py --config /etc/netbird-watcher/config.json
Restart=always
RestartSec=10
# Security hardening
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/lib/netbird-watcher
PrivateTmp=true
[Install]
WantedBy=multi-user.target

watcher/netbird_watcher.py

@@ -0,0 +1,348 @@
#!/usr/bin/env python3
"""
NetBird Peer Renamer Watcher
Polls NetBird API for consumed setup keys and automatically renames
the enrolled peers to match the setup key name.
Setup key name = desired peer name.
Usage:
./netbird_watcher.py --config /etc/netbird-watcher/config.json
./netbird_watcher.py --url https://netbird.example.com --token nbp_xxx
Environment variables (alternative to flags):
NETBIRD_URL - NetBird management URL
NETBIRD_TOKEN - NetBird API token (PAT)
"""
import argparse
import json
import logging
import os
import sys
import time
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen
# -----------------------------------------------------------------------------
# Configuration
# -----------------------------------------------------------------------------
DEFAULT_STATE_FILE = "/var/lib/netbird-watcher/state.json"
DEFAULT_POLL_INTERVAL = 30 # seconds
PEER_MATCH_WINDOW = 60 # seconds - how close peer creation must be to key usage
# -----------------------------------------------------------------------------
# Logging
# -----------------------------------------------------------------------------
def setup_logging(verbose: bool = False) -> logging.Logger:
level = logging.DEBUG if verbose else logging.INFO
logging.basicConfig(
level=level,
format="%(asctime)s [%(levelname)s] %(message)s",
datefmt="%Y-%m-%d %H:%M:%S",
)
return logging.getLogger("netbird-watcher")
# -----------------------------------------------------------------------------
# State Management
# -----------------------------------------------------------------------------
class State:
"""Tracks which setup keys have been processed."""
def __init__(self, state_file: str):
self.state_file = Path(state_file)
self.processed_keys: set[str] = set()
self._load()
def _load(self):
if self.state_file.exists():
try:
data = json.loads(self.state_file.read_text())
self.processed_keys = set(data.get("processed_keys", []))
except (json.JSONDecodeError, IOError) as e:
logging.warning(f"Failed to load state file: {e}")
self.processed_keys = set()
def save(self):
self.state_file.parent.mkdir(parents=True, exist_ok=True)
data = {"processed_keys": list(self.processed_keys)}
self.state_file.write_text(json.dumps(data, indent=2))
def is_processed(self, key_id: str) -> bool:
return key_id in self.processed_keys
def mark_processed(self, key_id: str):
self.processed_keys.add(key_id)
self.save()
# -----------------------------------------------------------------------------
# NetBird API Client
# -----------------------------------------------------------------------------
class NetBirdAPI:
"""Simple NetBird API client."""
def __init__(self, url: str, token: str):
self.base_url = url.rstrip("/")
self.token = token
self.logger = logging.getLogger("netbird-api")
def _request(self, method: str, endpoint: str, data: Optional[dict] = None) -> dict:
url = f"{self.base_url}/api{endpoint}"
headers = {
"Authorization": f"Token {self.token}",
"Content-Type": "application/json",
}
body = json.dumps(data).encode() if data else None
req = Request(url, data=body, headers=headers, method=method)
try:
with urlopen(req, timeout=30) as resp:
return json.loads(resp.read().decode())
except HTTPError as e:
self.logger.error(f"HTTP {e.code} for {method} {endpoint}: {e.read().decode()}")
raise
except URLError as e:
self.logger.error(f"URL error for {method} {endpoint}: {e}")
raise
def get_setup_keys(self) -> list[dict]:
result = self._request("GET", "/setup-keys")
return result if isinstance(result, list) else []
def get_peers(self) -> list[dict]:
result = self._request("GET", "/peers")
return result if isinstance(result, list) else []
def rename_peer(self, peer_id: str, new_name: str) -> dict:
return self._request("PUT", f"/peers/{peer_id}", {"name": new_name})
# -----------------------------------------------------------------------------
# Watcher Logic
# -----------------------------------------------------------------------------
def parse_timestamp(ts: str) -> Optional[datetime]:
"""Parse NetBird timestamp to datetime."""
if not ts or ts.startswith("0001-01-01"):
return None
try:
# Handle various formats
ts = ts.replace("Z", "+00:00")
if "." in ts:
# Truncate nanoseconds to microseconds
parts = ts.split(".")
frac = parts[1]
tz_idx = frac.find("+") if "+" in frac else frac.find("-") if "-" in frac else len(frac)
frac_sec = frac[:tz_idx][:6] # max 6 digits for microseconds
tz_part = frac[tz_idx:] if tz_idx < len(frac) else "+00:00"
ts = f"{parts[0]}.{frac_sec}{tz_part}"
return datetime.fromisoformat(ts)
except ValueError:
return None
def find_matching_peer(
key_name: str,
key_last_used: datetime,
key_auto_groups: list[str],
peers: list[dict],
logger: logging.Logger,
) -> Optional[dict]:
"""
Find the peer that enrolled using this setup key.
Strategy:
1. Peer must be in one of the key's auto_groups
2. Peer must have been created within PEER_MATCH_WINDOW of key usage
3. Peer name should NOT already match key name (not yet renamed)
4. Pick the closest match by creation time
"""
candidates = []
for peer in peers:
peer_name = peer.get("name", "")
peer_id = peer.get("id", "")
peer_groups = [g.get("id") for g in peer.get("groups", [])]
# Skip if already renamed to target name
if peer_name == key_name:
logger.debug(f"Peer {peer_id} already named '{key_name}', skipping")
continue
# Check group membership
if not any(g in peer_groups for g in key_auto_groups):
continue
# Check creation time. The NetBird peer object doesn't expose
# created_at, so use last_seen as a proxy: for a freshly
# enrolled peer, last_seen is within seconds of enrollment.
peer_last_seen = parse_timestamp(peer.get("last_seen", ""))
if peer_last_seen:
time_diff = abs((peer_last_seen - key_last_used).total_seconds())
if time_diff <= PEER_MATCH_WINDOW:
candidates.append((peer, time_diff))
logger.debug(
f"Candidate peer: {peer_name} ({peer_id}), "
f"last_seen={peer_last_seen}, diff={time_diff:.1f}s"
)
if not candidates:
return None
# Return closest match
candidates.sort(key=lambda x: x[1])
return candidates[0][0]
def process_consumed_keys(
api: NetBirdAPI,
state: State,
logger: logging.Logger,
) -> int:
"""
Process all consumed setup keys and rename their peers.
Returns count of peers renamed.
"""
renamed_count = 0
try:
setup_keys = api.get_setup_keys()
peers = api.get_peers()
except Exception as e:
logger.error(f"Failed to fetch data from NetBird API: {e}")
return 0
for key in setup_keys:
key_id = key.get("id", "")
key_name = key.get("name", "")
used_times = key.get("used_times", 0)
last_used_str = key.get("last_used", "")
auto_groups = key.get("auto_groups", [])
# Skip if not consumed
if used_times == 0:
continue
# Skip if already processed
if state.is_processed(key_id):
continue
# Skip system/default keys (optional - adjust pattern as needed)
if key_name.lower() in ("default", "all"):
state.mark_processed(key_id)
continue
last_used = parse_timestamp(last_used_str)
if not last_used:
logger.warning(f"Key '{key_name}' ({key_id}) has invalid last_used: {last_used_str}")
state.mark_processed(key_id)
continue
logger.info(f"Processing consumed key: '{key_name}' (used_times={used_times})")
# Find matching peer
peer = find_matching_peer(key_name, last_used, auto_groups, peers, logger)
if peer:
peer_id = peer.get("id", "")
old_name = peer.get("name", "unknown")
if not peer_id:
logger.error("Peer has no ID, cannot rename")
state.mark_processed(key_id)
continue
try:
api.rename_peer(peer_id, key_name)
logger.info(f"Renamed peer '{old_name}' -> '{key_name}' (id={peer_id})")
renamed_count += 1
except Exception as e:
logger.error(f"Failed to rename peer {peer_id}: {e}")
else:
logger.warning(
f"No matching peer found for key '{key_name}'. "
f"Peer may not have enrolled yet or was already renamed."
)
# Mark as processed regardless (to avoid infinite retries)
state.mark_processed(key_id)
return renamed_count
# -----------------------------------------------------------------------------
# Main
# -----------------------------------------------------------------------------
def main():
parser = argparse.ArgumentParser(description="NetBird Peer Renamer Watcher")
parser.add_argument("--url", help="NetBird management URL")
parser.add_argument("--token", help="NetBird API token")
parser.add_argument("--config", help="Path to config JSON file")
parser.add_argument("--state-file", default=DEFAULT_STATE_FILE, help="Path to state file")
parser.add_argument("--interval", type=int, default=DEFAULT_POLL_INTERVAL, help="Poll interval in seconds")
parser.add_argument("--once", action="store_true", help="Run once and exit (for cron)")
parser.add_argument("--verbose", "-v", action="store_true", help="Verbose logging")
args = parser.parse_args()
logger = setup_logging(args.verbose)
# Load config
url = args.url or os.environ.get("NETBIRD_URL")
token = args.token or os.environ.get("NETBIRD_TOKEN")
if args.config:
try:
config = json.loads(Path(args.config).read_text())
url = url or config.get("url")
token = token or config.get("token")
except Exception as e:
logger.error(f"Failed to load config file: {e}")
sys.exit(1)
if not url or not token:
logger.error("NetBird URL and token are required. Use --url/--token, env vars, or --config")
sys.exit(1)
# Initialize
api = NetBirdAPI(url, token)
state = State(args.state_file)
logger.info(f"NetBird Watcher started (url={url}, interval={args.interval}s)")
if args.once:
# Single run mode (for cron)
count = process_consumed_keys(api, state, logger)
logger.info(f"Processed {count} peer(s)")
else:
# Daemon mode
while True:
try:
count = process_consumed_keys(api, state, logger)
if count > 0:
logger.info(f"Processed {count} peer(s) this cycle")
except Exception as e:
logger.exception(f"Error in processing cycle: {e}")
time.sleep(args.interval)
if __name__ == "__main__":
main()