AgileTV CDN Manager (ESB3027)
- 1: Getting Started
- 2: System Requirements Guide
- 3: Networking Guide
- 4: Architecture Guide
- 5: Installation Guide
- 5.1: Installation Checklist
- 5.2: Single-Node Installation
- 5.3: Multi-Node Installation
- 5.4: Air-Gapped Deployment Guide
- 5.5: Upgrade Guide
- 5.6: Next Steps
- 6: Configuration Guide
- 7: Performance Tuning Guide
- 8: Operations Guide
- 9: Metrics & Monitoring Guide
- 10: API Guide
- 10.1: Authentication API
- 10.2: Health API
- 10.3: Selection Input API
- 10.4: Data Store API
- 10.5: Subnets API
- 10.6: Routing API
- 10.7: Discovery API
- 10.8: Metrics API
- 10.9: Configuration API
- 10.10: Operator UI API
- 10.11: OpenAPI Specification
- 11: Troubleshooting Guide
- 12: Glossary
1 - Getting Started
Overview
The AgileTV CDN Manager (product code ESB3027) is a cloud-native control plane for managing CDN deployments. It provides centralized orchestration for authentication, configuration, routing, and metrics collection across CDN infrastructure.
Before You Start:
- Deployment type: Lab (single-node) or Production (multi-node)? See Installation Guide
- Hardware: Nodes meeting specifications for your deployment type
- OS: RHEL 9 or compatible clone (Oracle Linux, AlmaLinux, Rocky Linux)
- Software: Installation ISO from AgileTV customer portal; Extras ISO for air-gapped
- Network: Firewall ports configured per Networking Guide
Deployment Models
| Deployment Model | Description | Typical Use Case |
|---|---|---|
| Self-Hosted | K3s Kubernetes cluster on customer premises | Production deployments |
| Lab/Single-Node | Minimal single-node installation | Acceptance testing, demonstrations, development |
Functionality remains consistent across deployment models.
Prerequisites
- Installation ISO: Obtain esb3027-acd-manager-X.Y.Z.iso from the AgileTV customer portal
- Extras ISO (air-gapped): Obtain esb3027-acd-manager-extras-X.Y.Z.iso for offline installations
- OS: RHEL 9 or compatible clone (Oracle Linux, AlmaLinux, Rocky Linux)
- Kubernetes familiarity: Basic understanding of pods, deployments, and Helm charts
For detailed hardware, network, and operating system requirements, see the System Requirements Guide.
Installation
Ready to install? The Installation Guide provides step-by-step procedures for both lab and production deployments:
- Lab/Single-Node: Quick deployment for testing and demonstrations
- Production/Multi-Node: High-availability cluster with 3+ nodes
See the Installation Guide to get started.
Accessing the System
After a successful deployment, the following interfaces are available:
| Service | URL Path | Authentication |
|---|---|---|
| MIB Frontend | /gui | Zitadel SSO |
| API Gateway | /api | Bearer token |
| Zitadel Console | /ui/console | See Glossary |
| Grafana | /grafana | See Glossary |
All services are accessed via https://<cluster-host><path>.
Note: A self-signed SSL certificate is deployed by default. When accessing services through a browser, you will need to accept the self-signed certificate warning. For production deployments, configure a valid SSL certificate before exposing the system to users.
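As a quick sketch, the table above can be combined into full service URLs; the hostname below is a placeholder, and the health sub-path in the comment is left unspecified:

```shell
# Placeholder hostname -- replace with your cluster's FQDN.
CLUSTER_HOST="cdn-manager.example.com"

# Build the full URL for each documented path.
urls=""
for path in /gui /api /ui/console /grafana; do
  urls="${urls}https://${CLUSTER_HOST}${path}\n"
done
printf "%b" "$urls"

# With the default self-signed certificate, curl needs -k (insecure), e.g.:
#   curl -k "https://${CLUSTER_HOST}/api/v1/health/..."
```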
Initial user configuration is performed through Zitadel. Refer to the Configuration Guide for authentication setup procedures. For detailed guidance on managing users, roles, and permissions in the Zitadel Console, see Zitadel’s User Management Documentation.
Documentation Navigation
The following guides provide detailed information for specific operational tasks:
| Guide | Description |
|---|---|
| System Requirements | Hardware, operating system, and network specifications |
| Architecture | Detailed system architecture and scaling guidance |
| Installation | Step-by-step installation and upgrade procedures |
| Configuration | System configuration and customization |
| Performance Tuning | Optimization tips for improved performance |
| API Guide | REST API reference and integration examples |
| Operations | Day-to-day operational procedures |
| Metrics & Monitoring | Monitoring dashboards and alerting configuration |
| Troubleshooting | Common issues and resolution procedures |
| Glossary | Definitions of technical terms |
| Release Notes | Version-specific changes and known issues |
2 - System Requirements Guide
Overview
This document specifies the hardware, operating system, and networking requirements for deploying the AgileTV CDN Manager (ESB3027). Requirements vary based on deployment type and node role within the cluster.
Cluster Sizing
Production Deployments
Production deployments require a minimum of three nodes to achieve high availability. The cluster architecture employs distinct node roles:
| Role | Description |
|---|---|
| Server Node (Control Plane Only) | Runs control plane components (etcd, Kubernetes API server) only; does not host application workloads; requires separate Agent nodes |
| Server Node (Combined) | Runs control plane components and hosts application workloads; default configuration |
| Agent Node | Executes application workloads only; does not participate in cluster quorum |
Server nodes can be deployed in either Control Plane Only or Combined role configurations. The choice depends on your deployment requirements:
- Control Plane Only: Dedicated control plane nodes with lower resource requirements; requires separate Agent nodes for workloads
- Combined: Server nodes run both control plane and workloads; minimum 3 nodes required for HA
Why Use Control Plane Only Nodes?
Dedicated Control Plane Only nodes provide several benefits for larger deployments:
- Resource Isolation: Control plane components (etcd, API server, scheduler) run on dedicated hardware without competing with application workloads for CPU and memory
- Stability: Application workload spikes or misbehaving pods cannot impact control plane performance
- Security: Smaller attack surface on control plane nodes; fewer containers and services running
- Predictable Performance: Control plane responsiveness remains consistent regardless of application load
- Flexible Sizing: Control Plane Only nodes can use lower-specification hardware (2 cores, 4 GiB) since they don’t run application workloads
For most small to medium deployments, Combined role servers are simpler and more cost-effective. Control Plane Only nodes are recommended for larger deployments with significant workload requirements or where control plane stability is critical.
High Availability Considerations
Production deployments require 3 nodes running control plane (etcd) and 3 nodes capable of running workloads. These can be the same nodes (Combined role) or separate nodes (CP-Only + Agent).
Node Role Combinations:
| Configuration | Control Plane Nodes | Workload Nodes | Total Nodes |
|---|---|---|---|
| All Combined | 3 Combined servers | 3 Combined servers | 3 |
| Separated | 3 CP-Only servers | 3 Agent nodes | 6 |
| Hybrid | 2 CP-Only + 1 Combined | 1 Combined + 2 Agent | 5 |
Any combination works as long as you have 3 control plane nodes and 3 workload-capable nodes.
Note: Regardless of the deployment configuration, a minimum of 3 nodes capable of running workloads is required for production deployments. This ensures both high availability and sufficient capacity for application pods.
For detailed fault tolerance information and data replication strategies, see the Architecture Guide.
Hardware Requirements
Single-Node Lab Deployment
Lab deployments are intended for acceptance testing, demonstrations, and development only. These configurations are not suitable for production workloads.
| Resource | Minimum | Recommended |
|---|---|---|
| CPU | 8 cores | 12 cores |
| Memory | 16 GiB | 24 GiB |
| Disk* | 128 GiB | 128 GiB |
Production Cluster - Server Node (Control Plane Only)
Server nodes dedicated to control plane functions have modest resource requirements:
| Resource | Minimum | Recommended |
|---|---|---|
| CPU | 2 cores | 4 cores |
| Memory | 4 GiB | 8 GiB |
| Disk* | 64 GiB | 128 GiB |
These nodes run only control plane components and require separate Agent nodes to run application workloads.
Production Cluster - Server Node (Control Plane + Workloads)
Combined role nodes require resources for both control plane and application workloads:
| Resource | Minimum | Recommended |
|---|---|---|
| CPU | 16 cores | 24 cores |
| Memory | 32 GiB | 48 GiB |
| Disk* | 256 GiB | 256 GiB |
Production Cluster - Agent Node
Agent nodes execute application workloads and require the following resources:
| Resource | Minimum | Recommended |
|---|---|---|
| CPU | 4 cores | 8 cores |
| Memory | 6 GiB | 16 GiB |
| Disk* | 64 GiB | 128 GiB |
Storage Notes
* Disk Space: All disk space values must be available in the /var/lib/longhorn partition. It is recommended that /var/lib/longhorn be a separate partition on a fast SSD for optimal performance, though SSD is not strictly required.

Longhorn Capacity: Longhorn storage requires an additional 30% capacity headroom for internal operations and scaling. If less than 30% of the total partition capacity is available, Longhorn may mark volumes as “full” and prevent further writes. Plan disk capacity accordingly.
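The 30% rule can be checked with simple arithmetic; a minimal sketch with the threshold shown against literal numbers (the df command in the comment assumes GNU coreutils):

```shell
# Sketch: verify the 30% free-space rule for /var/lib/longhorn.
has_headroom() {
  total_gib=$1   # total partition capacity in GiB
  free_gib=$2    # currently available space in GiB
  # Longhorn wants free space >= 30% of total capacity.
  if [ $(( free_gib * 100 )) -ge $(( total_gib * 30 )) ]; then
    echo ok
  else
    echo full-risk
  fi
}

has_headroom 128 50   # ~39% free
has_headroom 128 20   # ~16% free

# On a live node, the inputs come from (GNU df):
#   df -BG --output=size,avail /var/lib/longhorn
```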
Storage Performance
For optimal performance, the following storage characteristics are recommended:
- Disk Type: SSD or NVMe storage for Longhorn volumes
- Filesystem: XFS or ext4 with default mount options
- Partition Layout: Dedicated /var/lib/longhorn partition for persistent storage
Virtual machines and bare-metal hardware are both supported. Nested virtualization (running multiple nodes under a single hypervisor) may impact performance and is not recommended for production deployments.
Operating System Requirements
Supported Operating Systems
The CDN Manager supports Red Hat Enterprise Linux and compatible distributions:
| Operating System | Status |
|---|---|
| Red Hat Enterprise Linux 9 | Supported |
| Red Hat Enterprise Linux 10 | Untested |
| Red Hat Enterprise Linux 8 | Not supported |
Compatible Clones
The following RHEL-compatible distributions are supported when major version requirements are satisfied:
- Oracle Linux 9
- AlmaLinux 9
- Rocky Linux 9
Air-Gapped Deployments
Important: For air-gapped deployments (no internet access), the OS installation ISO must be mounted on all nodes before running the installer or join commands. The installer needs to install one or more packages from the distribution’s repository.
Oracle Linux UEK Kernel
Note: For Oracle Linux 9.7 and later using the Unbreakable Enterprise Kernel (UEK), you must install the kernel-uek-modules-extra-netfilter-$(uname -r) package before running the installer:

# Mount OS ISO first (required for air-gapped)
mount -o loop /path/to/oracle-linux-9.iso /mnt/iso
# Install required kernel modules
dnf install kernel-uek-modules-extra-netfilter-$(uname -r)

This package provides netfilter kernel modules required by K3s and Longhorn.
SELinux
SELinux is supported when installed in “Enforcing” mode. The installation process will configure appropriate SELinux policies automatically.
Networking Requirements
Network Interface
Each cluster node must have at least one network interface card (NIC) with a configured default gateway. If the node lacks a pre-configured default route, one must be established prior to installation.
Port Requirements
The cluster requires the following network connectivity:
| Category | Ports | Purpose |
|---|---|---|
| Inter-Node | 2379-2380, 6443, 8472/UDP, 10250, 5001, 9500, 8500 | etcd, API server, Flannel VXLAN, Kubelet, Spegel, Longhorn |
| External Access | 80, 443 | HTTP redirect and HTTPS ingress |
| Application (optional) | 6379, 8125 TCP/UDP, 9093, 9095 | Redis, Telegraf, Alertmanager, Kafka external |
Important: Complete port requirements, network ranges, and firewall configuration procedures are provided in the Networking Guide. Do not expose VictoriaMetrics (8428, 8429), Grafana (3000), or PostgreSQL (5432) directly—access these services only through the secure HTTPS ingress (port 443).
Resource Planning
Calculating Cluster Capacity
When planning cluster capacity, consider the following factors:
- Base Overhead: Kubernetes system components consume approximately 1-2 cores and 2-4 GiB memory per node
- Application Workloads: Refer to individual component resource requirements in the Architecture Guide
- Headroom: Maintain 20-30% resource headroom for workload spikes and automatic scaling
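These factors can be combined into a rough per-node estimate; a sketch assuming the middle of the stated ranges (2 cores / 4 GiB overhead, 25% headroom) applied to a Combined server's minimum specification:

```shell
# Rough per-node estimate: subtract system overhead, then reserve headroom.
# Assumptions (middle of the documented ranges): 2 cores / 4 GiB overhead,
# 25% headroom.
node_cores=16       # Combined server minimum
node_mem_gib=32
overhead_cores=2
overhead_mem_gib=4
headroom_pct=25

usable_cores=$(( (node_cores - overhead_cores) * (100 - headroom_pct) / 100 ))
usable_mem_gib=$(( (node_mem_gib - overhead_mem_gib) * (100 - headroom_pct) / 100 ))
echo "usable per node: ${usable_cores} cores, ${usable_mem_gib} GiB"
```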
Scaling Considerations
The CDN Manager supports horizontal scaling for most components. The Horizontal Pod Autoscaler (HPA) can automatically adjust replica counts based on resource utilization. Detailed scaling guidance is available in the Architecture Guide.
Example Production Deployment
A minimal production deployment with 3 server nodes (combined role) and 2 agent nodes would require:
| Node Type | Count | CPU Total | Memory Total | Disk Total |
|---|---|---|---|---|
| Server (Combined) | 3 | 48 cores | 96 GiB | 768 GiB |
| Agent | 2 | 8 cores | 12 GiB | 128 GiB |
| Total | 5 | 56 cores | 108 GiB | 896 GiB |
This configuration provides:
- High availability (survives loss of 1 server node)
- Capacity for application workloads across all nodes
- Headroom for horizontal scaling
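The totals in the table above follow directly from the per-node minimum specifications; a quick arithmetic check:

```shell
# Recompute the example-deployment totals from the per-node minimums.
servers=3; agents=2
srv_cpu=16; srv_mem=32;  srv_disk=256   # Server (Combined) minimums
agt_cpu=4;  agt_mem=6;   agt_disk=64    # Agent minimums

total_cpu=$((  servers * srv_cpu  + agents * agt_cpu  ))
total_mem=$((  servers * srv_mem  + agents * agt_mem  ))
total_disk=$(( servers * srv_disk + agents * agt_disk ))
echo "totals: ${total_cpu} cores, ${total_mem} GiB RAM, ${total_disk} GiB disk"
```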
Next Steps
After verifying system requirements:
- Review the Installation Guide for deployment procedures
- Consult the Networking Guide for firewall configuration
- Examine the Architecture Guide for component resource requirements
3 - Networking Guide
Overview
This guide describes the network architecture and firewall configuration requirements for the AgileTV CDN Manager (ESB3027). Proper network configuration is essential for cluster communication and external access to services.
Note: The installer script automatically detects if firewalld is enabled. If so, it will verify that the required inter-node ports are open through the firewall in the default zone before proceeding. If any required ports are missing, the installer will report an error and exit. Application service ports (such as Kafka, VictoriaMetrics, and Telegraf) are not checked by the installer as they are configurable.
Network Architecture
Physical Network
Each cluster node must have at least one network interface card (NIC) with a configured default gateway. If the node lacks a pre-configured default route, one must be established prior to installation.
Overlay Network
Kubernetes creates virtual network interfaces for pods that are typically not associated with any specific firewalld zone. The cluster uses the following network ranges:
| Network | CIDR | Purpose |
|---|---|---|
| Pod Network | 10.42.0.0/16 | Inter-pod communication |
| Service Network | 10.43.0.0/16 | Kubernetes service discovery |
Firewall rules should target the primary physical interface. The overlay network traffic is handled by Flannel VXLAN.
IP Routing
Proper IP routing is critical for cluster communication. Ensure your network infrastructure allows routing between all subnets used by the cluster.
Port Requirements
Inter-Node Communication
The following ports must be permitted between all cluster nodes for Kubernetes and cluster infrastructure:
| Port | Protocol | Source | Destination | Purpose |
|---|---|---|---|---|
| 2379-2380 | TCP | Server nodes | Server nodes | etcd cluster communication |
| 6443 | TCP | All nodes | Server nodes | Kubernetes API server |
| 8472 | UDP | All nodes | All nodes | Flannel VXLAN overlay network |
| 10250 | TCP | All nodes | All nodes | Kubelet metrics and management |
| 5001 | TCP | All nodes | Server nodes | Spegel registry mirror |
| 9500-9503 | TCP | All nodes | All nodes | Longhorn management API |
| 8500-8504 | TCP | All nodes | All nodes | Longhorn agent communication |
| 10000-30000 | TCP | All nodes | All nodes | Longhorn data replication |
| 3260 | TCP | All nodes | All nodes | Longhorn iSCSI |
| 2049 | TCP | All nodes | All nodes | Longhorn RWX (NFS) |
Application Services Ports
The following ports must be accessible for application services within the cluster:
| Port | Protocol | Service |
|---|---|---|
| 6379 | TCP | Redis |
| 9092 | TCP | Kafka (internal cluster communication) |
| 9093 | TCP | Kafka (controller) |
| 9094 | TCP | Kafka (internal) |
| 9095 | TCP | Kafka (external client connections) |
| 8428 | TCP | VictoriaMetrics (Analytics) |
| 8880 | TCP | VictoriaMetrics (Alerting) |
| 8429 | TCP | VictoriaMetrics (Billing) |
| 9093 | TCP | Alertmanager |
| 8125 | TCP/UDP | Telegraf (metrics collection) |
| 8080 | TCP | Telegraf (API/Metrics) |
| 8086 | TCP | Telegraf (API/Metrics) |
External Access Ports
The following ports must be accessible from external clients to cluster nodes:
| Port | Protocol | Service |
|---|---|---|
| 80 | TCP | HTTP ingress (Optional, redirects to HTTPS) |
| 443 | TCP | HTTPS ingress (Required, all services) |
| 9095 | TCP | Kafka (external client connections) |
| 6379 | TCP | Redis (external client connections) |
| 8125 | TCP/UDP | Telegraf (metrics collection) |
Firewall Configuration
firewalld Configuration
For systems using firewalld, it is recommended to use separate zones for internal cluster traffic and external public access. This ensures that sensitive inter-node communication is restricted to the internal network.
Assign Interfaces to Zones: First, assign your network interfaces to the appropriate zones. For example, if eth0 is your public interface and eth1 is your internal cluster interface:

firewall-cmd --permanent --zone=public --add-interface=eth0
firewall-cmd --permanent --zone=internal --add-interface=eth1

Configure Firewall Rules: The following commands configure the minimum required firewall rules.

# Inter-node communication (Internal Zone)
firewall-cmd --permanent --zone=internal --add-port=2379-2380/tcp
firewall-cmd --permanent --zone=internal --add-port=6443/tcp
firewall-cmd --permanent --zone=internal --add-port=8472/udp
firewall-cmd --permanent --zone=internal --add-port=10250/tcp
firewall-cmd --permanent --zone=internal --add-port=5001/tcp
firewall-cmd --permanent --zone=internal --add-port=9500-9503/tcp
firewall-cmd --permanent --zone=internal --add-port=8500-8504/tcp
firewall-cmd --permanent --zone=internal --add-port=10000-30000/tcp
firewall-cmd --permanent --zone=internal --add-port=3260/tcp
firewall-cmd --permanent --zone=internal --add-port=2049/tcp

# Pod and service networks (Internal Zone)
firewall-cmd --permanent --zone=internal --add-source=10.42.0.0/16
firewall-cmd --permanent --zone=internal --add-source=10.43.0.0/16

# External access (Public Zone)
firewall-cmd --permanent --zone=public --add-port=80/tcp
firewall-cmd --permanent --zone=public --add-port=443/tcp
firewall-cmd --permanent --zone=public --add-port=9095/tcp
firewall-cmd --permanent --zone=public --add-port=6379/tcp
firewall-cmd --permanent --zone=public --add-port=8125/tcp
firewall-cmd --permanent --zone=public --add-port=8125/udp

# Apply changes
firewall-cmd --reload

For more restrictive configurations, you can scope rules to specific source subnets using --add-source=<subnet> within the internal zone.
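Since the internal rule set is long, it can also be generated from a port list; this sketch only prints the commands (a dry run) so it is safe to inspect first, and dropping the echo applies them:

```shell
# Dry run: print the internal-zone rules from a port list.
internal_ports="2379-2380/tcp 6443/tcp 8472/udp 10250/tcp 5001/tcp \
9500-9503/tcp 8500-8504/tcp 10000-30000/tcp 3260/tcp 2049/tcp"

for p in $internal_ports; do
  echo firewall-cmd --permanent --zone=internal --add-port="$p"
done
echo firewall-cmd --reload
```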
Internal Application Ports (Optional)
For internal cluster communication, the following ports may be opened if direct application access is required:
firewall-cmd --permanent --add-port=9092/tcp
Note: This port is used for internal Kafka cluster communication only.
Security Warning: Do not expose VictoriaMetrics (8428, 8429) or PostgreSQL (5432) directly. These services require authentication, and their direct ports do not use TLS, creating a security risk. Always access these services through the secure HTTPS ingress (port 443).
Externally Accessible Application Ports: The following application ports are safe for external access and are already configured in the External Access section:
| Port | Service | Notes |
|---|---|---|
| 9095 | Kafka | External client connections |
| 6379 | Redis | External client connections |
| 8125 | Telegraf | Metrics collection |
Verification
Verify firewall rules are applied:
firewall-cmd --list-all
Verify ports are accessible between nodes:
# From one node, test connectivity to another
nc -zv <node-ip> 6443
nc -zv <node-ip> 8472
Kubernetes Port Forwarding
For accessing internal Kubernetes services that are not exposed via ingress or services, use kubectl port-forward to create a secure tunnel from your local machine to the service.
Basic Port Forwarding
# Forward local port to a service
kubectl port-forward -n <namespace> svc/<service-name> <local-port>:<service-port>
# Example: Forward local port 8080 to Grafana (port 3000)
kubectl port-forward -n default svc/acd-manager-grafana 8080:3000
Note: “Local” refers to the machine where you run kubectl. This can be:
- A Server node in the cluster (common for administrative tasks)
- A remote machine with kubectl configured to access the cluster
Accessing the Forwarded Service
Once the port-forward is established, access the service at http://localhost:<local-port> from the machine where you ran kubectl port-forward.
If running on a Server node: To access the forwarded port from your local workstation, you need to:
- Ensure the firewall on the Server node allows traffic on the forwarded port from your network
- Use the Server node’s IP address instead of localhost from your workstation
# From your workstation (if firewall allows)
curl http://<server-node-ip>:<local-port>
For simplicity, consider running port-forward from your local machine (if kubectl is configured for remote cluster access) rather than from a Server node.
Background Port Forwarding
To run port-forward in the background:
kubectl port-forward -n <namespace> svc/<service-name> <local-port>:<service-port> &
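If the tunnel should not outlive a script, the background pattern can be paired with a cleanup trap. In this sketch, sleep stands in for kubectl port-forward so the shape is self-contained:

```shell
# sleep is a stand-in for kubectl port-forward here; replace it with the
# real command in practice.
sleep 60 &                                  # kubectl port-forward ... &
PF_PID=$!
trap 'kill "$PF_PID" 2>/dev/null' EXIT      # tunnel dies with the script

# Verify the background process is alive before using the tunnel.
if kill -0 "$PF_PID" 2>/dev/null; then
  echo "tunnel up (pid ${PF_PID})"
fi
```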
Security Considerations
Port forwarding is recommended for:
- Administrative interfaces (e.g., Longhorn UI) that should not be publicly exposed
- Debugging and troubleshooting internal services
- Temporary access to services without modifying ingress configuration
The port-forward tunnel remains active only while the kubectl port-forward command is running. Press Ctrl+C to terminate the tunnel.
Example: The Longhorn storage UI is intentionally not exposed via ingress due to security risks. Access it via port-forward:
kubectl port-forward -n longhorn-system svc/longhorn-frontend 8080:80

Then navigate to http://localhost:8080 in your browser.
Network Security Considerations
Network Segmentation
For production deployments, consider network segmentation:
- Management Network: Dedicated network for Kubernetes control plane traffic
- Application Network: Separate network for application service traffic
- External Network: Public-facing network for ingress traffic
Traffic Encryption
- All external traffic uses HTTPS (TLS 1.2 or higher)
- Internal cluster traffic uses Flannel VXLAN encryption (if enabled)
- Database connections (PostgreSQL, Redis) are internal to the cluster
Access Control
- External access is limited to ports 80 and 443 by default
- Application service ports should not be exposed externally
- Use Kubernetes NetworkPolicies for fine-grained pod-to-pod traffic control
Troubleshooting
Nodes Cannot Communicate
Verify firewall rules allow inter-node traffic:
firewall-cmd --list-all

Test connectivity between nodes:

ping <node-ip>
nc -zv <node-ip> 6443

Check network routing:
ip route
Pods Cannot Reach Services
Verify Flannel is running:
kubectl get pods -n kube-system | grep flannel

Check VXLAN interface:

ip link show flannel.1

Verify pod network routes:
ip route | grep 10.42
External Access Fails
Verify ingress controller is running:
kubectl get pods -n kube-system | grep traefik

Check ingress configuration:

kubectl get ingress

Verify external firewall allows ports 80 and 443
Next Steps
After configuring networking:
- Installation Guide - Proceed with cluster installation
- System Requirements - Review hardware and OS requirements
- Architecture Guide - Understand component communication patterns
4 - Architecture Guide
Overview
The AgileTV CDN Manager (ESB3027) is a cloud-native Kubernetes application designed for managing CDN operations. This guide provides a detailed description of the system architecture, component interactions, and scaling considerations.
High-Level Architecture
The CDN Manager follows a microservices architecture deployed on Kubernetes. The system is organized into logical layers:
graph LR
Clients[API Clients] --> Ingress[Ingress Controller]
Ingress --> Manager[Core Manager]
Ingress --> Frontend[MIB Frontend]
Ingress --> Grafana[Grafana]
Manager --> Redis[(Redis)]
Manager --> Kafka[(Kafka)]
Manager --> PostgreSQL[(PostgreSQL)]
Manager --> Zitadel[Zitadel IAM]
Manager --> Confd[Configuration Service]
Grafana --> VM[(VictoriaMetrics)]
Confd -.-> Gateway[NGinx Gateway]
    Gateway --> Director[CDN Director]

Component Architecture
Ingress Layer
The ingress layer manages all incoming traffic to the cluster:
| Component | Role |
|---|---|
| Ingress Controller | Primary ingress for all cluster traffic; routes requests to internal services based on path |
| NGinx Gateway | Reverse proxy for routing traffic to external CDN Directors; used by MIB Frontend to communicate with remote Confd instances on CDN Director nodes |
Traffic flow:
- API clients and Operator UI connect via the Ingress Controller at /api and /gui paths respectively
- Grafana dashboards are accessed via the Ingress Controller at /grafana
- Zitadel authentication console is accessed via the Ingress Controller at /ui/console
- MIB Frontend uses NGinx Gateway when communicating with external Confd instances on CDN Director nodes
Application Services
The application layer contains the core CDN Manager services:
| Component | Role | Scaling |
|---|---|---|
| Core Manager | Main REST API server (v1/v2 endpoints); handles authentication, configuration, routing, and discovery | Horizontally scalable via HPA |
| MIB Frontend | Web-based configuration GUI for operators | Horizontally scalable via HPA |
| Confd | Configuration service for routing configuration; synchronizes with Core Manager application | Single instance |
| Grafana | Monitoring and visualization dashboards | Single instance |
| Selection Input Worker | Consumes selection input events from Kafka and updates configuration | Single instance |
| Metrics Aggregator | Collects and aggregates metrics from CDN components | Single instance |
| Telegraf | System-level metrics collection from cluster nodes | DaemonSet (one per node) |
| Alertmanager | Alert routing and notification management | Single instance |
Data Layer
The data layer provides persistent and ephemeral storage:
| Component | Role | Scaling |
|---|---|---|
| Redis | In-memory caching, session storage, and ephemeral state | Master + replicas (read-only) |
| Kafka | Event streaming for selection input and metrics; provides durable message queue | Controller cluster (odd count) |
| PostgreSQL | Persistent configuration and state storage | 3-node cluster with HA |
| VictoriaMetrics (Analytics) | Real-time and short-term metrics for operational dashboards | Single instance |
| VictoriaMetrics (Billing) | Long-term metrics retention (1+ years) for billing and license compliance | Single instance |
External Integrations
| Component | Role |
|---|---|
| Zitadel IAM | Identity and access management; provides OAuth2/OIDC authentication |
| CDN Director (ESB3024) | Edge routing infrastructure; receives configuration from Confd |
Detailed Component Descriptions
Core Manager
The Core Manager is the central application server that exposes the REST API. It is implemented in Rust using the Actix-web framework.
Key Responsibilities:
- Authentication and session management via Zitadel
- Configuration document storage and retrieval
- Selection input CRUD operations
- Routing rule evaluation and GeoIP lookups
- Service discovery for CDN Directors and edge servers
- Operator UI helper endpoints
API Endpoints:
- /api/v1/auth/* - Authentication (login, token, logout)
- /api/v1/configuration - Configuration management
- /api/v1/selection_input/* - Selection input operations
- /api/v2/selection_input/* - Enhanced selection input with list operations
- /api/v1/routing/* - Routing evaluation and validation
- /api/v1/discovery/* - Host and namespace discovery
- /api/v1/metrics - System metrics
- /api/v1/health/* - Liveness and readiness probes
- /api/v1/operator_ui/* - Operator helper endpoints
Runtime Modes: The Core Manager supports multiple runtime modes, each deployed as a separate container:
- http-server - Primary HTTP API server (default)
- metrics-aggregator - Background worker for metrics collection
- selection-input - Background worker for Kafka selection input consumption
MIB Frontend
The MIB Frontend provides a web-based GUI for configuration management.
Key Features:
- Intuitive web interface for CDN configuration
- Real-time configuration validation
- Integration with Zitadel for SSO authentication
- Uses NGinx Gateway for external Director communication
Confd (Configuration Service)
Confd provides routing configuration services and synchronizes with the Core Manager application.
Key Responsibilities:
- Hosts the service configuration for routing decisions
- Provides API and CLI for configuration management
- Synchronizes routing configuration with Core Manager
- Maintains configuration state in PostgreSQL
Selection Input Worker
The Selection Input Worker processes selection input events from the Kafka stream.
Key Responsibilities:
- Consumes messages from the selection_input Kafka topic
- Validates and transforms input data
- Updates configuration in the data store
- Maintains message ordering within partitions
Scaling Limitation: The Selection Input Worker cannot be scaled beyond a single consumer per Kafka partition, as message ordering must be preserved.
Metrics Aggregator
The Metrics Aggregator collects and processes metrics from CDN components.
Key Responsibilities:
- Polls metrics from Director instances
- Aggregates usage statistics
- Writes data to VictoriaMetrics (Analytics) for dashboards
- Writes long-term data to VictoriaMetrics (Billing) for compliance
Telegraf
Telegraf is deployed as a DaemonSet to collect host-level metrics.
Key Responsibilities:
- CPU, memory, disk, and network metrics from each node
- Container-level resource usage
- Kubernetes cluster metrics
- Forwards metrics to VictoriaMetrics
Grafana
Grafana provides visualization and dashboard capabilities.
Features:
- Pre-built dashboards for CDN monitoring
- Custom dashboard support
- VictoriaMetrics as data source
- Alerting integration with Alertmanager
Access: https://<host>/grafana
Alertmanager
Alertmanager handles alert routing and notifications.
Key Responsibilities:
- Receives alerts from Grafana and other sources
- Deduplicates and groups alerts
- Routes to notification channels (email, webhook, etc.)
- Manages alert silencing and inhibition
Data Storage
Redis
Redis provides in-memory storage for:
- User sessions and authentication tokens
- Ephemeral configuration cache
- Real-time state synchronization
Deployment: Master + read replicas for high availability
Kafka
Kafka provides durable event streaming for:
- Selection input events
- Metrics data streams
- Inter-service communication
Deployment: Controller cluster with 3 replicas for production, 1 replica for lab deployments
Node Affinity: Kafka replicas must be scheduled on separate nodes to ensure high availability. The Helm chart configures pod anti-affinity rules to enforce this distribution.
Topics:
- selection_input - Selection input events
- metrics - Metrics data streams
Note: For lab/single-node deployments, the Kafka replica count must be set to 1 in the Helm values. Production deployments require 3 replicas for fault tolerance.
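In Helm values this might look like the following; the exact key depends on the chart packaging, so the fragment is a hypothetical illustration rather than the chart's actual schema:

```yaml
# Hypothetical values-file fragment -- verify the key name against your
# chart before use.
kafka:
  replicaCount: 1   # lab/single-node; production requires 3
```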
PostgreSQL
PostgreSQL provides persistent storage for:
- Configuration documents
- User and permission data
- System state
Deployment: 3-node cluster managed by Cloudnative PG (CNPG) operator
High Availability: The CNPG operator manages automatic failover and ensures high availability:
- One primary node handles read/write operations
- Two replica nodes provide redundancy and can be promoted to primary on failure
- Automatic failover occurs within seconds of primary node failure
- Synchronous replication ensures data consistency
Note: The PostgreSQL cluster is deployed and managed automatically by the CNPG operator. Manual intervention is typically not required for normal operations.
VictoriaMetrics
Two VictoriaMetrics instances serve different purposes:
VictoriaMetrics (Analytics):
- Real-time and short-term metrics storage
- Supports Grafana dashboards
- Retention: Configurable (typically 30-90 days)
VictoriaMetrics (Billing):
- Long-term metrics retention
- Billing and license compliance data
- Retention: Minimum 1 year
Authentication and Authorization
Zitadel Integration
Zitadel provides identity and access management:
Authentication Flow:
- User accesses MIB Frontend or API
- Redirected to Zitadel for authentication
- Zitadel validates credentials and issues session token
- Session token exchanged for access token
- Access token included in API requests (Bearer authentication)
Default Credentials: See the Glossary for default login credentials.
Access Paths:
- Zitadel Console: /ui/console
- API authentication: /api/v1/auth/*
CORS Configuration
Zitadel enforces Cross-Origin Resource Sharing (CORS) policies. The external hostname configured in Zitadel must match the first entry in global.hosts.manager in the Helm values.
Network Architecture
Traffic Flow
graph TB
External[External Clients] --> Ingress[Ingress Controller]
External --> Redis[(Redis)]
External --> Kafka[(Kafka)]
External --> Telegraf[Telegraf]
Ingress --> Manager[Core Manager]
Ingress --> Frontend[MIB Frontend]
Ingress --> Grafana[Grafana]
    Ingress --> Zitadel[Zitadel]
Note: Certain services (Redis, Kafka, Telegraf) can be accessed directly by external clients without traversing the ingress controller. This is typically used for metrics collection, event streaming, and direct data access scenarios.
Internal Communication
All internal services communicate over the Kubernetes overlay network (Flannel VXLAN). Services discover each other via Kubernetes DNS.
External Communication
- CDN Directors: Accessed via NGinx Gateway for simplified routing
- MaxMind GeoIP: Local database files (no external calls)
Scaling
Horizontal Pod Autoscaler (HPA)
The following components support automatic horizontal scaling via HPA:
| Component | Minimum | Maximum | Scale Metrics |
|---|---|---|---|
| Core Manager | 3 | 8 | CPU (50%), Memory (80%) |
| NGinx Gateway | 2 | 4 | CPU (75%), Memory (80%) |
| MIB Frontend | 2 | 4 | CPU (75%), Memory (90%) |
Note: HPA is enabled by default in the Helm chart. The default configuration is tuned for production deployments. Adjust min/max values based on expected load and available cluster capacity.
Manual Scaling
Components can also be scaled manually by setting replica counts in the Helm values:
manager:
replicaCount: 3
mib-frontend:
replicaCount: 2
Important: When manually setting replica counts, you must disable the Horizontal Pod Autoscaler (HPA) for the corresponding component. If HPA remains enabled, it will override manual replica settings. To disable HPA, set
autoscaling.hpa.enabled: false for the component in your Helm values.
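Putting the two settings together, a values fragment for manual scaling of one component might look like the following sketch. The exact nesting of the autoscaling keys per component is an assumption; verify it against the default values.yaml shipped on the installation ISO:

```yaml
# Sketch: fix the manager replica count and disable its HPA so the
# manual count is not overridden. Nesting may vary per component.
manager:
  replicaCount: 3
  autoscaling:
    hpa:
      enabled: false   # prevents HPA from overriding replicaCount
```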
Components That Do Not Scale
The following components do not support horizontal scaling:
| Component | Reason |
|---|---|
| Confd | Single instance required for configuration consistency |
| PostgreSQL | Cloudnative PG cluster; scaled by adding replicas via operator configuration |
| Kafka | Scaled by adding controllers, not via replica count |
| VictoriaMetrics | Stateful; single instance per role |
| Redis | Master is single; replicas are read-only |
| Grafana | Single instance sufficient for dashboard access |
| Alertmanager | Single instance for alert routing |
| Selection Input Worker | Kafka message ordering requires single consumer |
| Metrics Aggregator | Single instance for consistent metrics aggregation |
Node Scaling
Additional Agent nodes can be added to the cluster at any time to increase workload capacity. Kubernetes automatically schedules pods to nodes with available resources.
Cluster Balancing
The CDN Manager deployment includes the Kubernetes Descheduler to maintain balanced resource utilization across cluster nodes:
- Automatic Rebalancing: The descheduler periodically analyzes pod distribution and evicts pods from overutilized nodes
- Node Balance: Helps prevent resource hotspots by redistributing workloads across available nodes
- Integration with HPA: Works in conjunction with Horizontal Pod Autoscaler to optimize both pod count and placement
The descheduler runs as a background process and does not require manual intervention under normal operating conditions.
Resource Configuration
For detailed resource preset configurations and planning guidance, see the Configuration Guide.
High Availability
Server Node Redundancy
Production deployments require a minimum of 3 Server nodes:
- Survives loss of 1 server node
- Maintains quorum for etcd and Kafka
For enhanced availability, use 5 Server nodes:
- Survives loss of 2 server nodes
- Recommended for critical production environments
For large-scale deployments, 7 or more Server nodes can be used:
- Survives loss of 3+ server nodes
- Suitable for high-capacity production environments
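The fault-tolerance figures above follow directly from majority quorum: a cluster of n voting members needs floor(n/2) + 1 members available and therefore tolerates floor((n - 1)/2) failures. A quick sketch:

```shell
# Majority quorum: n voting members tolerate floor((n - 1) / 2) failures
for n in 3 5 7; do
  echo "$n servers: quorum $(( n / 2 + 1 )), tolerates $(( (n - 1) / 2 )) failure(s)"
done
# 3 servers: quorum 2, tolerates 1 failure(s)
# 5 servers: quorum 3, tolerates 2 failure(s)
# 7 servers: quorum 4, tolerates 3 failure(s)
```

This is why even-numbered clusters are not recommended: 4 servers still tolerate only 1 failure, the same as 3.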
Pod Distribution
Kubernetes automatically distributes pods across nodes to maximize availability:
- Pods with the same deployment are scheduled on different nodes when possible
- Pod Disruption Budgets (PDB) ensure minimum availability during maintenance
Data Replication
| Component | Replication Strategy |
|---|---|
| Redis | Single instance (backup via Longhorn snapshots) |
| Kafka | Replicated partitions (default: 3) |
| PostgreSQL | 3-node cluster via Cloudnative PG |
| VictoriaMetrics | Single instance (backup via snapshots) |
| Longhorn | Single replica with pod-node affinity |
Longhorn Storage: Longhorn volumes are configured with a single replica by default. Pod scheduling is configured with node affinity to prefer scheduling pods on the same node as their persistent volume data. This approach optimizes I/O performance while maintaining data locality.
Next Steps
After understanding the architecture:
- Installation Guide - Deploy the CDN Manager
- Configuration Guide - Configure components for your environment
- Operations Guide - Day-to-day operational procedures
- Performance Tuning Guide - Optimize system performance
- Metrics & Monitoring - Set up monitoring and alerting
5 - Installation Guide
Overview
This guide provides detailed instructions for installing the AgileTV CDN Manager (ESB3027) in various deployment scenarios. The installation process varies depending on the target environment and desired configuration.
Estimated Installation Time:
| Deployment Type | Time |
|---|---|
| Single-Node (Lab) | ~15 minutes |
| Multi-Node (3 servers) | ~30 minutes |
Actual installation time may vary depending on hardware performance, network speed, and whether air-gapped procedures are required.
Note: These estimates assume the operating system is already installed on all nodes. OS installation is outside the scope of this guide.
Installation Types
| Installation Type | Description | Use Case |
|---|---|---|
| Single-Node (Lab) | Minimal installation on a single host | Acceptance testing, demonstrations, development |
| Multi-Node (Production) | Full high-availability cluster with 3+ server nodes | Production deployments |
Installation Process Summary
The installation follows a sequential process:
- Prepare the host system - Verify requirements and mount the installation ISO
- Install the Kubernetes cluster - Deploy K3s, Longhorn storage, and PostgreSQL
- Join additional nodes (production only) - Expand the cluster for HA or capacity
- Deploy the Manager application - Install the CDN Manager Helm chart
- Post-installation configuration - Configure authentication, networking, and users
Quick Links
| Guide | Description |
|---|---|
| Installation Checklist | Step-by-step checklist to track progress |
| Single-Node Installation | Lab and acceptance testing deployment |
| Multi-Node Installation | Production high-availability deployment |
| Air-Gapped Deployment | Air-gapped environment installation |
| Upgrade Guide | Upgrading from previous versions |
| Next Steps | Post-installation configuration tasks |
Prerequisites
Before beginning installation, ensure the following requirements are met:
- Hardware: Nodes meeting the System Requirements including CPU, memory, and disk specifications
- Operating System: RHEL 9 or compatible clone (details); air-gapped deployments require the OS ISO mounted on all nodes
- Network: Proper firewall configuration between nodes (port requirements, firewall configuration)
- Software: Installation ISO obtained from AgileTV; air-gapped deployments also require the Extras ISO
- Kernel Tuning: For production deployments, apply recommended sysctl settings (Performance Tuning Guide)
We recommend using the Installation Checklist to track your progress through the installation process.
Getting Help
If you encounter issues during installation:
- Review the Troubleshooting Guide for common issues
- Check the System Requirements to verify your environment
- Consult the Release Notes for version-specific known issues
5.1 - Installation Checklist
Overview
Use this checklist to track your installation progress. Print this page or keep it open during your installation to ensure all steps are completed correctly.
Pre-Installation
Hardware and Software
- Verify hardware meets System Requirements
- Confirm operating system is supported (RHEL 9 or compatible clone)
- Configure firewall rules between nodes (details)
- Apply recommended sysctl settings (details)
- Obtain installation ISO (esb3027-acd-manager-X.Y.Z.iso)
Air-Gapped Deployments
- Obtain Extras ISO (esb3027-acd-manager-extras-X.Y.Z.iso)
- Mount OS ISO on all nodes before installation
- Verify OS packages are accessible from mounted ISO
Special Requirements
- Oracle Linux UEK: Install kernel-uek-modules-extra-netfilter-$(uname -r) package
- Control Plane Only nodes: Set SKIP_REQUIREMENTS_CHECK=1 if below lab minimums
- SELinux: Set to “Enforcing” mode before running installer (cannot enable after)
Cluster Installation
Single-Node Deployment
Follow the Single-Node Installation Guide.
- Mount installation ISO (Step 1)
- Install the base cluster (Step 2)
- Verify cluster status (Step 3)
- Air-gapped only: Load container images (Step 4)
- Create configuration file (Step 5)
- Optional: Load MaxMind GeoIP databases (Step 6)
- Deploy the Manager Helm chart (Step 7)
- Verify deployment (Step 8)
Multi-Node Deployment
Follow the Multi-Node Installation Guide.
Primary Server Node
- Mount installation ISO (Step 1)
- Install the base cluster (Step 2)
- Verify system pods are running (Step 2)
- Retrieve the node token (Step 3)
Additional Server Nodes
- Mount installation ISO (Step 5)
- Join the cluster (Step 5)
- Verify each node joins (Step 5)
- Optional: Taint Control Plane Only nodes (Step 5b)
Agent Nodes (Optional)
- Mount installation ISO (Step 6)
- Join the cluster as an agent (Step 6)
- Verify each agent joins (Step 6)
Cluster Verification
- Verify all nodes are ready (Step 7)
- Verify system pods running on all nodes (Step 7)
- Air-gapped only: Load container images on each node (Step 9)
Application Deployment
- Create configuration file (Step 10)
- Optional: Load MaxMind GeoIP databases (Step 11)
- Optional: Configure TLS certificates from trusted CA (Step 12)
- Deploy the Manager Helm chart (Step 13)
- Verify all pods are running and distributed (Step 14)
- Configure DNS records for manager hostname (Step 15)
Post-Installation
Initial Access
- Access the system via HTTPS
- Accept self-signed certificate warning (if using default certificate)
- Log in with default credentials (see Glossary)
Security Configuration
- Create new administrator account in Zitadel
- Delete or secure the default admin account
- Configure additional users and permissions
- Review Zitadel Administrator Documentation for role assignments
Monitoring and Operations
- Access Grafana dashboards at /grafana
- Review pre-built monitoring dashboards
- Configure alerting rules (optional)
- Set up notification channels (optional)
Next Steps
- Review Next Steps Guide for additional configuration
- Configure CDN routing rules
- Set up GeoIP-based routing (if using MaxMind databases)
- Review Operations Guide for day-to-day procedures
Troubleshooting
If you encounter issues during installation:
- Check pod status: kubectl describe pod <pod-name>
- Review logs: kubectl logs <pod-name>
- Check cluster events: kubectl get events --sort-by='.lastTimestamp'
- Review the Troubleshooting Guide for common issues
5.2 - Single-Node Installation
Warning: Single-node deployments are for lab environments, acceptance testing, and demonstrations only. This configuration is not suitable for production workloads. For production deployments, see the Multi-Node Installation Guide, which requires a minimum of 3 server nodes for high availability.
Air-Gapped Deployment? This guide assumes internet connectivity. For air-gapped deployments, see the Air-Gapped Deployment Guide for additional requirements and procedures.
Overview
This guide describes the installation of the AgileTV CDN Manager on a single node. This configuration is intended for lab environments, acceptance testing, and demonstrations only. It is not suitable for production workloads.
Prerequisites
Hardware Requirements
Refer to the System Requirements Guide for hardware specifications. Single-node deployments require the “Single-Node (Lab)” configuration.
Operating System
Refer to the System Requirements Guide for supported operating systems.
Software Access
- Installation ISO: esb3027-acd-manager-X.Y.Z.iso
- Extras ISO (air-gapped only): esb3027-acd-manager-extras-X.Y.Z.iso
Network Configuration
Ensure that required firewall ports are configured before installation. See the Networking Guide for complete firewall configuration requirements.
SELinux
If SELinux is to be used, it must be set to “Enforcing” mode before running the installer script. The installer will configure appropriate SELinux policies automatically. SELinux cannot be enabled after installation.
Installation Steps
Step 1: Mount the ISO
Create a mount point and mount the installation ISO:
mkdir -p /mnt/esb3027
mount -o loop,ro esb3027-acd-manager-X.Y.Z.iso /mnt/esb3027
Replace X.Y.Z with the actual version number.
Step 2: Install the Base Cluster
Run the installer to set up the K3s Kubernetes cluster:
/mnt/esb3027/install
This installs:
- K3s Kubernetes distribution
- Longhorn distributed storage
- Cloudnative PG operator for PostgreSQL
- Base system dependencies
The installer will configure the node as both a server and agent node.
Step 3: Verify Cluster Status
After the installer completes, verify that all components are operational before proceeding. This verification serves as an important checkpoint to confirm the installation is progressing correctly.
1. Verify the node is ready:
kubectl get nodes
Expected output:
NAME STATUS ROLES AGE VERSION
k3s-server Ready control-plane,etcd,master 2m v1.33.4+k3s1
2. Verify system pods in both namespaces are running:
# Check kube-system namespace (Kubernetes core components)
kubectl get pods -n kube-system
# Check longhorn-system namespace (distributed storage)
kubectl get pods -n longhorn-system
All pods should show Running status. If any pods are still Pending or ContainerCreating, wait until they are ready. Proceeding with incomplete system pods can cause subsequent steps to fail in unpredictable ways.
This verification confirms:
- K3s cluster is operational
- Longhorn distributed storage is running
- Cloudnative PG operator is deployed
- All core components are healthy before continuing
Step 4: Air-Gapped Deployments (If Applicable)
If deploying in an air-gapped environment, load container images from the extras ISO:
mkdir -p /mnt/esb3027-extras
mount -o loop,ro esb3027-acd-manager-extras-X.Y.Z.iso /mnt/esb3027-extras
/mnt/esb3027-extras/load-images
Step 5: Create Configuration File
Create a Helm values file for your deployment. At minimum, configure the manager hostname and at least one router:
# ~/values.yaml
global:
hosts:
manager:
- host: manager.local
routers:
- name: default
address: 127.0.0.1
The routers configuration specifies CDN Director instances. For lab deployments, a placeholder entry is sufficient. For production, specify the actual Director hostnames or IP addresses.
For single-node deployments, you must also disable Kafka replication:
kafka:
replicaCount: 1
controller:
replicaCount: 1
Step 6: Load MaxMind GeoIP Databases (Optional)
If you plan to use GeoIP-based routing or validation features, load the MaxMind GeoIP databases. The following databases are used by the manager:
- GeoIP2-City.mmdb - The City Database
- GeoLite2-ASN.mmdb - The ASN Database
- GeoIP2-Anonymous-IP.mmdb - The VPN and Anonymous IP Database
A helper utility is provided on the ISO to create the Kubernetes volume:
/mnt/esb3027/generate-maxmind-volume
The utility will prompt for the locations of the three database files and the name of the volume. After running this command, reference the volume in your configuration file:
manager:
maxmindDbVolume: maxmind-db-volume
Replace maxmind-db-volume with the volume name you specified when running the utility.
Tip: When naming the volume, include a revision number or date (e.g.,
maxmind-db-volume-2026-04 or maxmind-db-volume-v2). This simplifies future updates: create a new volume with an updated name, update the values.yaml to reference the new volume, and delete the old volume after verification.
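For example, a dated name of this form can be generated at volume-creation time (a sketch; the generate-maxmind-volume utility itself prompts for the name interactively):

```shell
# Build a dated volume name of the form maxmind-db-volume-YYYY-MM
vol="maxmind-db-volume-$(date +%Y-%m)"
echo "$vol"
```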
Step 7: Deploy the Manager Helm Chart
Deploy the CDN Manager application:
helm install acd-manager /mnt/esb3027/charts/acd-manager --values ~/values.yaml
Monitor the deployment progress:
kubectl get pods
Wait for all pods to show Running status before proceeding.
Note: The default Helm timeout is 5 minutes. If the installation fails due to a rollout timeout, retry with a larger timeout value:
helm install acd-manager /mnt/esb3027/charts/acd-manager --values ~/values.yaml --timeout 10m
If a previous installation attempt failed and you receive an error that the release name is already in use, uninstall the previous release before retrying:
helm uninstall acd-manager
helm install acd-manager /mnt/esb3027/charts/acd-manager --values ~/values.yaml
Step 8: Verify Deployment
Verify all application pods are running:
kubectl get pods
Expected output for a single-node deployment (pod names will vary):
NAME READY STATUS RESTARTS AGE
acd-manager-5b98d569d9-abc12 1/1 Running 0 3m
acd-manager-confd-6fb78548c4-xnrh4 1/1 Running 0 3m
acd-manager-gateway-8bc8446fc-chs26 1/1 Running 0 3m
acd-manager-kafka-controller-0 2/2 Running 0 3m
acd-manager-metrics-aggregator-76d96c4964-lwdcj 1/1 Running 0 3m
acd-manager-mib-frontend-7bdb69684b-6qxn8 1/1 Running 0 3m
acd-manager-postgresql-0 1/1 Running 0 3m
acd-manager-redis-master-0 2/2 Running 0 3m
acd-manager-redis-replicas-0 2/2 Running 0 3m
acd-manager-selection-input-5fb694b857-qxt67 1/1 Running 0 3m
acd-manager-zitadel-8448b4c4fc-2pkd8 1/1 Running 0 3m
acd-manager-zitadel-init-hh6j7 0/1 Completed 0 4m
acd-manager-zitadel-setup-nwp8k 0/2 Completed 0 4m
alertmanager-0 1/1 Running 0 3m
grafana-6d948cfdc6-77ggk 1/1 Running 0 3m
victoria-metrics-agent-dc87df588-tn8wv 1/1 Running 0 3m
victoria-metrics-alert-757c44c58f-kk9lp 1/1 Running 0 3m
victoria-metrics-longterm-server-0 1/1 Running 0 3m
victoria-metrics-server-0 1/1 Running 0 3m
Note: Init pods (such as zitadel-init and zitadel-setup) will show Completed status after successful initialization. This is expected behavior.
Post-Installation
After installation completes, proceed to the Next Steps guide for:
- Initial user configuration
- Accessing the web interfaces
- Configuring authentication
- Setting up monitoring
Accessing the System
Refer to the Accessing the System section in the Getting Started guide for service URLs and default credentials.
Note: A self-signed SSL certificate is deployed by default. You will need to accept the certificate warning in your browser.
Troubleshooting
If pods fail to start:
- Check pod status: kubectl describe pod <pod-name>
- Review logs: kubectl logs <pod-name>
- Verify resources: kubectl top pods
See the Troubleshooting Guide for additional assistance.
Next Steps
After successful installation:
- Next Steps Guide - Post-installation configuration
- Configuration Guide - System configuration
- Operations Guide - Day-to-day operations
Appendix: Example Configuration
The following values.yaml provides a minimal working configuration for lab deployments:
# Minimal lab configuration for single-node deployment
global:
hosts:
manager:
- host: manager.local
routers:
- name: default
address: 127.0.0.1
# Single-node: Disable Kafka replication
kafka:
replicaCount: 1
controller:
replicaCount: 1
Customization notes:
- Replace manager.local with your desired hostname
- The routers entry specifies CDN Director instances. The placeholder 127.0.0.1 may be used if a Director instance isn’t available, or specify actual Director hostnames for production testing
- For air-gapped deployments, see Step 4: Air-Gapped Deployments
5.3 - Multi-Node Installation
Overview
This guide describes the installation of the AgileTV CDN Manager across multiple nodes for production deployments. This configuration provides high availability and horizontal scaling capabilities.
Air-Gapped Deployment? This guide assumes internet connectivity. For air-gapped deployments, see the Air-Gapped Deployment Guide for additional requirements and procedures.
Prerequisites
Hardware Requirements
Refer to the System Requirements Guide for hardware specifications. Production deployments require:
- Minimum 3 Server nodes (Control Plane Only or Combined role)
- Optional Agent nodes for additional workload capacity
Operating System
Refer to the System Requirements Guide for supported operating systems.
Software Access
- Installation ISO: esb3027-acd-manager-X.Y.Z.iso (for each node)
- Extras ISO (air-gapped only): esb3027-acd-manager-extras-X.Y.Z.iso
Network Configuration
Ensure that required firewall ports are configured between all nodes before installation. See the Networking Guide for complete firewall configuration requirements.
Multiple Network Interfaces
If your nodes have multiple network interfaces and you want to use a separate interface for cluster traffic (not the default route interface), configure the INSTALL_K3S_EXEC environment variable before installing the cluster or joining nodes.
For example, if bond0 has the default route but you want cluster traffic on bond1:
# For server nodes
export INSTALL_K3S_EXEC="server --node-ip 10.0.0.10 --flannel-iface=bond1"
# For agent nodes
export INSTALL_K3S_EXEC="agent --node-ip 10.0.0.20 --flannel-iface=bond1"
Where:
- Mode: Use server for the primary node establishing the cluster, or for additional server nodes. Use agent for agent nodes joining the cluster.
- --node-ip: The IP address of the interface to use for cluster traffic
- --flannel-iface: The network interface name for Flannel VXLAN overlay traffic
Set this variable on each node before running the install or join scripts.
SELinux
If SELinux is to be used, it must be set to “Enforcing” mode before running the installer script. The installer will configure appropriate SELinux policies automatically. SELinux cannot be enabled after installation.
Installation Steps
Step 1: Prepare the Primary Server Node
Mount the installation ISO on the primary server node:
mkdir -p /mnt/esb3027
mount -o loop,ro esb3027-acd-manager-X.Y.Z.iso /mnt/esb3027
Replace X.Y.Z with the actual version number.
Step 2: Install the Base Cluster on Primary Server
If your node has multiple network interfaces and you need to specify a separate interface for cluster traffic, set the INSTALL_K3S_EXEC environment variable before running the installer (see Multiple Network Interfaces):
export INSTALL_K3S_EXEC="server --node-ip <node-ip> --flannel-iface=<interface>"
Run the installer to set up the K3s Kubernetes cluster:
/mnt/esb3027/install
This installs:
- K3s Kubernetes distribution
- Longhorn distributed storage
- Cloudnative PG operator for PostgreSQL
- Base system dependencies
Important: After the installer completes, verify that all system pods in both namespaces are in the Running state before proceeding:
# Check kube-system namespace (Kubernetes core components)
kubectl get pods -n kube-system
# Check longhorn-system namespace (distributed storage)
kubectl get pods -n longhorn-system
All pods should show Running status. If any pods are still Pending or ContainerCreating, wait until they are ready. Proceeding with incomplete system pods can cause subsequent steps to fail in unpredictable ways.
This verification confirms:
- K3s cluster is operational
- Longhorn distributed storage is running
- Cloudnative PG operator is deployed
- All core components are healthy before continuing
Step 3: Retrieve the Node Token
Retrieve the node token for joining additional nodes:
cat /var/lib/rancher/k3s/server/node-token
Save this token for use on additional nodes. Also note the IP address of the primary server node.
Step 4: Server vs Agent Node Roles
Before joining additional nodes, determine which nodes will serve as Server nodes vs Agent nodes:
| Role | Control Plane | Workloads | HA Quorum | Use Case |
|---|---|---|---|---|
| Server Node (Combined) | Yes (etcd, API server) | Yes | Participates | Default production role; minimum 3 nodes |
| Server Node (Control Plane Only) | Yes (etcd, API server) | No | Participates | Dedicated control plane; requires separate Agent nodes |
| Agent Node | No | Yes | No | Additional workload capacity only |
Guidance:
- Combined role (default): Server nodes run both control plane and workloads; minimum 3 nodes required for HA
- Control Plane Only: Dedicate nodes to control plane functions; requires at least 3 Server nodes plus 3+ Agent nodes for workloads
- Agent nodes are required if using Control Plane Only servers; optional if using Combined role servers
- For most deployments, 3 Server nodes (Combined role) with no Agent nodes is sufficient
- Add Agent nodes to scale workload capacity without affecting control plane quorum
Proceed to Step 5 to join Server nodes. Agent nodes are joined after all Server nodes are ready.
Step 5: Join Additional Server Nodes
On each additional server node:
Mount the ISO:
mkdir -p /mnt/esb3027
mount -o loop,ro esb3027-acd-manager-X.Y.Z.iso /mnt/esb3027
Join the cluster:
If your node has multiple network interfaces, set the INSTALL_K3S_EXEC environment variable with the server mode before running the join script (see Multiple Network Interfaces):
export INSTALL_K3S_EXEC="server --node-ip <node-ip> --flannel-iface=<interface>"
Run the join script:
/mnt/esb3027/join-server https://<primary-server-ip>:6443 <node-token>
Replace <primary-server-ip> with the IP address of the primary server and <node-token> with the token retrieved in Step 3.
- Verify the node joined successfully:
kubectl get nodes
Repeat for each server node. A minimum of 3 server nodes is required for high availability.
Step 5b: Taint Control Plane Only Nodes (Optional)
If you are using dedicated Control Plane Only nodes (not Combined role), apply taints to prevent workload scheduling:
kubectl taint nodes <node-name> CriticalAddonsOnly=true:NoSchedule
Apply this taint to each Control Plane Only node. Verify taints are applied:
kubectl describe nodes | grep -A 5 "Taints"
Note: This step is only required if you want dedicated control plane nodes. For Combined role deployments, do not apply taints.
Important: Control Plane Only Server nodes can be deployed with lower hardware specifications (2 cores, 4 GiB, 64 GiB) than the installer’s default minimum requirements. If your Control Plane Only Server nodes do not meet the Single-Node Lab configuration minimums (8 cores, 16 GiB, 128 GiB), you must set the
SKIP_REQUIREMENTS_CHECK environment variable before running the installer or join command:
# For the primary server node
export SKIP_REQUIREMENTS_CHECK=1
/mnt/esb3027/install
# For additional Control Plane Only Server nodes
export SKIP_REQUIREMENTS_CHECK=1
/mnt/esb3027/join-server https://<primary-server-ip>:6443 <node-token>
Note: This applies to Server nodes only. Agent nodes have separate minimum requirements.
Step 6: Join Agent Nodes (Optional)
On each agent node:
Mount the ISO:
mkdir -p /mnt/esb3027
mount -o loop,ro esb3027-acd-manager-X.Y.Z.iso /mnt/esb3027
Join the cluster as an agent:
If your node has multiple network interfaces, set the INSTALL_K3S_EXEC environment variable with the agent mode before running the join script (see Multiple Network Interfaces):
export INSTALL_K3S_EXEC="agent --node-ip <node-ip> --flannel-iface=<interface>"
Run the join script:
/mnt/esb3027/join-agent https://<primary-server-ip>:6443 <node-token>
- Verify the node joined successfully from an existing server node:
kubectl get nodes
Agent nodes provide additional workload capacity but do not participate in the control plane quorum.
Step 7: Verify Cluster Status
After all nodes are joined, verify the cluster is operational:
1. Verify all nodes are ready:
kubectl get nodes
Expected output:
NAME STATUS ROLES AGE VERSION
k3s-server-0 Ready control-plane,etcd,master 5m v1.33.4+k3s1
k3s-server-1 Ready control-plane,etcd,master 3m v1.33.4+k3s1
k3s-server-2 Ready control-plane,etcd,master 2m v1.33.4+k3s1
k3s-agent-1 Ready <none> 1m v1.33.4+k3s1
k3s-agent-2 Ready <none> 1m v1.33.4+k3s1
2. Verify system pods in both namespaces are running:
# Check kube-system namespace (Kubernetes core components)
kubectl get pods -n kube-system
# Check longhorn-system namespace (distributed storage)
kubectl get pods -n longhorn-system
All pods should show Running status. If any pods are still Pending or ContainerCreating, wait until they are ready.
This verification confirms:
- K3s cluster is operational across all nodes
- Longhorn distributed storage is running
- Cloudnative PG operator is deployed
- All core components are healthy before proceeding to application deployment
Step 9: Air-Gapped Deployments (If Applicable)
If deploying in an air-gapped environment, on each node:
mkdir -p /mnt/esb3027-extras
mount -o loop,ro esb3027-acd-manager-extras-X.Y.Z.iso /mnt/esb3027-extras
/mnt/esb3027-extras/load-images
Step 10: Create Configuration File
Create a Helm values file for your deployment. At minimum, configure the manager hostnames, Zitadel external domain, and at least one router:
# ~/values.yaml
global:
hosts:
manager:
- host: manager.example.com
- host: manager-backup.example.com
routers:
- name: director-1
address: 192.0.2.1
- name: director-2
address: 192.0.2.2
zitadel:
zitadel:
ExternalDomain: manager.example.com
Tip: A complete default values.yaml file is available on the installation ISO at /mnt/esb3027/values.yaml. Copy this file to use as a starting point for your configuration.
Important: The zitadel.zitadel.ExternalDomain must match the first entry in global.hosts.manager or authentication will fail due to CORS policy violations.
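A mismatch between these two values is easy to catch before deploying. The following sketch compares them with awk against the simple layout shown above; the file path and field positions are assumptions, and a YAML-aware tool such as yq would be more robust:

```shell
# Write a sample values.yaml like the one above (substitute your real file)
cat > /tmp/values-check.yaml <<'EOF'
global:
  hosts:
    manager:
      - host: manager.example.com
      - host: manager-backup.example.com
zitadel:
  zitadel:
    ExternalDomain: manager.example.com
EOF

# The first manager host and Zitadel's ExternalDomain must be identical (CORS)
first_host=$(awk '/- host:/ { print $3; exit }' /tmp/values-check.yaml)
ext_domain=$(awk '/ExternalDomain:/ { print $2; exit }' /tmp/values-check.yaml)
if [ "$first_host" = "$ext_domain" ]; then
  echo "OK: $first_host"
else
  echo "MISMATCH: first host $first_host, ExternalDomain $ext_domain"
fi
```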
Important: For multi-node deployments, Kafka replication is enabled by default with 3 replicas. Do not modify the kafka.replicaCount or kafka.controller.replicaCount settings unless you understand the implications for data durability.
Step 11: Load MaxMind GeoIP Databases (Optional)
If you plan to use GeoIP-based routing or validation features, load the MaxMind GeoIP databases. The following databases are used by the manager:
- GeoIP2-City.mmdb - The City Database
- GeoLite2-ASN.mmdb - The ASN Database
- GeoIP2-Anonymous-IP.mmdb - The VPN and Anonymous IP Database
A helper utility is provided on the ISO to create the Kubernetes volume:
/mnt/esb3027/generate-maxmind-volume
The utility will prompt for the locations of the three database files and the name of the volume. After running this command, reference the volume in your configuration file:
manager:
  maxmindDbVolume: maxmind-db-volume
Replace maxmind-db-volume with the volume name you specified when running the utility.
Tip: When naming the volume, include a revision number or date (e.g., maxmind-db-volume-2026-04 or maxmind-db-volume-v2). This simplifies future updates: create a new volume with an updated name, update the values.yaml to reference the new volume, and delete the old volume after verification.
Step 12: Configure TLS Certificates (Optional)
For production deployments, configure a valid TLS certificate from a trusted Certificate Authority (CA). A self-signed certificate is deployed by default if no certificate is provided.
Method 1: Create TLS Secret Manually
Create a Kubernetes TLS secret with your certificate and key:
kubectl create secret tls acd-manager-tls --cert=tls.crt --key=tls.key
Method 2: Helm-Managed Secret
Add the certificate directly to your values.yaml:
ingress:
  secrets:
    acd-manager-tls: |
      -----BEGIN CERTIFICATE-----
      ...
      -----END CERTIFICATE-----
  tls:
    - hosts:
        - manager.example.com
      secretName: acd-manager-tls
Configuring All Ingress Controllers
All ingress controllers must be configured with the same certificate secret and hostname:
ingress:
  hostname: manager.example.com
  tls: true
  secretName: acd-manager-tls
zitadel:
  ingress:
    tls:
      - hosts:
          - manager.example.com
        secretName: acd-manager-tls
confd:
  ingress:
    hostname: manager.example.com
    tls: true
    secretName: acd-manager-tls
mib-frontend:
  ingress:
    hostname: manager.example.com
    tls: true
    secretName: acd-manager-tls
Important: The hostname must match the first entry in global.hosts.manager for Zitadel CORS compatibility. The secret name has a maximum length of 53 characters.
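The 53-character limit is a quick shell check away. This minimal sketch validates the secret name used in the examples above:

```shell
# Check that the TLS secret name stays within the 53-character limit.
secret_name="acd-manager-tls"
len=${#secret_name}
if [ "$len" -le 53 ]; then
  echo "OK: '$secret_name' is $len characters (limit 53)"
else
  echo "ERROR: '$secret_name' is $len characters, exceeding the 53-character limit" >&2
fi
```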
Step 13: Deploy the Manager Helm Chart
Deploy the CDN Manager application:
helm install acd-manager /mnt/esb3027/charts/acd-manager --values ~/values.yaml
Note: By default, helm install runs silently until completion. To see real-time output during deployment, add the --debug flag:
helm install acd-manager /mnt/esb3027/charts/acd-manager --values ~/values.yaml --debug
Tip: For better organization, split your configuration into multiple files and specify them with repeated --values flags:
helm install acd-manager /mnt/esb3027/charts/acd-manager \
--values ~/values-base.yaml \
--values ~/values-tls.yaml \
--values ~/values-autoscaling.yaml
Later files override earlier files, allowing you to maintain a base configuration with environment-specific overrides.
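The "later files win" behaviour can be illustrated for a single scalar key. This is a sketch only — Helm actually performs a recursive merge of maps, and the file names here are illustrative — but for any individual scalar the last file specifying it takes effect, mirroring repeated --values flags:

```shell
# Two throwaway values files; the override file is listed last,
# so its replicaCount wins.
cat > /tmp/values-base.yaml <<'EOF'
replicaCount: 1
EOF
cat > /tmp/values-override.yaml <<'EOF'
replicaCount: 3
EOF
# Last value seen for the key is the effective one.
effective=$(awk -F': ' '/^replicaCount:/ {v=$2} END {print v}' \
  /tmp/values-base.yaml /tmp/values-override.yaml)
echo "effective replicaCount: $effective"
```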
Monitor the deployment progress:
kubectl get pods
Wait for all pods to show Running status before proceeding.
Note: The default Helm timeout is 5 minutes. If the installation fails due to a rollout timeout, retry with a larger timeout value:
helm install acd-manager /mnt/esb3027/charts/acd-manager --values ~/values.yaml --timeout 10m
If a previous installation attempt failed and you receive an error that the release name is already in use, uninstall the previous release before retrying:
helm uninstall acd-manager
helm install acd-manager /mnt/esb3027/charts/acd-manager --values ~/values.yaml
Step 14: Verify Deployment
Verify all application pods are running:
kubectl get pods
Note: During the initial deployment, several pods may enter a CrashLoopBackOff state depending on the timing of other containers starting up. This is expected behavior as some services wait for dependencies (such as databases or Kafka) to become available. The deployment should stabilize automatically after a few minutes.
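To spot pods that have not yet settled, filter the listing for anything that is neither Running nor Completed. The sketch below runs against a saved sample listing (pod names are illustrative); in practice you would pipe `kubectl get pods` into the same awk filter:

```shell
# Sample `kubectl get pods` output (illustrative pod names only).
cat > /tmp/pods.txt <<'EOF'
NAME                        READY   STATUS             RESTARTS   AGE
acd-manager-abc             1/1     Running            0          3m
acd-manager-zitadel-init    0/1     Completed          0          4m
acd-manager-confd-xyz       0/1     CrashLoopBackOff   2          3m
EOF
# Print any pod whose STATUS column is neither Running nor Completed.
awk 'NR > 1 && $3 != "Running" && $3 != "Completed" {print $1, $3}' /tmp/pods.txt
```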
Verify pods are distributed across nodes:
kubectl get pods -o wide
Expected output for a 3-node cluster (pod names will vary):
NAME READY STATUS RESTARTS AGE
acd-cluster-postgresql-1 1/1 Running 0 11m
acd-cluster-postgresql-2 1/1 Running 0 11m
acd-cluster-postgresql-3 1/1 Running 0 10m
acd-manager-5b98d569d9-2pbph 1/1 Running 0 3m
acd-manager-5b98d569d9-m54f9 1/1 Running 0 3m
acd-manager-5b98d569d9-pq26f 1/1 Running 0 3m
acd-manager-confd-6fb78548c4-xnrh4 1/1 Running 0 3m
acd-manager-gateway-8bc8446fc-chs26 1/1 Running 0 3m
acd-manager-gateway-8bc8446fc-wzrml 1/1 Running 0 3m
acd-manager-kafka-controller-0 2/2 Running 0 3m
acd-manager-kafka-controller-1 2/2 Running 0 3m
acd-manager-kafka-controller-2 2/2 Running 0 3m
acd-manager-metrics-aggregator-76d96c4964-lwdcj 1/1 Running 2 3m
acd-manager-mib-frontend-7bdb69684b-6qxn8 1/1 Running 0 3m
acd-manager-mib-frontend-7bdb69684b-pkjrw 1/1 Running 0 3m
acd-manager-redis-master-0 2/2 Running 0 3m
acd-manager-redis-replicas-0 2/2 Running 0 3m
acd-manager-selection-input-5fb694b857-qxt67 1/1 Running 2 3m
acd-manager-zitadel-8448b4c4fc-2pkd8 1/1 Running 0 3m
acd-manager-zitadel-8448b4c4fc-vchp9 1/1 Running 0 3m
acd-manager-zitadel-init-hh6j7 0/1 Completed 0 4m
acd-manager-zitadel-setup-nwp8k 0/2 Completed 0 4m
alertmanager-0 1/1 Running 0 3m
grafana-6d948cfdc6-77ggk 1/1 Running 0 3m
telegraf-54779f5f46-2jfj5 1/1 Running 0 3m
victoria-metrics-agent-dc87df588-tn8wv 1/1 Running 0 3m
victoria-metrics-alert-757c44c58f-kk9lp 1/1 Running 0 3m
victoria-metrics-longterm-server-0 1/1 Running 0 3m
victoria-metrics-server-0 1/1 Running 0 3m
Note: Init pods (such as zitadel-init and zitadel-setup) will show Completed status after successful initialization. This is expected behavior. Some pods may show restart counts as they wait for dependencies to become available.
Step 15: Configure DNS (Optional)
Add DNS records for the manager hostname. For high availability, configure multiple A records pointing to different server nodes:
manager.example.com. IN A <server-1-ip>
manager.example.com. IN A <server-2-ip>
manager.example.com. IN A <server-3-ip>
Alternatively, configure a load balancer to distribute traffic across nodes.
Post-Installation
After installation completes, proceed to the Next Steps guide for:
- Initial user configuration
- Accessing the web interfaces
- Configuring authentication
- Setting up monitoring
Accessing the System
Refer to the Accessing the System section in the Getting Started guide for service URLs and default credentials.
Note: A self-signed SSL certificate is deployed by default. For production deployments, configure a valid SSL certificate before exposing the system to users.
High Availability Considerations
Pod Distribution
The Helm chart configures pod anti-affinity rules to ensure:
- Kafka controllers are scheduled on separate nodes
- PostgreSQL cluster members are distributed across nodes
- Application pods are spread across available nodes
Data Replication and Failure Tolerance
For detailed information on data replication strategies and failure scenario tolerance, refer to the Architecture Guide and System Requirements Guide.
Troubleshooting
If pods fail to start or nodes fail to join:
- Check node status: kubectl get nodes
- Describe problematic pods: kubectl describe pod <pod-name>
- Review logs: kubectl logs <pod-name>
- Check cluster events: kubectl get events --sort-by='.lastTimestamp'
See the Troubleshooting Guide for additional assistance.
Next Steps
After successful installation:
- Next Steps Guide - Post-installation configuration
- Configuration Guide - System configuration
- Operations Guide - Day-to-day operations
5.4 - Air-Gapped Deployment Guide
Overview
This guide describes the installation of the AgileTV CDN Manager in air-gapped environments (no internet access). Air-gapped deployments require additional preparation compared to connected deployments.
Key differences from connected deployments:
- Both Installation ISO and Extras ISO are required
- OS installation ISO must be mounted on all nodes
- Container images must be loaded from the Extras ISO on each node
- Additional firewall considerations for OS package repositories
Prerequisites
Required ISOs
Before beginning installation, obtain the following:
| ISO | Filename | Purpose |
|---|---|---|
| Installation ISO | esb3027-acd-manager-X.Y.Z.iso | Kubernetes cluster and Manager application |
| Extras ISO | esb3027-acd-manager-extras-X.Y.Z.iso | Container images for air-gapped environments |
| OS Installation ISO | RHEL 9 or compatible clone | Operating system packages (required on all nodes) |
Single-Node vs Multi-Node
Air-gapped procedures apply to both deployment types:
- Lab/Single-Node: Follow Single-Node Installation with additional air-gapped steps below
- Production/Multi-Node: Follow Multi-Node Installation with additional air-gapped steps below
Network Configuration
Air-gapped environments may have internal network mirrors for OS packages. If no internal mirror exists, the OS installation ISO must be mounted on each node to provide packages during installation.
Air-Gapped Installation Steps
Step 1: Prepare All Nodes
On each node (primary server, additional servers, and agents):
Mount the OS installation ISO:
mkdir -p /mnt/os
mount -o loop,ro /path/to/rhel-9.iso /mnt/os
Configure a local repository (if no internal mirror exists):
cat > /etc/yum.repos.d/local.repo <<EOF
[local]
name=Local OS Repository
baseurl=file:///mnt/os/BaseOS
enabled=1
gpgcheck=0
EOF
Verify the repository is accessible:
dnf repolist
Step 2: Mount Installation ISOs
On the primary server node first, then each additional node:
# Mount Installation ISO
mkdir -p /mnt/esb3027
mount -o loop,ro esb3027-acd-manager-X.Y.Z.iso /mnt/esb3027
# Mount Extras ISO
mkdir -p /mnt/esb3027-extras
mount -o loop,ro esb3027-acd-manager-extras-X.Y.Z.iso /mnt/esb3027-extras
Step 3: Install Kubernetes Cluster
Primary Server Node
/mnt/esb3027/install
Wait for the installer to complete and verify system pods are running:
kubectl get nodes
kubectl get pods -n kube-system
kubectl get pods -n longhorn-system
Additional Server Nodes (Multi-Node Only)
On each additional server node:
/mnt/esb3027/join-server https://<primary-server-ip>:6443 <node-token>
Agent Nodes (Optional)
On each agent node:
/mnt/esb3027/join-agent https://<primary-server-ip>:6443 <node-token>
Step 4: Load Container Images
On each node in the cluster:
/mnt/esb3027-extras/load-images
This script loads all container images from the Extras ISO into the local container runtime.
Important: This step must be performed on every node (primary server, additional servers, and agents) before deploying the Manager application.
Step 5: Create Configuration File
Create a Helm values file for your deployment. At minimum, configure the manager hostname and router addresses:
# ~/values.yaml
global:
  hosts:
    manager:
      - host: manager.local
    routers:
      - name: default
        address: 127.0.0.1
# Single-node: Disable Kafka replication
kafka:
  replicaCount: 1
  controller:
    replicaCount: 1
For multi-node deployments, see the Multi-Node Installation Guide for complete configuration requirements.
Step 6: Deploy the Manager
Deploy the CDN Manager Helm chart:
helm install acd-manager /mnt/esb3027/charts/acd-manager --values ~/values.yaml
Monitor the deployment progress:
kubectl get pods --watch
Wait for all pods to show Running status before proceeding.
Step 7: Verify Deployment
Verify all application pods are running:
kubectl get pods
All pods should show Running status (except init pods which show Completed).
Post-Installation
After installation completes:
- Access the system via HTTPS at https://<manager-host>
- Configure authentication via Zitadel at https://<manager-host>/ui/console
- Set up monitoring via Grafana at https://<manager-host>/grafana
See the Next Steps Guide for detailed post-installation configuration.
Updating MaxMind GeoIP Databases
If using GeoIP-based routing, load the MaxMind databases:
/mnt/esb3027/generate-maxmind-volume
The utility will prompt for the database file locations and volume name. Reference the volume in your values.yaml:
manager:
  maxmindDbVolume: maxmind-geoip-2026-04
See the Operations Guide for database update procedures.
Troubleshooting
Image Pull Errors
If pods fail with image pull errors:
- Verify the load-images script completed successfully on all nodes
- Check the container runtime image list: crictl images | grep <image-name>
- Ensure image tags in the Helm chart match tags on the Extras ISO
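One way to pin down a missing image is to diff the set of tags the chart expects against what the runtime reports. The sketch below uses two illustrative lists (image names and tags are placeholders); in practice the "loaded" list would come from `crictl images` and the "expected" list from the Extras ISO manifest:

```shell
# Illustrative image lists -- names and tags are placeholders.
cat > /tmp/expected-images.txt <<'EOF'
acd-manager:1.6.1
zitadel:2.40.0
EOF
cat > /tmp/loaded-images.txt <<'EOF'
acd-manager:1.6.1
EOF
# comm requires sorted input; -23 prints lines only in the first file,
# i.e. images the chart expects but the runtime has not loaded.
sort -o /tmp/expected-images.txt /tmp/expected-images.txt
sort -o /tmp/loaded-images.txt /tmp/loaded-images.txt
missing=$(comm -23 /tmp/expected-images.txt /tmp/loaded-images.txt)
if [ -n "$missing" ]; then
  echo "missing images: $missing"
fi
```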
OS Package Errors
If the installer reports missing OS packages:
- Verify the OS ISO is mounted on the affected node
- Check repository configuration: dnf repolist and dnf info <package-name>
- Ensure the ISO matches the installed OS version
Longhorn Volume Issues
If Longhorn volumes fail to mount:
- Verify the load-images script completed on all nodes
- Check Longhorn system pods: kubectl get pods -n longhorn-system
- Review the Longhorn UI via port-forward: kubectl port-forward -n longhorn-system svc/longhorn-frontend 8080:80
Next Steps
After successful installation:
- Next Steps Guide - Post-installation configuration
- Operations Guide - Day-to-day operational procedures
- Troubleshooting Guide - Common issues and resolution
5.5 - Upgrade Guide
Overview
This guide describes the procedure for upgrading the AgileTV CDN Manager (ESB3027) to a newer version. The upgrade process involves updating the Kubernetes cluster components and redeploying the Helm chart with the new version.
Prerequisites
Backup Requirements
Before beginning any upgrade, ensure you have:
- PostgreSQL Backup: Verify recent backups are available via the CloudNativePG operator
- Configuration Backup: Save your current values.yaml file(s)
- TLS Certificates: Ensure certificate files are backed up
- MaxMind Volumes: Note the current volume names if using GeoIP databases
Version Compatibility
Review the Release Notes for the target version to check for:
- Breaking changes requiring manual intervention
- Required intermediate upgrade steps
- New configuration options that should be set
Cluster Health
Verify the cluster is healthy before upgrading:
kubectl get nodes
kubectl get pods
kubectl get pvc
All nodes should show Ready status and all pods should be Running (or Completed for job pods).
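This readiness check can be scripted as a simple pre-upgrade gate. The sketch below runs against a saved sample listing (node names and versions are illustrative); in practice you would pipe `kubectl get nodes` into the same awk filter:

```shell
# Sample `kubectl get nodes` output (illustrative names/versions).
cat > /tmp/nodes.txt <<'EOF'
NAME       STATUS   ROLES                       AGE   VERSION
server-1   Ready    control-plane,etcd,master   30d   v1.28.8+k3s1
server-2   Ready    control-plane,etcd,master   30d   v1.28.8+k3s1
server-3   Ready    control-plane,etcd,master   30d   v1.28.8+k3s1
EOF
# Collect any node whose STATUS column is not Ready.
not_ready=$(awk 'NR > 1 && $2 != "Ready" {print $1}' /tmp/nodes.txt)
if [ -z "$not_ready" ]; then
  echo "all nodes Ready: safe to proceed"
else
  echo "not Ready: $not_ready -- resolve before upgrading" >&2
fi
```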
Upgrade Methods
There are three upgrade methods available. Choose the one that best fits your situation:
| Method | Downtime | Use Case |
|---|---|---|
| Rolling Upgrade | Minimal | Patch releases; minor version upgrades; configuration updates |
| Clean Upgrade | Brief | Major version upgrades; component changes; troubleshooting |
| Full Reinstall | Extended | Cluster rebuilds; troubleshooting persistent issues; ensuring clean state |
Method Selection Guidance:
Rolling Upgrade (Method 1) is the default choice for most upgrades. Use this for patch releases (e.g., 1.6.0 → 1.6.1) and even minor version upgrades (e.g., 1.4.0 → 1.6.0) where no breaking changes are documented. This method preserves all existing resources and performs an in-place update. Note: This method supports Helm’s automatic rollback (helm rollback) if the upgrade fails, allowing quick recovery to the previous state.
Clean Upgrade (Method 2) is recommended for major version upgrades (e.g., 1.x → 2.x) or when the release notes indicate significant component changes. This method ensures all resources are recreated with the new version, avoiding potential issues with stale configurations. Also use this method when troubleshooting upgrade failures from Method 1.
Full Reinstall (Method 3) should only be used when a completely clean cluster state is required. This includes troubleshooting persistent cluster-level issues, recovering from failed upgrades that cannot be rolled back, or when migrating between significantly different deployment configurations. This method requires verified backups and should be planned for extended downtime.
Upgrade Steps
Method 1: Rolling Upgrade (Recommended)
This method performs an in-place rolling upgrade with minimal downtime. All upgrade commands are executed from the primary server node.
Step 1: Obtain the New Installation ISO
Unmount the old ISO (if mounted) and mount the new installation ISO:
umount /mnt/esb3027 2>/dev/null || true
mount -o loop,ro esb3027-acd-manager-X.Y.Z.iso /mnt/esb3027
Replace X.Y.Z with the target version number.
Step 2: Update Containers and Cluster Software
Run the installation script to update the container images and cluster software:
/mnt/esb3027/install
Wait for the script to complete.
Step 3: Air-Gapped Environments (If Applicable)
If deploying in an air-gapped environment, also mount and load the extras ISO:
# Mount the Extras ISO
mkdir -p /mnt/esb3027-extras
mount -o loop,ro esb3027-acd-manager-extras-X.Y.Z.iso /mnt/esb3027-extras
# Load container images from the extras ISO
/mnt/esb3027-extras/load-images
Replace X.Y.Z with the target version number.
Step 4: Review and Update Configuration
Compare the default values.yaml from the new ISO with your current configuration:
diff /mnt/esb3027/values.yaml ~/values.yaml
Update your configuration file to include any new required settings. Common updates include:
# ~/values.yaml
global:
  hosts:
    manager:
      - host: manager.example.com
    routers:
      - name: director-1
        address: 192.0.2.1
zitadel:
  zitadel:
    ExternalDomain: manager.example.com
# Add any new required settings for the target version
Important: Do not modify settings unrelated to the upgrade unless specifically documented in the release notes.
Step 5: Update MaxMind GeoIP Volumes (If Applicable)
If you use MaxMind GeoIP databases, use the utility from the new ISO to create an updated volume:
/mnt/esb3027/generate-maxmind-volume
Update your values.yaml to reference the new volume name:
manager:
  maxmindDbVolume: maxmind-geoip-2026-04
Tip: Using dated or versioned volume names (e.g., maxmind-geoip-2026-04) allows you to create new volumes during upgrades and delete old ones after verification.
Step 6: Update TLS Certificates (If Needed)
If your TLS certificates need renewal or the new version requires certificate updates, create or update the secret:
kubectl create secret tls acd-manager-tls --cert=tls.crt --key=tls.key --dry-run=client -o yaml | kubectl apply -f -
Step 7: Upgrade the Helm Release
Perform a Helm upgrade with the new chart:
helm upgrade acd-manager /mnt/esb3027/charts/acd-manager \
--values ~/values.yaml
Note: The upgrade performs a rolling update of each deployment in the chart. Deployments are upgraded one at a time, with pods being terminated and recreated sequentially. StatefulSets (PostgreSQL, Kafka, Redis) roll out one pod at a time to maintain data availability.
Monitor the upgrade progress:
kubectl get pods --watch
Wait for all pods to stabilize and show Running status before considering the upgrade complete. Some pods may temporarily enter CrashLoopBackOff during the transition as they wait for dependencies to become available.
Step 8: Verify the Upgrade
Check the deployed version:
helm list
kubectl get deployments -o wide
Verify application functionality:
- Access the MIB Frontend and confirm it loads
- Test API connectivity
- Verify Grafana dashboards are accessible
- Check that Zitadel authentication is working
Step 9: Clean Up
After confirming the upgrade is successful:
- Unmount the old ISO (if still mounted): umount /mnt/esb3027
- Delete old MaxMind volumes (if replaced): kubectl get pvc, then kubectl delete pvc <old-volume-name>
- Remove old configuration files if no longer needed.
Method 2: Clean Upgrade (Helm Uninstall/Install)
This method removes the existing Helm release before installing the new version. This is useful for major version upgrades or when troubleshooting upgrade issues. All upgrade commands are executed from the primary server node.
Warning: This method causes brief downtime as all resources are deleted before reinstallation.
Step 1: Obtain the New Installation ISO
Mount the new installation ISO:
umount /mnt/esb3027 2>/dev/null || true
mount -o loop,ro esb3027-acd-manager-X.Y.Z.iso /mnt/esb3027
Step 2: Backup Configuration
Save your current Helm values:
helm get values acd-manager -o yaml > ~/values-backup.yaml
Step 3: Uninstall the Existing Release
Remove the existing Helm release:
helm uninstall acd-manager
Wait for pods to terminate:
kubectl get pods --watch
Note: Helm uninstall does not remove PersistentVolumes (PVs) or PersistentVolumeClaims (PVCs). All data stored in PostgreSQL, Kafka, Redis, and Longhorn volumes is preserved during the uninstall process. When the new version is installed, it will reattach to the existing PVCs and restore data automatically.
Step 4: Review and Update Configuration
Compare the default values.yaml from the new ISO with your configuration:
diff /mnt/esb3027/values.yaml ~/values.yaml
Update your configuration file as needed.
Step 5: Install the New Release
Install the new version:
helm install acd-manager /mnt/esb3027/charts/acd-manager \
--values ~/values.yaml
Monitor the deployment:
kubectl get pods --watch
Wait for all pods to stabilize before proceeding.
Step 6: Verify the Upgrade
Verify the upgrade as described in Method 1, Step 8.
Method 3: Full Reinstall (Cluster Rebuild)
This method completely removes Kubernetes and reinstalls from scratch. Use only for cluster rebuilds or when other upgrade methods fail.
Warning: This method causes extended downtime and permanent data loss. The K3s uninstall process destroys all Longhorn PersistentVolumes (PVs) and PersistentVolumeClaims (PVCs). All data stored in PostgreSQL, Kafka, Redis, and application volumes will be permanently lost. Verified backups are required before proceeding.
Step 1: Stop Kubernetes Services
On server nodes, stop the K3s service:
systemctl stop k3s
On agent nodes, the service name differs:
systemctl stop k3s-agent
Step 2: Uninstall K3s
On the primary server node first, then each additional server node:
/usr/local/bin/k3s-uninstall.sh
On agent nodes, use the agent uninstall script instead:
/usr/local/bin/k3s-agent-uninstall.sh
Step 3: Clean Up Residual State (All Nodes)
On all nodes, remove any residual state (the uninstall scripts also remove k3s-killall.sh, so on nodes where uninstall has already run, only the rm step applies):
/usr/local/bin/k3s-killall.sh
rm -rf /var/lib/rancher/k3s/*
Warning: This removes all cluster data including Longhorn PersistentVolumes (PVs) and PersistentVolumeClaims (PVCs). All data stored in PostgreSQL, Kafka, Redis, and application volumes will be permanently lost. Ensure verified backups are available before proceeding.
Step 4: Reinstall K3s Cluster and Deploy Manager
Follow the installation procedure in the Installation Guide to reinstall the cluster and deploy the Helm chart. At this point, you are in the same state as a fresh installation:
- Primary server installation
- Additional server joins (if applicable)
- Agent joins (if applicable)
- Helm chart deployment
Note: The K3s node token is regenerated during reinstallation. Retrieve the new token from /var/lib/rancher/k3s/server/node-token on the primary server after installation if you need to join additional nodes.
Rollback Procedure
Rollback procedures vary by upgrade method:
Method 1 (Rolling Upgrade)
Use Helm’s built-in rollback command:
helm rollback acd-manager
This reverts to the previous Helm release revision automatically.
Or manually redeploy the previous version:
helm upgrade acd-manager /mnt/esb3027-old/charts/acd-manager \
--values ~/values.yaml
Note: If you use multiple --values files for organization, ensure they are specified in the same order as the original installation.
Method 2 (Clean Upgrade)
Reinstall the previous version:
helm uninstall acd-manager
helm install acd-manager /mnt/esb3027-old/charts/acd-manager \
--values ~/values-backup.yaml
Method 3 (Full Reinstall)
Rollback requires repeating the full cluster reinstall procedure using the old installation ISO. Follow Method 3 steps with the previous version’s ISO. Ensure verified backups are available before attempting.
Troubleshooting
Pods Fail to Start
Check pod status and events:
kubectl describe pod <pod-name>
kubectl get events --sort-by='.lastTimestamp'
Review pod logs:
kubectl logs <pod-name>
kubectl logs <pod-name> -p  # Previous instance logs
Database Migration Issues
If PostgreSQL migrations fail:
Check CloudNativePG cluster status:
kubectl get clusters
kubectl describe cluster <cluster-name>
Review migration job logs:
kubectl get jobs
kubectl logs job/<migration-job-name>
Helm Upgrade Fails
If helm upgrade fails:
Check Helm release status:
helm status acd-manager
helm history acd-manager
Review the error message for specific failures
Attempt rollback if necessary
Post-Upgrade
After a successful upgrade:
- Review the Release Notes for any post-upgrade tasks
- Update monitoring dashboards if new metrics are available
- Test all critical functionality
- Document the upgrade in your change management system
Next Steps
After completing the upgrade:
- Next Steps Guide - Review post-installation tasks
- Operations Guide - Day-to-day operational procedures
- Release Notes - Review new features and changes
5.6 - Next Steps
Overview
After completing the installation of the AgileTV CDN Manager (ESB3027), several post-installation configuration tasks must be performed before the system is ready for production use. This guide walks you through the essential next steps.
Prerequisites
Before proceeding, ensure:
- The CDN Manager Helm chart is successfully deployed
- All pods are in Running status
- You have network access to the cluster hostname or IP
- You have the default credentials available
Step 1: Access Zitadel Console
The first step is to configure user authentication through Zitadel Identity and Access Management (IAM).
Navigate to the Zitadel Console: https://<manager-host>/ui/console
Replace <manager-host> with your configured hostname (e.g., manager.local or manager.example.com).
Important: The <manager-host> must match the first entry in global.hosts.manager from your Helm values exactly. Zitadel uses name-based virtual hosting and CORS validation. If the hostname does not match, authentication will fail.
Log in with the default administrator credentials (also listed in the Glossary):
- Username: admin@agiletv.dev
- Password: Password1!
Important: If prompted to configure Multi-Factor Authentication (MFA), you must skip this step for now. MFA is not currently supported. Attempting to configure MFA may lock you out of the administrator account.
Security Recommendation: After logging in, create a new administrator account with proper roles. Once verified, disable or delete the default admin@agiletv.dev account. For details on required roles and administrator permissions, see Zitadel’s Administrator Documentation.
Step 2: Configure SMTP Settings (Recommended)
Zitadel requires an SMTP server to send email notifications and perform email validations.
In the Zitadel Console, navigate to Settings > Default Settings
Configure the SMTP settings:
- SMTP Host: Your mail server hostname
- SMTP Port: Typically 587 (TLS) or 465 (SSL)
- SMTP Username: Mail account username
- SMTP Password: Mail account password
- Sender Address: Email address for outgoing mail (e.g., noreply@example.com)
Save the configuration
Note: Without SMTP configuration, email-based user validation and password recovery features will not function.
Step 3: Create Additional User Accounts
Create user accounts for operators and administrators:
Tip: For detailed guidance on managing users, roles, and permissions in the Zitadel Console, see Zitadel’s User Management Documentation.
In the Zitadel Console, navigate to Users > Add User
Fill in the user details:
- Username: Unique username
- First Name: User’s first name
- Last Name: User’s last name
- Email: User’s email address (this is their login username)
Known Issue: Due to a limitation in this release of Zitadel, the username must match the local part (the portion before the @) of the email address. For example, if the email is foo@example.com, the username must be foo. If these do not match, Zitadel may allow login with the mismatched local part while blocking the full email address. For instance, if the username is foo but the email is foo.bar@example.com, login with foo@example.com may succeed while foo.bar@example.com is blocked.
Workaround: Always ensure the username matches the email local part exactly.
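The workaround can be enforced with a one-line check when provisioning accounts. This sketch derives the local part with shell parameter expansion (the example address is the one used above):

```shell
# Verify the username matches the email's local part (the portion
# before the @), per the known-issue workaround.
email="foo@example.com"
username="foo"
local_part=${email%%@*}   # strip the longest suffix starting at '@'
if [ "$username" = "$local_part" ]; then
  echo "OK: username matches email local part"
else
  echo "MISMATCH: username '$username' vs local part '$local_part'" >&2
fi
```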
Important: The following options must be configured:
- Email Verified: Check this box to skip email verification
- Set Initial Password: Enter a temporary password for the user
Note: If you configured SMTP settings in Step 2, the user will receive an email asking to verify their address and set their initial password. If SMTP is not configured, you must check the “Email Verified” box and set an initial password manually, otherwise the user account will not be enabled.
Click Create User
Provide the user with:
- Their username
- The temporary password (if set manually)
- The Zitadel Console URL
Instruct the user to change their password on first login
Step 4: Configure User Roles and Permissions
Zitadel manages roles and permissions for accessing the CDN Manager:
In the Zitadel Console, navigate to Roles
Assign appropriate roles to users:
- Admin: Full administrative access
- Operator: Operational access without administrative functions
- Viewer: Read-only access
To assign a role:
- Select the user
- Click Add Role
- Select the appropriate role
- Save the assignment
Step 5: Access the MIB Frontend
The MIB Frontend is the web-based configuration GUI for CDN operators:
Navigate to the MIB Frontend: https://<manager-host>/gui
Log in using your Zitadel credentials
Verify you can access the configuration interface
Step 6: Verify API Access
Test API connectivity to ensure the system is functioning:
curl -k https://<manager-host>/api/v1/health/ready
Expected response:
{
"status": "ready"
}
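In a monitoring script, this response can be asserted rather than inspected by eye. The sketch below checks a sample payload; a live check would capture the body from the curl command above instead:

```shell
# Sample readiness payload (a live check would use:
#   response=$(curl -ks https://<manager-host>/api/v1/health/ready) )
response='{"status": "ready"}'
if printf '%s' "$response" | grep -q '"status": *"ready"'; then
  echo "manager is ready"
else
  echo "manager is not ready: $response" >&2
fi
```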
See the API Guide for detailed API documentation.
Step 7: Configure TLS Certificates (If Not Done During Installation)
For production deployments, a valid TLS certificate from a trusted Certificate Authority should be configured. If you did not configure TLS certificates during installation, refer to Step 12: Configure TLS Certificates in the Installation Guide.
Step 8: Set Up Monitoring and Alerting
Configure monitoring dashboards and alerting:
Access Grafana:
- Navigate to https://<manager-host>/grafana
- Log in with default credentials (also listed in the Glossary):
  - Username: admin
  - Password: edgeware
Review Pre-built Dashboards:
- System health dashboards are included by default
- CDN metrics dashboards show routing and usage statistics
Note: CDN Director instances automatically have DNS names configured for use in Grafana dashboards. The DNS name is derived from the name field in global.hosts.routers with .external appended. For example, a router named my-router-1 will have the DNS name my-router-1.external in the Grafana configuration.
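The mapping is mechanical, as this small sketch shows (the router names are examples used elsewhere in this guide):

```shell
# Derive the Grafana DNS name for each router: append ".external"
# to the name field from global.hosts.routers.
for name in director-1 director-2 my-router-1; do
  dns_name="${name}.external"
  echo "$name -> $dns_name"
done
```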
Step 9: Verify Kafka and PostgreSQL Health
Ensure the data layer components are healthy:
kubectl get pods
Verify the following pods are running:
| Component | Pod Name Pattern | Expected Status |
|---|---|---|
| Kafka | acd-manager-kafka-controller-* | Running (3 pods for production) |
| PostgreSQL | acd-cluster-postgresql-0, acd-cluster-postgresql-1, acd-cluster-postgresql-2 | Running (3-node HA cluster) |
| Redis | acd-manager-redis-master-* | Running |
All pods should show Running status with no restarts.
Step 10: Configure Availability Zones (Optional)
For improved network performance, configure availability zones to enable Topology Aware Hints. This optimizes service-to-pod routing by keeping traffic within the same zone when possible.
See the Performance Tuning Guide for detailed instructions on:
- Labeling nodes with zone and region topology
- Verifying topology configuration
- Requirements for Topology Aware Hints to activate
- Integration with pod anti-affinity rules
Note: This step is optional. If zone labels are not configured, the system will fall back to random load-balancing.
Step 11: Review System Configuration
Verify the initial configuration:
Review Helm Values:
helm get values acd-manager -o yaml
Check Ingress Configuration:
kubectl get ingress
Verify Service Endpoints:
kubectl get endpoints
Step 12: Document Your Deployment
Maintain documentation for your deployment:
- Cluster hostname and IP addresses
- Configuration file locations
- User accounts and roles created
- TLS certificate expiration dates
- Backup procedures and schedules
- Monitoring and alerting contacts
Next Steps
After completing post-installation configuration:
- Configuration Guide - Detailed system configuration options
- Operations Guide - Day-to-day operational procedures
- Metrics & Monitoring Guide - Comprehensive monitoring setup
- API Guide - REST API reference and integration examples
Troubleshooting
Cannot Access Zitadel Console
- Verify DNS resolution or hosts file configuration
- Check that Traefik ingress is running:
kubectl get pods -n kube-system | grep traefik
- Review Traefik logs:
kubectl logs -n kube-system -l app.kubernetes.io/name=traefik
Authentication Failures
- Verify Zitadel pods are healthy:
kubectl get pods | grep zitadel
- Check Zitadel logs:
kubectl logs <zitadel-pod-name>
- Ensure the external domain matches your hostname in Zitadel configuration
MIB Frontend Not Loading
- Verify MIB Frontend pods are running:
kubectl get pods | grep mib-frontend
- Check for connectivity issues to Confd and API services
- Review browser console for JavaScript errors
API Returns 401 Unauthorized
- Verify you have a valid bearer token
- Check token expiration
- Ensure Zitadel authentication is functioning
For additional troubleshooting assistance, refer to the Troubleshooting Guide.
6 - Configuration Guide
Overview
The CDN Manager is deployed via Helm chart with configuration supplied through values.yaml files. This guide explains the configuration structure, how to apply changes, and provides a reference for all configurable options.
Configuration Files
Default Configuration
The default values.yaml file is located on the installation ISO at /mnt/esb3027/values.yaml. This file contains all default values and should be copied to a writable location for modification:
cp /mnt/esb3027/values.yaml ~/values.yaml
Important: You only need to specify fields in your custom values.yaml that differ from the default. Helm applies configuration hierarchically:
- Default values from the Helm chart itself
- Values from the default `values.yaml` on the ISO
- Values from your custom `values.yaml` file(s)
For example, if you only need to change the manager hostname and router addresses, your custom values.yaml might contain only:
global:
hosts:
manager:
- host: manager.example.com
routers:
- name: default
address: 192.0.2.1
All other configuration values will be inherited from the default values.yaml on the ISO. This approach simplifies upgrades, as you only maintain your customizations.
Configuration Merging
Helm merges configuration files from left to right, with later files overriding earlier values. This allows you to:
- Maintain a base configuration with common settings
- Create environment-specific override files
- Keep the default chart values for unchanged settings
# Multiple files merged left-to-right
helm install acd-manager /mnt/esb3027/charts/acd-manager \
--values ~/values-base.yaml \
--values ~/values-production.yaml \
--values ~/values-tls.yaml
Individual Value Overrides
For temporary changes, you can override individual values with --set:
helm upgrade acd-manager /mnt/esb3027/charts/acd-manager \
--values ~/values.yaml \
--set manager.logLevel=debug
Note: Using --set is discouraged for permanent changes, as the same arguments must be specified for every Helm operation.
Applying Configuration
Initial Installation
helm install acd-manager /mnt/esb3027/charts/acd-manager \
--values ~/values.yaml
Updating Configuration
helm upgrade acd-manager /mnt/esb3027/charts/acd-manager \
--values ~/values.yaml
Dry Run
Before applying changes, validate the configuration with a dry run:
helm upgrade acd-manager /mnt/esb3027/charts/acd-manager \
--values ~/values.yaml \
--dry-run
Rollback
If an upgrade fails, rollback to the previous revision:
# View revision history
helm history acd-manager
# Rollback to previous revision
helm rollback acd-manager
# Rollback to specific revision
helm rollback acd-manager <revision_number>
Note: Rollback reverts the Helm release but does not modify your values.yaml file. You must manually revert configuration file changes.
Force Reinstall
If an upgrade fails and rollback is not sufficient, you can perform a clean reinstall:
helm uninstall acd-manager
helm install acd-manager /mnt/esb3027/charts/acd-manager \
--values ~/values.yaml
Warning: This is service-affecting as all pods will be destroyed and recreated.
Configuration Reference
Global Settings
The global section contains cluster-wide settings. The most critical configuration is global.hosts.
global:
hosts:
manager:
- host: manager.local
routers:
- name: default
address: 127.0.0.1
edns_proxy: []
geoip: []
| Key | Type | Description |
|---|---|---|
global.hosts.manager | Array | External IP addresses or DNS hostnames for all Manager cluster nodes |
global.hosts.routers | Array | CDN Director (ESB3024) instances |
global.hosts.edns_proxy | Array | EDNS Proxy addresses (currently unused) |
global.hosts.geoip | Array | GeoIP Proxy addresses for Frontend GUI |
Important: The first entry in global.hosts.manager must match zitadel.zitadel.ExternalDomain exactly. Zitadel enforces CORS protection, and authentication will fail if these do not match.
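A minimal sketch of a matching pair, using a placeholder hostname:

```yaml
# Keep these two values in lockstep (manager.example.com is a placeholder)
global:
  hosts:
    manager:
      - host: manager.example.com      # first entry
zitadel:
  zitadel:
    ExternalDomain: manager.example.com  # must match the first manager host exactly
```

If the two values diverge, logins to the Operator UI and Zitadel Console fail with CORS errors in the browser console.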
Manager Configuration
Core Manager API server settings:
| Key | Type | Default | Description |
|---|---|---|---|
manager.image.registry | String | ghcr.io | Container image registry |
manager.image.repository | String | edgeware/acd-manager | Container image repository |
manager.image.tag | String | `` (empty) | Image tag override (uses latest if empty) |
manager.logLevel | String | info | Log level (trace, debug, info, warn, error) |
manager.replicaCount | Number | 1 | Number of replicas (HPA manages this when enabled) |
manager.containerPorts.http | Number | 80 | HTTP container port |
manager.maxmindDbVolume | String | `` (empty) | Name of PVC containing MaxMind GeoIP databases |
Manager Resources
The chart supports both resource presets and explicit resource specifications:
| Key | Type | Default | Description |
|---|---|---|---|
manager.resourcesPreset | String | `` (empty) | Resource preset (see Resource Presets table). Ignored if manager.resources is set. |
manager.resources.requests.cpu | String | 300m | CPU request |
manager.resources.requests.memory | String | 512Mi | Memory request |
manager.resources.limits.cpu | String | 1 | CPU limit |
manager.resources.limits.memory | String | 1Gi | Memory limit |
Note: For production workloads, explicitly set manager.resources rather than using presets.
Manager Datastore
manager:
datastore:
type: redis
namespace: "cdn_manager_ds"
default_ttl: ""
compression: zstd
| Key | Type | Default | Description |
|---|---|---|---|
manager.datastore.type | String | redis | Datastore backend type |
manager.datastore.namespace | String | cdn_manager_ds | Redis namespace for manager data |
manager.datastore.default_ttl | String | `` (empty) | Default TTL for entries |
manager.datastore.compression | String | zstd | Compression algorithm (none, zstd, etc.) |
Manager Discovery
manager:
discovery: []
# Example:
# - namespace: "other"
# hosts:
# - other-host1
# - other-host2
# pattern: "other-.*"
| Key | Type | Description |
|---|---|---|
manager.discovery | Array | Array of discovery host configurations. Each entry can specify hosts (list of hostnames), pattern (regex pattern), or both |
Manager Tuning
manager:
tuning:
enable_cache_control: true
cache_control_max_age: "5m"
cache_control_miss_max_age: ""
| Key | Type | Default | Description |
|---|---|---|---|
manager.tuning.enable_cache_control | Boolean | true | Enable cache control headers in responses |
manager.tuning.cache_control_max_age | String | 5m | Maximum age for cache control headers |
manager.tuning.cache_control_miss_max_age | String | `` (empty) | Maximum age for cache control headers on cache misses |
Manager Container Arguments
manager:
args:
- --config-file=/etc/manager/config.toml
- http-server
Gateway Configuration
NGINX Gateway settings for external Director communication:
| Key | Type | Default | Description |
|---|---|---|---|
gateway.replicaCount | Number | 1 | Number of gateway replicas |
gateway.resources.requests.cpu | String | 100m | CPU request |
gateway.resources.requests.memory | String | 128Mi | Memory request |
gateway.resources.limits.cpu | String | 150m | CPU limit |
gateway.resources.limits.memory | String | 192Mi | Memory limit |
gateway.service.type | String | ClusterIP | Service type |
MIB Frontend Configuration
Web-based configuration GUI settings:
| Key | Type | Default | Description |
|---|---|---|---|
mib-frontend.enabled | Boolean | true | Enable the frontend GUI |
mib-frontend.frontend.resourcePreset | String | nano | Resource preset |
mib-frontend.frontend.autoscaling.hpa.enabled | Boolean | true | Enable HPA |
mib-frontend.frontend.autoscaling.hpa.minReplicas | Number | 2 | Minimum replicas |
mib-frontend.frontend.autoscaling.hpa.maxReplicas | Number | 4 | Maximum replicas |
Confd Configuration
Confd settings for configuration management:
| Key | Type | Default | Description |
|---|---|---|---|
confd.enabled | Boolean | true | Enable Confd |
confd.service.ports.internal | Number | 15000 | Internal service port |
VictoriaMetrics Configuration
Time-series database for metrics:
| Key | Type | Default | Description |
|---|---|---|---|
acd-metrics.enabled | Boolean | true | Enable metrics components |
acd-metrics.victoria-metrics-single.enabled | Boolean | true | Enable VictoriaMetrics |
acd-metrics.grafana.enabled | Boolean | true | Enable Grafana |
acd-metrics.telegraf.enabled | Boolean | true | Enable Telegraf |
acd-metrics.prometheus.enabled | Boolean | true | Enable Prometheus metrics |
Ingress Configuration
Traffic exposure settings:
| Key | Type | Default | Description |
|---|---|---|---|
ingress.enabled | Boolean | true | Enable ingress record generation |
ingress.pathType | String | Prefix | Ingress path type |
ingress.hostname | String | `` (empty) | Primary hostname (defaults to manager.local via global.hosts) |
ingress.path | String | /api | Default path for ingress |
ingress.tls | Boolean | false | Enable TLS configuration |
ingress.selfSigned | Boolean | false | Generate self-signed certificate via Helm |
ingress.secrets | Array | [] | Custom TLS certificate secrets |
Ingress Extra Paths
The chart includes default extra paths for Confd and GeoIP:
ingress:
extraPaths:
- path: /confd
pathType: Prefix
backend:
service:
name: acd-manager-gateway
port:
name: http
- path: /geoip
pathType: Prefix
backend:
service:
name: acd-manager-gateway
port:
name: http
TLS Certificate Secrets
For production TLS certificates:
ingress:
secrets:
- name: manager.local-tls
key: |-
-----BEGIN RSA PRIVATE KEY-----
...
-----END RSA PRIVATE KEY-----
certificate: |-
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
tls: true
Resource Configuration
Resource Presets
Predefined resource configurations for common deployment sizes:
| Preset | Request CPU | Request Memory | Limit CPU | Limit Memory | Ephemeral Storage Limit |
|---|---|---|---|---|---|
nano | 100m | 128Mi | 150m | 192Mi | 2Gi |
micro | 250m | 256Mi | 375m | 384Mi | 2Gi |
small | 500m | 512Mi | 750m | 768Mi | 2Gi |
medium | 500m | 1024Mi | 750m | 1536Mi | 2Gi |
large | 1000m | 2048Mi | 1500m | 3072Mi | 2Gi |
xlarge | 1000m | 3072Mi | 3000m | 6144Mi | 2Gi |
2xlarge | 1000m | 3072Mi | 6000m | 12288Mi | 2Gi |
Note: Limits are calculated as requests plus 50% (except for xlarge/2xlarge and ephemeral-storage).
Custom Resources
Override preset with custom values:
manager:
resources:
requests:
cpu: "300m"
memory: "512Mi"
limits:
cpu: "1"
memory: "1Gi"
Note:
- CPU values use millicores (1000m = 1 core)
- Memory values use binary SI units (1024Mi = 1GiB)
- Requests represent minimum guaranteed resources
- Limits represent maximum consumable resources
Capacity Planning
When sizing resources:
- Requests determine scheduling (node must have available capacity)
- Limits prevent resource starvation
- Maintain 20-30% cluster headroom for scaling
- Total capacity = sum of all requests × replica count + headroom
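As a worked example of the formula above (hypothetical numbers, not chart defaults), sizing three manager replicas at 300m CPU / 512Mi memory each with 25% headroom can be sketched in shell arithmetic:

```shell
# Hypothetical sizing: 3 replicas at 300m CPU / 512Mi memory requested each,
# plus 25% cluster headroom (integer arithmetic, rounds down).
replicas=3
cpu_request_m=300      # millicores per replica
mem_request_mi=512     # MiB per replica

cpu_total_m=$(( replicas * cpu_request_m * 125 / 100 ))
mem_total_mi=$(( replicas * mem_request_mi * 125 / 100 ))

echo "Plan for at least ${cpu_total_m}m CPU and ${mem_total_mi}Mi memory"
# Prints: Plan for at least 1125m CPU and 1920Mi memory
```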
Security Contexts
Pod Security Context
manager:
podSecurityContext:
enabled: true
fsGroup: 1001
fsGroupChangePolicy: Always
sysctls: []
supplementalGroups: []
Container Security Context
manager:
containerSecurityContext:
enabled: true
runAsUser: 1001
runAsGroup: 1001
runAsNonRoot: true
readOnlyRootFilesystem: true
privileged: false
allowPrivilegeEscalation: false
capabilities:
drop: ["ALL"]
seccompProfile:
type: "RuntimeDefault"
Health Probes
Probe Types
| Probe | Purpose | Failure Action |
|---|---|---|
startupProbe | Initial startup verification | Container restart |
readinessProbe | Traffic readiness check | Remove from load balancer |
livenessProbe | Health monitoring | Container restart |
Default Probe Configuration
Liveness Probe
manager:
livenessProbe:
enabled: true
initialDelaySeconds: 5
periodSeconds: 30
timeoutSeconds: 10
failureThreshold: 5
successThreshold: 1
httpGet:
path: /api/v1/health/alive
port: http
Readiness Probe
manager:
readinessProbe:
enabled: true
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 7
failureThreshold: 3
successThreshold: 1
httpGet:
path: /api/v1/health/ready
port: http
Startup Probe
manager:
startupProbe:
enabled: true
initialDelaySeconds: 0
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 10
successThreshold: 1
httpGet:
path: /api/v1/health/alive
port: http
Autoscaling Configuration
Horizontal Pod Autoscaler (HPA)
manager:
autoscaling:
hpa:
enabled: true
minReplicas: 3
maxReplicas: 8
targetCPU: 50
targetMemory: 80
| Key | Type | Default | Description |
|---|---|---|---|
manager.autoscaling.hpa.enabled | Boolean | true | Enable HPA |
manager.autoscaling.hpa.minReplicas | Number | 3 | Minimum number of replicas |
manager.autoscaling.hpa.maxReplicas | Number | 8 | Maximum number of replicas |
manager.autoscaling.hpa.targetCPU | Number | 50 | Target CPU utilization percentage |
manager.autoscaling.hpa.targetMemory | Number | 80 | Target Memory utilization percentage |
Network Policy
networkPolicy:
enabled: true
allowExternal: true
allowExternalEgress: true
addExternalClientAccess: true
| Key | Type | Default | Description |
|---|---|---|---|
networkPolicy.enabled | Boolean | true | Enable NetworkPolicy |
networkPolicy.allowExternal | Boolean | true | Allow connections from any source (don’t require pod label) |
networkPolicy.allowExternalEgress | Boolean | true | Allow pod to access any range of port and destinations |
networkPolicy.addExternalClientAccess | Boolean | true | Allow access from pods with client label set to “true” |
Pod Affinity and Anti-Affinity
manager:
podAffinityPreset: ""
podAntiAffinityPreset: soft
nodeAffinityPreset:
type: ""
key: ""
values: []
affinity: {}
| Key | Type | Default | Description |
|---|---|---|---|
manager.podAffinityPreset | String | `` (empty) | Pod affinity preset (soft or hard). Ignored if affinity is set |
manager.podAntiAffinityPreset | String | soft | Pod anti-affinity preset (soft or hard). Ignored if affinity is set |
manager.nodeAffinityPreset.type | String | `` (empty) | Node affinity preset type (soft or hard) |
manager.affinity | Object | {} | Custom affinity rules (overrides presets) |
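For example, a custom rule that forces manager replicas onto distinct nodes might look like the following. This is a sketch: the `app.kubernetes.io/name` label value is an assumption, so check the rendered manifests (`helm get manifest acd-manager`) for the labels your release actually uses:

```yaml
manager:
  # Setting affinity disables both podAffinityPreset and podAntiAffinityPreset
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/name: acd-manager   # assumed release label
          topologyKey: kubernetes.io/hostname
```

Hard (`required...`) anti-affinity prevents scheduling entirely when no conforming node is free, so the `soft` preset is usually the safer default.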
Service Configuration
service:
type: ClusterIP
ports:
http: 80
annotations:
service.kubernetes.io/topology-mode: Auto
externalTrafficPolicy: Cluster
sessionAffinity: None
| Key | Type | Default | Description |
|---|---|---|---|
service.type | String | ClusterIP | Service type |
service.ports.http | Number | 80 | HTTP service port |
service.annotations | Object | service.kubernetes.io/topology-mode: Auto | Service annotations |
service.externalTrafficPolicy | String | Cluster | External traffic policy |
Persistence Configuration
persistence:
enabled: false
mountPath: /agiletv/manager/data
storageClass: ""
accessModes:
- ReadWriteOnce
size: 8Gi
| Key | Type | Default | Description |
|---|---|---|---|
persistence.enabled | Boolean | false | Enable persistence using PVC |
persistence.mountPath | String | /agiletv/manager/data | Mount path |
persistence.storageClass | String | `` (empty) | Storage class (uses cluster default if empty) |
persistence.size | String | 8Gi | Size of data volume |
RBAC and Service Account
rbac:
create: false
rules: []
serviceAccount:
create: true
name: ""
automountServiceAccountToken: true
annotations: {}
Metrics
metrics:
enabled: false
serviceMonitor:
enabled: false
namespace: ""
annotations: {}
labels: {}
interval: ""
scrapeTimeout: ""
| Key | Type | Default | Description |
|---|---|---|---|
metrics.enabled | Boolean | false | Enable Prometheus metrics export |
metrics.serviceMonitor.enabled | Boolean | false | Create Prometheus Operator ServiceMonitor |
Next Steps
After configuration:
- Installation Guide - Deploy with your configuration
- Operations Guide - Day-to-day management
- Performance Tuning Guide - Optimize system performance
- Architecture Guide - Understand component relationships
7 - Performance Tuning Guide
Overview
This guide provides performance tuning recommendations for the AgileTV CDN Manager (ESB3027). While the default configuration is suitable for most deployments, certain environments may benefit from additional optimizations.
Network Topology Optimization
Topology Aware Hints
The CDN Manager uses Kubernetes Topology Aware Hints to prefer routing pods in the same zone as the source of network traffic. This reduces cross-zone latency and improves overall system responsiveness.
How It Works
When nodes are labeled with topology zones, Kubernetes automatically routes traffic to pods in the same zone when possible. This is particularly beneficial for:
- Low-latency requirements: Keeps traffic local to reduce round-trip time
- Cost optimization: Reduces cross-zone data transfer costs in cloud environments
- Load distribution: Prevents hotspots by distributing load across zones
Configuring Availability Zones
Each node must have zone and region labels applied for Topology Aware Hints to function:
# Label a node with a zone
kubectl label nodes <node-name> topology.kubernetes.io/zone=us-east-1a
# Label a node with a region
kubectl label nodes <node-name> topology.kubernetes.io/region=us-east-1
Replace <node-name> with your actual node names and adjust the zone/region values to match your deployment geography.
Note: Labels applied via kubectl label are automatically persistent and will survive node restarts.
Verify Topology Configuration
Verify labels are applied:
kubectl get nodes --show-labels | grep topology.kubernetes.io
Verify EndpointSlices are being generated with hints:
kubectl get endpointslices
Requirements for Topology Aware Hints
For Topology Aware Hints to activate:
- Minimum Nodes: At least one node must be labeled with each zone referenced by endpoints
- Symmetry: The control plane checks for sufficient CPU capacity across zones to balance traffic
- Zone Coverage: All zones with endpoints should have at least one ready node
Integration with Pod Anti-Affinity
Topology labels complement the pod anti-affinity rules already configured in the Helm chart:
- Pod Anti-Affinity: Handles pod-to-node placement to ensure high availability
- Topology Aware Hints: Handles service-to-pod traffic routing to keep requests within the same zone
Together, these features optimize both placement and routing for improved performance.
Fallback Behavior
If zone labels are not configured, the system falls back to random load-balancing across all available pods. This is functionally correct but may result in:
- Increased cross-zone traffic
- Higher latency for some requests
- Less predictable performance characteristics
Kernel Network Tuning (sysctl)
For high-throughput deployments, tuning Linux kernel network parameters can significantly improve connection handling and overall system performance. These settings are particularly beneficial for environments with high connection rates or large numbers of concurrent connections.
Recommended sysctl Settings
Apply the following settings to optimize network performance:
# Networking
net.core.somaxconn = 1024
net.core.netdev_max_backlog = 2048
net.ipv4.tcp_max_syn_backlog = 2048
# Connection Tracking
net.netfilter.nf_conntrack_max = 131072
net.netfilter.nf_conntrack_tcp_timeout_established = 1200
# Port Reuse
net.ipv4.ip_local_port_range = 10240 65535
net.ipv4.tcp_tw_reuse = 1
# Memory Buffers
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
Setting Descriptions
| Parameter | Recommended Value | Purpose |
|---|---|---|
net.core.somaxconn | 1024 | Maximum socket listen backlog. Increases pending connection queue size. |
net.core.netdev_max_backlog | 2048 | Maximum packets queued at network device level. Helps handle burst traffic. |
net.ipv4.tcp_max_syn_backlog | 2048 | Maximum SYN requests queued. Improves handling of connection floods. |
net.netfilter.nf_conntrack_max | 131072 | Maximum tracked connections. Prevents connection tracking table exhaustion. |
net.netfilter.nf_conntrack_tcp_timeout_established | 1200 | Timeout for established connections (seconds). Reduces stale entry buildup. |
net.ipv4.ip_local_port_range | 10240 65535 | Range of local ports for outbound connections. Expands available ephemeral ports. |
net.ipv4.tcp_tw_reuse | 1 | Allows reusing TIME_WAIT sockets. Reduces port exhaustion under high load. |
net.core.rmem_max | 8388608 | Maximum receive socket buffer size (8MB). Improves high-bandwidth transfers. |
net.core.wmem_max | 8388608 | Maximum send socket buffer size (8MB). Improves high-bandwidth transfers. |
Applying Settings
Temporary (Until Reboot)
Apply settings immediately but they will be lost on reboot:
sudo sysctl -w net.core.somaxconn=1024
sudo sysctl -w net.core.netdev_max_backlog=2048
# ... repeat for each parameter
Persistent (Across Reboots)
Add settings to /etc/sysctl.conf or a file in /etc/sysctl.d/:
# Create a dedicated config file
cat <<EOF | sudo tee /etc/sysctl.d/99-cdn-manager.conf
# CDN Manager Network Tuning
net.core.somaxconn = 1024
net.core.netdev_max_backlog = 2048
net.ipv4.tcp_max_syn_backlog = 2048
net.netfilter.nf_conntrack_max = 131072
net.netfilter.nf_conntrack_tcp_timeout_established = 1200
net.ipv4.ip_local_port_range = 10240 65535
net.ipv4.tcp_tw_reuse = 1
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
EOF
# Apply all settings
sudo sysctl -p /etc/sysctl.d/99-cdn-manager.conf
Kubernetes Considerations
For Kubernetes deployments, these sysctl settings can be applied via:
- Node-level configuration: Use DaemonSets or node provisioning scripts
- Pod-level safe sysctls: Some sysctls can be set per-pod via `securityContext.sysctls`
- Container runtime configuration: Configure via container runtime options
Note that some sysctls require privileged containers or node-level configuration.
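For instance, `net.ipv4.ip_local_port_range` is in the Kubernetes "safe" sysctl set and can be set per pod without kubelet allowlisting. The sketch below is illustrative only; the pod name and image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-demo                 # placeholder name
spec:
  securityContext:
    sysctls:
      - name: net.ipv4.ip_local_port_range   # namespaced "safe" sysctl
        value: "10240 65535"
  containers:
    - name: app
      image: registry.example.com/app:latest  # placeholder image
```

Sysctls outside the safe set (for example `net.core.somaxconn`) must be allowlisted via the kubelet's `--allowed-unsafe-sysctls` flag or applied at the node level as shown above.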
Monitoring Impact
After applying these settings, monitor:
- Connection establishment rates
- TIME_WAIT socket count:
netstat -n | grep TIME_WAIT | wc -l
- Connection tracking table usage:
cat /proc/sys/net/netfilter/nf_conntrack_count
- Network buffer utilization via Grafana dashboards
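The conntrack check can be wrapped into a one-liner that reports utilization as a percentage. This is a sketch that assumes the `nf_conntrack` module is loaded so the `/proc` paths exist on the node:

```shell
# Report connection-tracking table utilization on this node.
count=$(cat /proc/sys/net/netfilter/nf_conntrack_count)
max=$(cat /proc/sys/net/netfilter/nf_conntrack_max)
echo "conntrack: ${count}/${max} ($(( count * 100 / max ))%)"
```

Sustained utilization approaching 100% indicates `net.netfilter.nf_conntrack_max` should be raised further.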
Resource Configuration
Horizontal Pod Autoscaler (HPA)
The default HPA configuration is tuned for production workloads. For environments with variable load, consider adjusting the scale metrics:
| Component | Default Scale Metrics | Tuning Consideration |
|---|---|---|
| Core Manager | CPU 50%, Memory 80% | Lower CPU threshold for faster scale-out |
| NGINX Gateway | CPU 75%, Memory 80% | Increase for cost optimization |
| MIB Frontend | CPU 75%, Memory 90% | Adjust based on operator concurrency |
For detailed HPA configuration, see the Architecture Guide.
Resource Requests and Limits
Ensure resource requests and limits are appropriately sized for your workload. Under-provisioned resources can cause:
- Pod evictions during high load
- Increased latency due to CPU throttling
- Slow scaling responses
Refer to the Configuration Guide for preset configurations and planning guidance.
Database Optimization
PostgreSQL
The PostgreSQL cluster is managed by the CloudNativePG operator. For improved performance:
- Connection Pooling: The application uses connection pooling by default
- Replica Usage: Read queries can be offloaded to replicas for read-heavy workloads
- Backup Scheduling: Schedule backups during low-traffic periods to minimize I/O impact
Redis
Redis provides in-memory caching for sessions and ephemeral state:
- Memory Allocation: Ensure sufficient memory for cache hit rates
- Persistence: RDB snapshots are enabled; adjust frequency based on durability needs
Kafka
Kafka handles event streaming for selection input and metrics:
- Partition Count: Default partitions are sized for typical workloads
- Replication Factor: Production deployments use 3 replicas for fault tolerance
- Consumer Groups: The Selection Input Worker is limited to one consumer per partition
Monitoring Performance
Key Metrics to Watch
Monitor the following metrics for performance insights:
- API Response Time: Track via Grafana dashboards
- Pod CPU/Memory Usage: Identify resource bottlenecks
- Kafka Lag: Monitor consumer lag for selection input processing
- Database Connections: Watch for connection pool exhaustion
Grafana Dashboards
Pre-built dashboards are available at https://<manager-host>/grafana:
- System Health: Overall cluster and application health
- CDN Metrics: Routing and usage statistics
- Resource Utilization: CPU, memory, and network usage per component
Troubleshooting Performance Issues
High Latency
- Check pod distribution across nodes:
kubectl get pods -o wide
- Verify topology labels are applied:
kubectl get nodes --show-labels
- Review network latency between nodes
- Check for resource contention:
kubectl top pods
Slow Scaling
- Verify HPA is enabled:
kubectl get hpa
- Check cluster capacity for scheduling new pods
- Review HPA metrics:
kubectl describe hpa acd-manager
Database Performance
- Check PostgreSQL cluster status:
kubectl get pods -l app=postgresql
- Review slow query logs (if enabled)
- Monitor connection pool usage
Next Steps
After reviewing performance tuning:
- Architecture Guide - Understand component interactions
- Configuration Guide - Detailed configuration options
- Metrics & Monitoring Guide - Comprehensive monitoring setup
- Troubleshooting Guide - Resolve performance issues
8 - Operations Guide
Overview
This guide covers day-to-day operational procedures for managing the AgileTV CDN Manager (ESB3027). Topics include routine maintenance, backup procedures, log management, and common operational tasks.
Prerequisites
Before performing operations, ensure you have:
- `kubectl` access to the cluster
- `helm` CLI installed
- Access to the node where `values.yaml` is stored
- Appropriate RBAC permissions for administrative tasks
Cluster Access
There are two supported methods for accessing the Kubernetes cluster:
- SSH to a Server Node (Recommended for operations staff) - SSH into any Server node and run `kubectl` commands directly
- Remote kubectl - Install `kubectl` on your local machine and configure it to connect to the cluster remotely
Method 1: SSH to Server Node (Recommended)
The kubectl command-line tool is pre-configured on all Server nodes and can be used directly without additional setup:
# SSH to any Server node
ssh root@<server-ip>
# Run kubectl commands directly
kubectl get nodes
kubectl get pods
This method is recommended for day-to-day operations as it requires no local configuration and provides direct access to the cluster.
Method 2: Remote kubectl from Local Machine
To use kubectl from your local workstation or laptop:
Step 1: Install kubectl
Download and install kubectl for your operating system:
- Official Documentation: Install kubectl
- macOS (Homebrew):
brew install kubectl
- Linux: Download from the official Kubernetes release page
- Windows: Download from the official Kubernetes release page
Step 2: Copy kubeconfig from Server Node
# Copy kubeconfig from any Server node
scp root@<server-ip>:/etc/rancher/k3s/k3s.yaml ~/.kube/config
Step 3: Update kubeconfig
Edit the kubeconfig file to point to the correct server address:
# Replace localhost with the actual server IP
# macOS/Linux:
sed -i '' 's/127.0.0.1/<server-ip>/g' ~/.kube/config # macOS
sed -i 's/127.0.0.1/<server-ip>/g' ~/.kube/config # Linux
# Or manually edit ~/.kube/config and change:
# server: https://127.0.0.1:6443
# to:
# server: https://<server-ip>:6443
Step 4: Verify connectivity
kubectl get nodes
Managing Multiple Clusters
If you manage multiple Kubernetes clusters from the same machine, you can maintain multiple kubeconfig files:
# Set KUBECONFIG environment variable to include multiple config files
export KUBECONFIG=~/.kube/config-prod:~/.kube/config-lab
# View all contexts
kubectl config get-contexts
# Switch between clusters
kubectl config use-context <context-name>
# View current context
kubectl config current-context
For more information, see the official Kubernetes documentation: Organizing Cluster Access
Helm Commands
Helm releases are managed cluster-wide:
# List all releases
helm list
# View release history
helm history acd-manager
# Get deployed values
helm get values acd-manager -o yaml
# Get deployed manifest
helm get manifest acd-manager
Note: If using remote kubectl, ensure helm is installed on your local machine. See Helm Installation for instructions.
Backup Procedures
PostgreSQL Backup
PostgreSQL is managed by the CloudNativePG operator, which provides continuous backup capabilities.
# Check backup status
kubectl get backup
# Create manual backup
kubectl apply -f - <<EOF
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
name: manual-backup-$(date +%Y%m%d-%H%M%S)
spec:
cluster:
name: acd-cluster-postgresql
EOF
# List available backups
kubectl get backup -o wide
# Restore from backup (requires downtime)
# See Upgrade Guide for restore procedures
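Recurring backups can also be declared as a CNPG ScheduledBackup resource rather than created manually. This is a sketch; the resource name and the six-field cron schedule (daily at 02:00) are illustrative:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: daily-backup               # illustrative name
spec:
  schedule: "0 0 2 * * *"          # CNPG uses a six-field cron expression
  cluster:
    name: acd-cluster-postgresql
```

Apply it with `kubectl apply -f`, then confirm runs appear in `kubectl get backup`.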
Longhorn Volume Backups
Longhorn provides snapshot and backup capabilities for persistent volumes:
# List all volumes
kubectl get volumes -n longhorn-system
# Create snapshot via Longhorn UI
# Port-forward to Longhorn UI (do not expose via ingress)
kubectl port-forward -n longhorn-system svc/longhorn-frontend 8080:80
# Access: http://localhost:8080
# WARNING: Longhorn UI grants access to sensitive storage information
# and should never be exposed through the ingress controller
Accessing Internal Services
For debugging and troubleshooting, you may need direct access to internal services.
PostgreSQL
PostgreSQL is managed by the CloudNativePG operator. Connection details are stored in the acd-cluster-postgresql-app Secret:
# View connection details
kubectl describe secret acd-cluster-postgresql-app
# Extract individual fields
PG_HOST=$(kubectl get secret acd-cluster-postgresql-app -o jsonpath='{.data.host}' | base64 -d)
PG_USER=$(kubectl get secret acd-cluster-postgresql-app -o jsonpath='{.data.username}' | base64 -d)
PG_PASS=$(kubectl get secret acd-cluster-postgresql-app -o jsonpath='{.data.password}' | base64 -d)
PG_DB=$(kubectl get secret acd-cluster-postgresql-app -o jsonpath='{.data.dbname}' | base64 -d)
# Connect via psql
kubectl exec -it acd-cluster-postgresql-0 -- psql -U $PG_USER -d $PG_DB
Secret fields: The CNPG operator populates the following fields: username, password, host, port, dbname, uri, jdbc-uri, fqdn-uri, fqdn-jdbc-uri, pgpass.
Redis
Redis runs on port 6379 with no authentication:
# Connect via redis-cli
kubectl exec -it acd-manager-redis-master-0 -- redis-cli
# Or connect from another pod
kubectl run redis-test --rm -it --image=redis -- redis-cli -h acd-manager-redis-master
Kafka
Kafka is accessible on port 9095 from any cluster node:
# Connect from within cluster
kubectl exec -it acd-manager-kafka-controller-0 -- kafka-topics.sh --bootstrap-server localhost:9092 --list
# Connect from external (via any node IP)
kafka-topics.sh --bootstrap-server <node-ip>:9095 --list
The selection_input topic is pre-configured for selection input events.
Longhorn Storage
Longhorn is a distributed block storage system for Kubernetes that provides persistent volumes for stateful applications such as PostgreSQL and Kafka.
Architecture
Longhorn deploys controller and replica engines on each node, forming a distributed storage system. When a volume is created, Longhorn replicates data across multiple nodes to ensure durability even in the event of node failures.
Storage Protocols:
- iSCSI: Used for standard Read-Write-Once (RWO) volumes
- NFS: Used for Read-Write-Many (RWX) volumes that can be mounted by multiple pods simultaneously
Configuration
The CDN Manager deploys Longhorn with a single replica configuration, which differs from the Longhorn default of 3 replicas. This configuration is optimized for the cluster architecture where:
- Pod-node affinity is configured to schedule pods on the same node as their persistent volume data
- This optimizes I/O performance by reducing network traffic
- Data locality is maintained while still providing volume portability
Capacity Planning
Longhorn storage requires an additional 30% capacity headroom for internal operations and scaling. If less than 30% of the total partition capacity is available, Longhorn may mark volumes as “full” and prevent further writes.
For detailed storage requirements and disk partitioning guidance, see the System Requirements Guide.
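The 30% headroom rule translates into a simple sizing check: plan volume capacity at no more than 70% of the partition size. A minimal sketch:

```shell
# Usable Longhorn capacity given the 30% headroom requirement:
# volumes should not be provisioned beyond 70% of the partition size
longhorn_usable_gib() {
  local total_gib=$1
  echo $(( total_gib * 70 / 100 ))
}

longhorn_usable_gib 500   # a 500 GiB partition leaves 350 GiB for volumes
```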
Configuration Backup
Always backup your Helm values before making changes:
# Export current values
helm get values acd-manager -o yaml > ~/values-backup-$(date +%Y%m%d).yaml
# Backup custom values files (use a distinct name so the export above is not overwritten)
cp ~/values.yaml ~/values-file-backup-$(date +%Y%m%d).yaml
Backup Schedule Recommendations
| Component | Frequency | Retention |
|---|---|---|
| PostgreSQL | Daily | 30 days |
| Longhorn Snapshots | Before changes | 7 days |
| Configuration | Before each change | Indefinite |
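The retention periods above can be enforced for file-based configuration backups with a small prune helper. This is a sketch: it assumes the values-backup-YYYYMMDD.yaml naming convention shown earlier and GNU find:

```shell
# Delete configuration backups older than the given number of days
prune_backups() {
  local dir=$1 days=${2:-30}
  find "$dir" -maxdepth 1 -name 'values-backup-*.yaml' -mtime +"$days" -print -delete
}

# Example: keep 30 days of backups in the home directory
# prune_backups ~ 30
```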
Updating MaxMind GeoIP Databases
The MaxMind GeoIP databases (GeoIP2-City, GeoLite2-ASN, GeoIP2-Anonymous-IP) are used for GeoIP-based routing and validation features. These databases should be updated periodically to ensure accurate IP geolocation data.
Prerequisites
- Updated MaxMind database files (.mmdb format) obtained from MaxMind
- Access to the cluster via kubectl
- Helm CLI installed
Update Procedure
Step 1: Create New Volume with Updated Databases
Run the volume generation utility with a unique volume name that includes a revision identifier:
# Mount the installation ISO if not already mounted
mkdir -p /mnt/esb3027
mount -o loop,ro esb3027-acd-manager-X.Y.Z.iso /mnt/esb3027
# Generate new volume with updated databases
/mnt/esb3027/generate-maxmind-volume
When prompted:
- Provide the paths to the three database files:
  - GeoIP2-City.mmdb
  - GeoLite2-ASN.mmdb
  - GeoIP2-Anonymous-IP.mmdb
- Enter a unique volume name with a revision number or date, for example:
  - maxmind-geoip-2026-04
  - maxmind-geoip-v2
Tip: Using a revision-based naming convention simplifies rollback if needed.
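A small pre-flight check before running the utility can catch missing files and derive a dated volume name. A sketch, using the file names listed above:

```shell
# Verify the database files exist before running generate-maxmind-volume
check_mmdb_files() {
  local f
  for f in "$@"; do
    [ -f "$f" ] || { echo "missing: $f" >&2; return 1; }
  done
}

# Derive a revision-based volume name from the current month
maxmind_volume_name() {
  echo "maxmind-geoip-$(date +%Y-%m)"
}

# check_mmdb_files GeoIP2-City.mmdb GeoLite2-ASN.mmdb GeoIP2-Anonymous-IP.mmdb
# maxmind_volume_name   # e.g. maxmind-geoip-2026-04
```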
Step 2: Update Helm Configuration
Edit your values.yaml file to reference the new volume:
manager:
maxmindDbVolume: maxmind-geoip-2026-04
Replace maxmind-geoip-2026-04 with the volume name you specified in Step 1.
Step 3: Apply Configuration Update
Upgrade the Helm release with the updated configuration:
helm upgrade acd-manager /mnt/esb3027/charts/acd-manager --values ~/values.yaml
Step 4: Rolling Restart (Optional)
To ensure all pods immediately use the new database files, perform a rolling restart of the manager deployment:
kubectl rollout restart deployment acd-manager
Monitor the rollout status:
kubectl rollout status deployment acd-manager
Step 5: Verify Update
Verify the pods are running with the new volume:
kubectl get pods
kubectl describe pod -l app.kubernetes.io/component=manager | grep -A 5 "Volumes"
Step 6: Clean Up Old Volume (Optional)
After verifying the new databases are working correctly, you can delete the old persistent volume:
# List persistent volumes to find the old one
kubectl get pv
# Delete the old volume
kubectl delete pv <old-volume-name>
Caution: Ensure the new volume is functioning correctly before deleting the old volume. Keep the old volume for at least 24-48 hours as a rollback option.
Rollback Procedure
If issues occur after updating the databases:
- Revert the maxmindDbVolume value in your values.yaml to the previous volume name
- Run helm upgrade with the reverted configuration
- Optionally restart the deployment: kubectl rollout restart deployment acd-manager
Update Frequency Recommendations
| Database | Recommended Update Frequency |
|---|---|
| GeoIP2-City | Weekly or monthly |
| GeoLite2-ASN | Monthly |
| GeoIP2-Anonymous-IP | Weekly or monthly |
MaxMind releases database updates on a regular schedule. Subscribe to MaxMind notifications to stay informed of new releases.
Log Management
Application Logs
# View manager logs
kubectl logs -l app.kubernetes.io/component=manager
# Follow logs in real-time
kubectl logs -l app.kubernetes.io/component=manager -f
# View logs from specific pod
kubectl logs <pod-name>
# View previous instance logs (after crash)
kubectl logs <pod-name> -p
# View logs with timestamps
kubectl logs <pod-name> --timestamps
# View logs from all containers in pod
kubectl logs <pod-name> --all-containers
Component-Specific Logs
# Zitadel logs
kubectl logs -l app.kubernetes.io/name=zitadel
# Gateway logs
kubectl logs -l app.kubernetes.io/component=gateway
# Confd logs
kubectl logs -l app.kubernetes.io/component=confd
# MIB Frontend logs
kubectl logs -l app.kubernetes.io/component=mib-frontend
# PostgreSQL logs
kubectl logs -l app.kubernetes.io/name=postgresql
# Kafka logs
kubectl logs -l app.kubernetes.io/name=kafka
# Redis logs
kubectl logs -l app.kubernetes.io/name=redis
Log Aggregation
Logs are collected by Telegraf and sent to VictoriaMetrics:
# Access Grafana for log visualization
# https://<manager-host>/grafana
# Query logs via Grafana Explore
# Select VictoriaMetrics datasource and use log queries
Log Rotation
Container logs are automatically rotated by Kubernetes:
- Default max size: 10MB per container
- Default max files: 5 rotated files
- Total per container: ~50MB maximum (multiply by the number of containers in a pod)
Scaling Operations
Manual Scaling
Note: If HPA (Horizontal Pod Autoscaler) is enabled for a deployment, manual scaling changes will be overridden by the HPA. To manually scale, you must first disable the HPA.
# Check if HPA is enabled
kubectl get hpa
# Pin the HPA to a fixed size before manual scaling
# (patching minReplicas/maxReplicas to null fails validation, since maxReplicas is required)
kubectl patch hpa acd-manager -p '{"spec": {"minReplicas": 3, "maxReplicas": 3}}'
# Or delete the HPA entirely
kubectl delete hpa acd-manager
# Scale manager replicas
kubectl scale deployment acd-manager --replicas=3
# Scale gateway replicas
kubectl scale deployment acd-manager-gateway --replicas=2
# Scale MIB frontend replicas
kubectl scale deployment acd-manager-mib-frontend --replicas=2
HPA Configuration
# View HPA status
kubectl get hpa
# Describe HPA details
kubectl describe hpa acd-manager
# Edit HPA configuration
kubectl edit hpa acd-manager
Configuration Updates
Updating Helm Values
# Edit values file
vi ~/values.yaml
# Validate with dry-run
helm upgrade acd-manager /mnt/esb3027/charts/acd-manager \
--values ~/values.yaml \
--dry-run
# Apply changes
helm upgrade acd-manager /mnt/esb3027/charts/acd-manager \
--values ~/values.yaml
# Verify rollout
kubectl rollout status deployment/acd-manager
Rolling Back Changes
# View revision history
helm history acd-manager
# Rollback to previous revision
helm rollback acd-manager
# Rollback to specific revision
helm rollback acd-manager <revision>
# Verify rollback
helm history acd-manager
Certificate Management
Checking Certificate Expiration
# Check TLS secret expiration
kubectl get secret acd-manager-tls -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -dates
# Check via Grafana dashboard
# Certificate expiration metrics are available in Grafana
Renewing Certificates
# For Helm-managed self-signed certificates
helm upgrade acd-manager /mnt/esb3027/charts/acd-manager \
--values ~/values.yaml \
--set ingress.selfSigned=true
# For manual certificates, update the secret
kubectl create secret tls acd-manager-tls \
--cert=new-tls.crt \
--key=new-tls.key \
--dry-run=client -o yaml | kubectl apply -f -
# Restart pods to pick up new certificate
kubectl rollout restart deployment acd-manager
Health Checks
Component Health
# Check all pods
kubectl get pods
# Check specific component
kubectl get pods -l app.kubernetes.io/component=manager
# Check persistent volumes
kubectl get pvc
# Check cluster status
kubectl get nodes
# Check ingress
kubectl get ingress
API Health Endpoints
# Liveness check
curl -k https://<manager-host>/api/v1/health/alive
# Readiness check
curl -k https://<manager-host>/api/v1/health/ready
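For scripted checks (for example after an upgrade), the readiness endpoint can be polled until it returns 200. A sketch, with the manager host left as a placeholder:

```shell
# Poll a health endpoint until it returns HTTP 200, or give up
wait_ready() {
  local url=$1 tries=${2:-30} code i
  for (( i = 0; i < tries; i++ )); do
    code=$(curl -sk -o /dev/null -w '%{http_code}' "$url")
    [ "$code" = "200" ] && return 0
    sleep 2
  done
  return 1
}

# wait_ready "https://<manager-host>/api/v1/health/ready" 30
```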
Database Health
# PostgreSQL cluster status
kubectl get clusters -n default
# Check PostgreSQL pods
kubectl get pods -l app.kubernetes.io/name=postgresql
# Kafka cluster status
kubectl get pods -l app.kubernetes.io/name=kafka
# Redis status
kubectl get pods -l app.kubernetes.io/name=redis
Maintenance Windows
Planned Maintenance
Before performing maintenance:
- Notify users of potential service impact
- Verify backups are current
- Document the maintenance procedure
- Prepare rollback plan
Node Maintenance
# Cordon node to prevent new pods
kubectl cordon <node-name>
# Drain node (evict pods)
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
# Perform maintenance
# Uncordon node
kubectl uncordon <node-name>
Cluster Upgrades
See the Upgrade Guide for cluster upgrade procedures.
Troubleshooting Quick Reference
Common Commands
# Describe problematic pod
kubectl describe pod <pod-name>
# View pod events
kubectl get events --sort-by='.lastTimestamp'
# Check resource usage
kubectl top pods
kubectl top nodes
# Exec into container
kubectl exec -it <pod-name> -- /bin/sh
# Check network policies
kubectl get networkpolicies
# Check service endpoints
kubectl get endpoints
Restarting Components
# Restart deployment
kubectl rollout restart deployment/<deployment-name>
# Restart statefulset
kubectl rollout restart statefulset/<statefulset-name>
# Delete pod (auto-recreated)
kubectl delete pod <pod-name>
Security Operations
Rotating Service Account Tokens
# Delete service account secret (auto-regenerated)
kubectl delete secret <service-account-token-secret>
# Tokens are automatically regenerated
Updating RBAC Permissions
# View current roles
kubectl get roles
kubectl get clusterroles
# View role bindings
kubectl get rolebindings
kubectl get clusterrolebindings
# Edit role
kubectl edit role <role-name>
Audit Log Access
# K3s audit logs location
/var/lib/rancher/k3s/server/logs/audit.log
# View recent audit events
tail -f /var/lib/rancher/k3s/server/logs/audit.log
Disaster Recovery
Pod Recovery
Pods are automatically recreated if they fail:
# Check pod status
kubectl get pods
# If pod is stuck in Terminating
kubectl delete pod <pod-name> --force --grace-period=0
# If pod is stuck in Pending, check resources
kubectl describe pod <pod-name>
kubectl get events --sort-by='.lastTimestamp'
Node Failure Recovery
When a node fails:
- Automatic: Pods are rescheduled on healthy nodes (after timeout)
- Manual: Force delete stuck pods
# Force delete pods on failed node
kubectl delete pod --all --force --grace-period=0 \
--field-selector spec.nodeName=<failed-node>
Data Recovery
For data recovery scenarios, refer to:
- PostgreSQL: CloudNativePG backup/restore procedures
- Longhorn: Volume snapshot restoration
- Kafka: Partition replication handles node failures
Routine Maintenance Checklist
Daily
- Review Grafana dashboards for anomalies
- Check alert notifications
- Verify backup completion
Weekly
- Review pod restart counts
- Check certificate expiration dates
- Review log storage usage
- Verify HPA is functioning correctly
Monthly
- Test backup restoration procedure
- Review and rotate credentials if needed
- Update documentation if configuration changed
- Review resource utilization trends
Next Steps
After mastering operations:
- Troubleshooting Guide - Deep dive into problem resolution
- Performance Tuning Guide - Optimize system performance
- Metrics & Monitoring Guide - Comprehensive monitoring setup
- API Guide - REST API reference and automation
9 - Metrics & Monitoring Guide
Overview
The CDN Manager includes a comprehensive monitoring stack based on VictoriaMetrics for time-series data storage, Telegraf for metrics collection, and Grafana for visualization. This guide describes the monitoring architecture and how to access and use the monitoring capabilities.
Architecture
Components
| Component | Purpose |
|---|---|
| Telegraf | Metrics collector running on each node, gathering system and application metrics |
| VictoriaMetrics Agent | Metrics scraper and forwarder; scrapes Prometheus endpoints and forwards to VictoriaMetrics |
| VictoriaMetrics (Short-term) | Time-series database for operational dashboards (30-90 day retention) |
| VictoriaMetrics (Long-term) | Time-series database for billing and compliance (1+ year retention) |
| Grafana | Visualization and dashboard platform |
| Alertmanager | Alert routing and notification management |
Metrics Flow
The following diagram illustrates how metrics flow through the monitoring stack:
flowchart TB
subgraph External["External Sources"]
Streamers[Streamers/External Clients]
end
subgraph Cluster["Kubernetes Cluster"]
Telegraf[Telegraf DaemonSet]
subgraph Applications["Application Components"]
Director[CDN Director]
Kafka[Kafka]
Redis[Redis]
Manager[ACD Manager]
Alertmanager[Alertmanager]
end
VMAgent[VictoriaMetrics Agent]
subgraph Storage["Storage"]
VMShort[VictoriaMetrics<br/>Short-term]
VMLong[VictoriaMetrics<br/>Long-term]
end
end
Grafana[Grafana]
Streamers -->|Push metrics| Telegraf
Telegraf -->|remote_write| VMShort
Telegraf -->|remote_write| VMLong
Director -->|Scrape| VMAgent
Kafka -->|Scrape| VMAgent
Redis -->|Scrape| VMAgent
Manager -->|Scrape| VMAgent
Alertmanager -->|Scrape| VMAgent
VMAgent -->|remote_write| VMShort
VMAgent -->|remote_write| VMLong
VMShort -->|Query| Grafana
VMLong -->|Query| Grafana
Metrics Flow Summary:
External metrics ingestion:
- External clients (streamers) push metrics to Telegraf
- Telegraf forwards metrics via remote_write to both VictoriaMetrics instances
Internal metrics scraping:
- VictoriaMetrics Agent scrapes Prometheus endpoints from:
- CDN Director instances
- Kafka cluster
- Redis
- ACD Manager components
- Alertmanager
- VMAgent forwards scraped metrics via remote_write to both VictoriaMetrics instances
Data visualization:
- Grafana queries both VictoriaMetrics databases depending on the dashboard requirements
- Operational dashboards use short-term storage
- Billing and compliance dashboards use long-term storage
Accessing Grafana
Grafana is deployed as part of the metrics stack and accessible via the ingress:
URL: https://<manager-host>/grafana
Default credentials are listed in the Glossary.
Important: Change all default passwords after first login.
Metrics Collection
Application Metrics
Applications expose metrics on Prometheus-compatible endpoints. VictoriaMetrics Agent (VMAgent) scrapes these endpoints and forwards metrics to VictoriaMetrics via remote_write.
System Metrics
Telegraf collects system-level metrics including:
- CPU usage
- Memory utilization
- Disk I/O
- Network statistics
- Process metrics
Kubernetes Metrics
Cluster metrics are collected including:
- Pod resource usage
- Node status
- Deployment status
- Persistent volume usage
Grafana Dashboards
Accessing Dashboards
After logging into Grafana:
- Navigate to Dashboards in the left menu
- Browse available dashboards
- Click on a dashboard to view metrics
Dashboard Types
The included dashboards provide visibility into:
- Cluster Health: Overall cluster resource utilization
- Application Performance: Request rates, latency, error rates
- Component Status: Individual component health indicators
CDN Director Metrics
Director DNS Names in Grafana
CDN Director instances are identified in Grafana by their DNS name, which is derived from the name field in global.hosts.routers:
global:
hosts:
routers:
- name: my-router-1
address: 192.0.2.1
The DNS name used in Grafana dashboards will be: my-router-1.external
This naming convention is automatically applied for all configured directors.
Metrics Retention
VictoriaMetrics is configured with default retention policies. For custom retention settings, modify the VictoriaMetrics configuration in your values.yaml:
acd-metrics:
victoria-metrics-single:
retentionPeriod: "3" # Retention period in months
Troubleshooting
Metrics Not Appearing
If metrics are not appearing in Grafana:
1. Check Telegraf pods:
   kubectl get pods -l app.kubernetes.io/component=telegraf
2. Check Telegraf logs:
   kubectl logs -l app.kubernetes.io/component=telegraf
3. Verify VictoriaMetrics is running:
   kubectl get pods -l app.kubernetes.io/component=victoria-metrics
4. Check application metrics endpoints:
   kubectl exec <pod-name> -- curl localhost:8080/metrics
Dashboard Loading Issues
If dashboards fail to load:
1. Check Grafana pods:
   kubectl get pods -l app.kubernetes.io/component=grafana
2. Review Grafana logs:
   kubectl logs -l app.kubernetes.io/component=grafana
3. Verify datasource configuration in the Grafana UI
Next Steps
After setting up monitoring:
- Operations Guide - Day-to-day operational procedures
- Troubleshooting Guide - Resolve monitoring issues
- API Guide - Access metrics via API
10 - API Guide
Overview
The CDN Manager exposes versioned HTTP APIs under /api (v1 and v2), using JSON payloads by default. When sending request bodies, set Content-Type: application/json. Server errors typically respond with { "message": "..." } where available, or an empty body with the relevant status code.
Authentication uses a two-step flow:
- Create a session
- Exchange that session for an access token with grant_type=session
Use the access token in Authorization: Bearer <token> when calling bearer-protected routes. CORS preflight (OPTIONS) is supported and wildcard origins are accepted by default.
Durations such as TTLs use humantime strings (for example, 60s, 5m, 1h).
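The common humantime units map directly to seconds; a minimal converter covering only s/m/h/d (the server accepts additional humantime forms):

```shell
# Convert a simple humantime duration (60s, 5m, 1h, 2d) to seconds
humantime_seconds() {
  local n=${1%[smhd]} u=${1##*[0-9]}
  case "$u" in
    s) echo "$n" ;;
    m) echo $(( n * 60 )) ;;
    h) echo $(( n * 3600 )) ;;
    d) echo $(( n * 86400 )) ;;
    *) echo "unsupported unit: $u" >&2; return 1 ;;
  esac
}

humantime_seconds 5m   # 300
```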
Base URL
All API endpoints are relative to:
https://<manager-host>/api
API Reference Guides
The API documentation is organized by functional area:
| Guide | Description |
|---|---|
| Authentication API | Login, token exchange, logout, and session management |
| Health API | Liveness and readiness probes |
| Selection Input API | Key-value and list storage with search capabilities |
| Data Store API | Generic JSON key/value storage |
| Subnets API | CIDR-to-value mappings for routing decisions |
| Routing API | GeoIP lookups and IP validation |
| Discovery API | Host and namespace discovery |
| Metrics API | Metrics submission and aggregation |
| Configuration API | Configuration document management |
| Operator UI API | Blocked tokens, user agents, and referrers |
| OpenAPI Specification | Complete OpenAPI 3.0 specification |
Authentication Flow
All authenticated API calls follow the same authentication flow. For detailed instructions, see the Authentication API Guide.
Quick Start:
# Step 1: Login to get session
curl -s -X POST "https://cdn-manager/api/v1/auth/login" \
-H "Content-Type: application/json" \
-d '{
"email": "user@example.com",
"password": "Password1!"
}' | tee /tmp/session.json
SESSION_ID=$(jq -r '.session_id' /tmp/session.json)
SESSION_TOKEN=$(jq -r '.session_token' /tmp/session.json)
# Step 2: Exchange session for access token
curl -s -X POST "https://cdn-manager/api/v1/auth/token" \
-H "Content-Type: application/json" \
-d "$(jq -nc --arg sid "$SESSION_ID" --arg st "$SESSION_TOKEN" \
'{session_id:$sid,session_token:$st,grant_type:"session",scope:"openid"}')" \
| tee /tmp/token.json
ACCESS_TOKEN=$(jq -r '.access_token' /tmp/token.json)
# Step 3: Call a protected endpoint
curl -s "https://cdn-manager/api/v1/metrics" \
-H "Authorization: Bearer ${ACCESS_TOKEN}"
Error Responses
The API uses standard HTTP response codes to indicate the success or failure of an API request.
Most errors return an empty response body with the relevant HTTP status code (e.g., 404 Not Found or 409 Conflict).
In some cases, the server may return a JSON body containing a user-facing error message:
{
"message": "Human-readable error message"
}
Next Steps
After learning the API:
- Operations Guide - Day-to-day operational procedures
- Troubleshooting Guide - Resolve API issues
- Configuration Guide - Full configuration reference
10.1 - Authentication API
Overview
The Authentication API provides endpoints for user authentication, session management, and token exchange. All authenticated API calls require a valid access token obtained through the authentication flow.
Base URL
https://<manager-host>/api/v1/auth
Endpoints
POST /api/v1/auth/login
Create a session from email/password credentials.
Request:
POST /api/v1/auth/login
Content-Type: application/json
{
"email": "user@example.com",
"password": "Password1!"
}
Success Response (200):
{
"session_id": "session-1",
"session_token": "token-1",
"verified_at": "2024-01-01T00:00:00Z",
"expires_at": "2024-01-01T01:00:00Z"
}
Errors:
- 401 - Authentication failure (invalid credentials)
- 500 - Backend/state errors
POST /api/v1/auth/token
Exchange a session for an access token (required for bearer auth).
Request:
POST /api/v1/auth/token
Content-Type: application/json
{
"session_id": "session-1",
"session_token": "token-1",
"grant_type": "session",
"scope": "openid profile"
}
Success Response (200):
{
"access_token": "<token>",
"scope": "openid profile",
"expires_in": 3600,
"token_type": "bearer"
}
Token Scopes
The scope parameter in the token exchange request is a space-separated string of permissions requested for the access token.
Scope Resolution: When a token is requested, the backend system filters the requested scopes against the user's actual permissions. The resulting access token will only contain the subset of requested scopes that the user is authorized to possess.
Naming and Design
Scope names are defined by the applications that consume the tokens, not by the central IAM system. To prevent collisions between different applications or modules, it is highly recommended that application developers use URN-style prefixes for scope names (e.g., urn:acd:manager:config:read).
Errors:
- 401 - Authentication failure (invalid session)
- 500 - Backend/state errors
POST /api/v1/auth/logout
Revoke a session. Note: This does not revoke issued access tokens; they remain valid until expiration.
Request:
POST /api/v1/auth/logout
Content-Type: application/json
{
"session_id": "session-1",
"session_token": "token-1"
}
Success Response (200):
{
"status": "Ok"
}
Errors:
- 400 - Invalid session parameters
- 500 - Backend/state errors
Complete Authentication Flow Example
# Step 1: Login to get session
curl -s -X POST "https://cdn-manager/api/v1/auth/login" \
-H "Content-Type: application/json" \
-d '{
"email": "user@example.com",
"password": "Password1!"
}' | tee /tmp/session.json
SESSION_ID=$(jq -r '.session_id' /tmp/session.json)
SESSION_TOKEN=$(jq -r '.session_token' /tmp/session.json)
# Step 2: Exchange session for access token
curl -s -X POST "https://cdn-manager/api/v1/auth/token" \
-H "Content-Type: application/json" \
-d "$(jq -nc --arg sid "$SESSION_ID" --arg st "$SESSION_TOKEN" \
'{session_id:$sid,session_token:$st,grant_type:"session",scope:"openid"}')" \
| tee /tmp/token.json
ACCESS_TOKEN=$(jq -r '.access_token' /tmp/token.json)
# Step 3: Call a protected endpoint
curl -s "https://cdn-manager/api/v1/metrics" \
-H "Authorization: Bearer ${ACCESS_TOKEN}"
Using the Access Token
Once you have obtained an access token, include it in the Authorization header of all API requests:
Authorization: Bearer <access_token>
Example:
curl -s "https://cdn-manager/api/v1/configuration" \
-H "Authorization: Bearer ${ACCESS_TOKEN}"
Token Expiration
Access tokens expire after the duration specified in expires_in (typically 3600 seconds / 1 hour). When a token expires, you must re-authenticate to obtain a new token.
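A script can track the deadline locally and re-authenticate proactively instead of waiting for a 401. A sketch building on the flow above:

```shell
# Compute a refresh deadline from expires_in (with a 60 s safety margin)
token_deadline() {
  local expires_in=$1 margin=${2:-60}
  echo $(( $(date +%s) + expires_in - margin ))
}

# Succeeds when the current time has passed the deadline
token_stale() {
  [ "$(date +%s)" -ge "$1" ]
}

# DEADLINE=$(token_deadline "$(jq -r '.expires_in' /tmp/token.json)")
# token_stale "$DEADLINE" && echo "re-authenticate"
```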
Next Steps
- Health API - Liveness and readiness probes
- Selection Input API - Key-value and list storage
- OpenAPI Specification - Complete API specification
10.2 - Health API
Overview
The Health API provides endpoints for Kubernetes health probes and service health checking.
Base URL
https://<manager-host>/api/v1/health
Endpoints
GET /api/v1/health/alive
Liveness probe that indicates whether the service is running. Always returns 200 OK.
Request:
GET /api/v1/health/alive
Response (200):
{
"status": "Ok"
}
Use Case: Kubernetes liveness probe to determine if the pod should be restarted.
GET /api/v1/health/ready
Readiness probe that checks service readiness including downstream dependencies.
Request:
GET /api/v1/health/ready
Success Response (200):
{
"status": "Ok"
}
Failure Response (503):
{
"status": "Fail"
}
Use Case: Kubernetes readiness probe to determine if the pod should receive traffic. Returns 503 if any downstream dependencies (database, Kafka, Redis) are unavailable.
Kubernetes Configuration
Example Kubernetes probe configuration:
livenessProbe:
httpGet:
path: /api/v1/health/alive
port: 8080
initialDelaySeconds: 10
periodSeconds: 10
readinessProbe:
httpGet:
path: /api/v1/health/ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
Next Steps
- Authentication API - User authentication
- Selection Input API - Key-value and list storage
- OpenAPI Specification - Complete API specification
10.3 - Selection Input API
Overview
The Selection Input API provides JSON key/value storage with search capabilities. It supports two API versions (v1 and v2) with different operation models.
Base URL
https://<manager-host>/api/v1/selection_input
https://<manager-host>/api/v2/selection_input
Version Comparison
| Feature | v1 /api/v1/selection_input | v2 /api/v2/selection_input |
|---|---|---|
| Primary operation | Merge/UPSERT (POST) | Insert/Replace (PUT) |
| List append | N/A | POST to push to list |
| Search syntax | Wildcard prefix (foo* implicit) | Full wildcard (foo* explicit) |
| Query params | search, sort, limit, ttl | search, ttl, correlation_id |
| Sort support | Yes (asc/desc) | No |
| Limit support | Yes | No |
| Use case | Simple key-value with optional search | List-like operations, full wildcard |
When to Use Each Version
| Scenario | Recommended Version |
|---|---|
| Simple key-value storage | v1 |
| List/queue operations (append to array) | v2 POST |
| Full wildcard pattern matching | v2 |
| Need to sort or paginate results | v1 |
v1 Endpoints
GET /api/v1/selection_input/{path}
Fetch stored JSON. If value is an object, optional search/limit/sort applies to its keys.
Query Parameters:
- search - Wildcard prefix search (a trailing * is added implicitly)
- sort - Sort order (asc or desc)
- limit - Maximum results (must be > 0)
Success Response (200):
{
"foo": 1,
"foobar": 2
}
Errors:
- 404 - Path does not exist
- 400 - Invalid search/sort/limit parameters
- 500 - Backend failure
Example:
curl -s "https://cdn-manager/api/v1/selection_input/config?search=foo&limit=2"
POST /api/v1/selection_input/{path}
Upsert (merge) JSON at path. Nested objects are merged recursively.
Query Parameters:
- ttl - Expiry time as humantime string (e.g., 10m, 1h)
Request:
{
"feature_flag": true,
"ratio": 0.5
}
Success: 201 Created echoing the payload
Errors:
- 500/503 - Backend failure
Example:
curl -s -X POST "https://cdn-manager/api/v1/selection_input/config?ttl=10m" \
-H "Content-Type: application/json" \
-d '{
"feature_flag": true,
"ratio": 0.5
}'
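The recursive merge can be previewed locally: jq's * operator merges nested objects the same way (an illustration of the semantics, not the server implementation; requires jq):

```shell
existing='{"limits":{"cpu":1,"mem":512},"feature_flag":false}'
update='{"limits":{"mem":1024},"feature_flag":true}'

# jq's `*` merges nested objects recursively, as the v1 POST endpoint does
jq -nc --argjson a "$existing" --argjson b "$update" '$a * $b'
# → {"limits":{"cpu":1,"mem":1024},"feature_flag":true}
```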
DELETE /api/v1/selection_input/{path}
Delete stored value.
Success: 204 No Content
Errors: 503 - Backend failure
v2 Endpoints
GET /api/v2/selection_input/{path}
Fetch stored JSON with optional wildcard filtering.
Query Parameters:
- search - Full wildcard pattern (e.g., foo*, *bar*)
- correlation_id - Accepted but currently ignored (logging only)
Success Response (200):
{
"foo": 1,
"foobar": 2
}
Errors:
- 400 - Invalid search pattern
- 404 - Path does not exist
- 500 - Backend failure
Example:
curl -s "https://cdn-manager/api/v2/selection_input/config?search=foo*"
PUT /api/v2/selection_input/{path}
Insert/replace value. Old value is discarded.
Query Parameters:
- ttl - Expiry time as humantime string
Request:
{
"items": ["a", "b", "c"]
}
Success: 200 OK
Example:
curl -s -X PUT "https://cdn-manager/api/v2/selection_input/catalog" \
-H "Content-Type: application/json" \
-d '{
"items": ["a", "b", "c"]
}'
POST /api/v2/selection_input/{path}
Push a value to the back of a list-like entry (append to array).
Query Parameters:
- ttl - Expiry time as humantime string
Request (any JSON value):
{
"item": 42
}
Or a simple string:
"ready-for-publish"
Success: 200 OK
Example:
curl -s -X POST "https://cdn-manager/api/v2/selection_input/queue" \
-H "Content-Type: application/json" \
-d '"ready-for-publish"'
DELETE /api/v2/selection_input/{path}
Delete stored value.
Success: 204 No Content
Next Steps
- Data Store API - Generic key/value storage
- Subnets API - CIDR-to-value mappings
- OpenAPI Specification - Complete API specification
10.4 - Data Store API
Overview
The Data Store API provides generic JSON key/value storage for short-lived or simple structured data.
Base URL
https://<manager-host>/api/v1/datastore
Endpoints
GET /api/v1/datastore
List all known keys.
Query Parameters:
- show_hidden - Boolean (default false). When true, includes internal keys starting with _.
Success Response (200):
["user:123", "config:settings", "session:abc"]
Hidden Keys: Keys starting with _ are reserved for internal use (e.g., subnet service). Writing to hidden keys via the datastore API returns 400 Bad Request.
GET /api/v1/datastore/{key}
Retrieve the JSON value for a specific key.
Success Response (200): The stored JSON value
Errors:
- 404 - Key does not exist
- 500 - Backend failure
Example:
curl -s "https://cdn-manager/api/v1/datastore/user:123"
POST /api/v1/datastore/{key}
Create a new JSON value at the specified key. Fails if the key already exists.
Query Parameters:
- ttl - Expiry time as humantime string (e.g., 60s, 1h)
Request:
{
"id": 123,
"name": "alice"
}
Success: 201 Created
Errors:
- 409 Conflict - Key already exists
- 500 - Backend failure
Example:
curl -s -X POST "https://cdn-manager/api/v1/datastore/user:123?ttl=1h" \
-H "Content-Type: application/json" \
-d '{"id":123,"name":"alice"}'
PUT /api/v1/datastore/{key}
Update or replace the JSON value at an existing key.
Query Parameters:
- ttl - Expiry time as humantime string
Success: 200 OK
Errors:
- 404 - Key does not exist
- 500 - Backend failure
Example:
curl -s -X PUT "https://cdn-manager/api/v1/datastore/user:123" \
-H "Content-Type: application/json" \
-d '{"id":123,"name":"alice-updated"}'
DELETE /api/v1/datastore/{key}
Delete the value at the specified key. Idempotent operation.
Success: 204 No Content
Errors: 500 - Backend failure
Example:
curl -s -X DELETE "https://cdn-manager/api/v1/datastore/user:123"
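Since POST fails with 409 on an existing key, an idempotent "create or replace" wrapper must fall back to PUT. A hypothetical helper (create_or_replace is not part of the product, and the base URL is a placeholder):

```shell
# Create the key with POST; on 409 Conflict, replace it with PUT.
# BASE is an assumption for illustration.
BASE="https://cdn-manager/api/v1/datastore"

create_or_replace() {
  local key=$1 body=$2 code
  code=$(curl -s -o /dev/null -w '%{http_code}' -X POST "$BASE/$key" \
    -H 'Content-Type: application/json' -d "$body")
  if [ "$code" = "409" ]; then
    curl -s -o /dev/null -w '%{http_code}' -X PUT "$BASE/$key" \
      -H 'Content-Type: application/json' -d "$body"
  else
    echo "$code"
  fi
}

# create_or_replace user:123 '{"id":123,"name":"alice"}'
```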
Next Steps
- Subnets API - CIDR-to-value mappings
- Routing API - GeoIP lookups
- OpenAPI Specification - Complete API specification
10.5 - Subnets API
Overview
The Subnets API manages CIDR-to-value mappings used for routing decisions. This allows classification of IP ranges for routing purposes.
Base URL
https://<manager-host>/api/v1/subnets
Endpoints
PUT /api/v1/subnets
Create or update subnet mappings.
Request:
{
"192.168.1.0/24": "office",
"10.0.0.0/8": "internal",
"203.0.113.0/24": "external"
}
Success: 200 OK
Errors:
- 400 - Invalid CIDR format
- 500 - Backend failure
Example:
curl -s -X PUT "https://cdn-manager/api/v1/subnets" \
-H "Content-Type: application/json" \
-d '{
"192.168.1.0/24": "office",
"10.0.0.0/8": "internal"
}'
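Invalid CIDRs are rejected with 400, so validating client-side before the PUT avoids a round trip. A minimal IPv4-only sketch:

```shell
# Validate an IPv4 CIDR (octets 0-255, prefix 0-32); IPv6 is not handled
valid_cidr() {
  local re='^([0-9]{1,3}\.){3}[0-9]{1,3}/([0-9]|[12][0-9]|3[0-2])$'
  [[ $1 =~ $re ]] || return 1
  local o IFS=.
  for o in ${1%/*}; do
    (( o <= 255 )) || return 1
  done
}

valid_cidr 192.168.1.0/24 && echo "ok"
```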
GET /api/v1/subnets
List all subnet mappings.
Success Response (200): JSON object of CIDR-to-value mappings
Example:
curl -s "https://cdn-manager/api/v1/subnets" | jq '.'
DELETE /api/v1/subnets
Delete all subnet mappings.
Success: 204 No Content
GET /api/v1/subnets/byKey/{subnet}
Retrieve subnet mappings whose CIDR begins with the given prefix.
Example:
curl -s "https://cdn-manager/api/v1/subnets/byKey/192.168" | jq '.'
GET /api/v1/subnets/byValue/{value}
Retrieve subnet mappings with the given classification value.
Example:
curl -s "https://cdn-manager/api/v1/subnets/byValue/office" | jq '.'
DELETE /api/v1/subnets/byKey/{subnet}
Delete subnet mappings whose CIDR begins with the given prefix.
DELETE /api/v1/subnets/byValue/{value}
Delete subnet mappings with the given classification value.
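The byKey prefix semantics can be illustrated locally with jq: a mapping is selected when its CIDR string starts with the given prefix. The sample mapping below is the one from the PUT example; no request is sent to the endpoint itself.

```shell
# Which CIDRs would /byKey/192.168 select? Simple string-prefix match with jq.
MAPPINGS='{"192.168.1.0/24":"office","10.0.0.0/8":"internal"}'
echo "$MAPPINGS" | jq 'with_entries(select(.key | startswith("192.168")))'
```

Only the `192.168.1.0/24` entry survives the filter; `10.0.0.0/8` does not begin with the prefix.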
Next Steps
- Routing API - GeoIP lookups and IP validation
- Discovery API - Host and namespace discovery
- OpenAPI Specification - Complete API specification
10.6 - Routing API
Overview
The Routing API provides GeoIP information lookup and IP address validation for routing decisions.
Base URL
https://<manager-host>/api/v1/routing
Endpoints
GET /api/v1/routing/geoip
Look up GeoIP information for an IP address.
Query Parameters:
- ip - IP address to look up
Success Response (200):
{
"city": {
"name": "Washington"
},
"asn": 64512
}
Errors:
- 400 - Invalid IP format
- 500 - Backend failure
Caching: Cache-Control: public, max-age=86400 (24 hours)
Example:
curl -s "https://cdn-manager/api/v1/routing/geoip?ip=149.101.100.0"
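Clients that want to honor the cache window can parse max-age out of the header; a small local sketch using the documented header value (no request is made):

```shell
# Extract the max-age value from a Cache-Control header line
HDR='Cache-Control: public, max-age=86400'
echo "$HDR" | sed -n 's/.*max-age=\([0-9][0-9]*\).*/\1/p'
```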
GET /api/v1/routing/validate
Validate if an IP address is allowed (not blocked).
Query Parameters:
- ip - IP address to validate
Success Response (200): Empty body (IP is allowed)
Forbidden Response (403):
Access Denied
Errors:
- 400 - Invalid IP format
- 500 - Backend failure
Caching: Cache-Control headers included (default: max-age=300, configurable via [tuning] section)
Example:
curl -i "https://cdn-manager/api/v1/routing/validate?ip=149.101.100.0"
Use Cases
GeoIP-Based Routing
Use the /geoip endpoint to determine the geographic location and ASN of an IP address for routing decisions:
# Get location data for routing
IP_INFO=$(curl -s "https://cdn-manager/api/v1/routing/geoip?ip=203.0.113.50")
CITY=$(echo "$IP_INFO" | jq -r '.city.name')
ASN=$(echo "$IP_INFO" | jq -r '.asn')
echo "Routing based on city: $CITY, ASN: $ASN"
IP Validation
Use the /validate endpoint to check if an IP is allowed before processing requests:
# Check if IP is allowed
RESPONSE=$(curl -s -o /dev/null -w "%{http_code}" \
"https://cdn-manager/api/v1/routing/validate?ip=203.0.113.50")
if [ "$RESPONSE" = "200" ]; then
echo "IP is allowed"
elif [ "$RESPONSE" = "403" ]; then
echo "IP is blocked"
fi
Next Steps
- Discovery API - Host and namespace discovery
- Metrics API - Metrics submission and aggregation
- OpenAPI Specification - Complete API specification
10.7 - Discovery API
Overview
The Discovery API provides information about discovered hosts and namespaces. Discovery is configured via the Helm chart values.yaml file. Each entry defines a namespace with a list of hostnames.
Base URL
https://<manager-host>/api/v1/discovery
Endpoints
GET /api/v1/discovery/hosts
Return discovered hosts grouped by namespace.
Success Response (200):
{
"directors": [
{ "name": "director-1.example.com" }
],
"edge-servers": [
{ "name": "cdn1.example.com" },
{ "name": "cdn2.example.com" }
]
}
Example:
curl -s "https://cdn-manager/api/v1/discovery/hosts"
GET /api/v1/discovery/namespaces
Return discovery namespaces with their corresponding Confd URIs.
Success Response (200):
[
{
"namespace": "edge-servers",
"confd_uri": "/api/v1/confd/edge-servers"
},
{
"namespace": "directors",
"confd_uri": "/api/v1/confd/directors"
}
]
Example:
curl -s "https://cdn-manager/api/v1/discovery/namespaces"
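Each confd_uri is relative to the manager host; a local jq sketch that joins the sample response above into absolute URLs (cdn-manager is the placeholder host used throughout this guide):

```shell
# Build absolute Confd URLs from a /discovery/namespaces response
NS='[{"namespace":"edge-servers","confd_uri":"/api/v1/confd/edge-servers"}]'
echo "$NS" | jq -r '.[] | "https://cdn-manager\(.confd_uri)"'
```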
Configuration
Discovery is configured via the Helm chart values.yaml file under manager.discovery:
manager:
discovery:
- namespace: "directors"
hosts:
- director-1.example.com
- director-2.example.com
- namespace: "edge-servers"
hosts:
- cdn1.example.com
- cdn2.example.com
Each entry defines a namespace with a list of hostnames. Optionally, a pattern field can be specified for regex-based host matching.
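A hypothetical entry using pattern might look like the following. The exact regex dialect and matching semantics are assumptions here, so consult the Helm chart values schema before relying on this shape:

```yaml
manager:
  discovery:
    - namespace: "edge-servers"
      pattern: "^cdn[0-9]+\\.example\\.com$"
```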
Next Steps
- Metrics API - Metrics submission and aggregation
- Configuration API - Configuration document management
- OpenAPI Specification - Complete API specification
10.8 - Metrics API
Overview
The Metrics API allows submission and retrieval of metrics data from CDN components.
Base URL
https://<manager-host>/api/v1/metrics
Endpoints
POST /api/v1/metrics
Submit metrics data.
Request:
{
"example.com": {
"metric1": 100,
"metric2": 200
}
}
Success: 200 OK
Errors: 500 - Validation/backend errors
Example:
curl -s -X POST "https://cdn-manager/api/v1/metrics" \
-H "Content-Type: application/json" \
-d '{
"example.com": {
"metric1": 100,
"metric2": 200
}
}'
GET /api/v1/metrics
Return aggregated metrics per host.
Response: JSON object with aggregated metrics per host
Note: Metrics are stored per host for up to 5 minutes. Hosts that stop reporting disappear from aggregation after that window. When no metrics are being reported, returns empty object {}.
Example:
curl -s "https://cdn-manager/api/v1/metrics"
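The aggregated response uses the same host-keyed shape as the ingest payload; a local jq sketch that pulls one metric out of the sample document from the POST example (the shape is taken from that example, and no request is made):

```shell
# Extract metric1 for example.com from an aggregated metrics document
AGG='{"example.com":{"metric1":100,"metric2":200}}'
echo "$AGG" | jq -r '."example.com".metric1'
```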
Metrics Retention
- Metrics are stored for up to 5 minutes in the aggregation layer
- For long-term metrics storage, data is forwarded to VictoriaMetrics
- Query historical metrics via Grafana dashboards at /grafana
Next Steps
- Configuration API - Configuration document management
- Operator UI API - Blocked tokens, user agents, and referrers
- OpenAPI Specification - Complete API specification
10.9 - Configuration API
Overview
The Configuration API provides endpoints for managing the system configuration document. ETag is supported; send If-None-Match for conditional GET (may return 304).
Operational Note: This API is intended for internal verification only. Behavior is undefined in multi-replica clusters because pods do not coordinate config writes.
Base URL
https://<manager-host>/api/v1/configuration
Endpoints
GET /api/v1/configuration
Retrieve the configuration document.
Success: 200 OK with configuration JSON
Conditional GET: Returns 304 Not Modified if If-None-Match header matches current ETag
Example:
# Get ETag from response headers
etag=$(curl -s -D- "https://cdn-manager/api/v1/configuration" | grep -i '^etag:' | awk '{print $2}' | tr -d '\r')
# Conditional GET - returns 304 if config unchanged
curl -s -H "If-None-Match: $etag" "https://cdn-manager/api/v1/configuration" -o /tmp/cfg.json -w "%{http_code}\n"
PUT /api/v1/configuration
Replace the configuration document.
Request:
{
"feature_flag": false,
"ratio": 0.25
}
Success: 200 OK
Errors:
- 400 - Invalid configuration format
- 500 - Backend failure
DELETE /api/v1/configuration
Delete the configuration document.
Success: 200 OK
ETag Usage
The configuration API supports ETags for optimistic concurrency control:
# 1. Get current config and ETag
response=$(curl -s -D headers.txt "https://cdn-manager/api/v1/configuration")
etag=$(grep -i ETag headers.txt | cut -d' ' -f2 | tr -d '\r')
# 2. Modify the config as needed
modified_config=$(echo "$response" | jq '.feature_flag = true')
# 3. Update with ETag to prevent overwriting concurrent changes
curl -s -X PUT "https://cdn-manager/api/v1/configuration" \
-H "Content-Type: application/json" \
-H "If-Match: $etag" \
-d "$modified_config"
Next Steps
- Operator UI API - Blocked tokens, user agents, and referrers
- OpenAPI Specification - Complete API specification
10.10 - Operator UI API
Overview
The Operator UI API provides read-only helpers exposing curated selection input content for the operator interface.
Query Parameters: search, sort, limit (same as selection input v1)
Note: Stored keys for user agents/referrers are URL-safe base64; responses decode them to human-readable values.
Base URL
https://<manager-host>/api/v1/operator_ui
Endpoints
Blocked Household Tokens
GET /api/v1/operator_ui/modules/blocked_tokens
List all blocked household tokens.
Success Response (200):
[
{
"household_token": "house-001_token-abc",
"expire_time": 1625247600
}
]
GET /api/v1/operator_ui/modules/blocked_tokens/{token}
Get details for a specific blocked token.
Success Response (200):
{
"household_token": "house-001_token-abc",
"expire_time": 1625247600
}
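The expire_time field appears to be a Unix timestamp, so a caller can check locally whether a block is still in force; a small sketch using the sample record above:

```shell
# Compare the record's expire_time against the current time
RECORD='{"household_token":"house-001_token-abc","expire_time":1625247600}'
EXPIRE=$(echo "$RECORD" | jq -r '.expire_time')
NOW=$(date +%s)
if [ "$NOW" -ge "$EXPIRE" ]; then echo "block expired"; else echo "block active"; fi
```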
Blocked User Agents
GET /api/v1/operator_ui/modules/blocked_user_agents
List all blocked user agents.
Success Response (200):
[
{
"user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
},
{
"user_agent": "curl/7.68.0"
}
]
GET /api/v1/operator_ui/modules/blocked_user_agents/{encoded}
Get details for a specific blocked user agent. The path variable is URL-safe base64 encoded.
Example:
# Encode the user agent
ENC=$(python3 -c "import base64; print(base64.urlsafe_b64encode(b'curl/7.68.0').decode().rstrip('='))")
# Get details
curl -s "https://cdn-manager/api/v1/operator_ui/modules/blocked_user_agents/$ENC"
Blocked Referrers
GET /api/v1/operator_ui/modules/blocked_referrers
List all blocked referrers.
Success Response (200):
[
{
"referrer": "https://spam-example.com"
}
]
GET /api/v1/operator_ui/modules/blocked_referrers/{encoded}
Get details for a specific blocked referrer. The path variable is URL-safe base64 encoded.
Example:
# Encode the referrer
ENC=$(python3 -c "import base64; print(base64.urlsafe_b64encode(b'https://spam-example.com').decode().rstrip('='))")
# Get details
curl -s "https://cdn-manager/api/v1/operator_ui/modules/blocked_referrers/$ENC"
URL-Safe Base64 Encoding
The Operator UI API uses URL-safe base64 encoding for path parameters. To encode values:
Python:
import base64
# Encode
encoded = base64.urlsafe_b64encode(b'value').decode().rstrip('=')
# Decode
decoded = base64.urlsafe_b64decode(encoded + '=' * (-len(encoded) % 4)).decode()
Bash (with coreutils):
# Encode
echo -n "value" | base64 | tr '/+' '_-' | tr -d '='
# Decode (restore the standard alphabet and '=' padding first)
enc="dmFsdWU"
pad=$(( (4 - ${#enc} % 4) % 4 ))
case $pad in 1) p='=' ;; 2) p='==' ;; *) p='' ;; esac
echo "${enc}${p}" | tr '_-' '/+' | base64 -d
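As a sanity check, encoding and decoding should be inverses. This self-contained round trip uses only coreutils (base64, tr), restoring the stripped '=' padding before decoding; the sample value is the user agent from the example above:

```shell
# Round-trip: urlsafe-encode without padding, then decode back
VAL='curl/7.68.0'
ENC=$(printf '%s' "$VAL" | base64 | tr '/+' '_-' | tr -d '=')
pad=$(( (4 - ${#ENC} % 4) % 4 ))
case $pad in 1) P='=' ;; 2) P='==' ;; *) P='' ;; esac
DEC=$(printf '%s%s' "$ENC" "$P" | tr '_-' '/+' | base64 -d)
[ "$DEC" = "$VAL" ] && echo "round-trip ok"
```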
Next Steps
- OpenAPI Specification - Complete API specification
10.11 - OpenAPI Specification
Overview
The CDN Manager API is documented using the OpenAPI 3.0 specification. This appendix provides the complete specification for reference and for generating API clients.
OpenAPI Specification (YAML)
openapi: 3.0.3
info:
title: AgileTV CDN Manager API
version: "1.0"
servers:
- url: https://<manager-host>/api
description: CDN Manager API server
paths:
/v1/auth/login:
post:
summary: Login and create session
requestBody:
required: true
content:
application/json:
schema:
$ref: '#/components/schemas/LoginRequest'
responses:
'200':
description: Session created
content:
application/json:
schema:
$ref: '#/components/schemas/LoginResponse'
'401': { description: Unauthorized, content: { application/json: { schema: { $ref: '#/components/schemas/ErrorResponse' } } } }
'500': { description: Internal error, content: { application/json: { schema: { $ref: '#/components/schemas/ErrorResponse' } } } }
/v1/auth/token:
post:
summary: Exchange session for access token
requestBody:
required: true
content:
application/json:
schema:
$ref: '#/components/schemas/TokenRequest'
responses:
'200':
description: Access token
content:
application/json:
schema:
$ref: '#/components/schemas/TokenResponse'
'401': { description: Unauthorized, content: { application/json: { schema: { $ref: '#/components/schemas/ErrorResponse' } } } }
'500': { description: Internal error, content: { application/json: { schema: { $ref: '#/components/schemas/ErrorResponse' } } } }
/v1/auth/logout:
post:
summary: Revoke session
requestBody:
required: true
content:
application/json:
schema:
$ref: '#/components/schemas/LogoutRequest'
responses:
'200': { description: Revoked, content: { application/json: { schema: { $ref: '#/components/schemas/LogoutResponse' } } } }
'401': { description: Unauthorized, content: { application/json: { schema: { $ref: '#/components/schemas/ErrorResponse' } } } }
'500': { description: Internal error, content: { application/json: { schema: { $ref: '#/components/schemas/ErrorResponse' } } } }
/v1/selection_input{tail}:
get:
summary: Read selection input
parameters:
- $ref: '#/components/parameters/Tail'
- $ref: '#/components/parameters/Search'
- $ref: '#/components/parameters/Sort'
- $ref: '#/components/parameters/Limit'
responses:
'200': { description: JSON value }
'400': { description: Bad request, content: { application/json: { schema: { $ref: '#/components/schemas/ErrorResponse' } } } }
'404': { description: Not found }
'500': { description: Backend failure }
post:
summary: Merge selection input
parameters:
- $ref: '#/components/parameters/Tail'
- $ref: '#/components/parameters/Ttl'
requestBody:
required: true
content:
application/json:
schema:
$ref: '#/components/schemas/AnyJson'
responses:
'201': { description: Created, content: { application/json: { schema: { $ref: '#/components/schemas/AnyJson' } } } }
'500': { description: Backend failure }
'503': { description: Service unavailable }
delete:
summary: Delete selection input
parameters:
- $ref: '#/components/parameters/Tail'
responses:
'204': { description: Deleted }
'503': { description: Service unavailable }
/v2/selection_input{tail}:
get:
summary: Read selection input v2
parameters:
- $ref: '#/components/parameters/TailV2'
- $ref: '#/components/parameters/Search'
responses:
'200': { description: JSON value }
'400': { description: Invalid search pattern }
'404': { description: Not found }
'500': { description: Backend failure }
put:
summary: Replace selection input v2
parameters:
- $ref: '#/components/parameters/TailV2'
- $ref: '#/components/parameters/Ttl'
- $ref: '#/components/parameters/CorrelationId'
requestBody:
required: true
content:
application/json:
schema:
$ref: '#/components/schemas/AnyJson'
responses:
'200': { description: Updated }
'500': { description: Backend failure }
post:
summary: Push to selection input v2
parameters:
- $ref: '#/components/parameters/TailV2'
- $ref: '#/components/parameters/Ttl'
- $ref: '#/components/parameters/CorrelationId'
requestBody:
required: true
content:
application/json:
schema:
$ref: '#/components/schemas/AnyJson'
responses:
'200': { description: Pushed }
'500': { description: Backend failure }
delete:
summary: Delete selection input v2
parameters:
- $ref: '#/components/parameters/TailV2'
responses:
'204': { description: Deleted }
'500': { description: Backend failure }
/v1/configuration:
get:
summary: Read configuration
responses:
'200': { description: Configuration, content: { application/json: { schema: { $ref: '#/components/schemas/AnyJson' } } }, headers: { ETag: { schema: { type: string } } } }
'304': { description: Not modified }
'500': { description: Backend failure }
put:
summary: Replace configuration
requestBody:
required: true
content:
application/json:
schema:
$ref: '#/components/schemas/AnyJson'
responses:
'200': { description: Replaced }
'500': { description: Backend failure }
delete:
summary: Delete configuration
responses:
'200': { description: Deleted }
'500': { description: Backend failure }
/v1/routing/geoip:
get:
summary: GeoIP lookup
parameters:
- name: ip
in: query
required: true
schema: { type: string }
responses:
'200': { description: GeoIP data, content: { application/json: { schema: { $ref: '#/components/schemas/GeoIpResponse' } } } }
'400': { description: Invalid IP }
'500': { description: Backend failure }
/v1/routing/validate:
get:
summary: Validate routing
parameters:
- name: ip
in: query
required: true
schema: { type: string }
responses:
'200': { description: Allowed }
'403': { description: Access Denied }
'400': { description: Invalid IP }
'500': { description: Backend failure }
/v1/metrics:
post:
summary: Ingest metrics
requestBody:
required: true
content:
application/json:
schema:
$ref: '#/components/schemas/MetricsIngress'
responses:
'200': { description: Stored }
'500': { description: Validation/back-end error }
get:
summary: Aggregate metrics
responses:
'200': { description: Aggregated metrics, content: { application/json: { schema: { $ref: '#/components/schemas/AnyJson' } } } }
'500': { description: Backend failure }
/v1/discovery/hosts:
get:
summary: List discovered hosts by namespace
responses:
'200':
description: Discovered hosts keyed by namespace
content:
application/json:
schema:
type: object
additionalProperties:
type: array
items:
$ref: '#/components/schemas/DiscoveryHost'
'500': { description: Backend failure }
/v1/discovery/namespaces:
get:
summary: List discovery namespaces with Confd URIs
responses:
'200':
description: Namespaces with Confd links
content:
application/json:
schema:
type: array
items:
$ref: '#/components/schemas/DiscoveryNamespace'
'500': { description: Backend failure }
/v1/datastore:
get:
summary: List datastore keys
responses:
'200': { description: Keys list, content: { application/json: { schema: { type: array, items: { type: string } } } } }
'500': { description: Backend failure }
/v1/datastore/{key}:
get:
summary: Get a JSON value by key
parameters:
- name: key
in: path
required: true
schema: { type: string }
responses:
'200': { description: JSON value, content: { application/json: { schema: { $ref: '#/components/schemas/AnyJson' } } } }
'404': { description: Not found }
'500': { description: Backend failure }
post:
summary: Create a JSON value at key
parameters:
- name: key
in: path
required: true
schema: { type: string }
- $ref: '#/components/parameters/Ttl'
requestBody:
required: true
content:
application/json:
schema:
$ref: '#/components/schemas/AnyJson'
responses:
'201': { description: Created }
'409': { description: Conflict (already exists) }
'500': { description: Backend failure }
put:
summary: Update/replace a JSON value at key
parameters:
- name: key
in: path
required: true
schema: { type: string }
- $ref: '#/components/parameters/Ttl'
requestBody:
required: true
content:
application/json:
schema:
$ref: '#/components/schemas/AnyJson'
responses:
'200': { description: Updated }
'404': { description: Not found }
'500': { description: Backend failure }
delete:
summary: Delete a datastore key
parameters:
- name: key
in: path
required: true
schema: { type: string }
responses:
'204': { description: Deleted }
'500': { description: Backend failure }
/v1/subnets:
get:
summary: List all subnet mappings
responses:
'200': { description: Subnet mappings, content: { application/json: { schema: { type: object, additionalProperties: { type: string } } } } }
'500': { description: Backend failure }
put:
summary: Create or update subnet mappings
requestBody:
required: true
content:
application/json:
schema:
type: object
additionalProperties:
type: string
description: Map of CIDR strings to classification values
responses:
'200': { description: Created }
'400': { description: Invalid CIDR format }
'500': { description: Backend failure }
delete:
summary: Delete all subnet mappings
responses:
'204': { description: Deleted }
'500': { description: Backend failure }
/v1/subnets/byKey/{subnet}:
get:
summary: Get subnet mappings by CIDR prefix
parameters:
- name: subnet
in: path
required: true
schema: { type: string }
responses:
'200': { description: Subnet mappings, content: { application/json: { schema: { type: object, additionalProperties: { type: string } } } } }
'500': { description: Backend failure }
delete:
summary: Delete subnet mappings by CIDR prefix
parameters:
- name: subnet
in: path
required: true
schema: { type: string }
responses:
'204': { description: Deleted }
'500': { description: Backend failure }
/v1/subnets/byValue/{value}:
get:
summary: Get subnet mappings by value
parameters:
- name: value
in: path
required: true
schema: { type: string }
responses:
'200': { description: Subnet mappings, content: { application/json: { schema: { type: object, additionalProperties: { type: string } } } } }
'500': { description: Backend failure }
delete:
summary: Delete subnet mappings by value
parameters:
- name: value
in: path
required: true
schema: { type: string }
responses:
'204': { description: Deleted }
'500': { description: Backend failure }
/v1/operator_ui/modules/blocked_tokens:
get:
summary: List blocked tokens
parameters:
- $ref: '#/components/parameters/Search'
- $ref: '#/components/parameters/Sort'
- $ref: '#/components/parameters/Limit'
responses:
'200': { description: Blocked tokens, content: { application/json: { schema: { type: array, items: { $ref: '#/components/schemas/BlockedToken' } } } } }
'400': { description: Parse error, content: { application/json: { schema: { $ref: '#/components/schemas/ErrorResponse' } } } }
/v1/operator_ui/modules/blocked_tokens/{token}:
get:
summary: Get blocked token
parameters:
- name: token
in: path
required: true
schema: { type: string }
responses:
'200': { description: Blocked token, content: { application/json: { schema: { $ref: '#/components/schemas/BlockedToken' } } } }
'404': { description: Not found }
'400': { description: Parse error, content: { application/json: { schema: { $ref: '#/components/schemas/ErrorResponse' } } } }
/v1/operator_ui/modules/blocked_user_agents:
get:
summary: List blocked user agents
parameters:
- $ref: '#/components/parameters/Search'
- $ref: '#/components/parameters/Sort'
- $ref: '#/components/parameters/Limit'
responses:
'200': { description: Blocked user agents, content: { application/json: { schema: { type: array, items: { $ref: '#/components/schemas/BlockedUserAgent' } } } } }
'400': { description: Parse error, content: { application/json: { schema: { $ref: '#/components/schemas/ErrorResponse' } } } }
/v1/operator_ui/modules/blocked_user_agents/{encoded}:
get:
summary: Get blocked user agent
parameters:
- name: encoded
in: path
required: true
schema: { type: string }
responses:
'200': { description: Blocked user agent, content: { application/json: { schema: { $ref: '#/components/schemas/BlockedUserAgent' } } } }
'400': { description: Parse error, content: { application/json: { schema: { $ref: '#/components/schemas/ErrorResponse' } } } }
/v1/operator_ui/modules/blocked_referrers:
get:
summary: List blocked referrers
parameters:
- $ref: '#/components/parameters/Search'
- $ref: '#/components/parameters/Sort'
- $ref: '#/components/parameters/Limit'
responses:
'200': { description: Blocked referrers, content: { application/json: { schema: { type: array, items: { $ref: '#/components/schemas/BlockedReferrer' } } } } }
'400': { description: Parse error, content: { application/json: { schema: { $ref: '#/components/schemas/ErrorResponse' } } } }
/v1/operator_ui/modules/blocked_referrers/{encoded}:
get:
summary: Get blocked referrer
parameters:
- name: encoded
in: path
required: true
schema: { type: string }
responses:
'200': { description: Blocked referrer, content: { application/json: { schema: { $ref: '#/components/schemas/BlockedReferrer' } } } }
'400': { description: Parse error, content: { application/json: { schema: { $ref: '#/components/schemas/ErrorResponse' } } } }
/v1/health/alive:
get:
summary: Liveness check
responses:
'200': { description: Alive, content: { application/json: { schema: { $ref: '#/components/schemas/HealthStatus' } } } }
/v1/health/ready:
get:
summary: Readiness check
responses:
'200': { description: Ready, content: { application/json: { schema: { $ref: '#/components/schemas/HealthStatus' } } } }
'503': { description: Unready, content: { application/json: { schema: { $ref: '#/components/schemas/HealthStatus' } } } }
components:
parameters:
Tail:
name: tail
in: path
required: true
schema: { type: string }
TailV2:
name: tail
in: path
required: true
schema: { type: string }
Search:
name: search
in: query
required: false
schema: { type: string }
Sort:
name: sort
in: query
required: false
schema: { type: string, enum: [asc, desc] }
Limit:
name: limit
in: query
required: false
schema: { type: integer, minimum: 1 }
Ttl:
name: ttl
in: query
required: false
schema: { type: string, description: Humantime duration }
CorrelationId:
name: correlation_id
in: query
required: false
schema: { type: string }
schemas:
LoginRequest:
type: object
required: [email, password]
properties:
email: { type: string, format: email }
password: { type: string, format: password }
LoginResponse:
type: object
properties:
session_id: { type: string }
session_token: { type: string }
verified_at: { type: string, format: date-time }
expires_at: { type: string, format: date-time }
LogoutRequest:
type: object
required: [session_id]
properties:
session_id: { type: string }
session_token: { type: string }
LogoutResponse:
type: object
properties:
status: { $ref: '#/components/schemas/StatusValue' }
TokenRequest:
type: object
required: [session_id, session_token, grant_type]
properties:
session_id: { type: string }
session_token: { type: string }
scope: { type: string }
grant_type: { type: string, enum: [session] }
TokenResponse:
type: object
required: [access_token, scope, expires_in, token_type]
properties:
access_token: { type: string }
scope: { type: string }
expires_in: { type: integer, format: int64 }
token_type: { type: string, enum: [bearer] }
ErrorResponse:
type: object
properties:
message: { type: string }
AnyJson:
description: Arbitrary JSON value
MetricsIngress:
type: object
additionalProperties:
type: object
additionalProperties: { type: number }
GeoIpResponse:
type: object
properties:
city:
type: object
properties:
name: { type: string }
asn: { type: integer }
is_anonymous: { type: boolean }
BlockedToken:
type: object
properties:
household_token: { type: string }
expire_time: { type: integer, format: int64 }
BlockedUserAgent:
type: object
properties:
user_agent: { type: string }
BlockedReferrer:
type: object
properties:
referrer: { type: string }
DiscoveryHost:
type: object
properties:
name: { type: string }
DiscoveryNamespace:
type: object
properties:
namespace: { type: string }
confd_uri: { type: string }
HealthStatus:
type: object
properties:
status: { $ref: '#/components/schemas/StatusValue' }
StatusValue:
type: string
enum: [Ok, Fail]
Using the OpenAPI Specification
Generating API Clients
The OpenAPI specification can be used to generate client libraries in multiple languages:
Using openapi-generator:
# Generate Python client
openapi-generator generate -i openapi.yaml -g python -o ./python-client
# Generate TypeScript client
openapi-generator generate -i openapi.yaml -g typescript-axios -o ./typescript-client
# Generate Go client
openapi-generator generate -i openapi.yaml -g go -o ./go-client
Using swagger-codegen:
swagger-codegen generate -i openapi.yaml -l python -o ./python-client
Validating the Specification
To validate the OpenAPI specification:
# Using swagger-cli
swagger-cli validate openapi.yaml
# Using spectral
spectral lint openapi.yaml
Next Steps
- Authentication API - Detailed authentication flow
- API Guide Index - Browse all API documentation
- Operations Guide - Day-to-day operational procedures
11 - Troubleshooting Guide
Overview
This guide provides troubleshooting procedures for common issues encountered when operating the AgileTV CDN Manager (ESB3027). Use the diagnostic commands and resolution steps to identify and resolve problems.
Diagnostic Tools
Cluster Status
# Check node status
kubectl get nodes
# Check all pods
kubectl get pods -A
# Check events sorted by time
kubectl get events --sort-by='.lastTimestamp'
# Check resource usage
kubectl top nodes
kubectl top pods
Component Status
# Check deployments
kubectl get deployments
# Check statefulsets
kubectl get statefulsets
# Check persistent volumes
kubectl get pvc
kubectl get pv
# Check services
kubectl get services
# Check ingress
kubectl get ingress
Common Issues
Pods Stuck in Pending State
Symptoms: Pods remain in Pending state indefinitely.
Causes:
- Insufficient cluster resources (CPU/memory)
- No nodes match scheduling constraints
- PersistentVolume not available
Diagnosis:
# Describe the pending pod
kubectl describe pod <pod-name>
# Check events for scheduling failures
kubectl get events --field-selector reason=FailedScheduling
# Check node capacity
kubectl describe nodes | grep -A 5 "Allocated resources"
# Check available PVs
kubectl get pv
Resolution:
# Free up resources by scaling down non-critical workloads
kubectl scale deployment <deployment> --replicas=0
# Or add additional nodes to the cluster
# If PV is stuck, delete and recreate
kubectl delete pvc <pvc-name>
kubectl delete pod <pod-name>
Pods Stuck in ContainerCreating
Symptoms: Pods remain in ContainerCreating state.
Causes:
- Image pull failures
- Volume mount issues
- Network configuration problems
Diagnosis:
kubectl describe pod <pod-name>
# Check for image pull errors
kubectl get events | grep -i "failed to pull"
# Check volume mount status
kubectl get events | grep -i "mount"
Resolution:
# For image pull issues, verify image exists and credentials
kubectl get secret <pull-secret-name> -o yaml
# For volume issues, check Longhorn volume status
kubectl get volumes -n longhorn-system
# Delete stuck pod to trigger recreation
kubectl delete pod <pod-name> --force --grace-period=0
Persistent Volume Mount Failures
Symptoms: Pod fails to start with error “AttachVolume.Attach failed for volume… is not ready for workloads” or similar volume attachment errors.
Causes:
- Longhorn volume created but unable to be successfully mounted
- Network connectivity issues between nodes (Longhorn requires iSCSI and NFS traffic)
- Longhorn service unhealthy
- Incorrect storage class configuration
Diagnosis:
# Describe the failing pod to see the error
kubectl describe pod <pod-name>
# Check Longhorn volumes status
kubectl get volumes -n longhorn-system
# Check Longhorn UI for detailed volume status
kubectl port-forward -n longhorn-system svc/longhorn-frontend 8080:80
# Access: http://localhost:8080
Resolution:
# Verify firewall allows Longhorn traffic between nodes
# Ports 9500 and 8500 must be open (see Networking Guide)
# Check Longhorn is healthy
kubectl get pods -n longhorn-system
# If volume is stuck, delete PVC and pod to trigger recreation
kubectl delete pvc <pvc-name>
kubectl delete pod <pod-name>
Pods in CrashLoopBackOff
Symptoms: Pods repeatedly crash and restart.
Causes:
- Application configuration errors
- Missing dependencies (database not ready)
- Resource limits too low
- Liveness probe failures
Diagnosis:
# View current logs
kubectl logs <pod-name>
# View previous instance logs
kubectl logs <pod-name> -p
# Describe pod for restart reasons
kubectl describe pod <pod-name>
# Check if dependencies are healthy
kubectl get pods | grep -E "(postgres|kafka|redis)"
Resolution:
# For dependency issues, wait for dependencies to be ready
kubectl wait --for=condition=Ready pod/<dependency-pod> --timeout=300s
# For resource issues, increase limits
kubectl edit deployment <deployment-name>
# For configuration issues, check ConfigMaps and Secrets
kubectl get configmap <configmap-name> -o yaml
kubectl get secret <secret-name> -o yaml
# Restart the deployment
kubectl rollout restart deployment/<deployment-name>
Pods in Terminating State
Symptoms: Pods stuck in Terminating state indefinitely.
Causes:
- Volume detachment issues
- Node communication problems
- Finalizer blocking deletion
Diagnosis:
kubectl describe pod <pod-name>
# Check if node is reachable
kubectl get nodes
# Check finalizers
kubectl get pod <pod-name> -o jsonpath='{.metadata.finalizers}'
Resolution:
# Force delete the pod
kubectl delete pod <pod-name> --force --grace-period=0
# If node is unreachable, drain and remove from cluster
kubectl drain <node-name> --ignore-daemonsets --force
kubectl delete node <node-name>
Service Unreachable
Symptoms: Service endpoints not accessible.
Causes:
- No ready pods backing the service
- Network policy blocking traffic
- Service port mismatch
Diagnosis:
# Check service endpoints
kubectl get endpoints <service-name>
# Check if pods are ready
kubectl get pods -l app=<label>
# Check network policies
kubectl get networkpolicies
# Test connectivity from within cluster
kubectl run test --rm -it --image=busybox -- wget -O- <service-name>:<port>
Resolution:
# Ensure pods are ready and matching service selector
kubectl get pods --show-labels
# Check service selector matches pod labels
kubectl get service <service-name> -o jsonpath='{.spec.selector}'
# Temporarily disable network policy for testing
kubectl edit networkpolicy <policy-name>
Ingress Not Working
Symptoms: External access via ingress fails.
Causes:
- Traefik ingress controller not running
- Ingress configuration errors
- TLS certificate issues
- DNS resolution problems
Diagnosis:
# Check Traefik pods
kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik
# Check ingress resources
kubectl get ingress
# Describe ingress for errors
kubectl describe ingress <ingress-name>
# Check Traefik logs
kubectl logs -n kube-system -l app.kubernetes.io/name=traefik
# Test DNS resolution
nslookup <hostname>
Resolution:
# Restart Traefik
kubectl rollout restart deployment -n kube-system traefik
# Fix ingress configuration
kubectl edit ingress <ingress-name>
# Renew or recreate TLS secret
kubectl create secret tls <secret-name> --cert=tls.crt --key=tls.key \
--dry-run=client -o yaml | kubectl apply -f -
# Verify hostname matches certificate
openssl x509 -in tls.crt -noout -subject -issuer
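It is also worth checking the certificate the ingress actually serves, which can differ from the one stored in the secret if a restart has not picked up the change. A sketch, where `<hostname>` is the external FQDN:

```shell
# Inspect the certificate presented on the wire, with SNI set to the
# hostname, and print its subject and validity window
openssl s_client -connect <hostname>:443 -servername <hostname> </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -dates
```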
Database Connection Failures
Symptoms: Application cannot connect to PostgreSQL.
Causes:
- PostgreSQL cluster not ready
- Connection pool exhausted
- Network connectivity issues
- Authentication failures
Diagnosis:
# Check PostgreSQL cluster status (the CloudNativePG Cluster resource)
kubectl get clusters.postgresql.cnpg.io
# Check PostgreSQL pods
kubectl get pods -l app.kubernetes.io/name=postgresql
# Check PostgreSQL logs
kubectl logs -l app.kubernetes.io/name=postgresql
# Test connectivity
kubectl exec -it <app-pod> -- psql -h <postgres-service> -U <user> -d <database>
Resolution:
# Wait for PostgreSQL to be ready
kubectl wait --for=condition=Ready pod -l app.kubernetes.io/name=postgresql --timeout=300s
# Check connection string in application config
# (.data is a map, so decode a specific key rather than the whole object)
kubectl get secret <secret-name> -o jsonpath='{.data.<key>}' | base64 -d
# Restart application pods
kubectl rollout restart deployment/<deployment-name>
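To inspect all keys of a secret at once instead of decoding one at a time, kubectl's go-template output format provides a `base64decode` function. A sketch:

```shell
# Decode every key in a secret in one pass
kubectl get secret <secret-name> \
  -o go-template='{{range $k, $v := .data}}{{$k}}={{$v | base64decode}}{{"\n"}}{{end}}'
```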
Kafka Connection Issues
Symptoms: Application cannot connect to Kafka.
Causes:
- Kafka controllers not ready
- Topic not created
- Network connectivity issues
Diagnosis:
# Check Kafka pods
kubectl get pods -l app.kubernetes.io/name=kafka
# Check Kafka logs
kubectl logs -l app.kubernetes.io/name=kafka
# List topics
kubectl exec -it <kafka-pod> -- kafka-topics.sh --bootstrap-server localhost:9092 --list
Resolution:
# Wait for Kafka controllers to be ready
kubectl wait --for=condition=Ready pod -l app.kubernetes.io/name=kafka --timeout=300s
# Create missing topic
kubectl exec -it <kafka-pod> -- kafka-topics.sh --bootstrap-server localhost:9092 \
--create --topic <topic-name> --partitions 3 --replication-factor 3
# Restart application to reconnect
kubectl rollout restart deployment/<deployment-name>
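A produce/consume round trip confirms the broker is actually serving traffic, not just that the pod is Ready. A sketch using the console tools shipped in the Kafka image (paths may vary by image):

```shell
# Produce one test message
kubectl exec -it <kafka-pod> -- sh -c \
  'echo ping | kafka-console-producer.sh --bootstrap-server localhost:9092 --topic <topic-name>'
# Read it back; exits after one message or a 10 s timeout
kubectl exec -it <kafka-pod> -- kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 --topic <topic-name> \
  --from-beginning --max-messages 1 --timeout-ms 10000
```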
Redis Connection Issues
Symptoms: Application cannot connect to Redis.
Diagnosis:
# Check Redis pods
kubectl get pods -l app.kubernetes.io/name=redis
# Check Redis logs
kubectl logs -l app.kubernetes.io/name=redis
# Test connectivity
kubectl exec -it <redis-pod> -- redis-cli ping
Resolution:
# Wait for Redis to be ready
kubectl wait --for=condition=Ready pod -l app.kubernetes.io/name=redis --timeout=300s
# Restart application
kubectl rollout restart deployment/<deployment-name>
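When PING succeeds but the application still struggles, memory pressure and client saturation are common culprits. A sketch — if the deployment sets a Redis password, add `-a "$REDIS_PASSWORD"` to each command:

```shell
# Check memory consumption and connected-client count
kubectl exec -it <redis-pod> -- redis-cli INFO memory | grep used_memory_human
kubectl exec -it <redis-pod> -- redis-cli INFO clients | grep connected_clients
```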
High Memory Usage
Symptoms: Pods approaching or hitting memory limits.
Diagnosis:
# Check memory usage
kubectl top pods
# Check for OOMKilled containers (the reason is recorded in the container's
# last state, even when the pod shows Running after a restart)
kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}{end}' | grep OOMKilled
# Check for memory leaks in logs
kubectl logs <pod-name> | grep -i "memory\|oom"
Resolution:
# Temporarily increase memory limit
kubectl edit deployment <deployment-name>
# Or scale horizontally if HPA is enabled
kubectl scale deployment <deployment-name> --replicas=<n>
# Long-term: Update values.yaml and perform helm upgrade
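The interactive `kubectl edit` can be replaced with a non-interactive patch, which is easier to script and review. A sketch — the container index, limit value, and deployment name are placeholders, and `replace` assumes a memory limit already exists (use `add` otherwise):

```shell
# Raise the memory limit of the first container without opening an editor
kubectl patch deployment <deployment-name> --type=json -p '[
  {"op": "replace",
   "path": "/spec/template/spec/containers/0/resources/limits/memory",
   "value": "2Gi"}
]'
```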
High CPU Usage
Symptoms: Pods consistently using high CPU.
Diagnosis:
# Check CPU usage
kubectl top pods
# Check for runaway processes
kubectl top pods --sort-by=cpu
Resolution:
# Scale horizontally if HPA is enabled
kubectl scale deployment <deployment-name> --replicas=<n>
# Or increase CPU limits
kubectl edit deployment <deployment-name>
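If no HPA exists yet, one can be created so scaling reacts to CPU pressure automatically instead of requiring manual `kubectl scale` calls. A sketch with illustrative thresholds:

```shell
# Scale between 2 and 6 replicas, targeting 75% average CPU utilization
kubectl autoscale deployment <deployment-name> --min=2 --max=6 --cpu-percent=75
# Verify the HPA is tracking the deployment
kubectl get hpa <deployment-name>
```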
Persistent Volume Issues
Symptoms: PVC not binding or volume errors.
Diagnosis:
# Check PVC status
kubectl get pvc
# Check PV status
kubectl get pv
# Check Longhorn volumes (the Volume CRD lives in longhorn-system)
kubectl get volumes.longhorn.io -n longhorn-system
# Check Longhorn UI for details
kubectl port-forward -n longhorn-system svc/longhorn-frontend 8080:80
Resolution:
# For stuck PVC, delete and recreate
kubectl delete pvc <pvc-name>
kubectl delete pod <pod-name>
# For Longhorn issues, check Longhorn UI
# Access via http://localhost:8080
# Recreate Longhorn volume if necessary
Zitadel Authentication Failures
Symptoms: Users cannot authenticate via Zitadel.
Causes:
- CORS configuration mismatch
- External domain misconfigured
- Zitadel pods not healthy
Diagnosis:
# Check Zitadel pods
kubectl get pods -l app.kubernetes.io/name=zitadel
# Check Zitadel logs
kubectl logs -l app.kubernetes.io/name=zitadel
# Verify external domain configuration
helm get values acd-manager -o yaml | grep -A 5 zitadel
Resolution:
# Ensure global.hosts.manager[0].host matches zitadel.zitadel.ExternalDomain
# Update values.yaml if needed
helm upgrade acd-manager /mnt/esb3027/charts/acd-manager \
--values ~/values.yaml
# Restart Zitadel
kubectl rollout restart deployment -l app.kubernetes.io/name=zitadel
Certificate Errors
Symptoms: TLS/SSL errors in browser or API calls.
Diagnosis:
# Check certificate expiration
kubectl get secret <tls-secret> -o jsonpath='{.data.tls\.crt}' | base64 -d | \
openssl x509 -noout -dates
# Check certificate subject
kubectl get secret <tls-secret> -o jsonpath='{.data.tls\.crt}' | base64 -d | \
openssl x509 -noout -subject -issuer
Resolution:
# Renew self-signed certificate
helm upgrade acd-manager /mnt/esb3027/charts/acd-manager \
--values ~/values.yaml \
--set ingress.selfSigned=true
# Or update manual certificate
kubectl create secret tls <secret-name> \
--cert=new-cert.crt --key=new-key.key \
--dry-run=client -o yaml | kubectl apply -f -
# Restart pods to pick up new certificate
kubectl rollout restart deployment <deployment-name>
Log Collection
Collecting Logs for Support
# Capture timestamp once to ensure consistency
TS=$(date +%Y%m%d-%H%M%S)
# Create log collection directory
mkdir -p ~/cdn-logs-$TS
cd ~/cdn-logs-$TS
# Collect pod logs
for pod in $(kubectl get pods -o name); do
kubectl logs $pod > ${pod#pod/}.log 2>&1
kubectl logs $pod -p > ${pod#pod/}.previous.log 2>&1 || true
done
# Collect cluster events
kubectl get events --sort-by='.lastTimestamp' > events.log
# Collect pod descriptions
for pod in $(kubectl get pods -o name); do
kubectl describe $pod > ${pod#pod/}.describe.txt
done
# Compress for transfer
tar czf cdn-logs-$TS.tar.gz *.log *.txt
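Support cases often need cluster state alongside pod logs. These extra captures can be run from the same directory, rebuilding the archive afterwards — a sketch, assuming helm is installed on the node:

```shell
# Capture cluster-wide state alongside the pod logs
kubectl get all -o wide > cluster-state.txt
kubectl get nodes -o wide > nodes.txt
kubectl get events --all-namespaces --sort-by='.lastTimestamp' > all-events.log
helm list -A > helm-releases.txt 2>&1 || true
# Rebuild the archive so the extra files are included
tar czf cdn-logs-$TS.tar.gz *.log *.txt
```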
Emergency Procedures
Complete Cluster Recovery
If the cluster is completely down:
Assess node status:
kubectl get nodes
Restart K3s on nodes:
# On each node
systemctl restart k3s
If primary server failed:
- Promote another server node
- Update load balancer/DNS to point to new primary
Restore from backup if necessary:
- See Upgrade Guide for restore procedures
Data Recovery
For data recovery scenarios:
- PostgreSQL: Use CloudNativePG backup/restore
- Longhorn: Restore from volume snapshots
- Kafka: Replication handles most failures
Getting Help
If issues persist:
- Collect logs using the procedure above
- Check release notes for known issues
- Contact support with log bundle and issue description
Next Steps
After resolving issues:
- Operations Guide - Preventive maintenance procedures
- Configuration Guide - Verify configuration is correct
- Architecture Guide - Understand component dependencies
12 - Glossary
Overview
This glossary defines key terms and acronyms used throughout the AgileTV CDN Manager (ESB3027) documentation.
A
ACD (Agile Content Delivery)
The overall CDN solution comprising the Manager (ESB3027) and Director (ESB3024) components.
Agent Node
A Kubernetes node that runs workloads but does not participate in the control plane. Agent nodes provide additional capacity for running application pods.
API Gateway
See NGinx Gateway.
ASN (Autonomous System Number)
A unique identifier for a network on the internet. Used in GeoIP-based routing decisions.
C
CDN Director
The Edge Server Business (ESB3024) component that handles actual content routing and delivery. Multiple Directors can be managed by a single CDN Manager.
CloudNativePG (CNPG)
A Kubernetes operator that manages PostgreSQL clusters. Provides high availability, automatic failover, and backup capabilities for the Manager’s database layer.
Confd
Configuration daemon that synchronizes configuration from the Manager to CDN Directors. Runs as a sidecar or separate deployment.
CORS (Cross-Origin Resource Sharing)
A security mechanism that allows web applications to make requests to a different domain. Zitadel enforces CORS policies requiring the external domain to match the configured hostname.
CrashLoopBackOff
A Kubernetes pod state indicating the container is repeatedly crashing and being restarted. Typically indicates a configuration or dependency issue.
D
Datastore
The internal key-value storage system used by the Manager for short-lived or simple structured data. Backed by Redis.
Descheduler
A Kubernetes component that periodically analyzes pod distribution and evicts pods from overutilized nodes to optimize cluster balance.
Director
See CDN Director.
E
EDB (EnterpriseDB)
A company that provides PostgreSQL-related software and services. The CloudNativePG operator was originally developed by EDB.
Ephemeral Storage
Temporary storage available to pods. Used for temporary files and caches. Not persistent across pod restarts.
ESB (Edge Server Business)
The product family designation for CDN components. ESB3027 is the Manager, ESB3024 is the Director.
etcd
A distributed key-value store used by Kubernetes for cluster state management. Runs on Server nodes as part of the control plane.
F
FailedScheduling
A Kubernetes event indicating a pod could not be scheduled due to insufficient resources or scheduling constraints.
Flannel
A network overlay solution for Kubernetes. Provides VXLAN-based networking for pod-to-pod communication.
Frontend GUI
See MIB Frontend.
G
GeoIP
Geographic IP lookup service using MaxMind databases. Used for location-based routing decisions.
Grafana
A visualization and dashboard platform for time-series data. Used to display metrics collected by Telegraf and stored in VictoriaMetrics.
H
Helm Chart
A package of pre-configured Kubernetes resources. The CDN Manager is deployed via a Helm chart that handles all component installation.
HPA (Horizontal Pod Autoscaler)
A Kubernetes feature that automatically scales the number of pods based on CPU/memory utilization or custom metrics.
HTTP Server
The main API server component of the Manager, built with Actix Web (Rust framework).
I
Ingress
A Kubernetes resource that exposes HTTP/HTTPS routes from outside the cluster to services within. The CDN Manager uses Traefik as the ingress controller.
Ingress Controller
A component that implements ingress rules. The CDN Manager uses Traefik for primary ingress and NGinx for external Director communication.
K
Kafka
A distributed event streaming platform used by the Manager for asynchronous communication and event processing.
K3s
A lightweight Kubernetes distribution optimized for edge and production deployments. Used as the underlying cluster technology.
Kubernetes (K8s)
An open-source container orchestration platform. The CDN Manager runs on a K3s-based Kubernetes cluster.
L
Liveness Probe
A Kubernetes health check that determines if a container is running properly. Failed liveness probes trigger container restart.
Longhorn
A distributed block storage system for Kubernetes. Provides persistent volumes for stateful components like PostgreSQL and Kafka.
M
Manager
The central management component (ESB3027) for configuring and monitoring CDN Directors.
MaxMind
A provider of IP intelligence databases including GeoIP City, GeoLite2 ASN, and Anonymous IP databases used by the Manager.
MIB Frontend
The web-based configuration GUI for CDN operators. Provides a user interface for managing streams, routers, and other configuration.
Multi-Factor Authentication (MFA)
An authentication method requiring multiple forms of verification. Note: MFA is not currently supported in the CDN Manager and should be skipped during setup.
N
Name-based Virtual Hosting
A technique where multiple hostnames are served from the same IP address. Zitadel uses this for CORS validation.
Namespace
A Kubernetes abstraction for organizing cluster resources. The CDN Manager uses namespaces to group related components.
NGinx Gateway
An NGinx-based gateway that handles external communication with CDN Directors.
Node Token
A secret token used to authenticate new nodes joining a K3s cluster. Located at /var/lib/rancher/k3s/server/node-token on Server nodes.
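Joining an agent with the node token can be sketched as follows — `SERVER_IP` is a placeholder for an existing server node, and this online form applies to internet-connected hosts (air-gapped installs use the bundled installer instead):

```shell
# On the new agent node: fetch the token from a server node, then
# install K3s in agent mode pointed at the existing cluster
TOKEN=$(ssh root@SERVER_IP cat /var/lib/rancher/k3s/server/node-token)
curl -sfL https://get.k3s.io | K3S_URL=https://SERVER_IP:6443 K3S_TOKEN=$TOKEN sh -
```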
O
Operator
A method of packaging, deploying, and managing a Kubernetes application. CloudNativePG is an operator for PostgreSQL.
OOMKilled
A Kubernetes pod state indicating the container was terminated due to exceeding memory limits.
P
PDB (Pod Disruption Budget)
A Kubernetes feature that ensures a minimum number of pods remain available during voluntary disruptions like maintenance.
PersistentVolume (PV)
A piece of storage in the Kubernetes cluster. Created dynamically by Longhorn for stateful components.
PersistentVolumeClaim (PVC)
A request for storage by a pod. Bound to a PersistentVolume.
Pod
The smallest deployable unit in Kubernetes. Contains one or more containers.
PostgreSQL
An open-source relational database. Used by the Manager for persistent data storage, managed by CloudNativePG.
Probe
A Kubernetes health check mechanism. Types include liveness, readiness, and startup probes.
Prometheus
An open-source monitoring and alerting toolkit. Telegraf exports metrics in Prometheus format.
R
RBAC (Role-Based Access Control)
A method of regulating access to resources based on user roles. Used by Kubernetes for authorization.
Readiness Probe
A Kubernetes health check that determines if a container is ready to receive traffic. Failed readiness probes remove the pod from service load balancing.
Redis
An in-memory data structure store used for caching and as the datastore backend for the Manager.
Replica
A copy of a pod. Multiple replicas provide high availability and load distribution.
Resource Preset
Predefined resource configurations (nano, micro, small, medium, large, xlarge, 2xlarge) for common deployment sizes.
Rolling Update
A deployment strategy that updates pods one at a time to maintain availability during upgrades.
S
Selection Input
A key-value storage mechanism used for configuration data that can be queried with wildcard patterns. Available in v1 and v2 APIs with different semantics.
Server Node
A Kubernetes node that participates in the control plane (etcd, API server). Can also run workloads unless tainted.
Service
A Kubernetes abstraction that defines a logical set of pods and a policy for accessing them. Provides stable networking endpoints.
ServiceAccount
A Kubernetes identity for processes running in pods. Used for authentication between Kubernetes components.
Startup Probe
A Kubernetes health check that determines if a container application has started. Disables liveness and readiness checks until it succeeds.
StatefulSet
A Kubernetes workload API object for managing stateful applications. Used for PostgreSQL and Kafka deployments.
Stream
A content stream configuration defining source and routing parameters.
T
Telegraf
An agent for collecting, processing, aggregating, and writing metrics. Runs on each node to gather system and application metrics.
TLS (Transport Layer Security)
A cryptographic protocol for secure communication. The CDN Manager uses TLS for all external HTTPS connections.
Topology Aware Hints
A Kubernetes feature that prefers routing traffic to pods in the same zone as the source. Reduces latency by keeping traffic local.
Traefik
A modern HTTP reverse proxy and ingress controller. Used as the primary ingress controller for the CDN Manager.
TTL (Time To Live)
The duration after which data expires. Used in the datastore and selection input APIs.
V
Values.yaml
The Helm chart configuration file. Contains all configurable parameters for the CDN Manager deployment.
VictoriaMetrics
A time-series database used for storing metrics data. Provides long-term storage and querying capabilities.
VXLAN
Virtual Extensible LAN. A network virtualization technology used by Flannel for pod networking.
Z
Zitadel
An identity and access management (IAM) platform used for authentication and authorization in the CDN Manager. Provides OAuth2/OIDC capabilities.
Default Credentials
The following table lists all default credentials used by the CDN Manager. Change these defaults before deploying to production.
| Service | Username | Password | Notes |
|---|---|---|---|
| Zitadel Console | admin@agiletv.dev | Password1! | Primary identity management; accessed at /ui/console |
| Grafana | admin | edgeware | Monitoring dashboards; accessed at /grafana |
Security Warning: These are default credentials only. For production deployments, you must change all default passwords before exposing the system to users.
Zitadel Default Account: Use the default admin@agiletv.dev account only to create a new administrator account with proper roles. After verifying the new account works, disable or delete the default admin account. For details on required roles and administrator permissions, see Zitadel’s Administrator Documentation. See the Next Steps Guide for initial configuration procedures.
Common Abbreviations
| Abbreviation | Meaning |
|---|---|
| API | Application Programming Interface |
| ASN | Autonomous System Number |
| CORS | Cross-Origin Resource Sharing |
| CPU | Central Processing Unit |
| DNS | Domain Name System |
| EDB | EnterpriseDB |
| ESB | Edge Server Business |
| GUI | Graphical User Interface |
| HA | High Availability |
| Helm | Kubernetes Package Manager |
| HPA | Horizontal Pod Autoscaler |
| HTTP | Hypertext Transfer Protocol |
| HTTPS | HTTP Secure |
| IAM | Identity and Access Management |
| IP | Internet Protocol |
| JSON | JavaScript Object Notation |
| K8s | Kubernetes |
| MFA | Multi-Factor Authentication |
| MIB | Management Information Base |
| NIC | Network Interface Card |
| OAuth | Open Authorization |
| OIDC | OpenID Connect |
| PV | PersistentVolume |
| PVC | PersistentVolumeClaim |
| RBAC | Role-Based Access Control |
| SSL | Secure Sockets Layer |
| TCP | Transmission Control Protocol |
| TLS | Transport Layer Security |
| TTL | Time To Live |
| UDP | User Datagram Protocol |
| UI | User Interface |
| VPA | Vertical Pod Autoscaler |
| VXLAN | Virtual Extensible LAN |
Next Steps
After reviewing terminology:
- Architecture Guide - Understand component relationships
- Configuration Guide - Full configuration reference
- Operations Guide - Day-to-day operational procedures