AgileTV CDN Manager (esb3027)
- 1: Getting Started
- 2: System Requirements Guide
- 3: Architecture Guide
- 4: Quick Start Guide
- 5: Installation Guide
- 6: Configuration Guide
- 7: Networking
- 8: Storage Guide
- 9: Metrics and Monitoring
- 10: Operations Guide
- 11: Post Installation Guide
- 12: Releases
- 12.1: Release esb3027-1.4.0
- 12.2: Release esb3027-1.2.1
- 12.3: Release esb3027-1.2.0
- 12.4: Release esb3027-1.0.0
- 13: API Guides
- 13.1: Healthcheck API
- 13.2: Authentication API
- 13.3: Router API
- 13.4: Selection Input API
- 13.5: Operator UI API
- 14: Use Cases
- 14.1: Custom Deployments
- 15: Troubleshooting Guide
- 16: Glossary
1 - Getting Started
Introduction
The ESB3027 AgileTV CDN Manager is a suite of services responsible for coordinating the Content Delivery Network (CDN) operations. It provides essential APIs and features supporting the ESB3024 AgileTV CDN Director. Key capabilities include:
- Centralized user management for authentication and authorization
- Configuration services, APIs, and user interfaces
- CDN usage monitoring and metrics reporting
- License-based tracking, monitoring, and billing
- Core API services
- Event coordination and synchronization
The software can be deployed as either a self-managed cluster or in a public cloud environment such as AWS. Designed as a cloud-native application following CNCF best practices, its deployment varies slightly depending on the environment:
Self-hosted: A lightweight Kubernetes cluster runs on bare-metal or virtual machines within the customer’s network. The application is deployed within this cluster.
Public cloud: The cloud provider manages the cluster infrastructure, and the application is deployed into it.
The differences are primarily operational; the software's functionality remains consistent across environments, and distinctions are clearly noted in this guide.
Since deployment relies on Kubernetes, familiarity with key tools is essential:
helm: The package manager for Kubernetes, used for installing, upgrading, rolling back, and removing application charts. Helm charts are collections of templates and default values that generate Kubernetes manifests for deployment.
kubectl: The primary command-line tool for managing Kubernetes resources and applications. In a self-hosted setup, it’s typically used from the control plane nodes; in cloud environments, it may be run locally, often from your laptop or desktop.
Cloud provider tools: In cloud environments, familiarity with CLI tools like awscli and the WebUI is also required for managing infrastructure.
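As a minimal illustration of these tools, the following commands are typical day-to-day checks. They assume a working kubeconfig on the machine where they are run (a control plane node in a self-hosted cluster, or your workstation in a cloud environment).
kubectl get nodes     # list the cluster nodes and their status
kubectl get pods      # list the application pods in the current namespace
helm list             # show installed Helm releases and their revisions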
Architectural Overview
See the Architecture Guide.
Installation Overview
The installation process for the manager varies depending on the environment.
Self-hosted: Begin by deploying a lightweight Kubernetes cluster. The installation ISO includes an installer for a simple K3s cluster, a Rancher Labs Kubernetes distribution.
Public cloud: Use your cloud provider’s tooling to deploy the cluster. Specific instructions are beyond this document’s scope, as they vary by provider.
Once the cluster is operational, the remaining steps are the same: deploy the manager software using Helm.
The following sections provide an overview based on your environment. For detailed instructions, refer to the Installation Guide.
Hardware Requirements
In a Kubernetes cluster, each node has a fixed amount of resources—such as CPU, memory, and free disk space. Pods are assigned to nodes based on resource availability. The control plane uses a best-effort approach to schedule pods on nodes with the lowest overall utilization.
Kubernetes manifests for each deployment specify both resource requests and limits for each pod. A node must have at least the requested resources available to schedule a pod there. Since each replica of a deployment requires the same resource requests, the total resource consumption depends on the number of replicas, which is configurable.
Additionally, a Horizontal Pod Autoscaler can automatically adjust the number of replicas based on resource utilization, within defined minimum and maximum bounds.
Because of this, the hardware requirements for deploying the software depend heavily on expected load, configuration, and cluster size. Nonetheless, there are some general recommendations for hardware selection.
See the System Requirements Guide for details about the recommended hardware, supported operating systems, and networking requirements.
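As a rough illustration of how resource requests factor into scheduling, the following command shows how much of a node's capacity is currently requested by the pods scheduled on it. The node name is a placeholder.
kubectl describe node <node-name> | grep -A 8 "Allocated resources"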
Installation Guide
The installation instructions can be found in the Installation Guide.
Configuration Reference
A detailed look at the configuration can be found in the Configuration Reference Guide.
2 - System Requirements Guide
Cluster Sizing
The ESB3027 AgileTV CDN Manager requires a minimum of three machines for production deployment. While it is possible to run the software on a single node in a lab environment, such a setup will not offer optimal performance or high availability.
A typical cluster comprises nodes assigned to either a Server or Agent role. Server nodes are responsible for running the control plane software, which manages the cluster, and they can also host application workloads if configured accordingly. Agent nodes, on the other hand, execute the application containers (workloads) but do not participate in the control plane or quorum. They serve to scale capacity as needed. See the Installation Guide for more information about the role types and responsibilities.
For high availability, it is essential to have an odd number of Server nodes. The minimum recommended is three, which allows the cluster to tolerate the loss of one server node. Increasing the Server nodes to five enhances resilience, enabling the cluster to withstand the loss of two server nodes. The critical factor is that more than half of the Server nodes are available; this quorum ensures the cluster remains operational. The loss of Agent nodes does not impact quorum, though workloads on failed nodes are automatically migrated if there is sufficient capacity.
Hardware Requirements
Single-Node Lab Cluster (Acceptance Testing)
For customer acceptance testing in a single-node lab environment, the following hardware is required. These requirements match the Lab Install Guide and are intended for non-production, single-node clusters only:
|  | CPU | Memory | Disk |
|---|---|---|---|
| Minimum | 8 Cores | 16GB | 128GB |
| Recommended | 12 Cores | 24GB | 128GB |
- Disk space should be available in the /var partition
Note: These requirements are for lab/acceptance testing only. For production workloads, see below.
Production Cluster (3 or More Nodes)
The following tables outline the minimum and recommended hardware specifications for different node
roles within a production cluster. All disk space values refer to the available space on the
/var/lib/longhorn partition. Additional capacity may be needed in other locations not specified
here; it is advisable to follow the operating system vendor’s recommendations for those areas. For
optimal performance, it is recommended to use SSDs or similar high-speed disks for Longhorn storage.
Both virtual machines and bare-metal hardware are supported; however, hosting multiple nodes under a
single hypervisor can impact performance.
Server Role - Control Plane only
|  | CPU | Memory | Disk |
|---|---|---|---|
| Minimum | 4 Cores | 8GB | 64GB |
| Recommended | 8 Cores | 16GB | 128GB |
- Disk space should be available in the /var partition
Agent Role
|  | CPU | Memory | Disk |
|---|---|---|---|
| Minimum | 8 Cores | 16GB | 128GB |
| Recommended | 16 Cores | 32GB | 256GB |
- Disk space should be available in the /var partition
Server Role - Control Plane + Workloads
|  | CPU | Memory | Disk |
|---|---|---|---|
| Minimum | 12 Cores | 24GB | 128GB |
| Recommended | 24 Cores | 48GB | 256GB |
- Disk space should be available in the /var partition
Operating System Requirements
| Operating System | Supported |
|---|---|
| RedHat 7 | No |
| RedHat 8 | Yes |
| RedHat 9 | Yes |
| RedHat 10 | Untested |
We currently support Red Hat Enterprise Linux or any compatible clone, such as Oracle Linux, AlmaLinux, etc., as long as the major version is listed as supported in the above table.
SELinux support will be installed if SELinux is “Enforcing” when installing the ESB3027 AgileTV CDN Manager cluster.
Networking Requirements
A minimum of one Network Interface Card must be present, and the node's default route must be configured through it when the cluster is installed. If the node does not have an interface carrying a default route, one must be configured. See the Installation Guide for details.
3 - Architecture Guide
Kubernetes Architecture
Kubernetes is an open-source container orchestration platform that simplifies the deployment, management, and scaling of containerized applications. It provides a robust framework to run applications reliably across a cluster of machines by abstracting the complexities of the underlying infrastructure. At its core, Kubernetes manages resources through various objects that define how applications are deployed and maintained.
Nodes are the physical or virtual machines that make up the Kubernetes cluster. Each node runs a container runtime, the kubelet agent, and other necessary components to host and manage containers. The smallest deployable units in Kubernetes are Pods, which typically consist of one or more containers sharing storage, network, and a specified way to run the containers. Containers within Pods are the actual runtime instances of the applications.
To manage the lifecycle of applications, Kubernetes offers different controllers such as Deployments and StatefulSets. Deployments are used for stateless applications, enabling easy rolling updates and scaling. StatefulSets, on the other hand, are designed for stateful applications that require persistent storage and stable network identities, like databases. Kubernetes also uses Services to provide a stable network endpoint that abstracts Pods, facilitating reliable communication within the application or from outside the cluster, often distributing traffic load across multiple Pods.
graph TD
subgraph Cluster
direction TB
Node1["Node"]
Node2["Node"]
end
subgraph "Workloads"
Deployment["Deployment (stateless)"]
StatefulSet["StatefulSet (stateful)"]
Pod1["Pod"]
Pod2["Pod"]
Container1["Container"]
Container2["Container"]
end
subgraph "Networking"
Service["Service"]
end
Node1 -->|Hosts| Pod1
Node2 -->|Hosts| Pod2
Deployment -->|Manages| Pod1
StatefulSet -->|Manages| Pod2
Pod1 -->|Contains| Container1
Pod2 -->|Contains| Container2
Service -->|Provides endpoint to| Pod1
Service -->|Provides endpoint to| Pod2

Additional Concepts
Both Deployments and StatefulSets can be scaled by adjusting the number of Pod replicas. In a Deployment, replicas are treated as identical clones of the Pod, and a Service typically load balances across them. In a StatefulSet, each replica is assigned a stable name following a pattern like <name>-<index>, for example, postgresql-0, postgresql-1, and so on.
Many applications use a fixed number of replicas set through Helm, which remains constant regardless of system load. Alternatively, for more dynamic scaling, a Horizontal Pod Autoscaler (HPA) can be used to automatically adjust the number of replicas between a defined minimum and maximum based on real-time load metrics. In public cloud environments, a Vertical Pod Autoscaler (VPA) may also be employed to dynamically adjust the resources requested by Pods, but since this feature is not supported in self-hosted setups and depends on the specific cloud provider's implementation, it is less commonly used in on-premises environments.
Architectural Diagram
graph TD
subgraph Cluster
direction TB
PostgreSQL[PostgreSQL Database]
Kafka[kafka-controller Pods]
Redis[Redis Master & Replicas]
VictoriaMetrics[VictoriaMetrics]
Prometheus[Prometheus Server]
Grafana[Grafana Dashboard]
Gateway[Nginx Gateway]
Confd[Confd]
Manager[ACD-Manager]
Frontend[MIB Frontend]
ZITADEL[Zitadel]
Telegraf[Telegraf]
AlertManager[Alertmanager]
end
PostgreSQL -->|Stores data| Manager
Kafka -->|Streams data| Manager
Redis -->|Cache / Message Broker| Manager
VictoriaMetrics -->|Billing data| Grafana
Prometheus -->|Billing data| VictoriaMetrics
Prometheus -->|Monitoring data| Grafana
Manager -->|Metrics & Monitoring| Prometheus
Manager -->|Alerting| AlertManager
Manager -->|User Interface| Frontend
Manager -->|Authentication| ZITADEL
Frontend -->|Authentication| Manager
Confd -->|Config Updates| Manager
Telegraf -->|System Metrics| Prometheus
Gateway -->|Proxies| Director[Director APIs]
style PostgreSQL fill:#f9f,stroke:#333,stroke-width:1px
style Kafka fill:#ccf,stroke:#333,stroke-width:1px
style Redis fill:#cfc,stroke:#333,stroke-width:1px
style VictoriaMetrics fill:#ffc,stroke:#333,stroke-width:1px
style Prometheus fill:#ccf,stroke:#333,stroke-width:1px
style Grafana fill:#f99,stroke:#333,stroke-width:1px
style Gateway fill:#eef,stroke:#333,stroke-width:1px
style Confd fill:#eef,stroke:#333,stroke-width:1px
style Manager fill:#eef,stroke:#333,stroke-width:1px
style Frontend fill:#eef,stroke:#333,stroke-width:1px
style ZITADEL fill:#eef,stroke:#333,stroke-width:1px
style Telegraf fill:#eef,stroke:#333,stroke-width:1px
style AlertManager fill:#eef,stroke:#333,stroke-width:1px

Cluster Scaling
Most components of the cluster can be horizontally scaled, as long as sufficient resources exist in the cluster to support the additional pods. There are a few exceptions, however. The Selection Input service currently does not support scaling, as the order of Kafka records would no longer be maintained among different consumer group members. Services such as PostgreSQL, Prometheus, and VictoriaMetrics also do not support scaling at the present time due to additional configuration requirements. Most, if not all, of the other services may be scaled, either by explicitly setting the number of replicas in the configuration or, in some cases, by enabling and configuring the Horizontal Pod Autoscaler.
The Horizontal Pod Autoscaler monitors the resource utilization of the Pods in a deployment and, based on configurable metrics, manages scaling between a preset minimum and maximum number of replicas. See the Configuration Guide for more information.
Kubernetes automatically selects which node will run each pod based on several factors, including the resource utilization of the nodes, any pod and node affinity rules, and selector labels. By default, all workload-capable nodes of both Server and Agent roles are considered unless specific node or pod affinity rules have been defined.
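As an illustration, the following commands show how scaling can be observed and adjusted. The replica count is an example value; manager.replicaCount is described in the Configuration Guide.
kubectl get hpa            # current autoscaler targets and replica counts, if any HPAs are enabled
kubectl get deployments    # desired versus ready replicas for each deployment
helm upgrade acd-manager /mnt/esb3027/helm/charts/acd-manager --values ~/values.yaml --set manager.replicaCount=3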
Summary
- The acd-manager interacts with core components like PostgreSQL, Kafka, and Redis for data storage, messaging, and caching.
- It exposes APIs via the API Gateway and integrates with Zitadel for authentication.
- Monitoring and alerting are handled through Prometheus, VictoriaMetrics, Grafana, and Alertmanager.
- Supporting services like Confd facilitate configuration management, while Telegraf collects system metrics.
4 - Quick Start Guide
Lab Install Guide
This section describes a simplified installation process for customer acceptance testing in a single-node lab environment. Unlike the production Quick Start Guide (which assumes 3 or more nodes), the Lab Install Guide is intended for customers to perform acceptance testing prior to installing a production environment.
System Requirements:
- RHEL 8 or 9 (or equivalent) with at least a minimal installation
- 8-core CPU
- 16 GB RAM
- 128 GB available disk space in the /var partition
Step 1: Mount the ISO
mkdir -p /mnt/esb3027
mount -o loop,ro esb3027-acd-manager-X.Y.Z.iso /mnt/esb3027
Step 2: Install the Base Cluster Software
/mnt/esb3027/install
Step 3: (Air-gapped only) Mount the Extras ISO and Load Images
mkdir -p /mnt/esb3027-extras
mount -o loop,ro esb3027-acd-manager-extras-X.Y.Z.iso /mnt/esb3027-extras
/mnt/esb3027-extras/load-images
Step 4: Deploy the Cluster Helm Chart
helm install --wait --timeout 10m acd-cluster /mnt/esb3027/helm/charts/acd-cluster
Step 5: Deploy the Manager Helm Chart
helm install acd-manager /mnt/esb3027/helm/charts/acd-manager --values ~/values.yaml --timeout 10m
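While the chart deploys, you can optionally watch the rollout; the installation is complete once every pod reports the same number on both sides of the READY column (see the full Installation Guide for example output).
kubectl get pods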
Step 6: Next Steps
See the Post Install Guide for post-installation steps and recommendations.
You can now access the manager and begin acceptance testing. For full configuration details, see the full Installation Guide.
Quick Start Guide
This section provides a concise, step-by-step summary for installing the ESB3027 AgileTV CDN Manager cluster in a production environment. The Quick Start Guide is intended for production deployments with three or more nodes, providing high availability and scalability. For full details, see the full Installation Guide.
Step 1: Mount the ISO
mkdir -p /mnt/esb3027
mount -o loop,ro esb3027-acd-manager-X.Y.Z.iso /mnt/esb3027
Step 2: Install the Base Cluster Software
/mnt/esb3027/install
Step 3: (Air-gapped only) Mount the Extras ISO and Load Images
mkdir -p /mnt/esb3027-extras
mount -o loop,ro esb3027-acd-manager-extras-X.Y.Z.iso /mnt/esb3027-extras
/mnt/esb3027-extras/load-images
Step 4: Fetch the Node Token
cat /var/lib/rancher/k3s/server/node-token
Step 5: Join Additional Nodes
On each additional node, repeat Step 1, then run:
/mnt/esb3027/join-server https://<primary-server-ip>:6443 <node-token>
# or for agent nodes:
/mnt/esb3027/join-agent https://<primary-server-ip>:6443 <node-token>
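Optionally, confirm from any server node that all nodes have joined and report the Ready status before continuing.
kubectl get nodes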
Step 6: Deploy the Cluster Helm Chart
helm install --wait --timeout 10m acd-cluster /mnt/esb3027/helm/charts/acd-cluster
Step 7: Deploy the Manager Helm Chart
helm install acd-manager /mnt/esb3027/helm/charts/acd-manager --values ~/values.yaml --timeout 10m
Step 8: Next Steps
See the Post Install Guide for post-installation steps and recommendations.
For configuration details and troubleshooting, see the full Installation Guide.
5 - Installation Guide
SELinux Requirements
SELinux is fully supported provided it is enabled and set to “Enforcing” mode at the time of the initial cluster installation on all Nodes. This is the default configuration for Red Hat Enterprise Linux and its derivatives, such as Oracle Linux and AlmaLinux. If the mode is set to “Enforcing” prior to install time, the necessary SELinux packages will be installed, and the cluster will be started with support for SELinux. For these reasons, enabling SELinux after the initial cluster installation is not supported.
Firewalld Requirements
Please see the Networking Guide for the current firewall recommendations.
Hardware Requirements
Refer to the System Requirements Guide for the current Hardware, Operating System, and Network Requirements.
Networking Requirements
A minimum of one Network Interface Card must be present, and the node's default route must be configured through it when the cluster is installed. If the node does not have an interface carrying a default route, one must be configured; even a black-hole route via a dummy interface will suffice. The K3s software requires a default route in order to auto-detect the node's primary IP and for cluster routing to function properly. To add a dummy default route, do the following:
ip link add dummy0 type dummy
ip link set dummy0 up
ip addr add 203.0.113.254/31 dev dummy0
ip route add default via 203.0.113.255 dev dummy0 metric 1000
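You can confirm that the node now has a default route before proceeding; the output will reflect whatever addresses were configured above.
ip route show default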
Special Considerations when using Multiple Network Interfaces
If there are special network considerations, such as using a non-default interface for
cluster communication, they must be configured using the INSTALL_K3S_EXEC environment
variable, as shown below, before installing the cluster or joining nodes.
As an example, consider the case where the node contains two interfaces, bond0 and bond1, where the
default route exists through bond0, but where bond1 should be used for cluster communication. In
that case, ensure that the INSTALL_K3S_EXEC environment variable is set as follows in the environment
prior to installing or joining the cluster. Assuming that bond1 has the local IP address 10.0.0.10:
export INSTALL_K3S_EXEC="<MODE> --node-ip 10.0.0.10 --flannel-iface=bond1"
Where MODE should be one of server or agent depending on the role of the node. The initial
node used to create the cluster MUST be server, and additional nodes vary depending on the
role.
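For example, on the first node of the cluster in the scenario above, the variable could be set as follows before running the installer; the IP address and interface name come from the example and will differ in your environment.
export INSTALL_K3S_EXEC="server --node-ip 10.0.0.10 --flannel-iface=bond1"
/mnt/esb3027/install
# On an agent node joining later, the variable would instead begin with "agent"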
Air-Gapped Environments
In air-gapped environments—those without direct Internet access—additional considerations are
required. First, on each node, the Operating System’s ISO must be mounted so that dnf can be
used to install essential packages included with the OS. Second, the “Extras” ISO from the
ESB3027 AgileTV CDN Manager must be mounted to provide access to container images for
third-party software that would otherwise be downloaded from public repositories. Details on
mounting this ISO and loading the included images are provided below.
Introduction
Installing the ESB3027 AgileTV CDN Manager for production requires a minimum of three nodes. More details about node roles and sizing can be found in the System Requirements Guide. Before beginning the installation, select one node as the primary "Server" node. This node will serve as the main installation point. Once additional Server nodes join the cluster, all Server nodes are considered equivalent, and cluster operations can be managed from any of them. The typical process involves installing the primary node as a Server, then adding more Server nodes to expand the cluster, followed by joining Agent nodes as needed to increase capacity.
Roles
All nodes in the cluster have one of two roles. Server nodes run the control-plane software necessary to manage the cluster and provide redundancy. Agent nodes do not run the control-plane software; instead, they are responsible for running the Pods that make up the applications. Jobs are distributed among agent nodes to enable horizontal scalability of workloads. However, agent nodes do not contribute to the cluster’s high availability. If an agent node fails, the Pods assigned to that node are automatically moved to another node, provided sufficient resources are available.
Control-plane only Server nodes
Both server nodes and agent nodes run workloads within the cluster. However, a special attribute called the “CriticalAddonsOnly” taint can be applied to server nodes. This taint prevents the node from scheduling workloads that are not part of the control plane. If the hardware allows, it is recommended to apply this taint to server nodes to separate their responsibilities. Doing so helps prevent misbehaving applications from negatively impacting the overall health of the cluster.
graph TD
subgraph Cluster
direction TB
ServerNodes[Server Nodes]
AgentNodes[Agent Nodes]
end
ServerNodes -->|Manage cluster and control plane| ControlPlane
ServerNodes -->|Provide redundancy| Redundancy
AgentNodes -->|Run application Pods| Pods
Pods -->|Handle workload distribution| Workloads
AgentNodes -->|Failover: Pods move if node fails| Pods
ServerNodes -->|Can run Pods unless tainted with CriticalAddonsOnly| PodExecution
Taint[CriticalAddonsOnly Taint] -->|Applied to server nodes to restrict workload| ServerNodes

For high availability, at least three nodes running the control plane are required, along with at least three nodes running workloads. These can be a combination of server and agent roles, provided that the control-plane nodes are sufficient. If a server node has the "CriticalAddonsOnly" taint applied, an additional agent node must be deployed to ensure workloads can run. For example, the cluster could consist of three untainted server nodes, or two untainted servers, one tainted server, and one agent, or three tainted servers and three agents—all while maintaining at least three control-plane nodes and three workload nodes.
The “CriticalAddonsOnly” taint can be applied to server nodes at any time after cluster installation. However, it only affects Pods scheduled in the future. Existing Pods that have already been assigned to a server node will remain there until they are recreated or rescheduled due to an external event.
kubectl taint nodes <node-name> CriticalAddonsOnly=true:NoSchedule
Where node-name is the hostname of the node for which to apply the taint. Multiple node names
may be specified in the same command. This command should only be run from one of the server nodes.
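To review or undo the taint, the standard kubectl commands can be used; the node name is a placeholder.
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
# Remove the taint again if required (note the trailing "-")
kubectl taint nodes <node-name> CriticalAddonsOnly=true:NoSchedule-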
Installing the Primary Server Node
Mount the ESB3027 ISO
Start by mounting the core ESB3027 ISO on the system. There are no limitations on the exact
mountpoint used, but for this document, we will assume /mnt/esb3027.
mkdir -p /mnt/esb3027
mount -o loop,ro esb3027-acd-manager-X.Y.Z.iso /mnt/esb3027
Run the installer
Run the install command to install the base cluster software.
/mnt/esb3027/install
(Air-gapped only) Mount the “Extras” ISO and Load Container Images
In an air-gapped environment, after running the installer, the "Extras" ISO must be mounted. This ISO contains publicly available container images that would otherwise be downloaded from their source repositories.
mkdir -p /mnt/esb3027-extras
mount -o loop,ro esb3027-acd-manager-extras-X.Y.Z.iso /mnt/esb3027-extras
The public container images for third-party products such as Kafka, Redis, Zitadel, etc., need to be loaded into the container runtime. An embedded registry mirror is used to distribute these images to other nodes within the cluster, so this only needs to be performed on one machine.
/mnt/esb3027-extras/load-images
Fetch the primary node token
In order to join additional nodes into the cluster, a unique node token must be provided. This token is automatically generated on the primary node during the installation process. Retrieve this now, and take note of it for later use.
cat /var/lib/rancher/k3s/server/node-token
Join Additional Server Nodes
From each additional server node, mount the core ISO and join the cluster using the following commands.
mkdir -p /mnt/esb3027
mount esb3027-acd-manager-X.Y.Z.iso /mnt/esb3027
Obtain the node token from the primary server, as you will need to include it in the following command. You will also need the URL of the primary server to connect to.
/mnt/esb3027/join-server https://primary-server-ip:6443 abcdefg0123456...987654321
Where primary-server-ip is replaced with the IP address this node should use to reach the
primary server, and abcdef...321 is the contents of the node-token retrieved from the primary server.
Repeat the above steps on each additional Server node in the cluster.
Join Agent Nodes
From each additional agent node, mount the core ISO and join the cluster using the following commands.
mkdir -p /mnt/esb3027
mount esb3027-acd-manager-X.Y.Z.iso /mnt/esb3027
Obtain the node token from the primary server, as you will need to include it in the following command. You will also need the URL of the primary server to connect to.
/mnt/esb3027/join-agent https://primary-server-ip:6443 abcdefg0123456...987654321
Where primary-server-ip is replaced with the IP address this node should use to reach the
primary server, and abcdef...321 is the contents of the node-token retrieved from the primary server.
Repeat the above steps on each additional Agent node in the cluster.
Verify the state of the cluster
At this point, a generic Kubernetes cluster should have multiple nodes connected and be marked Ready. Verify this is the case by running the following from any one of the Server nodes.
kubectl get nodes
Each node in the cluster should be listed in the output with the status “Ready”, and the Server nodes should have “control-plane” in the listed Roles. If this is not the case, see the Troubleshooting Guide to help diagnose the problem.
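The output should look similar to the following illustrative example from a three-node cluster; node names, roles, ages, and versions will differ in your environment.
NAME       STATUS   ROLES                       AGE   VERSION
server-1   Ready    control-plane,etcd,master   15m   v1.30.4+k3s1
server-2   Ready    control-plane,etcd,master   12m   v1.30.4+k3s1
agent-1    Ready    <none>                      5m    v1.30.4+k3s1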
Deploy the cluster helm chart
The acd-cluster helm chart, which is included on the core ISO, contains the clustering software which
is required for self-hosted clusters, but may be optional in Cloud deployments. Currently this consists
of a PostgreSQL database server, but additional components may be added in later releases.
helm install --wait --timeout 10m acd-cluster /mnt/esb3027/helm/charts/acd-cluster
Deploying the Manager chart
The acd-manager helm chart is used to deploy the acd-manager application as well as any of the
third-party services on which the chart depends. Installing this chart requires at least a minimal
configuration to be applied. To get started, either copy the default values.yaml file from the chart
directory /mnt/esb3027/helm/charts/acd-manager/values.yaml or copy the following minimal template to a
writable location such as the user’s home directory.
global:
hosts:
manager:
- host: manager.local
routers:
- name: director-1
address: 192.0.2.1
- name: director-2
address: 192.0.2.2
zitadel:
zitadel:
configmapConfig:
ExternalDomain: manager.local
Where:
- manager.local is either the external IP or a resolvable DNS name used to access the manager's cluster.
- All director instances should be listed in the global.hosts.routers section. The name field is used in URLs, and must consist of only alphanumeric characters or '.', '-', or '_'.
Further details on the available configuration options in the default values.yaml file can be found in
the Configuration Guide.
You must set at a minimum the following properties:
| Property | Type | Description |
|---|---|---|
| global.hosts.manager | Array | List of external IP addresses or DNS hostnames for each node in the cluster |
| global.hosts.routers | Array | List of name and address for each instance of ESB3024 AgileTV CDN Director |
| zitadel.zitadel.configmapConfig.ExternalDomain | String | External DNS domain name or IP address of one manager node. This must match the first entry from global.hosts.manager |
Note! The Zitadel ExternalDomain must match the hostname or IP address given in the first
global.hosts.manager entry, and MUST match the Origin used when accessing Zitadel. This is enforced by
CORS.
Hint: In non-air-gapped environments where no DNS server is available, the third-party service
sslip.io may be used to provide a resolvable DNS name which can be used for both the
global.hosts.manager and Zitadel ExternalDomain entries. Any IP address written as
W.X.Y.Z.sslip.io will resolve to the IP W.X.Y.Z
Only the value used for Zitadel’s ExternalDomain may be used to access Zitadel due to CORS
restrictions. E.g. if that is set to “10.10.10.10.sslip.io”, then Zitadel must be accessed via the URL
https://10.10.10.10.sslip.io/ui/console. This must match the first entry in global.hosts.manager as
that entry will be used by internal services that need to interact with Zitadel, such as the frontend
GUI and the manager API services.
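As an illustrative example only, the following writes a minimal configuration using an sslip.io name; 10.10.10.10 stands in for the IP address of the first manager node, and director-1 at 192.0.2.1 for a Director instance.
cat > ~/values.yaml <<'EOF'
global:
  hosts:
    manager:
      - host: 10.10.10.10.sslip.io
    routers:
      - name: director-1
        address: 192.0.2.1
zitadel:
  zitadel:
    configmapConfig:
      ExternalDomain: 10.10.10.10.sslip.io
EOF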
Importing TLS Certificates
By default, the manager will generate a self-signed TLS certificate for use with the cluster ingress.
In production environments, it is recommended to use a valid TLS certificate issued by a trusted Certificate Authority (CA).
To install the TLS certificate pair into the ingress controller, the certificate and key must be saved in a Kubernetes secret. The simplest way of doing this is to let Helm generate the secret by including the PEM formatted certificate and private key directly in the configuration values. Alternatively, the secret can be created manually and simply referenced by the configuration.
Option 1: Let Helm manage the secret
To have Helm automatically manage the secret based on the PEM formatted certificate and key, add a record
to ingress.secrets as described in the following snippet.
ingress:
secrets:
- name: <secret-name>
key: |-
-----BEGIN RSA PRIVATE KEY-----
...
-----END RSA PRIVATE KEY-----
certificate: |-
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
Option 2: Manually creating the secret
To manually create the secret in Kubernetes, execute the following command, which will create a secret named "secret-name":
kubectl create secret tls secret-name --cert=tls.crt --key=tls.key
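You can verify that the secret was created and has the expected type before configuring the ingress.
kubectl get secret secret-name
# The TYPE column should read kubernetes.io/tls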
Configure the Ingress
The ingress controllers must be configured with the name of the secret holding the certificate and key. Additionally, the DNS hostname or IP address covered by the certificate, which must be used to access the ingress, must be set in the configuration.
ingress:
hostname: <dns-hostname>
tls: true
secretName: <secret-name>
zitadel:
ingress:
tls:
- hosts:
- <dns-hostname>
secretName: <secret-name>
confd:
ingress:
hostname: <dns-hostname>
tls: true
secretName: <secret-name>
mib-frontend:
ingress:
hostname: <dns-hostname>
tls: true
secretName: <secret-name>
- dns-hostname - A valid DNS hostname for the cluster which is covered by the certificate. For compatibility with Zitadel and CORS restrictions, this MUST be the same DNS hostname listed as the first entry in global.hosts.manager.
- secret-name - An arbitrary name used to identify the Kubernetes secret containing the TLS certificate and key. This has a maximum length limitation of 53 characters.
Loading Maxmind GeoIP databases
The Maxmind GeoIP databases are required if GeoIP lookups are to be performed by the manager. If this functionality is used, then Maxmind formatted GeoIP databases must be configured. The following databases are used by the manager.
- GeoIP2-City.mmdb - The City database.
- GeoLite2-ASN.mmdb - The ASN database.
- GeoIP2-Anonymous-IP.mmdb - The VPN and Anonymous IP database.
A helper utility has been provided on the ISO called generate-maxmind-volume that will prompt the user
for the locations of these 3 database files, and the name of a volume, which will be created in
Kubernetes. After running this command, set the manager.maxmindDbVolume property in the configuration
to the volume name.
To run the utility, use:
/mnt/esb3027/generate-maxmind-volume
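For example, if the volume was named maxmind-db (a placeholder; use the name you entered in the utility), the corresponding setting can be appended to the values file that will be passed to Helm.
cat >> ~/values.yaml <<'EOF'
manager:
  maxmindDbVolume: maxmind-db
EOF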
Installing the Chart
Install the acd-manager helm chart using the following command: (This assumes the configuration is in
~/values.yaml)
helm install acd-manager /mnt/esb3027/helm/charts/acd-manager --values ~/values.yaml --timeout 10m
By default, there is not expected to be much output from the helm install command itself. If you would
like to see more detailed information in real-time throughout the deployment process, you can add the
--debug flag to the command:
helm install acd-manager /mnt/esb3027/helm/charts/acd-manager --values ~/values.yaml --timeout 10m --debug
Note: The --timeout 10m flag increases the default Helm timeout from 5 minutes to 10 minutes. This is recommended because the default may not be sufficient on slower hardware or in resource-constrained environments. You may need to adjust the timeout value further depending on your system's performance or deployment conditions.
Monitor the chart rollout with the following command:
kubectl get pods
The output of which should look similar to the following:
NAME READY STATUS RESTARTS AGE
acd-cluster-postgresql-0 1/1 Running 0 44h
acd-manager-6c85ddd747-5j5gt 1/1 Running 0 43h
acd-manager-confd-558f49ffb5-n8dmr 1/1 Running 0 43h
acd-manager-gateway-7594479477-z4bbr 1/1 Running 0 43h
acd-manager-grafana-78c76d8c5-c2tl6 1/1 Running 0 43h
acd-manager-kafka-controller-0 2/2 Running 0 43h
acd-manager-kafka-controller-1 2/2 Running 0 43h
acd-manager-kafka-controller-2 2/2 Running 0 43h
acd-manager-metrics-aggregator-f6ff99654-tjbfs 1/1 Running 0 43h
acd-manager-mib-frontend-67678c69df-tkklr 1/1 Running 0 43h
acd-manager-prometheus-alertmanager-0 1/1 Running 0 43h
acd-manager-prometheus-server-768f5d5c-q78xb 1/1 Running 0 43h
acd-manager-redis-master-0 2/2 Running 0 43h
acd-manager-redis-replicas-0 2/2 Running 0 43h
acd-manager-selection-input-844599bc4d-x7dct 1/1 Running 0 43h
acd-manager-telegraf-585dfc5ff8-n8m5c 1/1 Running 0 43h
acd-manager-victoria-metrics-single-server-0 1/1 Running 0 43h
acd-manager-zitadel-69b6546f8f-v9lkp 1/1 Running 0 43h
acd-manager-zitadel-69b6546f8f-wwcmx 1/1 Running 0 43h
acd-manager-zitadel-init-hnr5p 0/1 Completed 0 43h
acd-manager-zitadel-setup-kjnwh 0/2 Completed 0 43h
The output contains a "READY" column, which shows the number of ready containers on the left and the total number of containers in the pod on the right. Pods with status "Completed" are one-time jobs that have terminated successfully and can be ignored in this output. For "Running" pods, the rollout is complete once every pod shows the same number on both sides of the "READY" column.
If a Pod is marked as "CrashLoopBackOff" or "Error", this means that either one of the containers in the pod has failed to deploy, or that the container has terminated in an error state. See the Troubleshooting Guide to help diagnose the problem. The Kubernetes cluster will retry failed pod deployments several times, and the number in the "RESTARTS" column shows how many times that has happened. If a pod restarts during the initial rollout, it may simply be that the state of the cluster was not yet what the pod expected, and this can be safely ignored. After the initial rollout has completed, the pods should stabilize; repeated restarts at that point may indicate that something is wrong. In that case, refer to the Troubleshooting Guide for more information.
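When diagnosing a failing pod, the following commands are usually the first step; the pod name is taken from the example output above and will differ in your cluster.
kubectl describe pod acd-manager-6c85ddd747-5j5gt              # events, scheduling decisions, and container states
kubectl logs acd-manager-6c85ddd747-5j5gt                      # current container logs
kubectl logs acd-manager-6c85ddd747-5j5gt --previous           # logs from the previous crashed run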
Next Steps
For post-installation steps, see the Post Install Guide.
6 - Configuration Guide
Overview
When deploying the acd-manager helm chart, a configuration file containing the chart values must
be supplied to Helm. The default values.yaml file can be found on the ISO in the chart’s directory.
Helm does not require that the complete file be supplied at install time, as any files supplied via the
--values command will be merged with the defaults from the chart. This allows the operator to maintain
a much simpler configuration file containing only the modified values. Additionally, values may be
individually overridden by passing --set key=value to the Helm command. However, this is discouraged for
all but temporary cases, as the same arguments must be specified any time the chart is updated.
The default values.yaml file is located on the ISO under the subpath /helm/charts/acd-manager/values.yaml
Since the ISO is mounted read-only, you must copy this file to a writable location to make changes. Helm
supports multiple --values arguments where all files will be merged left-to-right before being merged
with the chart defaults.
Applying the Configuration
After updating the configuration file, you must perform a helm upgrade for the changes to be propagated
to the cluster. Helm tracks the changes in each revision, and supports rolling back to previous configurations.
During the initial chart installation, the configuration values will be supplied to Helm through the helm install
command, but to update an existing installation, the following command line shall be used instead.
helm upgrade acd-manager /mnt/esb3027/helm/charts/acd-manager --values /path/to/values.yaml
Note: Both the helm install and helm upgrade commands take many of the same arguments, and a shortcut,
helm upgrade --install, can be used in place of either to update an existing installation or to deploy a
new one if none previously existed.
If the configuration update was unsuccessful, you can roll back to a previous revision using the following
command. Keep in mind, this will not change the values.yaml file on disk, so you must revert the changes
to that file manually, or restore the file from a backup.
helm rollback acd-manager <revision_number>
You can view the current revision number of all installed charts with helm list --all
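For example, the following commands show the installed releases and the revision history of the manager chart.
helm list --all
helm history acd-manager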
If you wish to temporarily change one or more values, for instance to increase the manager log level from “info”
to “debug”, you can do so with the --set command.
helm upgrade acd-manager /mnt/esb3027/helm/charts/acd-manager --values /path/to/values.yaml --set manager.logLevel=debug
It is also possible to split the values.yaml into multiple individual files, for instance to separate manager
and metrics values in two files using the following commands. All files will be merged left to right by Helm.
Take notice however, that doing this will require all values files to be supplied in the same order any time
a helm upgrade is performed in the future.
helm upgrade acd-manager /mnt/esb3027/helm/charts/acd-manager --values /path/to/values1.yaml --values /path/to/values2.yaml
Before applying new configuration, it is recommended to perform a dry-run to ensure that the templates
can be rendered properly. This does not guarantee that the templates will be accepted by Kubernetes, only
that the templates can be properly rendered using the supplied values. The rendered templates will be output
to the console.
helm upgrade ... --dry-run
In the event that the helm upgrade fails to produce the desired results, e.g. if the correct configuration
did not propagate to all required pods, performing a helm uninstall acd-manager followed by the original
helm install command will force all pods to be redeployed. This is service affecting, however, and should
only be performed as a last resort, as all pods will be destroyed and recreated.
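For reference, that last-resort sequence is shown below; adjust the values file path to match your installation.
helm uninstall acd-manager
helm install acd-manager /mnt/esb3027/helm/charts/acd-manager --values /path/to/values.yaml --timeout 10m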
Configuration Reference
In this section, we break down the configuration file and look more in-depth into the options available.
Globals
The global section is a special-case section in Helm, intended for sharing values between charts.
Most of the configuration properties here can be ignored, as they are intended as a means of globally
providing defaults that affect nested subcharts. The only required field here is the hosts configuration.
global:
hosts:
manager:
- host: manager.local
routers:
- name: default
address: 127.0.0.1
edns_proxy: []
geoip: []
| Key | Type | Description |
|---|---|---|
| global.hosts.manager | Array | List of external IP addresses or DNS hostnames for all nodes in the Manager cluster |
| global.hosts.routers | Array | List of ESB3024 AgileTV CDN Director instances |
| global.hosts.edns_proxy | Array | List of EDNS Proxy addresses |
| global.hosts.geoip | Array | List of GeoIP Proxy addresses |
The global.hosts.manager record contains a list of objects, each containing a single host field. The first
of these is used by several internal services to contact Zitadel for user authentication and authorization.
Since Zitadel enforces CORS protections, this must exactly match the Origin used to access Zitadel.
The global.hosts.routers record contains a list of objects, each with a name and an address field. The
name field is a unique identifier used in URLs to refer to the Director instance, and the address field
is the IP address or DNS name used to communicate with the Director node. Only Director instances running
outside of this cluster need to be specified here, as instances running in Kubernetes can utilize the
cluster's auto-discovery system.
The global.hosts.edns_proxy record contains a list of objects each with an address and port field. This
list is currently unused.
The global.hosts.geoip record contains a list of objects each with an address and port field. This list
should refer to the GeoIP Proxies used by the Frontend GUI. Currently only one GeoIP proxy is supported.
Common Parameters
This section contains common parameters that are namespaced to the acd-manager chart. These should be left at their default values under most circumstances.
| Key | Type | Description |
|---|---|---|
| kubeVersion | String | Override the Kubernetes version reported by .Capabilities |
| apiVersion | String | Override the Kubernetes API version reported by .Capabilities |
| nameOverride | String | Partially override common.names.name |
| fullnameOverride | String | Fully override common.names.name |
| namespaceOverride | String | Fully override common.names.namespace |
| commonLabels | Object | Labels to add to all deployed objects |
| commonAnnotations | Object | Annotations to add to all deployed objects |
| clusterDomain | String | Kubernetes cluster domain name |
| extraDeploy | Array | List of extra Kubernetes objects to deploy with the release |
| diagnosticMode.enabled | Boolean | Enable Diagnostic mode (All probes will be disabled and the command will be overridden) |
| diagnosticMode.command | Array | Override the command when diagnostic mode is enabled |
| diagnosticMode.args | Array | Override the command line arguments when diagnostic mode is enabled |
Manager
This section represents the configuration options for the ACD Manager’s API server.
| Key | Type | Description |
|---|---|---|
| manager.image.registry | String | The docker registry |
| manager.image.repository | String | The docker repository |
| manager.image.tag | String | Override the image tag |
| manager.image.digest | String | Override a specific image digest |
| manager.image.pullPolicy | String | The image pull policy |
| manager.image.pullSecrets | Array | A list of secret names containing credentials for the configured image registry |
| manager.image.debug | boolean | Enable debug mode for the containers |
| manager.logLevel | String | Set the log level used in the containers |
| manager.replicaCount | Number | Number of manager replicas to deploy. This value is ignored if the Horizontal Pod Autoscaler is enabled |
| manager.containerPorts.http | Number | Port number exposed by the container for HTTP traffic |
| manager.extraContainerPorts | Array | List of additional container ports to expose |
| manager.livenessProbe | Object | Configuration for the liveness probe on the manager container |
| manager.readinessProbe | Object | Configuration for the readiness probe on the manager container |
| manager.startupProbe | Object | Configuration for the startup probe on the manager container |
| manager.customLivenessProbe | Object | Override the default liveness probe |
| manager.customReadinessProbe | Object | Override the default readiness probe |
| manager.customStartupProbe | Object | Override the default startup probe |
| manager.resourcePreset | String | Set the manager resources according to one common preset |
| manager.resources | Object | Set request and limits for different resources like CPU or memory |
| manager.podSecurityContext | Object | Set the security context for the manager pods |
| manager.containerSecurityContext | Object | Set the security context for all containers inside the manager pods |
| manager.maxmindDbVolume | String | Name of a Kubernetes volume containing Maxmind GeoIP, ASN, and Anonymous IP databases |
| manager.existingConfigmap | String | Reserved for future use |
| manager.command | Array | Command executed inside the manager container |
| manager.args | Array | Arguments passed to the command |
| manager.automountServiceAccountToken | Boolean | Mount Service Account token in manager pods |
| manager.hostAliases | Array | Add additional entries to /etc/hosts in the pod |
| manager.deploymentAnnotations | Object | Annotations for the manager deployment |
| manager.podLabels | Object | Extra labels for manager pods |
| manager.podAnnotations | Object | Extra annotations for the manager pods |
| manager.podAffinityPreset | String | Allowed values soft or hard |
| manager.podAntiAffinityPreset | String | Allowed values soft or hard |
| manager.nodeAffinityPreset.type | String | Allowed values soft or hard |
| manager.nodeAffinityPreset.key | String | Node label key to match |
| manager.nodeAffinityPreset.values | Array | List of node labels to match |
| manager.affinity | Object | Override the affinity for pod assignments |
| manager.nodeSelector | Object | Node labels for manager pod assignments |
| manager.tolerations | Array | Tolerations for manager pod assignment |
| manager.updateStrategy.type | String | Can be set to RollingUpdate or Recreate |
| manager.priorityClassName | String | Manager pods’ priorityClassName |
| manager.topologySpreadConstraints | Array | Topology Spread Constraints for manager pod assignment spread across the cluster among failure-domains |
| manager.schedulerName | String | Name of the Kubernetes scheduler for manager pods |
| manager.terminationGracePeriodSeconds | Number | Seconds manager pods need to terminate gracefully |
| manager.lifecycleHooks | Object | Lifecycle Hooks for manager containers to automate configuration before or after startup |
| manager.extraEnvVars | Array | List of extra environment variables to add to the manager containers |
| manager.extraEnvVarsCM | Array | List of Config Maps containing extra environment variables to pass to the Manager pods |
| manager.extraEnvVarsSecret | Array | List of Secrets containing extra environment variables to pass to the Manager pods |
| manager.extraVolumes | Array | Optionally specify extra list of additional volumes for the manager pods |
| manager.extraVolumeMounts | Array | Optionally specify extra list of additional volume mounts for the manager pods |
| manager.sidecars | Array | Add additional sidecar containers to the manager pods |
| manager.initContainers | Array | Add additional init containers to the manager pods |
| manager.pdb.create | Boolean | Enable / disable a Pod Disruption Budget creation |
| manager.pdb.minAvailable | Number | Minimum number/percentage of pods that should remain scheduled |
| manager.pdb.maxUnavailable | Number | Maximum number/percentage of pods that may be made unavailable |
| manager.autoscaling.vpa | Object | Vertical Pod Autoscaler Configuration. Not used for self-hosted clusters |
| manager.autoscaling.hpa | Object | Horizontal Pod Autoscaler. Automatically scale the number of replicas based on resource utilization |
Gateway
The parameters under the gateway namespace are mostly identical to those of the manager section above,
but they affect the Nginx Proxy Gateway service. The additional properties are described in the following
table.
| Key | Type | Description |
|---|---|---|
| gateway.service.type | String | Service Type |
| gateway.service.ports.http | Number | The service port |
| gateway.service.nodePorts | Object | Allows configuring the exposed node port if the service.type is “NodePort” |
| gateway.service.clusterIP | String | Override the ClusterIP address if the service.type is “ClusterIP” |
| gateway.service.loadBalancerIP | String | Override the LoadBalancer IP address if the service.type is “LoadBalancer” |
| gateway.service.loadBalancerSourceRanges | Array | Source CIDRs for the LoadBalancer |
| gateway.service.externalTrafficPolicy | String | External Traffic Policy for the service |
| gateway.service.annotations | Object | Additional custom annotations for the manager service |
| gateway.service.extraPorts | Array | Extra ports to expose in the manager service. (Normally used with the sidecar value) |
| gateway.service.sessionAffinity | String | Control where client requests go, to the same pod or round-robin |
| gateway.service.sessionAffinityConfig | Object | Additional settings for the sessionAffinity |
Selection Input
The parameters under the selectionInput namespace are mostly identical to those of the manager section above,
but they affect the Selection Input consumer service. The additional properties are described in the
following table.
| Key | Type | Description |
|---|---|---|
| selectionInput.kafkaTopic | String | Name of the selection input kafka topic |
Metrics Aggregator
The parameters under the metricsAggregator namespace are mostly identical to those of the manager section above,
but they affect the Metrics Aggregator service.
Traffic Exposure
These parameters determine how the various services are exposed over the network.
| Key | Type | Description |
|---|---|---|
| service.type | String | Service Type |
| service.ports.http | Number | The service port |
| service.nodePorts | Object | Allows configuring the exposed node port if the service.type is “NodePort” |
| service.clusterIP | String | Override the ClusterIP address if the service.type is “ClusterIP” |
| service.loadBalancerIP | String | Override the LoadBalancer IP address if the service.type is “LoadBalancer” |
| service.loadBalancerSourceRanges | Array | Source CIDRs for the LoadBalancer |
| service.externalTrafficPolicy | String | External Traffic Policy for the service |
| service.annotations | Object | Additional custom annotations for the manager service |
| service.extraPorts | Array | Extra ports to expose in the manager service. (Normally used with the sidecar value) |
| service.sessionAffinity | String | Control where client requests go, to the same pod or round-robin |
| service.sessionAffinityConfig | Object | Additional settings for the sessionAffinity |
| networkPolicy.enabled | Boolean | Specifies whether a NetworkPolicy should be created |
| networkPolicy.allowExternal | Boolean | Doesn’t require server labels for connections |
| networkPolicy.allowExternalEgress | Boolean | Allow the pod to access any range of port and all destinations |
| networkPolicy.allowExternalClientAccess | Boolean | Allow access from pods with client label set to “true” |
| networkPolicy.extraIngress | Array | Add extra ingress rules to the Network Policy |
| networkPolicy.extraEgress | Array | Add extra egress rules to the Network Policy |
| networkPolicy.ingressPodMatchLabels | Object | Labels to match to allow traffic from other pods. |
| networkPolicy.ingressNSMatchLabels | Object | Labels to match to allow traffic from other namespaces. |
| networkPolicy.ingressNSPodMatchLabels | Object | Pod labels to match to allow traffic from other namespaces. |
| ingress.enabled | Boolean | Enable the ingress record generation for the manager |
| ingress.pathType | String | Ingress Path Type |
| ingress.apiVersion | String | Force Ingress API version |
| ingress.hostname | String | Match HOST header for the ingress record |
| ingress.ingressClassName | String | Ingress Class that will be used to implement the Ingress |
| ingress.path | String | Default path for the Ingress record |
| ingress.annotations | Object | Additional annotations for the Ingress resource. |
| ingress.tls | Boolean | Enable TLS configuration for the host defined at ingress.hostname |
| ingress.selfSigned | Boolean | Create a TLS secret for this ingress record using self-signed certificates generated by Helm |
| ingress.extraHosts | Array | An array with additional hostnames to be covered by the Ingress record. |
| ingress.extraPaths | Array | An array of extra path entries to be covered by the Ingress record. |
| ingress.extraTls | Array | TLS configuration for additional hostnames to be covered with this Ingress record. |
| ingress.secrets | Array | Custom TLS certificates as secrets |
| ingress.extraRules | Array | Additional rules to be covered with this Ingress record. |
Persistence
The following values control how persistent storage is used by the manager. Currently these have no effect, as the manager does not use any persistent volume claims; however, they are documented here because the same properties are used in several subcharts to configure persistence.
| Key | Type | Description |
|---|---|---|
| persistence.enabled | Boolean | Enable persistence using Persistent Volume Claims |
| persistence.mountPath | String | Path where to mount the volume |
| persistence.subPath | String | The subdirectory of the volume to mount |
| persistence.storageClass | String | Storage class of backing Persistent Volume Claim |
| persistence.annotations | Object | Persistent Volume Claim annotations |
| persistence.accessModes | Array | Persistent Volume Access Modes |
| persistence.size | String | Size of the data volume |
| persistence.dataSource | Object | Custom PVC data source |
| persistence.existingClaim | String | The name of an existing PVC to use for persistence |
| persistence.selector | Object | Selector to match existing Persistent Volume for data PVC |
Other Values
The following are additional parameters for the chart.
| Key | Type | Description |
|---|---|---|
| defaultInitContainers | Object | Configuration for default init containers. |
| rbac.create | Boolean | Specifies whether Role-Based Access Control Resources should be created. |
| rbac.rules | Object | Custom RBAC rules to apply |
| serviceAccount.create | Boolean | Specifies whether a ServiceAccount should be created |
| serviceAccount.name | String | Override the ServiceAccount name. If not set, a name will be generated automatically. |
| serviceAccount.annotations | Object | Additional Service Account annotations (evaluated as a template) |
| serviceAccount.automountServiceAccountToken | Boolean | Automount the service account token for the service account. |
| metrics.enabled | Boolean | Enable the export of Prometheus metrics. Not currently implemented |
| metrics.serviceMonitor.enabled | Boolean | If true, creates a Prometheus Operator ServiceMonitor |
| metrics.serviceMonitor.namespace | String | Namespace in which Prometheus is running |
| metrics.serviceMonitor.annotations | Object | Additional custom annotations for the ServiceMonitor |
| metrics.serviceMonitor.labels | Object | Extra labels for the ServiceMonitor |
| metrics.serviceMonitor.jobLabel | String | The name of the label on the target service to use as the job name in Prometheus |
| metrics.serviceMonitor.honorLabels | Boolean | Chooses the metric’s labels on collisions with target labels |
| metrics.serviceMonitor.tlsConfig | Object | TLS configuration used for scrape endpoints used by Prometheus |
| metrics.serviceMonitor.interval | Number | Interval at which metrics should be scraped. |
| metrics.serviceMonitor.scrapeTimeout | Number | Timeout after which the scrape is ended. |
| metrics.serviceMonitor.metricRelabelings | Array | Specify additional relabeling of metrics. |
| metrics.serviceMonitor.relabelings | Array | Specify general relabeling |
| metrics.serviceMonitor.selector | Object | Prometheus instance selector labels |
Sub-components
Confd
| Key | Type | Description |
|---|---|---|
| confd.enabled | Boolean | Enable the embedded Confd instance |
| confd.service.ports.internal | Number | Port number to use for internal communication with the Confd TCP socket |
MIB Frontend
There are many additional properties that can be configured for the MIB Frontend service which are not
specified in the configuration file. The mib-frontend Helm chart follows the same basic template
as the acd-manager chart, so documenting them all here would be unnecessarily repetitive. Virtually every
property in this chart can be configured under the mib-frontend namespace and be valid.
| Key | Type | Description |
|---|---|---|
| mib-frontend.enabled | Boolean | Enable the Configuration GUI |
| mib-frontend.frontend.resourcePreset | String | Use a preset resource configuration. |
| mib-frontend.frontend.resources | Object | Use custom resource configuration. |
| mib-frontend.frontend.autoscaling.hpa | Object | Horizontal Pod Autoscaler configuration for MIB Frontend component |
ACD Metrics
There are many additional properties that can be configured for the ACD Metrics service which are not
specified in the configuration file. The acd-metrics Helm chart follows the same basic template
as the acd-manager chart, as do each of its subcharts, so documenting them all here would be
unnecessarily repetitive. Virtually any property in this chart can be configured under the acd-metrics
namespace and be valid. For example, the resource preset for Grafana can be set via
acd-metrics.grafana.resourcePreset, and so on.
| Key | Type | Description |
|---|---|---|
| acd-metrics.enabled | Boolean | Enable the ACD Metrics components |
| acd-metrics.telegraf.enabled | Boolean | Enable the Telegraf Database component |
| acd-metrics.prometheus.enabled | Boolean | Enable the Prometheus Service Instance |
| acd-metrics.grafana.enabled | Boolean | Enable the Grafana Service Instance |
| acd-metrics.victoria-metrics-single.enabled | Boolean | Enable Victoria Metrics Service instance |
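For example, the resource preset mentioned above can be set in the values file as follows (a minimal sketch; "small" is just an illustrative preset from the Resource Configuration section below):
acd-metrics:
  enabled: true
  grafana:
    enabled: true
    resourcePreset: "small"   # illustrative preset value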
Zitadel
Zitadel does not follow the same template as many of the other services. Below is a list of Zitadel specific properties.
| Key | Type | Description |
|---|---|---|
| zitadel.enabled | Boolean | Enable the Zitadel instance |
| zitadel.replicaCount | Number | Number of replicas in the Zitadel deployment |
| zitadel.image.repository | String | The full name of the image registry and repository for the Zitadel container |
| zitadel.setupJob | Object | Configuration for the initial setup job to configure the database |
| zitadel.zitadel.masterkeySecretName | String | The name of an existing Kubernetes secret containing the Zitadel Masterkey |
| zitadel.zitadel.configmapConfig | Object | The Zitadel configuration. See Configuration Options in ZITADEL |
| zitadel.zitadel.configmapConfig.ExternalDomain | String | The external domain name or IP address to which all requests must be made. |
| zitadel.service | Object | Service configuration options for Zitadel |
| zitadel.ingress | Object | Traffic exposure parameters for Zitadel |
The zitadel.zitadel.configmapConfig.ExternalDomain MUST be configured with the same
value used as the first entry in global.hosts.manager. Cross-Origin Resource Sharing (CORS)
is enforced by Zitadel, and only the origin specified here is allowed to access Zitadel.
The first entry in the global.hosts.manager array is used by internal services, and if it
does not match, authentication requests will not be accepted.
For example, if the global.hosts.manager entries look like this:
global:
  hosts:
    manager:
      - host: foo.example.com
      - host: bar.example.com
The Zitadel ExternalDomain must be set to foo.example.com, and all requests to Zitadel
must use foo.example.com, e.g. https://foo.example.com/ui/console. Requests made to
bar.example.com will result in HTTP 404 errors.
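Continuing the example above, a values sketch that keeps the two settings in sync might look like this (foo.example.com and bar.example.com are the illustrative hostnames from above):
global:
  hosts:
    manager:
      - host: foo.example.com
      - host: bar.example.com
zitadel:
  zitadel:
    configmapConfig:
      ExternalDomain: foo.example.com   # must match the first global.hosts.manager entry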
Redis and Kafka
Both the redis and kafka subcharts follow the same basic structure as the acd-manager
chart, and the configurable values in each are nearly identical. Documenting the configuration
of these charts here would be unnecessarily redundant. However, the operator may wish to
adjust the resource configuration for these charts at the following locations:
| Key | Type | Description |
|---|---|---|
| redis.master.resources | Object | Resource configuration for the Redis master instance |
| redis.replica.resources | Object | Resource configuration for the Redis read-only replica instances |
| redis.replica.replicaCount | Number | Number of Read-only Redis replica instances |
| kafka.controller.resources | Object | Resource configuration for the Kafka controller |
| kafka.controller.replicaCount | Number | Number of Kafka controller replica instances to deploy |
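As a sketch, the values below adjust the Redis replica count and give the Kafka controllers an explicit resource block; the numbers are purely illustrative and should be sized for your environment (the resources schema is described in the next section):
redis:
  replica:
    replicaCount: 2            # illustrative number of read-only replicas
kafka:
  controller:
    replicaCount: 3            # illustrative number of Kafka controllers
    resources:
      requests:
        cpu: "500m"
        memory: "1024Mi"
      limits:
        cpu: "1000m"
        memory: "2048Mi"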
Resource Configuration
All resource configuration blocks follow the same basic schema which is defined here.
| Key | Type | Description |
|---|---|---|
| resources.limits.cpu | String | The maximum CPU which can be consumed before the Pod is terminated. |
| resources.limits.memory | String | The maximum amount of memory the pod may consume before being killed. |
| resources.limits.ephemeral-storage | String | The maximum amount of storage a pod may consume |
| resources.requests.cpu | String | The minimum available CPU cores for each Pod to be assigned to a node. |
| resources.requests.memory | String | The minimum available Free Memory on a node for a pod to be assigned. |
| resources.requests.ephemeral-storage | String | The minimum amount of storage a pod requires to be assigned to a node. |
CPU values are specified in millicores, i.e. units of 1/1000 of a CPU: “1000m” represents 1 core and “250m” is 1/4 of a core. Memory and storage values are specified with binary (IEC) suffixes, e.g. “250Mi” is 250 MiB and “3Gi” is 3 GiB.
Most services also include a resourcePreset value which is a simple String representing
some common configurations.
The presets are as follows:
| Preset | Request CPU | Request Memory | Request Storage | Limit CPU | Limit Memory | Limit Storage |
|---|---|---|---|---|---|---|
| nano | 100m | 128Mi | 50Mi | 150m | 192Mi | 2Gi |
| micro | 250m | 256Mi | 50Mi | 375m | 384Mi | 2Gi |
| small | 500m | 512Mi | 50Mi | 750m | 768Mi | 2Gi |
| medium | 500m | 1024Mi | 50Mi | 750m | 1536Mi | 2Gi |
| large | 1.0 | 2048Mi | 50Mi | 1.5 | 3072Mi | 2Gi |
| xlarge | 1.0 | 3072Mi | 50Mi | 3.0 | 6144Mi | 2Gi |
| 2xlarge | 1.0 | 3072Mi | 50Mi | 6.0 | 12288Mi | 2Gi |
When considering resource requests vs. limits, the request values should represent the minimum resource usage necessary to run the service, while the limits represent the maximum resources each pod in the deployment is allowed to consume. Requests and limits apply per pod, so a service using the “large” preset with 3 replicas needs a minimum of 3 full cores and 6 GiB of available memory to start, and may consume up to 4.5 cores and 9 GiB of memory across all nodes in the cluster.
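For reference, an explicit resources block roughly equivalent to the “medium” preset in the table above looks like this:
resources:
  requests:
    cpu: "500m"
    memory: "1024Mi"
    ephemeral-storage: "50Mi"
  limits:
    cpu: "750m"
    memory: "1536Mi"
    ephemeral-storage: "2Gi"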
Security Contexts
Most charts used in the deployment contain configuration for both Pod and Container security contexts. Below is additional information about the parameters therein.
| Key | Type | Description |
|---|---|---|
| podSecurityContext.enabled | Boolean | Enable the Pod Security Context |
| podSecurityContext.fsGroupChangePolicy | String | Set filesystem group change policy for the nodes |
| podSecurityContext.sysctls | Array | Set kernel settings using sysctl interface for the pods |
| podSecurityContext.supplementalGroups | Array | Set filesystem extra groups for the pods |
| podSecurityContext.fsGroup | Number | Set Filesystem Group ID for the pods |
| containerSecurityContext.enabled | Boolean | Enable the container security context |
| containerSecurityContext.seLinuxOptions | Object | Set SELinux options for each container in the Pod |
| containerSecurityContext.runAsUser | Number | Set runAsUser in the containers Security Context |
| containerSecurityContext.runAsGroup | Number | Set runAsGroup in the containers Security Context |
| containerSecurityContext.runAsNonRoot | Boolean | Set runAsNonRoot in the containers Security Context |
| containerSecurityContext.readOnlyRootFilesystem | Boolean | Set readOnlyRootFilesystem in the containers Security Context |
| containerSecurityContext.privileged | Boolean | Set privileged in the container Security Context |
| containerSecurityContext.allowPrivilegeEscalation | Boolean | Set allowPrivilegeEscalation in the container’s security context |
| containerSecurityContext.capabilities.drop | Array | List of capabilities to be dropped in the container |
| containerSecurityContext.seccompProfile.type | String | Set seccomp profile in the container |
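As an illustrative sketch (these values are examples, not the chart defaults), a hardened container security context could be expressed in the values file like this:
containerSecurityContext:
  enabled: true
  runAsNonRoot: true
  runAsUser: 1001                  # illustrative non-root UID
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
  seccompProfile:
    type: RuntimeDefault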
Probe Configuration
Each Pod uses healthcheck probes to determine its readiness. Three probe types are defined: startupProbe, readinessProbe, and livenessProbe. They all accept exactly the same configuration options; the only difference between the probe types is when they are executed.
Liveness Probe: Checks if the container is running. If this probe fails, Kubernetes restarts the container, assuming it is stuck or unhealthy.
Readiness Probe: Determines if the container is ready to accept traffic. If it fails, the container is removed from the service load balancer until it becomes ready again.
Startup Probe: Used during container startup to determine if the application has started successfully. It helps to prevent the liveness probe from killing a container that is still starting up.
The following table describes each of these properties:
| Property | Description |
|---|---|
| enabled | Determines whether the probe is active (true) or disabled (false). |
| initialDelaySeconds | Time in seconds to wait after the container starts before performing the first probe. |
| periodSeconds | How often (in seconds) to perform the probe. |
| timeoutSeconds | Number of seconds to wait for a probe response before considering it a failure. |
| failureThreshold | Number of consecutive failed probes before considering the container unhealthy (for liveness) or unavailable (for readiness). |
| successThreshold | Number of consecutive successful probes required to consider the container healthy or ready (usually 1). |
| httpGet | Specifies that the probe performs an HTTP GET request to check container health. |
| httpGet.path | The URL path to request during the HTTP GET probe. |
| httpGet.port | The port number or name where the HTTP GET request is sent. |
| exec | Specifies that the probe runs the specified command inside the container and expects a successful exit code to indicate health. |
| exec.command | An array of strings representing the command to run |
Only one of httpGet or exec may be specified in a single probe. These configurations are mutually exclusive.
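To illustrate the schema, a readiness probe using an HTTP GET check might be configured like this (the path and port shown are placeholders; the actual defaults are defined per chart):
readinessProbe:
  enabled: true
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
  successThreshold: 1
  httpGet:
    path: /api/v1/health/ready   # placeholder; see the Healthcheck API guide
    port: http                   # placeholder port name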
7 - Networking
Port Usage
The following table describes the minimal firewall setup required between each node in the cluster for the Kubernetes cluster to function properly. Unless otherwise specified, these rules must allow traffic to pass between any nodes in the cluster.
| Protocol | Port | Source | Destination | Description |
|---|---|---|---|---|
| TCP | 2379-2380 | Server | Server | Etcd Service |
| TCP | 6443 | Any | Server | K3s Supervisor and Kubernetes API Server |
| UDP | 8472 | Any | Any | Flannel VXLAN |
| TCP | 10250 | Any | Any | Kubelet Metrics |
| TCP | 5001 | Any | Server | Spegel Registry Mirror |
| TCP | 9500 | Any | Any | Longhorn Management API |
| TCP | 8500 | Any | Any | Longhorn Agent |
| Any | N/A | 10.42.0.0/16 | Any | K3s Pods |
| Any | N/A | 10.43.0.0/16 | Any | K3s Services |
| TCP | 80 | Any | Any | Optional Ingress HTTP traffic |
| TCP | 443 | Any | Any | Ingress HTTPS Traffic |
The following table describes the required ports which must be allowed through any firewalls for the manager application. Access to these ports must be allowed from any client which requires access to these services towards any node in the cluster.
| Protocol | Port | Description |
|---|---|---|
| TCP | 443 | Ingress HTTPS Traffic |
| TCP | 3000 | Grafana |
| TCP | 9095 | Kafka |
| TCP | 9093 | Alertmanager |
| TCP | 9090 | Prometheus |
| TCP | 6379 | Redis |
Note: Port 443 appears in both of the above tables. It is used by the internal applications running within the cluster to access Zitadel, so all nodes in the cluster must have access to that port, and it is also used to provide ingress services from outside the cluster for multiple applications.
Firewall Rules
What follows is an example script that can be used to open the required ports using
firewalld. Adjust the commands as necessary to fit the environment.
# Allow Kubernetes cluster ports (between nodes)
firewall-cmd --permanent --add-port=2379-2380/tcp
firewall-cmd --permanent --add-port=6443/tcp
firewall-cmd --permanent --add-port=8472/udp
firewall-cmd --permanent --add-port=10250/tcp
firewall-cmd --permanent --add-port=5001/tcp
firewall-cmd --permanent --add-port=9500/tcp
firewall-cmd --permanent --add-port=8500/tcp
# Allow all traffic from specific subnets for K3s pods/services
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.42.0.0/16" accept'
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.43.0.0/16" accept'
# Allow optional ingress HTTP/HTTPS traffic
firewall-cmd --permanent --add-port=80/tcp
firewall-cmd --permanent --add-port=443/tcp
# Allow ports for the manager application (from anywhere)
firewall-cmd --permanent --add-port=443/tcp
firewall-cmd --permanent --add-port=3000/tcp
firewall-cmd --permanent --add-port=9095/tcp
firewall-cmd --permanent --add-port=9093/tcp
firewall-cmd --permanent --add-port=9090/tcp
firewall-cmd --permanent --add-port=6379/tcp
# Reload firewalld to apply changes
firewall-cmd --reload
IP Routing
Proper IP routing is critical for cluster communication. The network must allow nodes to route traffic to each other’s pod CIDRs (e.g., 10.42.0.0/16, 10.43.0.0/16) and external clients to reach ingress and services. Verify that your network infrastructure permits routing between these subnets; otherwise, nodes may not communicate properly, impacting cluster functionality.
Handling Multiple Zones with Kubernetes Interfaces
Kubernetes creates virtual network interfaces for pods within the node’s network namespace. These interfaces are
typically not associated with any specific firewalld zone by default. Firewalld applies rules to the primary
physical interface (such as eth0), not directly to the pod interfaces.
8 - Storage Guide
Overview
Longhorn is an open-source distributed block storage system designed specifically for Kubernetes. It provides persistent storage for stateful applications by creating and managing storage volumes that are replicated across multiple nodes to ensure high availability. Longhorn integrates seamlessly with Kubernetes, allowing users to dynamically provision, attach, and manage persistent disks through standard Kubernetes PersistentVolumeClaims (PVCs).
Longhorn deploys a set of controller and replica engines as containers on each node, forming a distributed storage system. When a volume is created, Longhorn replicates data across multiple nodes, ensuring durability even in the event of node failures. The system also handles snapshots, backups, and restores, offering robust data protection. Kubernetes automatically mounts these volumes into Pods, providing persistent storage for stateful applications to operate reliably.
graph TD
subgraph Cluster Nodes
Node1["Node 1"]
Node2["Node 2"]
Node3["Node 3"]
end
subgraph Longhorn Components
Controller["Longhorn Controller"]
Replica1["Replica (Node 1)"]
Replica2["Replica (Node 2)"]
Replica3["Replica (Node 3)"]
end
subgraph Storage Volume
Volume["Persistent Volume"]
end
Node1 -->|Runs| Replica1
Node2 -->|Runs| Replica2
Node3 -->|Runs| Replica3
Controller -->|Manages| Volume
Replica1 & Replica2 & Replica3 -->|Replicate Data| Volume
Accessing the configuration GUI
Longhorn provides a web-based frontend for managing storage configurations across the Kubernetes cluster. This UI allows users to configure various aspects of the storage engine, such as the number of replicas, backup settings, snapshot management, and more.
Since this frontend does not include any authentication mechanisms and improper use could lead to significant data loss, access is restricted. To securely access the UI, a manual port-forward must be established.
You can set up a temporary connection to the Longhorn frontend using the following
kubectl port-forward command:
kubectl port-forward -n longhorn-system --address 0.0.0.0 svc/longhorn-frontend 8888:80
This command forwards local port 8888 to the Longhorn frontend service in the cluster. You can then access the UI by navigating to:
http://k3s-server:8888
This connection remains active as long as the port-forward command is running. To stop it, simply press
Ctrl+C. Make sure to run this command only when needed, and avoid leaving the UI accessible without
proper authentication.
9 - Metrics and Monitoring
The ESB3027 AgileTV CDN Manager includes a built-in metrics and monitoring solution based on Telegraf, Prometheus, and Grafana. A set of default Grafana dashboards provides visibility into CDN performance, displaying host metrics such as CPU, memory, network, and disk utilization—collected from the Director and Cache nodes via Telegraf—as well as streaming metrics from each Director instance. These metrics are stored in a Time-Series Database and visualized through Grafana dashboards. Additionally, the system supports custom dashboards using Prometheus as a data source, offering flexibility for customers to monitor all aspects of the CDN according to their specific needs.
Accessing Grafana
To access Grafana, point a browser towards any node in the cluster on port 3000, e.g. http://manager.local:3000/, and log in using the default administrator account credentials listed below.
Known Limitation: Grafana does not currently support Single-Sign-On (SSO) using Zitadel accounts.
Username: admin
Password: edgeware
On the left column, click Dashboards and select the Dashboard you wish to view.
Custom Dashboards
The Grafana instance uses persistent storage within the cluster. Any custom dashboards or modifications to existing dashboards are saved in the persistent storage volume and will persist across software upgrades.
Billing and Licensing
A separate VictoriaMetrics Time-Series Database is included within the metrics component of the manager. It periodically scrapes usage data from Prometheus to calculate aggregated statistics and verify license compliance. This data is retained for at least one year. Grafana can also use this database as a source to display long-term usage metrics.
10 - Operations Guide
Overview
This guide details some of the common commands that are necessary to operate the ESB3027 AgileTV CDN Manager software. Before starting, you will need at least a basic understanding of the command-line tooling described in the following sections.
Getting and Describing Kubernetes Resources
The two most common commands in Kubernetes are get and describe for a specific resource
such as a Pod or Service. Using kubectl get typically lists all resources of a particular
type; for example, kubectl get pods will display all pods in the current namespace. To obtain
more detailed information about a specific resource, use kubectl describe <resource>, such as
kubectl describe pod postgresql-0 to view details about that particular pod.
When describing a pod, the output includes a recent Event history at the bottom. This can be extremely helpful for troubleshooting issues, such as why a pod failed to deploy or was restarted. However, keep in mind that this event history only reflects the most recent events from the past few hours, so it may not provide insights into problems that occurred days or weeks ago.
Obtaining Logs
Each Pod maintains its own logs for each container. To fetch the logs of a specific pod, use
kubectl logs <pod_name>. Adding the -f flag will stream the logs in follow mode, allowing
real-time monitoring. If a pod contains multiple containers, by default, only the logs from the
primary container are shown. To view logs from a different container within the same pod, use
the -c <container_name> flag.
Since each pod maintains its own logs, retrieving logs from all replicas of a Deployment or StatefulSet may be necessary to get a complete view. You can use label selectors to collect logs from all pods associated with the same application. For example, to fetch logs from all pods belonging to the “acd-manager” deployment, run:
kubectl logs -l app.kubernetes.io/name=acd-manager
To find the labels associated with a specific Deployment or ReplicaSet, describe the resource and look for the “Labels” field.
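For example, assuming the Deployment is named acd-manager, either of the following commands will show its labels:
kubectl describe deployment acd-manager
kubectl get deployment acd-manager --show-labels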
The following table describes the common labels currently used by deployments in the cluster.
Component Labels
| Label (key=value) | Description |
|---|---|
| app.kubernetes.io/component=manager | Identifies the ACD Manager service |
| app.kubernetes.io/component=confd | Identifies the confd service |
| app.kubernetes.io/component=frontend | Identifies the GUI (frontend) service |
| app.kubernetes.io/component=gateway | Identifies the API gateway service |
| app.kubernetes.io/component=grafana | Identifies the Grafana monitoring service |
| app.kubernetes.io/component=metrics-aggregator | Identifies the metrics aggregator service |
| app.kubernetes.io/component=mib-frontend | Identifies the MIB frontend service |
| app.kubernetes.io/component=server | Identifies the Prometheus server component |
| app.kubernetes.io/component=selection-input | Identifies the selection input service |
| app.kubernetes.io/component=start | Identifies the Zitadel startup/init component |
| app.kubernetes.io/component=primary | Identifies the PostgreSQL primary node |
| app.kubernetes.io/component=controller-eligible | Identifies the Kafka controller-eligible node |
| app.kubernetes.io/component=alertmanager | Identifies the Prometheus Alertmanager |
| app.kubernetes.io/component=master | Identifies the Redis master node |
| app.kubernetes.io/component=replica | Identifies the Redis replica node |
Instance, Name, and Part-of Labels
| Label (key=value) | Description |
|---|---|
| app.kubernetes.io/instance=acd-manager | Helm release instance name (acd-manager) |
| app.kubernetes.io/instance=acd-cluster | Helm release instance name (acd-cluster) |
| app.kubernetes.io/name=acd-manager | Resource name: acd-manager |
| app.kubernetes.io/name=confd | Resource name: confd |
| app.kubernetes.io/name=grafana | Resource name: grafana |
| app.kubernetes.io/name=mib-frontend | Resource name: mib-frontend |
| app.kubernetes.io/name=prometheus | Resource name: prometheus |
| app.kubernetes.io/name=telegraf | Resource name: telegraf |
| app.kubernetes.io/name=zitadel | Resource name: zitadel |
| app.kubernetes.io/name=postgresql | Resource name: postgresql |
| app.kubernetes.io/name=kafka | Resource name: kafka |
| app.kubernetes.io/name=redis | Resource name: redis |
| app.kubernetes.io/name=victoria-metrics-single | Resource name: victoria-metrics-single |
| app.kubernetes.io/part-of=prometheus | Part of the Prometheus stack |
| app.kubernetes.io/part-of=kafka | Part of the Kafka stack |
Restarting a Pod
Since Kubernetes maintains a fixed number of replicas for each Deployment or ReplicaSet, deleting a
pod will cause Kubernetes to immediately recreate it, effectively restarting the pod. For example,
to restart the pod acd-manager-6c85ddd747-5j5gt, run:
kubectl delete pod acd-manager-6c85ddd747-5j5gt
Kubernetes will automatically detach that pod from any associated Service, preventing new connections from reaching it. It then spawns a new instance, which goes through startup, liveness, and readiness probes. Once the new pod passes the readiness probes and is marked as ready, the Service will start forwarding new traffic to it.
If multiple replicas are running, traffic will be distributed among the existing pods while the new pod is initializing, ensuring a seamless, zero-downtime operation.
Stopping and Starting a Deployment
Unlike traditional services, Kubernetes does not have a concept of stopping a service directly. Instead, you can temporarily scale a Deployment to zero replicas, which has the same effect.
For example, to stop the acd-manager Deployment, run:
kubectl scale deployment acd-manager --replicas=0
To restart it later, scale the deployment back to its original number of replicas, e.g.,
kubectl scale deployment acd-manager --replicas=1
If you want to perform a simple restart of all pods within a deployment, you can delete all pods with a specific label, and Kubernetes will automatically recreate them. For example, to restart all pods with the component label “manager,” use:
kubectl delete pod -l app.kubernetes.io/component=manager
This command causes Kubernetes to delete all matching pods, which are then recreated, effectively restarting the service without changing the deployment configuration.
Running commands inside a pod
Sometimes it is necessary to run a command inside an existing Pod, such as obtaining a bash shell.
The kubectl exec -it <podname> -- <command> command can be used to do just that. Assuming we need to
run the confcli tool inside the confd pod acd-manager-confd-558f49ffb5-n8dmr, that can be accomplished
using the following command:
kubectl exec -it acd-manager-confd-558f49ffb5-n8dmr -- /usr/bin/python3.11 /usr/local/bin/confcli
Note: The confd container does not have a shell, so specifying the python interpreter is necessary on this image.
Monitoring resource usage
Kubernetes includes an internal metrics API which can give some insight into the resource usage of the Pods and of the Nodes.
To list the current usage of the Pods in the cluster issue the following:
kubectl top pods
This will give output similar to the following:
NAME CPU(cores) MEMORY(bytes)
acd-cluster-postgresql-0 3m 44Mi
acd-manager-6c85ddd747-rdlg6 4m 15Mi
acd-manager-confd-558f49ffb5-n8dmr 1m 47Mi
acd-manager-gateway-7594479477-z4bbr 0m 10Mi
acd-manager-grafana-78c76d8c5-c2tl6 18m 144Mi
acd-manager-kafka-controller-0 19m 763Mi
acd-manager-kafka-controller-1 19m 967Mi
acd-manager-kafka-controller-2 25m 1127Mi
acd-manager-metrics-aggregator-f6ff99654-tjbfs 4m 2Mi
acd-manager-mib-frontend-67678c69df-tkklr 1m 26Mi
acd-manager-prometheus-alertmanager-0 2m 25Mi
acd-manager-prometheus-server-768f5d5c-q78xb 5m 53Mi
acd-manager-redis-master-0 12m 18Mi
acd-manager-redis-replicas-0 15m 14Mi
acd-manager-selection-input-844599bc4d-x7dct 3m 3Mi
acd-manager-telegraf-585dfc5ff8-n8m5c 1m 27Mi
acd-manager-victoria-metrics-single-server-0 2m 10Mi
acd-manager-zitadel-69b6546f8f-v9lkp 1m 76Mi
acd-manager-zitadel-69b6546f8f-wwcmx 1m 72Mi
Querying the metrics API for the nodes gives the aggregated totals for each node:
kubectl top nodes
Yields output similar to the following:
NAME CPU(cores) CPU(%) MEMORY(bytes) MEMORY(%)
k3d-local-agent-0 118m 0% 1698Mi 21%
k3d-local-agent-1 120m 0% 661Mi 8%
k3d-local-agent-2 84m 0% 1054Mi 13%
k3d-local-server-0 115m 0% 1959Mi 25%
Taking a node out of service
To temporarily take a node out of service for maintenance, you can do so with minimal downtime, provided there are enough resources on other nodes in the cluster to handle the pods from the target node.
Step 1: Cordon the node.
This prevents new pods from being scheduled on the node:
kubectl cordon <node-name>
Step 2: Drain the node.
This moves existing pods off the node, respecting DaemonSets and local data:
kubectl drain <node-name> --ignore-daemonsets --delete-local-data
- The --ignore-daemonsets flag skips DaemonSet-managed pods, which are typically managed separately.
- The --delete-local-data flag removes any local ephemeral data stored on the node.
Once drained, the node is effectively out of service.
To bring the node back into service:
Uncordon the node with:
kubectl uncordon <node-name>
This allows Kubernetes to schedule new pods on the node. It won’t automatically move existing pods back; you may need to manually restart or reschedule pods if desired. Since the node now has more available resources, Kubernetes will attempt to schedule new pods there to balance the load across the cluster.
Backup and restore of persistent volumes
The Longhorn storage driver, which provides the persistent storage used in the cluster, (See the Storage Guide for more details) provides built-in mechanisms for backup, restore, and snapshotting volumes. This can be performed entirely from within the Longhorn WebUI. See the relevant section of the Storage Guide for details on accessing that UI, since it requires setting up a port forward, which is described there.
See the relevant Longhorn Documentation for how to configure Longhorn and to manage Snapshotting and Backup and Restore.
11 - Post Installation Guide
After installing the cluster, there are a few steps that should be taken to complete the setup.
Create an Admin User
The ESB3027 AgileTV CDN Manager ships with a default user account, but this account is only intended as a way to log in and create an actual user. Attempting to authenticate to other services, such as the MIB Frontend Configuration GUI, may not work using this pre-provisioned account.
You will need the IP address or DNS name specified in the configuration as both the first manager host and the Zitadel External Domain.
global:
  hosts:
    manager:
      - host: manager.local
Using a web browser, connect to the following URL, replacing manager.local with the IP or DNS name
from the configuration above:
https://manager.local/ui/console
You must authenticate using the default credentials:
Username: admin@agiletv.dev
Password: Password1!
It will ask you to set up Multi-Factor Authentication; however, you MUST skip this step for now, as it is not currently supported everywhere in the manager’s APIs.
On the menu bar at the top of the screen, click “Users” and proceed to create a New User. Enter the required information and, for now, ensure the “Email Verified” and “Set Initial Password” boxes are checked. Zitadel will attempt to send a confirmation email if the “Email Verified” box is not checked; however, on initial installation, the SMTP server details have not been configured.
You should now be able to authenticate to the MIB Frontend GUI at https://manager.local/gui using
the credentials for the new user.
Configure an SMTP Server
Zitadel requires an SMTP server to be configured in order to send validation emails and support
communication with users for password resets, etc. If you have an SMTP server, you can configure
it by logging back into the Zitadel Web UI at https://manager.local/ui/console, clicking on
“Default Settings” at the top of the page, and configuring the SMTP provider from the menu on the
left. After this has been performed, if a new user account is created, an email will be sent to
the configured email address with a verification link, which must be clicked before the account
will be valid.
12 - Releases
12.1 - Release esb3027-1.4.0
Build date
2025-10-23
Release status
Type: production
Included components
- ACD Configuration GUI 2.3.9
Compatibility
This release has been tested with the following product versions:
- AgileTV CDN Director, ESB3024-1.22.0
Breaking changes from previous release
A full installation is required for this version
If the field confd.confd.image.tag is set in the present configuration file, it must be removed or updated before upgrading
Change log
- NEW: Monitoring and Metrics support [ESB3027-17]
- NEW: Support for horizontal scaling [ESB3027-63]
- NEW: Deploy GUI container with Manager [ESB3027-67]
- NEW: Support Kafka redundancy [ESB3027-125]
- NEW: Support for Redis high availability [ESB3027-126]
- NEW: Add Prometheus Container [ESB3027-130]
- NEW: Add Grafana Container [ESB3027-131]
- NEW: External DNS Name configuration should be global [ESB3027-180]
- NEW: Deploy hardware metrics services acd-metrics-aggregator and acd-telegraf-metrics-database in k8s cluster [ESB3027-189]
- NEW: REST API Performance Improvements [ESB3027-208]
- NEW: “Star”/Make a Grafana dashboard the home page [ESB3027-243]
- NEW: Support for remote TCP connections for confd subscribers [ESB3027-244]
- NEW: Persist long term usage data [ESB3027-248]
- NEW: New billing dashboard [ESB3027-249]
- NEW: [ANSSI-BP-028] System Settings - Network Configuration and Firewalls [ESB3027-258]
- NEW: [ANSSI-BP-028] System Settings - SELinux [ESB3027-260]
- NEW: Support deploying GUI independently from manager [ESB3027-278]
- NEW: Automatically generate Zitadel secret [ESB3027-280]
- NEW: Deprecate the generate-ssl-secret command [ESB3027-281]
- NEW: Deprecate the generate-zitadel-mastekey command [ESB3027-285]
- FIXED: Access to services restricted with SELinux in Enforcing mode [ESB3027-32]
- FIXED: Authentication token payload contains invalid user details [ESB3027-47]
- FIXED: Unexpected 200 OK response to non-existent confd endpoint [ESB3027-154]
- FIXED: Multiple restarts encountered for selection-input service on startup [ESB3027-155]
- FIXED: Installer script requires case-sensitive hostnames [ESB3027-158]
- FIXED: Installer script does not support configuring additional options [ESB3027-214]
- FIXED: Selection input API accepts keys containing non-urlsafe characters [ESB3027-216]
- FIXED: Installation fails on minimal RHEL installation [ESB3027-287]
- FIXED: Kafka consumer configuration warning logged on startup [ESB3027-294]
Deprecated functionality
None
System requirements
Known limitations
Installation of the software is only supported using a self-hosted configuration.
12.2 - Release esb3027-1.2.1
Build date
2025-05-22
Release status
Type: production
Compatibility
This release is compatible with the following product versions:
- AgileTV CDN Director, ESB3024-1.20.1
Breaking changes from previous release
None
Change log
- FIXED: Installer changes ownership of /var, /etc/ and /usr [ESB3027-146]
- FIXED: K3s installer should not be left on root filesystem [ESB3027-149]
Deprecated functionality
None
System requirements
- A minimum CPU architecture level of x86-64-v2 due to the inclusion of Oracle Linux 9 inside the container. While all modern CPUs support this architecture level, virtual hypervisors may default to a CPU type with greater compatibility with older processors. If this minimum CPU architecture level is not attained, the containers may refuse to start. See Operating System Compatibility and Building Red Hat Enterprise Linux 9 for the x86-64-v2 Microarchitecture Level for more information.
Known limitations
Installation of the software is only supported using a self-hosted configuration.
12.3 - Release esb3027-1.2.0
Build date
2025-05-14
Release status
Type: production
Compatibility
This release is compatible with the following product versions:
- AgileTV CDN Director, ESB3024-1.20.1
Breaking changes from previous release
None
Change log
- NEW: Remove .sh extension from all scripts on the ISO [ESB3027-102]
- NEW: The script load-certificates.sh should be called generate-ssl-secret [ESB3027-104]
- NEW: Add support for High Availability [ESB3027-108]
- NEW: Enable the K3s Registry Mirror [ESB3027-110]
- NEW: Support for Air-Gapped installations [ESB3027-111]
- NEW: Basic hardware monitoring support for nodes in K8s Cluster [ESB3027-122]
- NEW: Separate docker containers from ISO [ESB3027-124]
- FIXED: GUI is unable to make DELETE request on api/v1/selection_input/modules/blocked_referrers [ESB3027-112]
Deprecated functionality
None
System requirements
- A minimum CPU architecture level of x86-64-v2 due to the inclusion of Oracle Linux 9 inside the container. While all modern CPUs support this architecture level, virtual hypervisors may default to a CPU type with greater compatibility with older processors. If this minimum CPU architecture level is not attained, the containers may refuse to start. See Operating System Compatibility and Building Red Hat Enterprise Linux 9 for the x86-64-v2 Microarchitecture Level for more information.
Known limitations
Installation of the software is only supported using a self-hosted configuration.
12.4 - Release esb3027-1.0.0
Build date
2025-04-17
Release status
Type: production
Compatibility
This release is compatible with the following product versions:
- AgileTV CDN Director, ESB3024-1.20.0
Breaking changes from previous release
None
Change log
This is the first production release
Deprecations from previous release
None
System requirements
- A minimum CPU architecture level of x86-64-v2 due to the inclusion of Oracle Linux 9 inside the container. While all modern CPUs support this architecture level, virtual hypervisors may default to a CPU type with greater compatibility with older processors. If this minimum CPU architecture level is not attained, the containers may refuse to start. See Operating System Compatibility and Building Red Hat Enterprise Linux 9 for the x86-64-v2 Microarchitecture Level for more information.
Known limitations
Installation of the software is only supported using a self-hosted, single-node configuration.
13 - API Guides
13.1 - Healthcheck API
This API provides endpoints to verify the liveness and readiness of the service.
Liveness Check
Endpoint: GET /api/v1/health/alive
Purpose:
Ensures that the service is running and accepting connections. This check does not verify
dependencies or internal health, only that the service process is alive and listening.
Response:
- Success (200 OK):
{
"status": "ok"
}
- Failure (503 Service Unavailable):
Indicates the service is not alive, possibly due to a critical failure.
Example Request
GET /api/v1/health/alive HTTP/1.1
Host: your-host
Accept: */*
Example Response
HTTP/1.1 200 OK
Content-Type: application/json
{
"status": "ok"
}
Readiness Check
Endpoint: GET /api/v1/health/ready
Purpose:
Verifies if the service is ready to handle requests, including whether all dependencies (such as
databases or external services) are operational.
Response:
- Success (200 OK):
{
"status": "ok"
}
- Failure (503 Service Unavailable):
Indicates the service or its dependencies are not yet ready.
Example Request
GET /api/v1/health/ready HTTP/1.1
Host: your-host
Accept: */*
Example Response
HTTP/1.1 200 OK
Content-Type: application/json
{
"status": "ok"
}
Notes
- These endpoints are typically used by load balancers, orchestrators like Kubernetes, or monitoring systems to assess service health.
- The liveness endpoint confirms the process is running; the readiness endpoint confirms the service and its dependencies are fully operational and ready to serve traffic.
13.2 - Authentication API
The manager offers a simplified authentication and authorization API that integrates with the Zitadel IAM system. This flow is a streamlined custom OAuth2-inspired process:
Session Establishment: Users authenticate by sending their credentials to the Login endpoint, which returns a session ID and session token.
Token Exchange: The session token is exchanged for a short-lived, signed JWT access token via the Token Grant flow. This access token can be used to authorize API requests, and its scopes determine what resources and actions are permitted. The token should be protected, as it grants the bearer the rights specified by its scopes as long as it is valid.
Login
Send user credentials to initiate a session:
POST /api/v1/auth/login HTTP/1.1
Accept: application/json, */*;q=0.5
Content-Type: application/json
Host: localhost:4464
{
"email": "test@example.com",
"password": "test"
}
Response:
{
"expires_at": "2025-01-29T15:49:47.062354+00:00",
"session_id": "304646367786041347",
"session_token": "12II6yYYfN8UJ5ij-bac6IRRXX6t9qG_Flrlow_fukXKqvo9HFDVZ7a76Exj7Gn-uVRx04_reCaXew",
"verified_at": "2025-01-28T15:49:47.054169+00:00"
}
Logout
To terminate a session, send:
POST /api/v1/auth/logout HTTP/1.1
Accept: application/json
Content-Type: application/json
Host: localhost:4464
{
"session_id": "304646367786041347",
"session_token": "12II6yYYfN8UJ5ij-bac6IRRXX6t9qG_Flrlow_fukXKqvo9HFDVZ7a76Exj7Gn-uVRx04_reCaXew"
}
Response:
{
"status": "Ok"
}
Token Grant
After establishing a session, exchange the session token for a short-lived access token:
POST /api/v1/auth/token HTTP/1.1
Accept: application/json
Content-Type: application/json
Host: localhost:4464
{
"grant_type": "session",
"scope": "foo bar baz",
"session_id": "304646818908602371",
"session_token": "wfCelUhfSb4DKJbLCwg9dr59rTeaC13LF2TXH1tMqXz68ojL8LE9M-dCcwsKgrwjcXkjj9y49wWvdQ"
}
Note: The scope parameter is a space-delimited string defining the permissions requested. The
API responds with an access token, which is a JWT that contains embedded scopes and other claims,
and must be kept secret.
Response example:
{
"access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzI1NiIsImp3ayI6eyJ1c2UiOiJzaWciLCJhbGciOiJFUzI1NiIsImtpZCI6ImFjZC1tYW5hZ2VyLWVzMjU2LWtleSIsImt0eSI6IkVDIiwiY3J2IjoiUC0yNTYiLCJ4IjoiWWxpYVVoSXpnaTk1SjV4NXdaU0tGRUhyWldFUTdwZDZUR2JrTEN6MGxLcyIsInkiOiJDcWNWY1MzQ1pFMjB2enZiWFdxRERRby00UXEzYnFfLUlPZWNPMlZudkFzIn0sImtpZCI6ImFjZC1tYW5hZ2VyLWVzMjU2LWtleSJ9.eyJleHAiOjE3MzgwODAwMjIsImlhdCI6MTczODA3OTcyMiwibmJmIjoxNzM4MDc5NzIyLCJzdWIiOiJ0ZXN0QGV4YW1wbGUuY29tIiwiZ2l2ZW5fbmFtZSI6IiIsImZhbWlseV9uYW1lIjoiVGVzdCBVc2VyIiwiZW1haWwiOiJ0ZXN0QGV4YW1wbGUuY29tIiwic2NvcGUiOiJmb28gYmFyIGJheiJ9.uRmmszZfkrbJpQxIRpxmHf4gL6omvsOQHeuQYd00Bj8PNwQejNA2ZJO3Q_PsE0qb1IrMX5bsCC_k9lWUFMNQ1w",
"expires_in": 300,
"scope": "foo bar baz",
"token_type": "bearer"
}
The access token can then be included in API requests via the Authorization header as Bearer <token>.
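For illustration, a request carrying the access token might look like the following; the endpoint shown is only an example, and whether a given endpoint enforces authorization depends on its configuration and the granted scopes:
GET /api/v1/routing/validate?ip=1.1.1.1 HTTP/1.1
Accept: application/json
Host: localhost:4464
Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzI1NiIs...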
13.3 - Router API
The /api/v1/routing/validate endpoint evaluates routing rules for a specified IP
address. If the IP is blocked according to the configured rules, the endpoint
responds with a 401 Unauthorized.
Limitations
- Supported Classifier Types: Only classifiers of type GeoIP, Anonymous IP, and IPRange are supported. Other classifiers require additional information which is not available to the Manager, so they are assumed not to match.
- Policy Behavior: Since the exact path taken through the rules during the initial
request is unknown, a “default allow” policy is in effect. This means that unless
an IP explicitly matches a rule that denies it, the response will be
200 OK, indicating the IP is allowed.
Request
Method: GET /api/v1/routing/validate?ip=<IP_ADDRESS>
Headers:
Accept: */* (or as needed)
Example:
GET /api/v1/routing/validate?ip=1.1.1.1 HTTP/1.1
Accept: */*
Host: localhost
User-Agent: HTTPie/3.2.4
Response
- Blocked IP: Returns 401 Unauthorized if the IP matches a block rule.
HTTP/1.1 401 Unauthorized
- Allowed IP: Returns 200 OK if the IP does not match a block rule (or if no matching rule is found due to the “default allow” policy).
HTTP/1.1 200 OK
Default-Allow Policy
The routing validation API uses a default-allow policy: if a request does not match any rule, it is allowed. This approach is intentional and designed to prevent valid sessions from being accidentally dropped if your configuration uses advanced features or rule types that are not fully supported by the Manager. Since the Manager only supports a subset of all possible classifier types and rule logic, it cannot always determine the exact path a request would take through the full configuration. By defaulting to allow, the system avoids inadvertently blocking legitimate traffic due to unsupported or unrecognized configuration elements.
To ensure sensitive or restricted IPs are blocked, you must add explicit deny
rules at the top of your ruleset. Rules are evaluated in order, and the first match
applies.
Best Practice: Place your most specific deny rules first, followed by general allow rules. This ensures that deny conditions are always checked before any allow conditions.
Example Ruleset (confd/confcli syntax)
{
"rules": [
{
"name": "deny-restricted",
"type": "deny",
"condition": "in_session_group('Restricted')",
"onMiss": "allow-general"
},
{
"name": "allow-general",
"type": "allow",
"condition": "always()",
"onMatch": "main-host"
}
]
}
- The first rule denies requests from the Restricted session group.
- The second rule allows all other requests.
Note: With a default-allow policy, any request not explicitly denied will be permitted. Always review your ruleset to ensure that deny rules are comprehensive and prioritized.
13.4 - Selection Input API
This API allows you to store arbitrary JSON data and keep it synchronized across all Director instances via Kafka. It is based on the Selection Input API provided by the Director. You can create, delete, and fetch selection input entries at arbitrary paths.
Known Limitations
- Parent Path Access: Accessing a parent path (e.g., /foo) will not return all nested structures under that path.
- Field Access Limitation: It is not possible to query nested fields directly. For example, if /foo/bar contains {"baz": {"bam": "boom"}}, querying /foo/bar/baz/bam will not return "boom". You can only query /foo/bar/baz to retrieve {"bam": "boom"}.
API Usage
Create New Keys
Create multiple entries under a specified path by POSTing a JSON object where each key-value pair corresponds to a key and its associated data.
Request:
POST /api/v1/selection_input/<path>
Body Example:
{
"key1": {...},
"key2": {...}
}
Example:
POST to /api/v1/selection_input/modules/keys with the above body creates:
- /modules/keys/key1 with value {...}
- /modules/keys/key2 with value {...}
Delete a Key
Remove a specific key at a given path.
Request:
DELETE /api/v1/selection_input/<path>/<key>
Example:
To delete key2 under /modules/keys:
DELETE /api/v1/selection_input/modules/keys/key2
Fetch a Key
Retrieve the data stored under a specific key.
Request:
GET /api/v1/selection_input/<path>/<key>
Example:
To fetch key1 under /modules/keys:
GET /api/v1/selection_input/modules/keys/key1
Response:
{
"key1": {...}
}
Fetch All Keys Under a Path
Retrieve all selection input data stored under a parent path.
Request:
GET /api/v1/selection_input/<path>
Example:
To get all keys under /modules/keys:
GET /api/v1/selection_input/modules/keys
Response:
{
"key1": {...},
"key2": {...}
}
Filtering, Sorting, and Limiting Results
You can refine the list of keys returned by adding query parameters:
- search=<string>: Filter results to include only keys matching the search string.
- sort=<asc|desc>: Sort keys in ascending or descending order before filtering.
- limit=<number>: Limit the number of results returned (positive integer).
Note:
- Sorting occurs prior to filtering and limiting.
- The order of query parameters does not affect the request.
Example:
GET /api/v1/selection_input/modules/keys?search=foo&sort=asc&limit=10
13.5 - Operator UI API
This API provides endpoints to retrieve and manage blocked tokens, user agents, and referrers used within the Operator UI.
Endpoints
Retrieve List of Blocked Tokens
GET /api/v1/operator_ui/modules/blocked_tokens/
Fetches a list of blocked tokens, supporting optional filtering, sorting, and limiting.
Query Parameters:
- search (optional): Filter tokens matching the search term.
- limit (optional): Limit number of results.
- sort (optional): Sort order, "asc" or "desc" (default: "asc").
Responses:
- 200 OK with JSON array of blocked tokens.
- 404 Not Found if no tokens found.
- 500 Internal Server Error on failure.
Retrieve a Specific Blocked Token
GET /api/v1/operator_ui/modules/blocked_tokens/{token}
Fetches details of a specific blocked token.
Path Parameter:
token: The token string to retrieve.
Responses:
- 200 OK with JSON object of the token.
- 404 Not Found if token does not exist.
- 500 Internal Server Error on failure.
Retrieve List of Blocked User Agents
GET /api/v1/operator_ui/modules/blocked_user_agents/
Fetches a list of blocked user agents, with optional sorting and limiting.
Query Parameters:
- limit (optional): Limit number of results.
- sort (optional): "asc" or "desc" (default: "asc").
Responses:
- 200 OK with JSON array of user agents.
- 404 Not Found if none found.
- 500 Internal Server Error on failure.
Retrieve a Specific Blocked User Agent
GET /api/v1/operator_ui/modules/blocked_user_agents/{user_agent}
Retrieves details of a specific blocked user agent.
Path Parameter:
user_agent: URL-safe Base64 encoded string (without padding). Decode before use; if decoding fails, the server returns 400 Bad Request.
Responses:
- 200 OK with JSON object of the user agent.
- 404 Not Found if not found.
- 500 Internal Server Error on failure.
Retrieve List of Blocked Referrers
GET /api/v1/operator_ui/modules/blocked_referrers/
Fetches a list of blocked referrers, with optional sorting and limiting.
Query Parameters:
- limit (optional): Limit number of results.
- sort (optional): "asc" or "desc" (default: "asc").
Responses:
- 200 OK with JSON array of referrers.
- 404 Not Found if none found.
- 500 Internal Server Error on failure.
Retrieve a Specific Blocked Referrer
GET /api/v1/operator_ui/modules/blocked_referrers/{referrer}
Retrieves details of a specific blocked referrer.
Path Parameter:
referrer: URL-safe Base64 encoded string (without padding). Decode before use; if decoding fails, the server returns 400 Bad Request. The response includes the decoded referrer.
Responses:
- 200 OK with JSON object containing the referrer.
- 404 Not Found if not found.
- 500 Internal Server Error on failure.
Additional Notes
- For User Agents and Referrers, the path parameters are URL-safe Base64 encoded (per RFC 4648, using - and _ instead of + and /) with padding (=) removed. Clients should remove padding when constructing requests and restore it before decoding; see the encoding sketch after this list.
- All endpoints returning specific items will respond with 404 Not Found if the item does not exist.
- Errors during processing will return 500 Internal Server Error with an error message.
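As a sketch of the client-side encoding step, assuming GNU coreutils basenc is available (openssl or a scripting language works equally well), a user agent can be encoded for use in the request path like this:
# URL-safe Base64 encode an illustrative user agent string and strip the '=' padding
echo -n 'ExamplePlayer/1.0' | basenc --base64url | tr -d '='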
14 - Use Cases
14.1 - Custom Deployments
In some environments, it may not be necessary to run all components of the ESB3027 AgileTV CDN Manager, such as when certain features are not used, or when components like the MIB Frontend Configuration GUI are hosted separately, for example in a public cloud environment. The examples in this guide illustrate common scenarios and the configuration properties needed to achieve them.
Manager Without Metrics and Monitoring Support
If metrics and monitoring are not required—perhaps because an existing monitoring solution is in place—it is possible to disable the deployment of Telegraf, Prometheus, Grafana, and VictoriaMetrics. You can choose to skip the entire metrics suite or disable individual components as needed.
Keep in mind that disabling certain components may require adjustments elsewhere in the configuration. For example, disabling Prometheus will necessitate modifications to the Grafana and VictoriaMetrics configurations, since they depend on Prometheus being available.
To disable all metrics components, set:
acd-metrics.enabled: false
Applying this configuration will prevent the deployment of the entire metrics suite. To disable
individual components within the metrics framework, set their respective enabled flags to false.
For example, to disable only Grafana but keep other metrics components active:
acd-metrics.grafana.enabled: false
Manager Without the MIB Frontend Configuration GUI
If the MIB-Frontend GUI will not be used to configure the ESB3024 AgileTV CDN Director instances, this component can be disabled by setting:
mib-frontend.enabled: false
This is also useful if the frontend is hosted in a separate cluster—such as in a public cloud like AWS—or if the manager is deployed within a customer’s network without the frontend.
15 - Troubleshooting Guide
This guide helps diagnose common issues with the acd-manager deployment and its associated pods.
1. Check Pod Status
Verify all pods are running:
kubectl get pods
Expected:
- Most pods should be in Running state with READY as 1/1 or 2/2.
- Pods marked as 0/1 or 0/2 are not fully ready, indicating potential issues.
2. Investigate Unready or Failed Pods
Example:
kubectl describe pod acd-manager-6c85ddd747-rdlg6
- Look for events such as CrashLoopBackOff, ImagePullBackOff, or ErrImagePull.
- Check container statuses for error messages.
3. Check Pod Logs
Fetch logs for troubleshooting:
kubectl logs acd-manager-6c85ddd747-rdlg6
- For pods with multiple containers:
kubectl logs acd-manager-<pod_name> -c <container_name>
- Focus on recent errors or exceptions.
4. Verify Connectivity and Dependencies
- PostgreSQL: Confirm the acd-cluster-postgresql-0 pod is healthy and accepting connections.
- Kafka: Check that the kafka-controller pods are running and not experiencing issues.
- Redis: Ensure Redis master and replicas are healthy.
- Grafana, Prometheus, VictoriaMetrics: Confirm these services are operational.
5. Check Resource Usage
High CPU or memory can cause pods to crash or become unresponsive:
kubectl top pods
Actions:
- Scale resources if needed.
- Review resource quotas and limits.
6. Check Events in Namespace
kubectl get events --sort-by='.lastTimestamp'
- Look for warnings or errors related to pod scheduling, network issues, or resource constraints.
7. Restart Problematic Pods
Sometimes, restarting pods can resolve transient issues:
kubectl delete pod <pod_name>
Kubernetes will automatically recreate the pod.
8. Verify Configurations and Secrets
- Check ConfigMaps and Secrets for correctness:
kubectl get configmaps
kubectl get secrets
- Confirm environment variables and mounted volumes are correctly configured.
9. Check Cluster Network
- Ensure network policies or firewalls are not blocking communication between pods and external services.
10. Additional Tips
- Upgrade or Rollback: If recent changes caused issues, consider rolling back or upgrading the deployment.
- Monitoring: Use Grafana and VictoriaMetrics dashboards for real-time insights.
- Documentation: Consult application-specific logs and documentation for known issues.
Summary Table
| Issue Type | Common Checks | Commands |
|---|---|---|
| Pod Not Ready | Describe pod, check logs | kubectl describe pod, kubectl logs |
| Connectivity | Verify service endpoints | kubectl get svc, curl from within pods |
| Resource Limits | Monitor resource usage | kubectl top pods |
| Events & Errors | Check cluster events | kubectl get events |
| Configuration | Validate configs and secrets | kubectl get configmaps, kubectl get secrets |
If issues persist, consider scaling down and up components or consulting logs and metrics for deeper analysis.
16 - Glossary
- Access Token
- A credential used to authenticate and authorize access to resources or APIs on behalf of a user, usually issued by an authorization server as part of an OAuth 2.0 flow. It contains the necessary information to verify the user’s identity and define the permissions granted to the token holder.
- Bearer Token
- A type of access token that allows the holder to access
protected resources without needing to provide additional
credentials. It’s typically included in the HTTP Authorization
header as
Authorization: Bearer <token>, and grants access to any resource that recognizes the token. - Chart
- A Helm Chart is a collection of files that describe a related set of Kubernetes resources required to deploy an application, tool, or service. It provides a structured way to package, configure, and manage Kubernetes applications.
- Cluster
- A group of interconnected computers or nodes that work together as a single system to provide high availability, scalability and redundancy for applications or services. In Kubernetes, a cluster usually consists of one primary node, and multiple worker or agent nodes.
- Confd
- An AgileTV backend service that hosts the service configuration. Comes with an API, a CLI and a GUI.
- ConfigMap (Kubernetes)
- A Kubernetes resource used to store non-sensitive configuration data in key-value pairs, allowing applications to access configuration settings without hardcoding them into the container images.
- Containerization
- The practice of packaging applications and their dependencies into lightweight portable containers that can run consistently across different computing environments.
- Deployment (Kubernetes)
- A resource object that provides declarative updates to applications by managing the creation and scaling of a set of Pods.
- Director
- The AgileTV Delivery OTT router and related services.
- ESB
- A software bundle that can be separately installed and upgraded, and is released as one entity with one change log. Each ESB is identified with a number. Over time, features and functions within an ESB can change.
- Helm
- A package manager for Kubernetes that simplifies the development and management of applications by using pre-configured templates called charts. It enables users to define, install, and upgrade complex applications on Kubernetes.
- Ingress
- A Kubernetes resource that manages external access to services within a cluster, typically HTTP. It provides routing rules to manage traffic to various services based on hostnames and paths.
- K3s
- A lightweight Kubernetes distribution developed by Rancher Labs. It is a complete Kubernetes system deployed as a single portable binary.
- K8s
- A common abbreviation for Kubernetes.
- Kafka
- Apache Kafka is an open-source distributed event streaming platform designed for building real-time data pipelines and streaming applications. It enables the publication, subscription, storage, and processing of streams of records in a fault-tolerant and scalable manner.
- Kubectl
- The command-line tool for interacting with Kubernetes clusters, allowing users to deploy applications, manage cluster resources, and inspect logs or configurations.
- Kubernetes
- An open-source container orchestration platform designed to automate the deployment, scaling, and management of containerized applications. It enables developers and operations teams to manage complex applications consistently across various environments.
- LoadBalancer
- A networking tool that distributes network traffic across multiple servers or Pods to ensure no single server becomes overwhelmed, improving reliability and performance.
- Manager
- The AgileTV Management Software and related services.
- Namespace
- A mechanism for isolating resources within a Kubernetes cluster, allowing multiple teams or applications to coexist without conflict by providing a scope for names.
- OAuth2
- An open standard for authorization that allows third-party applications to gain limited access to a user’s resources on a server without exposing the user’s credentials.
- Pod
- The smallest deployable unit in Kubernetes that encapsulates one or more containers, sharing the same network and storage resources. It serves as a logical host for tightly coupled applications, allowing them to communicate and function effectively within a cluster.
- Router
- Unless otherwise specified, an HTTP router that manages an OTT session using HTTP redirects. DNS-based routing can be used as an alternative to HTTP.
- Secret (Kubernetes)
- A resource used to store sensitive information, such as passwords, API keys, or tokens, in a secure manner. Secrets are encoded in base64 and can be made available to Pods as environment variables or mounted as files, ensuring that sensitive data is not exposed in the application code or configuration files.
- Service (Kubernetes)
- An abstraction that defines a logical set of Pods and a policy to access them, enabling stable networking and load balancing to ensure reliable communication among application components.
- Session Token
- A temporary, unique identifier generated by a server and issued to a user upon successful authentication.
- Stateful Set (Kubernetes)
- A Kubernetes workload resource that guarantees ordering and uniqueness of Pods, typically used for applications that require stable network identities and persistent storage, such as databases.
- Topic (Kafka)
- A category or feed name to which records (messages) are published. Messages flow through a topic in the order in which they are produced, and multiple consumers can subscribe to the stream to process the records in real time.
- Volume (Kubernetes)
- A persistent storage resource in Kubernetes that allows data to be stored and preserved beyond the lifecycle of individual Pods, facilitating data sharing and durability.
- Zitadel
- An open-source identity and access management (IAM) platform designed to handle user authentication and authorization for applications. It provides features such as single sign-on (SSO), multi-factor authentication (MFA), and support for various authentication protocols.