Installation Guide
SELinux Requirements
SELinux is fully supported, provided it is enabled and set to “Enforcing” mode on all nodes at the time of the initial cluster installation. This is the default configuration for Red Hat Enterprise Linux and its derivatives, such as Oracle Linux and AlmaLinux. If the mode is set to “Enforcing” prior to install time, the necessary SELinux packages will be installed, and the cluster will be started with support for SELinux. For this reason, enabling SELinux after the initial cluster installation is not supported.
Firewalld Requirements
Please see the Networking Guide for the current firewall recommendations.
Hardware Requirements
Refer to the System Requirements Guide for the current Hardware, Operating System, and Network Requirements.
Networking Requirements
A minimum of one Network Interface Card must be present on the node when the cluster is installed, and the node must have a default route. If no interface carries the default route, one must be configured; even a black-hole route via a dummy interface will suffice. The K3s software requires a default route in order to auto-detect the node’s primary IP and for cluster routing to function properly. To add a black-hole route via a dummy interface, run the following:
ip link add dummy0 type dummy
ip link set dummy0 up
ip addr add 203.0.113.254/31 dev dummy0
ip route add default via 203.0.113.255 dev dummy0 metric 1000
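To confirm that a default route is now present (whether via a real gateway or the dummy interface above), a quick check along the following lines can be used. This is a sketch assuming the iproute2 `ip` tool is available:

```shell
# Sketch: verify that the node has a default route before installing.
# Assumes the iproute2 "ip" tool; any default route satisfies K3s.
if ip route show default 2>/dev/null | grep -q '^default'; then
  echo "default route present"
else
  echo "no default route detected - configure one before installing"
fi
```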
Special Considerations when using Multiple Network Interfaces
If there are special network considerations, such as using a non-default interface for cluster communication, these must be configured using the INSTALL_K3S_EXEC environment variable, as shown below, before installing the cluster or joining nodes.
As an example, consider the case where the node contains two interfaces, bond0 and bond1, where the
default route exists through bond0, but where bond1 should be used for cluster communication. In
that case, ensure that the INSTALL_K3S_EXEC environment variable is set as follows in the environment
prior to installing or joining the cluster. Assuming that bond1 has the local IP address 10.0.0.10:
export INSTALL_K3S_EXEC="<MODE> --node-ip 10.0.0.10 --flannel-iface=bond1"
Where MODE is either server or agent, depending on the role of the node. The initial node used to create the cluster MUST be server; additional nodes use the value matching their role.
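As a sketch, the full export for the example above (a server node whose bond1 address is 10.0.0.10) could be assembled as follows; ROLE and NODE_IP are illustrative placeholders for this example, not variables read by the installer:

```shell
# Sketch: build INSTALL_K3S_EXEC for a node using bond1 for cluster traffic.
ROLE="server"        # "server" on the initial node, "agent" on agent nodes
NODE_IP="10.0.0.10"  # local IP on the cluster-facing interface (bond1)
export INSTALL_K3S_EXEC="${ROLE} --node-ip ${NODE_IP} --flannel-iface=bond1"
echo "$INSTALL_K3S_EXEC"   # server --node-ip 10.0.0.10 --flannel-iface=bond1
```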
Air-Gapped Environments
In air-gapped environments—those without direct Internet access—additional considerations are
required. First, on each node, the Operating System’s ISO must be mounted so that dnf can be
used to install essential packages included with the OS. Second, the “Extras” ISO from the
ESB3027 AgileTV CDN Manager must be mounted to provide access to container images for
third-party software that would otherwise be downloaded from public repositories. Details on
mounting this ISO and loading the included images are provided below.
Introduction
Installing the ESB3027 AgileTV CDN Manager for production requires a minimum of three nodes. More details about node roles and sizing can be found in the System Requirements Guide. Before beginning the installation, select one node as the primary “Server” node. This node will serve as the main installation point. Once additional Server nodes join the cluster, all Server nodes are considered equivalent, and cluster operations can be managed from any of them. The typical process involves installing the primary node as a Server, then adding more Server nodes to expand the cluster, followed by joining Agent nodes as needed to increase capacity.
Roles
All nodes in the cluster have one of two roles. Server nodes run the control-plane software necessary to manage the cluster and provide redundancy. Agent nodes do not run the control-plane software; instead, they are responsible for running the Pods that make up the applications. Jobs are distributed among agent nodes to enable horizontal scalability of workloads. However, agent nodes do not contribute to the cluster’s high availability. If an agent node fails, the Pods assigned to that node are automatically moved to another node, provided sufficient resources are available.
Control-plane only Server nodes
Both server nodes and agent nodes run workloads within the cluster. However, a special attribute called the “CriticalAddonsOnly” taint can be applied to server nodes. This taint prevents the node from scheduling workloads that are not part of the control plane. If the hardware allows, it is recommended to apply this taint to server nodes to separate their responsibilities. Doing so helps prevent misbehaving applications from negatively impacting the overall health of the cluster.
graph TD
subgraph Cluster
direction TB
ServerNodes[Server Nodes]
AgentNodes[Agent Nodes]
end
ServerNodes -->|Manage cluster and control plane| ControlPlane
ServerNodes -->|Provide redundancy| Redundancy
AgentNodes -->|Run application Pods| Pods
Pods -->|Handle workload distribution| Workloads
AgentNodes -->|Failover: Pods move if node fails| Pods
ServerNodes -->|Can run Pods unless tainted with CriticalAddonsOnly| PodExecution
Taint[CriticalAddonsOnly Taint] -->|Applied to server nodes to restrict workload| ServerNodes

For high availability, at least three nodes running the control plane are required, along with at least three nodes running workloads. These can be a combination of server and agent roles, provided that the control-plane nodes are sufficient. If a server node has the “CriticalAddonsOnly” taint applied, an additional agent node must be deployed to ensure workloads can run. For example, the cluster could consist of three untainted server nodes, or two untainted servers, one tainted server, and one agent, or three tainted servers and three agents, all while maintaining at least three control-plane nodes and three workload nodes.
The “CriticalAddonsOnly” taint can be applied to server nodes at any time after cluster installation. However, it only affects Pods scheduled in the future. Existing Pods that have already been assigned to a server node will remain there until they are recreated or rescheduled due to an external event.
kubectl taint nodes <node-name> CriticalAddonsOnly=true:NoSchedule
Where node-name is the hostname of the node to which the taint should be applied. Multiple node names may be specified in the same command. This command should only be run from one of the server nodes.
Installing the Primary Server Node
Mount the ESB3027 ISO
Start by mounting the core ESB3027 ISO on the system. There are no limitations on the exact
mountpoint used, but for this document, we will assume /mnt/esb3027.
mkdir -p /mnt/esb3027
mount -o loop,ro esb3027-acd-manager-X.Y.Z.iso /mnt/esb3027
Run the installer
Run the install command to install the base cluster software.
/mnt/esb3027/install
(Air-gapped only) Mount the “Extras” ISO and Load Container Images
In an air-gapped environment, after running the installer, the “extras” ISO must be mounted. This image contains publicly available container images that would otherwise simply be downloaded from their source repositories.
mkdir -p /mnt/esb3027-extras
mount -o loop,ro esb3027-acd-manager-extras-X.Y.Z.iso /mnt/esb3027-extras
The public container images for third-party products such as Kafka, Redis, Zitadel, etc., need to be loaded into the container runtime. An embedded registry mirror is used to distribute these images to other nodes within the cluster, so this only needs to be performed on one machine.
/mnt/esb3027-extras/load-images
Fetch the primary node token
In order to join additional nodes into the cluster, a unique node token must be provided. This token is automatically generated on the primary node during the installation process. Retrieve this now, and take note of it for later use.
cat /var/lib/rancher/k3s/server/node-token
Join Additional Server Nodes
From each additional server node, mount the core ISO and join the cluster using the following commands.
mkdir -p /mnt/esb3027
mount esb3027-acd-manager-X.Y.Z.iso /mnt/esb3027
Obtain the node token from the primary server, as you will need to include it in the following command. You will also need the URL of the primary server to which this node should connect.
/mnt/esb3027/join-server https://primary-server-ip:6443 abcdefg0123456...987654321
Where primary-server-ip is replaced with the IP address of the primary server to which this node should connect, and abcdef...321 is the contents of the node-token retrieved from the primary server.
Repeat the above steps on each additional Server node in the cluster.
Join Agent Nodes
From each additional agent node, mount the core ISO and join the cluster using the following commands.
mkdir -p /mnt/esb3027
mount esb3027-acd-manager-X.Y.Z.iso /mnt/esb3027
Obtain the node token from the primary server, as you will need to include it in the following command. You will also need the URL of the primary server to which this node should connect.
/mnt/esb3027/join-agent https://primary-server-ip:6443 abcdefg0123456...987654321
Where primary-server-ip is replaced with the IP address of the primary server to which this node should connect, and abcdef...321 is the contents of the node-token retrieved from the primary server.
Repeat the above steps on each additional Agent node in the cluster.
Verify the state of the cluster
At this point, a generic Kubernetes cluster should have multiple nodes connected and be marked Ready. Verify this is the case by running the following from any one of the Server nodes.
kubectl get nodes
Each node in the cluster should be listed in the output with the status “Ready”, and the Server nodes should have “control-plane” in the listed Roles. If this is not the case, see the Troubleshooting Guide to help diagnose the problem.
Deploy the cluster helm chart
The acd-cluster helm chart, which is included on the core ISO, contains the clustering software which
is required for self-hosted clusters, but may be optional in Cloud deployments. Currently this consists
of a PostgreSQL database server, but additional components may be added in later releases.
helm install --wait --timeout 10m acd-cluster /mnt/esb3027/helm/charts/acd-cluster
Deploying the Manager chart
The acd-manager helm chart is used to deploy the acd-manager application as well as any of the
third-party services on which the chart depends. Installing this chart requires at least a minimal
configuration to be applied. To get started, either copy the default values.yaml file from the chart
directory /mnt/esb3027/helm/charts/acd-manager/values.yaml or copy the following minimal template to a
writable location such as the user’s home directory.
global:
hosts:
manager:
- host: manager.local
routers:
- name: director-1
address: 192.0.2.1
- name: director-2
address: 192.0.2.2
zitadel:
zitadel:
configmapConfig:
ExternalDomain: manager.local
Where:
- manager.local is either the external IP or resolvable DNS name used to access the manager’s cluster.
- All director instances should be listed in the global.hosts.routers section. The name field is used in URLs, and must consist of only alphanumeric characters or ‘.’, ‘-’, or ‘_’.
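As an illustrative check of the character restriction on director names, a small shell function can validate candidate names before they go into values.yaml; valid_name is a hypothetical helper for this example, not part of the product:

```shell
# Sketch: check that a director name matches the allowed character set
# (alphanumerics, '.', '-', '_') used in global.hosts.routers[].name.
valid_name() {
  printf '%s' "$1" | grep -Eq '^[A-Za-z0-9._-]+$'
}
valid_name "director-1" && echo "director-1: ok"
valid_name "director 1" || echo "director 1: rejected"
```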
Further details on the available configuration options in the default values.yaml file can be found in
the Configuration Guide.
You must set at a minimum the following properties:
| Property | Type | Description |
|---|---|---|
| global.hosts.manager | Array | List of external IP addresses or DNS hostnames for each node in the cluster |
| global.hosts.routers | Array | List of name and address for each instance of ESB3024 AgileTV CDN Director |
| zitadel.zitadel.configmapConfig.ExternalDomain | String | External DNS domain name or IP address of one manager node. This must match the first entry from global.hosts.manager |
Note! The Zitadel ExternalDomain must match the hostname or IP address given in the first
global.hosts.manager entry, and MUST match the Origin used when accessing Zitadel. This is enforced by
CORS.
Hint: For non-air-gapped environments where no DNS servers are present, the third-party service sslip.io may be used to provide a resolvable DNS name which can be used for both the global.hosts.manager and Zitadel ExternalDomain entries. Any IP address passed as W.X.Y.Z.sslip.io will resolve to the IP W.X.Y.Z.
Only the value used for Zitadel’s ExternalDomain may be used to access Zitadel due to CORS
restrictions. E.g. if that is set to “10.10.10.10.sslip.io”, then Zitadel must be accessed via the URL
https://10.10.10.10.sslip.io/ui/console. This must match the first entry in global.hosts.manager as
that entry will be used by internal services that need to interact with Zitadel, such as the frontend
GUI and the manager API services.
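As a sketch of the sslip.io hint, the hostname for a node can be derived directly from its IP address; NODE_IP below is an example placeholder:

```shell
# Sketch: derive an sslip.io hostname from a node IP. The result can be
# used for both global.hosts.manager and Zitadel's ExternalDomain.
NODE_IP="10.10.10.10"               # substitute the manager node's IP
EXTERNAL_DOMAIN="${NODE_IP}.sslip.io"
echo "$EXTERNAL_DOMAIN"             # 10.10.10.10.sslip.io
```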
Importing TLS Certificates
By default, the manager will generate a self-signed TLS certificate for use with the cluster ingress.
In production environments, it is recommended to use a valid TLS certificate issued by a trusted Certificate Authority (CA).
To install the TLS certificate pair into the ingress controller, the certificate and key must be saved in a Kubernetes secret. The simplest way of doing this is to let Helm generate the secret by including the PEM formatted certificate and private key directly in the configuration values. Alternatively, the secret can be created manually and simply referenced by the configuration.
Option 1: Let Helm manage the secret
To have Helm automatically manage the secret based on the PEM formatted certificate and key, add a record
to ingress.secrets as described in the following snippet.
ingress:
secrets:
- name: <secret-name>
key: |-
-----BEGIN RSA PRIVATE KEY-----
...
-----END RSA PRIVATE KEY-----
certificate: |-
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
Option 2: Manually creating the secret
To manually create the secret in Kubernetes, execute the following command, which will create a secret named “secret-name”:
kubectl create secret tls secret-name --cert=tls.crt --key=tls.key
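For trying out this flow in a lab, a throwaway self-signed pair can be generated with openssl before creating the secret. This is a sketch for testing only; production clusters should use a CA-issued certificate as noted above, and the CN value here is an example:

```shell
# Sketch: generate a throwaway self-signed certificate/key pair for lab use.
# Production deployments should use a certificate from a trusted CA.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout tls.key -out tls.crt \
  -subj "/CN=manager.local"     # CN should match the ingress hostname
ls tls.crt tls.key
```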
Configure the Ingress
The ingress controllers must be configured with the name of the secret holding the certificate and key files. Additionally, the DNS hostname or IP address covered by the certificate, which must be used to access the ingress, must be set in the configuration.
ingress:
hostname: <dns-hostname>
tls: true
secretName: <secret-name>
zitadel:
ingress:
tls:
- hosts:
- <dns-hostname>
secretName: <secret-name>
confd:
ingress:
hostname: <dns-hostname>
tls: true
secretName: <secret-name>
mib-frontend:
ingress:
hostname: <dns-hostname>
tls: true
secretName: <secret-name>
- dns-hostname - A valid DNS hostname for the cluster which is valid for the certificate. For compatibility with Zitadel and CORS restrictions, this MUST be the same DNS hostname listed as the first entry in global.hosts.manager.
- secret-name - An arbitrary name used to identify the Kubernetes secret containing the TLS certificate and key. This has a maximum length limitation of 53 characters.
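Because of the 53-character limit on the secret name, it can be worth checking a candidate name before creating the secret; SECRET_NAME below is an example value:

```shell
# Sketch: verify a candidate secret name fits the 53-character limit.
SECRET_NAME="acd-manager-tls"    # example name, substitute your own
if [ "${#SECRET_NAME}" -le 53 ]; then
  echo "ok: ${#SECRET_NAME} characters"
else
  echo "too long: ${#SECRET_NAME} characters"
fi
```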
Loading Maxmind GeoIP databases
The Maxmind GeoIP databases are required if GeoIP lookups are to be performed by the manager. If this functionality is used, then Maxmind formatted GeoIP databases must be configured. The following databases are used by the manager.
- GeoIP2-City.mmdb - The City database.
- GeoLite2-ASN.mmdb - The ASN database.
- GeoIP2-Anonymous-IP.mmdb - The VPN and Anonymous IP database.
A helper utility called generate-maxmind-volume has been provided on the ISO. It will prompt the user for the locations of these three database files and for the name of a volume, which will be created in Kubernetes. After running this command, set the manager.maxmindDbVolume property in the configuration to the volume name.
To run the utility, use:
/mnt/esb3027/generate-maxmind-volume
Installing the Chart
Install the acd-manager helm chart using the following command: (This assumes the configuration is in
~/values.yaml)
helm install acd-manager /mnt/esb3027/helm/charts/acd-manager --values ~/values.yaml --timeout 10m
By default, there is not expected to be much output from the helm install command itself. If you would
like to see more detailed information in real-time throughout the deployment process, you can add the
--debug flag to the command:
helm install acd-manager /mnt/esb3027/helm/charts/acd-manager --values ~/values.yaml --timeout 10m --debug
Note: The --timeout 10m flag increases the default Helm timeout from 5 minutes to 10 minutes. This is recommended because the default may not be sufficient on slower hardware or in resource-constrained environments. You may need to adjust the timeout value further depending on your system’s performance or deployment conditions.
Monitor the chart rollout with the following command:
kubectl get pods
The output of which should look similar to the following:
NAME READY STATUS RESTARTS AGE
acd-cluster-postgresql-0 1/1 Running 0 44h
acd-manager-6c85ddd747-5j5gt 1/1 Running 0 43h
acd-manager-confd-558f49ffb5-n8dmr 1/1 Running 0 43h
acd-manager-gateway-7594479477-z4bbr 1/1 Running 0 43h
acd-manager-grafana-78c76d8c5-c2tl6 1/1 Running 0 43h
acd-manager-kafka-controller-0 2/2 Running 0 43h
acd-manager-kafka-controller-1 2/2 Running 0 43h
acd-manager-kafka-controller-2 2/2 Running 0 43h
acd-manager-metrics-aggregator-f6ff99654-tjbfs 1/1 Running 0 43h
acd-manager-mib-frontend-67678c69df-tkklr 1/1 Running 0 43h
acd-manager-prometheus-alertmanager-0 1/1 Running 0 43h
acd-manager-prometheus-server-768f5d5c-q78xb 1/1 Running 0 43h
acd-manager-redis-master-0 2/2 Running 0 43h
acd-manager-redis-replicas-0 2/2 Running 0 43h
acd-manager-selection-input-844599bc4d-x7dct 1/1 Running 0 43h
acd-manager-telegraf-585dfc5ff8-n8m5c 1/1 Running 0 43h
acd-manager-victoria-metrics-single-server-0 1/1 Running 0 43h
acd-manager-zitadel-69b6546f8f-v9lkp 1/1 Running 0 43h
acd-manager-zitadel-69b6546f8f-wwcmx 1/1 Running 0 43h
acd-manager-zitadel-init-hnr5p 0/1 Completed 0 43h
acd-manager-zitadel-setup-kjnwh 0/2 Completed 0 43h
The output contains a “READY” column, which shows the number of ready containers on the left and the number of requested containers on the right. Pods with status “Completed” are one-time jobs that have terminated successfully and can be ignored in this output. For “Running” pods, the rollout is complete once every pod shows the same number on both sides of the “READY” column.
If a Pod is marked as “CrashLoopBackoff” or “Error” this means that either one of the containers in the pod has failed to deploy, or that the container has terminated in an Error state. See the Troubleshooting Guide to help diagnose the problem. The Kubernetes cluster will retry failed pod deployments several times, and the number in the “RESTARTS” column will show the number of times that has happened. If a pod restarts during the initial rollout, this may simply be that the state of the cluster was not as expected by the pod at that time, and this can be safely ignored. After the initial rollout has completed, the pods should stabilize, and multiple restarts may be an indication that something is wrong. In that case, refer to the Troubleshooting Guide for more information.
Next Steps
For post-installation steps, see the Post Install Guide.