Installation on DigitalOcean Cloud
This article explains how to install the AMD enterprise AI reference stack in the DigitalOcean cloud environment. It complements the full installation article by describing the DigitalOcean-specific installation configuration. For more details about the individual installation steps, refer to the full installation article.
Prerequisites
Suggested minimum configuration for a DigitalOcean droplet:
AMD MI300X
1 GPU - 192 GB VRAM - 20 vCPU - 240 GB RAM
Boot disk: 720 GB NVMe
Scratch disk: 5 TB NVMe
Software requirements:
Choose an image with ROCm™ Software pre-installed (not the ROCm-enabled GPT_OSS image), e.g. ROCm version >= 7.1. Note that ROCm 7.2 is currently not supported.
Installation steps
To install on a DigitalOcean droplet, create a bloom.yaml file and copy the following text into it, replacing <your-ip-address> with the IP address of the node.
DOMAIN: <your-ip-address>.nip.io
CERT_OPTION: generate
FIRST_NODE: true
GPU_NODE: true
CLUSTER_DISKS: /dev/vdc1
During installation, a large number of container images may be pulled from Docker Hub in a short period; unauthenticated pulls can hit rate limits and cause ImagePullBackOff on some pods. Add DOCKERHUB_USER and DOCKERHUB_TOKEN to bloom.yaml before running bloom (see Container Registry Configuration).
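The configuration above can also be generated from the node's IP address. The following is a minimal sketch, not part of the bloom tooling: write_bloom_yaml is a hypothetical helper, and it assumes that DOCKERHUB_USER and DOCKERHUB_TOKEN are exported in the shell and that bloom accepts them as plain unquoted values.

```shell
# Hypothetical helper: writes a bloom.yaml for the given node IP address.
# Usage: write_bloom_yaml <ip-address> [output-file]
write_bloom_yaml() {
  ip="$1"
  out="${2:-bloom.yaml}"
  # Same keys and values as the configuration shown above.
  cat > "$out" <<EOF
DOMAIN: ${ip}.nip.io
CERT_OPTION: generate
FIRST_NODE: true
GPU_NODE: true
CLUSTER_DISKS: /dev/vdc1
EOF
  # Append Docker Hub credentials only when both variables are set
  # (assumption: they are exported in the calling shell).
  if [ -n "${DOCKERHUB_USER:-}" ] && [ -n "${DOCKERHUB_TOKEN:-}" ]; then
    printf 'DOCKERHUB_USER: %s\nDOCKERHUB_TOKEN: %s\n' \
      "$DOCKERHUB_USER" "$DOCKERHUB_TOKEN" >> "$out"
  fi
}
```

For example, write_bloom_yaml 203.0.113.10 writes bloom.yaml in the current directory (203.0.113.10 is a documentation address, not a real node). A token containing characters that are special in YAML may need quoting.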
Download the installation tool (“bloom”):
wget https://github.com/silogen/cluster-bloom/releases/latest/download/bloom
Make the file executable:
chmod +x bloom
Then start the installation:
sudo ./bloom cli bloom.yaml
Exit and log in again to source the .bashrc, or run:
source ~/.bashrc
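Once the installation has finished, it may be worth checking whether any pods ran into the Docker Hub rate limit mentioned above. The sketch below is an illustration only: list_image_pull_backoff is a hypothetical helper that filters the default tabular output of kubectl get pods -A, where STATUS is the fourth column, and it assumes kubectl is available and configured in the new shell session.

```shell
# Hypothetical helper: reads `kubectl get pods -A` output on stdin and
# prints namespace/name for every pod whose STATUS contains ImagePullBackOff.
list_image_pull_backoff() {
  # NR > 1 skips the header row; columns are NAMESPACE NAME READY STATUS ...
  awk 'NR > 1 && $4 ~ /ImagePullBackOff/ { print $1 "/" $2 }'
}
```

A typical invocation would be kubectl get pods -A | list_image_pull_backoff. Empty output means no pod is currently in that state.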
Installation without AI Resource Manager
This section explains how to install the reference stack without AI Resource Manager. This installation requires running cluster-bloom and cluster-forge separately.
Create the bloom.yaml file and copy the following text into it, replacing <your-ip-address> with the IP address of the node.
DOMAIN: <your-ip-address>.nip.io
CERT_OPTION: generate
FIRST_NODE: true
GPU_NODE: true
CLUSTER_DISKS: /dev/vdc1
CLUSTERFORGE_RELEASE: "none"
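The only difference from the configuration in the previous section is the CLUSTERFORGE_RELEASE key. If a bloom.yaml already exists, the key can be appended instead of recreating the file. This is a sketch under that assumption; disable_clusterforge is a hypothetical helper and not part of bloom.

```shell
# Hypothetical helper: appends CLUSTERFORGE_RELEASE: "none" to a bloom.yaml
# unless the key is already present. An existing value is left untouched.
disable_clusterforge() {
  file="${1:-bloom.yaml}"
  if ! grep -q '^CLUSTERFORGE_RELEASE:' "$file"; then
    # Add a newline first if the file does not end with one, so the new
    # key does not get glued onto the last existing line.
    if [ -n "$(tail -c 1 "$file")" ]; then
      printf '\n' >> "$file"
    fi
    printf 'CLUSTERFORGE_RELEASE: "none"\n' >> "$file"
  fi
}
```

Calling the helper twice leaves a single CLUSTERFORGE_RELEASE line in the file.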
Add DOCKERHUB_USER and DOCKERHUB_TOKEN to bloom.yaml to avoid rate limiting. See the full installation article for a longer discussion. Repeat the commands described in the installation steps above:
wget https://github.com/silogen/cluster-bloom/releases/latest/download/bloom
chmod +x bloom
sudo ./bloom cli bloom.yaml
source ~/.bashrc
Create a clusterforge folder, download the Cluster-Forge Enterprise AI package, and extract it:
mkdir clusterforge
chmod 755 clusterforge
wget -O "./clusterforge/clusterforge.tar.gz" https://github.com/silogen/cluster-forge/releases/download/v2.1.0/release-enterprise-ai-v2.1.0.tar.gz
tar -xzf "./clusterforge/clusterforge.tar.gz" -C ./clusterforge --no-same-owner
cd clusterforge/cluster-forge
Run bootstrap.sh:
./scripts/bootstrap.sh <your-ip-address>.nip.io --aiwb-only
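The bootstrap command above is run from clusterforge/cluster-forge and expects scripts/bootstrap.sh to exist there. If the command fails with "No such file or directory", a quick layout check can help. This is a minimal sketch, and has_bootstrap is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if scripts/bootstrap.sh exists under the
# given directory (defaults to the current directory).
has_bootstrap() {
  dir="${1:-.}"
  [ -f "$dir/scripts/bootstrap.sh" ]
}
```

For example, has_bootstrap || echo "run this from clusterforge/cluster-forge" prints the hint only when the script is not found.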
Add appDomain to values.yaml in Gitea (for --aiwb-only)
Note
When installing the reference stack without AI Resource Manager, certain configuration values must be set manually, as described in this section. This step will be automated in an upcoming release.
Navigate to https://gitea.<your-domain> and sign in with the Gitea admin credentials. Retrieve the credentials by running the following command:
echo "username:silogen-admin" && kubectl get secret gitea-admin-credentials -n cf-gitea -o jsonpath='{.data.password}' | base64 -d && echo
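The base64 -d step in the command above is needed because Kubernetes stores Secret data base64-encoded, so the jsonpath expression returns the encoded value. The round trip below illustrates this with a made-up sample value. It does not contact the cluster, and the real password comes only from the gitea-admin-credentials secret.

```shell
# Hypothetical sample value, used only to illustrate the encoding.
sample_password='example-password'

# Secret data is stored base64-encoded; this mimics what
# jsonpath='{.data.password}' would return.
encoded="$(printf '%s' "$sample_password" | base64)"

# Decoding restores the original value, as `base64 -d` does above.
decoded="$(printf '%s' "$encoded" | base64 -d)"

printf 'encoded: %s\ndecoded: %s\n' "$encoded" "$decoded"
```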
After signing in, open the values.yaml file in the cluster-values repository:
https://gitea.<your-domain>/cluster-org/cluster-values/src/branch/main/values.yaml
Select Edit. Under the existing apps.aiwb.helmParameters, add the following entry for appDomain:
aiwb:
  helmParameters:
    # STANDALONE-MODE
    - name: standAloneMode
      value: "true"
    - name: appDomain
      value: <your-domain>
Select Commit Changes at the bottom of the page. The change will be synchronized to the cluster within a few minutes.
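To double-check the edit, the appDomain entry can be read back from a local copy of values.yaml, for example from a clone of the cluster-values repository. The sketch below assumes such a copy exists and that the entry uses the name/value layout shown above. app_domain_value is a hypothetical helper based on simple line matching, not a full YAML parser.

```shell
# Hypothetical helper: prints the value of the appDomain helm parameter
# from a local values.yaml, or returns non-zero if the entry is missing.
app_domain_value() {
  awk '
    # A "- name: X" line starts a new parameter entry.
    $1 == "-" && $2 == "name:" { in_entry = ($3 == "appDomain"); next }
    # The first "value:" line inside the appDomain entry is the answer.
    in_entry && $1 == "value:" { print $2; found = 1; exit }
    END { exit found ? 0 : 1 }
  ' "$1"
}
```

For example, app_domain_value values.yaml prints the configured domain, and a non-zero exit status indicates that the entry has not been added.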