Installation on DigitalOcean Cloud#

This article explains how to install AMD Enterprise AI Suite in a DigitalOcean cloud environment. It complements the full installation article by describing the DigitalOcean-specific installation configuration. For more details about the individual installation steps, refer to the full installation article.

Prerequisites#

Suggested minimum configuration for a DigitalOcean droplet:

  • AMD MI300X

  • 1 GPU - 192 GB VRAM - 20 vCPU - 240 GB RAM

  • Boot disk: 720 GB NVMe - Scratch disk: 5 TB NVMe
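
The suggested minimums can be compared against a running droplet with a short script. The following is a minimal sketch, not part of the official tooling: the function names are hypothetical, and the 220 GiB memory threshold is an assumed allowance for the difference between the advertised 240 GB and what the kernel reports as usable.

```shell
# Sketch: compare this machine against the suggested droplet minimums.
# Function names are hypothetical; thresholds are taken from the list above.

# Succeeds when the first integer is at least as large as the second.
at_least() {
  [ "$1" -ge "$2" ]
}

# Total RAM in GiB, read from the MemTotal line of /proc/meminfo (Linux only).
mem_total_gib() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Print a warning for each suggested minimum that is not met.
check_minimums() {
  local cpus mem
  cpus=$(nproc)
  mem=$(mem_total_gib)
  at_least "$cpus" 20 || echo "warning: only $cpus vCPUs (suggested: 20)"
  # 220 is an assumed threshold: usable memory is a little below 240 GB.
  at_least "$mem" 220 || echo "warning: only $mem GiB RAM (suggested: 240 GB)"
}
```

Running check_minimums on the droplet prints nothing when both values meet the suggestion.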

Software requirements:

  • Choose an image with ROCm™ Software pre-installed, e.g. ROCm version 6.4.0 or later
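
To confirm that an image satisfies the ROCm version requirement, the installed version can be compared against 6.4.0. This is a hedged sketch: it assumes that the image stores its ROCm version in /opt/rocm/.info/version, which may not hold for every image, and that GNU sort with the -V option is available. Both function names are hypothetical.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# Relies on the version-sort option (-V) of GNU sort.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the ROCm version found in the given file.
# The default path is an assumption about the image layout.
rocm_version() {
  local file="${1:-/opt/rocm/.info/version}"
  local v
  v=$(head -n 1 "$file") || return 1
  echo "${v%%-*}"   # drop a build suffix such as "-47"
}
```

For example, version_ge "$(rocm_version)" 6.4.0 && echo "ROCm version OK" reports success only when the detected version is 6.4.0 or later.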

Installation steps#

To install on a DigitalOcean droplet, create a bloom.yaml file and copy the following text into it, replacing <your-ip-address> with the IP address of the node.

DOMAIN: <your-ip-address>.nip.io
CERT_OPTION: generate
FIRST_NODE: true
GPU_NODE: true
CLUSTER_DISKS: /dev/vdc1
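
The DOMAIN value uses nip.io, a wildcard DNS service that resolves a hostname such as 203.0.113.10.nip.io to the IP address embedded in it. As a minimal sketch, the file can also be generated from the node's IP address instead of being edited by hand. The function names and the example address are hypothetical; the keys and values are the ones shown above.

```shell
# Print the nip.io hostname for an IPv4 address.
nip_domain() {
  echo "$1.nip.io"
}

# Write a bloom.yaml with the keys shown above.
# $1: IP address of the node, $2: output path (defaults to bloom.yaml).
write_bloom_yaml() {
  local ip="$1" out="${2:-bloom.yaml}"
  cat > "$out" <<EOF
DOMAIN: $(nip_domain "$ip")
CERT_OPTION: generate
FIRST_NODE: true
GPU_NODE: true
CLUSTER_DISKS: /dev/vdc1
EOF
}
```

Calling write_bloom_yaml 203.0.113.10 creates bloom.yaml in the current directory with DOMAIN set to 203.0.113.10.nip.io.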

Download the installation tool (“bloom”):

wget https://github.com/silogen/cluster-bloom/releases/latest/download/bloom

Make the file executable:

chmod +x bloom

Then start the installation:

sudo ./bloom cli bloom.yaml

Log out and log back in so that .bashrc is sourced, or run:

source ~/.bashrc
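
Once the shell environment has been reloaded, a quick sanity check is to list the cluster nodes. The sketch below assumes that the installation makes kubectl available on the PATH and configures it for the new cluster through .bashrc; the function name is hypothetical.

```shell
# List the cluster nodes, or explain why that is not possible.
# Assumes kubectl was set up by the installation (not confirmed here).
cluster_ready() {
  if ! command -v kubectl >/dev/null 2>&1; then
    echo "kubectl not found on PATH; has .bashrc been sourced?"
    return 1
  fi
  kubectl get nodes
}
```

A node whose STATUS column shows Ready indicates that the cluster is up.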

Manage dockercred to avoid rate limiting when pulling container images.

Install AMD Enterprise AI Suite without AI Resource Manager#

This section explains how to install AMD Enterprise AI Suite without AI Resource Manager. This installation requires running cluster-bloom and cluster-forge separately.

Create a bloom.yaml file and copy the following text into it, replacing <your-ip-address> with the IP address of the node.

DOMAIN: <your-ip-address>.nip.io
CERT_OPTION: generate
FIRST_NODE: true
GPU_NODE: true
CLUSTER_DISKS: /dev/vdc1
CLUSTERFORGE_RELEASE: "none"
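
This configuration differs from the standard one only in the CLUSTERFORGE_RELEASE line. If a bloom.yaml from the standard installation already exists, the line can be added in place. The following sketch uses a hypothetical function name and assumes GNU sed (for the -i option) and a file that ends with a newline.

```shell
# Ensure the file contains exactly one CLUSTERFORGE_RELEASE: "none" line.
# $1: path to the configuration file (defaults to bloom.yaml).
disable_clusterforge() {
  local file="${1:-bloom.yaml}"
  if grep -q '^CLUSTERFORGE_RELEASE:' "$file"; then
    # The key already exists: overwrite its value.
    sed -i 's/^CLUSTERFORGE_RELEASE:.*/CLUSTERFORGE_RELEASE: "none"/' "$file"
  else
    # The key is missing: append it.
    echo 'CLUSTERFORGE_RELEASE: "none"' >> "$file"
  fi
}
```

The function is idempotent, so running it more than once leaves a single CLUSTERFORGE_RELEASE entry.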

Repeat the commands described in the installation steps above:

wget https://github.com/silogen/cluster-bloom/releases/latest/download/bloom
chmod +x bloom
sudo ./bloom cli bloom.yaml
source ~/.bashrc

Manage dockercred to avoid rate limiting when pulling container images.

Create a clusterforge folder, download the Cluster-Forge Enterprise AI package, and extract it:

mkdir clusterforge
chmod 755 clusterforge
wget -O "./clusterforge/clusterforge.tar.gz" https://github.com/silogen/cluster-forge/releases/download/v2.0.2/release-enterprise-ai-v2.0.2.tar.gz
tar -xzf "./clusterforge/clusterforge.tar.gz" -C ./clusterforge --no-same-owner
cd clusterforge/cluster-forge
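
The final cd command relies on the archive unpacking into a top-level cluster-forge directory. If the download was interrupted or the layout differs, that step fails. As a minimal sketch, the archive contents can be inspected before extraction; the function name is hypothetical.

```shell
# Succeeds when the gzip-compressed tar archive $1 contains
# a top-level directory named $2.
archive_has_top_dir() {
  local archive="$1" dir="$2"
  # tar -tzf lists the entries without extracting them.
  tar -tzf "$archive" | grep -E "^(\./)?$dir/" >/dev/null
}
```

For example, archive_has_top_dir ./clusterforge/clusterforge.tar.gz cluster-forge returns a non-zero status if the expected directory is missing or the archive cannot be read.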

Run bootstrap.sh with the --aiwb-only option:

./scripts/bootstrap.sh <your-ip-address>.nip.io --aiwb-only

Add appDomain to values.yaml in Gitea (for --aiwb-only)#

Note

When installing AMD Enterprise AI Suite without AI Resource Manager, certain configuration values must be set manually, as described in this section. This step will be automated in an upcoming release.

Navigate to https://gitea.<your-domain> and sign in with the Gitea admin credentials. Retrieve the credentials by running the following command:

echo "username:silogen-admin" && kubectl get secret gitea-admin-credentials -n cf-gitea -o jsonpath='{.data.password}' | base64 -d && echo
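
The base64 -d step in this command is needed because Kubernetes stores the values in a Secret's data field base64-encoded. The sketch below isolates that decoding step; the function name and the sample value are hypothetical and unrelated to the real Gitea password.

```shell
# Decode a base64-encoded value, as found in a Secret's .data fields.
decode_secret_value() {
  printf '%s' "$1" | base64 -d
}

# Example with a made-up value (not a real credential):
decode_secret_value "aHVudGVyMg=="   # prints: hunter2
echo
```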

After signing in, open the values.yaml file in the cluster-values repository: https://gitea.<your-domain>/cluster-org/cluster-values/src/branch/main/values.yaml

Select Edit. Under the existing apps.aiwb.helmParameters, add the following entry for appDomain:

  aiwb:
    helmParameters:
      # STANDALONE-MODE
      - name: standAloneMode
        value: "true"
      - name: appDomain
        value: <your-domain>

Select Commit Changes at the bottom of the page. The change will be synchronized to the cluster within a few minutes.