Docker deployment

This guide provides step-by-step instructions for deploying an AMD Inference Microservice (AIM) container that supports different variants of the Llama-3.1-8B-Instruct model. Follow these instructions to get a model running quickly on AMD GPUs. For more detailed information, refer to the development documentation.

Prerequisites

  • AMD GPU with ROCm support (e.g., MI300X, MI325X)

  • Docker installed and configured with GPU support (a quick device check follows this list)

  • Access to model repositories (a Hugging Face account with the appropriate permissions for gated models)
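
Before running the container, it is worth confirming that the GPU device nodes passed through in the examples below actually exist on the host. The rocm-smi check assumes the ROCm tools are installed on the host:

# Verify the device nodes used by the docker run examples in this guide
ls -l /dev/kfd /dev/dri

# Optionally list the visible GPUs (requires ROCm tools on the host)
rocm-smi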

1. Docker deployment

1.1 Running the container

docker run \
  -e HF_TOKEN=<YOUR_HUGGINGFACE_TOKEN> \
  --device=/dev/kfd --device=/dev/dri \
  -p 8000:8000 \
  amdenterpriseai/aim-meta-llama-llama-3-1-8b-instruct:0.8.4

Here, <YOUR_HUGGINGFACE_TOKEN> is your Hugging Face access token (required for gated models).
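
Once the container reports that it is ready, you can verify the service from the host. The checks below assume AIM exposes an OpenAI-compatible HTTP API on the published port; adjust the endpoint paths if your release differs:

# List the models served by the container
curl http://localhost:8000/v1/models

# Send a minimal chat completion request (the model name should match
# an entry from the /v1/models response)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'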

1.2 Customizing deployment with environment variables

Customize your deployment with optional environment variables such as the precision (AIM_PRECISION), GPU count (AIM_GPU_COUNT), optimization target (AIM_METRIC), and server port (AIM_PORT):

docker run \
  -e AIM_PRECISION=fp16 \
  -e AIM_GPU_COUNT=1 \
  -e AIM_METRIC=throughput \
  -e AIM_PORT=8080 \
  --device=/dev/kfd --device=/dev/dri \
  -p 8080:8080 \
  amdenterpriseai/aim-meta-llama-llama-3-1-8b-instruct:0.8.4
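
Note that AIM_PORT sets the port the server listens on inside the container, which is why this example maps -p 8080:8080. Assuming the same OpenAI-compatible endpoint as above, a quick check against the remapped port:

# The service now listens on 8080, matching the -p 8080:8080 mapping
curl http://localhost:8080/v1/models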

2. Model caching for production

For production environments, pre-download models to a persistent cache:

2.1 Download model to cache

# Create persistent cache directory
mkdir -p /path/to/model-cache

# Download model using the download-to-cache command
docker run --rm \
  -e HF_TOKEN=<YOUR_HUGGINGFACE_TOKEN> \
  -v /path/to/model-cache:/workspace/model-cache \
  amdenterpriseai/aim-meta-llama-llama-3-1-8b-instruct:0.8.4 \
  download-to-cache --model-id meta-llama/Llama-3.1-8B-Instruct
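
After the download completes, you can confirm that the weights landed in the persistent directory. The exact layout inside the cache is an implementation detail and may vary between releases:

# Check the size and contents of the populated cache
du -sh /path/to/model-cache
ls -R /path/to/model-cache | head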

2.2 Run with pre-cached model

docker run \
  -e HF_TOKEN=<YOUR_HUGGINGFACE_TOKEN> \
  -v /path/to/model-cache:/workspace/model-cache \
  --device=/dev/kfd --device=/dev/dri \
  -p 8000:8000 \
  amdenterpriseai/aim-meta-llama-llama-3-1-8b-instruct:0.8.4
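
For a long-running production deployment you will typically also want the container detached, named, and restarted automatically. These are standard Docker flags rather than AIM-specific options, and the container name below is just an example:

docker run -d \
  --name aim-llama-3-1-8b \
  --restart unless-stopped \
  -e HF_TOKEN=<YOUR_HUGGINGFACE_TOKEN> \
  -v /path/to/model-cache:/workspace/model-cache \
  --device=/dev/kfd --device=/dev/dri \
  -p 8000:8000 \
  amdenterpriseai/aim-meta-llama-llama-3-1-8b-instruct:0.8.4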

3. Monitoring and troubleshooting

3.1 Getting help on the commands

A general help command is available as follows:

docker run \
  amdenterpriseai/aim-meta-llama-llama-3-1-8b-instruct:0.8.4 \
  --help

A help command for specific subcommands is also available:

docker run \
  amdenterpriseai/aim-meta-llama-llama-3-1-8b-instruct:0.8.4 \
  <subcommand> --help
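
For example, to see the options accepted by the download-to-cache subcommand used in section 2.1:

docker run \
  amdenterpriseai/aim-meta-llama-llama-3-1-8b-instruct:0.8.4 \
  download-to-cache --help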

3.2 Enabling detailed logging

docker run \
  -e AIM_LOG_LEVEL=DEBUG \
  -e HF_TOKEN=<YOUR_HUGGINGFACE_TOKEN> \
  --device=/dev/kfd --device=/dev/dri \
  -p 8000:8000 \
  amdenterpriseai/aim-meta-llama-llama-3-1-8b-instruct:0.8.4
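
If you start the container detached with a --name (as in the production sketch in section 2.2), standard Docker logging can be used to follow or capture the debug output:

# Follow the container logs in real time
docker logs -f aim-llama-3-1-8b

# Save a snapshot of the logs, e.g. for a support ticket
docker logs aim-llama-3-1-8b > aim-debug.log 2>&1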

3.3 Checking profile selection results

You can check which profile AIM selects for the provided environment variables by running the dry-run subcommand:

docker run \
  -e AIM_GPU_COUNT=1 \
  -e AIM_PRECISION=fp16 \
  -e AIM_GPU_MODEL=MI300X \
  -e HF_TOKEN=<YOUR_HUGGINGFACE_TOKEN> \
  amdenterpriseai/aim-meta-llama-llama-3-1-8b-instruct:0.8.4 \
  dry-run

3.4 Listing available profiles

To list all profiles available in the container:

docker run \
  amdenterpriseai/aim-meta-llama-llama-3-1-8b-instruct:0.8.4 \
  list-profiles
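
Each listed profile typically corresponds to a combination of settings such as precision, GPU count, and GPU model. Once you have chosen one, set the matching environment variables from sections 1.2 and 3.3 and confirm the selection with dry-run before starting the server.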