Docker deployment#
This guide provides step-by-step instructions for deploying the AMD Inference Microservice (AIM) container that serves variants of the Llama-3.1-8B-Instruct model. Follow these instructions to quickly get a model running on AMD GPUs. For more detailed information, refer to the development documentation.
Prerequisites#
AMD GPU with ROCm support (e.g., MI300X, MI325X)
Docker installed and configured with GPU support
Access to model repositories (Hugging Face account with appropriate permissions for gated models)
1. Docker deployment#
1.1 Running the container#
docker run \
-e HF_TOKEN=<YOUR_HUGGINGFACE_TOKEN> \
--device=/dev/kfd --device=/dev/dri \
-p 8000:8000 \
amdenterpriseai/aim-meta-llama-llama-3-1-8b-instruct:0.8.4
Here, <YOUR_HUGGINGFACE_TOKEN> is your Hugging Face access token, which is required for gated models.
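Once the server reports it is ready, you can send a quick request to verify the deployment. The following is a minimal smoke test that assumes AIM exposes an OpenAI-compatible chat completions endpoint on the published port; adjust the model name if your variant differs:
# Smoke test; assumes an OpenAI-compatible API on port 8000
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.1-8B-Instruct", "messages": [{"role": "user", "content": "Hello!"}]}'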
1.2 Customizing deployment with environment variables#
Customize your deployment with optional environment variables:
docker run \
-e AIM_PRECISION=fp16 \
-e AIM_GPU_COUNT=1 \
-e AIM_METRIC=throughput \
-e AIM_PORT=8080 \
--device=/dev/kfd --device=/dev/dri \
-p 8080:8080 \
amdenterpriseai/aim-meta-llama-llama-3-1-8b-instruct:0.8.4
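In this example, AIM_PRECISION selects the model precision, AIM_GPU_COUNT the number of GPUs to use, AIM_METRIC the optimization target, and AIM_PORT the port the server listens on inside the container; note that the -p mapping must match AIM_PORT. A quick check against the remapped port (again assuming an OpenAI-compatible API) confirms the new settings took effect:
# Assumes the server exposes the standard /v1/models listing
curl http://localhost:8080/v1/models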
2. Model caching for production#
For production environments, pre-download models to a persistent cache:
2.1 Download model to cache#
# Create persistent cache directory
mkdir -p /path/to/model-cache
# Download model using the download-to-cache command
docker run --rm \
-e HF_TOKEN=<YOUR_HUGGINGFACE_TOKEN> \
-v /path/to/model-cache:/workspace/model-cache \
amdenterpriseai/aim-meta-llama-llama-3-1-8b-instruct:0.8.4 \
download-to-cache --model-id meta-llama/Llama-3.1-8B-Instruct
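When the command completes, the model weights should be present under the cache directory. A simple listing confirms the download (the exact directory layout may vary between AIM versions):
ls -lh /path/to/model-cache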
2.2 Run with pre-cached model#
docker run \
-e HF_TOKEN=<YOUR_HUGGINGFACE_TOKEN> \
-v /path/to/model-cache:/workspace/model-cache \
--device=/dev/kfd --device=/dev/dri \
-p 8000:8000 \
amdenterpriseai/aim-meta-llama-llama-3-1-8b-instruct:0.8.4
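Because the weights are already present in the mounted cache, startup skips the download step, which significantly reduces container start time. For production use, you will typically also want the container to run detached with a name and a restart policy. The sketch below uses only standard Docker flags; the name aim-llama is an arbitrary choice:
# Detached production-style run; aim-llama is a hypothetical container name
docker run -d \
  --name aim-llama \
  --restart unless-stopped \
  -e HF_TOKEN=<YOUR_HUGGINGFACE_TOKEN> \
  -v /path/to/model-cache:/workspace/model-cache \
  --device=/dev/kfd --device=/dev/dri \
  -p 8000:8000 \
  amdenterpriseai/aim-meta-llama-llama-3-1-8b-instruct:0.8.4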
3. Monitoring and troubleshooting#
3.1 Getting help on the commands#
A general help command is available as follows:
docker run \
amdenterpriseai/aim-meta-llama-llama-3-1-8b-instruct:0.8.4 \
--help
A help command for specific subcommands is also available:
docker run \
amdenterpriseai/aim-meta-llama-llama-3-1-8b-instruct:0.8.4 \
<subcommand> --help
3.2 Enabling detailed logging#
docker run \
-e AIM_LOG_LEVEL=DEBUG \
-e HF_TOKEN=<YOUR_HUGGINGFACE_TOKEN> \
--device=/dev/kfd --device=/dev/dri \
-p 8000:8000 \
amdenterpriseai/aim-meta-llama-llama-3-1-8b-instruct:0.8.4
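If the container runs detached (for example, started with --name aim-llama as in the sketch above), the debug output can be followed with standard Docker tooling:
docker logs -f aim-llama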
3.3 Checking profile selection results#
You can check which profile AIM selects for a given set of environment variables by using the dry-run subcommand:
docker run \
-e AIM_GPU_COUNT=1 \
-e AIM_PRECISION=fp16 \
-e AIM_GPU_MODEL=MI300X \
-e HF_TOKEN=<YOUR_HUGGINGFACE_TOKEN> \
amdenterpriseai/aim-meta-llama-llama-3-1-8b-instruct:0.8.4 \
dry-run
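Because dry-run only reports the profile selection rather than starting the server, the example omits the --device and -p flags: it should need neither GPU access nor a published port.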
3.4 Listing available profiles#
docker run \
amdenterpriseai/aim-meta-llama-llama-3-1-8b-instruct:0.8.4 \
list-profiles
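The reported profiles should correspond to the combinations of GPU model, GPU count, precision, and metric that the container ships configurations for; once you have picked one, set the matching environment variables (AIM_GPU_MODEL, AIM_GPU_COUNT, AIM_PRECISION, AIM_METRIC) as shown in the earlier examples.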