Custom Profiles#

AMD Inference Microservice (AIM) supports custom profile configurations that extend beyond the built-in optimized and general profiles. Custom profiles enable users to define specialized configurations for unique hardware setups, model variants not supported by AIM, or specific performance requirements not covered by standard profiles.

Overview#

Custom profiles follow the same YAML structure as standard profiles but are placed in the /workspace/aim-runtime/profiles/custom/ directory inside the container. On the host, custom profiles can live in any directory, as long as that directory is mounted into the container at the path above. When AIM starts, it scans the custom profiles directory first, so custom profiles take precedence over both model-specific and general profiles.

Key Features:

  • Highest Search Precedence: Custom profiles are prioritized over model-specific and general profiles

  • Flexible Deployment: Mount custom profiles via volumes

  • Safe Experimentation: Test new configurations without building new AIM images

Custom profiles are ideal for performance tuning, hardware-specific optimizations, or deploying models that are not yet supported by AIM but are compatible with supported engines.

Creating Custom Profiles#

A profile is defined as a YAML file that adheres to the AIM profile schema. Refer to the existing profiles for examples.
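
For example, one possible on-host layout that matches the folder and file names used throughout this chapter looks like this; the custom-profiles/ directory is what gets mounted into the container in the examples that follow:

custom-profiles/
└── deepseek-ai/
    └── DeepSeek-R1-Distill-Qwen-32B/
        └── vllm-mi300x-fp16-tp1-latency-custom.yaml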

Using Custom Profiles#

Assume you have a custom profile YAML for the DeepSeek R1 Distill Qwen 32B model, named vllm-mi300x-fp16-tp1-latency-custom.yaml and placed in the folder deepseek-ai/DeepSeek-R1-Distill-Qwen-32B.

It contains the following:

aim_id: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
model_id: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
metadata:
  engine: vllm
  gpu: MI300X
  precision: fp16
  gpu_count: 1
  metric: latency
  manual_selection_only: false
  type: unoptimized
engine_args:
  gpu-memory-utilization: 0.95
  distributed-executor-backend: mp
  no-enable-chunked-prefill: null
  max-model-len: 32768
  dtype: float16
  tensor-parallel-size: 1
env_vars:
  VLLM_DO_NOT_TRACK: "1"
  VLLM_USE_TRITON_FLASH_ATTN: "0"
  HIP_FORCE_DEV_KERNARG: "1"
  NCCL_MIN_NCHANNELS: "112"
  TORCH_BLAS_PREFER_HIPBLASLT: "1"
  PYTORCH_TUNABLEOP_ENABLED: "1"
  PYTORCH_TUNABLEOP_VERBOSE: "1"
  PYTORCH_TUNABLEOP_TUNING: "0"

See the Profile Structure chapter in the development documentation for details on each field.

Usage with Docker#

To use a custom profile with Docker, mount the directory containing the profile to /workspace/aim-runtime/profiles/custom/ in the container. Place the profile in a directory of your choice; the examples below assume it lives under custom-profiles/ in the current working directory. All profiles, including custom ones, are validated against the AIM profile schema at runtime.
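
Before mounting, you can catch basic YAML syntax errors early with a quick local check (this assumes Python with PyYAML is available on the host; it only verifies that the file parses, not that it satisfies the AIM schema):

python3 -c 'import sys, yaml; yaml.safe_load(open(sys.argv[1])); print("OK")' \
  custom-profiles/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B/vllm-mi300x-fp16-tp1-latency-custom.yaml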

Running the base image with a custom profile#

docker run \
  -e AIM_MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
  -v $(pwd)/custom-profiles:/workspace/aim-runtime/profiles/custom \
  --device=/dev/kfd --device=/dev/dri \
  -p 8000:8000 \
  amdenterpriseai/aim-base:0.8
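
To confirm that the container sees the mounted profile, list the custom profiles directory inside it (docker ps -lq resolves to the most recently started container; adjust if you run several):

docker exec $(docker ps -lq) ls -R /workspace/aim-runtime/profiles/custom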

Usage with Kubernetes#

To use custom profiles in Kubernetes, you need to create a ConfigMap or volume containing your custom profiles and mount it to the /workspace/aim-runtime/profiles/custom/ path in the container.

Creating ConfigMap with Custom Profile#

First, create a ConfigMap containing your custom profile:

kubectl create configmap custom-profiles \
  --from-file=deepseek-ai/DeepSeek-R1-Distill-Qwen-32B/vllm-mi300x-fp16-tp1-latency-custom.yaml \
  -n YOUR_K8S_NAMESPACE
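
Note that kubectl keys the ConfigMap data by the file's base name, so when mounted the profile appears directly at /workspace/aim-runtime/profiles/custom/vllm-mi300x-fp16-tp1-latency-custom.yaml rather than under a model subdirectory. If you need to reproduce the deepseek-ai/DeepSeek-R1-Distill-Qwen-32B/ folder layout inside the container, one option is to map the key to a nested path via the items field of the configMap volume, as sketched here:

volumes:
  - name: custom-profiles
    configMap:
      name: custom-profiles
      items:
        - key: vllm-mi300x-fp16-tp1-latency-custom.yaml
          path: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B/vllm-mi300x-fp16-tp1-latency-custom.yaml

The deployment example below mounts the ConfigMap without an items mapping, i.e. with the flat layout.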

Example Deployment with Custom Profile#

Here’s an example Kubernetes deployment that uses a custom profile:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: aim-custom-profile-deployment
  labels:
    app: aim-custom-profile
spec:
  progressDeadlineSeconds: 3600
  replicas: 1
  selector:
    matchLabels:
      app: aim-custom-profile
  template:
    metadata:
      labels:
        app: aim-custom-profile
    spec:
      containers:
        - name: aim-custom-profile
          image: "amdenterpriseai/aim-base:0.8"
          imagePullPolicy: Always
          env:
            - name: AIM_MODEL_ID
              value: "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
            - name: HF_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hf-token
                  key: hf-token
          ports:
            - name: http
              containerPort: 8000
          resources:
            requests:
              memory: "32Gi"
              cpu: "4"
              amd.com/gpu: "1"
            limits:
              memory: "32Gi"
              cpu: "4"
              amd.com/gpu: "1"
          startupProbe:
            httpGet:
              path: /v1/models
              port: http
            periodSeconds: 10
            failureThreshold: 120
          livenessProbe:
            httpGet:
              path: /health
              port: http
          readinessProbe:
            httpGet:
              path: /v1/models
              port: http
          volumeMounts:
            - name: ephemeral-storage
              mountPath: /tmp
            - name: dshm
              mountPath: /dev/shm
            - name: custom-profiles
              mountPath: /workspace/aim-runtime/profiles/custom
              readOnly: true
      volumes:
        - name: ephemeral-storage
          emptyDir:
            sizeLimit: 512Gi
        - name: dshm
          emptyDir:
            medium: Memory
            sizeLimit: 64Gi
        - name: custom-profiles
          configMap:
            name: custom-profiles

Example Service#

apiVersion: v1
kind: Service
metadata:
  name: aim-custom-profile-service
  labels:
    app: aim-custom-profile
spec:
  type: ClusterIP
  ports:
    - name: http
      port: 80
      targetPort: 8000
  selector:
    app: aim-custom-profile

Deployment and test commands#

Deploy the pod and service configured in the previous steps:

kubectl apply -f . -n YOUR_K8S_NAMESPACE
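
Watch the rollout and verify that the custom profile is visible inside the pod (model download and engine start-up can take a while, which is why the deployment uses a generous startupProbe):

kubectl rollout status deployment/aim-custom-profile-deployment -n YOUR_K8S_NAMESPACE

kubectl exec -n YOUR_K8S_NAMESPACE deploy/aim-custom-profile-deployment -- \
  ls -R /workspace/aim-runtime/profiles/custom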

Port-forward the service to access it locally:

kubectl port-forward service/aim-custom-profile-service 8000:80 -n YOUR_K8S_NAMESPACE

Test the inference endpoint by making a request with curl:

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
        "prompt": "San Francisco is a",
        "max_tokens": 7,
        "temperature": 0
    }'
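
You can also query the models endpoint (the same path used by the startup and readiness probes) to confirm which model the server has registered:

curl http://localhost:8000/v1/models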