AIM Engine#

AIM (AMD Inference Microservice) Engine is a Kubernetes operator that simplifies the deployment and management of AI inference workloads on AMD GPUs. It provides a declarative, cloud-native approach to running ML models at scale.

Quick Example#

Deploy an inference service with a single resource:

apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMService
metadata:
  name: qwen-chat
spec:
  model:
    image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5

AIM images (like amdenterpriseai/aim-qwen-qwen3-32b) are container images that package open-source models optimized for AMD Instinct GPUs. Each image includes the model weights and a serving runtime tuned for specific GPU configurations and precision modes.

AIM Engine automatically resolves the model, selects an optimal runtime configuration for your hardware, deploys a KServe InferenceService, and optionally creates HTTP routing through Gateway API.
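To make the reconciliation concrete, the sketch below shows roughly the kind of KServe InferenceService the operator could derive from the Quick Example above. This is an illustrative sketch, not the operator's exact output: the container layout and the `amd.com/gpu` resource name (which assumes the AMD GPU device plugin is installed) are assumptions.

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: qwen-chat
spec:
  predictor:
    containers:
      - name: kserve-container
        # Serving runtime and weights come packaged in the AIM image
        image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5
        resources:
          limits:
            amd.com/gpu: 1  # assumes the AMD GPU device plugin exposes this resource

In practice you never write this resource yourself; AIM Engine selects the runtime profile and GPU count for you based on the hardware it detects.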

Where to Start#

  • Cluster Administrators: the Installation guide covers prerequisites, KServe setup, GPU configuration, and cluster-wide defaults.

  • Developers & Integrators: the Quickstart gets you from zero to a running inference endpoint in 5 minutes.

  • Data Scientists: the Model Catalog lets you browse available models and deploy them for experimentation.

Key Features#

  • Simple Service Deployment — Deploy inference endpoints with minimal configuration using AIMService resources

  • Automatic Optimization — Smart template selection picks the best runtime profile based on GPU availability, precision, and optimization goals

  • Model Catalog — Maintain a catalog of available models with automatic discovery from container registries

  • Model Caching — Pre-download model artifacts to shared PVCs for faster startup and reduced bandwidth

  • HTTP Routing — Expose services through Gateway API with customizable path templates

  • Autoscaling — KEDA integration with OpenTelemetry metrics for demand-based scaling

  • Multi-tenancy — Namespace-scoped and cluster-scoped resources for flexible team isolation
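As a rough illustration of how several of these features might combine in one resource, the sketch below extends the Quick Example with routing and caching sections. Everything beyond spec.model.image uses hypothetical placeholder field names, not documented API; consult the actual AIMService CRD schema for the real spec.

apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMService
metadata:
  name: qwen-chat
spec:
  model:
    image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5
  # The sections below are hypothetical, for illustration only.
  routing:        # expose the service through Gateway API
    enabled: true
  caching:        # pre-download model artifacts to a shared PVC
    enabled: true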