AIM Engine#

AIM (AMD Inference Microservice) Engine is a Kubernetes operator that simplifies the deployment and management of AI inference workloads on AMD GPUs. It provides a declarative, cloud-native approach to running ML models at scale.

Quick example#

Deploy an inference service with two resources, a model and a service that references it by name:

apiVersion: aim.eai.amd.com/v1alpha2
kind: AIMClusterModel
metadata:
  name: qwen3-32b
spec:
  image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5
---
apiVersion: aim.eai.amd.com/v1alpha2
kind: AIMService
metadata:
  name: qwen-chat
  namespace: ml-team
  annotations:
    aim.eai.amd.com/reconciler-pipeline: profile
spec:
  model:
    name: qwen3-32b

AIM images (like amdenterpriseai/aim-qwen-qwen3-32b) package open-source models optimized for AMD Instinct GPUs. Each image includes the model weights and a serving runtime tuned for specific GPU configurations and precision modes.

The AIMClusterModel (or its namespace-scoped counterpart, AIMModel) runs discovery on the image and publishes one profile, an AIMClusterProfile or AIMProfile respectively, per supported (GPU, precision, metric) combination. The AIMService then resolves to the best deployable profile for your hardware, pre-warms the model cache, and creates a KServe InferenceService.
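The quick example registers the model cluster-wide. For a team that wants the model visible only inside its own namespace, a namespace-scoped variant might look like the sketch below. It assumes that AIMModel accepts the same spec.image field as AIMClusterModel and that an AIMService in the same namespace can reference it through spec.model.name; check both against the CRD API (v1alpha2) reference before relying on them.

```yaml
# Sketch only: assumes the namespace-scoped AIMModel takes the same
# spec.image field as the AIMClusterModel in the quick example.
apiVersion: aim.eai.amd.com/v1alpha2
kind: AIMModel
metadata:
  name: qwen3-32b
  namespace: ml-team   # same namespace as the AIMService that references it
spec:
  image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5
```

Under these assumptions, discovery would publish namespaced AIMProfiles rather than AIMClusterProfiles, and the qwen-chat service above would not need to change.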

Why the reconciler-pipeline: profile annotation?

AIMService dispatch is decided by the shape of the spec, not by apiVersion. During the v1alpha1 → v1alpha2 migration window, services that set spec.model.name or spec.model.image are routed to the legacy template pipeline by default, so existing deployments keep working unchanged. The annotation forces this service onto the v1alpha2 profile pipeline, which resolves qwen3-32b to one of the AIMClusterProfiles produced by the AIMClusterModel above. The annotation becomes unnecessary once v1alpha1 is removed; see Migration window for the full dispatch table.
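The other spec shape named above, spec.model.image, can be sketched the same way. This is an illustration rather than a confirmed recipe: it assumes spec.model.image takes the same image reference used by the AIMClusterModel and that the annotation has the same effect on this shape as on spec.model.name. The service name qwen-chat-by-image is hypothetical.

```yaml
# Sketch only: assumes spec.model.image accepts a plain image reference
# and that the reconciler-pipeline annotation also applies to this shape.
apiVersion: aim.eai.amd.com/v1alpha2
kind: AIMService
metadata:
  name: qwen-chat-by-image   # hypothetical name, not from the docs
  namespace: ml-team
  annotations:
    # Without this annotation, the service would fall back to the
    # legacy template pipeline during the migration window.
    aim.eai.amd.com/reconciler-pipeline: profile
spec:
  model:
    image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5
```

The Migration window dispatch table remains the authority on which spec shape and annotation combinations reach which pipeline.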

Three model flows#

How you onboard a model depends on its relationship to AMD’s published catalog:

Flow	Use when	Read more
Official	Deploying a published AMD-supported AIM model unmodified	AIM Models
Fine-tuned	Deploying a fine-tune of a published architecture	Fine-Tuned Models
Custom	Deploying a model whose architecture isn’t in the catalog	Custom Models

Where to start#

Cluster administrators

Install AIM Engine, configure KServe, manage GPU resources, and set up cluster-wide defaults.

Installation

Developers & integrators

Deploy inference services, configure scaling, set up routing, integrate with your applications.

Quickstart

Data scientists

Browse the model catalog, deploy fine-tunes or custom models, tune inference parameters.

Model Catalog

Key features#

Three-flow model onboarding — official AIM images, fine-tuned models, and custom models that bring their own weights, all expressed through a single AIMModel shape.
Profile-driven deployment — AIMService resolves to a self-contained AIMProfile with everything the runtime needs (image, accelerator, engine config, model sources).
Smart selection — pick a profile by name, by model, by selector, or by model+selector; the controller ranks candidates by primary > type > version.
Profile overlays — spec.profileOverrides rebases a published profile onto custom weights without forking it.
Model caching — pre-download artifacts to shared PVCs for faster startup; HuggingFace downloader falls back across XET / HF_TRANSFER / HTTP protocols.
HTTP routing — expose services through Gateway API with customizable path templates.
Autoscaling — KEDA integration with OpenTelemetry metrics for demand-based scaling.
Multi-tenancy — namespace-scoped and cluster-scoped resources for flexible team isolation.

Documentation#

Getting started#

Installation — Prerequisites and Helm chart installation
Quickstart — Deploy your first model in minutes
Architecture — Components, CRDs, and reconciliation flow

Guides#

Task-oriented walkthroughs for common workflows:

Deploying Services — Resolution shapes, scaling, routing, caching
Model Catalog — Browse, apply, and auto-discover models
Fine-Tuned Models — Derive deployable profiles from a published AIM model
Custom Models — Derive deployable profiles from a base image + custom weights
Scaling and Autoscaling — Replicas, KEDA, custom metrics
Model Caching — Cache modes and download protocols
Routing and Ingress — Gateway API patterns and path templates
Private Registries — Authentication for HuggingFace, S3, and OCI
Multi-Tenancy — Namespace isolation patterns

Administration#

Installation Reference — Full install reference with all Helm values
KServe Configuration — Install and configure KServe
GPU Management — GPU allocation, node selectors, topology
Storage Configuration — PVCs, shared storage for caching
Upgrading — Version migration and CRD upgrades
Monitoring — Metrics, observability, log formats
Troubleshooting — Common issues and diagnostic steps
Security — RBAC, network policies, secrets management

Concepts#

AIM Services — Resolution shapes, overlays, caching, status
AIM Models — The three model flows
Profiles — Self-contained runtime configurations
AIM Profile Sets — Derivation engine
Model Sources — Auto-discovery from container registries
Runtime Configuration — Storage defaults, routing, environment resolution
Model Caching — Cache hierarchy, ownership, deletion behavior
Accelerator Detection — How AIM Engine sees GPUs and CPUs
Resource Lifecycle — Ownership, finalizers, deletion behavior

Reference#

CRD API (v1alpha2) — API specification for Models, Profiles, ProfileSets, Services
CRD API (v1alpha1) — Legacy API specification
Helm Chart Values — All configurable Helm chart values
CLI and Operator Flags — Operator binary flags and endpoints
Environment Variables — Operator and downloader configuration
Naming and Labels — Derived naming algorithm and label conventions
Conditions — Full catalog of conditions across all CRDs

Legacy (v1alpha1)#

Overview — Deprecation timeline and what changed
Migrating to v1alpha2 — Field-by-field mapping and recipes
Service Templates (v1alpha1) — The deprecated runtime-profile shape
AIMService (v1alpha1) — The deprecated template-based service shape
AIMModel (v1alpha1) — The deprecated custom-model and customTemplates shapes