AIM Engine#
AIM (AMD Inference Microservice) Engine is a Kubernetes operator that simplifies the deployment and management of AI inference workloads on AMD GPUs. It provides a declarative, cloud-native approach to running ML models at scale.
Quick Example#
Deploy an inference service with a single resource:
```yaml
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMService
metadata:
  name: qwen-chat
spec:
  model:
    image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5
```
AIM images (like amdenterpriseai/aim-qwen-qwen3-32b) are container images that package open-source models optimized for AMD Instinct GPUs. Each image includes the model weights and a serving runtime tuned for specific GPU configurations and precision modes.
AIM Engine automatically resolves the model, selects an optimal runtime configuration for your hardware, deploys a KServe InferenceService, and optionally creates HTTP routing through Gateway API.
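Under the hood, the generated workload is a standard KServe resource. The sketch below shows roughly what the operator produces for the example above; the exact spec depends on the resolved runtime profile, and the resource requests shown here (including the `amd.com/gpu` resource name exposed by the AMD GPU device plugin) are illustrative assumptions, not AIM Engine's literal output:

```yaml
# Illustrative only — actual fields are chosen by AIM Engine's template selection
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: qwen-chat
spec:
  predictor:
    containers:
      - name: kserve-container
        image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5
        resources:
          limits:
            amd.com/gpu: "1"   # assumed GPU count; profiles may request more
```

Because the result is a plain InferenceService, standard KServe tooling (status conditions, canary rollouts, `kubectl get inferenceservice`) applies to AIM-managed services as well.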
Where to Start#
- Cluster Administrators — Installation covers prerequisites, KServe setup, GPU configuration, and cluster-wide defaults.
- Developers & Integrators — Quickstart gets you from zero to a running inference endpoint in 5 minutes.
- Data Scientists — Model Catalog lets you browse available models and deploy them for experimentation.
Key Features#
- Simple Service Deployment — Deploy inference endpoints with minimal configuration using `AIMService` resources
- Automatic Optimization — Smart template selection picks the best runtime profile based on GPU availability, precision, and optimization goals
- Model Catalog — Maintain a catalog of available models with automatic discovery from container registries
- Model Caching — Pre-download model artifacts to shared PVCs for faster startup and reduced bandwidth
- HTTP Routing — Expose services through Gateway API with customizable path templates
- Autoscaling — KEDA integration with OpenTelemetry metrics for demand-based scaling
- Multi-tenancy — Namespace-scoped and cluster-scoped resources for flexible team isolation
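For the HTTP routing feature, exposure goes through standard Gateway API resources. The following is a generic HTTPRoute sketch, not AIM Engine's generated output; the gateway name, path, and backing Service name are hypothetical placeholders:

```yaml
# Generic Gateway API example — names are hypothetical, not produced by AIM Engine
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: qwen-chat-route
spec:
  parentRefs:
    - name: inference-gateway        # hypothetical Gateway in the cluster
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /qwen-chat        # AIM Engine's path templates customize this
      backendRefs:
        - name: qwen-chat-predictor  # hypothetical Service backing the predictor
          port: 80
```

Because routing is plain Gateway API, the same Gateway can front AIM services alongside other workloads, and standard Gateway implementations (Envoy Gateway, Istio, etc.) work unchanged.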