AIM Engine
AIM (AMD Inference Microservice) Engine is a Kubernetes operator that simplifies the deployment and management of AI inference workloads on AMD GPUs. It provides a declarative, cloud-native approach to running ML models at scale.
Note
The aim.eai.amd.com/v1alpha2 API introduces new capabilities for AIM Engine, including Profiles as self-contained runtime configurations and profile-based deployment and selection on AIMService (replacing the v1alpha1 template path over time).
These features are under development and have not been fully validated across all deployment scenarios; schema and behavior may change in future releases.
Quick Example
Deploy an inference service with a single resource:
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMService
metadata:
  name: qwen-chat
spec:
  model:
    image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5
AIM images (like amdenterpriseai/aim-qwen-qwen3-32b) are container images that package open-source models optimized for AMD Instinct GPUs. Each image includes the model weights and a serving runtime tuned for specific GPU configurations and precision modes.
AIM Engine automatically resolves the model, selects an optimal runtime configuration for your hardware, deploys a KServe InferenceService, and optionally creates HTTP routing through Gateway API.
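When the manifest has to be generated from a script rather than written by hand, the same resource can be built programmatically. The sketch below is illustrative only: the helper name `aim_service_manifest` is hypothetical, and it uses only the fields shown in the Quick Example (`apiVersion`, `kind`, `metadata.name`, `spec.model.image`). It prints JSON, which `kubectl apply -f -` accepts in the same way as YAML.

```python
import json


def aim_service_manifest(name: str, image: str) -> dict:
    """Build a minimal AIMService manifest (hypothetical helper).

    Only the fields from the Quick Example are set; anything else is
    left for AIM Engine to resolve.
    """
    if not name or not image:
        raise ValueError("name and image are required")
    return {
        "apiVersion": "aim.eai.amd.com/v1alpha1",
        "kind": "AIMService",
        "metadata": {"name": name},
        "spec": {"model": {"image": image}},
    }


# Reproduce the Quick Example resource.
manifest = aim_service_manifest(
    "qwen-chat",
    "amdenterpriseai/aim-qwen-qwen3-32b:0.8.5",
)

# Emit JSON; pipe the output to `kubectl apply -f -` to create the resource.
print(json.dumps(manifest, indent=2))
```

Keeping the builder limited to these four fields mirrors the Quick Example: the model image is the only input the example requires.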
Where to Start
Cluster Administrators
Install AIM Engine, configure KServe, manage GPU resources, and set up cluster-wide defaults.
Developers & Integrators
Deploy inference services, configure scaling, set up routing, and integrate with your applications.
AI Practitioners
Browse the model catalog, deploy models for experimentation, and tune inference parameters.
Key Features
Simple Service Deployment – Deploy inference endpoints with minimal configuration using AIMService resources
Automatic Optimization – Smart profile selection picks the best runtime configuration based on GPU availability, precision, and optimization goals
Model Catalog – Maintain a catalog of available models with automatic discovery from container registries
Model Caching – Pre-download model artifacts to shared PVCs for faster startup and reduced bandwidth
HTTP Routing – Expose services through Gateway API with customizable path templates
Autoscaling – KEDA integration with OpenTelemetry metrics for demand-based scaling
Multi-tenancy – Namespace-scoped and cluster-scoped resources for flexible team isolation
Documentation
Getting Started
Installation – Prerequisites and Helm chart installation
Quickstart – Deploy your first model in 5 minutes
Architecture – High-level architecture and component overview
Guides
Task-oriented walkthroughs for common workflows:
Deploying Services – Deploy and manage inference endpoints
Model Catalog – Browse and select models
Scaling and Autoscaling – Replicas, KEDA, and metrics
Model Caching – Pre-cache models for faster startup
Routing and Ingress – Gateway API patterns and path templates
Private Registries – Authentication for Hugging Face, S3, and OCI
Multi-Tenancy – Namespace isolation patterns
Administration
Installation Reference – Full install reference with all Helm values
KServe Configuration – Install and configure KServe
GPU Management – GPU allocation, node selectors, topology
Storage Configuration – PVCs, shared storage for caching
Upgrading – Version migration and CRD upgrades
Monitoring – Metrics, observability, and log formats
Troubleshooting – Common issues and diagnostic steps
Security – RBAC, network policies, and secrets management
Concepts
AIM Services – Service deployment lifecycle, template selection, and caching
AIM Models – Model catalog, discovery, and resolution
Model Sources – Automatic model discovery from container registries
Profiles – Self-contained runtime configurations (v1alpha2, recommended)
Service Templates – Runtime profiles and discovery cache (v1alpha1, deprecated)
Runtime Configuration – Storage defaults, routing, and environment resolution
Model Caching – Cache hierarchy, ownership, and deletion behavior
Resource Lifecycle – Ownership, finalizers, and deletion behavior
Reference
CRD API Reference (v1alpha2) – API specification for Profiles
CRD API Reference (v1alpha1) – API specification for all other custom resources
Helm Chart Values – All configurable Helm chart values
CLI and Operator Flags – Operator binary flags and endpoints
Environment Variables – Operator and downloader configuration
Naming and Labels – Derived naming algorithm and label conventions
Conditions – Full catalog of conditions across all CRDs