AIM Engine

AIM (AMD Inference Microservice) Engine is a Kubernetes operator that simplifies the deployment and management of AI inference workloads on AMD GPUs. It provides a declarative, cloud-native approach to running ML models at scale.

Note

The aim.eai.amd.com/v1alpha2 API adds two capabilities to AIM Engine: Profiles, which are self-contained runtime configurations, and profile-based deployment and selection on AIMService. The profile-based path is intended to replace the v1alpha1 template path over time.

These features are under development and have not been fully validated across all deployment scenarios; schema and behavior may change in future releases.

Quick Example

Deploy an inference service with a single resource:

apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMService
metadata:
  name: qwen-chat
spec:
  model:
    image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5

AIM images (like amdenterpriseai/aim-qwen-qwen3-32b) are container images that package open-source models optimized for AMD Instinct GPUs. Each image includes the model weights and a serving runtime tuned for specific GPU configurations and precision modes.

AIM Engine automatically resolves the model, selects an optimal runtime configuration for your hardware, deploys a KServe InferenceService, and optionally creates HTTP routing through Gateway API.

Where to Start

Cluster Administrators

Install AIM Engine, configure KServe, manage GPU resources, and set up cluster-wide defaults.

Developers & Integrators

Deploy inference services, configure scaling, set up routing, and integrate with your applications.

AI Practitioners

Browse the model catalog, deploy models for experimentation, and tune inference parameters.

Key Features

  • Simple Service Deployment – Deploy inference endpoints with minimal configuration using AIMService resources

  • Automatic Optimization – Smart profile selection picks the best runtime configuration based on GPU availability, precision, and optimization goals

  • Model Catalog – Maintain a catalog of available models with automatic discovery from container registries

  • Model Caching – Pre-download model artifacts to shared PVCs for faster startup and reduced bandwidth

  • HTTP Routing – Expose services through Gateway API with customizable path templates

  • Autoscaling – KEDA integration with OpenTelemetry metrics for demand-based scaling

  • Multi-tenancy – Namespace-scoped and cluster-scoped resources for flexible team isolation
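
To make the namespace-scoped side of multi-tenancy concrete, the sketch below places the Quick Example service in a team namespace. It assumes AIMService is a namespaced resource; the namespace name team-a is hypothetical, and the manifest uses only the fields already shown in the Quick Example plus the standard Kubernetes metadata.namespace field.

```yaml
# Hypothetical team namespace, used only for illustration.
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
---
# Same spec as the Quick Example, scoped to the team namespace.
# Assumption: AIMService is namespace-scoped.
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMService
metadata:
  name: qwen-chat
  namespace: team-a
spec:
  model:
    image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5
```

Under that assumption, a second team could apply the same manifest with a different namespace value, and the two services would not collide even though they share the name qwen-chat.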

Documentation

Getting Started

  • Installation – Prerequisites and Helm chart installation

  • Quickstart – Deploy your first model in 5 minutes

  • Architecture – High-level architecture and component overview

Guides

Task-oriented walkthroughs for common workflows.

Administration

Concepts

  • AIM Services – Service deployment lifecycle, template selection, and caching

  • AIM Models – Model catalog, discovery, and resolution

  • Model Sources – Automatic model discovery from container registries

  • Profiles – Self-contained runtime configurations (v1alpha2, recommended)

  • Service Templates – Runtime profiles and discovery cache (v1alpha1, deprecated)

  • Runtime Configuration – Storage defaults, routing, and environment resolution

  • Model Caching – Cache hierarchy, ownership, and deletion behavior

  • Resource Lifecycle – Ownership, finalizers, and deletion behavior

Reference