AIM Models
AIM Model resources form a catalog that maps model identifiers to specific container images. This document explains the model resource types, discovery mechanism, and lifecycle.
Overview
Model resources serve two purposes:
- Registry: Translate abstract model references into concrete container images
- Version control: Update which container serves a model without changing service configurations
Cluster vs Namespace Scope
AIMClusterModel
Cluster-scoped models are typically installed by administrators through GitOps workflows or Helm charts. They represent curated model catalogs maintained by platform teams or model publishers.
Cluster models provide a consistent baseline across all namespaces. Any namespace can reference a cluster model unless it defines a namespace-scoped model with the same name, which takes precedence.
Discovery for cluster models runs in the operator namespace (default: aim-system). Auto-generated templates are created as cluster-scoped resources. When a cluster model uses the v1alpha2 API, discovery also creates AIMClusterProfiles.
AIMModel
Namespace-scoped models allow teams to:
- Define team-specific model variants
- Override cluster-level definitions for testing
- Control model access at the namespace level
When both cluster and namespace models exist with the same metadata.name, the namespace resource takes precedence within that namespace.
Discovery for namespace models runs in the model’s namespace. Auto-generated templates are created as namespace-scoped resources. When a namespace model uses the v1alpha2 API, discovery also creates AIMProfiles.
Model Specification
An AIM Model uses metadata.name as the canonical model identifier:
```yaml
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMClusterModel
metadata:
  name: qwen-qwen3-32b
spec:
  image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5
  discovery:
    extractMetadata: true
    createServiceTemplates: true
  resources:
    limits:
      cpu: "8"
      memory: 64Gi
    requests:
      cpu: "4"
      memory: 32Gi
```
Fields

| Field | Purpose |
|---|---|
| | Container image URI implementing this model. The operator inspects this image during discovery. |
| | Controls metadata extraction and automatic template generation. Discovery is attempted automatically. |
| | When true (default), creates ServiceTemplates from recommended deployments published by the image. |
| | Default template name to use when services reference this model without specifying a template. Optional. |
| | Secrets for pulling the container image during discovery and inference. Must exist in the same namespace as the model (or operator namespace for cluster models). |
| | Service account to use for discovery jobs and metadata extraction. If empty, uses the default service account. |
| | Default resource requirements. These serve as baseline values that templates and services can override. |
Discovery Mechanism
Discovery is an automatic process that extracts metadata from container images and creates templates.
Discovery Process
When discovery is enabled:
1. Registry inspection: The controller directly queries the container registry using the operator's network context and any configured `imagePullSecrets`.
2. Image metadata fetch: Using go-containerregistry, the controller pulls image metadata (labels) without downloading the full image.
3. Metadata storage: Extracted metadata is written to `status.imageMetadata`.
4. Template generation: If `createServiceTemplates: true`, the controller examines the image's recommended deployments and creates corresponding ServiceTemplate resources.
Expected Labels
AIM discovery looks for the following container image labels:
- `com.amd.aim.model.canonicalName`
- `com.amd.aim.model.deployments`

Images without these labels will have minimal metadata. If `createServiceTemplates: true` but no `recommendedDeployments` are found, no templates are created.
Lifecycle and Status
Status Field
The status field tracks discovery progress:

| Field | Description |
|---|---|
| | Enum; the possible values are described under Status Values. |
| | Detailed conditions, including those described under Conditions. |
| | Metadata about the runtime config that was resolved (name, namespace, scope, UID) |
| | Extracted metadata from the container image, including model info, OCI metadata, and … |
Status Values
- Pending: Initial state, waiting for reconciliation
- Progressing: Discovery job running or templates being created
- Ready: Discovery succeeded and all auto-generated templates are healthy
- Degraded: Discovery succeeded but some templates have issues
- Failed: Discovery failed or required labels missing
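The status values above read as a small decision table. The sketch below is one interpretation of it, not the operator's code: the `Phase` enum and the template state strings (`"Ready"`, `"Creating"`) are hypothetical inputs invented for illustration.

```python
from enum import Enum


class Phase(Enum):
    """Hypothetical discovery phases; the operator's internal states may differ."""
    NOT_STARTED = "not-started"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


def model_status(phase: Phase, template_states: list) -> str:
    """Map a discovery phase plus auto-generated template states to a status value."""
    if phase is Phase.NOT_STARTED:
        return "Pending"      # waiting for reconciliation
    if phase is Phase.FAILED:
        return "Failed"       # discovery failed
    if phase is Phase.RUNNING or "Creating" in template_states:
        return "Progressing"  # discovery running or templates still being created
    if all(state == "Ready" for state in template_states):
        return "Ready"        # every auto-generated template is healthy
    return "Degraded"         # discovery succeeded but some template has issues
```

With no auto-generated templates, a successful discovery maps to `Ready`, since there is no unhealthy template to report.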
Conditions
- RuntimeConfigReady: Reports runtime config resolution status. Common reasons:
  - ConfigFound: Runtime configuration was successfully resolved
  - DefaultConfigNotFound: No default runtime config found (non-fatal)
  - ConfigNotFound: Explicitly referenced runtime config not found
- ImageMetadataReady: Reports image inspection status. Common reasons:
  - ImageMetadataFound: Metadata extraction succeeded
  - ImageFound: Image is reachable, but metadata labels are missing
  - MetadataExtractionFailed: Failed to extract metadata from the image
Toggling Discovery
You can enable discovery after the model has been created:
```shell
kubectl edit aimclustermodel qwen-qwen3-32b
# Set spec.discovery.extractMetadata: true
```
The controller runs extraction on the next reconciliation and updates status accordingly.
Disabling discovery after templates exist leaves templates in place. Existing templates are not deleted automatically.
Resource Resolution
When services reference a model, the controller merges resources from multiple sources:
1. Service-level: `AIMService.spec.resources` (highest precedence)
2. Template-level: `AIMServiceTemplate.spec.resources`
3. Model-level: `AIMModel.spec.resources` (baseline)
If GPU quantities remain unset after merging, the controller copies them from discovery metadata recorded on the template (status.profile.metadata.gpu_count).
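A minimal Python sketch of this precedence, assuming resources are plain `{"limits": {...}, "requests": {...}}` dicts and that layers merge key by key (the text does not say whether a higher layer replaces a whole block). `GPU_RESOURCE` is also an assumption: `amd.com/gpu` is the AMD GPU device plugin's resource name, but this page does not name the key the operator uses.

```python
from typing import Optional

# Assumption: GPU extended-resource key; not stated in this document.
GPU_RESOURCE = "amd.com/gpu"


def merge_resources(model: Optional[dict], template: Optional[dict],
                    service: Optional[dict],
                    discovered_gpu_count: Optional[int] = None) -> dict:
    """Merge resource blocks per key: service > template > model (baseline)."""
    merged: dict = {"limits": {}, "requests": {}}
    # Apply the lowest-precedence layer first so higher layers overwrite individual keys.
    for layer in (model, template, service):
        for section in ("limits", "requests"):
            merged[section].update((layer or {}).get(section) or {})
    # Only when no layer set a GPU quantity, fall back to the discovered gpu_count.
    gpu_unset = all(GPU_RESOURCE not in merged[s] for s in ("limits", "requests"))
    if gpu_unset and discovered_gpu_count is not None:
        for section in ("limits", "requests"):
            merged[section][GPU_RESOURCE] = str(discovered_gpu_count)
    return merged
```

Here `discovered_gpu_count` stands in for the template's `status.profile.metadata.gpu_count`.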
Model Lookup
For namespace-scoped lookups (from templates or services in a namespace):
1. Check for an `AIMModel` with the same name in the same namespace.
2. Fall back to the `AIMClusterModel` with the same name.
This allows namespace models to override cluster baselines.
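The lookup order can be sketched as follows. The in-memory dicts stand in for API-server reads and are hypothetical; only the namespace-first, cluster-fallback order comes from the text.

```python
from typing import Optional, Tuple


def resolve_model(name: str, namespace: str, namespaced_models: dict,
                  cluster_models: dict) -> Optional[Tuple[str, dict]]:
    """Return (kind, spec) for a model name referenced from `namespace`, or None."""
    # 1. A namespace-scoped AIMModel with the same name takes precedence.
    spec = namespaced_models.get((namespace, name))
    if spec is not None:
        return ("AIMModel", spec)
    # 2. Otherwise fall back to the cluster-scoped AIMClusterModel.
    spec = cluster_models.get(name)
    if spec is not None:
        return ("AIMClusterModel", spec)
    return None
```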
Examples
Cluster Model with Discovery
```yaml
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMClusterModel
metadata:
  name: qwen-qwen3-32b
spec:
  image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5
  runtimeConfigName: platform-default
  discovery:
    extractMetadata: true
    createServiceTemplates: true
  resources:
    limits:
      cpu: "8"
      memory: 64Gi
    requests:
      cpu: "4"
      memory: 32Gi
```
Namespace Model Without Discovery
```yaml
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMModel
metadata:
  name: qwen-qwen3-32b-dev
  namespace: ml-team
spec:
  image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5
  runtimeConfigName: ml-team
  defaultServiceTemplate: custom-template-name
  discovery:
    extractMetadata: false # skip image metadata extraction
    createServiceTemplates: false
  resources:
    limits:
      cpu: "6"
      memory: 48Gi
```
Enabling Discovery for Private Container Images
```yaml
# Secret in namespace
apiVersion: v1
kind: Secret
metadata:
  name: private-registry
  namespace: ml-team
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: BASE64_CONFIG
---
# Runtime config in namespace
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMRuntimeConfig
metadata:
  name: default
  namespace: ml-team
spec:
  serviceAccountName: aim-runtime
  imagePullSecrets:
    - name: private-registry
---
# Model with discovery
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMModel
metadata:
  name: proprietary-model
  namespace: ml-team
spec:
  image: private.registry/models/proprietary:v1
  runtimeConfigName: default # uses config above
  discovery:
    extractMetadata: true
    createServiceTemplates: true
```
Troubleshooting
Discovery Fails
Check the operator logs for registry access errors:
```shell
kubectl -n aim-system logs -l app.kubernetes.io/name=aim-engine --tail=100 | grep -i "<model-name>"
```
Common causes:
- Missing or invalid imagePullSecrets (secrets must exist in the operator namespace for cluster models)
- Image doesn't exist or tag is invalid
- Network connectivity issues to the registry
Templates Not Auto-Created
Check the model status:
```shell
kubectl get aimclustermodel <name> -o yaml
# or
kubectl -n <namespace> get aimmodel <name> -o yaml
```
Look for:
- `discovery.extractMetadata: false` (metadata extraction is disabled)
- `discovery.createServiceTemplates: false` (auto-template creation is disabled)
- Model condition reasons such as `NoTemplatesExpected` or `CreatingTemplates`
ImageMetadataReady Condition False
The container image is missing required labels or the discovery job failed. Check:
```shell
kubectl get aimclustermodel <name> -o jsonpath='{.status.conditions[?(@.type=="ImageMetadataReady")]}'
```
Inspect the container image labels:
```shell
docker pull <image>
docker inspect --format='{{json .Config.Labels}}' <image>
```
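To check that output for the two labels listed under Expected Labels, a small helper like the following can be used. It is a sketch that only checks whether the labels are present. It makes no assumption about the format of the label values, and `check_aim_labels` is a hypothetical name.

```python
import json

# The two labels listed under "Expected Labels".
REQUIRED_LABELS = (
    "com.amd.aim.model.canonicalName",
    "com.amd.aim.model.deployments",
)
AIM_PREFIX = "com.amd.aim.model."


def check_aim_labels(labels_json: str) -> dict:
    """Summarize AIM labels from `docker inspect --format='{{json .Config.Labels}}'` output."""
    labels = json.loads(labels_json) or {}  # Docker prints `null` for an image without labels
    return {
        "aim": {k: v for k, v in labels.items() if k.startswith(AIM_PREFIX)},
        "missing": [k for k in REQUIRED_LABELS if k not in labels],
    }
```

A non-empty `missing` list points to the image rather than the operator as the source of an `ImageMetadataReady=False` condition.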
Auto-Creation from Services
When a service uses spec.model.image directly (instead of spec.model.name), AIM automatically creates a model resource if one doesn’t already exist with that image URI. Auto-created models are namespace-scoped.
Discovery for Auto-Created Models
The runtime config's spec.model.autoDiscovery field controls whether auto-created models run discovery:
```yaml
spec:
  model:
    autoDiscovery: true # auto-created models run discovery and create templates
```
Example
Service using direct image reference:
```yaml
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMService
metadata:
  name: my-service
  namespace: ml-team
spec:
  model:
    image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5
  runtimeConfigName: default
```
If the runtime config has autoDiscovery: true, AIM creates a namespace-scoped model and discovery runs automatically:
```yaml
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMModel
metadata:
  name: auto-<hash-of-image>
  namespace: ml-team
spec:
  image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5
  discovery:
    extractMetadata: true
    createServiceTemplates: true
```
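The find-or-create behaviour can be sketched like this. The page only shows the name as `auto-<hash-of-image>`, so the truncated SHA-256 below is a hypothetical stand-in. Three further points are assumptions rather than documented behaviour: `autoDiscovery: false` turns both discovery flags off, only models in the service's namespace are considered for reuse, and `find_or_create_model` is a hypothetical name.

```python
import hashlib


def find_or_create_model(image: str, namespace: str, existing: list,
                         auto_discovery: bool) -> dict:
    """Return an existing AIMModel for `image` in `namespace`, or a new auto-created one."""
    for model in existing:
        if model["metadata"]["namespace"] == namespace and model["spec"]["image"] == image:
            return model  # a model with this image URI already exists: reuse it
    # Hypothetical naming: the real hash behind auto-<hash-of-image> is not specified.
    suffix = hashlib.sha256(image.encode("utf-8")).hexdigest()[:12]
    return {
        "apiVersion": "aim.eai.amd.com/v1alpha1",
        "kind": "AIMModel",
        "metadata": {"name": f"auto-{suffix}", "namespace": namespace},
        "spec": {
            "image": image,
            # Assumption: autoDiscovery toggles both discovery flags together.
            "discovery": {
                "extractMetadata": auto_discovery,
                "createServiceTemplates": auto_discovery,
            },
        },
    }
```

Deriving the name from the image makes the lookup idempotent: two services referencing the same image would resolve to the same model name.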
Custom Models
Custom models allow you to deploy models from external sources (S3, HuggingFace) without requiring a pre-built AIM container image. The AIM operator uses a generic base container that downloads model weights at runtime.
Overview
Unlike image-based models, where model weights are embedded in the container image, custom models:
- Download weights from external sources (S3 or HuggingFace)
- Use the `amdenterpriseai/aim-base` container for inference
- Skip discovery (no image metadata extraction needed)
- Require explicit hardware specifications
Creating Custom Models
There are two ways to create custom models:
1. Direct AIMModel with modelSources
Create an AIMModel or AIMClusterModel with modelSources instead of relying on image discovery:
```yaml
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMModel
metadata:
  name: my-custom-qwen
  namespace: ml-team
spec:
  image: amdenterpriseai/aim-base:latest
  modelSources:
    - modelId: Qwen/Qwen3-32B
      sourceUri: s3://my-bucket/models/qwen3-32b
      # size: 16Gi # Optional - auto-discovered by download job if omitted
  custom:
    hardware:
      gpu:
        requests: 1
        models:
          - MI300X
```
2. Inline Custom Model in AIMService
Create an AIMService with spec.model.custom to auto-create a custom model:
```yaml
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMService
metadata:
  name: my-qwen-service
  namespace: ml-team
spec:
  model:
    custom:
      baseImage: amdenterpriseai/aim-base:latest
      modelSources:
        - modelId: Qwen/Qwen3-32B
          sourceUri: hf://Qwen/Qwen3-32B
          # size is optional - auto-discovered by download job
      hardware:
        gpu:
          requests: 1
  template:
    allowUnoptimized: true # Required - custom models default to unoptimized
```
The service automatically creates a namespace-scoped AIMModel. Custom models are shared resources that persist independently of the service, allowing them to be reused by other services or manually managed.
Model Sources
Each model source specifies:

| Field | Required | Description |
|---|---|---|
| | Yes | Canonical identifier in … |
| | Yes | Download location. Schemes: … |
| | No | Storage size for PVC provisioning. If omitted, the download job automatically discovers the size. Can be set explicitly to pre-allocate storage. |
| | No | Per-source credential overrides (e.g., …) |
Hardware Requirements
Custom models require explicit hardware specifications since discovery doesn't run.
These go under spec.custom.hardware for AIMModel, or spec.model.custom.hardware for inline AIMService:
```yaml
# For AIMModel:
spec:
  custom:
    hardware:
      gpu:
        requests: 2 # Number of GPUs required
        models: # Optional: specific GPU models for node affinity
          - MI300X
          - MI250
        minVram: 64Gi # Optional: minimum VRAM per GPU for capacity planning
      cpu:
        requests: "4" # Required if cpu field is specified: CPU requests
        limits: "8" # Optional: CPU limits
```
If no models are specified, the workload can run on any available GPU. The minVram field is used for capacity planning when the model size is known.
Template Generation
When modelSources is specified:
- Without `custom.templates`: A single template is auto-generated using `custom.hardware`.
- With `custom.templates`: Templates are created per entry, each inheriting from `custom.hardware` unless overridden.

Templates also inherit the type field from spec.custom.type, which defaults to unoptimized. This can be overridden per-template via customTemplates[].type.
```yaml
spec:
  modelSources:
    - modelId: Qwen/Qwen3-32B
      sourceUri: s3://bucket/model
  custom:
    type: unoptimized # Default - can be omitted
    hardware:
      gpu:
        requests: 1
    templates:
      - name: high-memory # Generated as {modelName}-custom-[{name}][-{precision}][-{gpu}]-{hash}
        hardware:
          gpu:
            requests: 2 # Override
        env:
          - name: VLLM_GPU_MEMORY_UTILIZATION
            value: "0.95"
      - name: standard
        # Inherits hardware and type from custom.*
```
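A sketch of how these inheritance rules and the naming pattern in the YAML comment might fit together. The following are assumptions, not documented behaviour: an entry's `hardware` replaces `custom.hardware` as a whole rather than being deep-merged; the hash is the first 8 hex digits of a SHA-256 over the template config; the GPU segment is the first entry of `gpu.models`, lowercased; and `precision` on an entry is a hypothetical field.

```python
import hashlib
import json
from typing import Optional


def template_name(model_name: str, name: Optional[str], precision: Optional[str],
                  gpu: Optional[str], config: dict) -> str:
    """Hypothetical rendering of {modelName}-custom-[{name}][-{precision}][-{gpu}]-{hash}."""
    # Assumption: the hash is a short SHA-256 digest of the template config.
    digest = hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()[:8]
    parts = [model_name, "custom", name, precision, gpu, digest]
    # Unset optional parts are dropped, so no empty segments appear.
    return "-".join(p for p in parts if p)


def build_templates(model_name: str, custom: dict) -> list:
    """Expand spec.custom into template configs following the inheritance rules."""
    base_hardware = custom.get("hardware", {})
    base_type = custom.get("type", "unoptimized")  # documented default
    # No templates list: a single template is generated from custom.hardware.
    entries = custom.get("templates") or [{}]
    templates = []
    for entry in entries:
        # Assumption: an entry's hardware block replaces custom.hardware as a whole.
        hardware = entry.get("hardware", base_hardware)
        config = {
            "hardware": hardware,
            "type": entry.get("type", base_type),
            "env": entry.get("env", []),
        }
        gpu_models = hardware.get("gpu", {}).get("models") or []
        gpu = gpu_models[0].lower() if gpu_models else None
        config["name"] = template_name(
            model_name, entry.get("name"), entry.get("precision"), gpu, config)
        templates.append(config)
    return templates
```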
Custom Profiles on Custom Templates
Custom templates can include a customProfile to tune inference engine behavior. When customProfile is set, aimId, modelId, hardware, profile.metric, and profile.precision are all required:
```yaml
spec:
  image: amdenterpriseai/aim-vllm-base:0.10.0
  modelSources:
    - modelId: my-org/llama-finetuned
      sourceUri: s3://my-bucket/weights/
      size: 16Gi
  customTemplates:
    - name: llama-custom-tuned
      aimId: meta-llama/Llama-3-8B
      modelId: meta-llama/Llama-3-8B
      hardware:
        gpu:
          model: MI300X
          requests: 1
      profile:
        metric: latency
        precision: fp16
      customProfile:
        engineArgs:
          dtype: float16
          gpu-memory-utilization: 0.95
        envVars:
          PYTORCH_TUNABLEOP_ENABLED: "1"
```
The model controller creates an AIMServiceTemplate with the custom profile data. The template goes through the standard discovery flow and becomes available for services. See Custom Profiles for details on the lifecycle and configuration layers.
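The required-field rule for `customProfile` amounts to a simple validation pass. The sketch below works over plain dicts, `missing_custom_profile_fields` is a hypothetical helper, and the operator may perform this validation differently.

```python
from typing import Any

# Fields the text lists as required whenever customProfile is set.
REQUIRED_WITH_CUSTOM_PROFILE = (
    "aimId", "modelId", "hardware", "profile.metric", "profile.precision",
)


def _get(obj: Any, dotted: str) -> Any:
    """Follow a dotted path through nested dicts; None if any step is missing."""
    for key in dotted.split("."):
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def missing_custom_profile_fields(entry: dict) -> list:
    """Return required paths that are absent or empty on a customTemplates entry."""
    if "customProfile" not in entry:
        return []  # the extra requirements only apply with a custom profile
    return [p for p in REQUIRED_WITH_CUSTOM_PROFILE if _get(entry, p) in (None, "", {})]
```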
Unoptimized Templates and allowUnoptimized
Custom models generate templates with type: unoptimized by default because no discovery job runs to validate performance characteristics. This has an important implication:
Services will not auto-select unoptimized templates unless explicitly allowed.
When creating an AIMService that uses a custom model, you must either:

1. Set `allowUnoptimized: true` on the service's template selector:
```yaml
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMService
metadata:
  name: my-service
spec:
  model:
    name: my-custom-model
  template:
    allowUnoptimized: true # Required for custom model templates
```
2. Explicitly specify the template name to bypass auto-selection:
```yaml
spec:
  template:
    name: my-custom-model-custom-abc123 # Explicit template name
```
This safety mechanism prevents accidentally deploying unoptimized configurations in production. See Template Resolution for more details on how templates are selected and the role of optimization levels.
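The selection rule can be sketched as a filter. The sketch assumes each template dict carries `name` and `type`, and it treats every type other than `unoptimized` as eligible for auto-selection, because the other type values are not listed on this page. `candidate_templates` is a hypothetical name.

```python
from typing import Optional


def candidate_templates(templates: list, allow_unoptimized: bool = False,
                        explicit_name: Optional[str] = None) -> list:
    """Templates a service may use under the allowUnoptimized rule."""
    if explicit_name is not None:
        # An explicit template name bypasses auto-selection, including this filter.
        return [t for t in templates if t["name"] == explicit_name]
    # Auto-selection skips unoptimized templates unless the service opts in.
    return [t for t in templates if allow_unoptimized or t.get("type") != "unoptimized"]
```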
Authentication
Configure credentials for private sources:
HuggingFace
```yaml
spec:
  modelSources:
    - modelId: Qwen/Qwen3-32B
      sourceUri: hf://Qwen/Qwen3-32B
      size: 16Gi
      env:
        - name: HF_TOKEN
          valueFrom:
            secretKeyRef:
              name: hf-credentials
              key: token
```
S3-Compatible Storage
```yaml
spec:
  modelSources:
    - modelId: my-org/custom-model
      sourceUri: s3://my-bucket/models/custom
      size: 32Gi
      env:
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: s3-credentials
              key: access-key
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: s3-credentials
              key: secret-key
        - name: AWS_ENDPOINT_URL
          value: "https://s3.my-provider.com"
```
Lifecycle Differences

| Aspect | Image-Based Models | Custom Models |
|---|---|---|
| Model weights | Source URI embedded in image | Source URI in spec |
| Discovery | Runs to extract metadata | Skipped |
| Hardware | Optional (from discovery) | Required |
| Templates | Auto-generated from image labels | Auto-generated from spec |
| Caching | Uses shared template cache | Uses dedicated template cache |
Status
Custom models report sourceType: Custom in their status:
```yaml
status:
  status: Ready
  sourceType: Custom
  conditions:
    - type: Ready
      status: "True"
```
Example: Full Custom Model Deployment
```yaml
# Secret for HuggingFace access
apiVersion: v1
kind: Secret
metadata:
  name: hf-token
  namespace: ml-team
type: Opaque
stringData:
  token: hf_xxxxxxxxxxxxx
---
# Custom model service
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMService
metadata:
  name: qwen-custom
  namespace: ml-team
spec:
  model:
    custom:
      modelSources:
        - modelId: Qwen/Qwen3-32B
          sourceUri: hf://Qwen/Qwen3-32B
          # size is optional - auto-discovered by download job
          env:
            - name: HF_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hf-token
                  key: token
      hardware:
        gpu:
          requests: 1
          models:
            - MI300X
  template:
    allowUnoptimized: true # Required - custom models default to unoptimized
  replicas: 1
```
Fine-Tuned Models
Fine-tuned models are a specialization of custom models where the user has custom weights for a known base model. Instead of requiring explicit hardware specifications and customTemplates, fine-tuned models use aimId-based template matching to automatically inherit runtime configuration from existing official templates.
Overview
When an AIMModel specifies spec.aimId together with spec.modelSources, the controller treats it as a fine-tuned model and performs automatic template matching:
1. Finds official templates whose `spec.aimId` matches the model's `spec.aimId`.
2. Filters them by version according to `spec.custom.versionPolicy`.
3. Matches by `modelId`: the template's `spec.modelId` must equal one of the model's `modelSources[].modelId`.
4. Creates template copies with hardware, engine args, and profile inherited from the official template, but with the custom weight source baked in. The controller stamps the resolved deployment image onto each copy as the `aim.eai.amd.com/deployment-image-ref` annotation, so different copies can target different base images when matched templates span owners with different versions or base families (see Deployment Image Resolution below).
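The aimId, version, and modelId checks described above reduce to a filter over the official templates. In this sketch, templates are flattened dicts rather than full resources. The version check is passed in as a callable because its behaviour depends on `versionPolicy`. The `version` key in the test data and the function name are hypothetical.

```python
from typing import Callable


def match_official_templates(model: dict, templates: list,
                             version_ok: Callable[[dict], bool]) -> list:
    """Select the official templates a fine-tuned model should copy."""
    aim_id = model["aimId"]
    source_ids = {src["modelId"] for src in model["modelSources"]}
    return [
        t for t in templates
        if t["aimId"] == aim_id           # same base model family
        and version_ok(t)                 # versionPolicy filter supplied by the caller
        and t["modelId"] in source_ids    # must equal one of modelSources[].modelId
    ]
```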
Fine-Tuned vs Fully Custom

| | Fine-Tuned Model | Fully Custom Model |
|---|---|---|
| | Set; identifies the base model family | Not set |
| | Matches an official template's … | Arbitrary identifier |
| Hardware | Inherited from matched template | Declared via … |
| | Not required | Required (or …) |
Example: Pinned Version
```yaml
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMModel
metadata:
  name: my-finetuned-qwen
  namespace: ml-team
spec:
  image: amdenterpriseai/aim-base:0.8.5
  aimId: qwen/qwen3-32b
  modelSources:
    - modelId: qwen/qwen3-32b-fp8
      sourceUri: s3://my-bucket/weights/
```
The controller finds official templates for qwen/qwen3-32b, filters to those at version 0.8.5 (extracted from the image tag), matches the one whose modelId is qwen/qwen3-32b-fp8, and creates a copy with the custom sourceUri.
Example: Latest Version
```yaml
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMModel
metadata:
  name: my-finetuned-qwen-latest
  namespace: ml-team
spec:
  aimId: qwen/qwen3-32b
  modelSources:
    - modelId: qwen/qwen3-32b-fp8
      sourceUri: s3://my-bucket/weights/
  custom:
    versionPolicy: latest
```
With versionPolicy: latest, spec.image can be omitted. The controller resolves the deployment image per matched template — from the matched template’s owning model — and stamps it onto each generated copy.
Version Policy
The spec.custom.versionPolicy field controls version filtering during template matching:

| Policy | | Version Filter | Deployment Image |
|---|---|---|---|
| | Required | | |
| | Optional | Newest | Resolved per-copy from the matched template owner's … |
| | Optional | All versions accepted | Resolved per-copy from the matched template owner's … |
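The three policies named on this page (`pinned`, `latest`, and `any`) can be sketched as a filter over templates that already matched on aimId. Assumptions: each template has a hypothetical `version` key holding a dotted numeric string, and `pinned` compares it with the tag of `spec.image`, as in the pinned-version example where 0.8.5 is extracted from the image tag.

```python
from typing import Optional


def image_tag(image: str) -> Optional[str]:
    """Tag of an image reference; a ':' before the last '/' is a registry port, not a tag."""
    last = image.split("@", 1)[0].rpartition("/")[2]
    return last.split(":", 1)[1] if ":" in last else None


def _version_key(version: str) -> tuple:
    # Assumption: versions are dotted numeric strings, compared numerically.
    return tuple(int(part) for part in version.split("."))


def filter_by_policy(templates: list, policy: str, image: Optional[str] = None) -> list:
    """Apply a versionPolicy to templates that already matched on aimId."""
    if policy == "pinned":
        tag = image_tag(image) if image else None
        if tag is None:
            raise ValueError("pinned needs spec.image with a version tag")
        return [t for t in templates if t["version"] == tag]
    if policy == "latest":
        if not templates:
            return []
        newest = max(_version_key(t["version"]) for t in templates)
        return [t for t in templates if _version_key(t["version"]) == newest]
    if policy == "any":
        return list(templates)
    raise ValueError(f"unknown versionPolicy: {policy!r}")
```

Comparing versions numerically matters here: as strings, "0.8.5" would sort above "0.10.0".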
Deployment Image Resolution
Every official AIM model image declares the base image it was built from (e.g. ghcr.io/silogen/aim-base:0.11). The image build records this reference in the AIM_BASE_IMAGE_REF environment variable of the image's OCI config; the AIM image inspector reads that variable and stores it on the owning model's status.imageMetadata.baseImageRef.
The fine-tuned model’s spec.image is never patched. Instead, the controller resolves a deployment image once per matched template and stamps it onto the generated AIMServiceTemplate / AIMClusterServiceTemplate copy as the aim.eai.amd.com/deployment-image-ref annotation. AIMService reads this annotation when constructing the KServe InferenceService, falling back to AIMModel.spec.image only when the annotation is absent.
This per-copy resolution is necessary because matched templates may span owners with different base images — different architectures (e.g. aim-base vs aim-epyc-base) for the same aimId, or different versions under versionPolicy: any. Pinning a single image on the fine-tuned model would force every copy to share that image; annotating each copy individually keeps the deployments precisely aligned with the source they were derived from.
For each matched template, the controller:
1. Takes the matched template's owning model (`AIMModel` or `AIMClusterModel`).
2. Reads the owner's `status.imageMetadata.baseImageRef`.
3. Rebases that reference onto the owner's `spec.image` registry+org so the fine-tuned deployment pulls `aim-base` from the same place the base model was pulled from (see below).
4. Stamps the result onto the generated copy as `aim.eai.amd.com/deployment-image-ref`.
For versionPolicy: pinned with an explicit spec.image, the resolver short-circuits and stamps spec.image on every copy.
Registry rebasing. Official AIM images bake a docker.io/amdenterpriseai/aim-base:… reference into their OCI config, but operators frequently mirror both the base model and aim-base into a private registry. Rather than force fine-tuned deployments to reach back to Docker Hub, the resolver swaps the registry+org prefix of baseImageRef with the prefix of the source owner’s spec.image. Concretely:
| Source owner | Owner's … | Resolved annotation on copy |
|---|---|---|
| | | |
| | | |
| | | |
This means mirroring aim-base into the same org as the base model is sufficient — no cluster-wide configuration, no registry rewriting rules.
If a matched template’s owner has no resolvable image (no baseImageRef, no legacy fallback, owner not yet fetchable), the controller skips that copy and retries on the next reconcile. Image inspection is skipped on the fine-tuned model itself — it inherits its deployment plumbing from the matched templates’ owners, not from its own image metadata.
Legacy installs. baseImageRef is extracted from the base model's image during metadata inspection, but inspection is skipped once status.imageMetadata is cached. Clusters upgraded from a release that predates this feature therefore have existing base models with populated metadata but baseImageRef == "". For those owners, the resolver falls back to synthesizing aim-base:MAJOR.MINOR from the owner's spec.image tag (rebased onto its registry+org as above), so fine-tuned models keep working without a manual re-inspection. When the tag isn't semver-shaped, the fallback is skipped and that copy is omitted. To get the real, image-declared reference, clear the base model's status.imageMetadata to force a fresh inspection.
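Putting these rules together, per-copy resolution might look like the sketch below. It follows the order described in this section: the `pinned` short-circuit, then rebasing `baseImageRef` onto the owner's registry+org, then the legacy `aim-base:MAJOR.MINOR` fallback. Assumptions: "registry+org" means everything before the last `/` of the owner's reference, and "semver-shaped" means `MAJOR.MINOR.PATCH` with an optional pre-release or build suffix. The function names and the registries in the assertions are hypothetical.

```python
import re
from typing import Optional

# Assumption: "semver-shaped" = MAJOR.MINOR.PATCH plus an optional pre-release/build suffix.
_SEMVER = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:[-+][0-9A-Za-z.+-]*)?")


def _prefix(image: str) -> str:
    """Registry+org part of a reference: everything before the last '/' (digest removed)."""
    return image.split("@", 1)[0].rpartition("/")[0]


def _join(prefix: str, name: str) -> str:
    return f"{prefix}/{name}" if prefix else name


def legacy_base_image(owner_image: str) -> Optional[str]:
    """Synthesize <owner prefix>/aim-base:MAJOR.MINOR from the owner's tag, if possible."""
    last = owner_image.split("@", 1)[0].rpartition("/")[2]
    if ":" not in last:
        return None  # untagged: nothing to derive a version from
    match = _SEMVER.fullmatch(last.split(":", 1)[1])
    if match is None:
        return None  # tag is not semver-shaped: this copy is omitted
    return _join(_prefix(owner_image), f"aim-base:{match.group(1)}.{match.group(2)}")


def resolve_deployment_image(policy: str, model_image: str,
                             owner_image: str, base_image_ref: str) -> Optional[str]:
    """Value for aim.eai.amd.com/deployment-image-ref, or None to skip the copy for now."""
    if policy == "pinned" and model_image:
        return model_image  # short-circuit: every copy uses the fine-tuned model's spec.image
    if base_image_ref:
        # Keep aim-base's name and tag, but pull it from the owner's registry+org.
        return _join(_prefix(owner_image), base_image_ref.rpartition("/")[2])
    return legacy_base_image(owner_image)  # legacy owners with baseImageRef == ""
```

A `None` result corresponds to the "skip that copy and retry on the next reconcile" case.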
Template Copies
For each matched template, the controller creates a copy scoped to the owning model:
- AIMModel (namespace-scoped) creates AIMServiceTemplate copies in the same namespace
- AIMClusterModel (cluster-scoped) creates AIMClusterServiceTemplate copies
Copies inherit all configuration from the original template (hardware, profile, engine args, environment) and override:
- `spec.modelName`: points to the fine-tuned model
- `spec.modelSources`: uses the custom weight source from the fine-tuned model
- Labels: `aim.eai.amd.com/model: <model-name>`, `aim.eai.amd.com/origin: fine-tuned`
- Annotation: `aim.eai.amd.com/deployment-image-ref: <resolved-image>`, the image `AIMService` will deploy for this copy
Each copy carries its own deployment image annotation, so heterogeneous matches (different versions under versionPolicy: any, or different base families under the same aimId) deploy with the correct image per copy.
Copies are owned by the model and garbage-collected when the model is deleted. The controller watches for new or deleted matching templates and reconciles copies accordingly.
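A sketch of deriving one copy from an official template, over plain dicts. It leaves `metadata.name` unset because this page does not describe how copies are named. It also omits the owner reference that makes copies garbage-collectable, so both are left to a hypothetical caller. `make_fine_tuned_copy` is a hypothetical name.

```python
import copy


def make_fine_tuned_copy(template: dict, model: dict, deployment_image: str) -> dict:
    """Derive a fine-tuned copy of an official template (plain-dict sketch)."""
    new = copy.deepcopy(template)  # hardware, profile, engine args, env are inherited as-is
    meta = new.setdefault("metadata", {})
    # The copy's scope follows the fine-tuned model.
    if model["kind"] == "AIMModel":
        new["kind"] = "AIMServiceTemplate"
        meta["namespace"] = model["metadata"]["namespace"]
    else:
        new["kind"] = "AIMClusterServiceTemplate"
        meta.pop("namespace", None)
    meta.pop("name", None)  # copy naming is not described here; left to the caller
    spec = new.setdefault("spec", {})
    spec["modelName"] = model["metadata"]["name"]
    spec["modelSources"] = copy.deepcopy(model["spec"]["modelSources"])
    meta.setdefault("labels", {}).update({
        "aim.eai.amd.com/model": model["metadata"]["name"],
        "aim.eai.amd.com/origin": "fine-tuned",
    })
    meta.setdefault("annotations", {})["aim.eai.amd.com/deployment-image-ref"] = deployment_image
    return new
```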
Status
Fine-tuned models report sourceType: Custom in their status, the same as fully custom models:
```yaml
status:
  status: Ready
  sourceType: Custom
```
To see the per-copy deployment image, list the generated AIMServiceTemplate copies in the model's namespace together with their annotation:
```shell
kubectl -n <namespace> get aimservicetemplate -l aim.eai.amd.com/model=<model-name> \
  -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.aim\.eai\.amd\.com/deployment-image-ref}{"\n"}{end}'
```
Note on Terminology
AIM Model resources (AIMModel and AIMClusterModel) define the mapping between model identifiers and container images. While we sometimes refer to the “model catalog” conceptually, the Kubernetes resources are always AIMModel and AIMClusterModel.