Profiles#
Profiles are self-contained runtime configurations for AI inference workloads. A profile carries everything needed to deploy a model — accelerator requirements, resource requests, engine arguments, and the container image — without referencing any other resource.
Overview#
An AIMProfile answers five questions about a deployment without consulting any other resource:
Model architecture: What model family does this serve? (aimId)
Accelerator: What hardware is required? (acceleratorModel, acceleratorType, acceleratorCount)
Engine configuration: How should the inference engine be configured? (engineArgs, engineEnv)
Container image: What image runs the workload? (image)
Optimization target: Latency or throughput? At what precision? (metric, precision, type)
Cluster vs Namespace Scope#
AIMClusterProfile#
Cluster-scoped profiles are installed by administrators, typically created during model discovery from AIM container images. They are visible across all namespaces.
Key characteristics:
Shared across all namespaces
Provide validated, production-ready runtime configurations
Can be created manually or, in the future, automatically during model discovery by a v1alpha2 model controller
AIMProfile#
Namespace-scoped profiles are created by ML engineers for custom configurations, fine-tuned models, or team-specific overrides.
Key characteristics:
Visible only within their namespace
Support namespace-specific secrets and authentication
Can enable model caching via spec.caching.enabled
Used for custom weight deployments and per-team overrides
When both a namespace-scoped and cluster-scoped profile match, the namespace-scoped profile takes precedence.
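The precedence rule can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not the controller's actual code: `ProfileRef` and `pick_by_scope` are hypothetical names, and matching is reduced to "the profile is already in the candidate list".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProfileRef:
    """Hypothetical, simplified view of a profile (illustration only)."""
    name: str
    namespace: Optional[str]  # None means cluster-scoped (AIMClusterProfile)


def pick_by_scope(candidates: list[ProfileRef], namespace: str) -> Optional[ProfileRef]:
    """Prefer a namespace-scoped match; fall back to a cluster-scoped one."""
    namespaced = [p for p in candidates if p.namespace == namespace]
    if namespaced:
        return namespaced[0]
    cluster = [p for p in candidates if p.namespace is None]
    return cluster[0] if cluster else None


candidates = [
    ProfileRef("qwen-qwen3-32b-mi300x-lat-fp8", None),
    ProfileRef("my-finetuned-qwen-mi300x-lat-fp8", "ml-team"),
]
print(pick_by_scope(candidates, "ml-team").name)  # my-finetuned-qwen-mi300x-lat-fp8
```

A service in any other namespace would fall through to the cluster-scoped profile, because the ml-team profile is not visible there.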
Profile Specification#
apiVersion: aim.eai.amd.com/v1alpha2
kind: AIMClusterProfile
metadata:
  name: qwen-qwen3-32b-mi300x-lat-fp8
spec:
  aimId: qwen/qwen3-32b
  modelId: qwen/qwen3-32b-fp8
  engine: vllm
  metric: latency
  precision: fp8
  type: optimized
  primary: true
  acceleratorModel: MI300X
  acceleratorType: gpu
  acceleratorCount: 1
  resources: # optional override: cpu/memory requests
    requests:
      cpu: "4"
      memory: 32Gi
  image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5
  engineArgs:
    distributed_executor_backend: mp
    gpu-memory-utilization: "0.95"
  engineEnv:
    VLLM_DO_NOT_TRACK: "1"
  modelSources:
    - modelId: qwen/qwen3-32b-fp8
      sourceUri: hf://qwen/qwen3-32b-fp8
Spec Fields#
| Field | Description |
|---|---|
| aimId | Model architecture identifier (e.g., qwen/qwen3-32b) |
| modelId | Specific model variant or HuggingFace URI (e.g., qwen/qwen3-32b-fp8) |
|  | Unique identifier for the source profile within an AIM container image (e.g., …) |
| engine | Inference engine identifier (e.g., vllm) |
| metric | Optimization target: latency or throughput |
| precision | Numeric precision (e.g., fp8 or fp32) |
| type | Optimization level. Hierarchy: … |
| primary | Marks this as the recommended profile for its model and configuration. See Primary Profiles. Defaults to … |
| acceleratorModel | Accelerator identifier for node selection (e.g., MI300X) |
| acceleratorType |  |
| acceleratorCount | Number of accelerator units required. Combined with … |
| resources | Optional override for Kubernetes … |
| image | Required. Deployment container image. For purpose-built profiles: the full AIM image. For custom weight profiles: the base image. |
| engineArgs | Inference engine CLI arguments as a free-form JSON object. Supports typed values (integers, floats, booleans, strings). Converted to … |
| engineEnv | Environment variables passed directly to the inference engine process. These are distinct from … |
| modelSources | Model artifact sources with download URIs. Populated during discovery or set manually. |
|  | Container-level environment variables for the AIM runtime process (K8s pod spec). |
|  | Secrets for pulling container images. |
|  | Service account for workloads. |
Namespace-Specific Fields#
These fields are only available on namespace-scoped AIMProfile, not on AIMClusterProfile.
| Field | Description |
|---|---|
| caching | Caching configuration. When … |
Note
Caching for cluster profiles
Cluster-scoped profiles do not have a caching field. Caching support for AIMClusterProfile via a dedicated AIMProfileCache resource is planned.
Accelerator and Node Affinity#
Three flat spec fields describe the hardware accelerator:
| Field | Description |
|---|---|
| acceleratorModel | Accelerator identifier for node selection (e.g., MI300X) |
| acceleratorType |  |
| acceleratorCount | Number of accelerator units required (e.g., GPU count or CPU core count). |
Node selection#
The profile specifies a single acceleratorModel string. AIM Engine constructs one label key and uses the Exists operator for node affinity:
acceleratorModel: MI300X → feature.node.kubernetes.io/aim-accelerator.MI300X (Exists)
The AcceleratorDetector (DaemonSet) labels each node with all applicable identifiers. For example, a node with an MI300X GPU and EPYC 9575F CPU gets:
feature.node.kubernetes.io/aim-accelerator.MI300X: "8" # GPU model + count
feature.node.kubernetes.io/aim-accelerator.EPYC_ZEN5: "128" # CPU arch + core count
A profile targeting MI300X matches that node through the GPU label. A fallback profile targeting EPYC_ZEN5 matches it too, along with any other Zen5 EPYC node. AIM Engine does not interpret the identifier: it constructs one label key from acceleratorModel and sets operator: Exists, so the label value (the accelerator count) is informational only.
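The label-key construction and the Exists check described above can be sketched as follows. The function names are hypothetical and the code only mirrors the behavior stated in this section; the expression dictionary follows the standard Kubernetes matchExpressions shape (key plus operator, no values).

```python
LABEL_PREFIX = "feature.node.kubernetes.io/aim-accelerator."


def accelerator_label_key(accelerator_model: str) -> str:
    """Build the single node label key for a profile's acceleratorModel."""
    return LABEL_PREFIX + accelerator_model


def affinity_expression(accelerator_model: str) -> dict:
    """One matchExpressions entry; Exists takes no values."""
    return {"key": accelerator_label_key(accelerator_model), "operator": "Exists"}


def node_matches(node_labels: dict[str, str], accelerator_model: str) -> bool:
    """Exists semantics: only the presence of the key matters, not its value."""
    return accelerator_label_key(accelerator_model) in node_labels


node_labels = {
    "feature.node.kubernetes.io/aim-accelerator.MI300X": "8",
    "feature.node.kubernetes.io/aim-accelerator.EPYC_ZEN5": "128",
}
print(affinity_expression("MI300X"))
print(node_matches(node_labels, "MI300X"))     # True
print(node_matches(node_labels, "EPYC_ZEN5"))  # True
print(node_matches(node_labels, "MI250"))      # False
```

Because the value is never read, changing the label from "8" to "4" would not affect matching; capacity is checked separately against the resolved resource requirements.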
Resource derivation#
AIM Engine computes default ResourceRequirements from acceleratorType, acceleratorCount, and cluster-level configuration (e.g., DCM ConfigMap for GPU partitioning modes, vendor-specific resource names), then merges any spec.resources override on top. The result is written to status.resources — the definitive resource requirements used for deployment.
spec:
  acceleratorModel: MI300X
  acceleratorType: gpu
  acceleratorCount: 1 # engine computes: amd.com/gpu: "1"
  resources: # optional override: cpu/memory
    requests:
      cpu: "4"
      memory: 32Gi
The operator checks cluster nodes against both the accelerator model label and status.resources capacity. If no node matches, the profile status becomes NotAvailable.
Partitioned GPUs#
For partitioned GPU configurations (e.g., CPX-NPS4, MIG), override the derived device resource in spec.resources:
spec:
  acceleratorModel: MI300X
  acceleratorType: gpu
  acceleratorCount: 0 # suppresses default amd.com/gpu derivation
  resources:
    requests:
      amd.com/cpx-nps4: "4" # partition-specific device resource
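The derive-then-merge behavior, including the acceleratorCount: 0 case, can be approximated with the sketch below. It is a simplification under stated assumptions: `derive_resources` is a hypothetical helper, the device resource name is passed in rather than read from cluster configuration, and the derived device entry is placed under requests only because the examples in this section do so.

```python
from typing import Optional


def derive_resources(
    accelerator_type: Optional[str],
    accelerator_count: int,
    override: Optional[dict] = None,
    gpu_resource_name: str = "amd.com/gpu",  # assumption: vendor resource name
) -> dict:
    """Compute default requests from the accelerator fields, then merge spec.resources on top."""
    requests: dict[str, str] = {}
    # A count of 0 suppresses the default device resource derivation.
    if accelerator_type == "gpu" and accelerator_count > 0:
        requests[gpu_resource_name] = str(accelerator_count)
    # Entries from the override win over derived defaults with the same key.
    requests.update((override or {}).get("requests", {}))
    return {"requests": requests}


whole_gpu = derive_resources("gpu", 1, {"requests": {"cpu": "4", "memory": "32Gi"}})
partitioned = derive_resources("gpu", 0, {"requests": {"amd.com/cpx-nps4": "4"}})
print(whole_gpu)
print(partitioned)
```

In the first call the result carries the derived amd.com/gpu entry alongside the cpu and memory overrides; in the second, only the partition-specific resource remains.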
Profile Status#
Status Fields#
| Field | Type | Description |
|---|---|---|
|  | enum | Pending, Ready, Degraded, or NotAvailable |
|  | string | Extracted from … |
| matchingNodes | int32 | Count of cluster nodes matching accelerator labels and resource requests |
|  | string | Human-readable summary (e.g., …) |
| resources | ResourceRequirements | Definitive resource requirements used for deployment: defaults computed from accelerator fields + cluster config, merged with any spec.resources override |
| resolvedNodeAffinity | NodeAffinity | Computed node affinity rules for pod scheduling |
| conditions | []Condition | Standard Kubernetes conditions |
Status Lifecycle#
Pending: Profile created, not yet reconciled.
Ready: At least one cluster node matches the accelerator labels and has sufficient resource capacity. Profiles without accelerator requirements (CPU-only) are always Ready.
Degraded: The controller encountered a transient error (e.g., failed to list cluster nodes). The profile will be re-evaluated automatically.
NotAvailable: No cluster nodes match the profile’s hardware requirements. The profile becomes Ready automatically when matching nodes are added to the cluster.
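As a rough illustration, the four states can be expressed as a small decision function. The inputs are hypothetical simplifications (the real controller works from node lists and conditions), and a failed node listing is modeled here as `matching_nodes=None`.

```python
from typing import Optional


def profile_status(
    reconciled: bool,
    has_hardware_requirements: bool,
    matching_nodes: Optional[int],  # None models a transient failure to list nodes
) -> str:
    """Map the observed state of a profile to one of the four lifecycle states."""
    if not reconciled:
        return "Pending"
    if not has_hardware_requirements:
        return "Ready"  # CPU-only profiles are always Ready
    if matching_nodes is None:
        return "Degraded"
    return "Ready" if matching_nodes > 0 else "NotAvailable"


print(profile_status(False, True, None))  # Pending
print(profile_status(True, False, None))  # Ready
print(profile_status(True, True, None))   # Degraded
print(profile_status(True, True, 0))      # NotAvailable
print(profile_status(True, True, 3))      # Ready
```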
Conditions#
HardwareAvailable: Reports whether the cluster has nodes matching the profile’s requirements.
| Status | Reason | Description |
|---|---|---|
| True |  | Matching nodes found in cluster |
| True |  | No accelerator requirements; always available |
| False |  | No cluster nodes match accelerator labels and resource requests |
The profile controller watches node events and re-evaluates hardware availability when nodes are added, removed, or their GPU labels change.
Primary Profiles#
AIM container images are built by profile authors — the team that benchmarks models on specific hardware and publishes validated runtime configurations. A single image may contain many profiles covering different precisions, optimization targets, and GPU configurations. The primary field marks the profile that the authors recommend as the default for a given model and hardware combination.
When primary: true:
The profile is selected by default when deploying a service without an explicit profile or template reference
The profile is used as the base configuration when onboarding custom model weights that match the same aimId and precision
The profile is preferred when multiple candidates match during automatic selection
Non-primary profiles remain available for explicit selection but are not considered during automatic selection by default.
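A minimal sketch of how the primary flag could shape automatic selection is shown below. `Candidate`, `auto_select`, and the `include_non_primary` switch are hypothetical; the switch only stands in for the "by default" qualifier above and does not correspond to a documented option.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical, simplified profile candidate (illustration only)."""
    name: str
    primary: bool


def auto_select(candidates: list[Candidate], include_non_primary: bool = False) -> Optional[Candidate]:
    """Consider only primary profiles by default; prefer primary ones when others are allowed."""
    pool = [c for c in candidates if c.primary or include_non_primary]
    # sorted() is stable, so primary profiles move to the front and ties keep input order.
    pool = sorted(pool, key=lambda c: not c.primary)
    return pool[0] if pool else None


candidates = [
    Candidate("qwen-qwen3-32b-mi300x-thr-fp8", primary=False),
    Candidate("qwen-qwen3-32b-mi300x-lat-fp8", primary=True),
]
print(auto_select(candidates).name)  # qwen-qwen3-32b-mi300x-lat-fp8
```

The throughput profile name in the example is made up for illustration; it remains reachable only through an explicit reference.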
Examples#
Cluster Profile — Latency Optimized#
apiVersion: aim.eai.amd.com/v1alpha2
kind: AIMClusterProfile
metadata:
  name: qwen-qwen3-32b-mi300x-lat-fp8
spec:
  aimId: qwen/qwen3-32b
  modelId: qwen/qwen3-32b-fp8
  engine: vllm
  metric: latency
  precision: fp8
  type: optimized
  primary: true
  acceleratorModel: MI300X
  acceleratorType: gpu
  acceleratorCount: 1
  image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5
  engineArgs:
    gpu-memory-utilization: "0.95"
Namespace Profile — Custom Weights#
apiVersion: aim.eai.amd.com/v1alpha2
kind: AIMProfile
metadata:
  name: my-finetuned-qwen-mi300x-lat-fp8
  namespace: ml-team
spec:
  aimId: qwen/qwen3-32b
  modelId: my-org/qwen-finetuned-fp8
  engine: vllm
  metric: latency
  precision: fp8
  type: general
  primary: false
  acceleratorModel: MI300X
  acceleratorType: gpu
  acceleratorCount: 2
  resources:
    requests:
      cpu: "8"
      memory: 64Gi
  image: amdenterpriseai/aim-base:0.8.5
  engineArgs:
    distributed_executor_backend: mp
    gpu-memory-utilization: "0.90"
    tensor-parallel-size: "2"
  modelSources:
    - modelId: my-org/qwen-finetuned-fp8
      sourceUri: s3://my-bucket/fp8-weights/
      precision: fp8
CPU-Only Profile#
apiVersion: aim.eai.amd.com/v1alpha2
kind: AIMProfile
metadata:
  name: small-model-cpu
  namespace: dev-team
spec:
  aimId: microsoft/phi-2
  engine: vllm
  metric: latency
  precision: fp32
  type: general
  primary: false
  image: amdenterpriseai/aim-phi-2:0.8.5
Profiles without accelerator fields (acceleratorModel, acceleratorType, acceleratorCount) and without extended device resource requests are treated as CPU-only and are always Ready.
Troubleshooting#
Profile Stuck in NotAvailable#
The profile’s accelerator requirements don’t match any cluster node:
# Check which nodes the profile expects
kubectl get aimclusterprofile <name> -o jsonpath='{.status.resolvedNodeAffinity}' | jq
# Check resolved resources
kubectl get aimclusterprofile <name> -o jsonpath='{.status.resources}' | jq
# Check matching node count
kubectl get aimclusterprofile <name> -o jsonpath='{.status.matchingNodes}'
# List nodes with accelerator labels
kubectl get nodes -l feature.node.kubernetes.io/aim-accelerator.MI300X
Common causes:
GPU nodes not yet added to the cluster
AcceleratorDetector not labeling nodes
Wrong accelerator model name in the profile
Profile Shows Wrong Hardware Summary#
The hardware summary is derived from acceleratorModel and acceleratorCount. Verify the accelerator fields:
kubectl get aimclusterprofile <name> -o yaml | grep -E 'accelerator(Model|Type|Count)'
Migration from Service Templates#
Profiles (v1alpha2) replace Service Templates (v1alpha1). Both API versions coexist during the transition period — existing v1alpha1 Service Templates continue to work. New deployments should use profiles.
Key differences for migrating users:
Profiles carry their own container image (spec.image), removing the dependency on model lookups
Hardware requirements use flat accelerator fields (acceleratorModel, acceleratorType, acceleratorCount). The controller derives Kubernetes device resource requests from the accelerator count. Partitioned GPUs (e.g., amd.com/cpx-nps4) are supported via explicit resources overrides
Precision is always explicit: there is no auto value
A general optimization tier is available between optimized and preview
See the Service Templates documentation for v1alpha1 usage.