Profiles#

Profiles are self-contained runtime configurations for AI inference workloads. A profile carries everything needed to deploy a model — accelerator requirements, resource requests, engine arguments, and the container image — without referencing any other resource.

Overview#

An AIMProfile answers five questions about a deployment without consulting any other resource:

Model architecture — What model family does this serve? (aimId)
Accelerator — What hardware is required? (accelerator: type, model, count)
Engine configuration — How should the inference engine be configured? (engineArgs, engineEnv)
Container image — What image runs the workload? (image)
Optimization target — Latency or throughput? At what precision? (metric, precision, type)

Cluster vs Namespace Scope#

AIMClusterProfile#

Cluster-scoped profiles are installed by administrators, typically created during model discovery from AIM container images. They are visible across all namespaces.

Key characteristics:

Shared across all namespaces
Provide validated, production-ready runtime configurations
Can be created manually or, in the future, automatically during model discovery by a v1alpha2 model controller

AIMProfile#

Namespace-scoped profiles are created by ML engineers for custom configurations, fine-tuned models, or team-specific overrides.

Key characteristics:

Visible only within their namespace
Support namespace-specific secrets and authentication
Can enable model caching via spec.caching.enabled
Used for custom weight deployments and per-team overrides

When both a namespace-scoped and cluster-scoped profile match, the namespace-scoped profile takes precedence.
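The precedence rule above can be sketched in a few lines of Python. This is only an illustration: `ProfileRef` and `pick_by_scope` are hypothetical names, and the candidates are assumed to have already matched the requested model and hardware.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ProfileRef:
    name: str
    namespace: Optional[str]  # None stands for a cluster-scoped AIMClusterProfile


def pick_by_scope(candidates: List[ProfileRef], namespace: str) -> Optional[ProfileRef]:
    """Prefer a namespace-scoped match; fall back to a cluster-scoped one."""
    # Profiles from other namespaces are not visible to this namespace.
    visible = [c for c in candidates if c.namespace in (None, namespace)]
    # False sorts before True, so namespace-scoped profiles come first.
    visible.sort(key=lambda c: c.namespace is None)
    return visible[0] if visible else None


cluster = ProfileRef("qwen-qwen3-32b-mi300x-lat-fp8", None)
local = ProfileRef("my-finetuned-qwen-mi300x-lat-fp8", "ml-team")
print(pick_by_scope([cluster, local], "ml-team").name)
```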

Profile Specification#

apiVersion: aim.eai.amd.com/v1alpha2
kind: AIMClusterProfile
metadata:
  name: qwen-qwen3-32b-mi300x-lat-fp8
spec:
  aimId: qwen/qwen3-32b
  modelId: qwen/qwen3-32b-fp8
  engine: vllm
  metric: latency
  precision: fp8
  type: optimized
  primary: true

  acceleratorModel: MI300X
  acceleratorType: gpu
  acceleratorCount: 1
  resources:                          # optional override: cpu/memory requests
    requests:
      cpu: "4"
      memory: 32Gi

  image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5
  engineArgs:
    distributed-executor-backend: mp
    gpu-memory-utilization: "0.95"
  engineEnv:
    VLLM_DO_NOT_TRACK: "1"

  modelSources:
    - modelId: qwen/qwen3-32b-fp8
      sourceUri: hf://qwen/qwen3-32b-fp8

Spec Fields#

Field	Description
`aimId`	Model architecture identifier (e.g., `qwen/qwen3-32b`). Used to match profiles to models during automatic selection. Immutable after creation.
`modelId`	Specific model variant or HuggingFace URI (e.g., `qwen/qwen3-32b-fp8`). Determines the cache path and is used to match profiles to specific model weights.
`profileId`	Unique identifier for the source profile within an AIM container image (e.g., `vllm-mi300x-fp8-tp1-latency`). Set during discovery to trace this profile back to its origin in the container. Not required for manually created profiles.
`engine`	Inference engine identifier (e.g., `vllm`, `tgi`).
`metric`	Optimization target: `latency` (interactive) or `throughput` (batch processing).
`precision`	Numeric precision: `fp4`, `fp8`, `fp16`, `fp32`, `bf16`, `int4`, `int8`.
`type`	Optimization level. Hierarchy: `optimized` > `general` > `preview` > `unoptimized`.
`primary`	Marks this as the recommended profile for its model and configuration. See Primary Profiles. Defaults to `false`.
`acceleratorModel`	Accelerator identifier for node selection (e.g., `MI300X`, `CDNA3`, `EPYC_ZEN5`). See Accelerator and Node Affinity.
`acceleratorType`	`gpu` or `cpu`. Determines resource derivation strategy. AIM Engine computes default resource requests from this field combined with `acceleratorCount` and cluster-level configuration.
`acceleratorCount`	Number of accelerator units required. Combined with `acceleratorType` and cluster config to compute default resource requests in `status.resources`.
`resources`	Optional override for Kubernetes `ResourceRequirements`. When set, merged on top of the defaults that AIM Engine computes. The resolved result is in `status.resources`.
`image`	Required. Deployment container image. For purpose-built profiles: the full AIM image. For custom weight profiles: the base image.
`engineArgs`	Inference engine CLI arguments as a free-form JSON object. Supports typed values (integers, floats, booleans, strings). Converted to `--key value` flags by the AIM runtime.
`engineEnv`	Environment variables passed directly to the inference engine process. These are distinct from `containerEnv`, which sets variables on the outer container.
`modelSources`	Model artifact sources with download URIs. Populated during discovery or set manually.
`containerEnv`	Container-level environment variables for the AIM runtime process (K8s pod spec).
`imagePullSecrets`	Secrets for pulling container images.
`serviceAccountName`	Service account for workloads.
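To make the `engineArgs` row concrete, the sketch below flattens a mapping into `--key value` tokens. It is not the AIM runtime's code: `engine_args_to_flags` is a hypothetical helper, and the handling of booleans (bare switch for `true`, omitted for `false`) is an assumption that the table does not specify.

```python
from typing import Any, Dict, List


def engine_args_to_flags(engine_args: Dict[str, Any]) -> List[str]:
    """Flatten an engineArgs mapping into a list of CLI tokens."""
    flags: List[str] = []
    for key, value in engine_args.items():
        flag = f"--{key}"
        if isinstance(value, bool):
            # Assumption: true becomes a bare switch, false is dropped.
            if value:
                flags.append(flag)
        else:
            # Integers, floats, and strings are all rendered as text.
            flags.extend([flag, str(value)])
    return flags


print(engine_args_to_flags({"gpu-memory-utilization": "0.95", "tensor-parallel-size": 2}))
```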

Namespace-Specific Fields#

These fields are available only on the namespace-scoped AIMProfile, not on AIMClusterProfile.

Field	Description
`caching`	Caching configuration. When `caching.enabled` is `true`, model artifacts are pre-downloaded to a PVC on startup.

Note

Caching for cluster profiles: Cluster-scoped profiles do not have a caching field. Caching support for AIMClusterProfile via a dedicated AIMProfileCache resource is planned.

Accelerator and Node Affinity#

Three flat spec fields describe the hardware accelerator:

Field	Description
`acceleratorModel`	Accelerator identifier for node selection (e.g., `MI300X`, `CDNA3`, `EPYC_ZEN5`). Maps to a node label key with an `Exists` selector.
`acceleratorType`	`gpu` or `cpu`. Determines how AIM Engine derives Kubernetes resource requests.
`acceleratorCount`	Number of accelerator units required (e.g., GPU count or CPU core count).

Node selection#

The profile specifies a single acceleratorModel string. AIM Engine constructs one label key and uses the Exists operator for node affinity:

acceleratorModel: MI300X  →  feature.node.kubernetes.io/aim-accelerator.MI300X  (Exists)

The AcceleratorDetector (DaemonSet) labels each node with all applicable identifiers. For example, a node with eight MI300X GPUs and 128 EPYC 9575F (Zen 5) CPU cores gets:

feature.node.kubernetes.io/aim-accelerator.MI300X: "8"      # GPU model + count
feature.node.kubernetes.io/aim-accelerator.EPYC_ZEN5: "128" # CPU arch + core count

A profile targeting MI300X matches that node exactly. A fallback profile targeting EPYC_ZEN5 matches it as well, along with any other Zen 5 EPYC node. AIM Engine never inspects the hardware itself: it constructs one label key from acceleratorModel and sets operator: Exists. The label value (the accelerator count) is informational only.
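The label-key construction and the Exists match can be sketched as follows. The label prefix comes from the examples above; the function names are hypothetical, and the affinity term is an ordinary Kubernetes `matchExpressions` entry written as a Python dict.

```python
from typing import Dict

LABEL_PREFIX = "feature.node.kubernetes.io/aim-accelerator."


def accelerator_label_key(accelerator_model: str) -> str:
    """Build the single node label key derived from acceleratorModel."""
    return LABEL_PREFIX + accelerator_model


def node_affinity_term(accelerator_model: str) -> Dict:
    """Return a nodeSelectorTerm-style dict using the Exists operator."""
    return {
        "matchExpressions": [
            {"key": accelerator_label_key(accelerator_model), "operator": "Exists"}
        ]
    }


def node_matches(node_labels: Dict[str, str], accelerator_model: str) -> bool:
    # Exists ignores the label value, so only key presence is tested.
    return accelerator_label_key(accelerator_model) in node_labels


node = {
    LABEL_PREFIX + "MI300X": "8",
    LABEL_PREFIX + "EPYC_ZEN5": "128",
}
print(node_matches(node, "MI300X"), node_matches(node, "MI325X"))
```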

Resource derivation#

AIM Engine computes default ResourceRequirements from acceleratorType, acceleratorCount, and cluster-level configuration (e.g., DCM ConfigMap for GPU partitioning modes, vendor-specific resource names), then merges any spec.resources override on top. The result is written to status.resources — the definitive resource requirements used for deployment.

spec:
  acceleratorModel: MI300X
  acceleratorType: gpu
  acceleratorCount: 1        # engine computes: amd.com/gpu: "1"
  resources:                  # optional override: cpu/memory
    requests:
      cpu: "4"
      memory: 32Gi

The operator checks cluster nodes against both the accelerator model label and status.resources capacity. If no node matches, the profile status becomes NotAvailable.

Partitioned GPUs#

For partitioned GPU configurations (e.g., CPX-NPS4, MIG), override the derived device resource in spec.resources:

spec:
  acceleratorModel: MI300X
  acceleratorType: gpu
  acceleratorCount: 0             # suppresses default amd.com/gpu derivation
  resources:
    requests:
      amd.com/cpx-nps4: "4"      # partition-specific device resource
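The derive-then-merge behavior described in this section can be approximated as below. This is a simplified sketch, not the controller's logic: `derive_resources` is a hypothetical name, only the GPU case is modeled, cluster-level configuration is reduced to a single `gpu_resource_name` parameter, and the default is written to `requests` only (an assumption).

```python
from typing import Dict, Optional

Resources = Dict[str, Dict[str, str]]


def derive_resources(
    accelerator_type: str,
    accelerator_count: int,
    override: Optional[Resources] = None,
    gpu_resource_name: str = "amd.com/gpu",  # assumed default device resource
) -> Resources:
    """Compute defaults from the accelerator fields, then overlay spec.resources."""
    resolved: Resources = {"requests": {}}
    # A count of 0 suppresses the default device request (partitioned GPUs).
    if accelerator_type == "gpu" and accelerator_count > 0:
        resolved["requests"][gpu_resource_name] = str(accelerator_count)
    # Override entries win over computed defaults, key by key.
    for section, values in (override or {}).items():
        resolved.setdefault(section, {}).update(values)
    return resolved


print(derive_resources("gpu", 1, {"requests": {"cpu": "4", "memory": "32Gi"}}))
print(derive_resources("gpu", 0, {"requests": {"amd.com/cpx-nps4": "4"}}))
```

The first call mirrors the whole-GPU example under Resource derivation; the second mirrors the partitioned example, where only the partition-specific resource remains.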

Profile Status#

Status Fields#

Field	Type	Description
`status`	enum	`Pending`, `Progressing`, `Ready`, `Degraded`, `Failed`, `NotAvailable`
`version`	string	Extracted from `spec.image` tag (e.g., `0.8.5`)
`matchingNodes`	int32	Count of cluster nodes matching accelerator labels and resource requests
`hardwareSummary`	string	Human-readable summary (e.g., `1 x MI300X`, `CPU`)
`resources`	ResourceRequirements	Definitive resource requirements used for deployment: defaults computed from accelerator fields + cluster config, merged with any spec.resources override
`resolvedNodeAffinity`	NodeAffinity	Computed node affinity rules for pod scheduling
`conditions`	[]Condition	Standard Kubernetes conditions
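As an illustration of how `version` could be read from the `spec.image` tag, here is a minimal parser. `image_tag` is a hypothetical helper; how the controller handles untagged or digest-pinned images is not stated in this document, so returning `None` in that case is an assumption.

```python
from typing import Optional


def image_tag(image: str) -> Optional[str]:
    """Return the tag portion of an image reference, or None if absent."""
    # Drop a digest suffix such as "@sha256:..." before looking for the tag.
    name = image.split("@", 1)[0]
    last_segment = name.rsplit("/", 1)[-1]
    # Only a colon in the last path segment separates the tag; a colon
    # earlier in the reference belongs to a registry port.
    if ":" not in last_segment:
        return None
    return last_segment.rsplit(":", 1)[1]


print(image_tag("amdenterpriseai/aim-qwen-qwen3-32b:0.8.5"))
```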

Status Lifecycle#

Pending — Profile created, not yet reconciled
Ready — At least one cluster node matches the accelerator labels and has sufficient resource capacity. For profiles without accelerator requirements (CPU-only), the profile is always Ready.
Degraded — The controller encountered a transient error (e.g., failed to list cluster nodes). The profile will be re-evaluated automatically.
NotAvailable — No cluster nodes match the profile’s hardware requirements. The profile becomes Ready automatically when matching nodes are added to the cluster.

Conditions#

HardwareAvailable: Reports whether the cluster has nodes matching the profile’s requirements.

Status	Reason	Description
`True`	`HardwareAvailable`	Matching nodes found in cluster
`True`	`NoAcceleratorSpecified`	No accelerator requirements — always available
`False`	`HardwareNotAvailable`	No cluster nodes match accelerator labels and resource requests

The profile controller watches node events and re-evaluates hardware availability when nodes are added, removed, or their accelerator labels change.
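The lifecycle and condition tables can be summarized as a small decision function. This is a sketch of the documented outcomes only: `evaluate_hardware` is a hypothetical name, the `Progressing` and `Failed` states are not modeled, and the condition value during a `Degraded` evaluation is left as `None` because this document does not specify it.

```python
from typing import Dict, Optional, Tuple

Condition = Optional[Tuple[str, str]]  # (HardwareAvailable status, reason)


def evaluate_hardware(
    has_accelerator: bool,
    matching_nodes: int,
    node_list_failed: bool = False,
) -> Dict[str, object]:
    """Map one reconcile pass to a profile status and condition."""
    if node_list_failed:
        # Transient error: the profile is re-evaluated later.
        return {"status": "Degraded", "condition": None}
    if not has_accelerator:
        # CPU-only profiles are always available.
        return {"status": "Ready", "condition": ("True", "NoAcceleratorSpecified")}
    if matching_nodes > 0:
        return {"status": "Ready", "condition": ("True", "HardwareAvailable")}
    return {"status": "NotAvailable", "condition": ("False", "HardwareNotAvailable")}


print(evaluate_hardware(has_accelerator=True, matching_nodes=0))
```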

Primary Profiles#

AIM container images are built by profile authors — the team that benchmarks models on specific hardware and publishes validated runtime configurations. A single image may contain many profiles covering different precisions, optimization targets, and GPU configurations. The primary field marks the profile that the authors recommend as the default for a given model and hardware combination.

When primary: true:

The profile is selected by default when deploying a service without an explicit profile or template reference
The profile is used as the base configuration when onboarding custom model weights that match the same aimId and precision
The profile is preferred when multiple candidates match during automatic selection

Non-primary profiles remain available for explicit selection but are not considered during automatic selection by default.
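One way to picture automatic selection is the sketch below. The exact algorithm is not given here, so treat every detail as an assumption: `select_profile` is a hypothetical name, matching is reduced to `aimId`, and using the `type` hierarchy as a tie-breaker is a guess rather than documented behavior.

```python
from typing import Dict, List, Optional

# Lower rank is preferred: optimized > general > preview > unoptimized.
TYPE_RANK = {"optimized": 0, "general": 1, "preview": 2, "unoptimized": 3}


def select_profile(
    profiles: List[Dict],
    aim_id: str,
    include_non_primary: bool = False,
) -> Optional[Dict]:
    """Pick a default profile for aim_id, considering primary profiles first."""
    candidates = [
        p for p in profiles
        if p["aimId"] == aim_id and (include_non_primary or p.get("primary", False))
    ]
    # Primary profiles sort first; the type hierarchy breaks remaining ties.
    candidates.sort(key=lambda p: (not p.get("primary", False), TYPE_RANK[p["type"]]))
    return candidates[0] if candidates else None


profiles = [
    {"name": "thr-fp8", "aimId": "qwen/qwen3-32b", "type": "optimized", "primary": False},
    {"name": "lat-fp8", "aimId": "qwen/qwen3-32b", "type": "optimized", "primary": True},
]
print(select_profile(profiles, "qwen/qwen3-32b")["name"])
```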

Examples#

Cluster Profile — Latency Optimized#

apiVersion: aim.eai.amd.com/v1alpha2
kind: AIMClusterProfile
metadata:
  name: qwen-qwen3-32b-mi300x-lat-fp8
spec:
  aimId: qwen/qwen3-32b
  modelId: qwen/qwen3-32b-fp8
  engine: vllm
  metric: latency
  precision: fp8
  type: optimized
  primary: true
  acceleratorModel: MI300X
  acceleratorType: gpu
  acceleratorCount: 1
  image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5
  engineArgs:
    gpu-memory-utilization: "0.95"

Namespace Profile — Custom Weights#

apiVersion: aim.eai.amd.com/v1alpha2
kind: AIMProfile
metadata:
  name: my-finetuned-qwen-mi300x-lat-fp8
  namespace: ml-team
spec:
  aimId: qwen/qwen3-32b
  modelId: my-org/qwen-finetuned-fp8
  engine: vllm
  metric: latency
  precision: fp8
  type: general
  primary: false
  acceleratorModel: MI300X
  acceleratorType: gpu
  acceleratorCount: 2
  resources:
    requests:
      cpu: "8"
      memory: 64Gi
  image: amdenterpriseai/aim-base:0.8.5
  engineArgs:
    distributed-executor-backend: mp
    gpu-memory-utilization: "0.90"
    tensor-parallel-size: "2"
  modelSources:
    - modelId: my-org/qwen-finetuned-fp8
      sourceUri: s3://my-bucket/fp8-weights/
      precision: fp8

CPU-Only Profile#

apiVersion: aim.eai.amd.com/v1alpha2
kind: AIMProfile
metadata:
  name: small-model-cpu
  namespace: dev-team
spec:
  aimId: microsoft/phi-2
  engine: vllm
  metric: latency
  precision: fp32
  type: general
  primary: false
  image: amdenterpriseai/aim-phi-2:0.8.5

Profiles without accelerator fields (acceleratorModel, acceleratorType, acceleratorCount) and without extended device resource requests are treated as CPU-only and are always Ready.

Troubleshooting#

Profile Stuck in NotAvailable#

The profile’s accelerator requirements don’t match any cluster node:

# Check which nodes the profile expects
kubectl get aimclusterprofile <name> -o jsonpath='{.status.resolvedNodeAffinity}' | jq

# Check resolved resources
kubectl get aimclusterprofile <name> -o jsonpath='{.status.resources}' | jq

# Check matching node count
kubectl get aimclusterprofile <name> -o jsonpath='{.status.matchingNodes}'

# List nodes with accelerator labels
kubectl get nodes -l feature.node.kubernetes.io/aim-accelerator.MI300X

Common causes:

GPU nodes not yet added to the cluster
AcceleratorDetector not labeling nodes
Wrong accelerator model name in the profile

Profile Shows Wrong Hardware Summary#

The hardware summary is derived from acceleratorModel and acceleratorCount. Verify the accelerator fields:

kubectl get aimclusterprofile <name> -o yaml | grep -E 'accelerator(Model|Type|Count)'

Migration from Service Templates#

Profiles (v1alpha2) replace Service Templates (v1alpha1). Both API versions coexist during the transition period — existing v1alpha1 Service Templates continue to work. New deployments should use profiles.

Key differences for migrating users:

Profiles carry their own container image (spec.image), removing the dependency on model lookups
Hardware requirements use flat accelerator fields (acceleratorModel, acceleratorType, acceleratorCount). The controller derives Kubernetes device resource requests from the accelerator count. Partitioned GPUs (e.g., amd.com/cpx-nps4) are supported via explicit resources overrides
Precision is always explicit — there is no auto value
A general optimization tier is available between optimized and preview

See the Service Templates documentation for v1alpha1 usage.