API Reference

Packages

aim.eai.amd.com/v1alpha2

Package v1alpha2 contains API Schema definitions for the aim v1alpha2 API group.

Resource Types

AIMClusterProfile

AIMClusterProfile is the Schema for cluster-scoped AIM profiles. Cluster profiles are visible across all namespaces. They can be created manually or, in the future, automatically during model discovery by a v1alpha2 model controller. Unlike namespace-scoped AIMProfiles, cluster profiles do not support caching configuration since caches are namespace-scoped.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| apiVersion string | aim.eai.amd.com/v1alpha2 | | |
| kind string | AIMClusterProfile | | |
| metadata ObjectMeta | Refer to Kubernetes API documentation for fields of metadata. | | |
| spec AIMClusterProfileSpec | | | |
| status AIMProfileStatus | | | |
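The constraints stated for cluster profiles (cluster scope, no caching block, non-empty aimId and image) can be checked before a manifest is applied. The following is a minimal sketch, not the controller's own validation; the helper name `cluster_profile_problems` and the example object name and image are hypothetical.

```python
API_VERSION = "aim.eai.amd.com/v1alpha2"


def cluster_profile_problems(manifest: dict) -> list:
    """Return human-readable problems found in an AIMClusterProfile manifest."""
    problems = []
    if manifest.get("apiVersion") != API_VERSION:
        problems.append("apiVersion must be " + API_VERSION)
    if manifest.get("kind") != "AIMClusterProfile":
        problems.append("kind must be AIMClusterProfile")
    # Cluster-scoped objects carry no namespace.
    if manifest.get("metadata", {}).get("namespace"):
        problems.append("metadata.namespace must not be set on a cluster-scoped profile")
    spec = manifest.get("spec", {})
    # aimId and image both carry a MinLength: 1 validation.
    for required in ("aimId", "image"):
        if not spec.get(required):
            problems.append("spec." + required + " must be a non-empty string")
    # Caching configuration exists only on namespace-scoped AIMProfile specs.
    if "caching" in spec:
        problems.append("spec.caching is not supported on cluster profiles")
    return problems


# Hypothetical example manifest; the name and image are placeholders.
example = {
    "apiVersion": API_VERSION,
    "kind": "AIMClusterProfile",
    "metadata": {"name": "qwen3-32b-mi300x-latency"},
    "spec": {
        "aimId": "qwen/qwen3-32b",
        "image": "registry.example.com/aim-qwen3-32b:0.8.5",
        "acceleratorModel": "MI300X",
        "acceleratorType": "gpu",
        "acceleratorCount": 1,
    },
}
```

Calling `cluster_profile_problems(example)` returns an empty list; adding a `caching` key to `spec` yields one problem entry.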

AIMClusterProfileList

AIMClusterProfileList contains a list of AIMClusterProfile.

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| apiVersion string | aim.eai.amd.com/v1alpha2 | | |
| kind string | AIMClusterProfileList | | |
| metadata ListMeta | Refer to Kubernetes API documentation for fields of metadata. | | |
| items AIMClusterProfile array | | | |

AIMClusterProfileSpec

AIMClusterProfileSpec defines the desired state of a cluster-scoped AIMClusterProfile.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| aimId string | AimId is the model architecture identifier (e.g., “qwen/qwen3-32b”). Primary matching axis for profile selection and custom weight onboarding. Immutable. | | MinLength: 1 |
| modelId string | ModelId is the specific model / HuggingFace URI (e.g., “qwen/qwen3-32b-fp8”). Determines the cache path (/workspace/cache/{modelId}) and serves as a secondary discriminator for custom weight matching. | | Optional: {} |
| profileId string | ProfileId is the on-disk profile identifier from the AIM image (e.g., “vllm-mi300x-fp8-tp1-latency”). Populated during discovery to link this CRD back to the profile YAML inside the container. Not required for manually created profiles. | | Optional: {} |
| engine string | Engine identifies the inference engine (e.g., “vllm”, “tgi”). | | Optional: {} |
| metric AIMMetric | Metric is the optimization target for this profile. | | Enum: [latency throughput]<br />Optional: {} |
| precision AIMPrecision | Precision is the numeric precision used by this profile. | | Enum: [fp4 fp8 fp16 fp32 bf16 int4 int8]<br />Optional: {} |
| type AIMProfileType | Type indicates the optimization level. Hierarchy: optimized > general > preview > unoptimized. | | Enum: [optimized general preview unoptimized]<br />Optional: {} |
| primary boolean | Primary marks this as a default/recommended profile. When true, the profile is advertised for standard deployment and copied automatically for custom weight models. Defaults to false when not specified. | false | |
| engineArgs JSON | EngineArgs contains inference engine CLI arguments as a free-form JSON object. Passed to the inference engine (e.g., vLLM) at startup. | | Schemaless: {}<br />Optional: {} |
| engineEnv object (keys:string, values:string) | EngineEnv contains environment variables for the inference engine subprocess. Applied via os.execv, distinct from container-level ContainerEnv. | | Optional: {} |
| acceleratorModel string | AcceleratorModel is the accelerator identifier for node selection. Maps to a node label key using the Exists operator: feature.node.kubernetes.io/aim-accelerator.{value}: Exists. Supports both specific models (e.g., “MI300X”) and architecture-level fallbacks (e.g., “EPYC_ZEN5”); the AcceleratorDetector labels nodes with all applicable identifiers. | | MaxLength: 63<br />Pattern: ^[A-Za-z0-9]([A-Za-z0-9._-]*[A-Za-z0-9])?$<br />Optional: {} |
| acceleratorType AcceleratorType | AcceleratorType determines the resource derivation strategy: gpu or cpu. AIM Engine computes default resource requests from this field combined with AcceleratorCount and cluster-level configuration. | | Enum: [gpu cpu]<br />Optional: {} |
| acceleratorCount integer | AcceleratorCount is the number of accelerator units required (e.g., GPU count). Combined with AcceleratorType and cluster-level configuration to compute default resource requests in status.resources. | | Minimum: 0<br />Optional: {} |
| resources ResourceRequirements | Resources is an optional override for K8s resource requests/limits. When set, merged on top of the defaults that AIM Engine computes from AcceleratorType, AcceleratorCount, and cluster-level configuration. The resolved result is written to status.resources. | | Optional: {} |
| image string | Image is the deployment container image. Required. For purpose-built profiles: the full AIM image. For custom weight profiles: the base image (e.g., aim-base:0.8.5). | | MinLength: 1 |
| modelSources AIMModelSource array | ModelSources specifies model artifact sources for this profile. Populated during discovery or set by user. | | Optional: {} |
| containerEnv EnvVar array | ContainerEnv specifies container-level env vars for the AIM runtime process (K8s pod spec). | | Optional: {} |
| imagePullSecrets LocalObjectReference array | ImagePullSecrets lists secrets for pulling container images. | | Optional: {} |
| serviceAccountName string | ServiceAccountName specifies the service account for workloads. | | Optional: {} |
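The acceleratorModel field maps to a node label key that is matched with the Exists operator. The sketch below shows one way to derive that key and a matching nodeSelectorTerm from the documented prefix, pattern, and length limit. The function names are hypothetical, and the real controller may build its affinity rules differently.

```python
import re

# Pattern and length limit copied from the acceleratorModel validation rules.
ACCELERATOR_MODEL_PATTERN = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9._-]*[A-Za-z0-9])?$")
ACCELERATOR_MODEL_MAX_LENGTH = 63
LABEL_PREFIX = "feature.node.kubernetes.io/aim-accelerator."


def accelerator_label_key(accelerator_model: str) -> str:
    """Return the node label key that an acceleratorModel value maps to."""
    if len(accelerator_model) > ACCELERATOR_MODEL_MAX_LENGTH:
        raise ValueError("acceleratorModel exceeds 63 characters")
    if not ACCELERATOR_MODEL_PATTERN.fullmatch(accelerator_model):
        raise ValueError("invalid acceleratorModel: " + repr(accelerator_model))
    return LABEL_PREFIX + accelerator_model


def node_selector_term(accelerator_model: str) -> dict:
    """Build a Kubernetes nodeSelectorTerm that only checks the label exists."""
    return {
        "matchExpressions": [
            {"key": accelerator_label_key(accelerator_model), "operator": "Exists"}
        ]
    }
```

Because the label value is never inspected, a node labeled with both a specific model and an architecture-level identifier satisfies profiles written against either one.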

AIMMetric

Underlying type: string

AIMMetric enumerates supported optimization targets.

Validation:

  • Enum: [latency throughput]

Appears in:

| Field | Description |
| --- | --- |
| latency | |
| throughput | |

AIMModelSource

AIMModelSource describes a downloadable model artifact with optional credentials.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| modelId string | ModelID is the canonical identifier in {org}/{name} format. Determines the cache mount path: /workspace/cache/{modelId}. | | Pattern: ^[a-zA-Z0-9_-]+/[a-zA-Z0-9._-]+$<br />Required: {} |
| sourceUri string | SourceURI is the location from which the model should be downloaded. Supported schemes: hf:// (Hugging Face Hub), s3:// (S3-compatible storage). | | Pattern: ^(hf\|s3)://[^ \t\r\n]+$ |
| size Quantity | Size is the expected storage space required for this model artifact. Optional: if not specified, the download job discovers the size automatically. | | Optional: {} |
| precision AIMPrecision | Precision describes the runtime precision this source is compatible with. Used to match model sources to profiles during custom weight onboarding. | | Enum: [fp4 fp8 fp16 fp32 bf16 int4 int8]<br />Optional: {} |
| env EnvVar array | Env specifies per-source credential overrides. Takes precedence over base-level env for the same variable name. | | Optional: {} |
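The modelId and sourceUri patterns, the cache mount path, and the per-source env precedence can be illustrated together. This is a rough sketch under stated assumptions: the function names are hypothetical, env entries are modeled as plain dicts with a `name` key, and "base-level env" is taken to mean whatever env list the source's env is layered over, which this reference does not define precisely.

```python
import re

# Patterns copied from the modelId and sourceUri validation rules.
MODEL_ID_PATTERN = re.compile(r"^[a-zA-Z0-9_-]+/[a-zA-Z0-9._-]+$")
SOURCE_URI_PATTERN = re.compile(r"^(hf|s3)://[^ \t\r\n]+$")


def cache_mount_path(model_id: str) -> str:
    """Return the cache mount path for a valid {org}/{name} model ID."""
    if not MODEL_ID_PATTERN.fullmatch(model_id):
        raise ValueError("modelId must look like {org}/{name}: " + repr(model_id))
    return "/workspace/cache/" + model_id


def source_scheme(source_uri: str) -> str:
    """Return 'hf' or 's3' for a valid source URI."""
    match = SOURCE_URI_PATTERN.fullmatch(source_uri)
    if match is None:
        raise ValueError("unsupported sourceUri: " + repr(source_uri))
    return match.group(1)


def effective_env(base_env: list, source_env: list) -> list:
    """Layer per-source env vars over base env vars, matching on variable name."""
    merged = {var["name"]: var for var in base_env}
    # A per-source entry replaces a base entry with the same name.
    merged.update({var["name"]: var for var in source_env})
    return list(merged.values())
```

For example, `cache_mount_path("qwen/qwen3-32b-fp8")` gives `/workspace/cache/qwen/qwen3-32b-fp8`, and a per-source `HF_TOKEN` entry replaces a base `HF_TOKEN` entry while other base variables pass through unchanged.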

AIMPrecision

Underlying type: string

AIMPrecision enumerates supported numeric precisions.

Validation:

  • Enum: [fp4 fp8 fp16 fp32 bf16 int4 int8]

Appears in:

| Field | Description |
| --- | --- |
| fp4 | |
| fp8 | |
| fp16 | |
| fp32 | |
| bf16 | |
| int4 | |
| int8 | |

AIMProfile

AIMProfile is the Schema for namespace-scoped AIM profiles. A profile is a self-contained runtime configuration that answers five questions without consulting any other resource: model architecture, accelerator, K8s resources, runtime config, and container image.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| apiVersion string | aim.eai.amd.com/v1alpha2 | | |
| kind string | AIMProfile | | |
| metadata ObjectMeta | Refer to Kubernetes API documentation for fields of metadata. | | |
| spec AIMProfileSpec | | | |
| status AIMProfileStatus | | | |

AIMProfileCache

AIMProfileCache pre-warms model artifacts for a specified profile’s model sources.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| apiVersion string | aim.eai.amd.com/v1alpha2 | | |
| kind string | AIMProfileCache | | |
| metadata ObjectMeta | Refer to Kubernetes API documentation for fields of metadata. | | |
| spec AIMProfileCacheSpec | | | |
| status AIMProfileCacheStatus | | | |

AIMProfileCacheList

AIMProfileCacheList contains a list of AIMProfileCache.

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| apiVersion string | aim.eai.amd.com/v1alpha2 | | |
| kind string | AIMProfileCacheList | | |
| metadata ListMeta | Refer to Kubernetes API documentation for fields of metadata. | | |
| items AIMProfileCache array | | | |

AIMProfileCacheMode

Underlying type: string

AIMProfileCacheMode controls the ownership behavior of artifacts created by a profile cache.

Validation:

  • Enum: [Dedicated Shared]

Appears in:

| Field | Description |
| --- | --- |
| Dedicated | ProfileCacheModeDedicated means artifacts are owned by this profile cache and garbage collected when it is deleted. |
| Shared | ProfileCacheModeShared means artifacts have no owner references and persist independently of the profile cache lifecycle. This is the default mode. |
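The difference between the two modes comes down to whether an artifact carries an owner reference back to the profile cache. The sketch below assumes the standard Kubernetes ownerReferences shape (apiVersion, kind, name, uid); the function name is hypothetical and any additional fields the controller sets are not shown.

```python
API_VERSION = "aim.eai.amd.com/v1alpha2"


def artifact_owner_references(mode: str, cache_name: str, cache_uid: str) -> list:
    """Return the ownerReferences an artifact would carry for a given cache mode."""
    if mode == "Dedicated":
        # Owned artifacts are garbage collected when the profile cache is deleted.
        return [
            {
                "apiVersion": API_VERSION,
                "kind": "AIMProfileCache",
                "name": cache_name,
                "uid": cache_uid,
            }
        ]
    if mode == "Shared":
        # No owner: the artifact outlives the profile cache.
        return []
    raise ValueError("mode must be Dedicated or Shared, got " + repr(mode))
```

With `Shared`, deleting the profile cache leaves artifacts in place, so they need to be cleaned up separately when no longer wanted.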

AIMProfileCacheSpec

AIMProfileCacheSpec defines the desired state of AIMProfileCache.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| profileName string | ProfileName is the name of the AIMProfile or AIMClusterProfile to cache. The controller resolves model sources from the referenced profile’s spec.modelSources. | | MinLength: 1 |
| profileScope AIMResolutionScope | ProfileScope indicates whether the profile is namespace-scoped or cluster-scoped. | | Enum: [Namespace Cluster]<br />Required: {} |
| storageClassName string | StorageClassName specifies the storage class for cache volumes. When not specified, uses the cluster default storage class. | | Optional: {} |
| env EnvVar array | Env specifies environment variables for authentication when downloading models. These variables are used for authentication with model registries (e.g., HuggingFace tokens). | | Optional: {} |
| mode AIMProfileCacheMode | Mode controls the ownership behavior of artifacts created by this profile cache. Dedicated: artifacts are owned by this profile cache and garbage collected when it’s deleted. Shared (default): artifacts have no owner references and persist independently. | Shared | Enum: [Dedicated Shared]<br />Optional: {} |
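To make the required and optional fields concrete, the following sketch assembles an AIMProfileCache manifest as a plain dict and applies the documented default for mode. The builder function and its parameter names are hypothetical; only the field names, enum values, and defaults come from the descriptions above.

```python
API_VERSION = "aim.eai.amd.com/v1alpha2"
VALID_SCOPES = ("Namespace", "Cluster")
VALID_MODES = ("Dedicated", "Shared")


def profile_cache_manifest(name, namespace, profile_name, profile_scope,
                           mode="Shared", storage_class_name=None, env=None):
    """Assemble an AIMProfileCache manifest as a plain dict."""
    if not profile_name:
        raise ValueError("profileName must be at least 1 character long")
    if profile_scope not in VALID_SCOPES:
        raise ValueError("profileScope must be one of " + ", ".join(VALID_SCOPES))
    if mode not in VALID_MODES:
        raise ValueError("mode must be one of " + ", ".join(VALID_MODES))
    spec = {"profileName": profile_name, "profileScope": profile_scope, "mode": mode}
    # Optional fields are omitted when unset, so the cluster default
    # storage class applies and no download credentials are injected.
    if storage_class_name is not None:
        spec["storageClassName"] = storage_class_name
    if env:
        spec["env"] = list(env)
    return {
        "apiVersion": API_VERSION,
        "kind": "AIMProfileCache",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }
```

The resulting dict can be serialized with `json.dumps` or a YAML library and applied like any other manifest.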

AIMProfileCacheStatus

AIMProfileCacheStatus defines the observed state of AIMProfileCache.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| observedGeneration integer | ObservedGeneration is the most recent generation observed by the controller. | | |
| conditions Condition array | Conditions represent the latest observations of the profile cache state. | | |
| status AIMStatus | Status represents the current high-level status of the profile cache. | Pending | Enum: [Pending Progressing Ready Failed Degraded NotAvailable] |
| artifacts object (keys:string, values:AIMResolvedArtifact) | Artifacts maps artifact names to their resolved AIMArtifact resources. | | Optional: {} |

AIMProfileCachingConfig

AIMProfileCachingConfig configures model caching behavior for namespace-scoped profiles.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| enabled boolean | Enabled controls whether caching is enabled for this profile. | false | |
| env EnvVar array | Env specifies environment variables for model download during caching. If not set, falls back to the profile’s ContainerEnv. | | Optional: {} |
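The env fallback can be sketched as a small lookup over an AIMProfile spec dict. This is an illustration only: the function name is hypothetical, and treating an empty caching env list the same as an absent one is an assumption about what "not set" means.

```python
def caching_download_env(profile_spec: dict) -> list:
    """Pick the env vars used for cache downloads from an AIMProfile spec dict."""
    caching_env = (profile_spec.get("caching") or {}).get("env")
    # Assumption: an empty list counts as "not set", the same as a missing key.
    if caching_env:
        return list(caching_env)
    # Documented fallback: use the profile's containerEnv.
    return list(profile_spec.get("containerEnv") or [])
```

A profile that sets only `containerEnv` therefore reuses those variables for downloads, while a profile with its own `caching.env` keeps download credentials separate from the runtime environment.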

AIMProfileList

AIMProfileList contains a list of AIMProfile.

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| apiVersion string | aim.eai.amd.com/v1alpha2 | | |
| kind string | AIMProfileList | | |
| metadata ListMeta | Refer to Kubernetes API documentation for fields of metadata. | | |
| items AIMProfile array | | | |

AIMProfileSpec

AIMProfileSpec defines the desired state of a namespace-scoped AIMProfile.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| aimId string | AimId is the model architecture identifier (e.g., “qwen/qwen3-32b”). Primary matching axis for profile selection and custom weight onboarding. Immutable. | | MinLength: 1 |
| modelId string | ModelId is the specific model / HuggingFace URI (e.g., “qwen/qwen3-32b-fp8”). Determines the cache path (/workspace/cache/{modelId}) and serves as a secondary discriminator for custom weight matching. | | Optional: {} |
| profileId string | ProfileId is the on-disk profile identifier from the AIM image (e.g., “vllm-mi300x-fp8-tp1-latency”). Populated during discovery to link this CRD back to the profile YAML inside the container. Not required for manually created profiles. | | Optional: {} |
| engine string | Engine identifies the inference engine (e.g., “vllm”, “tgi”). | | Optional: {} |
| metric AIMMetric | Metric is the optimization target for this profile. | | Enum: [latency throughput]<br />Optional: {} |
| precision AIMPrecision | Precision is the numeric precision used by this profile. | | Enum: [fp4 fp8 fp16 fp32 bf16 int4 int8]<br />Optional: {} |
| type AIMProfileType | Type indicates the optimization level. Hierarchy: optimized > general > preview > unoptimized. | | Enum: [optimized general preview unoptimized]<br />Optional: {} |
| primary boolean | Primary marks this as a default/recommended profile. When true, the profile is advertised for standard deployment and copied automatically for custom weight models. Defaults to false when not specified. | false | |
| engineArgs JSON | EngineArgs contains inference engine CLI arguments as a free-form JSON object. Passed to the inference engine (e.g., vLLM) at startup. | | Schemaless: {}<br />Optional: {} |
| engineEnv object (keys:string, values:string) | EngineEnv contains environment variables for the inference engine subprocess. Applied via os.execv, distinct from container-level ContainerEnv. | | Optional: {} |
| acceleratorModel string | AcceleratorModel is the accelerator identifier for node selection. Maps to a node label key using the Exists operator: feature.node.kubernetes.io/aim-accelerator.{value}: Exists. Supports both specific models (e.g., “MI300X”) and architecture-level fallbacks (e.g., “EPYC_ZEN5”); the AcceleratorDetector labels nodes with all applicable identifiers. | | MaxLength: 63<br />Pattern: ^[A-Za-z0-9]([A-Za-z0-9._-]*[A-Za-z0-9])?$<br />Optional: {} |
| acceleratorType AcceleratorType | AcceleratorType determines the resource derivation strategy: gpu or cpu. AIM Engine computes default resource requests from this field combined with AcceleratorCount and cluster-level configuration. | | Enum: [gpu cpu]<br />Optional: {} |
| acceleratorCount integer | AcceleratorCount is the number of accelerator units required (e.g., GPU count). Combined with AcceleratorType and cluster-level configuration to compute default resource requests in status.resources. | | Minimum: 0<br />Optional: {} |
| resources ResourceRequirements | Resources is an optional override for K8s resource requests/limits. When set, merged on top of the defaults that AIM Engine computes from AcceleratorType, AcceleratorCount, and cluster-level configuration. The resolved result is written to status.resources. | | Optional: {} |
| image string | Image is the deployment container image. Required. For purpose-built profiles: the full AIM image. For custom weight profiles: the base image (e.g., aim-base:0.8.5). | | MinLength: 1 |
| modelSources AIMModelSource array | ModelSources specifies model artifact sources for this profile. Populated during discovery or set by user. | | Optional: {} |
| containerEnv EnvVar array | ContainerEnv specifies container-level env vars for the AIM runtime process (K8s pod spec). | | Optional: {} |
| imagePullSecrets LocalObjectReference array | ImagePullSecrets lists secrets for pulling container images. | | Optional: {} |
| serviceAccountName string | ServiceAccountName specifies the service account for workloads. | | Optional: {} |
| caching AIMProfileCachingConfig | Caching configures model caching behavior for this namespace-scoped profile. | | Optional: {} |

AIMProfileSpecCommon

AIMProfileSpecCommon contains spec fields shared between AIMProfile and AIMClusterProfile. A profile answers five questions without consulting any other resource: model architecture (aimId), accelerator (acceleratorModel/Type/Count), K8s resources (status.resources), runtime config (engineArgs, engineEnv), and container image (image).

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| aimId string | AimId is the model architecture identifier (e.g., “qwen/qwen3-32b”). Primary matching axis for profile selection and custom weight onboarding. Immutable. | | MinLength: 1 |
| modelId string | ModelId is the specific model / HuggingFace URI (e.g., “qwen/qwen3-32b-fp8”). Determines the cache path (/workspace/cache/{modelId}) and serves as a secondary discriminator for custom weight matching. | | Optional: {} |
| profileId string | ProfileId is the on-disk profile identifier from the AIM image (e.g., “vllm-mi300x-fp8-tp1-latency”). Populated during discovery to link this CRD back to the profile YAML inside the container. Not required for manually created profiles. | | Optional: {} |
| engine string | Engine identifies the inference engine (e.g., “vllm”, “tgi”). | | Optional: {} |
| metric AIMMetric | Metric is the optimization target for this profile. | | Enum: [latency throughput]<br />Optional: {} |
| precision AIMPrecision | Precision is the numeric precision used by this profile. | | Enum: [fp4 fp8 fp16 fp32 bf16 int4 int8]<br />Optional: {} |
| type AIMProfileType | Type indicates the optimization level. Hierarchy: optimized > general > preview > unoptimized. | | Enum: [optimized general preview unoptimized]<br />Optional: {} |
| primary boolean | Primary marks this as a default/recommended profile. When true, the profile is advertised for standard deployment and copied automatically for custom weight models. Defaults to false when not specified. | false | |
| engineArgs JSON | EngineArgs contains inference engine CLI arguments as a free-form JSON object. Passed to the inference engine (e.g., vLLM) at startup. | | Schemaless: {}<br />Optional: {} |
| engineEnv object (keys:string, values:string) | EngineEnv contains environment variables for the inference engine subprocess. Applied via os.execv, distinct from container-level ContainerEnv. | | Optional: {} |
| acceleratorModel string | AcceleratorModel is the accelerator identifier for node selection. Maps to a node label key using the Exists operator: feature.node.kubernetes.io/aim-accelerator.{value}: Exists. Supports both specific models (e.g., “MI300X”) and architecture-level fallbacks (e.g., “EPYC_ZEN5”); the AcceleratorDetector labels nodes with all applicable identifiers. | | MaxLength: 63<br />Pattern: ^[A-Za-z0-9]([A-Za-z0-9._-]*[A-Za-z0-9])?$<br />Optional: {} |
| acceleratorType AcceleratorType | AcceleratorType determines the resource derivation strategy: gpu or cpu. AIM Engine computes default resource requests from this field combined with AcceleratorCount and cluster-level configuration. | | Enum: [gpu cpu]<br />Optional: {} |
| acceleratorCount integer | AcceleratorCount is the number of accelerator units required (e.g., GPU count). Combined with AcceleratorType and cluster-level configuration to compute default resource requests in status.resources. | | Minimum: 0<br />Optional: {} |
| resources ResourceRequirements | Resources is an optional override for K8s resource requests/limits. When set, merged on top of the defaults that AIM Engine computes from AcceleratorType, AcceleratorCount, and cluster-level configuration. The resolved result is written to status.resources. | | Optional: {} |
| image string | Image is the deployment container image. Required. For purpose-built profiles: the full AIM image. For custom weight profiles: the base image (e.g., aim-base:0.8.5). | | MinLength: 1 |
| modelSources AIMModelSource array | ModelSources specifies model artifact sources for this profile. Populated during discovery or set by user. | | Optional: {} |
| containerEnv EnvVar array | ContainerEnv specifies container-level env vars for the AIM runtime process (K8s pod spec). | | Optional: {} |
| imagePullSecrets LocalObjectReference array | ImagePullSecrets lists secrets for pulling container images. | | Optional: {} |
| serviceAccountName string | ServiceAccountName specifies the service account for workloads. | | Optional: {} |
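The resources field is described as an override merged on top of computed defaults, with the result written to status.resources. A possible reading of that merge is sketched below. It assumes a per-key merge within the requests and limits sections, which this reference does not spell out, and it takes the computed defaults as an input rather than deriving them; the function name is hypothetical.

```python
import copy
from typing import Optional


def resolve_resources(defaults: dict, override: Optional[dict]) -> dict:
    """Merge a spec.resources override on top of computed default resources."""
    resolved = copy.deepcopy(defaults)
    for section in ("requests", "limits"):
        values = (override or {}).get(section)
        if values:
            # Keys present in the override win; other default keys are kept.
            resolved.setdefault(section, {}).update(values)
    return resolved
```

Under this reading, overriding only `requests.memory` leaves the default accelerator request and all limits untouched.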

AIMProfileStatus

AIMProfileStatus defines the observed state of AIMProfile / AIMClusterProfile.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| observedGeneration integer | ObservedGeneration is the most recent generation observed by the controller. | | |
| status AIMStatus | Status represents the current high-level status of this profile. Ready: at least one cluster node matches the profile’s accelerator labels and resource requests. NotAvailable: no matching nodes found. | Pending | Enum: [Pending Progressing Ready Degraded Failed NotAvailable] |
| version string | Version is extracted from the spec.image tag during reconciliation (e.g., “0.8.5”). | | Optional: {} |
| matchingNodes integer | MatchingNodes is the count of cluster nodes matching both the accelerator model label and status.resources requests. Zero means NotAvailable. | | Optional: {} |
| hardwareSummary string | HardwareSummary is a human-readable string describing the hardware requirements. Format: “{count} x {model}” for GPU (e.g., “1 x MI300X”) or “CPU” for CPU-only. | | Optional: {} |
| resources ResourceRequirements | Resources contains the definitive K8s resource requests/limits used for deployment. Computed by AIM Engine from AcceleratorType, AcceleratorCount, and cluster-level configuration, then merged with any spec.resources override. | | Optional: {} |
| resolvedNodeAffinity NodeAffinity | ResolvedNodeAffinity contains the computed node affinity rules derived from spec.acceleratorModel. Used by AIMService when building InferenceService pods. | | Optional: {} |
| conditions Condition array | Conditions represent the latest observations of profile state. | | |
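Several status fields are derived values, and their documented formats can be reproduced in a few lines. The sketch below is illustrative only: the function names are hypothetical, the image-tag parsing is a simplified guess at how the version is extracted, and the availability helper covers only the Ready and NotAvailable cases described above.

```python
from typing import Optional


def hardware_summary(accelerator_type: str, accelerator_count: int,
                     accelerator_model: str) -> str:
    """Format the summary as '{count} x {model}' for GPU, or 'CPU' for CPU-only."""
    if accelerator_type == "cpu":
        return "CPU"
    return "{} x {}".format(accelerator_count, accelerator_model)


def version_from_image(image: str) -> Optional[str]:
    """Return the tag of an image reference, or None when no tag is present."""
    without_digest = image.split("@", 1)[0]
    # Only look at the last path segment so a registry port is not mistaken for a tag.
    last_segment = without_digest.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return None
    return last_segment.rsplit(":", 1)[1]


def availability(matching_nodes: int) -> str:
    """Map the matching node count to Ready or NotAvailable."""
    return "Ready" if matching_nodes > 0 else "NotAvailable"
```

For the base image example in this reference, `version_from_image("aim-base:0.8.5")` returns `"0.8.5"`, and `hardware_summary("gpu", 1, "MI300X")` returns `"1 x MI300X"`.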

AIMProfileType

Underlying type: string

AIMProfileType indicates the optimization level of a profile. Hierarchy: optimized > general > preview > unoptimized.

Validation:

  • Enum: [optimized general preview unoptimized]

Appears in:

| Field | Description |
| --- | --- |
| optimized | |
| general | |
| preview | |
| unoptimized | |
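The hierarchy can be expressed as a rank table for ordering candidate profiles. How the controller actually uses the hierarchy during profile selection is not described here, so the sketch below only demonstrates the ordering itself. The function names are hypothetical, and ranking profiles without a type last is an assumption.

```python
# Lower rank means a higher optimization level.
PROFILE_TYPE_RANK = {"optimized": 0, "general": 1, "preview": 2, "unoptimized": 3}


def profile_rank(profile_spec: dict) -> int:
    """Return the rank of a profile spec; a missing type sorts after all known types."""
    return PROFILE_TYPE_RANK.get(profile_spec.get("type"), len(PROFILE_TYPE_RANK))


def order_by_optimization(profile_specs: list) -> list:
    """Sort profile specs from most to least optimized, keeping ties in input order."""
    return sorted(profile_specs, key=profile_rank)
```

Given specs typed `preview`, `optimized`, and `general`, `order_by_optimization` returns them in the order `optimized`, `general`, `preview`.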

AIMService

AIMService manages a KServe-based AIM inference service for the selected model and template. Note: KServe derives a name in the {name}-{namespace} format, and the combined string must not exceed 63 characters. This constraint is validated at runtime because CEL validation rules cannot access metadata.namespace.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| apiVersion string | aim.eai.amd.com/v1alpha2 | | |
| kind string | AIMService | | |
| metadata ObjectMeta | Refer to Kubernetes API documentation for fields of metadata. | | |
| spec AIMServiceSpec | | | |
| status AIMServiceStatus | | | |
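Because the length constraint is only enforced at runtime, it can be useful to check a name and namespace pair before creating the resource. The sketch below applies the stated {name}-{namespace} format and 63-character limit; the function and constant names are hypothetical, and the actual runtime check may report the error differently.

```python
MAX_DERIVED_NAME_LENGTH = 63


def check_service_name(name: str, namespace: str) -> str:
    """Return the {name}-{namespace} string, or raise if it exceeds 63 characters."""
    derived = "{}-{}".format(name, namespace)
    if len(derived) > MAX_DERIVED_NAME_LENGTH:
        raise ValueError(
            "{!r} is {} characters long; the limit is {}".format(
                derived, len(derived), MAX_DERIVED_NAME_LENGTH
            )
        )
    return derived
```

Note that the hyphen counts toward the limit, so the name and namespace together can use at most 62 characters.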

AIMServiceList

AIMServiceList contains a list of AIMService.

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| apiVersion string | aim.eai.amd.com/v1alpha2 | | |
| kind string | AIMServiceList | | |
| metadata ListMeta | Refer to Kubernetes API documentation for fields of metadata. | | |
| items AIMService array | | | |

AcceleratorType

Underlying type: string

AcceleratorType distinguishes CPU from GPU accelerators. Used by AIM Engine to determine the resource derivation strategy (e.g., gpu → amd.com/gpu, cpu → cpu).

Validation:

  • Enum: [gpu cpu]

Appears in:

| Field | Description |
| --- | --- |
| cpu | |
| gpu | |
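The mapping from accelerator type to resource name can be sketched as a lookup. The resource names come from the description above; requesting a quantity equal to acceleratorCount is an assumption, and the cluster-level configuration that AIM Engine also consults is not modeled. The function name is hypothetical.

```python
# Resource names taken from the AcceleratorType description.
RESOURCE_NAME_BY_TYPE = {"gpu": "amd.com/gpu", "cpu": "cpu"}


def default_requests(accelerator_type: str, accelerator_count: int) -> dict:
    """Return a resource request keyed by the resource name for the accelerator type."""
    if accelerator_type not in RESOURCE_NAME_BY_TYPE:
        raise ValueError("acceleratorType must be gpu or cpu, got " + repr(accelerator_type))
    if accelerator_count < 0:
        raise ValueError("acceleratorCount must be 0 or greater")
    # Assumption: the requested quantity equals acceleratorCount.
    return {RESOURCE_NAME_BY_TYPE[accelerator_type]: str(accelerator_count)}
```

For instance, `default_requests("gpu", 2)` yields `{"amd.com/gpu": "2"}` under these assumptions.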