API Reference
Packages
aim.eai.amd.com/v1alpha2
Package v1alpha2 contains API Schema definitions for the aim v1alpha2 API group.
Resource Types
AIMClusterProfile
AIMClusterProfile is the Schema for cluster-scoped AIM profiles. Cluster profiles are visible across all namespaces. They can be created manually or, in the future, automatically during model discovery by a v1alpha2 model controller. Unlike namespace-scoped AIMProfiles, cluster profiles do not support caching configuration since caches are namespace-scoped.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` | `aim.eai.amd.com/v1alpha2` | | |
| `kind` | `AIMClusterProfile` | | |
| `metadata` | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `spec` | | | |
| `status` | | | |
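AIMClusterProfileList
AIMClusterProfileList contains a list of AIMClusterProfile.
| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` | `aim.eai.amd.com/v1alpha2` | | |
| `kind` | `AIMClusterProfileList` | | |
| `metadata` | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `items` | | | |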
AIMClusterProfileSpec
AIMClusterProfileSpec defines the desired state of a cluster-scoped AIMClusterProfile.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `aimId` | AimId is the model architecture identifier (e.g., “qwen/qwen3-32b”). | | MinLength: 1 |
| `modelId` | ModelId is the specific model / HuggingFace URI (e.g., “qwen/qwen3-32b-fp8”). | | Optional: {} |
| `profileId` | ProfileId is the on-disk profile identifier from the AIM image | | Optional: {} |
| `engine` | Engine identifies the inference engine (e.g., “vllm”, “tgi”). | | Optional: {} |
| `metric` | Metric is the optimization target for this profile. | | Enum: [latency throughput] |
| `precision` | Precision is the numeric precision used by this profile. | | Enum: [fp4 fp8 fp16 fp32 bf16 int4 int8] |
| `type` | Type indicates the optimization level. Hierarchy: optimized > general > preview > unoptimized. | | Enum: [optimized general preview unoptimized] |
| `primary` | Primary marks this as a default/recommended profile. When true, the profile is | false | |
| `engineArgs` | EngineArgs contains inference engine CLI arguments as a free-form JSON object. | | Schemaless: {} |
| `engineEnv` | EngineEnv contains environment variables for the inference engine subprocess. | | Optional: {} |
| `acceleratorModel` | AcceleratorModel is the accelerator identifier for node selection. | | MaxLength: 63 |
| `acceleratorType` | AcceleratorType determines the resource derivation strategy: gpu or cpu. | | Enum: [gpu cpu] |
| `acceleratorCount` | AcceleratorCount is the number of accelerator units required (e.g., GPU count). | | Minimum: 0 |
| `resources` | Resources is an optional override for K8s resource requests/limits. | | Optional: {} |
| `image` | Image is the deployment container image. Required. | | MinLength: 1 |
| `modelSources` | ModelSources specifies model artifact sources for this profile. | | Optional: {} |
| `containerEnv` | ContainerEnv specifies container-level env vars for the AIM runtime process (K8s pod spec). | | Optional: {} |
| `imagePullSecrets` | ImagePullSecrets lists secrets for pulling container images. | | Optional: {} |
| `serviceAccountName` | ServiceAccountName specifies the service account for workloads. | | Optional: {} |
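A client can mirror the Validation column in a pre-check before a manifest reaches the API server. The sketch below is a hypothetical helper and is not part of AIM Engine. It assumes the spec is a plain dict keyed by the lowerCamelCase JSON names this page uses elsewhere (`aimId`, `acceleratorModel`, `engineArgs`), and the image reference is a placeholder. The API server remains the authority, because it enforces the CRD's own schema.

```python
# Hypothetical client-side pre-check for a profile spec held as a dict.
# It mirrors only the Validation markers listed in the table above; the
# API server enforces the CRD's real schema.

METRICS = {"latency", "throughput"}
PRECISIONS = {"fp4", "fp8", "fp16", "fp32", "bf16", "int4", "int8"}
PROFILE_TYPES = {"optimized", "general", "preview", "unoptimized"}
ACCELERATOR_TYPES = {"gpu", "cpu"}


def validate_profile_spec(spec: dict) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    errors = []

    # MinLength: 1 on aimId and image.
    for field in ("aimId", "image"):
        if len(spec.get(field, "")) < 1:
            errors.append(f"{field}: must be a non-empty string")

    # Enum markers are checked only when the field is present.
    enums = {
        "metric": METRICS,
        "precision": PRECISIONS,
        "type": PROFILE_TYPES,
        "acceleratorType": ACCELERATOR_TYPES,
    }
    for field, allowed in enums.items():
        if field in spec and spec[field] not in allowed:
            errors.append(f"{field}: {spec[field]!r} is not one of {sorted(allowed)}")

    # MaxLength: 63 on acceleratorModel.
    if len(spec.get("acceleratorModel", "")) > 63:
        errors.append("acceleratorModel: longer than 63 characters")

    # Minimum: 0 on acceleratorCount.
    if spec.get("acceleratorCount", 0) < 0:
        errors.append("acceleratorCount: must be >= 0")

    return errors


# Placeholder values; the image reference is not a real AIM image.
good = {
    "aimId": "qwen/qwen3-32b",
    "image": "registry.example.com/aim-qwen:0.8.5",
    "metric": "latency",
    "precision": "fp8",
    "acceleratorType": "gpu",
    "acceleratorCount": 1,
}
bad = {"aimId": "", "image": "x", "precision": "fp64"}

print(validate_profile_spec(good))  # []
print(validate_profile_spec(bad))   # two problems: empty aimId, unknown precision
```

The function collects every problem instead of stopping at the first one. This matches how the API server reports all schema violations of a rejected object together.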
AIMMetric
Underlying type: string
AIMMetric enumerates supported optimization targets.
Validation:
Enum: [latency throughput]
Appears in:
| Field | Description |
|---|---|
| `latency` | |
| `throughput` | |
AIMModelSource
AIMModelSource describes a downloadable model artifact with optional credentials.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `modelId` | ModelID is the canonical identifier in {org}/{name} format. | | Pattern: |
| `sourceUri` | SourceURI is the location from which the model should be downloaded. | | Pattern: |
| `size` | Size is the expected storage space required for this model artifact. | | Optional: {} |
| `precision` | Precision describes the runtime precision this source is compatible with. | | Enum: [fp4 fp8 fp16 fp32 bf16 int4 int8] |
| `env` | Env specifies per-source credential overrides. | | Optional: {} |
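The {org}/{name} format and the per-source Precision suggest how a client might pre-filter the sources for a given profile. This is a hypothetical sketch with several assumptions. `looks_like_org_name` is a deliberately loose stand-in for the CRD's real Pattern, which is not reproduced here. The `hf://` URIs are placeholders. Keeping sources that have no precision set is an assumption, not documented behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelSource:
    """Hypothetical Python mirror of a subset of AIMModelSource."""
    model_id: str                    # ModelID, expected as {org}/{name}
    source_uri: str                  # SourceURI (placeholder URIs below)
    precision: Optional[str] = None  # Precision, optional


def looks_like_org_name(model_id: str) -> bool:
    """Loose {org}/{name} check; NOT the CRD's actual Pattern."""
    parts = model_id.split("/")
    return len(parts) == 2 and all(parts)


def compatible_sources(sources: List[ModelSource], precision: str) -> List[ModelSource]:
    """Keep sources matching `precision`; unset precision is kept (assumption)."""
    return [s for s in sources if s.precision in (None, precision)]


sources = [
    ModelSource("qwen/qwen3-32b", "hf://qwen/qwen3-32b", "bf16"),
    ModelSource("qwen/qwen3-32b-fp8", "hf://qwen/qwen3-32b-fp8", "fp8"),
]
assert all(looks_like_org_name(s.model_id) for s in sources)
print([s.model_id for s in compatible_sources(sources, "fp8")])  # ['qwen/qwen3-32b-fp8']
```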
AIMPrecision
Underlying type: string
AIMPrecision enumerates supported numeric precisions.
Validation:
Enum: [fp4 fp8 fp16 fp32 bf16 int4 int8]
Appears in:
| Field | Description |
|---|---|
| `fp4` | |
| `fp8` | |
| `fp16` | |
| `fp32` | |
| `bf16` | |
| `int4` | |
| `int8` | |
AIMProfile
AIMProfile is the Schema for namespace-scoped AIM profiles. A profile is a self-contained runtime configuration that answers five questions without consulting any other resource: model architecture, accelerator, K8s resources, runtime config, and container image.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` | `aim.eai.amd.com/v1alpha2` | | |
| `kind` | `AIMProfile` | | |
| `metadata` | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `spec` | | | |
| `status` | | | |
AIMProfileCache
AIMProfileCache pre-warms model artifacts for a specified profile’s model sources.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` | `aim.eai.amd.com/v1alpha2` | | |
| `kind` | `AIMProfileCache` | | |
| `metadata` | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `spec` | | | |
| `status` | | | |
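AIMProfileCacheList
AIMProfileCacheList contains a list of AIMProfileCache.
| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` | `aim.eai.amd.com/v1alpha2` | | |
| `kind` | `AIMProfileCacheList` | | |
| `metadata` | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `items` | | | |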
AIMProfileCacheMode
Underlying type: string
AIMProfileCacheMode controls the ownership behavior of artifacts created by a profile cache.
Validation:
Enum: [Dedicated Shared]
Appears in:
| Field | Description |
|---|---|
| `Dedicated` | ProfileCacheModeDedicated means artifacts are owned by this profile cache and |
| `Shared` | ProfileCacheModeShared means artifacts have no owner references and persist |
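The two modes differ in `metadata.ownerReferences`. Kubernetes garbage collection uses that field to delete dependents once their owner is deleted. The sketch below builds metadata for a hypothetical artifact created by a profile cache. The helper name is illustrative, and the exact owner-reference flags the real controller sets are an assumption.

```python
API_VERSION = "aim.eai.amd.com/v1alpha2"


def artifact_metadata(name: str, namespace: str, mode: str, cache: dict) -> dict:
    """Metadata for a hypothetical artifact created by an AIMProfileCache."""
    meta = {"name": name, "namespace": namespace}
    if mode == "Dedicated":
        # Owned by the cache: Kubernetes garbage collection deletes the
        # artifact after the owning AIMProfileCache is deleted.
        # The controller/blockOwnerDeletion flags are assumptions.
        meta["ownerReferences"] = [{
            "apiVersion": API_VERSION,
            "kind": "AIMProfileCache",
            "name": cache["metadata"]["name"],
            "uid": cache["metadata"]["uid"],
            "controller": True,
            "blockOwnerDeletion": True,
        }]
    elif mode != "Shared":
        raise ValueError(f"unknown mode {mode!r}; expected Dedicated or Shared")
    # Shared: no ownerReferences, so deleting the cache leaves the artifact alone.
    return meta


cache = {"metadata": {"name": "qwen3-cache", "uid": "0000-placeholder-uid"}}
print(artifact_metadata("qwen3-weights", "team-a", "Shared", cache))
# {'name': 'qwen3-weights', 'namespace': 'team-a'}
```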
AIMProfileCacheSpec
AIMProfileCacheSpec defines the desired state of AIMProfileCache.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `profileName` | ProfileName is the name of the AIMProfile or AIMClusterProfile to cache. | | MinLength: 1 |
| `profileScope` | ProfileScope indicates whether the profile is namespace-scoped or cluster-scoped. | | Enum: [Namespace Cluster] |
| `storageClassName` | StorageClassName specifies the storage class for cache volumes. | | Optional: {} |
| `env` | Env specifies environment variables for authentication when downloading models. | | Optional: {} |
| `mode` | Mode controls the ownership behavior of artifacts created by this profile cache. | Shared | Enum: [Dedicated Shared] |
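A minimal AIMProfileCache manifest names the profile and its scope. The hypothetical helper below assumes lowerCamelCase JSON names (`profileName`, `profileScope`, `mode`, `storageClassName`), following the convention of `aimId` and `engineArgs` on this page. It leaves `mode` unset unless asked, so the API server applies the documented default, `Shared`. The resource names are placeholders.

```python
import json

API_VERSION = "aim.eai.amd.com/v1alpha2"


def profile_cache_manifest(name, namespace, profile_name, profile_scope,
                           mode=None, storage_class_name=None):
    """Build an AIMProfileCache manifest (hypothetical helper)."""
    if not profile_name:
        raise ValueError("profileName must be non-empty (MinLength: 1)")
    if profile_scope not in ("Namespace", "Cluster"):
        raise ValueError("profileScope must be Namespace or Cluster")
    if mode is not None and mode not in ("Dedicated", "Shared"):
        raise ValueError("mode must be Dedicated or Shared")
    spec = {"profileName": profile_name, "profileScope": profile_scope}
    # Optional fields are omitted so the API server can apply its defaults.
    if mode is not None:
        spec["mode"] = mode
    if storage_class_name is not None:
        spec["storageClassName"] = storage_class_name
    return {
        "apiVersion": API_VERSION,
        "kind": "AIMProfileCache",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


def effective_mode(spec: dict) -> str:
    """The mode after server-side defaulting (Default: Shared)."""
    return spec.get("mode", "Shared")


m = profile_cache_manifest("qwen3-cache", "team-a", "qwen3-32b-fp8-latency", "Cluster")
print(json.dumps(m, indent=2))
print(effective_mode(m["spec"]))  # Shared
```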
AIMProfileCacheStatus
AIMProfileCacheStatus defines the observed state of AIMProfileCache.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `observedGeneration` | ObservedGeneration is the most recent generation observed by the controller. | | |
| `conditions` | Conditions represent the latest observations of the profile cache state. | | |
| `status` | Status represents the current high-level status of the profile cache. | Pending | Enum: [Pending Progressing Ready Failed Degraded NotAvailable] |
| `artifacts` | Artifacts maps artifact names to their resolved AIMArtifact resources. | | Optional: {} |
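ObservedGeneration follows the usual Kubernetes convention. A status describes the current spec only once `status.observedGeneration` has caught up with `metadata.generation`. A client waiting on a profile cache, or on an AIMProfile, which has the same two fields, should check both before trusting `Ready`. This sketch assumes the high-level status is serialized as `status.status`, and the helper name is hypothetical.

```python
def is_ready(obj: dict) -> bool:
    """True once the controller has seen the latest spec and reports Ready."""
    generation = obj.get("metadata", {}).get("generation", 0)
    status = obj.get("status", {})
    # A missing observedGeneration means the controller has not reported yet.
    observed = status.get("observedGeneration", -1)
    return observed >= generation and status.get("status") == "Ready"


fresh = {"metadata": {"generation": 2},
         "status": {"observedGeneration": 2, "status": "Ready"}}
stale = {"metadata": {"generation": 3},
         "status": {"observedGeneration": 2, "status": "Ready"}}
print(is_ready(fresh), is_ready(stale))  # True False
```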
AIMProfileCachingConfig
AIMProfileCachingConfig configures model caching behavior for namespace-scoped profiles.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `enabled` | Enabled controls whether caching is enabled for this profile. | false | |
| `env` | Env specifies environment variables for model download during caching. | | Optional: {} |
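AIMProfileList
AIMProfileList contains a list of AIMProfile.
| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` | `aim.eai.amd.com/v1alpha2` | | |
| `kind` | `AIMProfileList` | | |
| `metadata` | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `items` | | | |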
AIMProfileSpec
AIMProfileSpec defines the desired state of a namespace-scoped AIMProfile.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `aimId` | AimId is the model architecture identifier (e.g., “qwen/qwen3-32b”). | | MinLength: 1 |
| `modelId` | ModelId is the specific model / HuggingFace URI (e.g., “qwen/qwen3-32b-fp8”). | | Optional: {} |
| `profileId` | ProfileId is the on-disk profile identifier from the AIM image | | Optional: {} |
| `engine` | Engine identifies the inference engine (e.g., “vllm”, “tgi”). | | Optional: {} |
| `metric` | Metric is the optimization target for this profile. | | Enum: [latency throughput] |
| `precision` | Precision is the numeric precision used by this profile. | | Enum: [fp4 fp8 fp16 fp32 bf16 int4 int8] |
| `type` | Type indicates the optimization level. Hierarchy: optimized > general > preview > unoptimized. | | Enum: [optimized general preview unoptimized] |
| `primary` | Primary marks this as a default/recommended profile. When true, the profile is | false | |
| `engineArgs` | EngineArgs contains inference engine CLI arguments as a free-form JSON object. | | Schemaless: {} |
| `engineEnv` | EngineEnv contains environment variables for the inference engine subprocess. | | Optional: {} |
| `acceleratorModel` | AcceleratorModel is the accelerator identifier for node selection. | | MaxLength: 63 |
| `acceleratorType` | AcceleratorType determines the resource derivation strategy: gpu or cpu. | | Enum: [gpu cpu] |
| `acceleratorCount` | AcceleratorCount is the number of accelerator units required (e.g., GPU count). | | Minimum: 0 |
| `resources` | Resources is an optional override for K8s resource requests/limits. | | Optional: {} |
| `image` | Image is the deployment container image. Required. | | MinLength: 1 |
| `modelSources` | ModelSources specifies model artifact sources for this profile. | | Optional: {} |
| `containerEnv` | ContainerEnv specifies container-level env vars for the AIM runtime process (K8s pod spec). | | Optional: {} |
| `imagePullSecrets` | ImagePullSecrets lists secrets for pulling container images. | | Optional: {} |
| `serviceAccountName` | ServiceAccountName specifies the service account for workloads. | | Optional: {} |
| `caching` | Caching configures model caching behavior for this namespace-scoped profile. | | Optional: {} |
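Caching is the one field that separates AIMProfileSpec from AIMClusterProfileSpec. Cluster profiles carry no caching configuration because caches are namespace-scoped. The sketch below builds a namespace-scoped AIMProfile with caching enabled. It then converts the profile into a cluster profile by dropping both the namespace and the caching block. The helper names, the placeholder image, and the lowerCamelCase JSON names (`caching`, `enabled`) are assumptions.

```python
import copy

API_VERSION = "aim.eai.amd.com/v1alpha2"


def aim_profile(name: str, namespace: str, aim_id: str, image: str,
                caching_enabled: bool = False) -> dict:
    """Build a namespace-scoped AIMProfile manifest (hypothetical helper)."""
    return {
        "apiVersion": API_VERSION,
        "kind": "AIMProfile",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "aimId": aim_id,
            "image": image,
            "caching": {"enabled": caching_enabled},
        },
    }


def to_cluster_profile(profile: dict) -> dict:
    """Derive an AIMClusterProfile: no namespace and no caching block."""
    spec = {k: copy.deepcopy(v) for k, v in profile["spec"].items() if k != "caching"}
    return {
        "apiVersion": profile["apiVersion"],
        "kind": "AIMClusterProfile",
        "metadata": {"name": profile["metadata"]["name"]},
        "spec": spec,
    }


p = aim_profile("qwen3-32b-fp8", "team-a", "qwen/qwen3-32b",
                "registry.example.com/aim-qwen:0.8.5", caching_enabled=True)
c = to_cluster_profile(p)
print(sorted(c["spec"]))  # ['aimId', 'image']
```

The conversion copies the spec instead of mutating it, so the original AIMProfile manifest keeps its caching block.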
AIMProfileSpecCommon
AIMProfileSpecCommon contains spec fields shared between AIMProfile and AIMClusterProfile. A profile answers five questions without consulting any other resource: model architecture (aimId), accelerator (acceleratorModel/Type/Count), K8s resources (status.resources), runtime config (engineArgs, engineEnv), and container image (image).
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `aimId` | AimId is the model architecture identifier (e.g., “qwen/qwen3-32b”). | | MinLength: 1 |
| `modelId` | ModelId is the specific model / HuggingFace URI (e.g., “qwen/qwen3-32b-fp8”). | | Optional: {} |
| `profileId` | ProfileId is the on-disk profile identifier from the AIM image | | Optional: {} |
| `engine` | Engine identifies the inference engine (e.g., “vllm”, “tgi”). | | Optional: {} |
| `metric` | Metric is the optimization target for this profile. | | Enum: [latency throughput] |
| `precision` | Precision is the numeric precision used by this profile. | | Enum: [fp4 fp8 fp16 fp32 bf16 int4 int8] |
| `type` | Type indicates the optimization level. Hierarchy: optimized > general > preview > unoptimized. | | Enum: [optimized general preview unoptimized] |
| `primary` | Primary marks this as a default/recommended profile. When true, the profile is | false | |
| `engineArgs` | EngineArgs contains inference engine CLI arguments as a free-form JSON object. | | Schemaless: {} |
| `engineEnv` | EngineEnv contains environment variables for the inference engine subprocess. | | Optional: {} |
| `acceleratorModel` | AcceleratorModel is the accelerator identifier for node selection. | | MaxLength: 63 |
| `acceleratorType` | AcceleratorType determines the resource derivation strategy: gpu or cpu. | | Enum: [gpu cpu] |
| `acceleratorCount` | AcceleratorCount is the number of accelerator units required (e.g., GPU count). | | Minimum: 0 |
| `resources` | Resources is an optional override for K8s resource requests/limits. | | Optional: {} |
| `image` | Image is the deployment container image. Required. | | MinLength: 1 |
| `modelSources` | ModelSources specifies model artifact sources for this profile. | | Optional: {} |
| `containerEnv` | ContainerEnv specifies container-level env vars for the AIM runtime process (K8s pod spec). | | Optional: {} |
| `imagePullSecrets` | ImagePullSecrets lists secrets for pulling container images. | | Optional: {} |
| `serviceAccountName` | ServiceAccountName specifies the service account for workloads. | | Optional: {} |
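Each of the five questions maps onto concrete fields. One of them comes from a different place: the definitive Kubernetes resources are read from status.resources, not from the spec. The hypothetical helper below gathers those answers from a profile manifest held as a dict. The accelerator model string, the engine argument, and the image are illustrative placeholders, not values taken from AIM Engine.

```python
def profile_answers(profile: dict) -> dict:
    """Collect the five answers a profile provides without other resources."""
    spec = profile.get("spec", {})
    status = profile.get("status", {})
    return {
        "model_architecture": spec.get("aimId"),
        "accelerator": (spec.get("acceleratorModel"),
                        spec.get("acceleratorType"),
                        spec.get("acceleratorCount")),
        "k8s_resources": status.get("resources"),  # from status, not spec
        "runtime_config": {"engineArgs": spec.get("engineArgs"),
                           "engineEnv": spec.get("engineEnv")},
        "image": spec.get("image"),
    }


# Placeholder values for illustration only.
profile = {
    "spec": {
        "aimId": "qwen/qwen3-32b",
        "acceleratorModel": "example-gpu-model",
        "acceleratorType": "gpu",
        "acceleratorCount": 1,
        "engineArgs": {"example-flag": 1},
        "image": "registry.example.com/aim-qwen:0.8.5",
    },
    "status": {"resources": {"limits": {"amd.com/gpu": "1"}}},
}
print(profile_answers(profile)["accelerator"])  # ('example-gpu-model', 'gpu', 1)
```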
AIMProfileStatus
AIMProfileStatus defines the observed state of AIMProfile / AIMClusterProfile.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `observedGeneration` | ObservedGeneration is the most recent generation observed by the controller. | | |
| `status` | Status represents the current high-level status of this profile. | Pending | Enum: [Pending Progressing Ready Degraded Failed NotAvailable] |
| `version` | Version is extracted from the spec.image tag during reconciliation (e.g., “0.8.5”). | | Optional: {} |
| `matchingNodes` | MatchingNodes is the count of cluster nodes matching both the accelerator | | Optional: {} |
| `hardwareSummary` | HardwareSummary is a human-readable string describing the hardware requirements. | | Optional: {} |
| `resources` | Resources contains the definitive K8s resource requests/limits used for deployment. | | Optional: {} |
| `resolvedNodeAffinity` | ResolvedNodeAffinity contains the computed node affinity rules derived from | | Optional: {} |
| `conditions` | Conditions represent the latest observations of profile state. | | |
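Version is derived from the spec.image tag. This page does not document how AIM Engine parses image references. The sketch below is one conventional approach: it ignores a trailing digest and treats a colon before the last `/` as a registry port rather than a tag. Returning `None` for an untagged image is an assumption of this sketch, and the image names are placeholders.

```python
from typing import Optional


def image_tag(image: str) -> Optional[str]:
    """Return the tag of an image reference, or None when it has none."""
    ref = image.split("@", 1)[0]        # drop a trailing @sha256:... digest
    last = ref.rsplit("/", 1)[-1]       # a ':' before the last '/' is a registry port
    if ":" not in last:
        return None
    return last.split(":", 1)[1]


print(image_tag("registry.example.com:5000/aim/qwen:0.8.5"))  # 0.8.5
print(image_tag("registry.example.com:5000/aim/qwen"))        # None
```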
AIMProfileType
Underlying type: string
AIMProfileType indicates the optimization level of a profile. Hierarchy: optimized > general > preview > unoptimized.
Validation:
Enum: [optimized general preview unoptimized]
Appears in:
| Field | Description |
|---|---|
| `optimized` | |
| `general` | |
| `preview` | |
| `unoptimized` | |
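The hierarchy defines a total order over profile types. The sketch below encodes that order as ranks and applies it as a hypothetical selection rule. This page does not say how AIM Engine itself breaks ties or how it combines `type` with other fields such as `primary`. The profile names are placeholders.

```python
from typing import Dict, List

# Ranks follow the documented hierarchy: optimized > general > preview > unoptimized.
TYPE_RANK: Dict[str, int] = {"optimized": 3, "general": 2, "preview": 1, "unoptimized": 0}


def best_profile(profiles: List[dict]) -> dict:
    """Pick the profile with the highest-ranked spec.type (hypothetical rule).

    Ties keep the earliest profile, because max() returns the first maximal item.
    """
    return max(profiles, key=lambda p: TYPE_RANK[p["spec"]["type"]])


candidates = [
    {"metadata": {"name": "a"}, "spec": {"type": "preview"}},
    {"metadata": {"name": "b"}, "spec": {"type": "optimized"}},
    {"metadata": {"name": "c"}, "spec": {"type": "general"}},
]
print(best_profile(candidates)["metadata"]["name"])  # b
```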
AIMService
AIMService manages a KServe-based AIM inference service for the selected model and template. Note: KServe uses the {name}-{namespace} format, which must not exceed 63 characters. This constraint is validated at runtime because CEL cannot access metadata.namespace.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` | `aim.eai.amd.com/v1alpha2` | | |
| `kind` | `AIMService` | | |
| `metadata` | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `spec` | | | |
| `status` | | | |
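The 63-character limit on {name}-{namespace} is only checked at runtime. A client can catch an over-long name before creating the AIMService by computing the combined length itself. The helper names below are hypothetical.

```python
MAX_KSERVE_NAME_LENGTH = 63


def kserve_name(name: str, namespace: str) -> str:
    """Combined name in the {name}-{namespace} format KServe uses."""
    return f"{name}-{namespace}"


def check_service_name(name: str, namespace: str) -> None:
    """Raise ValueError if the combined name would exceed 63 characters."""
    combined = kserve_name(name, namespace)
    if len(combined) > MAX_KSERVE_NAME_LENGTH:
        raise ValueError(
            f"{combined!r} is {len(combined)} characters; the limit is "
            f"{MAX_KSERVE_NAME_LENGTH}"
        )


check_service_name("qwen3-chat", "team-a")   # 17 characters: accepted
try:
    check_service_name("q" * 50, "production-inference")
except ValueError as err:
    print(err)
```

A name exactly 63 characters long is accepted; only longer names are rejected.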
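AIMServiceList
AIMServiceList contains a list of AIMService.
| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` | `aim.eai.amd.com/v1alpha2` | | |
| `kind` | `AIMServiceList` | | |
| `metadata` | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `items` | | | |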
AcceleratorType
Underlying type: string
AcceleratorType distinguishes CPU from GPU accelerators. Used by AIM Engine to determine the resource derivation strategy (e.g., gpu → amd.com/gpu, cpu → cpu).
Validation:
Enum: [gpu cpu]
Appears in:
| Field | Description |
|---|---|
| `gpu` | |
| `cpu` | |
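The gpu → amd.com/gpu and cpu → cpu mapping can be sketched as a function that turns acceleratorType and acceleratorCount into requests and limits. Kubernetes requires requests to equal limits for extended resources such as amd.com/gpu. Doing the same for cpu is an assumption of this sketch. So is ignoring the profile's optional resources override, which this hypothetical helper does not model.

```python
# Resource names per accelerator type, as described for AcceleratorType.
RESOURCE_NAME = {"gpu": "amd.com/gpu", "cpu": "cpu"}


def accelerator_resources(accelerator_type: str, count: int) -> dict:
    """Hypothetical derivation of requests/limits from acceleratorType/Count."""
    if accelerator_type not in RESOURCE_NAME:
        raise ValueError("acceleratorType must be 'gpu' or 'cpu'")
    if count < 0:
        raise ValueError("acceleratorCount must be >= 0")
    name = RESOURCE_NAME[accelerator_type]
    quantity = str(count)
    # Extended resources such as amd.com/gpu need requests == limits;
    # mirroring that for cpu is an assumption of this sketch.
    return {"requests": {name: quantity}, "limits": {name: quantity}}


print(accelerator_resources("gpu", 8))
# {'requests': {'amd.com/gpu': '8'}, 'limits': {'amd.com/gpu': '8'}}
```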