Storage Configuration#
AIM Engine uses persistent volumes for model caching. This guide covers storage setup and sizing.
Requirements#
Model caching requires ReadWriteMany (RWX) persistent volumes so that multiple pods can mount the same cached model data. You need a CSI driver that supports the RWX access mode; the examples in this guide use a longhorn storage class.
Default Storage Class#
Set the default storage class for all AIM PVCs via cluster runtime configuration:
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMClusterRuntimeConfig
metadata:
  name: default
spec:
  storage:
    defaultStorageClassName: longhorn
Without this setting, AIM Engine uses the cluster’s default storage class.
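Before AIM Engine relies on a storage class, it can help to confirm that the class really provisions RWX volumes. The sketch below only prints a standard Kubernetes PersistentVolumeClaim manifest; the function name rwx_test_pvc, the PVC name rwx-smoke-test, and the 1Gi size are illustrative assumptions and not part of AIM Engine.

```shell
# Print a throwaway PVC manifest that requests ReadWriteMany access.
# The PVC name and size are hypothetical; pass the storage class to test.
rwx_test_pvc() {
  storage_class="$1"
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rwx-smoke-test
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ${storage_class}
  resources:
    requests:
      storage: 1Gi
EOF
}

# Usage (requires cluster access):
#   rwx_test_pvc longhorn | kubectl apply -n <namespace> -f -
#   kubectl get pvc rwx-smoke-test -n <namespace>
#   kubectl delete pvc rwx-smoke-test -n <namespace>
```

If the storage class uses WaitForFirstConsumer volume binding, the claim stays Pending until a pod mounts it, so a Pending status alone does not mean RWX is unsupported.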
PVC Headroom#
AIM Engine sizes PVCs based on discovered model sizes plus a configurable headroom percentage. This accounts for filesystem overhead and temporary files during downloads.
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMClusterRuntimeConfig
metadata:
  name: default
spec:
  storage:
    pvcHeadroomPercent: 15
The default headroom is 10%. The final PVC size is rounded up to the nearest GiB.
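As a worked example of the sizing rule, the sketch below applies a headroom percentage to a model size and rounds up to whole GiB. The function name pvc_size_gib and the exact formula, ceil(bytes × (1 + percent/100) / 2^30), are assumptions for illustration; the engine's internal calculation may differ in detail.

```shell
# Estimate a PVC size in whole GiB from a model size in bytes and a
# headroom percentage. Assumed formula: ceil(bytes * (1 + pct/100) / 2^30).
pvc_size_gib() {
  model_bytes="$1"
  headroom_percent="${2:-10}"   # 10 matches the documented default
  gib=1073741824
  # Integer-only ceiling division avoids floating point in the shell.
  numerator=$(( model_bytes * (100 + headroom_percent) ))
  denominator=$(( 100 * gib ))
  echo $(( (numerator + denominator - 1) / denominator ))
}

pvc_size_gib 17179869184 10   # 16 GiB model, 10% headroom: prints 18
pvc_size_gib 17179869184 15   # 16 GiB model, 15% headroom: prints 19
```

A 16 GiB model with 10% headroom needs 17.6 GiB, which rounds up to an 18 GiB PVC; raising the headroom to 15% gives 18.4 GiB and a 19 GiB PVC.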
Storage Sizing Guidelines#
Model storage requirements vary significantly:
| Model Size Category | Approximate Storage | Example |
|---|---|---|
| Small (7-8B params) | 15-20 GiB | Qwen3 8B |
| Medium (30-70B params) | 60-140 GiB | Qwen3 32B, DeepSeek R1 70B |
| Large (100B+ params) | 200+ GiB | Mixtral 8x22B |
These are per-model estimates. A template cache PVC holds all model sources for that template.
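For models not listed above, a common rule of thumb is parameter count times bytes per parameter. The sketch below assumes 2 bytes per parameter (16-bit weights); that figure, and the function name estimate_model_gib, are assumptions rather than anything AIM Engine computes, and quantized or multi-file model sources can differ noticeably.

```shell
# Rough weight size in whole GiB: parameters * bytes per parameter.
# Assumption: 2 bytes per parameter (16-bit weights); not an AIM Engine formula.
estimate_model_gib() {
  params_billions="$1"
  bytes_per_param="${2:-2}"
  gib=1073741824
  total_bytes=$(( params_billions * 1000000000 * bytes_per_param ))
  # Round up to the next whole GiB.
  echo $(( (total_bytes + gib - 1) / gib ))
}

estimate_model_gib 8    # prints 15
estimate_model_gib 32   # prints 60
estimate_model_gib 70   # prints 131
```

These estimates cover weights only, so the configured PVC headroom still applies on top of them.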
Monitoring Storage#
List AIM-related PVCs and check artifact download status:
# List AIM-related PVCs
kubectl get pvc -l aim.eai.amd.com/artifact -n <namespace>
# Check artifact download status
kubectl get aimartifact -n <namespace>
Storage Quotas#
AIM Engine can enforce storage limits on artifact PVCs to prevent unbounded growth. Quotas are evaluated before PVCs are created: when a new artifact would exceed the limit, either the artifact is blocked or existing artifacts are evicted to make room.
Namespace Quota#
Set a per-namespace limit via annotation:
kubectl annotate namespace ml-team aim.eai.amd.com/artifact-storage-quota=100Gi
This limits the total allocated PVC storage for all AIMArtifacts in that namespace.
Cluster-Wide Defaults#
Configure default namespace limits and a cluster-wide cap via AIMClusterRuntimeConfig:
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMClusterRuntimeConfig
metadata:
  name: default
spec:
  artifactStorageQuota:
    clusterLimit: 500Gi
    defaultNamespaceLimit: 100Gi
clusterLimit: Maximum total PVC storage across all namespaces.
defaultNamespaceLimit: Applied to namespaces that don’t have the annotation.
The namespace annotation takes precedence when both are set.
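The precedence between the namespace annotation and defaultNamespaceLimit can be sketched as a small helper. The function name effective_namespace_limit is hypothetical, and printing "none" when neither value is set reflects an assumption that an unconfigured namespace has no namespace-level quota.

```shell
# Resolve the limit that applies to a namespace.
# $1: value of the aim.eai.amd.com/artifact-storage-quota annotation (empty if unset)
# $2: defaultNamespaceLimit from the runtime config (empty if unset)
effective_namespace_limit() {
  annotation_limit="$1"
  default_limit="$2"
  if [ -n "$annotation_limit" ]; then
    echo "$annotation_limit"   # the annotation takes precedence
  elif [ -n "$default_limit" ]; then
    echo "$default_limit"
  else
    echo "none"                # assumption: nothing configured means no namespace limit
  fi
}

effective_namespace_limit "200Gi" "100Gi"   # prints 200Gi
effective_namespace_limit "" "100Gi"        # prints 100Gi

# The annotation value can be read with kubectl (requires cluster access):
#   kubectl get namespace ml-team \
#     -o jsonpath='{.metadata.annotations.aim\.eai\.amd\.com/artifact-storage-quota}'
```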
Eviction Policy#
When quota is exceeded and evictable artifacts exist, AIM Engine automatically deletes the lowest-priority artifacts to make room. Eviction eligibility requires:
The artifact has a retentionPriority (set explicitly or via defaultRetentionPriority in runtime config)
The artifact is Shared and Ready
The artifact is not in use by any AIMTemplateCache
The artifact is not annotated with aim.eai.amd.com/eviction-protected: "true"
Lower retentionPriority values are evicted first. Among equal priorities, the oldest artifact is evicted first.
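To preview which artifacts would go first, the ordering rule can be approximated with standard text tools. This is a sketch: the function name eviction_order is hypothetical, "oldest" is assumed to mean the earliest metadata.creationTimestamp, and only rows with a numeric priority are considered, so the other eligibility rules are not checked.

```shell
# Order eviction candidates: lowest retentionPriority first, then oldest first.
# Input: one artifact per line as "<name> <retentionPriority> <creationTimestamp>".
# Assumption: age is judged by creationTimestamp (RFC 3339, UTC), which sorts
# correctly as plain text.
eviction_order() {
  awk '$2 ~ /^-?[0-9]+$/' |          # keep rows that have a numeric priority
    LC_ALL=C sort -k2,2n -k3,3 |     # priority ascending, then timestamp ascending
    awk '{ print $1 }'
}

printf '%s\n' \
  'model-a 10 2024-05-01T00:00:00Z' \
  'model-b 5 2024-06-01T00:00:00Z' \
  'model-c 10 2024-03-01T00:00:00Z' | eviction_order
# prints model-b, model-c, model-a (one per line)

# Feeding it from the cluster (requires cluster access):
#   kubectl get aimartifact -n <namespace> --no-headers \
#     -o custom-columns=NAME:.metadata.name,PRIORITY:.spec.retentionPriority,CREATED:.metadata.creationTimestamp \
#     | eviction_order
```

Artifacts that rely on a default priority have no spec.retentionPriority value of their own, so they show up as <none> in the kubectl output and are skipped by this preview.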
Default Retention Priority#
To make all artifacts evictable by default without setting retentionPriority on each one:
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMClusterRuntimeConfig
metadata:
  name: default
spec:
  artifact:
    defaultRetentionPriority: 10
Artifacts with an explicit spec.retentionPriority override this default. Artifacts without either remain non-evictable.
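The override rule can be written out as a small helper, shown here as a sketch. The function name effective_retention_priority is hypothetical; it only mirrors the three cases described above.

```shell
# Print the priority that applies to an artifact, or return 1 if it has none.
# $1: explicit spec.retentionPriority (empty if unset)
# $2: defaultRetentionPriority from the runtime config (empty if unset)
effective_retention_priority() {
  explicit_priority="$1"
  default_priority="$2"
  if [ -n "$explicit_priority" ]; then
    echo "$explicit_priority"   # explicit value overrides the default
  elif [ -n "$default_priority" ]; then
    echo "$default_priority"
  else
    return 1                    # neither set: the artifact stays non-evictable
  fi
}

effective_retention_priority 50 10   # prints 50
effective_retention_priority "" 10   # prints 10
effective_retention_priority "" "" || echo "non-evictable"
```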
Protecting Specific Artifacts#
To exempt an artifact from eviction even when a default priority is configured:
kubectl annotate aimartifact important-model aim.eai.amd.com/eviction-protected=true
Quota Status#
When an artifact is blocked by quota, its status shows the reason:
kubectl get aimartifact -n ml-team
# STATUS column shows "Failed" for blocked artifacts
kubectl get aimartifact blocked-model -n ml-team -o yaml
# status.conditions includes:
#   - type: StorageQuotaExceeded
#     status: "True"
#     reason: NamespaceQuotaExceeded
#     message: "Namespace quota exceeded: 90Gi used + 20Gi needed > 100Gi limit"
This condition propagates up through AIMTemplateCache and AIMService, so kubectl get aimservice shows the quota reason when a service is waiting for a blocked artifact.
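The arithmetic behind the sample message above is a simple comparison of used plus needed storage against the limit. The sketch below reproduces it with whole-GiB integers; the function name check_namespace_quota is hypothetical, and treating a request that lands exactly on the limit as allowed is inferred from the ">" in the message.

```shell
# Return 0 if the new artifact fits; otherwise print a message and return 1.
# All arguments are whole GiB values (plain integers, no unit suffix).
check_namespace_quota() {
  used_gi="$1"
  needed_gi="$2"
  limit_gi="$3"
  if [ $(( used_gi + needed_gi )) -gt "$limit_gi" ]; then
    echo "Namespace quota exceeded: ${used_gi}Gi used + ${needed_gi}Gi needed > ${limit_gi}Gi limit"
    return 1
  fi
  return 0
}

check_namespace_quota 90 20 100 || true         # 110 > 100: prints the quota message
check_namespace_quota 70 20 100 && echo "fits"  # 90 <= 100: prints fits
```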
Cleanup#
Template cache PVCs are owned by AIMTemplateCache resources, which are owned by templates. When a template is deleted, its caches and PVCs are cleaned up automatically.
To manually reclaim storage:
# Delete a template cache (also deletes its PVCs and artifacts)
kubectl delete aimtemplatecache <name> -n <namespace>
Next Steps#
Model Caching Guide — Caching modes and configuration
Model Caching Concepts — Cache hierarchy and ownership