Legacy fine-tuned models

Fine-tuned models created before this upgrade no longer appear in the dashboard. Their weights are intact in object storage; the new version of AMD AI Workbench just needs them registered as AIMs (specifically, as AIMModel Custom Resources in each project’s namespace) before users can see and deploy them.

This page describes how a cluster administrator registers those legacy models as AIMs after the AI Workbench upgrade.

How to register a legacy model

For each row in the legacy inference_models table whose onboarding_status is ready and whose model_weights_path is not null, create one AIMModel Custom Resource in the namespace recorded in that row's namespace column. The mapping from database column to CR field is:

| inference_models column | AIMModel field |
| --- | --- |
| namespace | metadata.namespace |
| name | drives metadata.name (sanitize to a valid K8s name; suffix with the row id to avoid collisions); also stored on the …/model-name label |
| canonical_name | spec.modelSources[0].modelId and the …/canonical-name label |
| model_weights_path | spec.modelSources[0].sourceUri, prefixed with s3://<bucket>/ (the legacy path already includes the checkpoint-final segment) |
| id | the airm.silogen.ai/workload-id label and the aiwb.apps.eai.amd.com/legacy-inference-model-id annotation |

Resulting AIMModel:

apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMModel
metadata:
  name: <slug-of-name>-<id-prefix>
  namespace: <inference_models.namespace>
  labels:
    aiwb.apps.eai.amd.com/model-name: <inference_models.name, sanitized>
    aiwb.apps.eai.amd.com/canonical-name: <inference_models.canonical_name, sanitized>
    airm.silogen.ai/workload-id: <inference_models.id>
    aiwb.apps.eai.amd.com/migrated-from-db: "true"
  annotations:
    aiwb.apps.eai.amd.com/legacy-inference-model-id: <inference_models.id>
spec:
  image: amdenterpriseai/aim-base:latest
  modelSources:
    - modelId: <inference_models.canonical_name>
      sourceUri: s3://<bucket>/<inference_models.model_weights_path>
  custom:
    hardware:
      gpu:
        requests: 1
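
The mapping above can be sketched in Python as a function that turns one inference_models row into an AIMModel manifest. This is an illustration under stated assumptions, not the upgrade tooling itself: the helper names (slugify_name, sanitize_label_value, build_aimmodel), the 8-character id suffix, the conservative 63-character name limit, the replacement of invalid characters with "-", and every value in example_row are hypothetical. The output is JSON, which kubectl apply -f accepts in place of YAML.

```python
import json
import re


def slugify_name(name: str, row_id: str, id_chars: int = 8) -> str:
    """Lowercase slug of the model name plus a short id suffix (assumed scheme)."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    suffix = re.sub(r"[^a-z0-9]", "", str(row_id).lower())[:id_chars]
    # Keep the whole name within a conservative 63-character limit (assumption).
    slug = slug[: 63 - len(suffix) - 1].strip("-") or "model"
    return f"{slug}-{suffix}"


def sanitize_label_value(value: str) -> str:
    """Reduce a string to characters allowed in a Kubernetes label value."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "-", value)[:63]
    # Label values must begin and end with an alphanumeric character.
    return cleaned.strip("-._")


def build_aimmodel(row: dict, bucket: str) -> dict:
    """Map one legacy row onto the AIMModel fields listed in the table."""
    row_id = str(row["id"])
    weights_path = row["model_weights_path"].lstrip("/")
    return {
        "apiVersion": "aim.eai.amd.com/v1alpha1",
        "kind": "AIMModel",
        "metadata": {
            "name": slugify_name(row["name"], row_id),
            "namespace": row["namespace"],
            "labels": {
                "aiwb.apps.eai.amd.com/model-name": sanitize_label_value(row["name"]),
                "aiwb.apps.eai.amd.com/canonical-name": sanitize_label_value(
                    row["canonical_name"]
                ),
                "airm.silogen.ai/workload-id": row_id,
                "aiwb.apps.eai.amd.com/migrated-from-db": "true",
            },
            "annotations": {
                "aiwb.apps.eai.amd.com/legacy-inference-model-id": row_id,
            },
        },
        "spec": {
            "image": "amdenterpriseai/aim-base:latest",
            "modelSources": [
                {
                    "modelId": row["canonical_name"],
                    "sourceUri": f"s3://{bucket}/{weights_path}",
                }
            ],
            "custom": {"hardware": {"gpu": {"requests": 1}}},
        },
    }


# Hypothetical row, for illustration only.
example_row = {
    "id": "3f2a9c1e-7b4d-4e1a-9c55-0d2f6a1b8e90",
    "namespace": "team-a",
    "name": "My Model v2!",
    "canonical_name": "meta-llama/Llama-3.1-8B",
    "model_weights_path": "team-a/finetunes/my-model/checkpoint-final",
}
manifest = build_aimmodel(example_row, bucket="example-bucket")
print(json.dumps(manifest, indent=2))
```

The sketch assumes the row id is a string such as a UUID. If ids are integers in your database, the str() conversion still produces valid label and annotation values.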

The aiwb.apps.eai.amd.com/migrated-from-db label and the legacy-inference-model-id annotation make the operation idempotent: a re-run can detect already-migrated rows and skip them.
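
One way a re-run might detect already-migrated rows is sketched below. It assumes the existing AIMModels have been fetched as a parsed JSON list (for example the output of kubectl get aimmodels -A -l aiwb.apps.eai.amd.com/migrated-from-db=true -o json) and that rows are plain dictionaries with an id key. The function names and sample data are hypothetical.

```python
LEGACY_ID_ANNOTATION = "aiwb.apps.eai.amd.com/legacy-inference-model-id"
MIGRATED_LABEL = "aiwb.apps.eai.amd.com/migrated-from-db"


def migrated_ids(aimmodel_list: dict) -> set:
    """Collect legacy row ids from AIMModels that carry the migration label."""
    ids = set()
    for item in aimmodel_list.get("items", []):
        meta = item.get("metadata", {})
        if meta.get("labels", {}).get(MIGRATED_LABEL) != "true":
            continue  # not created by this migration
        legacy_id = meta.get("annotations", {}).get(LEGACY_ID_ANNOTATION)
        if legacy_id is not None:
            ids.add(str(legacy_id))
    return ids


def rows_to_migrate(rows: list, aimmodel_list: dict) -> list:
    """Return only the rows that have no migrated AIMModel yet."""
    done = migrated_ids(aimmodel_list)
    return [row for row in rows if str(row["id"]) not in done]


# Hypothetical data: row-1 was migrated on an earlier run, row-2 was not.
existing = {
    "items": [
        {
            "metadata": {
                "labels": {MIGRATED_LABEL: "true"},
                "annotations": {LEGACY_ID_ANNOTATION: "row-1"},
            }
        }
    ]
}
rows = [{"id": "row-1"}, {"id": "row-2"}]
pending = rows_to_migrate(rows, existing)
print([row["id"] for row in pending])  # ['row-2']
```

Comparing on the annotation rather than on metadata.name keeps the check valid even if the name sanitization rule changes between runs.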

Verifying

kubectl -n <namespace> get aimmodels -l aiwb.apps.eai.amd.com/migrated-from-db=true

The migrated models should appear under the project’s “Custom Models” tab. Each will show “Complete” once its AIMModel reconciles.

Troubleshooting

AIMModel created but stays in Pending. This is usually a credentials issue: the cluster’s bucket secret isn’t reachable from the model’s namespace. Run kubectl -n <namespace> describe aimmodel <name> and check the controller events. This procedure does not provision secrets; it assumes the namespace is already configured for fine-tuning.

Row has a null canonical_name. Skip it. Without canonical_name there is no modelId to set on the AIMModel and the model cannot be deployed.
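
The skip rule can be combined with the selection criteria from the registration section into a single pre-check. The sketch below assumes rows are dictionaries keyed by column name and that a null column arrives as None. The function name skip_reason and the sample rows are hypothetical.

```python
from typing import Optional


def skip_reason(row: dict) -> Optional[str]:
    """Return why a row should be skipped, or None if it can be registered."""
    if row.get("onboarding_status") != "ready":
        return "onboarding_status is not ready"
    if row.get("model_weights_path") is None:
        return "model_weights_path is null"
    if row.get("canonical_name") is None:
        return "canonical_name is null"
    return None


# Hypothetical rows, for illustration only.
sample_rows = [
    {"id": "a", "onboarding_status": "ready",
     "model_weights_path": "p/checkpoint-final", "canonical_name": "org/model"},
    {"id": "b", "onboarding_status": "ready",
     "model_weights_path": "p/checkpoint-final", "canonical_name": None},
    {"id": "c", "onboarding_status": "failed",
     "model_weights_path": None, "canonical_name": "org/model"},
]
for row in sample_rows:
    print(row["id"], skip_reason(row) or "eligible")
```

Reporting the reason for each skipped row makes it easier to tell a row that was never meant to be migrated from one that needs its canonical_name repaired first.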