AIMs Overview

AIMs Overview#

AIM stands for AMD Inference Microservice. AIMs provide standardized, portable inference microservices for serving AI models on AMD Instinct™ GPUs. AIMs use ROCm 7 under the hood.

AIMs are distributed as Docker images, making them easy to deploy and manage in various environments. Serving AI models in general and LLMs in particular is not a trivial task. AIMs abstract away the complexities involved in configuring and serving AI models by providing a mechanism to automatically choose optimal runtime parameters based on the user’s input, hardware, and model specifications.

AIM exposes an OpenAI-compatible API for LLMs, making it easy to integrate with existing applications and services.

Features#

  • Broad model support

    • Including community models, custom fine-tuned models, and popular foundation models.

  • Intelligent Configuration based on profiles.

    • Profiles are predefined configurations optimized for specific models and hardware.

    • Profile selection is an automated process of choosing the best profile based on the user’s input, hardware, and model.

      • It is possible to bypass automatic selection and specify a particular profile directly using an environment variable.

      • Custom profiles can be created by users to suit their specific needs.

    • All published profiles are validated against the schema, tested on the target hardware, and optimized for throughput or latency.

  • Models downloading and caching

    • Models can be downloaded from Hugging Face.

    • Downloaded models can be cached in different ways to speed-up subsequent runs.

    • Downloading gated models from Hugging Face is supported.

  • Integration

    • Logging is available on the container level and can be used by orchestrating frameworks.

    • AIM Runtime CLI simplifies the integration with orchestrating frameworks, such as Kubernetes.

    • AIM exposes OpenAI-compatible API for LLMs.

Terminology reference#

Word

Explanation

AIM

AMD Inference Microservice

Docker

A platform for developing, shipping, and running applications in containers

GPU

A graphics processing unit. Essential hardware for running AI models

HF

Hugging Face, a popular platform for sharing machine learning models and datasets

LLM

Large Language Model

Profile

A predefined AIM run configuration that can be optimized for specific models, compute, or use cases

ROCm

Radeon Open Compute, AMD’s open software platform for GPU computing

YAML

A human-readable data serialization format often used for configuration files