
Advanced Deployment Options#

When deploying AMD Inference Microservices (AIMs) from the model catalog, advanced configuration options let you tailor the deployment to specific use cases and requirements.

Performance Metric Selection#

The performance metric option allows you to specify the optimization goal for your model deployment. This setting influences how the AIM container configures itself to best serve your workload.

Available Metrics#

Latency: Prioritizes low end-to-end latency for individual requests. Choose this metric when:

  • You need fast responses for real-time applications

  • User experience depends on quick individual predictions

  • You’re building interactive chatbots or live inference services

Throughput: Prioritizes sustained requests per second. Choose this metric when:

  • You need to process high volumes of requests

  • Batch processing is more important than individual response time

  • You’re running background inference tasks or bulk predictions

How to Select a Metric#

  1. Navigate to the Models page and go to the AIM Catalog tab

  2. Click Deploy on your chosen model

  3. In the deployment drawer, locate the Performance metric section

  4. Select your preferred metric from the dropdown:

    • Latency - Optimize for low response times

    • Throughput - Optimize for high request volume

    • Default - Let the AIM automatically select the best metric based on the model and hardware

Figure: Performance metric selection in the deployment drawer

Note

If you don’t specify a metric, the AIM will automatically select the most appropriate optimization based on the model type and available hardware. For most use cases, the default selection provides excellent performance.

Performance Metric Impact#

The selected metric affects:

  • Memory allocation strategies

  • Batch processing configurations

  • Cache utilization patterns

  • GPU resource scheduling

Different models may respond differently to metric selection based on their architecture and size. Larger language models typically benefit more from throughput optimization when handling multiple concurrent users, while smaller models may show better results with latency optimization.
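
For a rough client-side view of this trade-off, you can benchmark the deployed endpoint yourself. The sketch below is not part of AMD AI Workbench: it assumes the deployed AIM exposes an OpenAI-compatible chat completions API, and the endpoint URL and model name are placeholders you would replace with values from your own deployment. It approximates per-request latency with sequential calls and sustained throughput with concurrent calls.

```python
# Hedged sketch of a client-side latency/throughput check.
# Assumes an OpenAI-compatible endpoint; ENDPOINT and MODEL are placeholders.
import concurrent.futures
import time

import requests

ENDPOINT = "http://<your-aim-host>:8000/v1/chat/completions"  # placeholder
MODEL = "your-deployed-model"                                  # placeholder

def one_request(prompt: str) -> float:
    """Send a single chat completion and return its end-to-end latency in seconds."""
    start = time.perf_counter()
    resp = requests.post(
        ENDPOINT,
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 64,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return time.perf_counter() - start

# Sequential requests approximate per-request latency.
latencies = [one_request("Summarize the benefits of GPU inference.") for _ in range(5)]
print(f"mean latency: {sum(latencies) / len(latencies):.2f} s")

# Concurrent requests approximate sustained throughput (requests per second).
n = 20
start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(one_request, ["Summarize the benefits of GPU inference."] * n))
elapsed = time.perf_counter() - start
print(f"throughput: {n / elapsed:.2f} req/s")
```

Running a comparison like this against a latency-optimized and a throughput-optimized deployment of the same model is a simple way to confirm that the selected metric matches your traffic pattern.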

Unoptimized Deployments#

By default, AMD AI Workbench only deploys models using fully optimized configurations that have been validated for production performance on AMD hardware. However, you can optionally enable unoptimized deployments to access preview and experimental configurations.

Profile Types#

AIM deployments use one of three profile types (see the sketch after this list for the thresholds expressed as code):

Optimized Profiles: Configurations that achieve ≥90% of the performance of competitive platforms

  • Production-ready and fully supported

  • Recommended for all production workloads

  • Thoroughly tested and validated

  • Only type used by default

Preview Profiles: Configurations that achieve ≥70% of the performance of competitive platforms

  • Suitable for testing and development

  • May be promoted to optimized in future releases

  • Generally stable but may have minor limitations

  • Only available when unoptimized deployments are enabled

Unoptimized Profiles: Experimental configurations that achieve <70% of the performance of competitive platforms

  • Not recommended for production use

  • May result in significantly subpar performance

  • Useful for testing new hardware or experimental features

  • Only available when unoptimized deployments are enabled
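
To make the thresholds concrete, the minimal sketch below (not an actual AIM API) classifies a measured performance ratio, relative to a competitive platform, into the three tiers described above.

```python
# Minimal sketch: classify a relative performance ratio (1.0 == parity with a
# competitive platform) into the documented profile tiers.

def profile_tier(relative_performance: float) -> str:
    if relative_performance >= 0.90:
        return "optimized"    # >= 90% of competitive performance
    if relative_performance >= 0.70:
        return "preview"      # >= 70% of competitive performance
    return "unoptimized"      # < 70% of competitive performance

print(profile_tier(0.95))  # optimized
print(profile_tier(0.75))  # preview
print(profile_tier(0.50))  # unoptimized
```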

When to Enable Unoptimized Deployments#

Enable unoptimized deployments when:

  • Testing models on newly released AMD hardware (e.g., MI325X)

  • Participating in early access programs or demos

  • Evaluating preview or experimental features

  • Working with AMD support on performance optimization

  • Deploying models that don’t yet have optimized profiles for your hardware

Do not enable unoptimized deployments for:

  • Production workloads

  • Performance benchmarking or comparisons

  • Customer-facing applications

  • Any use case where performance matters

How to Enable Unoptimized Deployments#

Warning

Unoptimized deployments may result in significantly lower performance. Only enable this option if you specifically need access to preview or experimental configurations.

  1. Navigate to the Models page and go to the AIM Catalog tab

  2. Click Deploy on your chosen model

  3. Scroll to the Unoptimized deployment section

  4. Toggle the Allow switch to enable unoptimized deployments

When enabled, the AIM deployment system will:

  • First attempt to use optimized profiles (if available)

  • Fall back to preview profiles if no optimized profile exists

  • Use unoptimized profiles only if no better option is available

When disabled (default), the deployment will fail if no optimized profiles are available for your hardware configuration.

Important

Even with this option enabled, the system always prefers optimized configurations when available. The toggle simply allows the use of preview and unoptimized profiles when necessary.

Profile Selection Priority#

The deployment system selects profiles in the following order (the sketch after these lists expresses the same fallback logic as code):

With “Allow unoptimized” disabled (default):

  1. Optimized profiles only

  2. Deployment fails if no optimized profiles are available

With “Allow unoptimized” enabled:

  1. Optimized profiles (highest priority)

  2. Preview profiles (if no optimized available)

  3. Unoptimized profiles (if no optimized or preview available)

  4. Deployment fails only if no profiles exist at all
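
The priority order can be written as a short sketch. This is not the real scheduler implementation, only the documented fallback behavior expressed as Python for illustration.

```python
# Illustrative sketch of the documented profile selection priority.
from typing import Iterable, Optional

PROFILE_PRIORITY = ["optimized", "preview", "unoptimized"]

def select_profile(available: Iterable[str], allow_unoptimized: bool) -> Optional[str]:
    """Return the profile tier the deployment would use, or None if it would fail."""
    allowed = PROFILE_PRIORITY if allow_unoptimized else ["optimized"]
    for tier in allowed:
        if tier in available:
            return tier
    return None  # no acceptable profile -> deployment fails

# Toggle disabled (default): only optimized profiles are acceptable.
print(select_profile({"preview", "unoptimized"}, allow_unoptimized=False))  # None
# Toggle enabled: the best available tier is used.
print(select_profile({"preview", "unoptimized"}, allow_unoptimized=True))   # preview
```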

Performance Expectations#

When using unoptimized deployments with preview or unoptimized profiles, you may experience:

  • Higher latency per request

  • Lower throughput (requests per second)

  • Increased resource utilization

  • Less stable performance characteristics

  • Reduced GPU efficiency

These limitations are typically temporary as AMD continues to optimize configurations for new hardware and models. Preview profiles that demonstrate stable performance are regularly promoted to optimized status.

Combining Deployment Options#

You can combine both performance metric selection and unoptimized deployment options:

Example: Deploying a model on MI325X hardware with throughput optimization

  1. Select Throughput as your performance metric

  2. Enable Allow unoptimized deployments (if MI325X profiles are still in preview/unoptimized status)

  3. Deploy the model

The system will select the best available throughput-focused profile, prioritizing optimized configurations first, then falling back to preview or unoptimized profiles if necessary.

Monitoring Deployment Performance#

After deploying with custom options, monitor your deployment’s performance:

  1. Navigate to the Workloads page

  2. Select Open details from your inference workload

  3. View the Inference metrics section

Key metrics to watch:

  • Time to First Token: Indicates responsiveness (latency-focused)

  • Throughput: Shows requests per second (throughput-focused)

  • Resource Utilization: Helps identify whether the configuration is appropriate

See Inference Metrics for detailed information about monitoring your deployments.
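
As a client-side cross-check of the dashboard, you can also measure time to first token directly. The sketch below assumes the deployment exposes an OpenAI-compatible streaming endpoint; the endpoint URL and model name are placeholders, and the arrival of the first streamed chunk is used as an approximation of the first token.

```python
# Hedged sketch: approximate time to first token from the client side.
# Assumes an OpenAI-compatible streaming endpoint; ENDPOINT and MODEL are placeholders.
import time

import requests

ENDPOINT = "http://<your-aim-host>:8000/v1/chat/completions"  # placeholder
MODEL = "your-deployed-model"                                  # placeholder

start = time.perf_counter()
with requests.post(
    ENDPOINT,
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 32,
        "stream": True,
    },
    stream=True,
    timeout=120,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:  # first non-empty server-sent-event chunk, used as a proxy for the first token
            print(f"approx. time to first token: {time.perf_counter() - start:.3f} s")
            break
```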

Best Practices#

  1. Start with defaults: Use default settings unless you have specific requirements

  2. Use optimized profiles only in production: Keep “Allow unoptimized” disabled for production workloads

  3. Test thoroughly: When using preview or unoptimized deployments, conduct thorough performance testing before production use

  4. Monitor actively: Keep an eye on metrics to ensure your configuration meets expectations

  5. Stay updated: Check release notes for when preview profiles are promoted to optimized

  6. Contact support: Work with AMD support when deploying on new hardware or encountering performance issues

Troubleshooting#

Deployment Fails with “No optimized profile available”#

If your deployment fails with this error:

  1. Verify your hardware is supported for this model

  2. Consider enabling “Allow unoptimized deployments” if you need to proceed with preview or experimental configurations

  3. Check the AIM catalog for alternative models that have optimized profiles for your hardware

  4. Contact AMD support for information about optimization timelines

Performance Lower Than Expected#

If your deployed model shows poor performance:

  1. Check if you’re using an unoptimized or preview profile (visible in deployment details)

  2. Try different performance metrics (latency vs throughput)

  3. Review the inference metrics to identify bottlenecks

  4. Ensure you’re using the latest AIM version (check for updates in the catalog)