Dashboard

Dashboard#

The AI Workbench dashboard provides a high-level overview of recent activity and resource usage in AI Workbench. It summarizes your current workloads, showing how many are running or pending, and highlights GPU device and VRAM usage over time. Below the overview, a detailed list of workloads displays their type, status, resource consumption, creation time, and owner, allowing you to quickly assess system utilization and the state of your active inference jobs.

Workloads#

A workload is a batch job or a service running in the cluster with resources defined by the workload and, when used in a resource managed cluster, limited by the resource quota that the owner has. This page shows all started workloads. The view shows all workload statuses by default except Deleted.

If AI Workbench is used together with the Resource Manager and a user belongs to multiple projects, the project needs to be selected first from the top of the page.

Project selector

The Overview section gives a snapshot of your current AI Workbench usage. It shows the total number of workloads and their current state (running vs. pending), along with aggregated GPU device usage and total GPU VRAM consumption. This section is designed to let you quickly understand overall system load and resource utilization at a glance before diving into individual workloads.

Project overview

The paginated Workloads table shows the name and type of the workload, and how many resources the workload requires. The Status column shows the last known status of the workload and is not updated automatically.

Workload list

Actions#

Workload actions are available from the actions column by pressing the three-dot button.

Workload actions menu

Available actions#

Open details – Opens the workload details page which displays metadata about the workload, such as when it was created and by whom and what sub-components it consists of.
Open workspace – If the workload type is Workspace, selecting this action opens the workspace in a new browser tab.
Chat with model – If the workload type is Inference,opens the model pre-selected in the Chat view.
Connect to model – If the workload type is Inference, displays a connection info modal for interacting with the model programmatically.
View logs – Allows viewing logs and Kubernetes events produced by the workload.
Delete – Queues the workload for deletion.

Workload types#

Type	Description
Inference	Inference service (AIM or fine-tuned model)
Fine-Tuning	Model fine-tuning batch job which generates a new model
Workspace	Workspace for model development and experiments

Workload statuses#

Status	Description	Condition
Added	Workload has been created	Workload component creation has not started
Pending	Waiting to start	All components are in the `Pending` state
Running	Workload is being executed	Any component is in the `Running` state
Complete	Workload has finished successfully	All components are in the `Completed` state
Failed	An error has occurred and the workload did not complete	Any component is in the `Failed` state
Deleting	Workload is queued for removal	Delete started, but not all components are `Deleted`
Deleted	Workload successfully deleted	All components are in the `Deleted` state
Delete Failed	Delete failed and manual cleanup might be needed	Any component is in the `Delete Failed` state
Terminated	Execution has been terminated	All components are in the `Completed` or `Deleted` state
Unknown	Status cannot be determined

Workload logs#

Workload logs provide near real-time visibility into the execution and status of running workloads. The logging functionality allows users to monitor workload progress, troubleshoot issues, and analyze performance.

Accessing logs#

Logs can be accessed through the workload details view:

Navigate to the Dashboard, Models or Workspaces page
Open the action menu (⁝ icon) on a workload item and use the Open details action
Click the Logs button in the workload details page
Alternatively you can also access the logs directly from the workload action menu by selecting View logs

Log features#

Log sources: Currently we support stdout/stderr collection from the running workloads and Kubernetes events emitted by the workload pods
Real-time streaming: Logs can be streamed in near real-time for active workloads and stored for later analysis for completed workloads. To enable real-time streaming, click the Enable log streaming toggle in the log viewer.
Log levels: View logs by severity level (Trace, Debug, Info, Unknown, Warning, Error, Critical)
Timestamps: All log entries include precise timestamps for chronological tracking sorted in descending order (newest first)

Log retention#

Logs are retained until disk space is needed; there is no fixed retention period
When disk usage is high, older logs are rotated and deleted to free up space
Typical retention is currently 1–2 weeks, but this may vary depending on workload volume and disk usage
Pending workloads: Event logs are available upon workload creation
Active workloads: Workload logs are available immediately upon workload start
Completed workloads: Historical logs remain accessible until rotated out
Failed workloads: Runtime error logs are preserved like other logs, subject to rotation policy

Log types by workload#

Workload Type	Available Logs
Inference	Request/response logs, model loading, performance metrics
Fine-Tuning	Training progress, loss metrics, checkpoint saves
Workspace	Jupyter/VS Code server logs, user session activity