Working in AI Workspaces: Hands-on
This guide teaches you how to work in the AMD AI Workbench workspace using a JupyterLab notebook.
Tutorial: Fine-tune Llama-3.1 8B with torchtune
This tutorial demonstrates how to fine-tune the Llama-3.1 8B large language model (LLM) on AMD ROCm GPUs using torchtune, an easy-to-use PyTorch library for authoring, post-training, and experimenting with LLMs.
Access the tutorial here.
Tip
Skip steps 1-3 of the Prepare the training environment chapter; they do not apply to the AI Workbench (Kubernetes) environment.
Tutorial: Prepare and upload a dataset to AMD AI Workbench
1. Run the data preparation script in the Jupyter notebook
from huggingface_hub import hf_hub_download
import json
import os
import random


def convert_opi(input_file, output_file):
    """
    Convert a JSON array of objects with 'instruction', 'input', and 'output'
    fields into a JSONL file in the chat message format shown below.
    """
    with open(input_file, 'r') as f:
        data = json.load(f)
    with open(output_file, 'w') as f:
        for row in data:
            # Skip malformed rows that are missing any of the required fields.
            if all(k in row for k in ("instruction", "input", "output")):
                line = {
                    "messages": [
                        {"role": "user", "content": f"{row['instruction']} Sequence: {row['input']}"},
                        {"role": "assistant", "content": row["output"]}
                    ]
                }
                f.write(json.dumps(line) + "\n")


def create_sample(input_jsonl, output_jsonl, n):
    """Create a random sample of n lines from input_jsonl and write it to output_jsonl."""
    with open(input_jsonl, 'r') as f:
        lines = f.readlines()
    sample = random.sample(lines, min(n, len(lines)))
    with open(output_jsonl, 'w') as f:
        f.writelines(sample)


repo_id = "BAAI/OPI"
target_dir = "./datasets"
output_dir = "./datasets"
data_in = [
    "OPI_DATA/OPI_updated_160k.json",
]
create_sample_n = 1000  # Set to None to disable sampling

for file in data_in:
    # Download the raw dataset file from the Hugging Face Hub.
    hf_hub_download(repo_id=repo_id,
                    filename=file,
                    repo_type="dataset",
                    local_dir=target_dir)
    print('Downloaded', file)

    # Convert the JSON array into JSONL with one chat record per line.
    file_out = os.path.basename(file).replace(".json", ".jsonl")
    out_path = os.path.join(output_dir, file_out)
    convert_opi(os.path.join(target_dir, file), out_path)
    print('Converted', file, 'to', file_out)

    # Optionally create a smaller random sample for quick experiments.
    if create_sample_n is not None:
        sample_out = out_path.replace(".jsonl", f".sample{create_sample_n}.jsonl")
        create_sample(out_path, sample_out, create_sample_n)
        print(f'Created random sample of {create_sample_n} lines: {sample_out}')
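Before uploading, it can be worth sanity-checking the converted file. The snippet below is an illustrative check, assuming the default paths and sample size (create_sample_n = 1000) from the script above; it reads the first record of the sample file and prints each chat turn.

import json

# Path produced by the script above with the default settings.
sample_path = "./datasets/OPI_updated_160k.sample1000.jsonl"

with open(sample_path, "r") as f:
    first = json.loads(f.readline())

# Each line should hold a 'messages' list with a user and an assistant turn.
for msg in first["messages"]:
    print(msg["role"], "->", msg["content"][:80])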
2. Upload the dataset to the AMD AI Workbench catalog
You can upload a dataset to the platform using the API. See the examples below.
Example: API call using curl
curl -X 'POST' \
'https://api-demo.silogen.ai/v1/datasets/upload?project_id=YOUR_PROJECT_UUID_HERE' \
-H 'accept: application/json' \
-H 'Authorization: Bearer YOUR_TOKEN_HERE' \
-H 'Content-Type: multipart/form-data' \
-F 'name=string' \
-F 'description=string' \
-F 'type=Fine-tuning' \
-F 'jsonl=@path_to_your_dataset.jsonl'
Example: API call using Python
from pathlib import Path

import certifi
import requests

BASE_URL = "https://api-demo.silogen.ai/v1/datasets/upload?project_id=ADD_YOUR_PROJECT_ID"

file_path = Path("path_to_your_dataset")
headers = {"accept": "application/json", "Authorization": "Bearer ADD_YOUR_TOKEN"}
data = {"name": "dataset_name", "description": "dataset_description", "type": "Fine-tuning"}

# Stream the JSONL file as a multipart upload under the 'jsonl' form field.
with file_path.open("rb") as f:
    response = requests.post(
        url=BASE_URL,
        headers=headers,
        data=data,
        files={"jsonl": f},
        verify=certifi.where(),  # Use certifi's CA bundle for TLS verification.
        timeout=300,
    )

print(response.json())
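If the upload fails, response.json() may contain an error body rather than dataset metadata. A minimal hardening step, standard requests usage rather than anything specific to this API, is to raise on non-2xx status codes before parsing:

# Raise requests.HTTPError on 4xx/5xx responses before parsing the body.
response.raise_for_status()
print(response.json())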