Kkit.llm_utils
LLM utils
LoRA fine-tuning server
This service provides an API endpoint for fine-tuning large language models with the Low-Rank Adaptation (LoRA) technique. Built with FastAPI, it supports asynchronous training jobs with progress tracking and integrates with Weights & Biases for experiment logging.
Install
pip install "git+https://github.com/erwinliyh/kylis_kit@main[llm]"
Install flash attention (optional):
conda install -c nvidia cuda-python # (optional)
pip install flash_attn
Start server
kkit-lora-server --base_path /path/to/models --port 8000
Alternatively, call train_model in one line without the server; see the example below.
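For orientation only, a hypothetical in-process sketch: the import path and keyword arguments of train_model are assumptions modelled on the configuration options documented below, so check the package's own example for the real signature.
# Hypothetical sketch: import path and keyword names are assumptions,
# mirroring the configuration options listed under Configuration Options.
from kkit.llm_utils import train_model  # assumed import path

train_model(
    model_name="Qwen/Qwen2.5-0.5B",
    data_path="training_data.jsonl",  # assumed parameter name for the dataset file
    lora_rank=8,
    lora_alpha=32,
    epochs=3,
    batch_size=4,
    learning_rate=3e-4,
    model_save_path="my_lora_model",
)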
API Endpoints
1. Start Training (POST /train)
Request Format:
{
  "config": {
    "model_name": "Qwen/Qwen2.5-0.5B",
    "lora_path": null,
    "lora_rank": 8,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "epochs": 3,
    "batch_size": 4,
    "learning_rate": 3e-4,
    "max_length": null,
    "model_save_path": "my_lora_model",
    "response_template": "<|im_start|>assistant\n",
    "lora_target_modules": "all-linear",
    "lora_modules_to_save": ["lm_head", "embed_token"],
    "tokenizer_padding_side": "left",
    "attn_implementation": "flash_attention_2"
  },
  "file": "<UPLOADED_JSON_FILE>"
}
Parameters:
- config: Training configuration object (see Configuration Options)
- file: Training dataset in JSON format (see Dataset Format)
Responses:
- 202 Accepted: Training started successfully
- 409 Conflict: Training already in progress
- 500 Internal Server Error: File upload failed
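The same request can be sent from Python with the requests library. This is a sketch, assuming the server runs on localhost:8000 and that, as the curl example further below suggests, config fields omitted from the request fall back to the defaults in Configuration Options:
import json
import requests

# Partial config: omitted fields are assumed to fall back to server defaults.
config = {"epochs": 3, "batch_size": 4}

with open("training_data.jsonl", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/train",
        files={
            # Plain form field carrying the JSON-encoded config
            "config": (None, json.dumps(config), "application/json"),
            # Uploaded dataset file
            "file": ("training_data.jsonl", f, "application/json"),
        },
    )

print(resp.status_code)  # 202 on success, 409 if a training job is already running
print(resp.text)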
2. Get Training Status (GET /status)
Response Format:
{
  "status": "training",
  "message": "Training...",
  "current_step": 150,
  "total_steps": 1000,
  "current_epoch": 1,
  "total_epochs": 3,
  "model_path": "/path/to/models/my_lora_model"  // Only in completed status
}
Possible status values: idle, training, completed, error
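A minimal polling sketch against this endpoint (assuming the server runs on localhost:8000), using only the fields shown in the response format above:
import time
import requests

# Poll /status until the job finishes or fails.
while True:
    status = requests.get("http://localhost:8000/status").json()
    print(status["status"], status.get("current_step"), "/", status.get("total_steps"))
    if status["status"] in ("completed", "error"):
        break
    time.sleep(10)

if status["status"] == "completed":
    print("LoRA adapter saved to", status["model_path"])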
Configuration Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| model_name | str | "Qwen/Qwen2.5-0.5B" | Base model identifier |
| lora_path | str? | null | Path to existing LoRA checkpoint |
| lora_rank | int | 8 | LoRA rank dimension |
| lora_alpha | int | 32 | LoRA alpha scaling factor |
| lora_dropout | float | 0.05 | LoRA dropout rate |
| epochs | int | 3 | Number of training epochs |
| batch_size | int | 4 | Per-device batch size |
| learning_rate | float | 3e-4 | Training learning rate |
| max_length | int? | null | Maximum sequence length |
| model_save_path | str? | null | Custom model output path |
| response_template | str | "<\|im_start\|>assistant\n" | Response separator template |
| lora_target_modules | List[str] or str | "all-linear" | Modules to apply LoRA to |
| lora_modules_to_save | List[str] | ["lm_head", "embed_token"] | Modules to fully train |
| tokenizer_padding_side | str? | "left" | Tokenizer padding direction |
| attn_implementation | str | "flash_attention_2" | Attention implementation |
Dataset Format
Chat-style conversations; each dataset entry is a JSON object of the following form:
{
  "messages": [
    {"role": "user", "content": "What color is the sky?"},
    {"role": "assistant", "content": "It's blue on Earth."}
  ]
}
Requirements:
- Each entry must have alternating user/assistant messages
- The file must be valid JSON Lines with a .jsonl extension
- Recommended size: 100-10,000 examples
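As an illustration, a small Python sketch that writes a dataset file in this shape; the file name and example conversations are placeholders only:
import json

# Each line of the dataset file is one JSON object with a "messages" list of
# alternating user/assistant turns.
examples = [
    {
        "messages": [
            {"role": "user", "content": "What color is the sky?"},
            {"role": "assistant", "content": "It's blue on Earth."},
        ]
    },
    {
        "messages": [
            {"role": "user", "content": "Name a prime number."},
            {"role": "assistant", "content": "7 is a prime number."},
        ]
    },
]

with open("training_data.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example, ensure_ascii=False) + "\n")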
Example Usage of the API
Starting Training
curl -X POST "http://localhost:8000/train" \
  -F "config={\"epochs\": 3, \"batch_size\": 4};type=application/json" \
  -F "file=@training_data.json;type=application/json"
Monitoring Progress
curl http://localhost:8000/status