Releases and updates to Axolotl AI on GitHub
Release Announcements
v0.7.0
Axolotl v0.7.0 is out!
GRPO support
Process Reward Model support
KD Training from offline top-k logprobs
Multi-GPU LoRA kernels
Deploy your training and evaluation workloads straight to Modal from the axolotl CLI
Sweeps
Chat template parsing improvements
Improved Mac OS support
Dependency upgrades & assorted fixes
Process Reward Models
Take your test-time scaling to new heights by training your own Process Reward Models (PRM)! Thanks to PRM training support in @huggingface TRL we’ve streamlined fine-tuning and configuration of PRMs, which can be used as powerful step-by-step verifiers for reasoning models. We’ve also open-sourced several datasets which you can use out-of-the-box with our trainer, and a cookbook to help you evaluate your trained PRMs. Check out our blogpost below for more details.
Blog Post: https://axolotlai.substack.com/p/Process-Reward-Models
Cookbook: https://github.com/axolotl-ai-cloud/axolotl-cookbook/prm
PRM 🤗 Collection: https://huggingface.co/collections/axolotl-ai-co/process-reward-models-67b4b4355da4e1fe6ba44875
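As a rough illustration of the TRL layer that Axolotl builds on here, the sketch below fine-tunes a PRM with TRL's PRMTrainer on a toy stepwise-supervision dataset. The base model, hyperparameters, and data are placeholders for illustration, not an Axolotl config or our exact training setup.

```python
# Minimal PRM fine-tuning sketch using TRL's PRMTrainer.
# Model name, config values, and the toy dataset are placeholders.
from datasets import Dataset
from transformers import AutoModelForTokenClassification, AutoTokenizer
from trl import PRMConfig, PRMTrainer

model_name = "Qwen/Qwen2.5-0.5B"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
# A PRM scores each reasoning step, so it uses a token-classification head.
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=2)

# Stepwise supervision: one correctness label per reasoning step.
train_dataset = Dataset.from_dict({
    "prompt": ["What is 2 + 2 * 3?"],
    "completions": [["First compute 2 * 3 = 6.", "Then 2 + 6 = 8."]],
    "labels": [[True, True]],
})

trainer = PRMTrainer(
    model=model,
    args=PRMConfig(output_dir="prm-sketch", per_device_train_batch_size=1),
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```

Because each step carries its own label, the trained model can be queried step by step at inference time to verify partial reasoning traces, which is what makes PRMs useful for test-time scaling.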
KD Training from offline top-k logprobs
The software stack for knowledge distillation from teacher models is now much simpler: we leverage top-k logprobs (instead of full logits) from off-the-shelf inference engines like @vllm_project. Online top-k KD is on our roadmap for a future release.
Many thanks to Charles Goddard, Fernando and Lucas from @arcee_ai for their guidance on this.
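To make the mechanism concrete, here is an illustrative, self-contained sketch of a distillation loss computed from a teacher's offline top-k logprobs; it is not Axolotl's internal implementation, just the general shape of the computation once an engine like vLLM has dumped top-k token ids and log-probabilities for each position.

```python
# Illustrative top-k knowledge-distillation loss (not Axolotl's implementation).
import torch
import torch.nn.functional as F

def topk_kd_loss(student_logits, teacher_topk_ids, teacher_topk_logprobs):
    """Forward-KL distillation restricted to the teacher's top-k support.

    student_logits:        (batch, seq, vocab)
    teacher_topk_ids:      (batch, seq, k) token ids of the teacher's top-k
    teacher_topk_logprobs: (batch, seq, k) matching log-probabilities
    """
    student_logprobs = F.log_softmax(student_logits, dim=-1)
    # Student log-probs at the teacher's top-k token positions.
    student_topk = torch.gather(student_logprobs, -1, teacher_topk_ids)
    # Renormalize the teacher's truncated distribution over its top-k support.
    teacher_logprobs = F.log_softmax(teacher_topk_logprobs, dim=-1)
    teacher_probs = teacher_logprobs.exp()
    # KL(teacher || student) summed over the truncated support, averaged over tokens.
    kl = (teacher_probs * (teacher_logprobs - student_topk)).sum(dim=-1)
    return kl.mean()
```

Restricting the loss to the top-k support is what removes the need to store or transfer full-vocabulary teacher logits, which is the main simplification this release leans on.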
Modal Deployment
Deploying your workloads to @modal_labs is now simpler through your local axolotl CLI. Just configure your cloud resources in a YAML file, and our CLI takes care of everything else.
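For a sense of what the CLI is abstracting away, the sketch below hand-writes the kind of Modal job the deployment path replaces, using Modal's public Python SDK directly. The image contents, GPU type, timeout, and config path are all placeholder assumptions; this is not the CLI's internal implementation.

```python
# Hand-rolled Modal job roughly equivalent to what the axolotl CLI now sets up
# for you; all specifics here (deps, GPU, paths) are illustrative assumptions.
import subprocess
import modal

image = (
    modal.Image.debian_slim()
    .pip_install("axolotl[flash-attn,deepspeed]")      # placeholder dependency set
    .add_local_file("config.yml", "/root/config.yml")  # placeholder training config
)
app = modal.App("axolotl-train-sketch", image=image)

@app.function(gpu="H100", timeout=60 * 60)
def train():
    # Run the normal axolotl entrypoint inside the remote container.
    subprocess.run(["axolotl", "train", "/root/config.yml"], check=True)

@app.local_entrypoint()
def main():
    train.remote()
```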
Multi-GPU LoRA kernels
Accelerate your LoRA and QLoRA post-training runs using our newly implemented Triton kernels and custom autograd functions! Inspired by Unsloth, these optimizations can be patched into common LLM architectures to speed up model forward and backward passes by ~25-50% and reduce peak VRAM usage by ~25-40%. Check out our forthcoming blog post for more details.
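The real gains come from the fused Triton kernels, but the custom-autograd half of the idea can be sketched in plain PyTorch: compute the frozen base projection and the low-rank update together, and recompute the cheap rank-r intermediate during backward instead of caching it. The sketch below is conceptual, assumes a 2D input, and is not the shipped implementation.

```python
# Conceptual sketch (NOT Axolotl's Triton kernels) of a fused LoRA linear with
# a custom autograd function that trades a little recompute for lower memory.
import torch

class LoRALinearFn(torch.autograd.Function):
    """y = x @ W.T + scale * (x @ A.T) @ B.T for 2D x of shape (tokens, in)."""

    @staticmethod
    def forward(ctx, x, W, A, B, scale):
        ctx.save_for_backward(x, W, A, B)
        ctx.scale = scale
        return x @ W.t() + scale * (x @ A.t()) @ B.t()

    @staticmethod
    def backward(ctx, grad_out):
        x, W, A, B = ctx.saved_tensors
        scale = ctx.scale
        # Recompute the rank-r intermediate rather than storing it in forward.
        xa = x @ A.t()                                   # (tokens, r)
        grad_x = grad_out @ W + scale * (grad_out @ B) @ A
        grad_A = scale * (grad_out @ B).t() @ x          # matches A's (r, in) shape
        grad_B = scale * grad_out.t() @ xa               # matches B's (out, r) shape
        # The base weight W is frozen in (Q)LoRA, so it receives no gradient.
        return grad_x, None, grad_A, grad_B, None

# Usage sketch: x (tokens, in), W (out, in), A (r, in), B (out, r)
x = torch.randn(8, 64)
W = torch.randn(32, 64)                     # frozen base weight
A = torch.randn(4, 64, requires_grad=True)  # LoRA down-projection
B = torch.zeros(32, 4, requires_grad=True)  # LoRA up-projection (zero init)
y = LoRALinearFn.apply(x, W, A, B, 0.5)
y.sum().backward()
```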
GRPO
Post-train your models using the latest SOTA RL technique pioneered by DeepSeek. We make it easier to configure GRPO workloads on top of @huggingface TRL, and we've upstreamed our PEFT + vLLM support to TRL to improve the efficiency of GRPO post-training.
Cookbook: https://github.com/axolotl-ai-cloud/axolotl-cookbook/tree/main/grpo
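For a feel of the TRL interface that Axolotl configures for you, here is a minimal GRPO sketch with a toy prompt and reward function; the model name, dataset, and reward are placeholders rather than a recommended setup.

```python
# Minimal GRPO sketch via TRL's GRPOTrainer; model, data, and reward are toys.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

def reward_len(completions, **kwargs):
    # Toy reward: prefer completions near 50 characters (illustrative only).
    return [-abs(len(completion) - 50) for completion in completions]

train_dataset = Dataset.from_dict({"prompt": ["Explain GRPO in one sentence."]})

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",   # placeholder base model
    reward_funcs=reward_len,
    args=GRPOConfig(output_dir="grpo-sketch"),
    train_dataset=train_dataset,
)
trainer.train()
```

Most of the problem-specific work lives in the reward functions: GRPO samples a group of completions per prompt and normalizes rewards within the group, so rewards only need to rank completions sensibly relative to one another.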
Release Notes: https://github.com/axolotl-ai-cloud/axolotl/releases/tag/v0.7.0
See All that Changed
fix build w pyproject to respect installed torch version by @winglian in #2168
evaluation_strategy was fully deprecated in recent release by @winglian in #2169
parity for nightly ci - make sure to install setuptools by @winglian in #2176
Add hub model id config options to all example yml files. by @bursteratom in #2196
move the setting of PYTORCH_CUDA_ALLOC_CONF to the cli rather than train module by @winglian in #2183
use axolotl contribs for fix_untrained_tokens by @winglian in #2194
fix: use apply_chat_template to find turn boundaries and allow tool_calling field by @NanoCode012 in #2179
use DataCollatorWithFlattening when not sample packing by @winglian in #2167
adding test_datasets compat with pretraining_dataset (streaming) by @djsaunde in #2206
move the dataset loading from remote/disk to a shared function so we can re-use for RL by @winglian in #2204
add deepspeed example with torch compile enabled by @winglian in #2212
inference - don't default w accelerate, fix base model by @winglian in #2216
fix untrained tokens if specified explicitly from a list by @winglian in #2210
fix: allow trainer builder to use custom jinja chat template by @NJordan72 in #2219
make sure padding is labeled as -100 for pretraining by @winglian in #2227
Fixing OSX installation by @SalmanMohammadi in #2231
fix: mistral nemo does not recognize token_type_ids in forward by @NanoCode012 in #2233
feat: use SequentialSampler if curriculum_sampling is enabled with sample_packing by @v-dicicco in #2235
feat: add support for data_files in pretraining by @NanoCode012 in #2238
rename liger test so it properly runs in ci by @winglian in #2246
use 2.5.1 docker images as latest tag as it seems stable by @winglian in #2198
add helper to verify the correct model output file exists by @winglian in #2245
assume empty lora dropout means 0.0 and add tests by @winglian in #2243
rename references to dpo dataset prep to pref data by @winglian in #2258
fix: use text_column even when not packing for pretraining by @NanoCode012 in #2254
fix for indexing error inside torch.embeddings caused by num embeddings > num tokens in tokenizer by @jwongTensora in #2257
option to not concatenate during pretraining by @winglian in #2263
Add 5000 line history limit to tmux for docker cloud by @adi-kmt in #2268
use the extracted field_messages to parse the role fields by @winglian in #2265
support for latest transformers release 4.48.1 by @winglian in #2256
chore(doc): fix explanation on gcs creds retrieval by @NanoCode012 in #2272
Take split param from config in all load_dataset instances by @mashdragon in #2281
chore(doc): improve explanation for *_steps and *_strategy by @NanoCode012 in #2270
support for custom lr groups for non-embedding modules by @winglian in #2213
chore: refactor SaveModelCallback to stop handling fractional save_steps by @NanoCode012 in #2291
Num epochs float by @mashdragon in #2282
Removing torch 2.3.1 by @SalmanMohammadi in #2294
Process reward models by @SalmanMohammadi in #2241
Ray Train Axolotl Integration by @erictang000 in #2251
native support for modal cloud from CLI by @winglian in #2237
Defaulting to fused=True AdamW by @SalmanMohammadi in #2293
match the cuda version for 2.4.1 build w/o tmux by @winglian in #2299
make save_safetensors: true the default by @winglian in #2292
refactor README; hardcode links to quarto docs; add additional quarto doc pages by @djsaunde in #2295
fix: add warning for invalid eval_steps or save_steps by @NanoCode012 in #2298
set MODAL_IMAGE_BUILDER_VERSION to 2024.10 to test latest builder by @winglian in #2302
better handling of multipack dataset length by @winglian in #2296
fix: drop long seq even if not sample packing by @NanoCode012 in #2211
Torch 2.6 support for base docker image by @winglian in #2312
feat: add torch2.6 to ci by @NanoCode012 in #2311
feat: update FA to 2.7.4.post1 which includes torch2.6 binary by @NanoCode012 in #2315
chore: remove redundant py310 from tests by @NanoCode012 in #2316
fix(config): missing config not being documented and fix model_ override by @NanoCode012 in #2317
feat(doc): Add multi-node torchrun info by @NanoCode012 in #2304
Update faq.qmd by @bursteratom in #2319
disable ray tests for latest torch release by @winglian in #2328
[Fixing #2149] load_from_disk for RL-type training by @leeparkuky in #2193
feat(doc): Improve guide to dataset types with better examples by @NanoCode012 in #2286
feat(doc): add tensorboard config to docs by @NanoCode012 in #2329
Add bos_token and add_generation_prompt to the alpaca chat template by @minpeter in #2322
fix: add missing shards_idx, preprocess_shards to docs and validator by @NanoCode012 in #2331
add support for include_tokens_per_second in training args by @winglian in #2269
Select input_ids explicitly after panda conversion by @seungduk-yanolja in #2335
Activation function Triton kernels, LoRA custom autograd functions by @djsaunde in #2324
feat: add config for optional parameters in a chat message by @NJordan72 in #2260
chore: cleanup deprecated config elements by @NJordan72 in #2309
Join us on the Axolotl AI Discord.