Pinned

  1. understand-r1-zero Public

    Understanding R1-Zero-Like Training: A Critical Perspective

    Python, 1.2k stars, 56 forks

  2. zero-bubble-pipeline-parallelism Public

    Forked from NVIDIA/Megatron-LM

    Zero Bubble Pipeline Parallelism

    Python, 449 stars, 31 forks

  3. lorahub Public

    [COLM 2024] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition

    Python, 668 stars, 39 forks

  4. oat Public

    🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning and preference learning.

    Python, 625 stars, 59 forks

  5. stde Public

    Official implementation of the Stochastic Taylor Derivative Estimator (STDE), NeurIPS 2024

    Python, 128 stars, 10 forks

  6. feedback-conditional-policy Public

    Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards"

    Python, 58 stars, 2 forks
