This repository was archived by the owner on Nov 19, 2025. It is now read-only.

Alit/draftpp #106

Open

JRD971000 wants to merge 25 commits into main from alit/draftpp

Conversation

JRD971000 (Collaborator) commented Feb 14, 2024

What does this PR do?

Adds the DRaFT algorithm for Stable Diffusion (SD) fine-tuning.

Changelog

  • Please update the CHANGELOG.md under the next version with high-level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Before your PR is "Ready for review"

Pre-checks:

Checklist when contributing a new algorithm

  • Does the trainer resume and restore all states, including model state?
  • Does the trainer support all parallelism techniques (PP, TP, DP)?
  • Does the trainer support max_steps=-1 and validation?
  • Does the trainer only call APIs defined in alignable_interface.py?
  • Does the trainer have proper logging?

Additional Information

  • Related to # (issue)

gshennvm requested review from SahilJain314 and removed request for gshennvm on February 14, 2024 19:31

Comment on nemo_aligner/algorithms/draft.py (outdated)

Contributor

Is this file required after the move to SupervisedTrainer? I see it in the import statement for train_sd_draft.py, but nowhere else.

JRD971000 and others added 3 commits February 20, 2024 07:48
Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
JRD971000 and others added 6 commits February 20, 2024 08:12
Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
@@ -0,0 +1,55 @@
#!/bin/bash
PROJECT="NeMo-draft+"
WANDB="8256bec8f68d1a0ee4a3208685a8db0474d3806b"

Contributor

remove your wandb key
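A minimal sketch of one way to do this, assuming the launcher script is kept at all: leave the key out of the file and expect it in the environment. The `wandb` client reads `WANDB_API_KEY` from the environment on its own, so the script only needs to check that it was exported. `PROJECT` is copied from the script above; the `require_wandb_key` helper is hypothetical.

```shell
#!/bin/bash
# Sketch only: keep the W&B key out of the script and out of version control.
PROJECT="NeMo-draft+"

# Hypothetical helper: returns non-zero (and prints a hint) when the key
# has not been exported by the caller. wandb itself reads WANDB_API_KEY.
require_wandb_key() {
    if [ -z "${WANDB_API_KEY:-}" ]; then
        echo "error: export WANDB_API_KEY before launching ${PROJECT} runs" >&2
        return 1
    fi
}
```

Usage: run `export WANDB_API_KEY=...` in your shell (or CI secret store), then call `require_wandb_key || exit 1` near the top of the launcher. The key already committed here should also be rotated, since removing it from the file does not remove it from git history.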

exp_manager.wandb_logger_kwargs.name=${ACTOR_WANDB_NAME} \
exp_manager.resume_if_exists=False \
exp_manager.explicit_log_dir=${DIR_SAVE_CKPT_PATH} \
exp_manager.wandb_logger_kwargs.project=${PROJECT} #&> /opt/nemo-aligner/examples/mm/logs/draft_log.txt &

Contributor

Clean up dead code.

Collaborator (author)

Sorry, I guess the launcher script should be removed altogether, right?

@@ -0,0 +1,154 @@
# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.

Contributor

Update the copyright notices to 2024 (all new files).

VAE_CKPT="/opt/nemo-aligner/checkpoints/vae.bin"
ACTOR_WANDB_NAME=DRaFT+-ws-LR_${LR}-KL_${KL_COEF}-BS_${ACTOR_GLOBAL_BATCH_SIZE}
DIR_SAVE_CKPT_PATH="/opt/nemo-aligner/draft_p_saved_ckpts"
DATASET_PATH="/opt/nemo-aligner/datasets/animals45.tar"

Contributor

Are we shipping animals45.tar? If not, can we remove it or make the path generic?
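A minimal sketch of the "make it generic" option, assuming the script stays: let the caller supply the dataset through the environment and fail early if it is missing. The fallback path below is a placeholder, not a real file, and `check_dataset` is a hypothetical helper.

```shell
#!/bin/bash
# Sketch only: the caller overrides DATASET_PATH; the default is a placeholder.
DATASET_PATH="${DATASET_PATH:-/path/to/your/dataset.tar}"

# Hypothetical helper: fail before training starts if the dataset is absent,
# instead of erroring somewhere inside the data loader.
check_dataset() {
    if [ ! -f "$DATASET_PATH" ]; then
        echo "error: dataset not found at $DATASET_PATH (set DATASET_PATH)" >&2
        return 1
    fi
}
```

Usage: `DATASET_PATH=/data/my_prompts.tar ./launcher.sh`, with `check_dataset || exit 1` called before the training command. The same `${VAR:-default}` pattern would also work for `VAE_CKPT` and `DIR_SAVE_CKPT_PATH`.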
