Restore shuffling of the validation set by odelalleau · Pull Request #123 · NVIDIA/NeMo-Aligner

odelalleau · 2024-03-08T21:45:20Z

What does this PR do ?

This is a follow-up to #112 (comment) to restore the shuffling of the validation set.

I implemented two versions and would appreciate feedback from @gshennvm / @gleibovich-nvidia / @trias702 on what you all prefer:

v1 (first commit 2b24550) is the more "minimal" version
v2 (second commit 1c02cdd) enforces setting consumed_samples in the trainer manually

Personally I have a slight preference for v2 because I prefer explicit to implicit -- the implicit update of consumed_samples in the sampler can be dangerous (as we've experienced), and also leads to a weird behavior where the len() of the sampler decreases as we iterate over it (which is quite unusual for objects we typically iterate on). I realize however that this adds an extra line to the trainers and diverges from what happens within NeMo, so I'm ok with v1 as well if I'm in the minority here.

I will also add the sister PR to NeMo but I want to get some feedback first to decide on the final implementation.

(currently marked as draft until we agree and I properly test this, but please do provide feedback now)

gshennvm · 2024-03-11T21:25:21Z

thanks for the change. I looked over it and am ok with going with 2.

One thing that is nice is that now only our trainer is abstractly stateful, the data sampler is no longer keeping any state

gleibovich-nvidia · 2024-03-12T09:08:36Z

Thanks @odelalleau!
I actually prefer v1, as I find this version clearer and more explicit in terms of the difference between val/train dataloader use-cases. It explicitly defines an extra parameter to the dataloader instantiation here.
For instance, for me, when I first worked with PPO with Aligner, I was surprised to see the same single batch being used for validation again and again as it turned out from the wandb experiment. Observing the explicit difference in the dataloaders parameters would be clearer in this sense, for a new user. With v2, I think it would be harder for a new user going through the code to grasp what's going on between train and val use-cases.

I am fine with v2 as well, though maybe in this case, to make it clearer, we should document the val_dataloader irregular usage (with regard to creating a new iterator every validation run) around the dataloader instantiation or in the run_validation method?

trias702 · 2024-03-12T21:43:20Z

thanks for the change. I looked over it and am ok with going with 2.

One thing that is nice is that now only our trainer is abstractly stateful, the data sampler is no longer keeping any state

Only for the validation sampler, the training sampler is still stateful, unless I'm misunderstanding something?

trias702 · 2024-03-12T21:50:09Z

I have a slight personal preference for v1 as it results in less code to maintain, and removes an extra API-binding-contract which an algo writer must obey in terms of providing the update directly to the dataloader/sampler, which is a bit of an odd dynamic.

That being said, the comp sci professor in me prefers the explicit safety of v2, and notes that it is the philosophically more correct of the two.

So honestly, hard to say, I would be happy with either approach. I note that we have 1 vote for v1, 1 vote for v2, and Olivier prefers v2, so to not create a deadlock, count me in for v2 as well.

trias702

LGTM!

gshennvm · 2024-03-13T00:08:29Z

thanks for the change. I looked over it and am ok with going with 2.
One thing that is nice is that now only our trainer is abstractly stateful, the data sampler is no longer keeping any state

Only for the validation sampler, the training sampler is still stateful, unless I'm misunderstanding something?

https://github.com/NVIDIA/NeMo-Aligner/pull/123/files#diff-37bc718ace18b8050161b5dd54c54ac6fd8e58c874db0b3ca4a8649bef80095bR212

It is stateful but the trainer now overwrites the statefulness, so our trainer state determines everything.

odelalleau added 3 commits March 8, 2024 21:31

Restore the shuffling of the validation set

2b24550

Enforce explictly setting consumed_samples

1c02cdd

Update CHANGELOG

aee8c52

odelalleau requested review from gleibovich-nvidia, gshennvm and trias702 March 8, 2024 21:45

odelalleau marked this pull request as draft March 8, 2024 21:45

github-actions Bot added Utils Algorithms labels Mar 8, 2024

trias702 approved these changes Mar 12, 2024

View reviewed changes

gleibovich-nvidia approved these changes Mar 13, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Restore shuffling of the validation set#123

Restore shuffling of the validation set#123
odelalleau wants to merge 3 commits into
mainfrom
od/val-shuffle

odelalleau commented Mar 8, 2024

Uh oh!

gshennvm commented Mar 11, 2024

Uh oh!

gleibovich-nvidia commented Mar 12, 2024

Uh oh!

trias702 commented Mar 12, 2024

Uh oh!

trias702 commented Mar 12, 2024

Uh oh!

trias702 left a comment

Uh oh!

gshennvm commented Mar 13, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

odelalleau commented Mar 8, 2024

What does this PR do ?

Uh oh!

gshennvm commented Mar 11, 2024

Uh oh!

gleibovich-nvidia commented Mar 12, 2024

Uh oh!

trias702 commented Mar 12, 2024

Uh oh!

trias702 commented Mar 12, 2024

Uh oh!

trias702 left a comment

Choose a reason for hiding this comment

Uh oh!

gshennvm commented Mar 13, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants