Restore shuffling of the validation set#123
Conversation
|
thanks for the change. I looked over it and am ok with going with 2. One thing that is nice is that now only our trainer is abstractly stateful, the data sampler is no longer keeping any state |
|
Thanks @odelalleau! I am fine with v2 as well, though maybe in this case, to make it clearer, we should document the |
Only for the validation sampler, the training sampler is still stateful, unless I'm misunderstanding something? |
|
I have a slight personal preference for v1 as it results in less code to maintain, and removes an extra API-binding-contract which an algo writer must obey in terms of providing the update directly to the dataloader/sampler, which is a bit of an odd dynamic. That being said, the comp sci professor in me prefers the explicit safety of v2, and notes that it is the philosophically more correct of the two. So honestly, hard to say, I would be happy with either approach. I note that we have 1 vote for v1, 1 vote for v2, and Olivier prefers v2, so to not create a deadlock, count me in for v2 as well. |
It is stateful but the trainer now overwrites the statefulness, so our trainer state determines everything. |
What does this PR do ?
This is a follow-up to #112 (comment) to restore the shuffling of the validation set.
I implemented two versions and would appreciate feedback from @gshennvm / @gleibovich-nvidia / @trias702 on what you all prefer:
consumed_samplesin the trainer manuallyPersonally I have a slight preference for v2 because I prefer explicit to implicit -- the implicit update of
consumed_samplesin the sampler can be dangerous (as we've experienced), and also leads to a weird behavior where thelen()of the sampler decreases as we iterate over it (which is quite unusual for objects we typically iterate on). I realize however that this adds an extra line to the trainers and diverges from what happens within NeMo, so I'm ok with v1 as well if I'm in the minority here.I will also add the sister PR to NeMo but I want to get some feedback first to decide on the final implementation.
(currently marked as draft until we agree and I properly test this, but please do provide feedback now)