Hi,
Impressive work! How can I extract features from my own video-text datasets to fine-tune the model?