FudanCVL/FeVOS

FeVOS: Foresight Expression Video Object Segmentation

arXiv Paper Hugging Face Dataset Hugging Face Model

Kehan Lan    Kaining Ying    Henghui Ding ✉️

Fudan University, China

ECCV 2026

Teaser

Abstract

Existing Referring Video Object Segmentation tasks focus on referring expressions describing events, actions or appearances of relevant objects within the observed frames, lacking evaluation in scenarios that require pre-decisive spatio-temporal reasoning, thereby limiting their applicability. To address this, we propose Foresight Expression Video Object Segmentation, a task that queries future events in upcoming video segments and requires masks of the objects in the observed frames as visual answers. For example, in ego-centric scenes, the question "What tool will be used?" demands reasoning over spatio-temporal cues to predict the masks of the next tool to be used, which helps with the understanding of future actions and decisions. To support this task, we introduce FeVOS, a dataset with 968 video clips, 14,525 foresight expressions, and 2,904 chain-of-thought annotations to provide explicit and interpretable reasoning steps. We further develop FeVOS-R1, an MLLM-based model trained on our dataset via a two-stage pipeline of supervised fine-tuning and reinforcement learning. FeVOS-R1 not only achieves state-of-the-art performance on FeVOS, but also demonstrates strong generalization to existing RVOS benchmarks. We hope this work can inspire more research on predictive reasoning in video perception.

🛠️ TODO List 🛠️

  • Release the dataset and model on Hugging Face.

  • Release the complete training configuration and code.

📦 Installation 📦

conda env create -f environment.yml

conda activate vlm

mkdir -p data
mkdir -p models

Getting Started

🧪 Dataset 🧪

Please download the complete benchmark from Hugging Face 🤗 and put it in data/.
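One possible way to script this download is sketched below, assuming the `huggingface-cli` tool from the `huggingface_hub` package is installed. The helper name `hf_fetch` and the `DRY_RUN` switch are hypothetical, and the repository ID is not stated in this README, so it has to be supplied by the reader.

```shell
# Hypothetical helper (not part of this repository): fetch a Hugging Face
# repo into a local directory. Set DRY_RUN=1 to print the command only.
hf_fetch() {
  repo_id="$1"     # placeholder such as "<org>/<name>"; not given in this README
  repo_type="$2"   # "dataset" or "model"
  target_dir="$3"  # e.g. data or models
  if [ -z "$repo_id" ] || [ -z "$repo_type" ] || [ -z "$target_dir" ]; then
    echo "usage: hf_fetch <repo_id> <dataset|model> <target_dir>" >&2
    return 2
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "huggingface-cli download $repo_id --repo-type $repo_type --local-dir $target_dir"
    return 0
  fi
  mkdir -p "$target_dir"
  huggingface-cli download "$repo_id" --repo-type "$repo_type" --local-dir "$target_dir"
}
```

With the real dataset ID in place of the placeholder, a call such as `hf_fetch <repo_id> dataset data` would populate data/. The same helper could be pointed at models/ with the `model` type for the checkpoint used in evaluation.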

Evaluation:

To evaluate our baseline model FeVOS-R1, first download the model from Hugging Face 🤗 and put it in models/.

Then, run the following command to evaluate the model on the FeVOS benchmark:

bash scripts/reproduce_eval.sh
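Since the evaluation depends on both data/ and models/ being filled in, a small preflight check can catch a missing download before the script starts. This is a minimal sketch, not part of the repository: the function names `dir_is_populated` and `preflight_check` are hypothetical, and it only tests that each directory exists and is non-empty, not that its contents are complete.

```shell
# Return 0 if the directory exists and contains at least one entry.
dir_is_populated() {
  [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Check the two directories this README asks for, relative to a root
# (default: current directory). Reports every problem, then returns 1.
preflight_check() {
  root="${1:-.}"
  rc=0
  for d in data models; do
    if ! dir_is_populated "$root/$d"; then
      echo "missing or empty: $root/$d" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

From the repository root, the check could be chained in front of the evaluation as `preflight_check && bash scripts/reproduce_eval.sh`.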

Training:

The training configuration and code are in preparation.

Acknowledgments

We would like to express our gratitude to some other projects that have contributed to our work:

Citation

If you find our paper and dataset useful for your research, please consider citing our paper.

@inproceedings{lan2026fevos,
  title={FeVOS: Foresight Expression Video Object Segmentation},
  author={Lan, Kehan and Ying, Kaining and Ding, Henghui},
  booktitle={ECCV},
  year={2026}
}
