GitHub - FudanCVL/FeVOS: [ECCV2026] FeVOS: Foresight Expression Video Object Segmentation

FeVOS: Foresight Expression Video Object Segmentation

Kehan Lan, Kaining Ying, Henghui Ding ✉️

Fudan University, China

ECCV 2026

Abstract

Existing Referring Video Object Segmentation tasks focus on referring expressions describing events, actions or appearances of relevant objects within the observed frames, lacking evaluation in scenarios that require pre-decisive spatio-temporal reasoning, thereby limiting their applicability. To address this, we propose Foresight Expression Video Object Segmentation, a task that queries future events in upcoming video segments and requires masks of the objects in the observed frames as visual answers. For example, in ego-centric scenes, the question "What tool will be used?" demands reasoning over spatio-temporal cues to predict the masks of the next tool to be used, which helps with the understanding of future actions and decisions. To support this task, we introduce FeVOS, a dataset with 968 video clips, 14,525 foresight expressions, and 2,904 chain-of-thought annotations to provide explicit and interpretable reasoning steps. We further develop FeVOS-R1, an MLLM-based model trained on our dataset via a two-stage pipeline of supervised fine-tuning and reinforcement learning. FeVOS-R1 not only achieves state-of-the-art performance on FeVOS, but also demonstrates strong generalization to existing RVOS benchmarks. We hope this work can inspire more research on predictive reasoning in video perception.

🛠️ TODO List 🛠️

Release the dataset and model on Huggingface.
Release the complete training configuration and code.

📦 Installation 📦

conda env create -f environment.yml

conda activate vlm

mkdir -p data
mkdir -p models

Getting Started

🧪 Dataset 🧪

Please download the complete benchmark from Hugging Face 🤗 and put it in data/.

Evaluation:

To evaluate our baseline model FeVOS-R1, first download the model from Hugging Face 🤗 and put it in models/.

Then, run the following command to evaluate the model on the FeVOS benchmark:

bash scripts/reproduce_eval.sh
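Before launching the script, it can help to confirm that the download steps above actually populated the expected folders. The following is a minimal sketch, not part of the repository: the helper name `missing_inputs` is hypothetical, and it only checks that data/ and models/ exist and are non-empty and that scripts/reproduce_eval.sh is present. The internal layout the script expects inside those folders is not documented here, so it is not verified.

```python
from pathlib import Path


def missing_inputs(root="."):
    """Return the expected paths under `root` that are absent or empty.

    Hypothetical helper: it only mirrors the folder names mentioned in
    this README and does not inspect their contents.
    """
    root = Path(root)
    problems = []
    # data/ and models/ are created during installation; both must be populated.
    for name in ("data", "models"):
        folder = root / name
        if not folder.is_dir() or not any(folder.iterdir()):
            problems.append(name)
    # The evaluation entry point named in this README.
    if not (root / "scripts" / "reproduce_eval.sh").is_file():
        problems.append("scripts/reproduce_eval.sh")
    return problems


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Missing or empty:", ", ".join(missing))
    else:
        print("Layout looks complete.")
```

Run from the repository root, this prints the folders that still need to be filled, which is usually quicker than waiting for the evaluation script to fail on a missing path.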

Training:

The training code and configuration are being prepared for release.

Acknowledgments

We would like to express our gratitude to some other projects that have contributed to our work:

Sa2VA & VLM-R1 & ReVOS & MeViS

Citation

If you find our paper and dataset useful for your research, please consider citing our paper.

@inproceedings{lan2026fevos,
  title={FeVOS: Foresight Expression Video Object Segmentation},
  author={Lan, Kehan and Ying, Kaining and Ding, Henghui},
  booktitle={ECCV},
  year={2026}
}
