MTGS: A Novel Framework for Multi-Person Temporal Gaze Following and Social Gaze Prediction
Anshul Gupta, Samy Tafasca, Arya Farkhondeh, Pierre Vuillecard, Jean-Marc Odobez
NeurIPS 2024
[Paper] [Video] [Dataset]
This repository provides the official code and checkpoints for the MTGS model, as introduced in our NeurIPS paper, MTGS: A Novel Framework for Multi-Person Temporal Gaze Following and Social Gaze Prediction.
MTGS jointly models gaze following and social gaze prediction in multi-person scenarios. It leverages temporal information and interactions among individuals to enhance gaze prediction accuracy.
We release an improved version of MTGS that uses a frozen DINOv2 encoder, providing significant gains over the original model and surpassing recent state-of-the-art methods.
Results on the GazeFollow test set:
| Method | Resolution | AUC | Avg. Dist. | Min. Dist. |
|---|---|---|---|---|
| Gaze-LLE [CVPR'25] | 448x448 | 0.956 | 0.104 | 0.045 |
| MTGS-static [NeurIPS'24] | 224x224 | 0.929 | 0.116 | 0.059 |
| MTGS-DINO-static | 224x224 | 0.944 | 0.101 | 0.045 |
| MTGS-DINO-static | 448x448 | 0.949 | 0.098 | 0.043 |
Results on the VideoAttentionTarget test set:
| Method | Resolution | Dist. | AP_IO |
|---|---|---|---|
| Gaze-LLE [CVPR'25] | 448x448 | 0.107 | 0.897 |
| MTGS-static [NeurIPS'24] | 224x224 | 0.114 | 0.843 |
| MTGS (temporal) [NeurIPS'24] | 224x224 | 0.112 | 0.845 |
| MTGS-DINO-static | 224x224 | 0.102 | 0.852 |
| MTGS-DINO-static | 448x448 | 0.096 | 0.878 |
| MTGS-DINO (temporal) | 448x448 | 0.096 | 0.888 |
Results on the VSGaze test set:
| Method | Resolution | Dist. | AP_IO | F1_LAH (PP) | F1_LAEO (PP) | AP_SA |
|---|---|---|---|---|---|---|
| MTGS-static [NeurIPS'24] | 224x224 | 0.108 | 0.946 | 0.806 | 0.599 | 0.555 |
| MTGS (temporal) [NeurIPS'24] | 224x224 | 0.107 | 0.940 | 0.812 | 0.603 | 0.576 |
| MTGS-DINO-static | 224x224 | 0.096 | 0.951 | 0.833 | 0.624 | 0.551 |
| MTGS-DINO-static | 448x448 | 0.089 | 0.960 | 0.832 | 0.621 | 0.595 |
| MTGS-DINO (temporal) | 448x448 | 0.087 | 0.961 | 0.837 | 0.626 | 0.596 |
If you use this updated version, please reference these improved results in your work.
Follow the instructions on the VSGaze dataset page to download and prepare the data.
After downloading, update the paths in `mtgs/config/config.yaml`:

- Set `ann_root` to the directory containing the annotation H5 files.
- Set `root` to the root path of the images for each dataset.
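If you prefer to script this step, the snippet below is a minimal sketch of updating these entries programmatically. It assumes PyYAML is available and uses placeholder paths with a flat key layout; check the released `config.yaml` for the actual structure.

```python
# Minimal sketch: update dataset paths in mtgs/config/config.yaml.
# The paths are placeholders and the flat key layout is an assumption;
# adapt to the actual structure of the released config.
import yaml

cfg_path = "mtgs/config/config.yaml"
with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

cfg["ann_root"] = "/data/vsgaze/annotations"  # directory containing the annotation H5 files
cfg["root"] = "/data/vsgaze/images"           # root path of the images for each dataset

with open(cfg_path, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)
```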
We use PyTorch for our experiments.
Create a conda environment:
```bash
conda create -n mtgs-env python=3.8
conda activate mtgs-env
pip install -r requirements.txt
```

Then install the MTGS package. Go to the root directory and run:

```bash
pip install -e .
```

NOTE: All scripts are inside the `scripts` folder.
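To quickly verify the installation, here is a minimal check (it assumes the package is importable as `mtgs`; adjust the import name if it differs):

```python
# Quick sanity check of the environment; the `mtgs` import name is an assumption.
import torch
import mtgs  # noqa: F401  (fails here if the package was not installed correctly)

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```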
To run the demo:

- Update `YOLO_CHECKPOINT_FILE` to point to the head detection model checkpoint (we use a pre-trained YOLOv5 model trained on the CrowdHuman dataset).
- Also update `MTGS_CHECKPOINT_FILE` with the path to the `mtgs-static-vsgaze` model, and set the paths to `VIDEO_FILE` and `OUTPUT_FOLDER`.

```bash
bash demo.sh
```
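If you want to sanity-check the head-detection checkpoint independently of `demo.sh` (purely illustrative; the demo script handles model loading itself, and the paths below are placeholders), the standard YOLOv5 hub API can be used:

```python
# Illustrative only: load the CrowdHuman-trained YOLOv5 head detector via the
# standard YOLOv5 hub API and run it on one frame. demo.sh does not rely on this snippet.
import torch

model = torch.hub.load("ultralytics/yolov5", "custom", path="/path/to/yolov5_crowdhuman.pt")
results = model("/path/to/frame.jpg")  # head detections for a single image
results.print()                        # summary of detected boxes
```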
Training is done in two stages. First, train on GazeFollow: update `GAZE_WEIGHTS` to point to the pre-trained gaze backbone, then run:

```bash
bash train_gazefollow.sh
```

Next, update `WEIGHTS` to point to the GazeFollow pre-trained checkpoint and run:

```bash
bash train_vsgaze.sh
```

This trains the temporal MTGS model.
NOTE:

- You can also train on individual datasets in VSGaze by setting the `experiment.dataset` variable to `childplay`, `videocoatt`, `uco_laeo`, or `vat`.
- Depending on batch size and dataset, you may need to update the `data.num_steps` variable to ensure a smooth learning rate schedule.
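For example, if `data.num_steps` corresponds to the total number of optimizer steps covered by the schedule (an assumption; check the config), a rough value can be derived from the dataset size, batch size, and number of epochs:

```python
# Back-of-the-envelope estimate of the total number of training steps, so that the
# learning-rate schedule spans the whole run. All values below are placeholders.
import math

num_samples = 70_000   # training samples in the chosen dataset
batch_size = 48
epochs = 30

steps_per_epoch = math.ceil(num_samples / batch_size)
num_steps = steps_per_epoch * epochs
print(f"steps/epoch: {steps_per_epoch}, total steps: {num_steps}")
```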
To evaluate, update the `TEST_CHECKPOINT` path and run:

```bash
bash test_vat.sh
```

This performs evaluation on VideoAttentionTarget. To evaluate on a different dataset, set `DATASET` to one of `gazefollow`, `vsgaze`, `childplay`, `videocoatt`, `uco_laeo`, or `vat`.
We provide additional scripts for computing in-out and post-processing metrics under `mtgs/performance/`.
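For reference, the distance metrics in the tables above are Euclidean (L2) distances between the predicted gaze point and the annotated gaze targets in normalized image coordinates. The sketch below illustrates how the GazeFollow Avg./Min. Dist. values are computed with illustrative numbers; the scripts in `mtgs/performance/` are the reference implementation.

```python
# Illustrative computation of the GazeFollow distance metrics: L2 distance between the
# predicted gaze point and the human annotations, in normalized [0, 1] image coordinates.
# Avg. Dist. uses the mean annotation, Min. Dist. the closest single annotation.
import numpy as np

pred = np.array([0.62, 0.40])                                   # predicted gaze target (x, y)
annots = np.array([[0.60, 0.38], [0.66, 0.45], [0.59, 0.41]])   # annotations from several raters

avg_dist = np.linalg.norm(pred - annots.mean(axis=0))
min_dist = np.linalg.norm(pred - annots, axis=1).min()
print(f"Avg. Dist.: {avg_dist:.3f}, Min. Dist.: {min_dist:.3f}")
```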
Pre-trained models are available on Huggingface.
| Model Name | Download Link | Description |
|---|---|---|
| `mtgs-static-gazefollow` | Link | MTGS model trained on GazeFollow without temporal modeling. |
| `mtgs-static-vsgaze` | Link | MTGS model trained on VSGaze without temporal modeling. |
| `mtgs-vsgaze` | Link | MTGS model trained on VSGaze with temporal modeling. |
| `resnet18-gaze360` | Link | ResNet-18 trained on Gaze360, used for initializing the gaze backbone. |
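The checkpoints can also be fetched programmatically with `huggingface_hub`; the repository id and filename below are placeholders, so substitute the actual values from the Huggingface page:

```python
# Download a checkpoint from the Huggingface Hub; repo_id and filename are placeholders.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="<org>/<mtgs-model-repo>",   # replace with the actual repository id
    filename="mtgs-static-vsgaze.ckpt",  # replace with the actual checkpoint filename
)
print(ckpt_path)
```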
If you use our code, please cite:
```bibtex
@article{gupta2024mtgs,
  title={MTGS: A Novel Framework for Multi-Person Temporal Gaze Following and Social Gaze Prediction},
  author={Gupta, Anshul and Tafasca, Samy and Farkhondeh, Arya and Vuillecard, Pierre and Odobez, Jean-Marc},
  journal={Advances in Neural Information Processing Systems},
  volume={37},
  pages={15646--15673},
  year={2024}
}
```

Parts of the code are adapted from the ViT-Adapter codebase. The demo code uses a pre-trained YOLOv5-based head detection model.
The code template was based on Sharingan.
We thank the authors for their contributions.
This code is released under the GPL-3.0-only license. See the LICENSE file for more details. Please also see the DEPENDENCIES file for license information regarding third-party software used in this project.

