- We introduce TokenMatcher, a novel framework designed to extract reliable, fine-grained cross-modality person features, facilitating accurate cross-modality correspondences.
- We present the Diverse Tokens Neighbor Learning (DTNL) module, which identifies reliable neighbors. This capability allows the model to effectively capture modality-invariant and discriminative features.
- We propose the Homogeneous Fusion (HF) module, which aims to minimize the differences between various camera views, thereby drawing clusters with the same identity closer together.
- Experiments on the SYSU-MM01 and RegDB datasets demonstrate that our method outperforms existing US-VI-ReID methods.
Run prepare_sysu.py and prepare_regdb.py to convert the SYSU-MM01 and RegDB datasets to Market-1501 format, then place them in data/sysu and data/regdb, respectively (following the previous work ADCA).
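Before launching training, it can help to confirm that both dataset directories are in place. The following is a minimal sketch, not part of this repository: the helper name `missing_dataset_dirs` and the constant `EXPECTED_DIRS` are hypothetical, and the script only checks that the two directories named above exist and are non-empty. It does not validate the converted file layout.

```python
from pathlib import Path

# Directory names come from the instructions above; the rest is illustrative.
EXPECTED_DIRS = ("data/sysu", "data/regdb")


def missing_dataset_dirs(root="."):
    """Return the expected dataset directories that are absent or empty under root."""
    missing = []
    for rel in EXPECTED_DIRS:
        path = Path(root) / rel
        # A directory counts as present only if it exists and holds at least one entry.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing or empty dataset directories:", ", ".join(missing))
    else:
        print("Both dataset directories are present.")
```

Run it from the repository root so that the relative paths resolve against the same location the training scripts use.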
Following SDCL, we adopt the self-supervised pre-trained model (ViT-B/16+ICS) from "Self-Supervised Pre-Training for Transformer-Based Person Re-Identification".
- sh run_train_sysu.sh
- sh run_train_regdb.sh
| Dataset | Rank-1 | mAP | Download |
|---|---|---|---|
| SYSU-MM01 (Stage 1) | | | model |
| SYSU-MM01 (All Search) | 65.07% | 62.79% | model |
| RegDB (Visible to Infrared) | 92.96% | 86.32% | model |
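
For readers unfamiliar with the two metrics in the table, the sketch below shows how Rank-1 and mAP are commonly computed from a query-to-gallery distance matrix. It is an illustrative simplification, not this repository's evaluation code: the function name `rank1_and_map` is hypothetical, and the official SYSU-MM01 and RegDB protocols add steps (such as camera-based gallery filtering and repeated trials) that are omitted here.

```python
def rank1_and_map(dist, query_ids, gallery_ids):
    """Compute (Rank-1, mAP) from a distance matrix.

    dist[q][g] is the distance between query q and gallery item g;
    smaller means more similar. Queries with no matching gallery
    identity are skipped.
    """
    rank1_hits, average_precisions = [], []
    for row, q_id in zip(dist, query_ids):
        # Gallery indices sorted from closest to farthest.
        order = sorted(range(len(row)), key=lambda g: row[g])
        matches = [gallery_ids[g] == q_id for g in order]
        if not any(matches):
            continue
        # Rank-1: is the closest gallery item a correct match?
        rank1_hits.append(1.0 if matches[0] else 0.0)
        # Average precision: mean of precision at each correct match.
        hits, precisions = 0, []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precisions.append(hits / rank)
        average_precisions.append(sum(precisions) / len(precisions))
    if not rank1_hits:
        raise ValueError("no query has a matching gallery identity")
    n = len(rank1_hits)
    return sum(rank1_hits) / n, sum(average_precisions) / n


if __name__ == "__main__":
    # Toy example: 2 queries, 3 gallery items.
    dist = [[0.1, 0.9, 0.5],
            [0.2, 0.8, 0.3]]
    rank1, mean_ap = rank1_and_map(dist, query_ids=[1, 2], gallery_ids=[1, 2, 1])
    print(f"Rank-1: {rank1:.2%}, mAP: {mean_ap:.2%}")
```

In the toy example the first query retrieves both of its matches first, while the second finds its only match at rank 3, so Rank-1 is 50% and mAP is the mean of 1 and 1/3.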
Our implementation is mainly based on the following codebases. We gratefully thank the authors for their wonderful work.
