MOSE: Complex Video Object Segmentation Dataset

Abstract

Video object segmentation (VOS) aims at segmenting a particular object throughout an entire video clip. State-of-the-art VOS methods have achieved excellent performance (e.g., 90+% J&F) on existing datasets. However, because the target objects in these datasets are usually salient, dominant, and isolated, VOS under complex scenes has rarely been studied. To revisit VOS and make it more applicable in the real world, we collect a new VOS dataset called coMplex video Object SEgmentation (MOSE) to study tracking and segmenting objects in complex environments. MOSE contains 2,149 video clips and 5,200 objects from 36 categories, with 431,725 high-quality object segmentation masks. The most notable feature of the MOSE dataset is its complex scenes with crowded and occluded objects: the target objects in the videos are commonly occluded by others and disappear in some frames. To analyze the proposed dataset, we benchmark 18 existing VOS methods under 4 different settings on MOSE and conduct comprehensive comparisons. The experiments show that current VOS algorithms struggle to perceive objects in complex scenes. For example, under the semi-supervised VOS setting, the highest J&F achieved by existing state-of-the-art VOS methods is only 59.4% on MOSE, much lower than their ∼90% J&F on DAVIS. The results reveal that although excellent performance has been achieved on existing benchmarks, challenges under complex scenes remain unresolved and further effort is needed to address them.

Visualization

Dataset Statistics

**TABLE 1. Scale comparison between MOSE and existing VOS datasets.**
“mBOR”: mean of the Bounding-box Occlusion Rate. “Disapp. Rate”: the fraction of objects that disappear in at least one frame.
Dataset	Year	Videos	Categories	Objects	Annotations	Duration (min)	mBOR	Disapp. Rate
YouTube-Objects	2012	96	10	96	1,692	9.01	-	-
SegTrack-v2	2013	14	11	24	1,475	0.69	0.12	8.3%
FBMS	2014	59	16	139	1,465	7.70	0.01	11.2%
JumpCut	2015	22	14	22	6,331	3.52	0	0%
DAVIS-2016	2016	50	-	50	3,440	2.28	-	-
DAVIS-2017	2017	90	-	205	13,543	5.17	0.03	16.1%
YouTube-VOS	2018	4,453	94	7,755	197,272	334.81	0.05	13.0%
MOSE (ours)	2023	2,149	36	5,200	431,725	443.62	0.23	41.5%
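
As a rough way to read the scale columns of Table 1, the sketch below computes the average number of annotated masks per video for three of the rows. The video and annotation counts are copied from the table; the dictionary name `TABLE_1` and the helper `masks_per_video` are hypothetical names used only for this illustration.

```python
# (videos, annotations) pairs copied from Table 1.
# TABLE_1 and masks_per_video are illustrative names, not part of any MOSE tooling.
TABLE_1 = {
    "DAVIS-2017": (90, 13_543),
    "YouTube-VOS": (4_453, 197_272),
    "MOSE": (2_149, 431_725),
}


def masks_per_video(videos: int, annotations: int) -> float:
    """Average number of annotated masks per video clip."""
    return annotations / videos


if __name__ == "__main__":
    for name, (videos, annotations) in TABLE_1.items():
        density = masks_per_video(videos, annotations)
        print(f"{name}: {density:.1f} masks per video")
```

This only summarizes annotation density; it says nothing about occlusion or disappearance, which the mBOR and Disapp. Rate columns capture.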

Experiments

We benchmark the state-of-the-art methods to the best of our knowledge; please see the Dataset Report for details. If your method performs better, please feel free to contact us for benchmark evaluation and we will update the results.

TABLE 2. Benchmark results of semi-supervised (one-shot) VOS.

Downloads

The dataset is available on OneDrive, Google Drive, and Baidu WangPan (Access Code: MOSE). Please refer to MOSE-api for more details.

🚀 Download the dataset using the gdown command:
📦 train.tar.gz 20.5 GB
  gdown https://drive.google.com/uc\?id\=ID_removed_to_avoid_overaccesses_get_it_by_yourself
📦 valid.tar.gz 3.61 GB
  gdown https://drive.google.com/uc\?id\=ID_removed_to_avoid_overaccesses_get_it_by_yourself

Tips: gdown may be temporarily throttled by Google Drive due to excessive downloads. If that happens, wait 24 hours or download from the Google Drive page with a Google account. If problems persist, please feel free to open an issue on MOSE-api.
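
Once train.tar.gz and valid.tar.gz are downloaded, they need to be unpacked. The following is a minimal sketch using only the Python standard library; the helper name `extract_archive` and the destination directory `data/MOSE` are assumptions made for illustration, and the internal layout of the archives is not described here, so the function simply reports whatever top-level entries it finds.

```python
import tarfile
from pathlib import Path


def extract_archive(archive: str, dest: str) -> list[str]:
    """Unpack a .tar.gz archive into dest and return the top-level entry names.

    Hypothetical helper for illustration; it is not part of MOSE-api.
    """
    dest_path = Path(dest).resolve()
    dest_path.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        # Refuse entries that would be written outside the destination directory.
        for member in members:
            target = (dest_path / member.name).resolve()
            if not target.is_relative_to(dest_path):
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest_path, members=members)
    return sorted(p.name for p in dest_path.iterdir())


if __name__ == "__main__":
    # Assumed locations: archives in the current directory, output in data/MOSE.
    for name in ("train.tar.gz", "valid.tar.gz"):
        if Path(name).exists():
            print(name, "->", extract_archive(name, "data/MOSE"))
```

The path check is a precaution worth keeping for any archive fetched over the network. The sketch requires Python 3.9 or later for `Path.is_relative_to`.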

Evaluation

Online Evaluation (🔥ready now!)

● Following DAVIS, we use Region Jaccard J, Boundary F measure F, and their mean J&F as the evaluation metrics.
● For the validation set, the first-frame annotations are released to indicate the objects that are considered in evaluation.
● The validation set online evaluation server is [here] for daily evaluation.
● The test set online evaluation server will be open during the competition period only.
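
To make the metrics above concrete, here is a minimal sketch of the region Jaccard J (intersection over union of a predicted and a ground-truth binary mask) and of J&F as the mean of J and F. It is written in plain Python for self-containment and is not the official evaluation code; the function names are hypothetical, the boundary measure F is taken as a given number rather than computed, and scoring two empty masks as 1.0 is an assumed convention.

```python
from typing import Sequence

# A binary mask as rows of 0/1 values (illustrative type alias).
Mask = Sequence[Sequence[int]]


def region_jaccard(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two binary masks of the same shape."""
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                intersection += 1
            if p or g:
                union += 1
    # Assumed convention: if both masks are empty, count it as a perfect match.
    return 1.0 if union == 0 else intersection / union


def j_and_f(j: float, f: float) -> float:
    """J&F is the mean of the region score J and the boundary score F."""
    return (j + f) / 2.0


if __name__ == "__main__":
    pred = [[1, 1], [0, 0]]
    gt = [[1, 0], [1, 0]]
    j = region_jaccard(pred, gt)  # one shared pixel out of three covered pixels
    print(f"J = {j:.3f}")
    print(f"J&F = {j_and_f(j, 0.5):.3f}")  # 0.5 is a made-up F value
```

In practice, scores are computed per object and per frame and then averaged; for reported numbers, rely on the online evaluation server rather than a local sketch like this.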

BibTeX

Please consider citing MOSE if it helps your research.

@inproceedings{MOSE,
  title={{MOSE}: A New Dataset for Video Object Segmentation in Complex Scenes},
  author={Ding, Henghui and Liu, Chang and He, Shuting and Jiang, Xudong and Torr, Philip HS and Bai, Song},
  booktitle={ICCV},
  year={2023}
}

License

MOSE is licensed under a CC BY-NC-SA 4.0 License. The MOSE data is released for non-commercial research purposes only.

MOSE: A New Dataset for Video Object Segmentation in Complex Scenes

News