Introduction

The 1st MOSE Challenge will be held in conjunction with the CVPR 2024 PVUW Workshop in Seattle, USA. This edition of the workshop and challenge focuses on video object segmentation in complex environments. MOSE contains 2,149 video clips and 5,200 objects, annotated with 431,725 high-quality object segmentation masks. The videos are 1920×1080 in resolution and generally 5 to 60 seconds long. The most notable feature of MOSE is its complex scenes, including the disappearance and reappearance of objects, inconspicuous small objects, heavy occlusions, and crowded environments. The goal of the MOSE dataset is to provide a platform that promotes the development of more comprehensive and robust video object segmentation algorithms. The workshop will culminate in a round-table discussion in which speakers will debate the future of video object representations.

Leaderboard

TABLE 1. Top-3 leaderboard of the MOSE Challenge in the CVPR 2024 PVUW Workshop.

Team Name | Team Members | Organization | Technical Report | J&F | J | F
PCL_VisionLab | Deshui Miao (1,2), Xin Li (2), Zhenyu He (1,2), Yaowei Wang (2), Ming-Hsuan Yang (3) | (1) Harbin Institute of Technology (Shenzhen), (2) Peng Cheng Laboratory, (3) University of California at Merced | PDF, Video | 84.5 | 81.0 | 87.9
Yao_Xu_MTLab | Zhensong Xu (1), Jiangtao Yao (1), Chengjing Wu (1), Ting Liu (1), Luoqi Liu (1) | (1) MT Lab, Meitu Inc. | PDF | 83.5 | 80.1 | 86.8
ISS | Xinyu Liu (1), Jing Zhang (1), Kexin Zhang (1), Yuting Yang (1), Licheng Jiao (1), Shuyuan Yang (1) | (1) Intelligent Perception and Image Understanding Lab, Xidian University | PDF | 82.2 | 78.8 | 85.6

Dates

  ● 1 Feb 2024: Release of the training and validation datasets; check [here].
  ● 1 Feb 2024: Set up the submission server on CodaLab and open submission of validation results.
  ● 8 Apr 2024: Workshop paper submission deadline.
  ● 12 Apr 2024: Notification to authors of workshop papers.
  ● 15 May 2024: Release of the test dataset and opening of test-result submission.
  ● 25 May 2024: Challenge submission deadline.
  ● 30 May 2024: Announcement of the final competition results; top-performing teams will be invited to present at the workshop.
  ● 17 Jun 2024: The workshop begins.

Rules

  ● Extra training datasets besides MOSE are allowed, but contestants must disclose any extra datasets used.
  ● There are no limitations on the models; large models such as SAM may be used, but contestants must report the models used.

Call for Papers

This workshop invites papers covering, but not limited to, the following topics:
  ● Semantic/panoptic segmentation for images/videos
  ● Video object/instance segmentation
  ● Efficient computation for video scene parsing
  ● Object tracking
  ● Language-guided segmentation
  ● Semi-supervised recognition in videos
  ● New metrics to evaluate the quality of video scene parsing results
  ● Real-world video applications, including autonomous driving, indoor robotics, visual navigation, etc.

Submission: We invite authors to submit unpublished papers (8-page CVPR format) to our workshop, to be presented at a poster session upon acceptance. All submissions will go through a double-blind review process. Accepted papers will be published in the official CVPR Workshops proceedings and the Computer Vision Foundation (CVF) Open Access archive. All contributions must be submitted (along with supplementary materials, if any) at this link.
Paper Submission Dates:
  ● Workshop paper submission deadline: 8 April 2024 (23:59 PST)
  ● Notification to authors: 12 April 2024
  ● Camera ready deadline: 14 April 2024

MOSE Dataset Examples

Example video clips from MOSE: 0442a954, d321dde4, 02221fb0, bbe97d18, 002b4dce, 26ed56e6, c791ddbb, e5e9eb29.

Evaluation

  ● Following DAVIS, we use region Jaccard J, boundary F-measure F, and their mean J&F as the evaluation metrics (see the sketch after this list).
  ● For the validation set, the first-frame annotations are released to indicate the objects that are considered in the evaluation.
  ● The online evaluation server for the validation set is [here], open for daily evaluation.
  ● The online evaluation server for the test set will be open during the competition period only (TBD).
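
For reference, below is a minimal Python sketch of how J, F, and J&F can be computed for a single object mask in a single frame. It assumes binary NumPy masks; the function names and the dilation-based boundary matching with a fixed pixel tolerance are illustrative simplifications, not the official DAVIS/MOSE evaluation code.

import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def region_jaccard(pred, gt):
    # Region similarity J: intersection-over-union of the two binary masks.
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: count as a perfect match
    return np.logical_and(pred, gt).sum() / union

def mask_boundary(mask):
    # One-pixel-wide boundary: the mask minus its erosion.
    mask = mask.astype(bool)
    return mask & ~binary_erosion(mask)

def boundary_f(pred, gt, tol=2):
    # Boundary accuracy F: F1 score over boundary pixels, matched within
    # `tol` pixels via dilation (a simplification of the official matching).
    pb, gb = mask_boundary(pred), mask_boundary(gt)
    if pb.sum() == 0 and gb.sum() == 0:
        return 1.0
    precision = (pb & binary_dilation(gb, iterations=tol)).sum() / max(pb.sum(), 1)
    recall = (gb & binary_dilation(pb, iterations=tol)).sum() / max(gb.sum(), 1)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def j_and_f(pred, gt):
    # J&F: mean of region similarity and boundary accuracy.
    return 0.5 * (region_jaccard(pred, gt) + boundary_f(pred, gt))

In the full evaluation, these per-frame, per-object scores are aggregated over each sequence and over the dataset; the leaderboard reports the resulting means.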

MOSE Challenge Organizers

Henghui Ding
Primary Organizer
Fudan University

Chang Liu
Primary Organizer
Nanyang Technological University

Shuting He
Nanyang Technological University

Xudong Jiang
Nanyang Technological University

Philip H.S. Torr
University of Oxford

Song Bai
ByteDance

BibTeX

Please consider citing MOSE if it helps your research.

@inproceedings{MOSE,
  title={{MOSE}: A New Dataset for Video Object Segmentation in Complex Scenes},
  author={Ding, Henghui and Liu, Chang and He, Shuting and Jiang, Xudong and Torr, Philip HS and Bai, Song},
  booktitle={ICCV},
  year={2023}
}

License

MOSE is licensed under a CC BY-NC-SA 4.0 License. The data of MOSE is released for non-commercial research purposes only.