Semantic-Promoted Debiasing and Background Disambiguation for Zero-Shot Instance Segmentation

1Zhejiang University     2Nanyang Technological University    
Project Leader & Corresponding Author

Figure 1. Two key challenges in generalized zero-shot instance segmentation. 1) Bias issue: the model tends to label novel objects with seen categories, e.g., ZSI incorrectly classifies the unseen class "dog" as the training class "horse". 2) Background ambiguity: objects that belong to none of the training categories are treated as background, e.g., "parking meter" and "fire hydrant".


Zero-shot instance segmentation aims to detect and precisely segment objects of unseen categories without any training samples. Since the model is trained only on seen categories, it is strongly biased toward classifying every object as a seen category. In addition, there is a natural confusion between the background and novel objects that never appear during training. These two challenges make it hard for novel objects to surface in the final instance segmentation results; they must be rescued from both the background and the dominant seen categories. To this end, we propose D2Zero, with Semantic-Promoted Debiasing and Background Disambiguation, to enhance zero-shot instance segmentation. Semantic-promoted debiasing exploits inter-class semantic relationships to involve unseen categories in visual feature training and learns an input-conditional classifier that performs dynamic classification based on the input image. Background disambiguation produces an image-adaptive background representation to avoid mistaking novel objects for background. Extensive experiments show that D2Zero outperforms previous state-of-the-art methods by a large margin, e.g., a 16.86% improvement on COCO.
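To make "involving unseen categories in visual feature training" concrete, here is a minimal numpy sketch of one plausible form of the unseen cross-entropy loss: each proposal's seen-class label is mapped to a soft distribution over unseen classes via seen-unseen semantic similarity, and the unseen logits are trained against that distribution. The function names, the temperature `tau`, and the exact similarity-to-target mapping are our assumptions for illustration, not the paper's verbatim formulation.

```python
import numpy as np

def log_softmax(x, axis=-1):
    # Numerically stable log-softmax.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def softmax(x, axis=-1):
    return np.exp(log_softmax(x, axis=axis))

def unseen_ce_loss(unseen_logits, seen_labels, seen_sem, unseen_sem, tau=0.1):
    """Cross-entropy on unseen-class logits against soft targets derived
    from inter-class semantic similarity (hypothetical sketch).
    unseen_logits: (N, U) proposal scores over unseen classes
    seen_labels:   (N,)   ground-truth seen-class index per proposal
    seen_sem:      (S, D) seen-class semantic embeddings
    unseen_sem:    (U, D) unseen-class semantic embeddings"""
    sim = seen_sem @ unseen_sem.T                # (S, U) seen-unseen similarity
    targets = softmax(sim[seen_labels] / tau)    # (N, U) soft unseen labels
    return float(-(targets * log_softmax(unseen_logits)).sum(axis=1).mean())
```

Because the targets come from semantic embeddings rather than annotations, unseen categories receive a gradient signal through the feature extractor even though no unseen image is ever seen in training.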

D2Zero Framework

Figure 2. Framework overview of our D2Zero. The model proposes a set of class-agnostic masks and their corresponding proposal embeddings. The proposed input-conditional classifier takes semantic embeddings and proposal embeddings as input and generates image-specific prototypes, which are then used to classify the image embeddings under the supervision of both a seen CE loss Ls and an unseen CE loss Lu. The unseen CE loss enables unseen classes to join the training of the feature extractor. We further collect all the masks to produce a background mask and apply it to the image feature, yielding an image-adaptive background prototype for classification.
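The two image-adaptive components in Figure 2 can be sketched as follows. This is a simplified numpy illustration under our own assumptions (attention-style prototype generation, cosine-similarity classification, mean-pooled background features); the actual D2Zero implementation may differ in architecture and detail.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def input_conditional_prototypes(semantic_emb, proposal_emb):
    """Build image-specific class prototypes by letting each class's
    semantic embedding attend over this image's proposal embeddings.
    semantic_emb: (C, D), one row per class; proposal_emb: (N, D)."""
    attn = softmax(semantic_emb @ proposal_emb.T
                   / np.sqrt(semantic_emb.shape[1]))   # (C, N) attention
    return attn @ proposal_emb                         # (C, D) prototypes

def classify(proposal_emb, prototypes):
    """Cosine-similarity logits between proposals and prototypes."""
    p = proposal_emb / np.linalg.norm(proposal_emb, axis=1, keepdims=True)
    c = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return p @ c.T                                     # (N, C) logits

def background_prototype(image_feat, fg_masks):
    """Average image features outside the union of proposed masks to get
    an image-adaptive background prototype.
    image_feat: (H, W, D); fg_masks: (N, H, W) binary masks."""
    bg = fg_masks.max(axis=0) == 0                     # (H, W) background
    if not bg.any():                                   # degenerate case
        return image_feat.mean(axis=(0, 1))
    return image_feat[bg].mean(axis=0)                 # (D,)
```

Because both the class prototypes and the background prototype are recomputed from each input image, the classifier adapts per image instead of relying on a fixed learned background embedding that novel objects would otherwise fall into.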


TABLE 1. Results on GZSIS.
“cp” denotes the copy-paste strategy of ZSI, i.e., sharing instances between the seen and unseen groups.

TABLE 2. Results on ZSIS.


Please consider citing D2Zero if it helps your research.

@inproceedings{he2023d2zero,
  title={Semantic-Promoted Debiasing and Background Disambiguation for Zero-Shot Instance Segmentation},
  author={He, Shuting and Ding, Henghui and Jiang, Wei},
  booktitle={CVPR},
  year={2023}
}


Creative Commons License
D2Zero is licensed under a CC BY-NC-SA 4.0 License.