News Blog Paper China
Weak Supervision for Generating Pixel-Level Annotations in Scene Text Segmentation2019-11-19   ${\displaystyle \cong }$
Providing pixel-level supervisions for scene text segmentation is inherently difficult and costly, so that only few small datasets are available for this task. To face the scarcity of training data, previous approaches based on Convolutional Neural Networks (CNNs) rely on the use of a synthetic dataset for pre-training. However, synthetic data cannot reproduce the complexity and variability of natural images. In this work, we propose to use a weakly supervised learning approach to reduce the domain-shift between synthetic and real data. Leveraging the bounding-box supervision of the COCO-Text and the MLT datasets, we generate weak pixel-level supervisions of real images. In particular, the COCO-Text-Segmentation (COCO_TS) and the MLT-Segmentation (MLT_S) datasets are created and released. These two datasets are used to train a CNN, the Segmentation Multiscale Attention Network (SMANet), which is specifically designed to face some peculiarities of the scene text segmentation task. The SMANet is trained end-to-end on the proposed datasets, and the experiments show that COCO_TS and MLT_S are a valid alternative to synthetic images, allowing to use only a fraction of the training samples and improving significantly the performances.
Few-Shot Segmentation Propagation with Guided Networks2018-05-25   ${\displaystyle \cong }$
Learning-based methods for visual segmentation have made progress on particular types of segmentation tasks, but are limited by the necessary supervision, the narrow definitions of fixed tasks, and the lack of control during inference for correcting errors. To remedy the rigidity and annotation burden of standard approaches, we address the problem of few-shot segmentation: given few image and few pixel supervision, segment any images accordingly. We propose guided networks, which extract a latent task representation from any amount of supervision, and optimize our architecture end-to-end for fast, accurate few-shot segmentation. Our method can switch tasks without further optimization and quickly update when given more guidance. We report the first results for segmentation from one pixel per concept and show real-time interactive video segmentation. Our unified approach propagates pixel annotations across space for interactive segmentation, across time for video segmentation, and across scenes for semantic segmentation. Our guided segmentor is state-of-the-art in accuracy for the amount of annotation and time. See http://github.com/shelhamer/revolver for code, models, and more details.
End-to-End Boundary Aware Networks for Medical Image Segmentation2019-09-10   ${\displaystyle \cong }$
Fully convolutional neural networks (CNNs) have proven to be effective at representing and classifying textural information, thus transforming image intensity into output class masks that achieve semantic image segmentation. In medical image analysis, however, expert manual segmentation often relies on the boundaries of anatomical structures of interest. We propose boundary aware CNNs for medical image segmentation. Our networks are designed to account for organ boundary information, both by providing a special network edge branch and edge-aware loss terms, and they are trainable end-to-end. We validate their effectiveness on the task of brain tumor segmentation using the BraTS 2018 dataset. Our experiments reveal that our approach yields more accurate segmentation results, which makes it promising for more extensive application to medical image segmentation.
Exploiting Clinically Available Delineations for CNN-based Segmentation in Radiotherapy Treatment Planning2019-11-12   ${\displaystyle \cong }$
Convolutional neural networks (CNNs) have been widely and successfully used for medical image segmentation. However, CNNs are typically considered to require large numbers of dedicated expert-segmented training volumes, which may be limiting in practice. This work investigates whether clinically obtained segmentations which are readily available in picture archiving and communication systems (PACS) could provide a possible source of data to train a CNN for segmentation of organs-at-risk (OARs) in radiotherapy treatment planning. In such data, delineations of structures deemed irrelevant to the target clinical use may be lacking. To overcome this issue, we use multi-label instead of multi-class segmentation. We empirically assess how many clinical delineations would be sufficient to train a CNN for the segmentation of OARs and find that increasing the training set size beyond a limited number of images leads to sharply diminishing returns. Moreover, we find that by using multi-label segmentation, missing structures in the reference standard do not have a negative effect on overall segmentation accuracy. These results indicate that segmentations obtained in a clinical workflow can be used to train an accurate OAR segmentation model.
AinnoSeg: Panoramic Segmentation with High Perfomance2020-07-21   ${\displaystyle \cong }$
Panoramic segmentation is a scene where image segmentation tasks is more difficult. With the development of CNN networks, panoramic segmentation tasks have been sufficiently developed.However, the current panoramic segmentation algorithms are more concerned with context semantics, but the details of image are not processed enough. Moreover, they cannot solve the problems which contains the accuracy of occluded object segmentation,little object segmentation,boundary pixel in object segmentation etc. Aiming to address these issues, this paper presents some useful tricks. (a) By changing the basic segmentation model, the model can take into account the large objects and the boundary pixel classification of image details. (b) Modify the loss function so that it can take into account the boundary pixels of multiple objects in the image. (c) Use a semi-supervised approach to regain control of the training process. (d) Using multi-scale training and reasoning. All these operations named AinnoSeg, AinnoSeg can achieve state-of-art performance on the well-known dataset ADE20K.
VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation2017-08-15   ${\displaystyle \cong }$
Rich and dense human labeled datasets are among the main enabling factors for the recent advance on vision-language understanding. Many seemingly distant annotations (e.g., semantic segmentation and visual question answering (VQA)) are inherently connected in that they reveal different levels and perspectives of human understandings about the same visual scenes --- and even the same set of images (e.g., of COCO). The popularity of COCO correlates those annotations and tasks. Explicitly linking them up may significantly benefit both individual tasks and the unified vision and language modeling. We present the preliminary work of linking the instance segmentations provided by COCO to the questions and answers (QAs) in the VQA dataset, and name the collected links visual questions and segmentation answers (VQS). They transfer human supervision between the previously separate tasks, offer more effective leverage to existing problems, and also open the door for new research problems and models. We study two applications of the VQS data in this paper: supervised attention for VQA and a novel question-focused semantic segmentation task. For the former, we obtain state-of-the-art results on the VQA real multiple-choice task by simply augmenting the multilayer perceptrons with some attention features that are learned using the segmentation-QA links as explicit supervision. To put the latter in perspective, we study two plausible methods and compare them to an oracle method assuming that the instance segmentations are given at the test stage.
Panoptic Lintention Network: Towards Efficient Navigational Perception for the Visually Impaired2021-03-06   ${\displaystyle \cong }$
Classic computer vision algorithms, instance segmentation, and semantic segmentation can not provide a holistic understanding of the surroundings for the visually impaired. In this paper, we utilize panoptic segmentation to assist the navigation of visually impaired people by offering both things and stuff awareness in the proximity of the visually impaired efficiently. To this end, we propose an efficient Attention module -- Lintention which can model long-range interactions in linear time using linear space. Based on Lintention, we then devise a novel panoptic segmentation model which we term Panoptic Lintention Net. Experiments on the COCO dataset indicate that the Panoptic Lintention Net raises the Panoptic Quality (PQ) from 39.39 to 41.42 with 4.6\% performance gain while only requiring 10\% fewer GFLOPs and 25\% fewer parameters in the semantic branch. Furthermore, a real-world test via our designed compact wearable panoptic segmentation system, indicates that our system based on the Panoptic Lintention Net accomplishes a relatively stable and exceptionally remarkable panoptic segmentation in real-world scenes.
Rethinking Pre-training and Self-training2020-06-11   ${\displaystyle \cong }$
Pre-training is a dominant paradigm in computer vision. For example, supervised ImageNet pre-training is commonly used to initialize the backbones of object detection and segmentation models. He et al., however, show a surprising result that ImageNet pre-training has limited impact on COCO object detection. Here we investigate self-training as another method to utilize additional data on the same setup and contrast it against ImageNet pre-training. Our study reveals the generality and flexibility of self-training with three additional insights: 1) stronger data augmentation and more labeled data further diminish the value of pre-training, 2) unlike pre-training, self-training is always helpful when using stronger data augmentation, in both low-data and high-data regimes, and 3) in the case that pre-training is helpful, self-training improves upon pre-training. For example, on the COCO object detection dataset, pre-training benefits when we use one fifth of the labeled data, and hurts accuracy when we use all labeled data. Self-training, on the other hand, shows positive improvements from +1.3 to +3.4AP across all dataset sizes. In other words, self-training works well exactly on the same setup that pre-training does not work (using ImageNet to help COCO). On the PASCAL segmentation dataset, which is a much smaller dataset than COCO, though pre-training does help significantly, self-training improves upon the pre-trained model. On COCO object detection, we achieve 54.3AP, an improvement of +1.5AP over the strongest SpineNet model. On PASCAL segmentation, we achieve 90.5 mIOU, an improvement of +1.5% mIOU over the previous state-of-the-art result by DeepLabv3+.
Mix-and-Match Tuning for Self-Supervised Semantic Segmentation2018-01-29   ${\displaystyle \cong }$
Deep convolutional networks for semantic image segmentation typically require large-scale labeled data, e.g. ImageNet and MS COCO, for network pre-training. To reduce annotation efforts, self-supervised semantic segmentation is recently proposed to pre-train a network without any human-provided labels. The key of this new form of learning is to design a proxy task (e.g. image colorization), from which a discriminative loss can be formulated on unlabeled data. Many proxy tasks, however, lack the critical supervision signals that could induce discriminative representation for the target image segmentation task. Thus self-supervision's performance is still far from that of supervised pre-training. In this study, we overcome this limitation by incorporating a "mix-and-match" (M&M) tuning stage in the self-supervision pipeline. The proposed approach is readily pluggable to many self-supervision methods and does not use more annotated samples than the original process. Yet, it is capable of boosting the performance of target image segmentation task to surpass fully-supervised pre-trained counterpart. The improvement is made possible by better harnessing the limited pixel-wise annotations in the target dataset. Specifically, we first introduce the "mix" stage, which sparsely samples and mixes patches from the target set to reflect rich and diverse local patch statistics of target images. A "match" stage then forms a class-wise connected graph, which can be used to derive a strong triplet-based discriminative loss for fine-tuning the network. Our paradigm follows the standard practice in existing self-supervised studies and no extra data or label is required. With the proposed M&M approach, for the first time, a self-supervision method can achieve comparable or even better performance compared to its ImageNet pre-trained counterpart on both PASCAL VOC2012 dataset and CityScapes dataset.
DeepMRSeg: A convolutional deep neural network for anatomy and abnormality segmentation on MR images2019-07-03   ${\displaystyle \cong }$
Segmentation has been a major task in neuroimaging. A large number of automated methods have been developed for segmenting healthy and diseased brain tissues. In recent years, deep learning techniques have attracted a lot of attention as a result of their high accuracy in different segmentation problems. We present a new deep learning based segmentation method, DeepMRSeg, that can be applied in a generic way to a variety of segmentation tasks. The proposed architecture combines recent advances in the field of biomedical image segmentation and computer vision. We use a modified UNet architecture that takes advantage of multiple convolution filter sizes to achieve multi-scale feature extraction adaptive to the desired segmentation task. Importantly, our method operates on minimally processed raw MRI scan. We validated our method on a wide range of segmentation tasks, including white matter lesion segmentation, segmentation of deep brain structures and hippocampus segmentation. We provide code and pre-trained models to allow researchers apply our method on their own datasets.
QANet -- Quality Assurance Network for Image Segmentation2019-11-05   ${\displaystyle \cong }$
We introduce a novel Deep Learning framework, which quantitatively estimates image segmentation quality without the need for human inspection or labeling. We refer to this method as a Quality Assurance Network -- QANet. Specifically, given an image and a `proposed' corresponding segmentation, obtained by any method including manual annotation, the QANet solves a regression problem in order to estimate a predefined quality measure with respect to the unknown ground truth. The QANet is by no means yet another segmentation method. Instead, it performs a multi-level, multi-feature comparison of an image-segmentation pair based on a unique network architecture, called the RibCage. To demonstrate the strength of the QANet, we addressed the evaluation of instance segmentation using two different datasets from different domains, namely, high throughput live cell microscopy images from the Cell Segmentation Benchmark and natural images of plants from the Leaf Segmentation Challenge. While synthesized segmentations were used to train the QANet, it was tested on segmentations obtained by publicly available methods that participated in the different challenges. We show that the QANet accurately estimates the scores of the evaluated segmentations with respect to the hidden ground truth, as published by the challenges' organizers. The code is available at: TBD.
Mumford-Shah Loss Functional for Image Segmentation with Deep Learning2019-09-09   ${\displaystyle \cong }$
Recent state-of-the-art image segmentation algorithms are mostly based on deep neural networks, thanks to their high performance and fast computation time. However, these methods are usually trained in a supervised manner, which requires large number of high quality ground-truth segmentation masks. On the other hand, classical image segmentation approaches such as level-set methods are formulated in a self-supervised manner by minimizing energy functions such as Mumford-Shah functional, so they are still useful to help generation of segmentation masks without labels. Unfortunately, these algorithms are usually computationally expensive and often have limitation in semantic segmentation. In this paper, we propose a novel loss function based on Mumford-Shah functional that can be used in deep-learning based image segmentation without or with small labeled data. This loss function is based on the observation that the softmax layer of deep neural networks has striking similarity to the characteristic function in the Mumford-Shah functional. We show that the new loss function enables semi-supervised and unsupervised segmentation. In addition, our loss function can be also used as a regularized function to enhance supervised semantic segmentation algorithms. Experimental results on multiple datasets demonstrate the effectiveness of the proposed method.
ErrorNet: Learning error representations from limited data to improve vascular segmentation2020-02-01   ${\displaystyle \cong }$
Deep convolutional neural networks have proved effective in segmenting lesions and anatomies in various medical imaging modalities. However, in the presence of small sample size and domain shift problems, these models often produce masks with non-intuitive segmentation mistakes. In this paper, we propose a segmentation framework called ErrorNet, which learns to correct these segmentation mistakes through the repeated process of injecting systematic segmentation errors to the segmentation result based on a learned shape prior, followed by attempting to predict the injected error. During inference, ErrorNet corrects the segmentation mistakes by adding the predicted error map to the initial segmentation result. ErrorNet has advantages over alternatives based on domain adaptation or CRF-based post processing, because it requires neither domain-specific parameter tuning nor any data from the target domains. We have evaluated ErrorNet using five public datasets for the task of retinal vessel segmentation. The selected datasets differ in size and patient population, allowing us to evaluate the effectiveness of ErrorNet in handling small sample size and domain shift problems. Our experiments demonstrate that ErrorNet outperforms a base segmentation model, a CRF-based post processing scheme, and a domain adaptation method, with a greater performance gain in the presence of the aforementioned dataset limitations.
Deeply Supervised Active Learning for Finger Bones Segmentation2020-05-06   ${\displaystyle \cong }$
Segmentation is a prerequisite yet challenging task for medical image analysis. In this paper, we introduce a novel deeply supervised active learning approach for finger bones segmentation. The proposed architecture is fine-tuned in an iterative and incremental learning manner. In each step, the deep supervision mechanism guides the learning process of hidden layers and selects samples to be labeled. Extensive experiments demonstrated that our method achieves competitive segmentation results using less labeled samples as compared with full annotation.
iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images2019-08-28   ${\displaystyle \cong }$
Existing Earth Vision datasets are either suitable for semantic segmentation or object detection. In this work, we introduce the first benchmark dataset for instance segmentation in aerial imagery that combines instance-level object detection and pixel-level segmentation tasks. In comparison to instance segmentation in natural scenes, aerial images present unique challenges e.g., a huge number of instances per image, large object-scale variations and abundant tiny objects. Our large-scale and densely annotated Instance Segmentation in Aerial Images Dataset (iSAID) comes with 655,451 object instances for 15 categories across 2,806 high-resolution images. Such precise per-pixel annotations for each instance ensure accurate localization that is essential for detailed scene analysis. Compared to existing small-scale aerial image based instance segmentation datasets, iSAID contains 15$\times$ the number of object categories and 5$\times$ the number of instances. We benchmark our dataset using two popular instance segmentation approaches for natural images, namely Mask R-CNN and PANet. In our experiments we show that direct application of off-the-shelf Mask R-CNN and PANet on aerial images provide suboptimal instance segmentation results, thus requiring specialized solutions from the research community. The dataset is publicly available at: https://captain-whu.github.io/iSAID/index.html
Secure 3D medical Imaging2020-10-06   ${\displaystyle \cong }$
Image segmentation has proved its importance and plays an important role in various domains such as health systems and satellite-oriented military applications. In this context, accuracy, image quality, and execution time deem to be the major issues to always consider. Although many techniques have been applied, and their experimental results have shown appealing achievements for 2D images in real-time environments, however, there is a lack of works about 3D image segmentation despite its importance in improving segmentation accuracy. Specifically, HMM was used in this domain. However, it suffers from the time complexity, which was updated using different accelerators. As it is important to have efficient 3D image segmentation, we propose in this paper a novel system for partitioning the 3D segmentation process across several distributed machines. The concepts behind distributed multi-media network segmentation were employed to accelerate the segmentation computational time of training Hidden Markov Model (HMMs). Furthermore, a secure transmission has been considered in this distributed environment and various bidirectional multimedia security algorithms have been applied. The contribution of this work lies in providing an efficient and secure algorithm for 3D image segmentation. Through a number of extensive experiments, it was proved that our proposed system is of comparable efficiency to the state of art methods in terms of segmentation accuracy, security and execution time.
Deep Learning with Mixed Supervision for Brain Tumor Segmentation2018-12-10   ${\displaystyle \cong }$
Most of the current state-of-the-art methods for tumor segmentation are based on machine learning models trained on manually segmented images. This type of training data is particularly costly, as manual delineation of tumors is not only time-consuming but also requires medical expertise. On the other hand, images with a provided global label (indicating presence or absence of a tumor) are less informative but can be obtained at a substantially lower cost. In this paper, we propose to use both types of training data (fully-annotated and weakly-annotated) to train a deep learning model for segmentation. The idea of our approach is to extend segmentation networks with an additional branch performing image-level classification. The model is jointly trained for segmentation and classification tasks in order to exploit information contained in weakly-annotated images while preventing the network to learn features which are irrelevant for the segmentation task. We evaluate our method on the challenging task of brain tumor segmentation in Magnetic Resonance images from BRATS 2018 challenge. We show that the proposed approach provides a significant improvement of segmentation performance compared to the standard supervised learning. The observed improvement is proportional to the ratio between weakly-annotated and fully-annotated images available for training.
Spatial Context-Aware Self-Attention Model For Multi-Organ Segmentation2020-12-16   ${\displaystyle \cong }$
Multi-organ segmentation is one of most successful applications of deep learning in medical image analysis. Deep convolutional neural nets (CNNs) have shown great promise in achieving clinically applicable image segmentation performance on CT or MRI images. State-of-the-art CNN segmentation models apply either 2D or 3D convolutions on input images, with pros and cons associated with each method: 2D convolution is fast, less memory-intensive but inadequate for extracting 3D contextual information from volumetric images, while the opposite is true for 3D convolution. To fit a 3D CNN model on CT or MRI images on commodity GPUs, one usually has to either downsample input images or use cropped local regions as inputs, which limits the utility of 3D models for multi-organ segmentation. In this work, we propose a new framework for combining 3D and 2D models, in which the segmentation is realized through high-resolution 2D convolutions, but guided by spatial contextual information extracted from a low-resolution 3D model. We implement a self-attention mechanism to control which 3D features should be used to guide 2D segmentation. Our model is light on memory usage but fully equipped to take 3D contextual information into account. Experiments on multiple organ segmentation datasets demonstrate that by taking advantage of both 2D and 3D models, our method is consistently outperforms existing 2D and 3D models in organ segmentation accuracy, while being able to directly take raw whole-volume image data as inputs.
Understanding Deep Learning Techniques for Image Segmentation2019-07-13   ${\displaystyle \cong }$
The machine learning community has been overwhelmed by a plethora of deep learning based approaches. Many challenging computer vision tasks such as detection, localization, recognition and segmentation of objects in unconstrained environment are being efficiently addressed by various types of deep neural networks like convolutional neural networks, recurrent networks, adversarial networks, autoencoders and so on. While there have been plenty of analytical studies regarding the object detection or recognition domain, many new deep learning techniques have surfaced with respect to image segmentation techniques. This paper approaches these various deep learning techniques of image segmentation from an analytical perspective. The main goal of this work is to provide an intuitive understanding of the major techniques that has made significant contribution to the image segmentation domain. Starting from some of the traditional image segmentation approaches, the paper progresses describing the effect deep learning had on the image segmentation domain. Thereafter, most of the major segmentation algorithms have been logically categorized with paragraphs dedicated to their unique contribution. With an ample amount of intuitive explanations, the reader is expected to have an improved ability to visualize the internal dynamics of these processes.
Estimating Uncertainty in Neural Networks for Cardiac MRI Segmentation: A Benchmark Study2020-12-31   ${\displaystyle \cong }$
Convolutional neural networks (CNNs) have demonstrated promise in automated cardiac magnetic resonance imaging segmentation. However, when using CNNs in a large real world dataset, it is important to quantify segmentation uncertainty in order to know which segmentations could be problematic. In this work, we performed a systematic study of Bayesian and non-Bayesian methods for estimating uncertainty in segmentation neural networks. We evaluated Bayes by Backprop (BBB), Monte Carlo (MC) Dropout, and Deep Ensembles in terms of segmentation accuracy, probability calibration, uncertainty on out-of-distribution images, and segmentation quality control. We tested these algorithms on datasets with various distortions and observed that Deep Ensembles outperformed the other methods except for images with heavy noise distortions. For segmentation quality control, we showed that segmentation uncertainty is correlated with segmentation accuracy. With the incorporation of uncertainty estimates, we were able to reduce the percentage of poor segmentation to 5% by flagging 31% to 48% of the most uncertain images for manual review, substantially lower than random review of the results without using neural network uncertainty.