Weakly Supervised Clustering by Exploiting Unique Class Count (2020-01-25)
A weakly supervised learning based clustering framework is proposed in this paper. As the core of this framework, we introduce a novel multiple instance learning task based on a bag level label called unique class count ($ucc$), which is the number of unique classes among all instances inside the bag. In this task, no annotations on individual instances inside the bag are needed during training of the models. We mathematically prove that with a perfect $ucc$ classifier, perfect clustering of individual instances inside the bags is possible even when no annotations on individual instances are given during training. We have constructed a neural network based $ucc$ classifier and experimentally shown that the clustering performance of our framework with our weakly supervised $ucc$ classifier is comparable to that of fully supervised learning models where labels for all instances are known. Furthermore, we have tested the applicability of our framework to a real world task of semantic segmentation of breast cancer metastases in histological lymph node sections and shown that the performance of our weakly supervised framework is comparable to the performance of a fully supervised Unet model.
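The bag-level label defined in the abstract above can be illustrated with a minimal Python sketch. The function name `unique_class_count` and the toy bags are hypothetical. Instance labels are used here only to generate the bag labels; in the task described, they are not available to the model during training.

```python
from typing import Dict, Hashable, List, Sequence


def unique_class_count(instance_labels: Sequence[Hashable]) -> int:
    """Return the number of distinct classes among the instances of one bag."""
    return len(set(instance_labels))


# Toy bags (hypothetical data): each list holds the hidden class of every instance.
bags: Dict[str, List[int]] = {
    "bag_a": [3, 3, 3, 3],  # one class only
    "bag_b": [0, 1, 1, 7],  # three distinct classes
    "bag_c": [2, 5],        # two distinct classes
}

# The ucc label is the only supervision attached to each bag.
ucc_labels = {name: unique_class_count(labels) for name, labels in bags.items()}
print(ucc_labels)
```

A bag whose ucc is 1 contains instances of a single class, which gives an intuition for why a bag-level count carries information about how instances group together.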
Non-I.I.D. Multi-Instance Learning for Predicting Instance and Bag Labels using Variational Auto-Encoder (2021-05-03)
Multi-instance learning is a type of weakly supervised learning. It deals with tasks where the data is a set of bags and each bag is a set of instances. Only the bag labels are observed whereas the labels for the instances are unknown. An important advantage of multi-instance learning is that by representing objects as a bag of instances, it is able to preserve the inherent dependencies among parts of the objects. Unfortunately, most existing algorithms assume all instances to be identically and independently distributed, an assumption that is violated in real-world scenarios since the instances within a bag are rarely independent. In this work, we propose the Multi-Instance Variational Auto-Encoder (MIVAE) algorithm which explicitly models the dependencies among the instances for predicting both bag labels and instance labels. Experimental results on several multi-instance benchmarks and end-to-end medical imaging datasets demonstrate that MIVAE performs better than state-of-the-art algorithms for both instance label and bag label prediction tasks.
Multi-Instance Learning by Treating Instances As Non-I.I.D. Samples (2009-05-13)
Multi-instance learning attempts to learn from a training set consisting of labeled bags each containing many unlabeled instances. Previous studies typically treat the instances in the bags as independently and identically distributed. However, the instances in a bag are rarely independent, and therefore a better performance can be expected if the instances are treated in a non-i.i.d. way that exploits the relations among instances. In this paper, we propose a simple yet effective multi-instance learning method, which regards each bag as a graph and uses a specific kernel to distinguish the graphs by considering the features of the nodes as well as the features of the edges that convey some relations among instances. The effectiveness of the proposed method is validated by experiments.
Weakly Supervised Learning with Region and Box-level Annotations for Salient Instance Segmentation (2020-08-18)
Salient instance segmentation is a new and challenging task that has received widespread attention in the saliency detection area. Due to the limited scale of the existing dataset and the high cost of mask annotations, it is difficult to fully train a salient instance neural network. In this paper, we propose to train a salient instance segmentation framework from a weakly supervised source without resorting to laborious labeling. We present a cyclic global context salient instance segmentation network (CGCNet), which is supervised by the combination of the binary salient regions and bounding boxes from the existing saliency detection datasets. For precise pixel-level localization, a global feature refining layer is introduced that dilates the context features of each salient instance to the global context in the image. Meanwhile, a label updating scheme is embedded in the proposed framework to update the weak annotations online for the next iteration. Experimental results demonstrate that the proposed end-to-end network trained with weakly supervised annotations can be competitive with the existing fully supervised salient instance segmentation methods. Without bells and whistles, our proposed method achieves a mask AP of 57.13%, which outperforms the best fully supervised methods and establishes a new state of the art for weakly supervised salient instance segmentation.
Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning (2020-03-31)
The weakly-supervised action localization problem requires training a model to localize the action segments in a video given only video-level action labels. It can be solved under the Multiple Instance Learning (MIL) framework, where a bag (video) contains multiple instances (action segments). Since only the bag's label is known, the main challenge is to identify which key instances within the bag trigger the bag's label. Most previous models use an attention-based approach. These models use attention to generate the bag's representation from instances and then train it via bag-level classification. In this work, we explicitly model the key instance assignment as a hidden variable and adopt an Expectation-Maximization framework. We derive two pseudo-label generation schemes to model the E and M process and iteratively optimize the likelihood lower bound. We also show that previous attention-based models implicitly violate the MIL assumption that instances in negative bags should be uniformly negative. In comparison, our EM-MIL approach more accurately models these assumptions. Our model achieves state-of-the-art performance on two standard benchmarks, THUMOS14 and ActivityNet1.2, and shows superiority in detecting relatively complete action boundaries in videos containing multiple actions.
Multi-typed Objects Multi-view Multi-instance Multi-label Learning (2020-10-06)
Multi-typed objects Multi-view Multi-instance Multi-label Learning (M4L) deals with interconnected multi-typed objects (or bags) that are made of diverse instances, represented with heterogeneous feature views and annotated with a set of non-exclusive but semantically related labels. M4L is more general and powerful than the typical Multi-view Multi-instance Multi-label Learning (M3L), which only accommodates single-typed bags and lacks the power to jointly model the naturally interconnected multi-typed objects in the physical world. To tackle this novel and challenging learning task, we develop a joint matrix factorization based solution (M4L-JMF). In particular, M4L-JMF first encodes the diverse attributes and multiple inter(intra)-associations among multi-typed bags into respective data matrices, and then jointly factorizes these matrices into low-rank ones to explore the composite latent representation of each bag and its instances (if any). In addition, it incorporates a dispatch and aggregation term to distribute the labels of bags to individual instances and, conversely, aggregate the labels of instances to their affiliated bags in a coherent manner. Experimental results on benchmark datasets show that M4L-JMF achieves significantly better results than simple adaptations of existing M3L solutions on this novel problem.
Multi-Evidence Filtering and Fusion for Multi-Label Classification, Object Detection and Semantic Segmentation Based on Weakly Supervised Learning (2018-02-25)
Supervised object detection and semantic segmentation require object or even pixel level annotations. When there exist image level labels only, it is challenging for weakly supervised algorithms to achieve accurate predictions. The accuracy achieved by top weakly supervised algorithms is still significantly lower than their fully supervised counterparts. In this paper, we propose a novel weakly supervised curriculum learning pipeline for multi-label object recognition, detection and semantic segmentation. In this pipeline, we first obtain intermediate object localization and pixel labeling results for the training images, and then use such results to train task-specific deep networks in a fully supervised manner. The entire process consists of four stages, including object localization in the training images, filtering and fusing object instances, pixel labeling for the training images, and task-specific network training. To obtain clean object instances in the training images, we propose a novel algorithm for filtering, fusing and classifying object instances collected from multiple solution mechanisms. In this algorithm, we incorporate both metric learning and density-based clustering to filter detected object instances. Experiments show that our weakly supervised pipeline achieves state-of-the-art results in multi-label image classification as well as weakly supervised object detection and very competitive results in weakly supervised semantic segmentation on MS-COCO, PASCAL VOC 2007 and PASCAL VOC 2012.
Dual-stream Maximum Self-attention Multi-instance Learning (2020-06-09)
Multi-instance learning (MIL) is a form of weakly supervised learning where a single class label is assigned to a bag of instances while the instance-level labels are not available. Training classifiers to accurately determine the bag label and instance labels is a challenging but critical task in many practical scenarios, such as computational histopathology. Recently, MIL models fully parameterized by neural networks have become popular due to their high flexibility and superior performance. Most of these models rely on attention mechanisms that assign attention scores across the instance embeddings in a bag and produce the bag embedding using an aggregation operator. In this paper, we propose a dual-stream maximum self-attention MIL model (DSMIL) parameterized by neural networks. The first stream deploys a simple MIL max-pooling while the top-activated instance embedding is determined and used to obtain self-attention scores across instance embeddings in the second stream. Different from most of the previous methods, the proposed model jointly learns an instance classifier and a bag classifier based on the same instance embeddings. The experimental results show that our method achieves superior performance compared to the best MIL methods and demonstrates state-of-the-art performance on benchmark MIL datasets.
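The max-pooling step mentioned in the abstract above can be sketched in a few lines of Python. This is an illustration of generic MIL max-pooling only, not the DSMIL model: the function name `max_pool_bag` and the example scores are hypothetical, and the self-attention stream is not shown.

```python
from typing import List, Tuple


def max_pool_bag(instance_scores: List[float]) -> Tuple[float, int]:
    """Return the bag score under max-pooling and the index of the top instance.

    The bag score is the highest instance score, so a bag is scored as positive
    as soon as one of its instances is scored as positive.
    """
    if not instance_scores:
        raise ValueError("a bag must contain at least one instance")
    top_index = max(range(len(instance_scores)), key=instance_scores.__getitem__)
    return instance_scores[top_index], top_index


# Hypothetical per-instance scores produced by some instance classifier.
scores = [0.1, 0.8, 0.3]
bag_score, top_instance = max_pool_bag(scores)
print(bag_score, top_instance)
```

Returning the index as well as the score reflects the abstract's description that the top-activated instance is identified and then reused by the second stream.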
Confidence-Constrained Maximum Entropy Framework for Learning from Multi-Instance Data (2016-03-06)
Multi-instance data, in which each object (bag) contains a collection of instances, are widespread in machine learning, computer vision, bioinformatics, signal processing, and social sciences. We present a maximum entropy (ME) framework for learning from multi-instance data. In this approach each bag is represented as a distribution using the principle of ME. We introduce the concept of confidence-constrained ME (CME) to simultaneously learn the structure of distribution space and infer each distribution. The shared structure underlying each density is used to learn from instances inside each bag. The proposed CME is free of tuning parameters. We devise a fast optimization algorithm capable of handling large scale multi-instance data. In the experimental section, we evaluate the performance of the proposed approach in terms of exact rank recovery in the space of distributions and compare it with the regularized ME approach. Moreover, we compare the performance of CME with Multi-Instance Learning (MIL) state-of-the-art algorithms and show a comparable performance in terms of accuracy with reduced computational complexity.
Sparse Network Inversion for Key Instance Detection in Multiple Instance Learning (2020-09-07)
Multiple Instance Learning (MIL) involves predicting a single label for a bag of instances, given positive or negative labels at the bag level, without access to the label of each instance in the training phase. Since a positive bag contains both positive and negative instances, it is often required to detect positive instances (key instances) when a set of instances is categorized as a positive bag. The attention-based deep MIL model is a recent advance in both bag-level classification and key instance detection (KID). However, if the positive and negative instances in a positive bag are not clearly distinguishable, the attention-based deep MIL model has limited KID performance as the attention scores are skewed toward a few positive instances. In this paper, we present a method to improve the attention-based deep MIL model in the task of KID. The main idea is to use neural network inversion to find which instances contributed to the bag-level prediction produced by the trained MIL model. Moreover, we incorporate a sparseness constraint into the neural network inversion, leading to the sparse network inversion which is solved by the proximal gradient method. Numerical experiments on an MNIST-based image MIL dataset and two real-world histopathology datasets verify the validity of our method, demonstrating that the KID performance is significantly improved while the performance of bag-level prediction is maintained.
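The abstract above does not state the inversion objective, so the following Python sketch shows only the generic proximal gradient update for an L1 (sparsity) penalty, assuming that is the form the sparseness constraint takes. The function names, the gradient values, and the hyperparameters `step_size` and `l1_weight` are all hypothetical.

```python
from typing import List


def soft_threshold(values: List[float], threshold: float) -> List[float]:
    """Proximal operator of threshold * ||x||_1: shrink each entry toward zero."""
    shrunk = []
    for v in values:
        if v > threshold:
            shrunk.append(v - threshold)
        elif v < -threshold:
            shrunk.append(v + threshold)
        else:
            shrunk.append(0.0)  # small entries are set exactly to zero
    return shrunk


def proximal_gradient_step(
    x: List[float], grad: List[float], step_size: float, l1_weight: float
) -> List[float]:
    """One update: gradient step on the smooth loss, then L1 shrinkage."""
    moved = [xi - step_size * gi for xi, gi in zip(x, grad)]
    return soft_threshold(moved, step_size * l1_weight)


# Hypothetical per-instance weights and gradient of a smooth loss.
weights = proximal_gradient_step([1.0, 0.5], [1.0, 1.0], step_size=0.5, l1_weight=0.5)
print(weights)
```

Because the shrinkage step sets small entries exactly to zero, repeated updates tend to leave only a subset of entries nonzero, which is the usual motivation for pairing an L1 penalty with a proximal method.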
Towards Coarse and Fine-grained Multi-Graph Multi-Label Learning (2020-12-19)
Multi-graph multi-label learning (MGML) is a supervised learning framework, which aims to learn a multi-label classifier from a set of labeled bags each containing a number of graphs. Prior techniques for MGML are based on transforming graphs into instances and focus on learning the unseen labels only at the bag level. In this paper, we propose a coarse and fine-grained Multi-graph Multi-label (cfMGML) learning framework which directly builds the learning model over the graphs and enables label prediction at both the coarse (i.e., bag) level and the fine-grained (i.e., graph in each bag) level. In particular, given a set of labeled multi-graph bags, we design the scoring functions at both graph and bag levels to model the relevance between the label and data using specific graph kernels. Meanwhile, we propose a thresholding rank-loss objective function to rank the labels for the graphs and bags and minimize the Hamming loss simultaneously in one step, which aims to address the error accumulation issue in traditional rank-loss algorithms. To tackle the non-convex optimization problem, we further develop an effective sub-gradient descent algorithm to handle the high-dimensional space computation required in cfMGML. Experiments over various real-world datasets demonstrate that cfMGML achieves performance superior to state-of-the-art algorithms.
Audio Event and Scene Recognition: A Unified Approach using Strongly and Weakly Labeled Data (2017-02-18)
In this paper we propose a novel learning framework called Supervised and Weakly Supervised Learning where the goal is to learn simultaneously from weakly and strongly labeled data. Strongly labeled data can be simply understood as fully supervised data where all labeled instances are available. In weakly supervised learning, the data is only weakly labeled, which prevents one from directly applying supervised learning methods. Our proposed framework is motivated by the fact that a small amount of strongly labeled data can give considerable improvement over only weakly supervised learning. The primary problem domain of this paper is acoustic event and scene detection in audio recordings. We first propose a naive formulation for leveraging labeled data in both forms. We then propose a more general framework for Supervised and Weakly Supervised Learning (SWSL). Based on this general framework, we propose a graph based approach for SWSL. Our main method is based on manifold regularization on graphs, in which we show that the unified learning can be formulated as a constrained optimization problem which can be solved by the iterative concave-convex procedure (CCCP). Our experiments show that our proposed framework can address several concerns of audio content analysis using weakly labeled data.
A method on selecting reliable samples based on fuzziness in positive and unlabeled learning (2019-03-26)
Traditional semi-supervised learning uses only labeled instances to train a classifier, which is then used to classify unlabeled instances; sometimes, however, only positive instances, which are elements of the target concept, are available in the labeled set. Our research in this paper concerns the design of learning algorithms from positive and unlabeled instances only. Among all the semi-supervised positive and unlabeled learning methods, it is a fundamental step to extract useful information from unlabeled instances. In this paper, we design a novel framework to take advantage of valid information in unlabeled instances. In essence, this framework (1) selects reliable negative instances through the fuzziness of the instances; (2) chooses new positive instances based on the fuzziness of the instances to expand the initial positive set, which we name reliable positive instances; and (3) uses a data editing technique to filter out noise points with high fuzziness. The effectiveness of the presented algorithm is verified by comparative experiments on UCI datasets.
Deep Multiple Instance Feature Learning via Variational Autoencoder (2018-07-06)
We describe a novel weakly supervised deep learning framework that combines both discriminative and generative models to learn meaningful representations in the multiple instance learning (MIL) setting. MIL is a weakly supervised learning problem where labels are associated with groups of instances (referred to as bags) instead of individual instances. To address the essential challenge in MIL problems arising from the uncertainty of positive instance labels, we use a discriminative model regularized by variational autoencoders (VAEs) to maximize the differences between latent representations of all instances and negative instances. As a result, the hidden layer of the variational autoencoder learns a meaningful representation. This representation can effectively be used for MIL problems, as illustrated by better performance on the standard benchmark datasets compared to the state-of-the-art approaches. More importantly, unlike most related studies, the proposed framework can be easily scaled to large dataset problems, as illustrated by the audio event detection and segmentation task. Visualization also confirms the effectiveness of the latent representation in discriminating positive and negative classes.
Theory and Algorithms for Shapelet-based Multiple-Instance Learning (2020-06-12)
We propose a new formulation of Multiple-Instance Learning (MIL), in which a unit of data consists of a set of instances called a bag. The goal is to find a good classifier of bags based on the similarity with a "shapelet" (or pattern), where the similarity of a bag with a shapelet is the maximum similarity of instances in the bag. In previous work, some of the training instances are chosen as shapelets with no theoretical justification. In our formulation, we use all possible, and thus infinitely many shapelets, resulting in a richer class of classifiers. We show that the formulation is tractable, that is, it can be reduced through Linear Programming Boosting (LPBoost) to Difference of Convex (DC) programs of finite (actually polynomial) size. Our theoretical result also gives justification to the heuristics of some of the previous work. The time complexity of the proposed algorithm highly depends on the size of the set of all instances in the training sample. To apply to the data containing a large number of instances, we also propose a heuristic option of the algorithm without the loss of the theoretical guarantee. Our empirical study demonstrates that our algorithm uniformly works for Shapelet Learning tasks on time-series classification and various MIL tasks with comparable accuracy to the existing methods. Moreover, we show that the proposed heuristics allow us to achieve the result with reasonable computational time.
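The bag-to-shapelet similarity described in the abstract above (the maximum similarity over the instances in the bag) can be sketched in Python as follows. The abstract does not specify the instance-level similarity, so the Gaussian kernel used here is an assumption, as are the function names and the `bandwidth` parameter.

```python
import math
from typing import Sequence


def gaussian_similarity(
    instance: Sequence[float], shapelet: Sequence[float], bandwidth: float = 1.0
) -> float:
    """Assumed instance-level similarity: a Gaussian kernel on Euclidean distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(instance, shapelet))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def bag_similarity(
    bag: Sequence[Sequence[float]], shapelet: Sequence[float], bandwidth: float = 1.0
) -> float:
    """Similarity of a bag with a shapelet: the maximum over the bag's instances."""
    return max(gaussian_similarity(x, shapelet, bandwidth) for x in bag)


# Hypothetical bag of two 2-D instances; the second one coincides with the shapelet.
bag = [[0.0, 0.0], [1.0, 1.0]]
print(bag_similarity(bag, shapelet=[1.0, 1.0]))
```

Under this definition a bag is considered close to a shapelet as soon as any single instance is close to it, regardless of the remaining instances.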
Cluster-Based Learning from Weakly Labeled Bags in Digital Pathology (2018-11-28)
To alleviate the burden of gathering detailed expert annotations when training deep neural networks, we propose a weakly supervised learning approach to recognize metastases in microscopic images of breast lymph nodes. We describe an alternative training loss which clusters weakly labeled bags in latent space to inform relevance of patch-instances during training of a convolutional neural network. We evaluate our method on the Camelyon dataset which contains high-resolution digital slides of breast lymph nodes, where labels are provided at the image-level and only subsets of patches are made available during training.
A Visual Mining Approach to Improved Multiple-Instance Learning (2020-12-14)
Multiple-instance learning (MIL) is a paradigm of machine learning that aims to classify a set (bag) of objects (instances), assigning labels only to the bags. This problem is often addressed by selecting an instance to represent each bag, transforming a MIL problem into a standard supervised learning problem. Visualization can be a useful tool to assess learning scenarios by incorporating the users' knowledge into the classification process. Considering that multiple-instance learning is a paradigm that cannot be handled by current visualization techniques, we propose a multiscale tree-based visualization to support MIL. The first level of the tree represents the bags, and the second level represents the instances belonging to each bag, allowing the user to understand the data in an intuitive way. In addition, we propose two new instance selection methods for MIL, which help the user to improve the model even further. Our methods are also able to handle both binary and multiclass scenarios. In our experiments, SVM was used to build the classifiers. With the support of the MILTree layout, the initial classification model was updated by changing the training set, which is composed of the prototype instances. Experimental results validate the effectiveness of our approach, showing that visual mining by MILTree can help users in exploring and improving models in MIL scenarios, and that our instance selection methods outperform currently available alternatives in most cases.
Distill-to-Label: Weakly Supervised Instance Labeling Using Knowledge Distillation (2019-07-26)
Weakly supervised instance labeling using only image-level labels, in lieu of expensive fine-grained pixel annotations, is crucial in several applications including medical image analysis. In contrast to conventional instance segmentation scenarios in computer vision, the problems that we consider are characterized by a small number of training images and non-local patterns that lead to the diagnosis. In this paper, we explore the use of multiple instance learning (MIL) to design an instance label generator under this weakly supervised setting. Motivated by the observation that an MIL model can handle bags of varying sizes, we propose to repurpose an MIL model originally trained for bag-level classification to produce reliable predictions for single instances, i.e., bags of size 1. To this end, we introduce a novel regularization strategy based on virtual adversarial training for improving MIL training, and subsequently develop a knowledge distillation technique for repurposing the trained MIL model. Using empirical studies on colon cancer and breast cancer detection from histopathological images, we show that the proposed approach produces high-quality instance-level predictions and significantly outperforms state-of-the-art MIL methods.
CAMEL: A Weakly Supervised Learning Framework for Histopathology Image Segmentation (2019-08-28)
Histopathology image analysis plays a critical role in cancer diagnosis and treatment. To automatically segment the cancerous regions, fully supervised segmentation algorithms require labor-intensive and time-consuming labeling at the pixel level. In this research, we propose CAMEL, a weakly supervised learning framework for histopathology image segmentation using only image-level labels. Using multiple instance learning (MIL)-based label enrichment, CAMEL splits the image into latticed instances and automatically generates instance-level labels. After label enrichment, the instance-level labels are further assigned to the corresponding pixels, producing the approximate pixel-level labels and making fully supervised training of segmentation models possible. CAMEL achieves comparable performance with the fully supervised approaches in both instance-level classification and pixel-level segmentation on CAMELYON16 and a colorectal adenoma dataset. Moreover, the generality of the automatic labeling methodology may benefit future weakly supervised learning studies for histopathology image analysis.
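The step in the abstract above where instance-level labels are assigned to the corresponding pixels can be sketched in Python, assuming the latticed instances are square, non-overlapping tiles of equal size. The function name `instance_labels_to_pixel_mask`, the `tile_size` parameter, and the example grid are hypothetical; this is not the CAMEL implementation.

```python
from typing import List


def instance_labels_to_pixel_mask(
    instance_labels: List[List[int]], tile_size: int
) -> List[List[int]]:
    """Expand a grid of per-tile labels into an approximate per-pixel mask.

    Every pixel inherits the label of the tile that contains it, so the mask
    has tile_size times as many rows and columns as the label grid.
    """
    mask: List[List[int]] = []
    for label_row in instance_labels:
        # Repeat each tile label across the tile's width.
        pixel_row = [label for label in label_row for _ in range(tile_size)]
        # Repeat the resulting row across the tile's height (fresh copy per row).
        mask.extend(list(pixel_row) for _ in range(tile_size))
    return mask


# Hypothetical 2x2 grid of instance labels (1 = cancerous, 0 = normal).
pixel_mask = instance_labels_to_pixel_mask([[1, 0], [0, 1]], tile_size=2)
for row in pixel_mask:
    print(row)
```

The resulting mask is only as fine as the lattice, which is why the abstract describes the pixel-level labels obtained this way as approximate.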
Weakly Supervised Scalable Audio Content Analysis (2016-06-12)
Audio Event Detection is an important task for content analysis of multimedia data. Most current work on the detection of audio events is driven by supervised learning approaches. We propose a weakly supervised learning framework which can make use of the tremendous amount of web multimedia data with significantly reduced annotation effort and expense. Specifically, we use several multiple instance learning algorithms to show that audio event detection through weak labels is feasible. We also propose a novel scalable multiple instance learning algorithm and show that it is competitive with other multiple instance learning algorithms for audio event detection tasks.