# Search ICLR 2019

Searching papers submitted to ICLR 2019 can be painful. You might want to know which paper uses technique X, dataset D, or cites author ME. Unfortunately, search is limited to titles, abstracts, and keywords, missing the actual contents of the paper. This Frankensteinian search has returned from 2018 to help scour the papers of ICLR by ripping out their souls using pdftotext.

Good luck! Warranty's not included :)

Need random search inspiration..? Grab something from the list of all tags! ^_^
How about: language generation, locality sensitive hashing, object representations, 3d scene understanding, mitigation ..?

Sanity Disclaimer: As you stare at the continuous stream of ICLR and arXiv papers, don't lose confidence or feel overwhelmed. This isn't a competition, it's a search for knowledge. You and your work are valuable and help carve out the path for progress in our field :)

### "semi-supervised methods" has 11 results

##### A Guider Network for Multi-Dual Learning

No tl;dr =[

A large amount of parallel data is needed to train a strong neural machine translation (NMT) system. This is a major challenge for low-resource languages. Building on recent work on unsupervised and semi-supervised methods, we propose a multi-dual learning framework to improve the performance of NMT by using an almost infinite amount of available monolingual data and some parallel data of other languages. Since our framework involves multiple languages and components, we further propose a timing optimization method that uses reinforcement learning (RL) to optimally schedule the different components in order to avoid imbalanced training. Experimental results demonstrate the validity of our model, and confirm its superiority to existing dual learning methods.

##### BLISS in Non-Isometric Embedding Spaces

tl;dr A novel method to test for isometry between word embedding spaces, and a semi-supervised method for learning better mappings between them

Recent work on bilingual lexicon induction (BLI) has frequently depended either on aligned bilingual lexicons or on distribution matching, often with an assumption about the isometry of the two spaces. We propose a technique to quantitatively estimate this assumption of the isometry between two embedding spaces and empirically show that this assumption weakens as the languages in question become increasingly etymologically distant. We then propose Bilingual Lexicon Induction with Semi-Supervision (BLISS) --- a novel semi-supervised approach that relaxes the isometric assumption while leveraging both limited aligned bilingual lexicons and a larger set of unaligned word embeddings, as well as a novel hubness filtering technique. Our proposed method improves over strong baselines for 11 of 14 on the MUSE dataset, particularly for languages whose embedding spaces do not appear to be isometric. In addition, we also show that adding supervision stabilizes the learning procedure, and is effective even with minimal supervision.

##### Zero-shot Dual Machine Translation

tl;dr A multilingual NMT model with reinforcement learning (dual learning) aiming to improve zero-shot translation directions.

Neural Machine Translation (NMT) systems rely on large amounts of parallel data.This is a major challenge for low-resource languages. Building on recent work onunsupervised and semi-supervised methods, we present an approach that combineszero-shot and dual learning. The latter relies on reinforcement learning, to exploitthe duality of the machine translation task, and requires only monolingual datafor the target language pair. Experiments on the UN corpus show that a zero-shotdual system, trained on English-French and English-Spanish, outperforms by largemargins a standard NMT system in zero-shot translation performance on Spanish-French (both directions). We also evaluate onnewstest2014. These experimentsshow that the zero-shot dual method outperforms the LSTM-based unsupervisedNMT system proposed in (Lample et al., 2018b), on the en→fr task, while onthe fr→en task it outperforms both the LSTM-based and the Transformers-basedunsupervised NMT systems.

##### UNSUPERVISED MONOCULAR DEPTH ESTIMATION WITH CLEAR BOUNDARIES

tl;dr This paper propose a mask method which solves the previous blurred results of unsupervised monocular depth estimation caused by occlusion

Unsupervised monocular depth estimation has made great progress after deep learning is involved. Training with binocular stereo images is considered as a good option as the data can be easily obtained. However, the depth or disparity prediction results show poor performance for the object boundaries. The main reason is related to the handling of occlusion areas during the training. In this paper, we propose a novel method to overcome this issue. Exploiting disparity maps property, we generate an occlusion mask to block the back-propagation of the occlusion areas during image warping. We also design new networks with flipped stereo images to induce the networks to learn occluded boundaries. It shows that our method achieves clearer boundaries and better evaluation results on KITTI driving dataset and Virtual KITTI dataset.

##### Multi-class classification without multi-class labels

No tl;dr =[

This work presents a new strategy for multi-class classification that requires no class-specific labels, but instead leverages pairwise similarity between examples, which is a weaker form of annotation. The proposed method, meta classification learning, optimizes a binary classifier for pairwise similarity prediction and through this process learns a multi-class classifier as a submodule. We formulate this approach, present a probabilistic graphical model for it, and derive a surprisingly simple loss function that can be used to learn neural network-based models. We then demonstrate that this same framework generalizes to the supervised, unsupervised cross-task, and semi-supervised settings. Our method is evaluated against state of the art in all three learning paradigms and shows a superior or comparable accuracy, providing evidence that learning multi-class classification without multi-class labels is a viable learning option.

##### Unsupervised Graph Embedding using Dynamic Routing Between Capsules

No tl;dr =[

An important task in learning representations for graph-structured data is to learn node embeddings which are then used for node classification. Recent models, however, suffer from limitations in exploiting graph information in such a way that relative positions of nodes are preserved. In this paper, we propose a novel unsupervised embedding model, named CapsG, which is, to our best of knowledge, the first model using dynamic routing between capsules to overcome these limitations. Our CapsG is constructed with two capsule layers, wherein the first layer aims to encapsulate the raw features of nodes, while the second layer produces a vector output used to infer node embedding. Experimental results show that our proposed CapsG produces new state-of-the-art results on Cora, Citeseer and POS, and obtains very competitive results on Pubmed, PPI and BlogCatalog in comparison with existing state-of-the-art unsupervised and semi-supervised graph embedding models.

##### LEARNING TO PROPAGATE LABELS: TRANSDUCTIVE PROPAGATION NETWORK FOR FEW-SHOT LEARNING

tl;dr We propose a novel meta-learning framework for transductive inference that classifies the entire test set at once to alleviate the low-data problem.

The goal of few-shot learning is to learn a classifier that generalizes well even when trained with a limited number of training instances per class. The recently introduced meta-learning approaches tackle this problem by learning a generic classifier across a large number of multiclass classification tasks and generalizing the model to a new task. Yet, even with such meta-learning, the low-data problem in the novel classification task still remains. In this paper, we propose Transductive Propagation Network (TPN), a novel meta-learning framework for transductive inference that classifies the entire test set at once to alleviate the low-data problem. Specifically, we propose to learn to propagate labels from labeled instances to unlabeled test instances, by learning a graph construction module that exploits the manifold structure in the data. TPN jointly learns both the parameters of feature embedding and the graph construction in an end-to-end manner. We validate TPN on multiple benchmark datasets, on which it largely outperforms existing few-shot learning approaches and achieves the state-of-the-art results.

##### Label Propagation Networks

tl;dr Neural net for graph-based semi-supervised learning; revisits the classics and propagates *labels* rather than feature representations

Graph networks have recently attracted considerable interest, and in particular in the context of semi-supervised learning. These methods typically work by generating node representations that are propagated throughout a given weighted graph. Here we argue that for semi-supervised learning, it is more natural to consider propagating labels in the graph instead. Towards this end, we propose a differentiable neural version of the classic Label Propagation (LP) algorithm. This formulation can be used for learning edge weights, unlike other methods where weights are set heuristically. Starting from a layer implementing a single iteration of LP, we proceed by adding several important non-linear steps that significantly enhance the label-propagating mechanism. Experiments in two distinct settings demonstrate the utility of our approach.

##### Few-shot Classification on Graphs with Structural Regularized GCNs

No tl;dr =[

We consider the fundamental problem of semi-supervised node classification in attributed graphs with a focus on \emph{few-shot} learning. Here, we propose Structural Regularized Graph Convolutional Networks (SRGCN), novel neural network architectures extending the well-known GCN structures by stacking transposed convolutional layers for reconstruction of input features. We add a reconstruction error term in the loss function as a regularizer. Unlike standard regularization such as $L_1$ or $L_2$, which controls the model complexity by including a penalty term depends solely on parameters, our regularization function is parameterized by a trainable neural network whose structure depends on the topology of the underlying graph. The new approach effectively addresses the shortcomings of previous graph convolution-based techniques for learning classifiers in the few-shot regime and significantly improves generalization performance over original GCNs when the number of labeled samples is insufficient. Experimental studies on three challenging benchmarks demonstrate that the proposed approach has matched state-of-the-art results and can improve classification accuracies by a notable margin when there are very few examples from each class.

##### There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average

tl;dr Consistency-based models for semi-supervised learning do not converge to a single point but continue to explore a diverse set of plausible solutions on the perimeter of a flat region. Weight averaging helps improve generalization performance.

Presently the most successful approaches to semi-supervised learning are based on consistency regularization, whereby a model is trained to be robust to small perturbations of its inputs and parameters. The consistency loss dramatically improves generalization performance over supervised-only training; however, we show that SGD struggles to converge on the consistency loss and continues to make large steps that lead to changes in predictions on the test data. We show that averaging weights can significantly improve their generalization performance. Motivated by these observations, we propose to train consistency-based methods with Stochastic Weight Averaging (SWA), a recent approach which averages weights along the trajectory of SGD with a modified learning rate schedule. We also propose fast-SWA, which further accelerates convergence by averaging multiple points within each cycle of a cyclical learning rate schedule. With weight averaging, we achieve the best known semi-supervised results on CIFAR-10 and CIFAR-100 over many different settings of training labels. For example, we achieve 5.0% error on CIFAR-10 with only 4000 labels, compared to the previous best result in the literature of 6.3%.

##### CGNF: Conditional Graph Neural Fields

No tl;dr =[

Graph convolutional networks have achieved tremendous success in the tasks of graph node classification. These models could learn a better node representation through encoding the graph structure and node features. However, the correlation between the node labels are not considered. In this paper, we propose a novel architecture for graph node classification, named conditional graph neural fields (CGNF). By integrating the conditional random fields (CRF) in the graph convolutional networks, we explicitly model a joint probability of the entire set of node labels, thus taking advantage of neighborhood label information in the node label prediction task. Our model could have both the representation capacity of graph neural networks and the prediction power of CRFs. Experiments on several graph datasets demonstrate effectiveness of CGNF.