Research papers and code for "Yongxin Yang":
Most contemporary multi-task learning methods assume linear models. This setting is considered shallow in the era of deep learning. In this paper, we present a new deep multi-task representation learning framework that learns cross-task sharing structure at every layer in a deep network. Our approach is based on generalising the matrix factorisation techniques explicitly or implicitly used by many conventional MTL algorithms to tensor factorisation, to realise automatic learning of end-to-end knowledge sharing in deep networks. This is in contrast to existing deep learning approaches that need a user-defined multi-task sharing strategy. Our approach applies to both homogeneous and heterogeneous MTL. Experiments demonstrate the efficacy of our deep multi-task representation learning in terms of both higher accuracy and fewer design choices.
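
The tensor-factorisation idea above can be illustrated with a minimal sketch (not the paper's exact Tucker/tensor-train construction): each task's layer weights are composed from shared basis factors and task-specific mixing coefficients, so what is shared across tasks at a given layer is learned from data. The CP-style factorisation, rank and layer sizes below are assumptions for the example.

    import torch
    import torch.nn as nn

    class FactorisedMTLLayer(nn.Module):
        """One fully connected layer shared across several tasks.

        Instead of storing a separate weight matrix per task, each task's
        weights are composed from K shared basis matrices and per-task
        mixing coefficients (a CP-style factorisation of the stacked
        weight tensor), so cross-task sharing at this layer is learned."""

        def __init__(self, num_tasks, d_in, d_out, rank):
            super().__init__()
            self.basis = nn.Parameter(torch.randn(rank, d_in, d_out) * 0.01)  # shared factors
            self.coeff = nn.Parameter(torch.randn(num_tasks, rank) * 0.01)    # task-specific mixing
            self.bias = nn.Parameter(torch.zeros(num_tasks, d_out))

        def forward(self, x, task_id):
            # Compose this task's weight matrix from the shared basis.
            w = torch.einsum('k,kio->io', self.coeff[task_id], self.basis)
            return x @ w + self.bias[task_id]

    # Toy usage: two tasks share one hidden layer.
    layer = FactorisedMTLLayer(num_tasks=2, d_in=16, d_out=8, rank=4)
    x = torch.randn(5, 16)
    out_task0 = layer(x, task_id=0)
    out_task1 = layer(x, task_id=1)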

* 9 pages, Accepted to ICLR 2017 Conference Track. This is a conference version of the paper. For the multi-domain learning part (not in this version), please refer to https://arxiv.org/pdf/1605.06391v1.pdf
Most visual recognition methods implicitly assume the data distribution remains unchanged from training to testing. However, in practice domain shift often exists, where real-world factors such as lighting and sensor type change between train and test, and classifiers do not generalise from source to target domains. It is impractical to train separate models for all possible situations because collecting and labelling the data is expensive. Domain adaptation algorithms aim to ameliorate domain shift, allowing a model trained on a source to perform well on a different target domain. However, even in the setting of unsupervised domain adaptation, where the target domain is unlabelled, collecting data for every possible target domain is still costly. In this paper, we propose a new domain adaptation method that needs access to neither the data nor the labels of the target domain, provided the target domain can be described by a parametrised vector and several related source domains exist within the same parametric space. This greatly reduces the burden of data collection and annotation, and our experiments show promising results.

* Accepted to BMVC 2015 Workshop on Differential Geometry in Computer Vision (DIFF-CV)
We propose a framework for training multiple neural networks simultaneously. The parameters from all models are regularised by the tensor trace norm, so that each neural network is encouraged to reuse others' parameters if possible -- this is the main motivation behind multi-task learning. In contrast to many deep multi-task learning models, we do not predefine a parameter sharing strategy by specifying which layers have tied parameters. Instead, our framework considers sharing for all shareable layers, and the sharing strategy is learned in a data-driven way.
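
As a minimal sketch of the trace-norm idea (the exact tensor unfoldings and weighting used in the paper are not reproduced, and all names below are illustrative): the corresponding layers of several networks can be stacked and penalised by the sum of singular values of the stacked matrix, which encourages low-rank, i.e. shared, structure.

    import torch
    import torch.nn as nn

    def trace_norm(matrix):
        # Trace (nuclear) norm = sum of singular values.
        return torch.linalg.svdvals(matrix).sum()

    def stacked_trace_norm(shareable_layers):
        """For each group of corresponding layers across networks, stack the
        flattened weights into a (num_networks x num_params) matrix and
        penalise its trace norm; low rank means parameters are reused."""
        penalty = 0.0
        for weights in shareable_layers:
            stacked = torch.stack([w.flatten() for w in weights])
            penalty = penalty + trace_norm(stacked)
        return penalty

    # Toy usage: two networks with the same architecture, trained jointly.
    nets = [nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2)) for _ in range(2)]
    shareable = [[net[0].weight for net in nets], [net[2].weight for net in nets]]
    task_losses = sum(net(torch.randn(4, 16)).pow(2).mean() for net in nets)  # stand-in losses
    loss = task_losses + 1e-3 * stacked_trace_norm(shareable)
    loss.backward()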

* Submission to Workshop track - ICLR 2017
Multi-domain learning aims to benefit from simultaneously learning across several different but related domains. In this chapter, we propose a single framework that unifies multi-domain learning (MDL) and the related but better-studied area of multi-task learning (MTL). By exploiting the concept of a semantic descriptor, we show how our framework encompasses various classic and recent MDL/MTL algorithms as special cases with different semantic descriptor encodings. As a second contribution, we present a higher-order generalisation of this framework, capable of simultaneous multi-task multi-domain learning. This generalisation has two mathematically equivalent views in multi-linear algebra and gated neural networks, respectively. Moreover, by exploiting the semantic descriptor, it provides neural networks with the capability of zero-shot learning (ZSL), where a classifier is generated for an unseen class without any training data, as well as zero-shot domain adaptation (ZSDA), where a model is generated for an unseen domain without any training data. In practice, this framework provides a powerful yet easy-to-implement method that can be flexibly applied to MTL, MDL, ZSL and ZSDA.

* Invited book chapter
In this paper, we provide a new neural-network-based perspective on multi-task learning (MTL) and multi-domain learning (MDL). By introducing the concept of a semantic descriptor, this framework unifies MDL and MTL as well as encompassing various classic and recent MTL/MDL algorithms, by interpreting them as different ways of constructing semantic descriptors. Our interpretation provides an alternative pipeline for zero-shot learning (ZSL), where a model for a novel class can be constructed without training data. Moreover, it leads to a new and practically relevant problem setting of zero-shot domain adaptation (ZSDA), which is analogous to ZSL but for novel domains: a model for an unseen domain can be generated from its semantic descriptor. Experiments across this range of problems demonstrate that our framework outperforms a variety of alternatives.
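
A minimal sketch of the semantic-descriptor idea (the dimensions, linear weight generator and all names are assumptions, not the paper's exact parameterisation): classifier weights are generated from a descriptor rather than stored per class or domain, so a model for an unseen class (ZSL) or unseen domain (ZSDA) is obtained simply by plugging in its descriptor.

    import torch
    import torch.nn as nn

    class DescriptorConditionedClassifier(nn.Module):
        """Binary classifier whose weight vector is generated from a
        semantic descriptor z (e.g. a task/domain code, attributes or a
        word vector), rather than being stored per class/domain."""

        def __init__(self, descriptor_dim, feature_dim):
            super().__init__()
            # Shared "weight generator": maps a descriptor to classifier weights.
            self.generator = nn.Linear(descriptor_dim, feature_dim + 1)  # +1 for the bias term

        def forward(self, x, z):
            params = self.generator(z)          # (feature_dim + 1,)
            w, b = params[:-1], params[-1]
            return x @ w + b                    # logit for this descriptor's class/domain

    model = DescriptorConditionedClassifier(descriptor_dim=6, feature_dim=32)
    x = torch.randn(10, 32)
    z_seen = torch.randn(6)      # descriptor of a training class/domain
    z_unseen = torch.randn(6)    # descriptor of a class/domain with no training data
    logits_zero_shot = model(x, z_unseen)  # a model is "generated" for the unseen descriptor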

* 9 pages, Accepted to ICLR 2015 Conference Track
In this paper, we propose a gated deep neural network model to predict implied volatility surfaces. Conventional financial conditions and empirical evidence on implied volatility, including no static arbitrage, boundary conditions, asymptotic slope and the volatility smile, are incorporated into the neural network architecture design and calibration. These conditions are also satisfied empirically by S&P 500 option data over a ten-year period. Our proposed model outperforms the widely used surface stochastic volatility inspired model on mean average percentage error in both in-sample and out-of-sample datasets. This study makes a fundamental methodological contribution to the emerging trend of applying state-of-the-art information technology to business studies: our model provides a framework for integrating data-driven machine learning algorithms with financial theories, and this framework can easily be extended and applied to other problems in finance and other business fields.

* 11 pages, working paper
Today's AI still faces two major challenges. One is that in most industries, data exists in the form of isolated islands. The other is the tightening of data privacy and security requirements. We propose a possible solution to these challenges: secure federated learning. Going beyond the federated learning framework first proposed by Google in 2016, we introduce a comprehensive secure federated learning framework that includes horizontal federated learning, vertical federated learning and federated transfer learning. We provide definitions, architectures and applications for the federated learning framework, and give a comprehensive survey of existing works on this subject. In addition, we propose building data networks among organizations based on federated mechanisms as an effective solution to allow knowledge to be shared without compromising user privacy.
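
For the horizontal setting in particular, the basic aggregation mechanism can be sketched as below: clients train local copies on their own data and only model parameters travel to the server, which averages them. This is plain federated averaging without the encryption and secure-aggregation layers the survey covers, and the model, data and hyper-parameters are placeholders.

    import copy
    import torch
    import torch.nn as nn

    def local_update(global_model, data_loader, epochs=1, lr=0.01):
        """One client trains a copy of the global model on its own data;
        raw data never leaves the client, only updated parameters do."""
        model = copy.deepcopy(global_model)
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(epochs):
            for x, y in data_loader:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()
        return model.state_dict()

    def federated_average(client_states):
        """Server aggregates client models by (unweighted) parameter averaging."""
        avg = copy.deepcopy(client_states[0])
        for key in avg:
            for state in client_states[1:]:
                avg[key] = avg[key] + state[key]
            avg[key] = avg[key] / len(client_states)
        return avg

    # Toy round with 3 clients holding different (here random) data.
    global_model = nn.Linear(20, 2)
    clients = [[(torch.randn(8, 20), torch.randint(0, 2, (8,)))] for _ in range(3)]
    states = [local_update(global_model, data) for data in clients]
    global_model.load_state_dict(federated_average(states))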

* ACM Transactions on Intelligent Systems and Technology (TIST) Volume 10 Issue 2, Article No. 12, January 2019
We propose a neural network approach to price European call options that significantly outperforms some existing pricing models and comes with guarantees that its predictions are economically reasonable. To achieve this, we introduce a class of gated neural networks that automatically learn to divide-and-conquer the problem space for robust and accurate pricing. We then derive instantiations of these networks that are 'rational by design' in terms of naturally encoding a valid call option surface that enforces no-arbitrage principles. This integration of human insight within data-driven learning provides significantly better generalisation in pricing performance due to the inductive bias encoded in the learning, guarantees sanity in the model's predictions, and provides econometrically useful byproducts such as the risk-neutral density.
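
One way such rationality constraints can be hard-wired is sketched below; this illustrates the principle rather than the paper's exact gated architecture. With positive weights and convex, decreasing basis functions of strike, the network output is convex and non-increasing in strike by construction, two of the no-arbitrage requirements on a call price surface. All names and sizes are assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MonotoneConvexCallNet(nn.Module):
        """Maps strike K to a price that is convex and non-increasing in K
        by construction: a non-negative combination of convex, decreasing
        basis functions softplus(-b * (K - c)) with b >= 0."""

        def __init__(self, hidden=16):
            super().__init__()
            self.log_w = nn.Parameter(torch.zeros(hidden))            # weights kept positive via exp
            self.log_b = nn.Parameter(torch.zeros(hidden))            # slopes kept positive via exp
            self.c = nn.Parameter(torch.linspace(0.5, 1.5, hidden))   # strike "knots"

        def forward(self, strike):
            # strike: (batch, 1); broadcast against the hidden basis functions.
            z = -torch.exp(self.log_b) * (strike - self.c)            # (batch, hidden)
            return F.softplus(z) @ torch.exp(self.log_w).unsqueeze(1) # (batch, 1)

    model = MonotoneConvexCallNet()
    strikes = torch.linspace(0.5, 1.5, 50).unsqueeze(1)
    prices = model(strikes)   # convex, non-increasing in strike by construction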

* Accepted to AAAI 2017. 7 pages, 5 figures
As an instance-level recognition problem, person re-identification (ReID) relies on discriminative features, which not only capture different spatial scales but also encapsulate an arbitrary combination of multiple scales. We call these features of both homogeneous and heterogeneous scales omni-scale features. In this paper, a novel deep CNN is designed, termed Omni-Scale Network (OSNet), for omni-scale feature learning in ReID. This is achieved by designing a residual block composed of multiple convolutional feature streams, each detecting features at a certain scale. Importantly, a novel unified aggregation gate is introduced to dynamically fuse multi-scale features with input-dependent channel-wise weights. To efficiently learn spatial-channel correlations and avoid overfitting, the building block uses both pointwise and depthwise convolutions. By stacking such blocks layer-by-layer, our OSNet is extremely lightweight and can be trained from scratch on existing ReID benchmarks. Despite its small model size, our OSNet achieves state-of-the-art performance on six person-ReID datasets.
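
The two key ingredients described above can be sketched roughly as follows: parallel convolutional streams of growing receptive field, fused by a single gating sub-network shared by all streams that produces input-dependent channel-wise weights. Stream depths, bottleneck reductions and other details of the published OSNet block are simplified here, and the sizes are illustrative.

    import torch
    import torch.nn as nn

    class OmniScaleBlockSketch(nn.Module):
        """Residual block with parallel streams of increasing receptive field,
        fused by a unified aggregation gate (channel-wise, input-dependent)."""

        def __init__(self, channels, num_streams=4):
            super().__init__()
            # Stream t stacks t+1 pointwise+depthwise convs, so its receptive field grows with t.
            def sep_conv():
                return nn.Sequential(
                    nn.Conv2d(channels, channels, 1, bias=False),
                    nn.Conv2d(channels, channels, 3, padding=1, groups=channels, bias=False),
                    nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
            self.streams = nn.ModuleList(
                [nn.Sequential(*[sep_conv() for _ in range(t + 1)]) for t in range(num_streams)])
            # Unified aggregation gate: one gating sub-network shared by all streams.
            self.gate = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid())

        def forward(self, x):
            fused = 0
            for stream in self.streams:
                feat = stream(x)
                fused = fused + self.gate(feat) * feat   # dynamic channel-wise weighting
            return torch.relu(x + fused)                 # residual connection

    block = OmniScaleBlockSketch(channels=32)
    out = block(torch.randn(2, 32, 16, 16))   # shape preserved: (2, 32, 16, 16)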

* Tech report
Deep neural networks have been proven powerful at processing perceptual data, such as images and audio. However, for tabular data, tree-based models are more popular. A nice property of tree-based models is their natural interpretability. In this work, we present Deep Neural Decision Trees (DNDT) -- tree models realised by neural networks. A DNDT is intrinsically interpretable, as it is a tree. Yet as it is also a neural network (NN), it can be easily implemented in NN toolkits and trained with gradient descent rather than greedy splitting. We evaluate DNDT on several tabular datasets, verify its efficacy, and investigate similarities and differences between DNDT and vanilla decision trees. Interestingly, DNDT self-prunes at both the split and feature level.
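
The central primitive of DNDT is a differentiable "soft binning" of each input feature, realised by a single softmax layer over learned cut points; leaf membership is then the Kronecker (outer) product of the per-feature bin memberships. The sketch below follows that published construction at a high level, while the temperature, cut-point counts and toy usage are assumptions.

    import torch
    import torch.nn as nn

    def soft_bin(x, cut_points, temperature=0.1):
        """Differentiable binning of a scalar feature: with n learnable cut
        points, returns an (almost) one-hot vector over n+1 bins.
        x: (batch, 1); cut_points: (n,)."""
        n = cut_points.shape[0]
        w = torch.arange(1, n + 2, dtype=x.dtype)                      # [1, 2, ..., n+1]
        b = torch.cumsum(torch.cat([torch.zeros(1), -cut_points]), 0)  # [0, -c1, -c1-c2, ...]
        return torch.softmax((x * w + b) / temperature, dim=1)         # (batch, n+1)

    # A two-feature "tree": bin each feature, take the outer (Kronecker) product of
    # the bin memberships to get a leaf distribution, then map leaves to classes.
    cuts_f1 = nn.Parameter(torch.tensor([0.0]))          # 1 cut -> 2 bins
    cuts_f2 = nn.Parameter(torch.tensor([-0.5, 0.5]))    # 2 cuts -> 3 bins
    leaf_to_class = nn.Parameter(torch.randn(2 * 3, 2))  # 6 leaves, 2 classes

    x = torch.randn(8, 2)
    bins1 = soft_bin(x[:, :1], cuts_f1)                  # (8, 2)
    bins2 = soft_bin(x[:, 1:], cuts_f2)                  # (8, 3)
    leaves = torch.einsum('bi,bj->bij', bins1, bins2).reshape(8, -1)  # (8, 6)
    logits = leaves @ leaf_to_class                      # differentiable end-to-end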

* presented at 2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018), Stockholm, Sweden
The well-known domain shift issue causes model performance to degrade when a model is deployed to a new target domain with statistics different from those seen during training. Domain adaptation techniques alleviate this, but need some instances from the target domain to drive adaptation. Domain generalization is the recently topical problem of learning a model that generalizes to unseen domains out of the box, without accessing any target data. Various domain generalization approaches aim to train a domain-invariant feature extractor, typically by adding some manually designed losses. In this work, we propose a learning-to-learn approach, in which the auxiliary loss that helps generalization is itself learned. This approach is conceptually simple and flexible, and leads to considerable improvement in robustness to domain shift. Beyond conventional domain generalization, we consider a more challenging setting of heterogeneous domain generalization, where the unseen domains do not share a label space with the seen ones, and the goal is to train a feature extractor that is useful off-the-shelf for novel data and novel categories. Experimental evaluation demonstrates that our method outperforms state-of-the-art solutions in both settings.

In this paper, we present a unified approach to transfer learning that addresses several source and target domain label-space and annotation assumptions with a single model. It is particularly effective in handling the challenging case where source and target label-spaces are disjoint, and outperforms alternatives in both unsupervised and semi-supervised settings. The key ingredient is a common representation termed the Common Factorised Space. It is shared between source and target domains, and trained with an unsupervised factorisation loss and a graph-based loss. With a wide range of experiments, we demonstrate the flexibility, relevance and efficacy of our method, both in the challenging cases with disjoint label spaces, and in the more conventional cases such as unsupervised domain adaptation, where the source and target domains share the same label sets.

* AAAI-19
We propose to model complex visual scenes using a non-parametric Bayesian model learned from weakly labelled images abundant on media sharing sites such as Flickr. Given weak image-level annotations of objects and attributes without locations or associations between them, our model aims to learn the appearance of object and attribute classes as well as their association on each object instance. Once learned, given an image, our model can be deployed to tackle a number of vision problems in a joint and coherent manner, including recognising objects in the scene (automatic object annotation), describing objects using their attributes (attribute prediction and association), and localising and delineating the objects (object detection and semantic segmentation). This is achieved by developing a novel Weakly Supervised Markov Random Field Stacked Indian Buffet Process (WS-MRF-SIBP) that models objects and attributes as latent factors and explicitly captures their correlations within and across superpixels. Extensive experiments on benchmark datasets demonstrate that our weakly supervised model significantly outperforms weakly supervised alternatives and is often comparable with existing strongly supervised models on a variety of tasks including semantic segmentation, automatic image annotation and retrieval based on object-attribute associations.

* Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence
When humans describe images they tend to use combinations of nouns and adjectives, corresponding to objects and their associated attributes respectively. To generate such a description automatically, one needs to model objects, attributes and their associations. Conventional methods require strong annotation of object and attribute locations, making them less scalable. In this paper, we model object-attribute associations from weakly labelled images, such as those widely available on media sharing sites (e.g. Flickr), where only image-level labels (objects or attributes) are given, without their locations and associations. This is achieved by introducing a novel weakly supervised non-parametric Bayesian model. Once learned, given a new image, our model can describe the image, including objects, attributes and their associations, as well as their locations and segmentation. Extensive experiments on benchmark datasets demonstrate that our weakly supervised model performs on par with strongly supervised models on tasks such as image description and retrieval based on object-attribute associations.

* 14 pages, Accepted to ECCV 2014
Domain shift refers to the well-known problem that a model trained in one source domain performs poorly when applied to a target domain with different statistics. Domain Generalization (DG) techniques attempt to alleviate this issue by producing models which by design generalize well to novel testing domains. We propose a novel meta-learning method for domain generalization. Rather than designing a specific model that is robust to domain shift as in most previous DG work, we propose a model-agnostic training procedure for DG. Our algorithm simulates train/test domain shift during training by synthesizing virtual testing domains within each mini-batch. The meta-optimization objective requires that steps to improve training-domain performance should also improve testing-domain performance. This meta-learning procedure trains models with good generalization ability to novel domains. We evaluate our method and achieve state-of-the-art results on a recent cross-domain image classification benchmark, as well as demonstrating its potential on two classic reinforcement learning tasks.
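
One meta-update of this procedure can be sketched as follows: the available source domains are split into meta-train and meta-test, a virtual gradient step is taken on meta-train, and the outer objective asks that this step also reduce the meta-test loss. A tiny linear model is used here so the virtual parameter update stays explicit; the data, split and hyper-parameters are illustrative rather than those of the paper.

    import torch
    import torch.nn.functional as F

    # Toy model: linear classifier, kept functional so we can evaluate it
    # at "virtually updated" parameters without touching the real ones.
    W = torch.randn(16, 3, requires_grad=True)
    b = torch.zeros(3, requires_grad=True)

    def loss_on(domain, W, b):
        x, y = domain
        return F.cross_entropy(x @ W + b, y)

    domains = [(torch.randn(32, 16), torch.randint(0, 3, (32,))) for _ in range(3)]
    inner_lr, outer_lr, beta = 0.1, 0.01, 1.0

    for step in range(100):
        # Simulated domain shift (in practice the split is resampled each step).
        meta_train, meta_test = domains[:2], domains[2]
        train_loss = sum(loss_on(d, W, b) for d in meta_train) / 2

        # Virtual inner step on the meta-train domains (keep the graph for 2nd-order grads).
        gW, gb = torch.autograd.grad(train_loss, [W, b], create_graph=True)
        test_loss = loss_on(meta_test, W - inner_lr * gW, b - inner_lr * gb)

        # Meta-objective: improving meta-train should also improve the held-out domain.
        meta_loss = train_loss + beta * test_loss
        gW, gb = torch.autograd.grad(meta_loss, [W, b])
        with torch.no_grad():
            W -= outer_lr * gW
            b -= outer_lr * gb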

* 8 pages, 2 figures, under review at AAAI 2018
The problem of domain generalization is to learn from multiple training domains, and extract a domain-agnostic model that can then be applied to an unseen domain. Domain generalization (DG) has a clear motivation in contexts where there are target domains with distinct characteristics, yet sparse data for training; for example, recognition in sketch images, which are distinctly more abstract and rarer than photos. Nevertheless, DG methods have primarily been evaluated on photo-only benchmarks focused on alleviating dataset bias, where both problems, domain distinctiveness and data sparsity, can be minimal. We argue that these benchmarks are overly straightforward, and show that simple deep learning baselines perform surprisingly well on them. In this paper, we make two main contributions: Firstly, we build upon the favorable domain-shift-robust properties of deep learning methods, and develop a low-rank parameterized CNN model for end-to-end DG learning. Secondly, we develop a DG benchmark dataset covering photo, sketch, cartoon and painting domains. This is both more practically relevant, and harder (bigger domain shift) than existing benchmarks. The results show that our method outperforms existing DG alternatives, and our dataset provides a more significant DG challenge to drive future research.

* 9 pages, 4 figures, ICCV 2017
Deep convolutional neural networks have recently proven extremely effective for difficult face recognition problems in uncontrolled settings. To train such networks, very large training sets are needed, with millions of labeled images. For some applications, such as near-infrared (NIR) face recognition, such large training datasets are not publicly available and are difficult to collect. In this work, we propose a method to generate very large training datasets of synthetic images by compositing real face images in a given dataset. We show that this method enables learning models from as few as 10,000 training images that perform on par with models trained from 500,000 images. Using our approach we also obtain state-of-the-art results on the CASIA NIR-VIS2.0 heterogeneous face recognition dataset.

* IEEE TIP
We propose a novel and flexible approach to meta-learning for learning-to-learn from only a few examples. Our framework is motivated by actor-critic reinforcement learning, but can be applied to both reinforcement and supervised learning. The key idea is to learn a meta-critic: an action-value function neural network that learns to criticise any actor trying to solve any specified task. For supervised learning, this corresponds to the novel idea of a trainable task-parametrised loss generator. This meta-critic approach provides a route to knowledge transfer that can flexibly deal with few-shot and semi-supervised conditions for both reinforcement and supervised learning. Promising results are shown on both reinforcement and supervised learning problems.

* Technical report, 12 pages, 3 figures, 2 tables
Zero-shot learning has received increasing interest as a means to alleviate the often prohibitive expense of annotating training data for large scale recognition problems. These methods have achieved great success via learning intermediate semantic representations in the form of attributes and more recently, semantic word vectors. However, they have thus far been constrained to the single-label case, in contrast to the growing popularity and importance of more realistic multi-label data. In this paper, for the first time, we investigate and formalise a general framework for multi-label zero-shot learning, addressing the unique challenge therein: how to exploit multi-label correlation at test time with no training data for those classes? In particular, we propose (1) a multi-output deep regression model to project an image into a semantic word space, which explicitly exploits the correlations in the intermediate semantic layer of word vectors; (2) a novel zero-shot learning algorithm for multi-label data that exploits the unique compositionality property of semantic word vector representations; and (3) a transductive learning strategy to enable the regression model learned from seen classes to generalise well to unseen classes. Our zero-shot learning experiments on a number of standard multi-label datasets demonstrate that our method outperforms a variety of baselines.
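
The compositionality property exploited in contribution (2) can be illustrated as follows: if a label set is represented by the sum of its members' word vectors, multi-label zero-shot prediction amounts to finding the label subset whose summed vector best matches the vector regressed from the image. The greedy search, random "word vectors" and label names below are purely illustrative, not the paper's algorithm.

    import numpy as np

    rng = np.random.default_rng(0)
    labels = ["dog", "ball", "grass", "car", "snow"]           # may include unseen labels
    word_vec = {w: rng.normal(size=50) for w in labels}        # stand-in word vectors

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    def predict_label_set(image_vec, max_labels=3):
        """Greedily grow a label set whose summed word vectors best match the
        vector regressed from the image (composition = vector addition)."""
        chosen, current = [], np.zeros_like(image_vec)
        for _ in range(max_labels):
            scored = [(cosine(current + word_vec[w], image_vec), w)
                      for w in labels if w not in chosen]
            best_score, best_word = max(scored)
            if chosen and best_score <= cosine(current, image_vec):
                break                                          # adding more labels stops helping
            chosen.append(best_word)
            current = current + word_vec[best_word]
        return chosen

    # Pretend the deep regression model mapped an image near "dog" + "grass".
    image_vec = word_vec["dog"] + word_vec["grass"] + rng.normal(scale=0.1, size=50)
    print(predict_label_set(image_vec))                        # likely ['dog', 'grass']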

* 12 pages, 6 figures, Accepted to BMVC 2014 (oral)
Few-shot deep learning is a topical challenge area for scaling visual recognition to open-ended growth in the space of categories to recognise. A promising line of work towards realising this vision is deep networks that learn to match queries with stored training images. However, methods in this paradigm usually train a deep embedding followed by a single linear classifier. Our insight is that effective general-purpose matching requires discrimination with regard to features at multiple abstraction levels. We therefore propose a new framework termed Deep Comparison Network (DCN) that decomposes embedding learning into a sequence of modules, and pairs each with a relation module. The relation modules compute a non-linear metric to score the match using the corresponding embedding module's representation. To ensure that all embedding modules' features are used, the relation modules are deeply supervised. Finally, generalisation is further improved by a learned noise regulariser. The resulting network achieves state-of-the-art performance on both miniImageNet and tieredImageNet, while retaining the appealing simplicity and efficiency of deep metric learning approaches.
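
A rough sketch of the decomposition described above (layer sizes, pooling and the single query/support pair are simplifications, and the noise regulariser is omitted): the embedding is split into a sequence of modules, each paired with its own relation module that scores the query/support match from that stage's features, and every stage's score receives a loss, i.e. deep supervision.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def conv_block(c_in, c_out):
        return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                             nn.BatchNorm2d(c_out), nn.ReLU(), nn.MaxPool2d(2))

    class DeepComparisonSketch(nn.Module):
        """Embedding decomposed into stages, each with its own relation module."""

        def __init__(self, channels=(3, 32, 64, 64)):
            super().__init__()
            self.stages = nn.ModuleList([conv_block(channels[i], channels[i + 1])
                                         for i in range(len(channels) - 1)])
            # One relation module per stage: scores a (query, support) pair from that stage.
            self.relations = nn.ModuleList([nn.Sequential(nn.Linear(2 * c, 32), nn.ReLU(),
                                                          nn.Linear(32, 1))
                                            for c in channels[1:]])

        def forward(self, query, support):
            scores = []
            q, s = query, support
            for stage, relation in zip(self.stages, self.relations):
                q, s = stage(q), stage(s)
                pooled = torch.cat([q.mean(dim=(2, 3)), s.mean(dim=(2, 3))], dim=1)
                scores.append(relation(pooled))      # one match score per stage
            return scores                            # each score gets its own loss (deep supervision)

    model = DeepComparisonSketch()
    scores = model(torch.randn(4, 3, 32, 32), torch.randn(4, 3, 32, 32))
    target = torch.ones(4, 1)                        # 1 = "same class" for this toy pair
    loss = sum(F.binary_cross_entropy_with_logits(s, target) for s in scores)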

* 10 pages, 5 figures