Research papers and code for "Kevin Yu":
Overcoming the visual barrier and developing "see-through vision" has been one of mankind's long-standing dreams. However, visible light cannot travel through opaque obstructions (e.g., walls). Unlike visible light, Radio Frequency (RF) signals penetrate many common building materials and reflect strongly off humans. This project creates a breakthrough artificial intelligence methodology by which the skeletal structure of a human can be reconstructed with RF even through visual occlusion. In a novel procedural flow, video and RF data are first collected simultaneously using a co-located setup containing an RGB camera and an RF antenna array transceiver. Next, the RGB video is processed with a Part Affinity Field computer-vision model to generate ground-truth label locations for each keypoint in the human skeleton. Then, a collective deep-learning model consisting of a Residual Convolutional Neural Network, a Region Proposal Network, and a Recurrent Neural Network 1) extracts spatial features from RF images, 2) detects and crops out all people present in the scene, and 3) aggregates information over dozens of time-steps to piece together the various limbs that reflect signals back to the receiver at different times. A simulator is created to demonstrate the system. This project has impactful applications in medicine, the military, search & rescue, and robotics. During a fire emergency in particular, neither visible light nor infrared thermal imaging can penetrate smoke or flames, but RF can. With over 1 million fires reported in the US per year, this technology could save thousands of lives and prevent tens of thousands of injuries.
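A minimal PyTorch-style sketch of the temporal-aggregation step described above: per-frame convolutional features are fed to a recurrent network that fuses dozens of RF time-steps into skeletal keypoints. The layer sizes, the 14-keypoint head, and the module names are illustrative assumptions, not the project's actual architecture, and the residual and region-proposal components are omitted.

# Illustrative sketch only: aggregates per-frame RF features over time with a GRU.
# Shapes, layer sizes and the 14-keypoint head are assumptions, not the real system.
import torch
import torch.nn as nn

class RFPoseSketch(nn.Module):
    def __init__(self, num_keypoints=14, feat_dim=128):
        super().__init__()
        # Convolutional feature extractor for one RF "image" (frame).
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # Recurrent aggregation over dozens of time-steps.
        self.rnn = nn.GRU(feat_dim, feat_dim, batch_first=True)
        # Regress (x, y) for each skeletal keypoint from the final hidden state.
        self.head = nn.Linear(feat_dim, num_keypoints * 2)

    def forward(self, rf_frames):              # rf_frames: (B, T, 1, H, W)
        b, t = rf_frames.shape[:2]
        feats = self.backbone(rf_frames.flatten(0, 1)).view(b, t, -1)
        _, hidden = self.rnn(feats)            # hidden: (1, B, feat_dim)
        return self.head(hidden[-1]).view(b, -1, 2)

keypoints = RFPoseSketch()(torch.randn(2, 30, 1, 64, 64))   # shape (2, 14, 2)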

We study the problem of planning a tour for an energy-limited Unmanned Aerial Vehicle (UAV) to visit a set of sites in the least amount of time. We envision scenarios where the UAV can be recharged along the way, either by landing on stationary recharging stations or on Unmanned Ground Vehicles (UGVs) acting as mobile recharging stations. This leads to a new variant of the Traveling Salesperson Problem (TSP) with mobile recharging stations. We present an algorithm that finds not only the order in which to visit the sites but also when and where to land on the charging stations to recharge. Our algorithm also plans tours for the UGVs and determines the best locations to place stationary charging stations. While the problems we study are NP-hard, we present a practical solution using the Generalized TSP that finds the optimal solution. If the UGVs are slower than the UAV, the algorithm also finds the minimum number of UGVs required to support the UAV mission such that the UAV never has to wait for a UGV. Our simulation results show that the running time is acceptable for reasonably sized instances in practice.
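The following toy sketch illustrates the flavor of the problem only, not the paper's Generalized TSP algorithm: it brute-forces the visiting order for a handful of sites and inserts a detour to a single stationary charging station whenever the next leg would exceed the remaining battery range. The coordinates and the range value are made-up assumptions.

# Toy illustration of touring with recharging; NOT the paper's GTSP-based algorithm.
from itertools import permutations
from math import dist

sites = [(2, 1), (5, 4), (1, 6), (7, 2)]
station = (4, 3)            # one stationary recharging station (assumption)
battery_range = 6.0         # distance flyable on a full charge (assumption)

def tour_length(order):
    pos, charge, total = (0, 0), battery_range, 0.0
    for s in order:
        leg = dist(pos, s)
        if leg > charge:                       # detour to recharge first
            total += dist(pos, station)
            pos, charge = station, battery_range
            leg = dist(pos, s)
        if leg > charge:
            return float("inf")                # site unreachable even when fully charged
        total, pos, charge = total + leg, s, charge - leg
    return total

best = min(permutations(sites), key=tour_length)
print(best, round(tour_length(best), 2))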

* 7 pages, 14 figures, ICRA2018 under review
Visual inspection of x-ray scattering images is a powerful technique for probing the physical structure of materials at the molecular scale. In this paper, we explore the use of deep learning to develop methods for automatically analyzing x-ray scattering images. In particular, we apply Convolutional Neural Networks and Convolutional Autoencoders for x-ray scattering image classification. To acquire enough training data for deep learning, we use simulation software to generate synthetic x-ray scattering images. Experiments show that deep learning methods outperform previously published methods by 10% on synthetic and real datasets.
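As an illustration of the convolutional-autoencoder component mentioned above, here is a minimal sketch that reconstructs scattering images with an encoder/decoder pair; the specific architecture is an assumption, not the one used in the paper.

# Minimal convolutional autoencoder as an unsupervised feature learner; architecture is illustrative.
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),   # 16 -> 32
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(), # 32 -> 64
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = ConvAutoencoder()
x = torch.rand(8, 1, 64, 64)                 # batch of synthetic scattering images
loss = nn.functional.mse_loss(model(x), x)   # reconstruction objective
loss.backward()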

In recent years, more and more machine learning algorithms have been applied to odor recognition. These odor recognition algorithms usually assume that the training dataset is static. However, for some odor recognition tasks, the odor dataset grows dynamically: not only the training samples but also the number of classes increases over time. Motivated by this concern, we propose a deep nearest class mean (DNCM) model which combines the deep learning framework with the nearest class mean (NCM) method. DNCM not only leverages a deep neural network to extract deep features, but is also well suited to integrating new classes. Experiments demonstrate that the proposed DNCM model is effective and efficient for incremental odor classification, especially for new classes with only a small number of training examples.
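A small sketch of the nearest-class-mean idea that makes new classes cheap to add: each class is represented by the mean of its deep feature vectors, and prediction picks the closest mean. The deep feature extractor itself is omitted and assumed to come from a pretrained network; all names and dimensions are illustrative.

# Nearest-class-mean classification over (assumed) deep feature vectors.
import numpy as np

class NearestClassMean:
    def __init__(self):
        self.means = {}                        # class label -> mean deep-feature vector

    def add_class(self, label, features):      # features: (n_samples, feat_dim)
        self.means[label] = features.mean(axis=0)

    def predict(self, features):               # features: (n_samples, feat_dim)
        labels = list(self.means)
        centers = np.stack([self.means[c] for c in labels])            # (n_classes, d)
        d = np.linalg.norm(features[:, None, :] - centers[None], axis=-1)
        return [labels[i] for i in d.argmin(axis=1)]

clf = NearestClassMean()
clf.add_class("coffee", np.random.randn(20, 64))
clf.add_class("mint", np.random.randn(20, 64))
clf.add_class("citrus", np.random.randn(5, 64))    # new class with few examples
print(clf.predict(np.random.randn(3, 64)))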

* 15 pages, 6 figures
The analysis of the collaborative learning process is one of the growing fields of education research, with many different analytic solutions. In this paper, we provide a new solution to improve automated collaborative learning analyses using deep neural networks. Instead of using self-reported questionnaires, which are subject to bias and noise, we automatically extract group-work information from object recognition results produced by the Mask R-CNN method. This process is based on detecting the people and other objects in pictures and video clips of the collaborative learning process, and then evaluating the mobile learning performance using collaborative indicators. We tested our approach by automatically evaluating group-work collaboration in a controlled study of thirty-three dyads performing an anatomy body-painting intervention. The results indicate that our approach recognizes the differences in collaboration between teams in the treatment and control groups in the case study. This work introduces new methods for automated quality prediction of collaboration in human-human interactions using computer vision techniques.
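A sketch of the person-detection step using an off-the-shelf Mask R-CNN from torchvision; this particular implementation is an assumption, since the paper only states that the Mask R-CNN method is used, and the downstream collaboration indicators are not reproduced here.

# Detect people in a frame with a pretrained Mask R-CNN (torchvision >= 0.13).
import torch
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT").eval()

def detect_people(frame, score_threshold=0.7):
    """frame: float tensor (3, H, W) in [0, 1]; returns boxes of detected persons."""
    with torch.no_grad():
        out = model([frame])[0]
    keep = (out["labels"] == 1) & (out["scores"] > score_threshold)   # COCO class 1 = person
    return out["boxes"][keep]

boxes = detect_people(torch.rand(3, 480, 640))
print(boxes.shape)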

* 6 pages, 4 Figures and a Table
Predicting an odor's pleasantness simplifies the evaluation of odors and has potential applications in the perfume and environmental monitoring industries. Classical algorithms for predicting an odor's pleasantness generally use a manually designed feature extractor and an independent classifier. Manually designing a good feature extractor depends on expert knowledge and experience, and is the key to the accuracy of these algorithms. To circumvent this difficulty, we propose a model for predicting an odor's pleasantness using a convolutional neural network. In our model, convolutional layers replace the manual feature extractor and show better performance. The experiments show that the correlation between our model's predictions and human pleasantness ratings is over 90%, and our model achieves 99.9% accuracy in distinguishing between absolutely pleasant and unpleasant odors.

Infrared (IR) images are essential to improve the visibility of dark or camouflaged objects. Object recognition and segmentation based on a neural network using IR images provide more accuracy and insight than color visible images. The bottleneck, however, is the amount of relevant IR images available for training. It is difficult to collect real-world IR images for special purposes, including space exploration, military, and fire-fighting applications. To solve this problem, we created color visible and IR images using a Unity-based 3D game editor. These synthetically generated color visible and IR images were used to train cycle-consistent adversarial networks (CycleGAN) to convert visible images to IR images. CycleGAN has the advantage that it does not require precisely matched visible/IR pairs for transformation training. In this study, we discovered that additional synthetic data can help improve CycleGAN performance. Neural network training using real data (N = 20) performed more accurate transformations than training using a combination of real (N = 10) and synthetic (N = 10) data. This result indicates that the synthetic data cannot exceed the quality of the real data. Training using a combination of real (N = 10) and synthetic (N = 100) data showed almost the same performance as training using real data (N = 20); at least 10 times more synthetic data than real data is required to achieve the same performance. In summary, CycleGAN used with synthetic data improves the visible-to-IR image conversion performance.
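For reference, a minimal sketch of the cycle-consistency loss that lets CycleGAN learn from unpaired visible/IR images: translating visible to IR and back (and vice versa) should recover the input. The generator functions and the weighting factor here are placeholders and assumptions, not the paper's training code.

# Cycle-consistency term for unpaired visible <-> IR translation (illustrative only).
import torch
import torch.nn.functional as F

def cycle_consistency_loss(G_vis2ir, G_ir2vis, visible, infrared, lam=10.0):
    fake_ir = G_vis2ir(visible)
    fake_vis = G_ir2vis(infrared)
    rec_vis = G_ir2vis(fake_ir)          # visible -> IR -> visible
    rec_ir = G_vis2ir(fake_vis)          # IR -> visible -> IR
    return lam * (F.l1_loss(rec_vis, visible) + F.l1_loss(rec_ir, infrared))

# With identity "generators" the reconstruction is exact, so the loss is zero.
print(cycle_consistency_loss(lambda t: t, lambda t: t,
                             torch.rand(1, 3, 8, 8), torch.rand(1, 3, 8, 8)))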

* 8 pages, 6 figures, SPIE
We present a neural network model - based on CNNs, RNNs and a novel attention mechanism - which achieves 84.2% accuracy on the challenging French Street Name Signs (FSNS) dataset, significantly outperforming the previous state of the art (Smith'16), which achieved 72.46%. Furthermore, our new method is much simpler and more general than the previous approach. To demonstrate the generality of our model, we show that it also performs well on an even more challenging dataset derived from Google Street View, in which the goal is to extract business names from store fronts. Finally, we study the speed/accuracy tradeoff that results from using CNN feature extractors of different depths. Surprisingly, we find that deeper is not always better (in terms of accuracy, as well as speed). Our resulting model is simple, accurate and fast, allowing it to be used at scale on a variety of challenging real-world text extraction problems.

* Updated references, added link to the source code
In this paper, we present a simple yet effective padding scheme that can be used as a drop-in module for existing convolutional neural networks. We call it partial convolution based padding, with the intuition that the padded region can be treated as holes and the original input as non-holes. Specifically, during the convolution operation, the convolution results are re-weighted near image borders based on the ratios between the padded area and the convolution sliding window area. Extensive experiments with various deep network models on ImageNet classification and semantic segmentation demonstrate that the proposed padding scheme consistently outperforms standard zero padding with better accuracy.
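A short sketch of the re-weighting idea described above: treat the zero-padded border as holes, count the real (non-padded) pixels under each sliding window, and scale the convolution output by the window area divided by that count. This is only an illustration of the scheme, not the NVIDIA reference implementation linked in the note below.

# Partial-convolution-style padding: re-weight outputs near borders by
# (window area) / (number of real pixels under the window).
import torch
import torch.nn.functional as F

def partial_conv2d_padding(x, weight, bias=None, padding=1):
    ones_kernel = torch.ones(1, 1, *weight.shape[2:])
    mask = torch.ones(x.shape[0], 1, *x.shape[2:])
    valid = F.conv2d(mask, ones_kernel, padding=padding)      # real pixels per window
    ratio = ones_kernel.numel() / valid.clamp(min=1.0)        # > 1 only near borders
    out = F.conv2d(x, weight, bias=None, padding=padding) * ratio
    if bias is not None:
        out = out + bias.view(1, -1, 1, 1)
    return out

x = torch.randn(2, 3, 32, 32)
w = torch.randn(8, 3, 3, 3)
print(partial_conv2d_padding(x, w).shape)        # torch.Size([2, 8, 32, 32])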

* 11 pages; code is available at https://github.com/NVIDIA/partialconv
Fluoroscopic X-ray guidance is a cornerstone for percutaneous orthopaedic surgical procedures. However, two-dimensional observations of the three-dimensional anatomy suffer from the effects of projective simplification. Consequently, many X-ray images from various orientations need to be acquired for the surgeon to accurately assess the spatial relations between the patient's anatomy and the surgical tools. In this paper, we present an on-the-fly surgical support system that provides guidance using augmented reality and can be used in quasi-unprepared operating rooms. The proposed system builds upon a multi-modality marker and simultaneous localization and mapping technique to co-calibrate an optical see-through head mounted display to a C-arm fluoroscopy system. Then, annotations on the 2D X-ray images can be rendered as virtual objects in 3D providing surgical guidance. We quantitatively evaluate the components of the proposed system, and finally, design a feasibility study on a semi-anthropomorphic phantom. The accuracy of our system was comparable to the traditional image-guided technique while substantially reducing the number of acquired X-ray images as well as procedure time. Our promising results encourage further research on the interaction between virtual and real objects, that we believe will directly benefit the proposed method. Further, we would like to explore the capabilities of our on-the-fly augmented reality support system in a larger study directed towards common orthopaedic interventions.

* J. Med. Imag. 5(2), 2018
* S. Andress, A. Johnson, M. Unberath, and A. Winkler have contributed equally and are listed in alphabetical order
We report the discussion session at the sixth international Genetic Improvement workshop, GI-2019 @ ICSE, which was held as part of the 41st ACM/IEEE International Conference on Software Engineering on Tuesday 28th May 2019. Topics included GI representations, the maintainability of evolved code, automated software testing, future areas of GI research, such as co-evolution, and existing GI tools and benchmarks.

* University College London, Computer Science
While recommendation systems generally observe user behavior passively, there has been an increased interest in directly querying users to learn their specific preferences. In such settings, considering queries at different levels of granularity to optimize user information acquisition is crucial to efficiently providing a good user experience. In this work, we study the active learning problem with multi-level user preferences within the collective matrix factorization (CMF) framework. CMF jointly captures multi-level user preferences with respect to items and relations between items (e.g., book genre, cuisine type), generally resulting in improved predictions. Motivated by finite-sample analysis of the CMF model, we propose a theoretically optimal active learning strategy based on the Fisher information matrix and use this to derive a realizable approximation algorithm for practical recommendations. Experiments are conducted using both the Yelp dataset directly and an illustrative synthetic dataset in the three settings of personalized active learning, cold-start recommendations, and noisy data -- demonstrating strong improvements over several widely used active learning methods.

We consider the problem of automatically generating textual paraphrases with modified attributes or stylistic properties, focusing on the setting without parallel data (Hu et al., 2017; Shen et al., 2017). This setting poses challenges for learning and evaluation. We show that the metric of post-transfer classification accuracy is insufficient on its own, and propose additional metrics based on semantic content preservation and fluency. For reliable evaluation, all three metric categories must be taken into account. We contribute new loss functions and training strategies to address the new metrics. Semantic preservation is addressed by adding a cyclic consistency loss and a loss based on paraphrase pairs, while fluency is improved by integrating losses based on style-specific language models. Automatic and manual evaluation show large improvements over the baseline method of Shen et al. (2017). Our hope is that these losses and metrics can be general and useful tools for a range of textual transfer settings without parallel corpora.

It is challenging to translate names and technical terms across languages with different alphabets and sound inventories. These items are commonly transliterated, i.e., replaced with approximate phonetic equivalents. For example, "computer" in English comes out as "konpyuutaa" in Japanese. Translating such items from Japanese back to English is even more challenging, and of practical interest, as transliterated items make up the bulk of text phrases not found in bilingual dictionaries. We describe and evaluate a method for performing backwards transliterations by machine. This method uses a generative model, incorporating several distinct stages in the transliteration process.

* 8 pages, postscript, to appear, ACL-97/EACL-97
We consider the problem of learning deep generative models from data. We formulate a method that generates an independent sample via a single feedforward pass through a multilayer perceptron, as in the recently proposed generative adversarial networks (Goodfellow et al., 2014). Training a generative adversarial network, however, requires careful optimization of a difficult minimax program. Instead, we utilize a technique from statistical hypothesis testing known as maximum mean discrepancy (MMD), which leads to a simple objective that can be interpreted as matching all orders of statistics between a dataset and samples from the model, and can be trained by backpropagation. We further boost the performance of this approach by combining our generative network with an auto-encoder network, using MMD to learn to generate codes that can then be decoded to produce samples. We show that the combination of these techniques yields excellent generative models compared to baseline approaches as measured on MNIST and the Toronto Face Database.
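A minimal sketch of the MMD objective with a Gaussian kernel: a biased estimate of the squared discrepancy between data samples and generated samples that can be minimized by backpropagation. The kernel bandwidths below are illustrative assumptions.

# Maximum mean discrepancy (biased estimate) with a sum of Gaussian kernels.
import torch

def gaussian_kernel(a, b, bandwidths=(1.0, 5.0, 10.0)):
    d2 = torch.cdist(a, b).pow(2)
    return sum(torch.exp(-d2 / (2 * s ** 2)) for s in bandwidths)

def mmd2(x, y):
    """Biased estimate of squared MMD between sample sets x and y (rows are samples)."""
    return (gaussian_kernel(x, x).mean()
            - 2 * gaussian_kernel(x, y).mean()
            + gaussian_kernel(y, y).mean())

data = torch.randn(64, 2)
generated = torch.randn(64, 2, requires_grad=True)
loss = mmd2(data, generated)
loss.backward()      # gradients flow into the generated samples, and hence a generator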

A key element in transfer learning is representation learning; if representations can be developed that expose the relevant factors underlying the data, then new tasks and domains can be learned readily based on mappings of these salient factors. We propose that an important aim for these representations is to be unbiased. Different forms of representation learning can be derived from alternative definitions of unwanted bias, e.g., bias to particular tasks, domains, or irrelevant underlying data dimensions. One very useful approach to estimating the amount of bias in a representation comes from maximum mean discrepancy (MMD) [5], a measure of distance between probability distributions. We are not the first to suggest that MMD can be a useful criterion in developing representations that apply across multiple domains or tasks [1]. However, in this paper we describe a number of novel applications of this criterion that we have devised, all based on the idea of developing unbiased representations. These formulations include: a standard domain adaptation framework; a method of learning invariant representations; an approach based on noise-insensitive autoencoders; and a novel form of generative model.

* Published in NIPS 2014 Workshop on Transfer and Multitask Learning, see http://nips.cc/Conferences/2014/Program/event.php?ID=4282
The recommendation to change breathing patterns from the mouth to the nose can have a significantly positive impact on the general well-being of the individual. We classify nasal and mouth breathing by using an acoustic sensor and intelligent signal processing techniques. The overall purpose is to investigate the possibility of identifying the differences in patterns between nasal and mouth breathing, in order to integrate this information into a decision support system that will form the basis of a patient monitoring and motivational feedback system to recommend the change from mouth to nasal breathing. Our findings show that the breath pattern can be discriminated at certain locations on the body, both by visual spectrum analysis and with a back-propagation neural network classifier. The sound file recorded from the sensor placed at the hollow of the neck shows the most promising accuracy, which is as high as 90%.

* Hi, I just wish to withdraw this paper from the site until further notice. Thanks
We propose a new iris presentation attack detection method using three-dimensional features of an observed iris region estimated by photometric stereo. Our implementation uses a pair of iris images acquired by a common commercial iris sensor (LG 4000). No hardware modifications of any kind are required. Our approach should be applicable to any iris sensor that can illuminate the eye from two different directions. Each iris image in the pair is captured under near-infrared illumination at a different angle relative to the eye. Photometric stereo is used to estimate surface normal vectors in the non-occluded portions of the iris region. The variability of the normal vectors is used as the presentation attack detection score. This score is larger for a texture that is irregularly opaque and printed on a convex contact lens, and is smaller for an authentic iris texture. Thus the problem is formulated as binary classification into (a) an eye wearing a textured contact lens and (b) the texture of an actual iris surface (possibly seen through a clear contact lens). Experiments were carried out on a database of approx. 2,900 iris image pairs acquired from approx. 100 subjects. Our method was able to correctly classify over 95% of samples when tested on contact lens brands unseen in training, and over 98% of samples when the contact lens brand was seen during training. The source code of the method is made available to other researchers.

* Patent Pending. Paper accepted for WACV 2019, Hawaii, USA
One of the main benefits of a wrist-worn computer is its ability to collect a variety of physiological data in a minimally intrusive manner. Among these data, electrodermal activity (EDA) is readily collected and provides a window into a person's emotional and sympathetic responses. EDA data collected using a wearable wristband are easily influenced by motion artifacts (MAs) that may significantly distort the data and degrade the quality of analyses performed on the data if not identified and removed. Prior work has demonstrated that MAs can be successfully detected using supervised machine learning algorithms on a small data set collected in a lab setting. In this paper, we demonstrate that unsupervised learning algorithms perform competitively with supervised algorithms for detecting MAs on EDA data collected in both a lab-based setting and a real-world setting comprising about 23 hours of data. We also find, somewhat surprisingly, that incorporating accelerometer data as well as EDA improves detection accuracy only slightly for supervised algorithms and significantly degrades the accuracy of unsupervised algorithms.
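As one possible concrete instance of the unsupervised setting, the sketch below scores fixed-length EDA windows with an Isolation Forest over two simple statistics. Both the features and the choice of Isolation Forest are illustrative assumptions, not necessarily the algorithms evaluated in the paper.

# Flag suspected motion-artifact windows in an EDA signal without labels (illustrative).
import numpy as np
from sklearn.ensemble import IsolationForest

def window_features(eda, fs=4, win_s=5):
    """Split an EDA signal (sampled at fs Hz) into windows and compute simple statistics."""
    w = fs * win_s
    n = len(eda) // w
    segs = eda[: n * w].reshape(n, w)
    return np.column_stack([segs.std(1), np.abs(np.diff(segs, axis=1)).max(1)])

eda = np.cumsum(np.random.randn(4 * 60 * 10) * 0.01)    # 10 minutes of fake EDA at 4 Hz
X = window_features(eda)
flags = IsolationForest(contamination=0.05, random_state=0).fit_predict(X)
print("suspected artifact windows:", np.where(flags == -1)[0])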

* To appear at International Symposium on Wearable Computers (ISWC) 2017
We present a randomized primal-dual algorithm that solves the problem $\min_{x} \max_{y} y^\top A x$ to additive error $\epsilon$ in time $\mathrm{nnz}(A) + \sqrt{\mathrm{nnz}(A)n}/\epsilon$, for matrix $A$ with larger dimension $n$ and $\mathrm{nnz}(A)$ nonzero entries. This improves on Nemirovski's mirror-prox method by a factor of $\sqrt{\mathrm{nnz}(A)/n}$ and is faster than stochastic gradient methods in the accurate and/or sparse regime $\epsilon \le \sqrt{n/\mathrm{nnz}(A)}$. Our results hold for $x,y$ in the simplex (matrix games, linear programming) and for $x$ in an $\ell_2$ ball and $y$ in the simplex (perceptron / SVM, minimum enclosing ball). Our algorithm combines the mirror-prox method and a novel variance-reduced gradient estimator based on "sampling from the difference" between the current iterate and a reference point.
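For context, a small numpy sketch of the deterministic mirror-prox baseline for matrix games (both players on the simplex, entropic mirror map); the paper's randomized, variance-reduced estimator is not reproduced here, and the step size below is a rough assumption.

# Deterministic mirror-prox (extragradient with entropic updates) for min_x max_y y^T A x.
import numpy as np

def mirror_prox_matrix_game(A, iters=2000):
    n, m = A.shape                          # y in simplex(n), x in simplex(m)
    eta = 1.0 / np.abs(A).max()             # rough step-size choice (assumption)
    x, y = np.full(m, 1 / m), np.full(n, 1 / n)
    x_avg, y_avg = np.zeros(m), np.zeros(n)
    for _ in range(iters):
        # extrapolation step
        xh = x * np.exp(-eta * (A.T @ y)); xh /= xh.sum()
        yh = y * np.exp(eta * (A @ x));    yh /= yh.sum()
        # update step using gradients at the extrapolated point
        x = x * np.exp(-eta * (A.T @ yh)); x /= x.sum()
        y = y * np.exp(eta * (A @ xh));    y /= y.sum()
        x_avg += xh; y_avg += yh
    return x_avg / iters, y_avg / iters

A = np.random.randn(50, 40)
x, y = mirror_prox_matrix_game(A)
print("duality gap:", (A @ x).max() - (A.T @ y).min())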
