Models, code, and papers for "Yun Lu":

Deep Neural Networks for Pattern Recognition

Sep 25, 2018
Kyongsik Yun, Alexander Huyen, Thomas Lu

In the field of pattern recognition research, methods using deep neural networks, enabled by improved computing hardware, have recently attracted attention because of their superior accuracy compared to conventional methods. Deep neural networks simulate the human visual system and achieve human-equivalent accuracy in image classification, object detection, and segmentation. This chapter introduces the basic structure of deep neural networks that simulate human neural networks. We then describe the operational processes and applications of conditional generative adversarial networks, an active research area grounded in the bottom-up and top-down mechanisms that are central to human visual perception. Finally, recent developments in training strategies for the effective learning of complex deep neural networks are addressed.
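
To make the conditional GAN structure concrete, here is a minimal PyTorch sketch of a generator and discriminator conditioned on a class label; the layer sizes and the concatenation-based conditioning are illustrative assumptions, not the chapter's code.

    # Minimal conditional GAN sketch (illustrative only; layer sizes and
    # the conditioning scheme are assumptions).
    import torch
    import torch.nn as nn

    class Generator(nn.Module):
        def __init__(self, noise_dim=100, num_classes=10, img_dim=28 * 28):
            super().__init__()
            self.label_emb = nn.Embedding(num_classes, num_classes)
            self.net = nn.Sequential(
                nn.Linear(noise_dim + num_classes, 256), nn.ReLU(),
                nn.Linear(256, img_dim), nn.Tanh(),
            )

        def forward(self, z, labels):
            # Condition the generator by concatenating noise with a label embedding.
            return self.net(torch.cat([z, self.label_emb(labels)], dim=1))

    class Discriminator(nn.Module):
        def __init__(self, num_classes=10, img_dim=28 * 28):
            super().__init__()
            self.label_emb = nn.Embedding(num_classes, num_classes)
            self.net = nn.Sequential(
                nn.Linear(img_dim + num_classes, 256), nn.LeakyReLU(0.2),
                nn.Linear(256, 1), nn.Sigmoid(),
            )

        def forward(self, img, labels):
            # The discriminator sees both the image and the condition.
            return self.net(torch.cat([img, self.label_emb(labels)], dim=1))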


Occluded object reconstruction for first responders with augmented reality glasses using conditional generative adversarial networks

Apr 20, 2018
Kyongsik Yun, Thomas Lu, Edward Chow

Firefighters face a variety of life-threatening risks, including line-of-duty deaths, injuries, and exposure to hazardous substances, so support for reducing these risks is important. We built a partially occluded object reconstruction method on augmented reality glasses for first responders. We used a deep learning method based on conditional generative adversarial networks to train associations between various images of flammable and hazardous objects and their partially occluded counterparts. Our system then reconstructed an image of a new flammable object. Finally, the reconstructed image was superimposed on the input image to provide "transparency". The system imitates how humans learn the laws of physics through experience, by learning the shapes of flammable objects and the characteristics of flames.
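
The paper's training code is not reproduced here; a common objective for this kind of image-to-image cGAN is the pix2pix-style combination of an adversarial term and an L1 reconstruction term, sketched below under assumed names (discriminator, lambda_l1).

    # Sketch of a pix2pix-style cGAN objective for occluded-object
    # reconstruction (assumed setup, not the authors' released code).
    import torch
    import torch.nn as nn

    bce = nn.BCEWithLogitsLoss()
    l1 = nn.L1Loss()
    lambda_l1 = 100.0  # assumed weighting, as in the original pix2pix paper

    def generator_loss(discriminator, occluded, generated, target):
        # Fool the discriminator on (input, output) pairs while staying
        # close to the ground-truth unoccluded image in L1.
        pred_fake = discriminator(occluded, generated)
        adv = bce(pred_fake, torch.ones_like(pred_fake))
        return adv + lambda_l1 * l1(generated, target)

    def discriminator_loss(discriminator, occluded, generated, target):
        pred_real = discriminator(occluded, target)
        pred_fake = discriminator(occluded, generated.detach())
        return 0.5 * (bce(pred_real, torch.ones_like(pred_real)) +
                      bce(pred_fake, torch.zeros_like(pred_fake)))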

* SPIE 2018 

Predicting Rapid Fire Growth (Flashover) Using Conditional Generative Adversarial Networks

Jan 30, 2018
Kyongsik Yun, Jessi Bustos, Thomas Lu

A flashover occurs when a fire spreads very rapidly through crevices due to intense heat. Flashovers present one of the most frightening and challenging fire phenomena to those who regularly encounter them: firefighters. Firefighters' safety and lives often depend on their ability to predict flashovers before they occur. Typical pre-flashover fire characteristics include dark smoke, high heat, and rollover ("angel fingers"), which can be quantified by color, size, and shape. Using the color video stream from a firefighter's body camera, we applied generative adversarial networks for image enhancement. The networks were trained to enhance very dark fire and smoke patterns in video and to monitor dynamic changes in smoke and fire areas. Preliminary tests with a limited set of flashover training videos showed that we could predict a flashover up to 55 seconds before it occurred.
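
As a rough illustration of quantifying fire and smoke regions by color in a body-camera stream, the following OpenCV sketch counts fire-colored and dark-smoke pixels per frame; the HSV thresholds and file name are invented for illustration and are not the paper's values.

    # Toy per-frame quantification of fire-colored and dark-smoke pixels;
    # the HSV thresholds are illustrative guesses, not the paper's.
    import cv2

    def fire_smoke_areas(frame_bgr):
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        # Bright orange/red pixels as a crude "fire" proxy.
        fire = cv2.inRange(hsv, (0, 120, 150), (35, 255, 255))
        # Dark, low-saturation pixels as a crude "dark smoke" proxy.
        smoke = cv2.inRange(hsv, (0, 0, 0), (180, 80, 60))
        total = frame_bgr.shape[0] * frame_bgr.shape[1]
        return cv2.countNonZero(fire) / total, cv2.countNonZero(smoke) / total

    cap = cv2.VideoCapture("bodycam.mp4")  # hypothetical input file
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        fire_ratio, smoke_ratio = fire_smoke_areas(frame)
        # Rising trends in these ratios are the kind of dynamic change
        # the trained network monitors for pre-flashover cues.
        print(fire_ratio, smoke_ratio)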

* 4 pages, 3 figures 

A Novel and Efficient Tumor Detection Framework for Pancreatic Cancer via CT Images

Feb 11, 2020
Zhengdong Zhang, Shuai Li, Ziyang Wang, Yun Lu

As Deep Convolutional Neural Networks (DCNNs) have shown robust performance and results in medical image analysis, a number of deep-learning-based tumor detection methods have been developed in recent years. Nowadays, the automatic detection of pancreatic tumors using contrast-enhanced Computed Tomography (CT) is widely applied for the diagnosis and staging of pancreatic cancer. Traditional hand-crafted methods extract only low-level features, and standard convolutional neural networks fail to make full use of context information, which leads to inferior detection results. In this paper, a novel and efficient pancreatic tumor detection framework is designed that fully exploits context information at multiple scales. More specifically, the contribution of the proposed method consists of three components: Augmented Feature Pyramid networks, Self-adaptive Feature Fusion, and a Dependencies Computation (DC) Module. First, a bottom-up path augmentation is established to fully extract and propagate accurate low-level localization information. Then, the Self-adaptive Feature Fusion encodes much richer context information at multiple scales based on the proposed regions. Finally, the DC Module is specifically designed to capture the interaction information between proposals and surrounding tissues. Experiments show competitive detection performance with an AUC of 0.9455, which, to the best of our knowledge, outperforms other state-of-the-art methods and demonstrates that the proposed framework can detect pancreatic tumors efficiently and accurately.
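
The paper does not include code; one plausible reading of the Self-adaptive Feature Fusion component, as a learned softmax weighting over multi-scale features, is sketched below (all shapes and the gating design are assumptions).

    # Hypothetical sketch of self-adaptive fusion of multi-scale features:
    # learn per-level weights and take a softmax-weighted sum. Shapes and
    # the gating design are assumptions, not the paper's exact module.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SelfAdaptiveFusion(nn.Module):
        def __init__(self, channels, num_levels):
            super().__init__()
            self.gate = nn.Conv2d(channels * num_levels, num_levels, kernel_size=1)

        def forward(self, feats):
            # feats: list of [B, C, H, W] features, one per pyramid level,
            # already resized to a common spatial size.
            stacked = torch.cat(feats, dim=1)               # [B, C*L, H, W]
            weights = F.softmax(self.gate(stacked), dim=1)  # [B, L, H, W]
            return sum(weights[:, i:i + 1] * f for i, f in enumerate(feats))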

* 5 pages, 5 figures 

High-quality Ellipse Detection Based on Arc-support Line Segments

Oct 08, 2018
Changsheng Lu, Siyu Xia, Ming Shao, Yun Fu

Over the years, many ellipse detection algorithms have been proposed and studied broadly, yet detecting ellipses accurately and efficiently in real-world images remains a challenge. In this paper, an accurate and efficient ellipse detector based on arc-support line segments is proposed. The arc-support line segment simplifies the complicated expression of curves in an image while retaining their general properties, including convexity and polarity, which grounds the successful detection of ellipses. Arc-support groups are formed by iteratively and robustly linking the arc-support line segments that latently belong to a common ellipse, at the point-statistics level. Afterward, two complementary approaches, namely selecting the group with higher saliency to fit an ellipse and searching all valid paired arc-support groups, are used to generate the initial ellipse set, both locally and globally. In the ellipse fitting step, a superposition principle is developed to accelerate the fitting. Then, ellipse candidates are formed by hierarchical clustering in the 5D parameter space of the initial ellipse set. Finally, the salient ellipse candidates are selected as detections subject to stringent and effective verification. Extensive experiments on three public datasets show that our method achieves the best F-measure scores compared to state-of-the-art methods.
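
The superposition idea can be illustrated with the classic direct least-squares ellipse fit: the scatter matrix is a sum over points, so the statistics of two arc-support groups simply add before fitting. The sketch below is illustrative, not the authors' implementation.

    # Direct least-squares ellipse fit (Fitzgibbon-style) from a scatter
    # matrix. Because S is a sum over points, the scatter matrices of two
    # arc-support groups can simply be added before fitting, which is the
    # spirit of the superposition speed-up (illustrative sketch only).
    import numpy as np

    def scatter_matrix(x, y):
        D = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
        return D.T @ D

    def fit_ellipse(S):
        # Constraint matrix enforcing 4ac - b^2 = 1.
        C = np.zeros((6, 6))
        C[0, 2] = C[2, 0] = 2.0
        C[1, 1] = -1.0
        w, v = np.linalg.eig(np.linalg.solve(S, C))
        # The valid conic is the eigenvector with 4ac - b^2 > 0.
        cond = 4 * v[0] * v[2] - v[1] ** 2
        k = np.argmax(np.real(cond))
        return np.real(v[:, k])  # conic coefficients (a, b, c, d, e, f)

    # Merging two groups costs only a 6x6 matrix addition:
    # fit_ellipse(scatter_matrix(x1, y1) + scatter_matrix(x2, y2))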

* Due to the file size limit on arXiv, the resolution of the figures may not be very high. Please contact me if you want a higher-quality PDF 

Causal Mechanism-based Model Construction

Jan 16, 2013
Tsai-Ching Lu, Marek J. Druzdzel, Tze-Yun Leong

We propose a framework for building graphical causal models based on the concept of causal mechanisms. Causal models are intuitive for human users and, more importantly, support the prediction of the effects of manipulation. We describe an implementation of the proposed framework as an interactive model construction module, ImaGeNIe, in SMILE (Structural Modeling, Inference, and Learning Engine) and in GeNIe (SMILE's Windows user interface).

* Appears in Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI2000) 

Channel Attention and Multi-level Features Fusion for Single Image Super-Resolution

Oct 16, 2018
Yue Lu, Yun Zhou, Zhuqing Jiang, Xiaoqiang Guo, Zixuan Yang

Convolutional neural networks (CNNs) have demonstrated superior performance in super-resolution (SR). However, most CNN-based SR methods either neglect the varying importance of feature channels or fail to take full advantage of hierarchical features. To address these issues, this paper presents a novel recursive unit. First, at the beginning of each unit, we adopt a compact channel attention mechanism to adaptively recalibrate the channel importance of input features. Then, multi-level features, rather than only deep-level features, are extracted and fused. Additionally, we find that applying the learnable upsampling method (i.e., transposed convolution) only on the residual branch, while using bicubic interpolation on the identity branch, forces our model to learn more details than using transposed convolution on both branches. Experiments show that our method achieves results competitive with state-of-the-art methods while running faster.
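
A compact sketch of the two key ingredients, a squeeze-and-excitation-style channel attention at the unit entry and the asymmetric upsampling branches, might look as follows in PyTorch; the layer sizes and the exact recursion are assumptions.

    # Sketch of the unit's two key ideas: channel attention on the input
    # features, and learnable upsampling only on the residual branch with
    # bicubic interpolation on the identity branch. Layer sizes are
    # assumptions, not the paper's exact configuration.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ChannelAttention(nn.Module):
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.fc = nn.Sequential(
                nn.Linear(channels, channels // reduction), nn.ReLU(),
                nn.Linear(channels // reduction, channels), nn.Sigmoid(),
            )

        def forward(self, x):
            w = self.fc(x.mean(dim=(2, 3)))           # squeeze -> [B, C]
            return x * w.unsqueeze(-1).unsqueeze(-1)  # recalibrate channels

    class UpsampleSR(nn.Module):
        def __init__(self, channels, scale=2):
            super().__init__()
            self.scale = scale
            self.attn = ChannelAttention(channels)
            self.body = nn.Conv2d(channels, channels, 3, padding=1)
            self.up = nn.ConvTranspose2d(channels, channels, scale * 2,
                                         stride=scale, padding=scale // 2)

        def forward(self, x):
            res = self.up(self.body(self.attn(x)))   # learnable upsampling
            idt = F.interpolate(x, scale_factor=self.scale, mode="bicubic",
                                align_corners=False)  # fixed bicubic branch
            return idt + res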

* 4 pages, 3 figures, Accepted as an oral presentation at VCIP 

Automatic speech recognition for launch control center communication using recurrent neural networks with data augmentation and custom language model

Apr 24, 2018
Kyongsik Yun, Joseph Osborne, Madison Lee, Thomas Lu, Edward Chow

Transcribing voice communications in NASA's launch control center is important for information utilization. However, automatic speech recognition in this environment is particularly challenging due to the lack of training data, unfamiliar words and acronyms, many different speakers and accents, and the conversational nature of the speech. We used bidirectional deep recurrent neural networks to train and test speech recognition performance. We showed that data augmentation and custom language models can improve speech recognition accuracy. Transcribing communications from the launch control center will help machines analyze the information and accelerate knowledge generation.
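
The exact model is not published with the abstract; a generic bidirectional recurrent recognizer trained with CTC, of the kind described, could be sketched as follows (feature and vocabulary sizes are assumptions).

    # Generic bidirectional deep RNN speech recognizer with CTC loss;
    # feature and vocabulary sizes are assumptions, not NASA's setup.
    import torch
    import torch.nn as nn

    class BiRNNRecognizer(nn.Module):
        def __init__(self, num_features=40, hidden=256, vocab_size=29):
            super().__init__()
            self.rnn = nn.LSTM(num_features, hidden, num_layers=3,
                               bidirectional=True, batch_first=True)
            self.out = nn.Linear(2 * hidden, vocab_size)  # includes CTC blank

        def forward(self, feats):               # feats: [B, T, num_features]
            h, _ = self.rnn(feats)
            return self.out(h).log_softmax(-1)  # [B, T, vocab]

    model = BiRNNRecognizer()
    ctc = nn.CTCLoss(blank=0)
    feats = torch.randn(2, 100, 40)             # dummy batch of features
    targets = torch.randint(1, 29, (2, 20))     # dummy transcripts
    log_probs = model(feats).transpose(0, 1)    # CTCLoss expects [T, B, V]
    loss = ctc(log_probs, targets,
               input_lengths=torch.full((2,), 100),
               target_lengths=torch.full((2,), 20))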

* SPIE 2018 

Order-Preserving Abstractive Summarization for Spoken Content Based on Connectionist Temporal Classification

Nov 16, 2017
Bo-Ru Lu, Frank Shyu, Yun-Nung Chen, Hung-Yi Lee, Lin-shan Lee

Connectionist temporal classification (CTC) is a powerful approach for sequence-to-sequence learning and has been popularly used in speech recognition. A central idea of CTC is the addition of a "blank" label during training. With this mechanism, CTC eliminates the need for segment alignment and hence has been applied to various sequence-to-sequence learning problems. In this work, we apply CTC to abstractive summarization of spoken content. The "blank" in this case implies that the corresponding input data are less important or noisy and can thus be ignored. This approach outperforms existing methods in terms of ROUGE scores on the Chinese Gigaword and MATBN corpora. It also has the nice property that the ordering of words or characters in the input documents is better preserved in the generated summaries.
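
The role of the blank label is easiest to see in the standard CTC collapse rule, which merges consecutive repeats and then drops blanks:

    # Standard CTC collapse: merge consecutive repeats, then drop blanks.
    # For summarization, frames mapped to blank are effectively ignored.
    def ctc_collapse(path, blank=0):
        out, prev = [], None
        for token in path:
            if token != prev and token != blank:
                out.append(token)
            prev = token
        return out

    # e.g. [5, 5, 0, 5, 7, 7, 0] -> [5, 5, 7]
    assert ctc_collapse([5, 5, 0, 5, 7, 7, 0]) == [5, 5, 7]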

* Accepted by Interspeech 2017 

Double-Head RCNN: Rethinking Classification and Localization for Object Detection

May 31, 2019
Yue Wu, Yinpeng Chen, Lu Yuan, Zicheng Liu, Lijuan Wang, Hongzhi Li, Yun Fu

Modern R-CNN based detectors apply a head to extract Region of Interest (RoI) features for both classification and localization tasks. In contrast, we found that these two tasks have opposite preferences towards the two widely used head structures (i.e., the fully connected head and the convolution head). Specifically, the fully connected head is more suitable for the classification task, while the convolution head is more suitable for the localization task. Therefore, we propose a Double-Head method, with a fully connected head focusing on classification and a convolution head focusing on bounding box regression. In addition, we have two findings for the unfocused tasks (i.e., classification in the convolution head, and bounding box regression in the fully connected head): (a) adding classification to the convolution head is complementary to the classification in the fully connected head, and (b) bounding box regression provides auxiliary supervision for the fully connected head. Without bells and whistles, our method gains +3.5 and +2.8 AP on the MS COCO dataset over Feature Pyramid Network (FPN) baselines with ResNet-50 and ResNet-101 backbones, respectively.
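
A minimal rendering of the double-head idea (channel sizes and depths are assumptions, not the paper's exact heads):

    # Sketch of a double-head design: an fc-head scores classes, a
    # conv-head regresses boxes. Channel sizes and depths are assumptions.
    import torch
    import torch.nn as nn

    class DoubleHead(nn.Module):
        def __init__(self, in_channels=256, roi_size=7, num_classes=81):
            super().__init__()
            flat = in_channels * roi_size * roi_size
            self.fc_head = nn.Sequential(          # classification branch
                nn.Flatten(), nn.Linear(flat, 1024), nn.ReLU(),
                nn.Linear(1024, 1024), nn.ReLU(),
            )
            self.cls_score = nn.Linear(1024, num_classes)
            self.conv_head = nn.Sequential(        # localization branch
                nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.ReLU(),
                nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            self.bbox_pred = nn.Linear(in_channels, 4 * num_classes)

        def forward(self, roi_feats):              # [N, C, roi_size, roi_size]
            return (self.cls_score(self.fc_head(roi_feats)),
                    self.bbox_pred(self.conv_head(roi_feats)))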


Improved visible to IR image transformation using synthetic data augmentation with cycle-consistent adversarial networks

Apr 25, 2019
Kyongsik Yun, Kevin Yu, Joseph Osborne, Sarah Eldin, Luan Nguyen, Alexander Huyen, Thomas Lu

Infrared (IR) images are essential to improve the visibility of dark or camouflaged objects. Object recognition and segmentation based on a neural network using IR images provide more accuracy and insight than color visible images. However, the bottleneck is the amount of relevant IR imagery available for training: it is difficult to collect real-world IR images for special purposes, including space exploration, military, and fire-fighting applications. To solve this problem, we created color visible and IR images using a Unity-based 3D game editor. These synthetically generated color visible and IR images were used to train cycle-consistent adversarial networks (CycleGAN) to convert visible images to IR images. CycleGAN has the advantage that it does not require precisely matched visible/IR pairs for transformation training. In this study, we found that additional synthetic data can help improve CycleGAN performance. Training using real data (N = 20) produced more accurate transformations than training using a combination of real (N = 10) and synthetic (N = 10) data, indicating that synthetic data cannot exceed the quality of real data. Training using real (N = 10) and synthetic (N = 100) data showed almost the same performance as training using real data (N = 20); at least 10 times more synthetic than real data is required to reach the same performance. In summary, CycleGAN combined with synthetic data improves the visible-to-IR image transformation performance.
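
The constraint that lets CycleGAN dispense with matched pairs is the cycle-consistency loss: each generator must invert the other. A hedged sketch, with the loss weight assumed from the original CycleGAN paper:

    # Sketch of CycleGAN's unpaired constraint for visible <-> IR: the two
    # generators must invert each other, so no pixel-aligned (visible, IR)
    # pairs are needed. The weight lambda_cyc is an assumption.
    import torch.nn as nn

    l1 = nn.L1Loss()
    lambda_cyc = 10.0  # assumed, as in the original CycleGAN paper

    def cycle_loss(G_vis2ir, G_ir2vis, vis, ir):
        # visible -> IR -> visible should reproduce the input, and vice versa.
        return lambda_cyc * (l1(G_ir2vis(G_vis2ir(vis)), vis) +
                             l1(G_vis2ir(G_ir2vis(ir)), ir))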

* 8 pages, 6 figures, SPIE 

Rethinking Classification and Localization in R-CNN

Apr 13, 2019
Yue Wu, Yinpeng Chen, Lu Yuan, Zicheng Liu, Lijuan Wang, Hongzhi Li, Yun Fu

Modern R-CNN based detectors share the RoI feature extractor head for both classification and localization tasks, based on the correlation between the two tasks. In contrast, we found that different head structures (i.e., the fully connected head and the convolution head) have opposite preferences towards these two tasks. Specifically, the fully connected head is more suitable for the classification task, while the convolution head is more suitable for the localization task. Therefore, we propose a double-head method that separates the two tasks into different heads (i.e., a fully connected head for classification and a convolution head for box regression). Without bells and whistles, our method gains +3.4 and +2.7 points mAP on the MS COCO dataset over Feature Pyramid Network baselines with ResNet-50 and ResNet-101 backbones, respectively.


Small Target Detection for Search and Rescue Operations using Distributed Deep Learning and Synthetic Data Generation

Apr 25, 2019
Kyongsik Yun, Luan Nguyen, Tuan Nguyen, Doyoung Kim, Sarah Eldin, Alexander Huyen, Thomas Lu, Edward Chow

It is important in search and rescue operations to find the target as soon as possible. Surveillance camera systems and unmanned aerial vehicles (UAVs) are used to support search and rescue. Automatic object detection is important because a person cannot monitor multiple surveillance screens simultaneously for 24 hours, and the object is often too small to be recognized by the human eye on a surveillance screen. This study used UAVs around the Port of Houston and fixed surveillance cameras to build an automatic target detection system that supports the US Coast Guard (USCG) in finding targets (e.g., a person overboard). We combined image segmentation, enhancement, and convolutional neural networks to reduce the time needed to detect small targets. We compared the performance of the auto-detection system with the human eye: our system detected the target within 8 seconds, whereas the human eye took 25 seconds. Our system also used synthetic data generation and data augmentation techniques to improve target detection accuracy. This solution may help first responders carry out search and rescue operations in a timely manner.
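
The pipeline details are not given in the abstract; one common way to keep small targets detectable is to tile each frame into overlapping patches before running the CNN detector, as in this illustrative sketch (patch size and overlap are assumptions):

    # Illustrative tiling of a large surveillance frame into overlapping
    # patches so a CNN detector does not lose small targets to downscaling.
    # Patch size and overlap are assumptions, not the paper's values.
    import numpy as np

    def tile_frame(frame, patch=512, overlap=64):
        h, w = frame.shape[:2]
        step = patch - overlap
        tiles = []
        for y in range(0, max(h - overlap, 1), step):
            for x in range(0, max(w - overlap, 1), step):
                tiles.append(((x, y), frame[y:y + patch, x:x + patch]))
        return tiles  # run the detector per tile, then offset boxes by (x, y)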

* 6 pages, 4 figures, SPIE 

DymSLAM: 4D Dynamic Scene Reconstruction Based on Geometrical Motion Segmentation

Mar 10, 2020
Chenjie Wang, Bin Luo, Yun Zhang, Qing Zhao, Lu Yin, Wei Wang, Xin Su, Yajun Wang, Chengyuan Li

Most SLAM algorithms are based on the assumption that the scene is static. In practice, however, most scenes are dynamic and contain moving objects, so these methods are not suitable. In this paper, we introduce DymSLAM, a dynamic stereo visual SLAM system capable of reconstructing a 4D (3D + time) dynamic scene with rigid moving objects. The only input of DymSLAM is stereo video, and its output includes a dense map of the static environment, 3D models of the moving objects, and the trajectories of the camera and the moving objects. We first detect and match interest points between successive frames using traditional SLAM methods. The interest points belonging to different motion models (the ego-motion and the motion models of rigid moving objects) are then segmented by a multi-model fitting approach. Based on the interest points belonging to the ego-motion, we estimate the trajectory of the camera and reconstruct the static background. The interest points belonging to the motion models of rigid moving objects are then used to estimate their motion relative to the camera and to reconstruct 3D models of the objects. We transform these relative motions into the trajectories of the moving objects in the global reference frame. Finally, we fuse the 3D models of the moving objects into the 3D map of the environment, taking their motion trajectories into account, to obtain a 4D (3D + time) sequence. DymSLAM obtains information about dynamic objects instead of ignoring them and is suitable for unknown rigid objects. Hence, the proposed system allows a robot to be employed for high-level tasks such as obstacle avoidance for dynamic objects. We conducted experiments in a real-world environment where both the camera and the objects moved over a wide range.
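
The multi-model fitting step can be pictured as greedy sequential robust fitting: estimate the dominant motion, peel off its inliers, and repeat. The OpenCV sketch below is a simplification of the paper's method, for intuition only.

    # Greedy sketch of multi-motion segmentation: repeatedly fit a rigid
    # motion model with RANSAC and peel off its inliers. This simplifies
    # the paper's multi-model fitting, for illustration.
    import cv2
    import numpy as np

    def segment_motions(pts1, pts2, min_inliers=30, max_models=5):
        # pts1, pts2: matched interest points, float32 arrays of shape [N, 2].
        models, remaining = [], np.arange(len(pts1))
        for _ in range(max_models):
            if len(remaining) < min_inliers:
                break
            F, mask = cv2.findFundamentalMat(pts1[remaining], pts2[remaining],
                                             cv2.FM_RANSAC, 1.0, 0.99)
            if F is None or mask.sum() < min_inliers:
                break
            inliers = remaining[mask.ravel() == 1]
            models.append((F, inliers))   # the first, largest model ~ ego-motion
            remaining = remaining[mask.ravel() == 0]
        return models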

* 8 pages 

Coronary Artery Segmentation in Angiographic Videos Using A 3D-2D CE-Net

Mar 26, 2020
Lu Wang, Dong-xue Liang, Xiao-lei Yin, Jing Qiu, Zhi-yun Yang, Jun-hui Xing, Jian-zeng Dong, Zhao-yuan Ma

Coronary angiography is an indispensable assistive technique for cardiac interventional surgery. Segmentation and extraction of blood vessels from coronary angiography videos are essential prerequisites for physicians to locate, assess, and diagnose plaques and stenosis in blood vessels. This article proposes a new video segmentation framework that can extract the clearest and most comprehensive coronary angiography images from a video sequence, thereby helping physicians to better observe the condition of blood vessels. The framework combines a 3D convolutional layer, which extracts spatial-temporal information from a video sequence, with a 2D CE-Net that accomplishes the segmentation task. The input is a few consecutive frames of angiographic video, and the output is a segmentation mask. We obtain good segmentation results despite the poor quality of coronary angiography video sequences.
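
The 3D-to-2D handoff can be sketched as a few 3D convolutions over the frame axis whose temporal dimension is then collapsed before a 2D segmentation network; all sizes below are assumptions, and a single 2D convolution stands in for CE-Net.

    # Sketch of the 3D-2D idea: 3D convolutions aggregate a few consecutive
    # angiographic frames, the temporal axis is collapsed, and a 2D network
    # (CE-Net in the paper; a placeholder here) produces the vessel mask.
    import torch
    import torch.nn as nn

    class SpatioTemporalFrontEnd(nn.Module):
        def __init__(self, mid=16):
            super().__init__()
            self.conv3d = nn.Sequential(
                nn.Conv3d(1, mid, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv3d(mid, mid, kernel_size=3, padding=1), nn.ReLU(),
            )
            self.seg2d = nn.Conv2d(mid, 1, kernel_size=1)  # stand-in for CE-Net

        def forward(self, clip):          # clip: [B, 1, T, H, W]
            feat = self.conv3d(clip)      # [B, mid, T, H, W]
            feat = feat.mean(dim=2)       # collapse temporal axis -> [B, mid, H, W]
            return torch.sigmoid(self.seg2d(feat))  # [B, 1, H, W] vessel mask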


Weakly-supervised 3D coronary artery reconstruction from two-view angiographic images

Mar 26, 2020
Lu Wang, Dong-xue Liang, Xiao-lei Yin, Jing Qiu, Zhi-yun Yang, Jun-hui Xing, Jian-zeng Dong, Zhao-yuan Ma

The reconstruction of three-dimensional models of coronary arteries is of great significance for the localization, evaluation, and diagnosis of stenosis and plaque in the arteries, as well as for the assisted navigation of interventional surgery. In clinical practice, physicians capture arterial images from only a few angles of coronary angiography, so performing 3D reconstruction directly from coronary angiography images is of great practical value. However, this is a very difficult computer vision task due to the complex shapes of coronary blood vessels and the lack of datasets and key-point labeling. With the rise of deep learning, more and more work reconstructs 3D models of human organs from medical images using deep neural networks. We propose an adversarial, generative approach to reconstruct three-dimensional coronary artery models from two different views of angiographic images. With 3D fully supervised and 2D weakly supervised learning schemes, we obtain reconstruction accuracies that outperform state-of-the-art techniques.


I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences

Apr 16, 2019
Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Hector Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, Trung Ngo Trong, Md Sahidullah, Fan Lu, Yun Tang, Ming Tu, Kah Kuan Teh, Huy Dat Tran, Kuruvachan K. George, Ivan Kukanov, Florent Desnous, Jichen Yang, Emre Yilmaz, Longting Xu, Jean-Francois Bonastre, Chenglin Xu, Zhi Hao Lim, Eng Siong Chng, Shivesh Ranjan, John H. L. Hansen, Massimiliano Todisco, Nicholas Evans

The I4U consortium was established to facilitate joint entries to the NIST speaker recognition evaluations (SRE). The latest such joint submission was in SRE 2018, in which the I4U submission was among the best-performing systems. SRE'18 also marks the 10-year anniversary of the I4U consortium's participation in the NIST SRE series of evaluations. The primary objective of this paper is to summarize the results and lessons learned from the twelve sub-systems and their fusion submitted to SRE'18. It is also our intention to present a shared view of the advances, progress, and major paradigm shifts that we have witnessed as SRE participants over the past decade, from SRE'08 to SRE'18. In this regard, we have seen, among others, a paradigm shift from supervector representations to deep speaker embeddings, and a switch of research challenge from channel compensation to domain adaptation.

* 5 pages 

xSense: Learning Sense-Separated Sparse Representations and Textual Definitions for Explainable Word Sense Networks

Sep 10, 2018
Ting-Yun Chang, Ta-Chung Chi, Shang-Chi Tsai, Yun-Nung Chen

Despite the success achieved on various natural language processing tasks, word embeddings are difficult to interpret due to their dense vector representations. This paper focuses on interpreting embeddings in several respects, including sense separation across vector dimensions and definition generation. Specifically, given a context together with a target word, our algorithm first projects the target word embedding to a high-dimensional sparse vector and picks the specific dimensions that best explain the semantic meaning of the target word given the encoded contextual information, from which the sense of the target word can be indirectly inferred. Finally, our algorithm applies an RNN to generate a textual definition of the target word in human-readable form, enabling direct interpretation of the corresponding word embedding. This paper also introduces a large, high-quality context-definition dataset consisting of sense definitions together with multiple example sentences per polysemous word, a valuable resource for definition modeling and word sense disambiguation. Experiments show superior performance in BLEU score and in a human evaluation test.
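
The sparse projection step can be pictured as a learned over-complete projection with a sparsity-inducing nonlinearity from which the strongest dimensions are read off; the dictionary size and top-k readout below are assumptions, not xSense's exact architecture.

    # Hedged sketch of projecting a dense word embedding into a
    # high-dimensional sparse vector and picking its strongest dimensions.
    # The dictionary size and top-k readout are assumptions.
    import torch
    import torch.nn as nn

    class SparseProjector(nn.Module):
        def __init__(self, emb_dim=300, sparse_dim=3000):
            super().__init__()
            self.proj = nn.Linear(emb_dim, sparse_dim)

        def forward(self, word_vec, k=5):
            sparse = torch.relu(self.proj(word_vec))  # ReLU zeroes many dims
            top = torch.topk(sparse, k)
            return sparse, top.indices  # indices ~ candidate sense dimensions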


Context Aware Machine Learning

Jan 19, 2019
Yun Zeng

We propose a principle for exploring context in machine learning models. Starting with the simple assumption that each observation may or may not depend on its context, a conditional probability distribution is decomposed into two parts: context-free and context-sensitive. Then, by employing the log-linear word production model to relate random variables to their embedding-space representations and making use of the convexity of the natural exponential function, we show that the embedding of an observation can likewise be decomposed into a weighted sum of two vectors, representing its context-free and context-sensitive parts, respectively. This simple treatment of context provides a unified view of many existing deep learning models, leading to revisions of these models that achieve significant performance boosts. Specifically, our upgraded version of a recent sentence embedding model not only outperforms the original by a large margin, but also leads to a new, principled approach for composing the embeddings of bag-of-words features, as well as a new architecture for modeling attention in deep neural networks. More surprisingly, our new principle provides a novel understanding of the gates and equations defined by the long short-term memory (LSTM) model, which in turn leads to a new model that converges significantly faster and achieves much lower prediction errors. Furthermore, our principle inspires a new type of generic neural network layer that better resembles real biological neurons than the traditional linear-mapping-plus-nonlinear-activation architecture. Its multi-layer extension provides a new principle for deep neural networks that subsumes the residual network (ResNet) as a special case, and its extension to the convolutional neural network model accounts for irrelevant input (e.g., the background in an image) in addition to filtering.
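
The central decomposition can be written as a gated sum of the two parts; the toy sketch below assumes a particular gating form, which is not the paper's exact equations.

    # Toy sketch of the context-free / context-sensitive decomposition of
    # an embedding: e(w, c) = a * e_free(w) + (1 - a) * e_ctx(w, c), with
    # the mixing weight predicted from word and context. The gate below is
    # an assumed form, not the paper's derivation.
    import torch
    import torch.nn as nn

    class ContextDecomposition(nn.Module):
        def __init__(self, vocab, dim):
            super().__init__()
            self.e_free = nn.Embedding(vocab, dim)   # context-free part
            self.ctx_net = nn.Linear(dim, dim)       # context-sensitive part
            self.gate = nn.Linear(2 * dim, 1)

        def forward(self, word_ids, context_vec):
            free = self.e_free(word_ids)
            ctx = torch.tanh(self.ctx_net(context_vec))
            a = torch.sigmoid(self.gate(torch.cat([free, ctx], dim=-1)))
            return a * free + (1 - a) * ctx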


Efficiently Sampling Multiplicative Attribute Graphs Using a Ball-Dropping Process

Feb 28, 2012
Hyokun Yun, S. V. N. Vishwanathan

We introduce a novel and efficient sampling algorithm for the Multiplicative Attribute Graph Model (MAGM; Kim and Leskovec, 2010). Our algorithm is strictly more efficient than the algorithm proposed by Yun and Vishwanathan (2012), in the sense that our method extends the best time-complexity guarantee of their algorithm to a larger fraction of the parameter space. Both in theory and in empirical evaluation on sparse graphs, our new algorithm outperforms the previous one. To design our algorithm, we first define a stochastic ball-dropping process (BDP). Although a special case of this process was introduced as an efficient approximate sampling algorithm for the Kronecker Product Graph Model (KPGM; Leskovec et al., 2010), neither why such an approximation works nor what distribution this process actually samples from has, to the best of our knowledge, been addressed so far. Our rigorous treatment of the BDP enables us to clarify the rationale behind a BDP approximation of KPGM and to design an efficient sampling algorithm for the MAGM.
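
For intuition, a ball-dropping process for a Kronecker-type model drops each of E edges down k levels of a 2x2 initiator matrix, choosing a quadrant with probability proportional to its entries. The toy sketch below illustrates the BDP for KPGM, not the paper's MAGM sampler.

    # Toy ball-dropping process (BDP) for a Kronecker-type graph: each of
    # E "balls" (edges) descends k levels, choosing a quadrant of the 2x2
    # initiator matrix with probability proportional to its entries. This
    # illustrates the BDP idea only, not the paper's MAGM algorithm.
    import numpy as np

    def ball_dropping(theta, k, num_edges, rng=np.random.default_rng()):
        p = (theta / theta.sum()).ravel()      # quadrant probabilities
        edges = set()
        for _ in range(num_edges):
            row = col = 0
            for _ in range(k):                 # descend k levels
                q = rng.choice(4, p=p)
                row = 2 * row + q // 2
                col = 2 * col + q % 2
            edges.add((row, col))              # duplicate drops collapse
        return edges

    theta = np.array([[0.9, 0.5], [0.5, 0.2]])
    print(len(ball_dropping(theta, k=10, num_edges=2000)))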

