Models, code, and papers for "Dong In Kim":

Attentive Neural Network for Named Entity Recognition in Vietnamese

Oct 31, 2018
Ngan Dong, Kim Anh Nguyen

We propose an attentive neural network for the task of named entity recognition in Vietnamese. The proposed attentive neural model makes use of character-based language models and word embeddings to encode words as vector representations. A neural network architecture of encoder, attention, and decoder layers is then utilized to encode knowledge of input sentences and to label entity tags. The experimental results show that the proposed attentive neural network achieves the state-of-the-art results on the benchmark named entity recognition datasets in Vietnamese in comparison to both hand-crafted features based models and neural models.

  Click for Model/Code and Paper
Deep Neural Network for Real-Time Autonomous Indoor Navigation

Nov 26, 2015
Dong Ki Kim, Tsuhan Chen

Autonomous indoor navigation of Micro Aerial Vehicles (MAVs) possesses many challenges. One main reason is that GPS has limited precision in indoor environments. The additional fact that MAVs are not able to carry heavy weight or power consuming sensors, such as range finders, makes indoor autonomous navigation a challenging task. In this paper, we propose a practical system in which a quadcopter autonomously navigates indoors and finds a specific target, i.e., a book bag, by using a single camera. A deep learning model, Convolutional Neural Network (ConvNet), is used to learn a controller strategy that mimics an expert pilot's choice of action. We show our system's performance through real-time experiments in diverse indoor locations. To understand more about our trained network, we use several visualization techniques.

  Click for Model/Code and Paper
Adaptive Noise Reduction Scheme for Salt and Pepper

Jan 10, 2012
Tina Gebreyohannes, Dong-Yoon Kim

In this paper, a new adaptive noise reduction scheme for images corrupted by impulse noise is presented. The proposed scheme efficiently identifies and reduces salt and pepper noise. MAG (Mean Absolute Gradient) is used to identify pixels which are most likely corrupted by salt and pepper noise that are candidates for further median based noise reduction processing. Directional filtering is then applied after noise reduction to achieve a good tradeoff between detail preservation and noise removal. The proposed scheme can remove salt and pepper noise with noise density as high as 90% and produce better result in terms of qualitative and quantitative measures of images.

* 9 pages, 5 figures 

  Click for Model/Code and Paper
Satellite Image-based Localization via Learned Embeddings

Apr 04, 2017
Dong-Ki Kim, Matthew R. Walter

We propose a vision-based method that localizes a ground vehicle using publicly available satellite imagery as the only prior knowledge of the environment. Our approach takes as input a sequence of ground-level images acquired by the vehicle as it navigates, and outputs an estimate of the vehicle's pose relative to a georeferenced satellite image. We overcome the significant viewpoint and appearance variations between the images through a neural multi-view model that learns location-discriminative embeddings in which ground-level images are matched with their corresponding satellite view of the scene. We use this learned function as an observation model in a filtering framework to maintain a distribution over the vehicle's pose. We evaluate our method on different benchmark datasets and demonstrate its ability localize ground-level images in environments novel relative to training, despite the challenges of significant viewpoint and appearance variations.

* To be published in IEEE International Conference on Robotics and Automation (ICRA), 2017 

  Click for Model/Code and Paper
Baseline CNN structure analysis for facial expression recognition

Nov 14, 2016
Minchul Shin, Munsang Kim, Dong-Soo Kwon

We present a baseline convolutional neural network (CNN) structure and image preprocessing methodology to improve facial expression recognition algorithm using CNN. To analyze the most efficient network structure, we investigated four network structures that are known to show good performance in facial expression recognition. Moreover, we also investigated the effect of input image preprocessing methods. Five types of data input (raw, histogram equalization, isotropic smoothing, diffusion-based normalization, difference of Gaussian) were tested, and the accuracy was compared. We trained 20 different CNN models (4 networks x 5 data input types) and verified the performance of each network with test images from five different databases. The experiment result showed that a three-layer structure consisting of a simple convolutional and a max pooling layer with histogram equalization image input was the most efficient. We describe the detailed training procedure and analyze the result of the test accuracy based on considerable observation.

* 6 pages, RO-MAN2016 Conference 

  Click for Model/Code and Paper
Meta Continual Learning

Jun 11, 2018
Risto Vuorio, Dong-Yeon Cho, Daejoong Kim, Jiwon Kim

Using neural networks in practical settings would benefit from the ability of the networks to learn new tasks throughout their lifetimes without forgetting the previous tasks. This ability is limited in the current deep neural networks by a problem called catastrophic forgetting, where training on new tasks tends to severely degrade performance on previous tasks. One way to lessen the impact of the forgetting problem is to constrain parameters that are important to previous tasks to stay close to the optimal parameters. Recently, multiple competitive approaches for computing the importance of the parameters with respect to the previous tasks have been presented. In this paper, we propose a learning to optimize algorithm for mitigating catastrophic forgetting. Instead of trying to formulate a new constraint function ourselves, we propose to train another neural network to predict parameter update steps that respect the importance of parameters to the previous tasks. In the proposed meta-training scheme, the update predictor is trained to minimize loss on a combination of current and past tasks. We show experimentally that the proposed approach works in the continual learning setting.

  Click for Model/Code and Paper
Decoding Neural Responses in Mouse Visual Cortex through a Deep Neural Network

Oct 26, 2019
Asim Iqbal, Phil Dong, Christopher M Kim, Heeun Jang

Finding a code to unravel the population of neural responses that leads to a distinct animal behavior has been a long-standing question in the field of neuroscience. With the recent advances in machine learning, it is shown that the hierarchically Deep Neural Networks (DNNs) perform optimally in decoding unique features out of complex datasets. In this study, we utilize the power of a DNN to explore the computational principles in the mammalian brain by exploiting the Neuropixel data from Allen Brain Institute. We decode the neural responses from mouse visual cortex to predict the presented stimuli to the animal for natural (bear, trees, cheetah, etc.) and artificial (drifted gratings, orientated bars, etc.) classes. Our results indicate that neurons in mouse visual cortex encode the features of natural and artificial objects in a distinct manner, and such neural code is consistent across animals. We investigate this by applying transfer learning to train a DNN on the neural responses of a single animal and test its generalized performance across multiple animals. Within a single animal, DNN is able to decode the neural responses with as much as 100% classification accuracy. Across animals, this accuracy is reduced to 91%. This study demonstrates the potential of utilizing the DNN models as a computational framework to understand the neural coding principles in the mammalian brain.

* 2019 International Joint Conference on Neural Networks (IJCNN). IEEE, 2019 

  Click for Model/Code and Paper
Image Captioning with Very Scarce Supervised Data: Adversarial Semi-Supervised Learning Approach

Sep 05, 2019
Dong-Jin Kim, Jinsoo Choi, Tae-Hyun Oh, In So Kweon

Constructing an organized dataset comprised of a large number of images and several captions for each image is a laborious task, which requires vast human effort. On the other hand, collecting a large number of images and sentences separately may be immensely easier. In this paper, we develop a novel data-efficient semi-supervised framework for training an image captioning model. We leverage massive unpaired image and caption data by learning to associate them. To this end, our proposed semi-supervised learning method assigns pseudo-labels to unpaired samples via Generative Adversarial Networks to learn the joint distribution of image and caption. To evaluate, we construct scarcely-paired COCO dataset, a modified version of MS COCO caption dataset. The empirical results show the effectiveness of our method compared to several strong baselines, especially when the amount of the paired samples are scarce.

* EMNLP 2019 

  Click for Model/Code and Paper
GRDN:Grouped Residual Dense Network for Real Image Denoising and GAN-based Real-world Noise Modeling

May 27, 2019
Dong-Wook Kim, Jae Ryun Chung, Seung-Won Jung

Recent research on image denoising has progressed with the development of deep learning architectures, especially convolutional neural networks. However, real-world image denoising is still very challenging because it is not possible to obtain ideal pairs of ground-truth images and real-world noisy images. Owing to the recent release of benchmark datasets, the interest of the image denoising community is now moving toward the real-world denoising problem. In this paper, we propose a grouped residual dense network (GRDN), which is an extended and generalized architecture of the state-of-the-art residual dense network (RDN). The core part of RDN is defined as grouped residual dense block (GRDB) and used as a building module of GRDN. We experimentally show that the image denoising performance can be significantly improved by cascading GRDBs. In addition to the network architecture design, we also develop a new generative adversarial network-based real-world noise modeling method. We demonstrate the superiority of the proposed methods by achieving the highest score in terms of both the peak signal-to-noise ratio and the structural similarity in the NTIRE2019 Real Image Denoising Challenge - Track 2:sRGB.

* To appear in CVPR 2019 workshop. The winners of the NTIRE2019 Challenge on Image Denoising Challenge: Track 2 sRGB 

  Click for Model/Code and Paper
Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning

Apr 01, 2019
Dong-Jin Kim, Jinsoo Choi, Tae-Hyun Oh, In So Kweon

Our goal in this work is to train an image captioning model that generates more dense and informative captions. We introduce "relational captioning," a novel image captioning task which aims to generate multiple captions with respect to relational information between objects in an image. Relational captioning is a framework that is advantageous in both diversity and amount of information, leading to image understanding based on relationships. Part-of speech (POS, i.e. subject-object-predicate categories) tags can be assigned to every English word. We leverage the POS as a prior to guide the correct sequence of words in a caption. To this end, we propose a multi-task triple-stream network (MTTSNet) which consists of three recurrent units for the respective POS and jointly performs POS prediction and captioning. We demonstrate more diverse and richer representations generated by the proposed model against several baselines and competing methods.

* CVPR 2019. Project page : 

  Click for Model/Code and Paper
Policy Distillation and Value Matching in Multiagent Reinforcement Learning

Mar 15, 2019
Samir Wadhwania, Dong-Ki Kim, Shayegan Omidshafiei, Jonathan P. How

Multiagent reinforcement learning algorithms (MARL) have been demonstrated on complex tasks that require the coordination of a team of multiple agents to complete. Existing works have focused on sharing information between agents via centralized critics to stabilize learning or through communication to increase performance, but do not generally look at how information can be shared between agents to address the curse of dimensionality in MARL. We posit that a multiagent problem can be decomposed into a multi-task problem where each agent explores a subset of the state space instead of exploring the entire state space. This paper introduces a multiagent actor-critic algorithm and method for combining knowledge from homogeneous agents through distillation and value-matching that outperforms policy distillation alone and allows further learning in both discrete and continuous action spaces.

* Submitted as a conference paper to IROS 2019 

  Click for Model/Code and Paper
Crossmodal Attentive Skill Learner

May 22, 2018
Shayegan Omidshafiei, Dong-Ki Kim, Jason Pazis, Jonathan P. How

This paper presents the Crossmodal Attentive Skill Learner (CASL), integrated with the recently-introduced Asynchronous Advantage Option-Critic (A2OC) architecture [Harb et al., 2017] to enable hierarchical reinforcement learning across multiple sensory inputs. We provide concrete examples where the approach not only improves performance in a single task, but accelerates transfer to new tasks. We demonstrate the attention mechanism anticipates and identifies useful latent features, while filtering irrelevant sensor modalities during execution. We modify the Arcade Learning Environment [Bellemare et al., 2013] to support audio queries, and conduct evaluations of crossmodal learning in the Atari 2600 game Amidar. Finally, building on the recent work of Babaeizadeh et al. [2017], we open-source a fast hybrid CPU-GPU implementation of CASL.

* International Conference on Autonomous Agents and Multiagent Systems (AAMAS) 2018, NIPS 2017 Deep Reinforcement Learning Symposium 

  Click for Model/Code and Paper
HMM Specialization with Selective Lexicalization

Dec 23, 1999
Jin-Dong Kim, Sang-Zoo Lee, Hae-Chang Rim

We present a technique which complements Hidden Markov Models by incorporating some lexicalized states representing syntactically uncommon words. Our approach examines the distribution of transitions, selects the uncommon words, and makes lexicalized states for the words. We performed a part-of-speech tagging experiment on the Brown corpus to evaluate the resultant language model and discovered that this technique improved the tagging accuracy by 0.21% at the 95% level of confidence.

* Proceedings of the 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pp.121-127, 1999 
* 7 pages, 6 figures 

  Click for Model/Code and Paper
Multi-Temporal Recurrent Neural Networks For Progressive Non-Uniform Single Image Deblurring With Incremental Temporal Training

Nov 18, 2019
Dongwon Park, Dong Un Kang, Jisoo Kim, Se Young Chun

Multi-scale (MS) approaches have been widely investigated for blind single image / video deblurring that sequentially recovers deblurred images in low spatial scale first and then in high spatial scale later with the output of lower scales. MS approaches have been effective especially for severe blurs induced by large motions in high spatial scale since those can be seen as small blurs in low spatial scale. In this work, we investigate alternative approach to MS, called multi-temporal (MT) approach, for non-uniform single image deblurring. We propose incremental temporal training with constructed MT level dataset from time-resolved dataset, develop novel MT-RNNs with recurrent feature maps, and investigate progressive single image deblurring over iterations. Our proposed MT methods outperform state-of-the-art MS methods on the GoPro dataset in PSNR with the smallest number of parameters.

* 10 pages, 8 figures, 6 tables, work in progress 

  Click for Model/Code and Paper
Discovery of Natural Language Concepts in Individual Units of CNNs

Feb 28, 2019
Seil Na, Yo Joong Choe, Dong-Hyun Lee, Gunhee Kim

Although deep convolutional networks have achieved improved performance in many natural language tasks, they have been treated as black boxes because they are difficult to interpret. Especially, little is known about how they represent language in their intermediate layers. In an attempt to understand the representations of deep convolutional networks trained on language tasks, we show that individual units are selectively responsive to specific morphemes, words, and phrases, rather than responding to arbitrary and uninterpretable patterns. In order to quantitatively analyze such an intriguing phenomenon, we propose a concept alignment method based on how units respond to the replicated text. We conduct analyses with different architectures on multiple datasets for classification and translation tasks and provide new insights into how deep models understand natural language.

* Published as a conference paper at ICLR 2019 

  Click for Model/Code and Paper
Lightweight 3D Human Pose Estimation Network Training Using Teacher-Student Learning

Jan 15, 2020
Dong-Hyun Hwang, Suntae Kim, Nicolas Monet, Hideki Koike, Soonmin Bae

We present MoVNect, a lightweight deep neural network to capture 3D human pose using a single RGB camera. To improve the overall performance of the model, we apply the teacher-student learning method based knowledge distillation to 3D human pose estimation. Real-time post-processing makes the CNN output yield temporally stable 3D skeletal information, which can be used in applications directly. We implement a 3D avatar application running on mobile in real-time to demonstrate that our network achieves both high accuracy and fast inference time. Extensive evaluations show the advantages of our lightweight model with the proposed training method over previous 3D pose estimation methods on the Human3.6M dataset and mobile devices.

* 10 pages 

  Click for Model/Code and Paper
Planning for target retrieval using a robotic manipulator in cluttered and occluded environments

Jul 09, 2019
Changjoo Nam, Jinhwi Lee, Younggil Cho, Jeongho Lee, Dong Hwan Kim, ChangHwan Kim

This paper presents planning algorithms for a robotic manipulator with a fixed base in order to grasp a target object in cluttered environments. We consider a configuration of objects in a confined space with a high density so no collision-free path to the target exists. The robot must relocate some objects to retrieve the target while avoiding collisions. For fast completion of the retrieval task, the robot needs to compute a plan optimizing an appropriate objective value directly related to the execution time of the relocation plan. We propose planning algorithms that aim to minimize the number of objects to be relocated. Our objective value is appropriate for the object retrieval task because grasping and releasing objects often dominate the total running time. In addition to the algorithm working in fully known and static environments, we propose algorithms that can deal with uncertain and dynamic situations incurred by occluded views. The proposed algorithms are shown to be complete and run in polynomial time. Our methods reduce the total running time significantly compared to a baseline method (e.g., 25.1% of reduction in a known static environment with 10 objects

* 8 pages, 14 figures 

  Click for Model/Code and Paper
Discriminative Few-Shot Learning Based on Directional Statistics

Jun 05, 2019
Junyoung Park, Subin Yi, Yongseok Choi, Dong-Yeon Cho, Jiwon Kim

Metric-based few-shot learning methods try to overcome the difficulty due to the lack of training examples by learning embedding to make comparison easy. We propose a novel algorithm to generate class representatives for few-shot classification tasks. As a probabilistic model for learned features of inputs, we consider a mixture of von Mises-Fisher distributions which is known to be more expressive than Gaussian in a high dimensional space. Then, from a discriminative classifier perspective, we get a better class representative considering inter-class correlation which has not been addressed by conventional few-shot learning algorithms. We apply our method to \emph{mini}ImageNet and \emph{tiered}ImageNet datasets, and show that the proposed approach outperforms other comparable methods in few-shot classification tasks.

  Click for Model/Code and Paper
End-to-end Lung Nodule Detection in Computed Tomography

Oct 02, 2018
Dufan Wu, Kyungsang Kim, Bin Dong, Georges El Fakhri, Quanzheng Li

Computer aided diagnostic (CAD) system is crucial for modern med-ical imaging. But almost all CAD systems operate on reconstructed images, which were optimized for radiologists. Computer vision can capture features that is subtle to human observers, so it is desirable to design a CAD system op-erating on the raw data. In this paper, we proposed a deep-neural-network-based detection system for lung nodule detection in computed tomography (CT). A primal-dual-type deep reconstruction network was applied first to convert the raw data to the image space, followed by a 3-dimensional convolutional neural network (3D-CNN) for the nodule detection. For efficient network training, the deep reconstruction network and the CNN detector was trained sequentially first, then followed by one epoch of end-to-end fine tuning. The method was evaluated on the Lung Image Database Consortium image collection (LIDC-IDRI) with simulated forward projections. With 144 multi-slice fanbeam pro-jections, the proposed end-to-end detector could achieve comparable sensitivity with the reference detector, which was trained and applied on the fully-sampled image data. It also demonstrated superior detection performance compared to detectors trained on the reconstructed images. The proposed method is general and could be expanded to most detection tasks in medical imaging.

* published at MLMI 2018 

  Click for Model/Code and Paper