Models, code, and papers for "Dongming Lu":
Recent studies using deep neural networks have shown remarkable success in style transfer especially for artistic and photo-realistic images. However, the approaches using global feature correlations fail to capture small, intricate textures and maintain correct texture scales of the artworks, and the approaches based on local patches are defective on global effect. In this paper, we present a novel feature pyramid fusion neural network, dubbed GLStyleNet, which sufficiently takes into consideration multi-scale and multi-level pyramid features by best aggregating layers across a VGG network, and performs style transfer hierarchically with multiple losses of different scales. Our proposed method retains high-frequency pixel information and low frequency construct information of images from two aspects: loss function constraint and feature fusion. Our approach is not only flexible to adjust the trade-off between content and style, but also controllable between global and local. Compared to state-of-the-art methods, our method can transfer not just large-scale, obvious style cues but also subtle, exquisite ones, and dramatically improves the quality of style transfer. We demonstrate the effectiveness of our approach on portrait style transfer, artistic style transfer, photo-realistic style transfer and Chinese ancient painting style transfer tasks. Experimental results indicate that our unified approach improves image style transfer quality over previous state-of-the-art methods, while also accelerating the whole process in a certain extent. Our code is available at https://github.com/EndyWon/GLStyleNet.
Unsupervised neural nets such as Restricted Boltzmann Machines(RBMs) and Deep Belif Networks(DBNs), are powerful in automatic feature extraction,unsupervised weight initialization and density estimation. In this paper,we demonstrate that the parameters of these neural nets can be dramatically reduced without affecting their performance. We describe a method to reduce the parameters required by RBM which is the basic building block for deep architectures. Further we propose an unsupervised sparse deep architectures selection algorithm to form sparse deep neural networks.Experimental results show that there is virtually no loss in either generative or discriminative performance.
Image style transfer is an underdetermined problem, where a large number of solutions can explain the same constraint (i.e., the content and style). Most current methods always produce visually identical outputs, which lack of diversity. Recently, some methods have introduced an alternative diversity loss to train the feed-forward networks for diverse outputs, but they still suffer from many issues. In this paper, we propose a simple yet effective method for diversified style transfer. Our method can produce diverse outputs for arbitrary styles by incorporating the whitening and coloring transforms (WCT) with a novel deep feature perturbation (DFP) operation, which uses an orthogonal random noise matrix to perturb the deep image features while keeping the original style information unchanged. In addition, our method is learning-free and could be easily integrated into many existing WCT-based methods and empower them to generate diverse results. Experimental results demonstrate that our method can greatly increase the diversity while maintaining the quality of stylization. And several new user studies show that users could obtain more satisfactory results through the diversified approaches based on our method.
We study the data space $D$ of any given data set $X$ and explain how functions and relations are defined over $D$. From $D$ and for a specific domain $\Delta$ we construct the information space $I$ of $X$ by interpreting variables, functions, and explicit relations over $D$ in $\Delta$ and by including other relations that $D$ implies under the interpretation in $\Delta$. Then from $I$ we build up the knowledge space $K$ of $X$ as the product of two spaces $K_T$ and $K_P$, where $K_T$ is obtained from $I$ by using the induction principle to generalize propositional relations to quantified relations, the deduction principle to generate new relations, and standard mechanisms to validate relations and $K_P$ is the space of specifications of methods with operational instructions which are valid in $K_T$. Through our construction of the three topological spaces the following key observation is made clear: the retrieval of information from the given data set for $\Delta$ consists essentially in mining domain objects and relations, and the discovery of knowledge from the retrieved information consists essentially in applying the induction and deduction principles to generate propositions, synthesizing and modeling the information to generate specifications of methods with operational instructions, and validating the propositions and specifications. Based on this observation, efficient approaches may be designed to discover profound knowledge automatically from simple data, as demonstrated by the result of our study in the case of geometry.
This paper describes a multimodal vision sensor that integrates three types of cameras, including a stereo camera, a polarization camera and a panoramic camera. Each sensor provides a specific dimension of information: the stereo camera measures depth per pixel, the polarization obtains the degree of polarization, and the panoramic camera captures a 360-degree landscape. Data fusion and advanced environment perception could be built upon the combination of sensors. Designed especially for autonomous driving, this vision sensor is shipped with a robust semantic segmentation network. In addition, we demonstrate how cross-modal enhancement could be achieved by registering the color image and the polarization image. An example of water hazard detection is given. To prove the multimodal vision sensor's compatibility with different devices, a brief runtime performance analysis is carried out.
We propose an approach to generate geometric theorems from electronic images of diagrams automatically. The approach makes use of techniques of Hough transform to recognize geometric objects and their labels and of numeric verification to mine basic geometric relations. Candidate propositions are generated from the retrieved information by using six strategies and geometric theorems are obtained from the candidates via algebraic computation. Experiments with a preliminary implementation illustrate the effectiveness and efficiency of the proposed approach for generating nontrivial theorems from images of diagrams. This work demonstrates the feasibility of automated discovery of profound geometric knowledge from simple image data and has potential applications in geometric knowledge management and education.
Since detecting and recognizing individual human or object are not adequate to understand the visual world, learning how humans interact with surrounding objects becomes a core technology. However, convolution operations are weak in depicting visual interactions between the instances since they only build blocks that process one local neighborhood at a time. To address this problem, we learn from human perception in observing HOIs to introduce a two-stage trainable reasoning mechanism, referred to as GID block. GID block breaks through the local neighborhoods and captures long-range dependency of pixels both in global-level and instance-level from the scene to help detecting interactions between instances. Furthermore, we conduct a multi-stream network called GID-Net, which is a human-object interaction detection framework consisting of a human branch, an object branch and an interaction branch. Semantic information in global-level and local-level are efficiently reasoned and aggregated in each of the branches. We have compared our proposed GID-Net with existing state-of-the-art methods on two public benchmarks, including V-COCO and HICO-DET. The results have showed that GID-Net outperforms the existing best-performing methods on both the above two benchmarks, validating its efficacy in detecting human-object interactions.
Recently, significant progresses have been made in object detection on common benchmarks (i.e., Pascal VOC). However, object detection in real world is still challenging due to the serious data imbalance. Images in real world are dominated by easy samples like the wide range of background and some easily recognizable objects, for example. Although two-stage detectors like Faster R-CNN achieved big successes in object detection due to the strategy of extracting region proposals by region proposal network, they show their poor adaption in real-world object detection as a result of without considering mining hard samples during extracting region proposals. To address this issue, we propose a Cascade framework of Region Proposal Networks, referred to as C-RPNs. The essence of C-RPNs is adopting multiple stages to mine hard samples while extracting region proposals and learn stronger classifiers. Meanwhile, a feature chain and a score chain are proposed to help learning more discriminative representations for proposals. Moreover, a loss function of cascade stages is designed to train cascade classifiers through backpropagation. Our proposed method has been evaluated on Pascal VOC and several challenging datasets like BSBDV 2017, CityPersons, etc. Our method achieves competitive results compared with the current state-of-the-arts and all-sided improvements in error analysis, validating its efficacy for detection in real world.
Panoramic images have advantages in information capacity and scene stability due to their large field of view (FoV). In this paper, we propose a method to synthesize a new dataset of panoramic image. We managed to stitch the images taken from different directions into panoramic images, together with their labeled images, to yield the panoramic semantic segmentation dataset denominated as SYNTHIA-PANO. For the purpose of finding out the effect of using panoramic images as training dataset, we designed and performed a comprehensive set of experiments. Experimental results show that using panoramic images as training data is beneficial to the segmentation result. In addition, it has been shown that by using panoramic images with a 180 degree FoV as training data the model has better performance. Furthermore, the model trained with panoramic images also has a better capacity to resist the image distortion.
Brain magnetic resonance (MR) segmentation for hydrocephalus patients is considered as a challenging work. Encoding the variation of the brain anatomical structures from different individuals cannot be easily achieved. The task becomes even more difficult especially when the image data from hydrocephalus patients are considered, which often have large deformations and differ significantly from the normal subjects. Here, we propose a novel strategy with hard and soft attention modules to solve the segmentation problems for hydrocephalus MR images. Our main contributions are three-fold: 1) the hard-attention module generates coarse segmentation map using multi-atlas-based method and the VoxelMorph tool, which guides subsequent segmentation process and improves its robustness; 2) the soft-attention module incorporates position attention to capture precise context information, which further improves the segmentation accuracy; 3) we validate our method by segmenting insula, thalamus and many other regions-of-interests (ROIs) that are critical to quantify brain MR images of hydrocephalus patients in real clinical scenario. The proposed method achieves much improved robustness and accuracy when segmenting all 17 consciousness-related ROIs with high variations for different subjects. To the best of our knowledge, this is the first work to employ deep learning for solving the brain segmentation problems of hydrocephalus patients.
Objective: Deformable brain MR image registration is challenging due to large inter-subject anatomical variation. For example, the highly complex cortical folding pattern makes it hard to accurately align corresponding cortical structures of individual images. In this paper, we propose a novel deep learning way to simplify the difficult registration problem of brain MR images. Methods: We train a morphological simplification network (MS-Net), which can generate a "simple" image with less anatomical details based on the "complex" input. With MS-Net, the complexity of the fixed image or the moving image under registration can be reduced gradually, thus building an individual (simplification) trajectory represented by MS-Net outputs. Since the generated images at the ends of the two trajectories (of the fixed and moving images) are so simple and very similar in appearance, they are easy to register. Thus, the two trajectories can act as a bridge to link the fixed and the moving images, and guide their registration. Results: Our experiments show that the proposed method can achieve highly accurate registration performance on different datasets (i.e., NIREP, LPBA, IBSR, CUMC, and MGH). Moreover, the method can be also easily transferred across diverse image datasets and obtain superior accuracy on surface alignment. Conclusion and Significance: We propose MS-Net as a powerful and flexible tool to simplify brain MR images and their registration. To our knowledge, this is the first work to simplify brain MR image registration by deep learning, instead of estimating deformation field directly.
Taxonomies are of great value to many knowledge-rich applications. As the manual taxonomy curation costs enormous human effects, automatic taxonomy construction is in great demand. However, most existing automatic taxonomy construction methods can only build hypernymy taxonomies wherein each edge is limited to expressing the "is-a" relation. Such a restriction limits their applicability to more diverse real-world tasks where the parent-child may carry different relations. In this paper, we aim to construct a task-guided taxonomy from a domain-specific corpus and allow users to input a "seed" taxonomy, serving as the task guidance. We propose an expansion-based taxonomy construction framework, namely HiExpan, which automatically generates key term list from the corpus and iteratively grows the seed taxonomy. Specifically, HiExpan views all children under each taxonomy node forming a coherent set and builds the taxonomy by recursively expanding all these sets. Furthermore, HiExpan incorporates a weakly-supervised relation extraction module to extract the initial children of a newly-expanded node and adjusts the taxonomy tree by optimizing its global structure. Our experiments on three real datasets from different domains demonstrate the effectiveness of HiExpan for building task-guided taxonomies.
Visual localization is an attractive problem that estimates the camera localization from database images based on the query image. It is a crucial task for various applications, such as autonomous vehicles, assistive navigation and augmented reality. The challenging issues of the task lie in various appearance variations between query and database images, including illumination variations, season variations, dynamic object variations and viewpoint variations. In order to tackle those challenges, Panoramic Annular Localizer into which panoramic annular lens and robust deep image descriptors are incorporated is proposed in this paper. The panoramic annular images captured by the single camera are processed and fed into the NetVLAD network to form the active deep descriptor, and sequential matching is utilized to generate the localization result. The experiments carried on the public datasets and in the field illustrate the validation of the proposed system.
Thermal ablation is a minimally invasive procedure for treat-ing small or unresectable tumors. Although CT is widely used for guiding ablation procedures, the contrast of tumors against surrounding normal tissues in CT images is often poor, aggravating the difficulty in accurate thermal ablation. In this paper, we propose a fast MR-CT image registration method to overlay a pre-procedural MR (pMR) image onto an intra-procedural CT (iCT) image for guiding the thermal ablation of liver tumors. By first using a Cycle-GAN model with mutual information constraint to generate synthesized CT (sCT) image from the cor-responding pMR, pre-procedural MR-CT image registration is carried out through traditional mono-modality CT-CT image registration. At the intra-procedural stage, a partial-convolution-based network is first used to inpaint the probe and its artifacts in the iCT image. Then, an unsupervised registration network is used to efficiently align the pre-procedural CT (pCT) with the inpainted iCT (inpCT) image. The final transformation from pMR to iCT is obtained by combining the two estimated transformations,i.e., (1) from the pMR image space to the pCT image space (through sCT) and (2) from the pCT image space to the iCT image space (through inpCT). Experimental results confirm that the proposed method achieves high registration accuracy with a very fast computational speed.