Models, code, and papers for "Hongwei Fan":
With the development of deep learning, the structure of convolution neural network is becoming more and more complex and the performance of object recognition is getting better. However, the classification mechanism of convolution neural networks is still an unsolved core problem. The main problem is that convolution neural networks have too many parameters, which makes it difficult to analyze them. In this paper, we design and train a convolution neural network based on the expression recognition, and explore the classification mechanism of the network. By using the Deconvolution visualization method, the extremum point of the convolution neural network is projected back to the pixel space of the original image, and we qualitatively verify that the trained expression recognition convolution neural network forms a detector for the specific facial action unit. At the same time, we design the distance function to measure the distance between the presence of facial feature unit and the maximal value of the response on the feature map of convolution neural network. The greater the distance, the more sensitive the feature map is to the facial feature unit. By comparing the maximum distance of all facial feature elements in the feature graph, the mapping relationship between facial feature element and convolution neural network feature map is determined. Therefore, we have verified that the convolution neural network has formed a detector for the facial Action unit in the training process to realize the expression recognition.
In this paper, we study automatic question generation, the task of creating questions from corresponding text passages where some certain spans of the text can serve as the answers. We propose an Extended Answer-aware Network (EAN) which is trained with Word-based Coverage Mechanism (WCM) and decodes with Uncertainty-aware Beam Search (UBS). The EAN represents the target answer by its surrounding sentence with an encoder, and incorporates the information of the extended answer into paragraph representation with gated paragraph-to-answer attention to tackle the problem of the inadequate representation of the target answer. To reduce undesirable repetition, the WCM penalizes repeatedly attending to the same words at different time-steps in the training stage. The UBS aims to seek a better balance between the model confidence in copying words from an input text paragraph and the confidence in generating words from a vocabulary. We conduct experiments on the SQuAD dataset, and the results show our approach achieves significant performance improvement.
Online hand gesture recognition (HGR) techniques are essential in augmented reality (AR) applications for enabling natural human-to-computer interaction and communication. In recent years, the consumer market for low-cost AR devices has been rapidly growing, while the technology maturity in this domain is still limited. Those devices are typical of low prices, limited memory, and resource-constrained computational units, which makes online HGR a challenging problem. To tackle this problem, we propose a lightweight and computationally efficient HGR framework, namely LE-HGR, to enable real-time gesture recognition on embedded devices with low computing power. We also show that the proposed method is of high accuracy and robustness, which is able to reach high-end performance in a variety of complicated interaction environments. To achieve our goal, we first propose a cascaded multi-task convolutional neural network (CNN) to simultaneously predict probabilities of hand detection and regress hand keypoint locations online. We show that, with the proposed cascaded architecture design, false-positive estimates can be largely eliminated. Additionally, an associated mapping approach is introduced to track the hand trace via the predicted locations, which addresses the interference of multi-handedness. Subsequently, we propose a trace sequence neural network (TraceSeqNN) to recognize the hand gesture by exploiting the motion features of the tracked trace. Finally, we provide a variety of experimental results to show that the proposed framework is able to achieve state-of-the-art accuracy with significantly reduced computational cost, which are the key properties for enabling real-time applications in low-cost commercial devices such as mobile devices and AR/VR headsets.
Due to computational and storage efficiencies of compact binary codes, hashing has been widely used for large-scale similarity search. Unfortunately, many existing hashing methods based on observed keyword features are not effective for short texts due to the sparseness and shortness. Recently, some researchers try to utilize latent topics of certain granularity to preserve semantic similarity in hash codes beyond keyword matching. However, topics of certain granularity are not adequate to represent the intrinsic semantic information. In this paper, we present a novel unified approach for short text Hashing using Multi-granularity Topics and Tags, dubbed HMTT. In particular, we propose a selection method to choose the optimal multi-granularity topics depending on the type of dataset, and design two distinct hashing strategies to incorporate multi-granularity topics. We also propose a simple and effective method to exploit tags to enhance the similarity of related texts. We carry out extensive experiments on one short text dataset as well as on one normal text dataset. The results demonstrate that our approach is effective and significantly outperforms baselines on several evaluation metrics.
With the rapid growth of knowledge, it shows a steady trend of knowledge fragmentization. Knowledge fragmentization manifests as that the knowledge related to a specific topic in a course is scattered in isolated and autonomous knowledge sources. We term the knowledge of a facet in a specific topic as a knowledge fragment. The problem of knowledge fragmentization brings two challenges: First, knowledge is scattered in various knowledge sources, which exerts users' considerable efforts to search for the knowledge of their interested topics, thereby leading to information overload. Second, learning dependencies which refer to the precedence relationships between topics in the learning process are concealed by the isolation and autonomy of knowledge sources, thus causing learning disorientation. To solve the knowledge fragmentization problem, we propose a novel knowledge organization model, knowledge forest, which consists of facet trees and learning dependencies. Facet trees can organize knowledge fragments with facet hyponymy to alleviate information overload. Learning dependencies can organize disordered topics to cope with learning disorientation. We conduct extensive experiments on three manually constructed datasets from the Data Structure, Data Mining, and Computer Network courses, and the experimental results show that knowledge forest can effectively organize knowledge fragments, and alleviate information overload and learning disorientation.
To promote the development of underwater robot picking in sea farms, we propose an underwater open-sea farm object detection dataset called UDD. Concretely, UDD consists of 3 categories (seacucumber, seaurchin, and scallop) with 2227 images. To the best of our knowledge, it's the first dataset collected in a real open-sea farm for underwater robot picking and we also propose a novel Poisson-blending-embedded Generative Adversarial Network (Poisson GAN) to overcome the class-imbalance and massive small objects issues in UDD. By utilizing Poisson GAN to change the number, position, even size of objects in UDD, we construct a large scale augmented dataset (AUDD) containing 18K images. Besides, in order to make the detector better adapted to the underwater picking environment, a dataset (Pre-trained dataset) for pre-training containing 590K images is also proposed. Finally, we design a lightweight network (UnderwaterNet) to address the problems that detecting small objects from cloudy underwater pictures and meeting the efficiency requirements in robots. Specifically, we design a depth-wise-convolution-based Multi-scale Contextual Features Fusion (MFF) block and a Multi-scale Blursampling (MBP) module to reduce the parameters of the network to 1.3M at 48FPS, without any loss on accuracy. Extensive experiments verify the effectiveness of the proposed UnderwaterNet, Poisson GAN, UDD, AUDD, and Pre-trained datasets.