Models, code, and papers for "Chengyao Chen":
To sustain engaging conversation, it is critical for chatbots to make good use of relevant knowledge. Equipped with a knowledge base, chatbots are able to extract conversation-related attributes and entities to facilitate context modeling and response generation. In this work, we distinguish the uses of attributes and entities and incorporate them into the encoder-decoder architecture in different ways. Based on the augmented architecture, our chatbot, named Mike, is able to generate responses by referring to proper entities from the collected knowledge. To validate the proposed approach, we build a movie conversation corpus on which the proposed approach significantly outperforms four other knowledge-grounded models.
Panoramic video is video recorded from a single viewpoint that captures the full scene. With the development of video surveillance and the demand for 3D converged video surveillance in smart cities, CPUs and GPUs need strong processing capabilities to produce panoramic video. Traditional panoramic products depend on post-processing, which results in high power consumption, low stability, and unsatisfactory real-time performance. To solve these problems, we propose a real-time panoramic video stitching framework. The framework mainly consists of three algorithms: the L-ORB image feature extraction algorithm, a feature point matching algorithm based on LSH, and a GPU parallel video stitching algorithm based on CUDA. The experimental results show that the proposed algorithm improves performance in the feature extraction and matching stages of image stitching, with a running speed 11 times that of the traditional ORB algorithm and 639 times that of the traditional SIFT algorithm. Based on an analysis of the GPU resource occupancy rate for image stitching at each resolution, we further propose a stream parallel strategy to maximize the utilization of GPU resources. Compared with the L-ORB algorithm, this strategy improves efficiency by 1.6-2.5 times and makes full use of GPU resources. The performance of the system built in this paper is 29.2 times that of the previous embedded system, while power dissipation is reduced to 10 W.
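The abstract does not say which LSH scheme is used for feature point matching. As a rough illustration only, the sketch below uses bit-sampling LSH, a standard family for Hamming space, on binary descriptors of the kind ORB produces. Descriptors are modelled as plain Python integers, and all function names and parameters (`build_tables`, `query`, `n_tables`, `bits_per_key`) are hypothetical, not taken from the paper.

```python
import random

def hamming(a, b):
    """Number of differing bits between two integer-encoded descriptors."""
    return bin(a ^ b).count("1")

def bit_sample_key(descriptor, bit_positions):
    """Hash key: the descriptor's bits at a fixed sample of positions."""
    return tuple((descriptor >> p) & 1 for p in bit_positions)

def build_tables(descriptors, n_bits, n_tables=4, bits_per_key=8, seed=0):
    """Build several hash tables, each keyed on a different random bit sample."""
    rng = random.Random(seed)
    tables = []
    for _ in range(n_tables):
        positions = rng.sample(range(n_bits), bits_per_key)
        buckets = {}
        for idx, d in enumerate(descriptors):
            buckets.setdefault(bit_sample_key(d, positions), []).append(idx)
        tables.append((positions, buckets))
    return tables

def query(tables, descriptors, q, max_distance):
    """Return the index of the closest bucket-mate of q, or None if none is close enough."""
    candidates = set()
    for positions, buckets in tables:
        candidates.update(buckets.get(bit_sample_key(q, positions), []))
    if not candidates:
        return None
    best = min(candidates, key=lambda i: hamming(descriptors[i], q))
    return best if hamming(descriptors[best], q) <= max_distance else None

# Toy 32-bit descriptors (assumed values, for illustration only).
descriptors = [0x0F0F0F0F, 0xFFFF0000, 0x12345678, 0xDEADBEEF]
tables = build_tables(descriptors, n_bits=32)
match = query(tables, descriptors, 0x12345678, max_distance=0)
```

Only descriptors that share a bucket with the query are compared, which is where the saving over brute-force matching comes from; near-duplicates that differ in a sampled bit can be missed by one table, hence the use of several tables.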
The development of summarization research has been significantly hampered by the costly acquisition of reference summaries. This paper proposes an effective way to automatically collect large-scale sets of news-related multi-document summaries with reference to social media reactions. We utilize two types of social labels in tweets, i.e., hashtags and hyperlinks. Hashtags are used to cluster documents into different topic sets. In addition, a tweet with a hyperlink often highlights certain key points of the corresponding document. We synthesize a linked document cluster to form a reference summary that covers most key points. To this end, we adopt the ROUGE metrics to measure the coverage ratio, and develop an Integer Linear Programming solution to discover the sentence set reaching the upper bound of ROUGE. Since we allow summary sentences to be selected from both documents and high-quality tweets, the generated reference summaries can be abstractive. Both the informativeness and readability of the collected summaries are verified by manual judgment. In addition, we train a Support Vector Regression summarizer on DUC generic multi-document summarization benchmarks. With the collected data as an extra training resource, the performance of the summarizer improves considerably on all the test sets. We release this dataset for further research.
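The selection step described above, choosing the sentence set that best covers the key points highlighted by tweets, can be sketched on a toy scale. The example below is not the paper's Integer Linear Programming formulation: it swaps in an exhaustive search over subsets and a simplified ROUGE-1-style recall over distinct words. The function names, the word budget, and the sample sentences are all assumptions made for illustration.

```python
from itertools import combinations

def tokens(text):
    """Lowercased whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()

def coverage(selected, key_points):
    """Fraction of distinct key-point words that appear in the selected sentences."""
    target = {w for t in key_points for w in tokens(t)}
    if not target:
        return 0.0
    covered = {w for s in selected for w in tokens(s)} & target
    return len(covered) / len(target)

def best_summary(candidates, key_points, max_words):
    """Exhaustively pick the subset with the highest coverage within a word budget."""
    best, best_score = (), 0.0
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            if sum(len(tokens(s)) for s in subset) > max_words:
                continue
            score = coverage(subset, key_points)
            if score > best_score:
                best, best_score = subset, score
    return list(best), best_score

# Hypothetical candidate sentences and tweet highlights.
candidates = [
    "the storm closed schools",
    "officials opened shelters downtown",
    "the weather was bad",
]
key_points = ["storm closed schools", "shelters opened"]
summary, score = best_summary(candidates, key_points, max_words=8)
```

Exhaustive search is exponential in the number of candidates, which is one reason an ILP solver is the more practical choice at corpus scale.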
Wireless traffic prediction is a fundamental enabler of proactive network optimisation in 5G and beyond. Forecasting extreme demand spikes and troughs is essential to avoiding outages and improving energy efficiency. However, current forecasting methods predominantly focus on overall forecast performance and/or do not offer probabilistic uncertainty quantification. Here, we design a feature embedding (FE) kernel for a Gaussian Process (GP) model to forecast traffic demand. The FE kernel enables us to trade off overall forecast accuracy against peak-trough accuracy. Using real 4G base station data, we compare its performance against both conventional GPs and ARIMA models, and demonstrate the uncertainty quantification output. The advantage over neural network (e.g. CNN, LSTM) models is that the probabilistic forecast uncertainty can directly feed into decision processes in self-organizing-network (SON) modules.
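To make the idea of a probabilistic forecast concrete, here is a minimal Gaussian Process regression sketch that returns a predictive mean and variance at a query time. It uses a plain squared-exponential (RBF) kernel, not the FE kernel proposed in the paper, whose form the abstract does not give. The hyperparameters, the toy data, and the helper names (`rbf`, `solve`, `gp_predict`) are assumptions.

```python
import math

def rbf(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between two scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def gp_predict(train_x, train_y, test_x, noise=1e-2):
    """Zero-mean GP posterior mean and latent variance at one test input."""
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(train_x)]
         for i, a in enumerate(train_x)]
    k_star = [rbf(test_x, a) for a in train_x]
    alpha = solve(K, train_y)          # (K + noise * I)^-1 y
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve(K, k_star)               # (K + noise * I)^-1 k_star
    var = rbf(test_x, test_x) - sum(k * w for k, w in zip(k_star, v))
    return mean, max(var, 0.0)

# Toy hourly demand values (made up for illustration).
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 2.0, 1.5, 3.0]
mean_near, var_near = gp_predict(train_x, train_y, 1.0)
mean_far, var_far = gp_predict(train_x, train_y, 10.0)
```

The variance is small near observed times and reverts to the prior variance far from them; this per-point uncertainty is the kind of quantity a downstream decision module could consume.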
Unmanned aerial vehicles (UAVs) have increasingly been adopted for safety, security, and rescue missions, for which they need precise and reliable pose estimates relative to their environment. To ensure mission safety when relying on visual perception, it is essential to have an approach to assess the integrity of the visual localization solution. However, to the best of our knowledge, such an approach does not exist for optimization-based visual localization. Receiver autonomous integrity monitoring (RAIM) has been widely used in global navigation satellite systems (GNSS) applications such as automated aircraft landing. In this paper, we propose a novel approach inspired by RAIM to monitor the integrity of optimization-based visual localization and calculate the protection level of a state estimate, i.e. the largest possible translational error in each direction. We also propose a metric that quantitatively evaluates the performance of the error bounds. Finally, we validate the protection level using the EuRoC dataset and demonstrate that the proposed protection level provides a significantly more reliable bound than the commonly used $3\sigma$ method.
While previous research in eye fixation prediction typically relies on integrating low-level features (e.g. color, edge) to form a saliency map, recently it has been found that the structural organization of these features into a proto-object representation can play a more significant role. In this work, we present a computational framework based on a deep network to demonstrate that proto-object representations can be learned from low-resolution image patches from fixation regions. We advocate the use of low-resolution inputs in this work for the following reasons: (1) proto-objects are computed in parallel over an entire visual field; (2) people can perceive or recognize objects well even at low resolution; (3) fixations from lower-resolution images can predict fixations on higher-resolution images. In the proposed computational model, we extract multi-scale image patches on fixation regions from eye fixation datasets, resize them to low resolution and feed them into a hierarchical. With layer-wise unsupervised feature learning, we find that many proto-object-like features responsive to different shapes of object blobs are learned. Visualizations also show that these features are selective to potential objects in the scene and that the responses of these features work well in predicting eye fixations on the images when combined with learned weights.
Safe autonomous driving requires reliable 3D object detection, that is, determining the 6 DoF pose and dimensions of objects of interest. Using stereo cameras to solve this task is a cost-effective alternative to the widely used LiDAR sensor. The current state of the art for stereo 3D object detection takes the existing PSMNet stereo matching network with no modifications, converts the estimated disparities into a 3D point cloud, and feeds this point cloud into a LiDAR-based 3D object detector. The issue with existing stereo matching networks is that they are designed for disparity estimation, not 3D object detection; the shape and accuracy of object point clouds are not the focus. Stereo matching networks commonly suffer from inaccurate depth estimates at object boundaries, which we define as streaking, because background and foreground points are jointly estimated. Existing networks also penalize disparity instead of the estimated position of object point clouds in their loss functions. We propose a novel 2D box association and object-centric stereo matching method that only estimates the disparities of the objects of interest to address these two issues. Our method achieves state-of-the-art results on the KITTI 3D and BEV benchmarks.
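The conversion from disparities to a 3D point cloud mentioned above follows the standard rectified-stereo relation, depth = focal length × baseline / disparity. The sketch below applies it only to pixels inside a 2D box, as a loose illustration of restricting work to an object of interest; it is not the paper's matching network. The camera parameters, the box format `(u0, v0, u1, v1)`, and the function names are assumptions.

```python
def disparity_to_point(u, v, disparity, focal_px, baseline_m, cx, cy):
    """Back-project one pixel with known disparity into camera-frame 3D coordinates."""
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    z = focal_px * baseline_m / disparity   # depth along the optical axis
    x = (u - cx) * z / focal_px             # right of the optical centre
    y = (v - cy) * z / focal_px             # below the optical centre
    return (x, y, z)

def box_to_points(disparity_map, box, focal_px, baseline_m, cx, cy):
    """Convert valid disparities inside a 2D box (u0, v0, u1, v1) to 3D points."""
    u0, v0, u1, v1 = box
    points = []
    for v in range(v0, v1):
        for u in range(u0, u1):
            d = disparity_map[v][u]
            if d > 0:  # treat non-positive disparity as "no estimate"
                points.append(
                    disparity_to_point(u, v, d, focal_px, baseline_m, cx, cy)
                )
    return points

# Tiny made-up disparity map (in pixels) and camera parameters.
disparity_map = [
    [0.0, 35.0, 35.0, 0.0],
    [0.0, 35.0, 0.0, 0.0],
]
points = box_to_points(disparity_map, (1, 0, 3, 2),
                       focal_px=700.0, baseline_m=0.5, cx=2.0, cy=1.0)
```

Because depth is inversely proportional to disparity, a fixed disparity error produces a larger position error for distant points, which is one way to see why a loss on disparity and a loss on 3D position are not equivalent.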
This report summarises our method and validation results for the ISIC Challenge 2018 - Skin Lesion Analysis Towards Melanoma Detection - Task 1: Lesion Segmentation. We present a two-stage method for lesion segmentation with an optimised training method and ensemble post-processing. Our method achieves state-of-the-art performance on lesion segmentation, and we won first place in ISIC 2018 Task 1.