Discriminating targets moving against a cluttered background is a huge challenge, let alone detecting a target as small as one or a few pixels and tracking it in flight. In the fly's visual system, a class of specific neurons, called small target motion detectors (STMDs), have been identified as showing exquisite selectivity for small target motion. Some of the STMDs have also demonstrated directional selectivity which means these STMDs respond strongly only to their preferred motion direction. Directional selectivity is an important property of these STMD neurons which could contribute to tracking small targets such as mates in flight. However, little has been done on systematically modeling these directional selective STMD neurons. In this paper, we propose a directional selective STMD-based neural network (DSTMD) for small target detection in a cluttered background. In the proposed neural network, a new correlation mechanism is introduced for direction selectivity via correlating signals relayed from two pixels. Then, a lateral inhibition mechanism is implemented on the spatial field for size selectivity of STMD neurons. Extensive experiments showed that the proposed neural network not only is in accord with current biological findings, i.e. showing directional preferences, but also worked reliably in detecting small targets against cluttered backgrounds. Click to Read Paper
Small target motion detection is critical for insects to search for and track mates or prey which always appear as small dim speckles in the visual field. A class of specific neurons, called small target motion detectors (STMDs), has been characterized by exquisite sensitivity for small target motion. Understanding and analyzing visual pathway of STMD neurons are beneficial to design artificial visual systems for small target motion detection. Feedback loops have been widely identified in visual neural circuits and play an important role in target detection. However, if there exists a feedback loop in the STMD visual pathway or if a feedback loop could significantly improve the detection performance of STMD neurons, is unclear. In this paper, we propose a feedback neural network for small target motion detection against naturally cluttered backgrounds. In order to form a feedback loop, model output is temporally delayed and relayed to previous neural layer as feedback signal. Extensive experiments showed that the significant improvement of the proposed feedback neural network over the existing STMD-based models for small target motion detection. Click to Read Paper
A class of specialized neurons, called lobula plate tangential cells (LPTCs) has been shown to respond strongly to wide-field motion. The classic model, elementary motion detector (EMD) and its improved model, two-quadrant detector (TQD) have been proposed to simulate LPTCs. Although EMD and TQD can percept background motion, their outputs are so cluttered that it is difficult to discriminate actual motion direction of the background. In this paper, we propose a max operation mechanism to model a newly-found transmedullary neuron Tm9 whose physiological properties do not map onto EMD and TQD. This proposed max operation mechanism is able to improve the detection performance of TQD in cluttered background by filtering out irrelevant motion signals. We will demonstrate the functionality of this proposed mechanism in wide-field motion perception. Click to Read Paper
We propose a novel method to accelerate Lloyd's algorithm for K-Means clustering. Unlike previous acceleration approaches that reduce computational cost per iterations or improve initialization, our approach is focused on reducing the number of iterations required for convergence. This is achieved by treating the assignment step and the update step of Lloyd's algorithm as a fixed-point iteration, and applying Anderson acceleration, a well-established technique for accelerating fixed-point solvers. Classical Anderson acceleration utilizes m previous iterates to find an accelerated iterate, and its performance on K-Means clustering can be sensitive to choice of m and the distribution of samples. We propose a new strategy to dynamically adjust the value of m, which achieves robust and consistent speedups across different problem instances. Our method complements existing acceleration techniques, and can be combined with them to achieve state-of-the-art performance. We perform extensive experiments to evaluate the performance of the proposed method, where it outperforms other algorithms in 106 out of 120 test cases, and the mean decrease ratio of computational time is more than 33%. Click to Read Paper
This paper addresses the task of AMR-to-text generation by leveraging synchronous node replacement grammar. During training, graph-to-string rules are learned using a heuristic extraction algorithm. At test time, a graph transducer is applied to collapse input AMRs and generate output sentences. Evaluated on SemEval-2016 Task 8, our method gives a BLEU score of 25.62, which is the best reported so far. Click to Read Paper
The task of AMR-to-text generation is to generate grammatical text that sustains the semantic meaning for a given AMR graph. We at- tack the task by first partitioning the AMR graph into smaller fragments, and then generating the translation for each fragment, before finally deciding the order by solving an asymmetric generalized traveling salesman problem (AGTSP). A Maximum Entropy classifier is trained to estimate the traveling costs, and a TSP solver is used to find the optimized solution. The final model reports a BLEU score of 22.44 on the SemEval-2016 Task8 dataset. Click to Read Paper
In this paper, we present a subclass-representation approach that predicts the probability of a social image belonging to one particular class. We explore the co-occurrence of user-contributed tags to find subclasses with a strong connection to the top level class. We then project each image on to the resulting subclass space to generate a subclass representation for the image. The novelty of the approach is that subclass representations make use of not only the content of the photos themselves, but also information on the co-occurrence of their tags, which determines membership in both subclasses and top-level classes. The novelty is also that the images are classified into smaller classes, which have a chance of being more visually stable and easier to model. These subclasses are used as a latent space and images are represented in this space by their probability of relatedness to all of the subclasses. In contrast to approaches directly modeling each top-level class based on the image content, the proposed method can exploit more information for visually diverse classes. The approach is evaluated on a set of $2$ million photos with 10 classes, released by the Multimedia 2013 Yahoo! Large-scale Flickr-tag Image Classification Grand Challenge. Experiments show that the proposed system delivers sound performance for visually diverse classes compared with methods that directly model top classes. Click to Read Paper
Low-light is an inescapable element of our daily surroundings that greatly affects the efficiency of our vision. Research works on low-light has seen a steady growth, particularly in the field of image enhancement, but there is still a lack of a go-to database as benchmark. Besides, research fields that may assist us in low-light environments, such as object detection, has glossed over this aspect even though breakthroughs-after-breakthroughs had been achieved in recent years, most noticeably from the lack of low-light data (less than 2% of the total images) in successful public benchmark dataset such as PASCAL VOC, ImageNet, and Microsoft COCO. Thus, we propose the Exclusively Dark dataset to elevate this data drought, consisting exclusively of ten different types of low-light images (i.e. low, ambient, object, single, weak, strong, screen, window, shadow and twilight) captured in visible light only with image and object level annotations. Moreover, we share insightful findings in regards to the effects of low-light on the object detection task by analyzing visualizations of both hand-crafted and learned features. Most importantly, we found that the effects of low-light reaches far deeper into the features than can be solved by simple "illumination invariance'". It is our hope that this analysis and the Exclusively Dark dataset can encourage the growth in low-light domain researches on different fields. The Exclusively Dark dataset with its annotation is available at https://github.com/cs-chan/Exclusively-Dark-Image-Dataset Click to Read Paper
In current visual object tracking system, the CPU or GPU-based visual object tracking systems have high computational cost and consume a prohibitive amount of power. Therefore, in this paper, to reduce the computational burden of the Camshift algorithm, we propose a novel visual object tracking algorithm by exploiting the properties of the binary classifier and Kalman predictor. Moreover, we present a low-cost FPGA-based real-time object tracking hardware architecture. Extensive evaluations on OTB benchmark demonstrate that the proposed system has extremely compelling real-time, stability and robustness. The evaluation results show that the accuracy of our algorithm is about 48%, and the average speed is about 309 frames per second. Click to Read Paper
Time series prediction is of great significance in many applications and has attracted extensive attention from the data mining community. Existing work suggests that for many problems, the shape in the current time series may correlate an upcoming shape in the same or another series. Therefore, it is a promising strategy to associate two recurring patterns as a rule's antecedent and consequent: the occurrence of the antecedent can foretell the occurrence of the consequent, and the learned shape of consequent will give accurate predictions. Earlier work employs symbolization methods, but the symbolized representation maintains too little information of the original series to mine valid rules. The state-of-the-art work, though directly manipulating the series, fails to segment the series precisely for seeking antecedents/consequents, resulting in inaccurate rules in common scenarios. In this paper, we propose a novel motif-based rule discovery method, which utilizes motif discovery to accurately extract frequently occurring consecutive subsequences, i.e. motifs, as antecedents/consequents. It then investigates the underlying relationships between motifs by matching motifs as rule candidates and ranking them based on the similarities. Experimental results on real open datasets show that the proposed approach outperforms the baseline method by 23.9%. Furthermore, it extends the applicability from single time series to multiple ones. Click to Read Paper
In this paper, we propose PointSeg, a real-time end-to-end semantic segmentation method for road-objects based on spherical images. We take the spherical image, which is transformed from the 3D LiDAR point clouds, as input of the convolutional neural networks (CNNs) to predict the point-wise semantic map. To make PointSeg applicable on a mobile system, we build the model based on the light-weight network, SqueezeNet, with several improvements. It maintains a good balance between memory cost and prediction performance. Our model is trained on spherical images and label masks projected from the KITTI 3D object detection dataset. Experiments show that PointSeg can achieve competitive accuracy with 90fps on a single GPU 1080ti. which makes it quite compatible for autonomous driving applications. Click to Read Paper
Recently, the use of digital images in various fields is increasing rapidly. To increase the number of images stored and get faster transmission of them, it is necessary to reduce the size of these images. Single bitmap block truncation coding (SBBTC) schemes are compression techniques, which are used to generate a common bitmap to quantize the R, G and B planes in color image. As one of the traditional SBBTC schemes, weighted plane (W-plane) method is famous for its simplicity and low time consumption. However, the W-plane method also has poor performance in visual quality. This paper proposes an improved SBBTC scheme based on W-plane method using parallel computing and hill climbing algorithm. Compared with various schemes, the simulation results of the proposed scheme are better than that of the reference schemes in visual quality and time consumption. Click to Read Paper
Person Re-Identification (re-id) is a challenging task in computer vision, especially when there are limited training data from multiple camera views. In this paper, we pro- pose a deep learning based person re-identification method by transferring knowledge of mid-level attribute features and high-level classification features. Building on the idea that identity classification, attribute recognition and re- identification share the same mid-level semantic representations, they can be trained sequentially by fine-tuning one based on another. In our framework, we train identity classification and attribute recognition tasks from deep Convolutional Neural Network (dCNN) to learn person information. The information can be transferred to the person re-id task and improves its accuracy by a large margin. Further- more, a Long Short Term Memory(LSTM) based Recurrent Neural Network (RNN) component is extended by a spacial gate. This component is used in the re-id model to pay attention to certain spacial parts in each recurrent unit. Experimental results show that our method achieves 78.3% of rank-1 recognition accuracy on the CUHK03 benchmark. Click to Read Paper
Deep learning refers to the shining branch of machine learning that is based on learning levels of representations. Convolutional Neural Networks (CNN) is one kind of deep neural network. It can study concurrently. In this article, we gave a detailed analysis of the process of CNN algorithm both the forward process and back propagation. Then we applied the particular convolutional neural network to implement the typical face recognition problem by java. Then, a parallel strategy was proposed in section4. In addition, by measuring the actual time of forward and backward computing, we analysed the maximal speed up and parallel efficiency theoretically. Click to Read Paper
In density estimation task, maximum entropy model (Maxent) can effectively use reliable prior information via certain constraints, i.e., linear constraints without empirical parameters. However, reliable prior information is often insufficient, and the selection of uncertain constraints becomes necessary but poses considerable implementation complexity. Improper setting of uncertain constraints can result in overfitting or underfitting. To solve this problem, a generalization of Maxent, under Tsallis entropy framework, is proposed. The proposed method introduces a convex quadratic constraint for the correction of (expected) Tsallis entropy bias (TEB). Specifically, we demonstrate that the expected Tsallis entropy of sampling distributions is smaller than the Tsallis entropy of the underlying real distribution. This expected entropy reduction is exactly the (expected) TEB, which can be expressed by a closed-form formula and act as a consistent and unbiased correction. TEB indicates that the entropy of a specific sampling distribution should be increased accordingly. This entails a quantitative re-interpretation of the Maxent principle. By compensating TEB and meanwhile forcing the resulting distribution to be close to the sampling distribution, our generalized TEBC Maxent can be expected to alleviate the overfitting and underfitting. We also present a connection between TEB and Lidstone estimator. As a result, TEB-Lidstone estimator is developed by analytically identifying the rate of probability correction in Lidstone. Extensive empirical evaluation shows promising performance of both TEBC Maxent and TEB-Lidstone in comparison with various state-of-the-art density estimation methods. Click to Read Paper
Despite the remarkable success, deep learning models have shown to be vulnerable to the universal adversarial perturbation (UAP). The existing endeavors on UAP methods mainly focus on attacking the image classification models. Nevertheless, little attention has been paid to attacking image retrieval systems. In this paper, we make the first attempt for UAP attacking to deep feature based image retrieval. Concretely, attacking image retrieval is to make the retrieval system return more irrelevant images to the query at the top ranking list, whose key design is to corrupt the relationships among features. To this end, we propose a unified method to generate retrieval-based UAP to break the relationships between image features from point-wise, label-wise, and list-wise aspects. We further analyze the impact of the resizing operation in generating UAP, and thus provide a solution to attack high-performance retrieval systems with query resizing. We evaluate the proposed methods on four widely-used image retrieval datasets, i.e., Oxford5k and Paris6k with their revised versions, which lead to a significant performance drop in terms of different metrics, such as mAP, and mP@10. Finally, we test our attacking methods on the real-world visual search engine, i.e., Google Images, which demonstrates the potential of our methods. Click to Read Paper
In psychology, theory-driven researches are usually conducted with extensive laboratory experiments, yet rarely tested or disproved with big data. In this paper, we make use of 418K travel photos with traveler ratings to test the influential "broaden-and-build" theory, that suggests positive emotions broaden one's visual attention. The core hypothesis examined in this study is that positive emotion is associated with a wider attention, hence highly-rated sites would trigger wide-angle photographs. By analyzing travel photos, we find a strong correlation between a preference for wide-angle photos and the high rating of tourist sites on TripAdvisor. We are able to carry out this analysis through the use of deep learning algorithms to classify the photos into wide and narrow angles, and present this study as an exemplar of how big data and deep learning can be used to test laboratory findings in the wild. Click to Read Paper
We present two architectures for multi-task learning with neural sequence models. Our approach allows the relationships between different tasks to be learned dynamically, rather than using an ad-hoc pre-defined structure as in previous work. We adopt the idea from message-passing graph neural networks and propose a general \textbf{graph multi-task learning} framework in which different tasks can communicate with each other in an effective and interpretable way. We conduct extensive experiments in text classification and sequence labeling to evaluate our approach on multi-task learning and transfer learning. The empirical results show that our models not only outperform competitive baselines but also learn interpretable and transferable patterns across tasks. Click to Read Paper
PCANet, as one noticeable shallow network, employs the histogram representation for feature pooling. However, there are three main problems about this kind of pooling method. First, the histogram-based pooling method binarizes the feature maps and leads to inevitable discriminative information loss. Second, it is difficult to effectively combine other visual cues into a compact representation, because the simple concatenation of various visual cues leads to feature representation inefficiency. Third, the dimensionality of histogram-based output grows exponentially with the number of feature maps used. In order to overcome these problems, we propose a novel shallow network model, named as PCANet-II. Compared with the histogram-based output, the second order pooling not only provides more discriminative information by preserving both the magnitude and sign of convolutional responses, but also dramatically reduces the size of output features. Thus we combine the second order statistical pooling method with the shallow network, i.e., PCANet. Moreover, it is easy to combine other discriminative and robust cues by using the second order pooling. So we introduce the binary feature difference encoding scheme into our PCANet-II to further improve robustness. Experiments demonstrate the effectiveness and robustness of our proposed PCANet-II method. Click to Read Paper
Joint extraction of entities and relations is an important task in information extraction. To tackle this problem, we firstly propose a novel tagging scheme that can convert the joint extraction task to a tagging problem. Then, based on our tagging scheme, we study different end-to-end models to extract entities and their relations directly, without identifying entities and relations separately. We conduct experiments on a public dataset produced by distant supervision method and the experimental results show that the tagging based methods are better than most of the existing pipelined and joint learning methods. What's more, the end-to-end model proposed in this paper, achieves the best results on the public dataset. Click to Read Paper