In this paper, we present GASG21 (Grassmannian Adaptive Stochastic Gradient for $L_{2,1}$ norm minimization), an adaptive stochastic gradient algorithm to robustly recover the low-rank subspace from a large matrix. In the presence of column outliers, we reformulate the batch-mode matrix $L_{2,1}$ norm minimization problem with a rank constraint as a stochastic optimization problem constrained on the Grassmann manifold. For each observed data vector, the low-rank subspace $\mathcal{S}$ is updated by taking a gradient step along the geodesic of the Grassmannian. To accelerate the convergence of the stochastic gradient method, we adaptively tune the constant step size by leveraging consecutive gradients. Furthermore, we demonstrate that, with proper initialization, the K-subspaces extension, K-GASG21, can robustly cluster a large number of corrupted data vectors into a union of subspaces. Numerical experiments on synthetic and real data demonstrate the efficiency and accuracy of the proposed algorithms even under heavy column-outlier corruption.
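
For orientation, the $L_{2,1}$ norm referred to above can be illustrated with a few lines of plain Python. This is only a sketch of the norm itself, not of GASG21: it assumes the column-wise convention (sum of the Euclidean norms of the columns), which matches the column-outlier setting, and the function name `l21_norm` is illustrative rather than taken from the paper.

```python
import math


def l21_norm(matrix):
    """Sum of the Euclidean norms of the columns of `matrix`.

    `matrix` is a list of rows. The column-wise convention is an
    assumption made for this sketch.
    """
    if not matrix:
        return 0.0
    n_cols = len(matrix[0])
    return sum(
        math.sqrt(sum(row[j] ** 2 for row in matrix))
        for j in range(n_cols)
    )


# Columns are (3, 4) and (0, 1), with Euclidean norms 5 and 1.
example = [[3.0, 0.0],
           [4.0, 1.0]]
total = l21_norm(example)
```

Because each column enters through its unsquared norm, a single corrupted column grows the objective linearly rather than quadratically, which is the usual motivation for using this norm with column outliers.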

* 13 pages, 12 figures and 6 tables
Because existing traffic facilities can hardly be extended, developing traffic signal control methods is the most important way to improve the traffic efficiency of modern roundabouts. This paper proposes a novel traffic signal controller with two fuzzy layers for signalizing the roundabout. The outer layer of the controller computes the urgency degrees of all the phase subsets and then activates the most urgent subset. This mechanism helps the controller respond instantly to the current traffic condition of the roundabout and thus improves real-time responsiveness. The inner layer of the controller computes the extension time of the current phase. If the extension value is larger than a threshold value, the current phase is maintained; otherwise the next phase in the running phase subset (selected by the outer layer) is activated. The inner layer adopts well-designed phase sequences, which help to smooth the traffic flows and to avoid traffic jams. In general, the proposed traffic signal controller is capable of improving real-time responsiveness as well as reducing traffic congestion. Moreover, an offline particle swarm optimization (PSO) algorithm is developed to optimize the membership functions adopted in the proposed controller. By using optimal membership functions, the performance of the controller can be further improved. Simulation results demonstrate that the proposed controller outperforms previous traffic signal controllers in terms of improving the traffic efficiency of modern roundabouts.
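
The two-layer decision logic described above can be sketched as follows. This is a minimal illustration, not the authors' controller: the fuzzy inference that produces urgency degrees and extension times is replaced by plain numbers supplied by the caller, and the function names, the threshold values, and the behaviour at the end of a phase sequence are assumptions.

```python
def select_subset(urgency):
    """Outer layer: return the phase subset with the highest urgency degree.

    `urgency` maps a subset name to its urgency degree (assumed to be
    already computed by the fuzzy rules).
    """
    return max(urgency, key=urgency.get)


def inner_step(phases, current, extension, threshold):
    """Inner layer: keep the current phase or advance within the subset.

    Returns the index of the phase to run next, or None once the phase
    sequence is exhausted. Handing control back to the outer layer at
    that point is an assumption; the abstract does not specify it.
    """
    if extension > threshold:
        return current          # extension is large enough: maintain phase
    nxt = current + 1
    return nxt if nxt < len(phases) else None


# Hypothetical inputs for illustration only.
running = select_subset({"subset_A": 0.2, "subset_B": 0.7})
keep = inner_step(["p1", "p2"], current=0, extension=5.0, threshold=3.0)
advance = inner_step(["p1", "p2"], current=0, extension=2.0, threshold=3.0)
```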

Cluster analysis and outlier detection are strongly coupled tasks in the data mining area. Cluster structure can easily be destroyed by a few outliers; conversely, outliers are defined by the concept of a cluster, being recognized as the points that belong to none of the clusters. However, most existing studies handle them separately. In light of this, we consider the joint cluster analysis and outlier detection problem, and propose the Clustering with Outlier Removal (COR) algorithm. Generally speaking, the original space is transformed into a binary space by generating basic partitions in order to define clusters. Then an objective function based on holoentropy is designed to enhance the compactness of each cluster with a few outliers removed. Further analysis of the objective function shows that only part of the problem can be handled by K-means optimization. To provide an integrated solution, an auxiliary binary matrix is nontrivially introduced so that COR completely and efficiently solves the challenging problem via a unified K-means-- with theoretical support. Extensive experimental results on numerous data sets in various domains demonstrate that COR is significantly more effective and efficient than its rivals, including K-means-- and other state-of-the-art outlier detection methods, in terms of cluster validity and outlier detection. Some key factors in COR are further analyzed for practical use. Finally, an application to flight trajectories is provided to demonstrate the effectiveness of COR in a real-world scenario.

The multiview learning problem refers to the problem of learning a classifier from multiple-view data. In such a data set, each data point is represented by multiple different views. In this paper, we propose a novel method for this problem. This method is based on two assumptions. The first assumption is that each data point has an intact feature vector, and each view is obtained by a linear transformation of the intact vector. The second assumption is that the intact vectors are discriminative, and in the intact space, we have a linear classifier to separate the positive class from the negative class. We define an intact vector for each data point and a view-conditional transformation matrix for each view, and propose to reconstruct the multiple-view feature vectors by the product of the corresponding intact vectors and transformation matrices. Moreover, we also propose a linear classifier in the intact space, and learn it jointly with the intact vectors. The learning problem is modeled as a minimization problem, and the objective function is composed of a Cauchy error estimator-based view-conditional reconstruction term over all data points and views, and a classification error term measured by hinge loss over the intact vectors of all the data points. Some regularization terms are also imposed on different variables in the objective function. The minimization problem is solved by an iterative algorithm using an alternating optimization strategy and gradient descent. The proposed algorithm shows its advantage in comparison to other multiview learning algorithms on benchmark data sets.
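
The two loss terms named above can be made concrete with a short sketch. It uses the standard forms of the Cauchy estimator, $\log(1 + r^2/c^2)$, and the hinge loss, $\max(0, 1 - y f)$; the scale parameter `c`, the function names, and the way the residual is formed from a transformation matrix `W`, an intact vector `y`, and a view vector `x` are assumptions for illustration, not the paper's exact objective (regularization terms are omitted).

```python
import math


def residual_norm(W, y, x):
    """Euclidean norm of x - W y, where W is a list of rows."""
    diffs = [xi - sum(w * yj for w, yj in zip(row, y))
             for row, xi in zip(W, x)]
    return math.sqrt(sum(d * d for d in diffs))


def cauchy_loss(r, c=1.0):
    """Cauchy estimator: grows logarithmically in the residual r."""
    return math.log(1.0 + (r / c) ** 2)


def hinge_loss(label, score):
    """Hinge loss for a label in {-1, +1} and a classifier score."""
    return max(0.0, 1.0 - label * score)


# Hypothetical view: identity transformation, intact vector (1, 2).
W = [[1.0, 0.0], [0.0, 1.0]]
r = residual_norm(W, [1.0, 2.0], [4.0, 6.0])   # residual (3, 4)
reconstruction_term = cauchy_loss(r)
classification_term = hinge_loss(-1, 0.5)
```

The logarithmic growth of the Cauchy term is what makes the reconstruction error less sensitive to large residuals than a squared error would be.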

Stochastic gradient methods are dominant in nonconvex optimization, especially for deep models, but converge slowly asymptotically because of the fixed smoothness. To address this problem, we propose a simple yet effective method for improving stochastic gradient methods named predictive local smoothness (PLS). First, we create a convergence condition to build a learning rate that varies adaptively with local smoothness. Second, the local smoothness can be predicted from the latest gradients. Third, we use the adaptive learning rate to update the stochastic gradients for exploring linear convergence rates. By applying the PLS method, we implement new variants of three popular algorithms: PLS-stochastic gradient descent (PLS-SGD), PLS-accelerated SGD (PLS-AccSGD), and PLS-AMSGrad. Moreover, we provide much simpler proofs to ensure their linear convergence. Empirical results show that the variants outperform the popular algorithms, with gains such as faster convergence and alleviation of exploding and vanishing gradients.

* 14 pages, 7 figures
Recently, deep convolutional neural networks (CNNs) have obtained promising results in image processing tasks including super-resolution (SR). However, most CNN-based SR methods treat low-resolution (LR) inputs and features equally across channels, rarely notice the loss of information flow caused by the activation function and fail to leverage the representation ability of CNNs. In this letter, we propose a novel single-image super-resolution (SISR) algorithm named Wider Channel Attention Network (WCAN) for remote sensing images. Firstly, the channel attention mechanism is used to adaptively recalibrate the importance of each channel at the middle of the wider attention block (WAB). Secondly, we propose the Local Memory Connection (LMC) to enhance the information flow. Finally, the features within each WAB are fused to take advantage of the network's representation capability and further improve information and gradient flow. Analytic experiments on a public remote sensing data set (UC Merced) show that our WCAN achieves better accuracy and visual improvements against most state-of-the-art methods.

* This work is proposed for remote sensing images, but the idea of the whole paper does not focus on the characteristics of remote sensing images. The content of the article does not match the title. In this case, we want to do some experiments on natural images to verify the three tricks in our work
In this paper, we propose a novel method for highly efficient follicular segmentation of thyroid cytopathological WSIs. Firstly, we propose a hybrid segmentation architecture, which integrates a classifier into Deeplab V3 by adding a branch. A large amount of the WSI segmentation time is saved by skipping the irrelevant areas using the classification branch. Secondly, we merge the low-scale fine features into the original atrous spatial pyramid pooling (ASPP) in Deeplab V3 to accurately represent the details in cytopathological images. Thirdly, our hybrid model is trained with a criterion-oriented adaptive loss function, which leads the model to converge much faster. Experimental results on a collection of thyroid patches demonstrate that the proposed model reaches a segmentation accuracy of 80.9%. Besides, WSI segmentation time is reduced by 93% using our proposed method, and the WSI-level accuracy reaches 53.4%.

Bronchoscopy inspection, as a follow-up procedure to radiological imaging, plays a key role in diagnosing lung disease and determining treatment plans for patients. When performing bronchoscopy, doctors need to decide promptly whether to biopsy the patient. However, doctors also need to be very selective with biopsies, as biopsies may cause uncontrollable bleeding of the lung tissue, which is life-threatening. To help doctors be more selective with biopsies and to provide a second opinion on diagnosis, in this work we propose a computer-aided diagnosis (CAD) system for lung diseases including cancers and tuberculosis (TB). The system is developed based on transfer learning. We propose a novel transfer learning method: sentential fine-tuning. Compared to traditional fine-tuning methods, our method achieves the best performance. We obtained an overall accuracy of 77.0% on a dataset of 81 normal cases, 76 tuberculosis cases and 277 lung cancer cases, while the other traditional transfer learning methods achieve accuracies of 73% and 68%. The detection accuracies of our method for cancers, TB and normal cases are 87%, 54% and 91%, respectively. This indicates that the CAD system has the potential to improve lung disease diagnosis accuracy in bronchoscopy, and it might also be used to be more selective with biopsies.

Future works in scientific articles are valuable for researchers, as they can guide researchers to new research directions or ideas. In this paper, we mine the future works in scientific articles in order to 1) provide insight for future work analysis and 2) help researchers search and browse future works in a research area. First, we study the problem of future work extraction and propose a regular expression based method to address the problem. Second, we define four different categories for the future works by observing the data and investigate the multi-class future work classification problem. Third, we apply the extraction method and the classification model to a paper dataset in the computer science field and conduct a further analysis of the future works. Finally, we design a prototype system to search and display the future works mined from the scientific papers. Our evaluation results show that our extraction method achieves high precision and recall, and that our classification model also performs well and outperforms several baseline models. Further analysis of the future work sentences also yields interesting results.
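
A regular expression based extractor of the kind mentioned above might look like the sketch below. The cue phrases, the naive sentence splitter, and the function name are all hypothetical choices made for illustration; the abstract does not list the patterns the authors actually use.

```python
import re

# Hypothetical cue phrases; not the authors' pattern set.
FUTURE_WORK_PATTERN = re.compile(
    r"\b(in (?:the )?future(?: work)?"
    r"|future (?:work|research|studies)"
    r"|we plan to|we intend to)\b",
    re.IGNORECASE,
)


def extract_future_work(text):
    """Return the sentences of `text` that contain a future-work cue."""
    # Naive split on whitespace that follows sentence-final punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if FUTURE_WORK_PATTERN.search(s)]


sample = ("We evaluate on two datasets. "
          "In future work, we will add more languages. "
          "We plan to release the code.")
found = extract_future_work(sample)
```

A pattern list like this trades recall for precision: every added cue phrase catches more sentences but also risks matching text that is not about future work.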

Predicting the popularity of online videos is important for video streaming content providers. This is a challenging problem for two reasons. First, the problem is both "wide" and "deep". That is, it not only depends on a wide range of features, but is also highly non-linear and complex. Second, multiple competitors may be involved. In this paper, we propose a general prediction model using a multi-task learning (MTL) module and a relation network (RN) module, where MTL can reduce over-fitting and RN can model the relations among multiple competitors. Experimental results show that our proposed approach, with the RN and MTL modules, significantly increases the accuracy of predicting the total view counts of TV series.

This paper presents a generalized integrated framework for semi-automatic surgical template design. Several algorithms were implemented, including mesh segmentation, offset surface generation, collision detection, and ruled surface generation, and a dedicated software tool named TemDesigner was developed. With a simple user interface, a customized template can be semi-automatically designed according to the preoperative plan. Firstly, mesh segmentation with a signed scalar per vertex is used to partition the inner surface from the input surface mesh based on the indicated point loop. Then, the offset surface of the inner surface is obtained by contouring the distance field of the inner surface, and segmented to generate the outer surface. A ruled surface is employed to connect the inner and outer surfaces. Finally, drilling tubes are generated according to the preoperative plan through collision detection and merging. The framework has been applied to template design for various kinds of surgeries, including oral implantology, cervical pedicle screw insertion, iliosacral screw insertion and osteotomy, demonstrating the efficiency, functionality and generality of our method.

* Scientific Reports 6, Article number: 20280, 2016
* 18 pages, 16 figures, 2 tables, 36 references
The cloud radio access network (C-RAN) is a promising paradigm to meet the stringent requirements of the fifth generation (5G) wireless systems. Meanwhile, wireless traffic prediction is a key enabler for C-RANs to improve both spectrum efficiency and energy efficiency through load-aware network management. This paper proposes a scalable Gaussian process (GP) framework as a promising solution to achieve large-scale wireless traffic prediction in a cost-efficient manner. Our contribution is three-fold. First, to the best of our knowledge, this paper is the first to empower GP regression with the alternating direction method of multipliers (ADMM) for parallel hyper-parameter optimization in the training phase, where such a scalable training framework balances the local estimation in baseband units (BBUs) and information consensus among BBUs in a principled way for large-scale executions. Second, in the prediction phase, we fuse local predictions obtained from the BBUs via a cross-validation based optimal strategy, which proves to be reliable and robust for general regression tasks. Moreover, such a cross-validation based optimal fusion strategy is built upon a well-acknowledged probabilistic model to retain the valuable closed-form GP inference properties. Third, we propose a C-RAN based scalable wireless prediction architecture, where the prediction accuracy and the time consumption can be balanced by tuning the number of BBUs according to the real-time system demands. Experimental results show that our proposed scalable GP model can outperform the state-of-the-art approaches considerably in terms of wireless traffic prediction performance.

This paper proposes a novel object detection framework named Grid R-CNN, which adopts a grid guided localization mechanism for accurate object detection. Different from the traditional regression based methods, Grid R-CNN captures the spatial information explicitly and enjoys the position sensitive property of fully convolutional architecture. Instead of using only two independent points, we design a multi-point supervision formulation to encode more clues in order to reduce the impact of inaccurate prediction of specific points. To take full advantage of the correlation of points in a grid, we propose a two-stage information fusion strategy to fuse feature maps of neighboring grid points. The grid guided localization approach is easy to extend to different state-of-the-art detection frameworks. Grid R-CNN leads to high quality object localization, and experiments demonstrate that it achieves a 4.1% AP gain at IoU=0.8 and a 10.0% AP gain at IoU=0.9 on the COCO benchmark compared to Faster R-CNN with a Res50 backbone and FPN architecture.
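
To make the IoU thresholds quoted above concrete, the sketch below computes intersection over union for two axis-aligned boxes. It illustrates the evaluation metric only, not Grid R-CNN itself; the `(x1, y1, x2, y2)` box layout and the function name are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# A 10x10 box shifted by one unit along x: intersection 90, union 110.
truth = (0.0, 0.0, 10.0, 10.0)
shifted = (1.0, 0.0, 11.0, 10.0)
overlap = iou(truth, shifted)
```

In this example a shift of one tenth of the box width already gives an IoU of 90/110, which clears the 0.8 threshold but not 0.9, so gains at the stricter thresholds reflect tight localization.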

In this paper, we present a hierarchical path planning framework called SG-RL (subgoal graphs-reinforcement learning), to plan rational paths for agents maneuvering in continuous and uncertain environments. By "rational", we mean (1) efficient path planning to eliminate first-move lags; (2) collision-free and smooth for agents with kinematic constraints satisfied. SG-RL works in a two-level manner. At the first level, SG-RL uses a geometric path-planning method, i.e., Simple Subgoal Graphs (SSG), to efficiently find optimal abstract paths, also called subgoal sequences. At the second level, SG-RL uses an RL method, i.e., Least-Squares Policy Iteration (LSPI), to learn near-optimal motion-planning policies which can generate kinematically feasible and collision-free trajectories between adjacent subgoals. The first advantage of the proposed method is that SSG can solve the limitations of sparse reward and local minima trap for RL agents; thus, LSPI can be used to generate paths in complex environments. The second advantage is that, when the environment changes slightly (i.e., unexpected obstacles appearing), SG-RL does not need to reconstruct subgoal graphs and replan subgoal sequences using SSG, since LSPI can deal with uncertainties by exploiting its generalization ability to handle changes in environments. Simulation experiments in representative scenarios demonstrate that, compared with existing methods, SG-RL can work well on large-scale maps with relatively low action-switching frequencies and shorter path lengths, and SG-RL can deal with small changes in environments. We further demonstrate that the design of reward functions and the types of training environments are important factors for learning feasible policies.

* 20 pages
While the volume of scholarly publications has increased at a frenetic pace, accessing and consuming the useful candidate papers in very large digital libraries is becoming an essential and challenging task for scholars. Unfortunately, because of the language barrier, some scientists (especially junior ones or graduate students who have not mastered other languages) cannot efficiently locate the publications hosted in a foreign language repository. In this study, we propose a novel solution, cross-language citation recommendation via Hierarchical Representation Learning on Heterogeneous Graph (HRLHG), to address this new problem. HRLHG can learn a representation function by mapping the publications from multilingual repositories to a low-dimensional joint embedding space from various kinds of vertexes and relations on a heterogeneous graph. By leveraging both global (task-specific) and local (task-independent) information as well as a novel supervised hierarchical random walk algorithm, the proposed method can optimize the publication representations by maximizing the likelihood of locating the important cross-language neighborhoods on the graph. Experimental results show that the proposed method can not only outperform state-of-the-art baseline models, but also improve the interpretability of the representation model for the cross-language citation recommendation task.

* The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR 2018), 635--644
The current trend of pushing CNNs deeper with convolutions has created a pressing demand to achieve higher compression gains on CNNs where convolutions dominate the computation and parameter amount (e.g., GoogLeNet, ResNet and Wide ResNet). Further, the high energy consumption of convolutions limits their deployment on mobile devices. To this end, we propose a simple yet effective scheme for compressing convolutions by applying $k$-means clustering to the weights: compression is achieved through weight sharing, by recording only $K$ cluster centers and weight assignment indexes. We then introduce a novel spectrally relaxed $k$-means regularization, which tends to make hard assignments of convolutional layer weights to $K$ learned cluster centers during re-training. We additionally propose an improved set of metrics to estimate the energy consumption of CNN hardware implementations, whose estimation results are verified to be consistent with a previously proposed energy estimation tool extrapolated from actual hardware measurements. We finally evaluate Deep $k$-Means across several CNN models in terms of both compression ratio and energy consumption reduction, observing promising results without incurring accuracy loss. The code is available at https://github.com/Sandbox3aster/Deep-K-Means
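
The weight-sharing idea described above (store only $K$ cluster centers plus one index per weight) can be sketched with plain Lloyd's $k$-means on scalar weights. This is an illustration under simplifying assumptions, not the Deep $k$-Means method: it omits the spectrally relaxed regularizer and re-training, treats weights as a flat list of scalars, and uses a deterministic initialization chosen for this example. All function names are hypothetical.

```python
def nearest(centers, w):
    """Index of the center closest to the scalar weight w."""
    return min(range(len(centers)), key=lambda c: (w - centers[c]) ** 2)


def kmeans_1d(weights, k, iters=20):
    """Lloyd's k-means on scalars; returns (centers, assignment indexes)."""
    srt = sorted(weights)
    # Deterministic init (an assumption): k evenly spaced order statistics.
    centers = [srt[(i * (len(srt) - 1)) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iters):
        assign = [nearest(centers, w) for w in weights]
        for c in range(k):
            members = [w for w, a in zip(weights, assign) if a == c]
            if members:  # leave an empty cluster's center unchanged
                centers[c] = sum(members) / len(members)
    # Final assignment against the final centers.
    return centers, [nearest(centers, w) for w in weights]


def decompress(centers, assign):
    """Rebuild the shared-weight list from centers and indexes."""
    return [centers[a] for a in assign]


weights = [0.1, 0.11, 0.09, 0.9, 0.92, 0.88]
centers, assign = kmeans_1d(weights, k=2)
shared = decompress(centers, assign)
```

After clustering, the six original values are represented by two stored centers and six small indexes, which is where the compression comes from once the number of weights is much larger than $K$.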

* Accepted by ICML 2018
With the development of depth cameras such as Kinect and Intel Realsense, RGB-D based human detection receives continuous research attention due to its usage in a variety of applications. In this paper, we propose a new Multi-Glimpse LSTM (MG-LSTM) network, in which multi-scale contextual information is sequentially integrated to promote the human detection performance. Furthermore, we propose a feature fusion strategy based on our MG-LSTM network to better incorporate the RGB and depth information. To the best of our knowledge, this is the first attempt to utilize LSTM structure for RGB-D based human detection. Our method achieves superior performance on two publicly available datasets.

* ICIP 2017 Oral
Most of the existing medicine recommendation systems, which are mainly based on electronic medical records (EMRs), significantly assist doctors in making better clinical decisions, benefiting both patients and caregivers. Even though EMRs are growing at lightning-fast speed in the era of big data, content limitations in EMRs prevent existing recommendation systems from reflecting relevant medical facts, such as drug-drug interactions. Many medical knowledge graphs that contain drug-related information, such as DrugBank, may give hope for the recommendation systems. However, the direct use of these knowledge graphs in the systems suffers from poor robustness caused by the incompleteness of the graphs. To address these challenges, we build on recent advances in graph embedding learning techniques and propose a novel framework, called Safe Medicine Recommendation (SMR), in this paper. Specifically, SMR first constructs a high-quality heterogeneous graph by bridging EMRs (MIMIC-III) and medical knowledge graphs (ICD-9 ontology and DrugBank). Then, SMR jointly embeds diseases, medicines, patients, and their corresponding relations into a shared lower dimensional space. Finally, SMR uses the embeddings to decompose the medicine recommendation into a link prediction process while considering the patient's diagnoses and adverse drug reactions. To the best of our knowledge, SMR is the first to learn embeddings of a patient-disease-medicine graph for medicine recommendation. Extensive experiments on real datasets are conducted to evaluate the effectiveness of the proposed framework.

* 8 pages, 3 figures, 5 tables
Detecting activities in untrimmed videos is an important but challenging task. The performance of existing methods remains unsatisfactory, e.g., they often meet difficulties in locating the beginning and end of a long complex action. In this paper, we propose a generic framework that can accurately detect a wide variety of activities from untrimmed videos. Our first contribution is a novel proposal scheme that can efficiently generate candidates with accurate temporal boundaries. The other contribution is a cascaded classification pipeline that explicitly distinguishes between relevance and completeness of a candidate instance. On two challenging temporal activity detection datasets, THUMOS14 and ActivityNet, the proposed framework significantly outperforms the existing state-of-the-art methods, demonstrating superior accuracy and strong adaptivity in handling activities with various temporal structures.
