Models, code, and papers for "Kazuhiro Fukui":

In this paper, we propose a method for image-set classification based on convex cone models, focusing on the effectiveness of convolutional neural network (CNN) features as inputs. CNN features have non-negative values when using the rectified linear unit as an activation function. This naturally leads us to model a set of CNN features by a convex cone and measure the geometric similarity of convex cones for classification. To establish this framework, we sequentially define multiple angles between two convex cones by repeating the alternating least squares method and then define the geometric similarity between the cones using the obtained angles. Moreover, to enhance our method, we introduce a discriminant space, maximizing the between-class variance (gaps) and minimizes the within-class variance of the projected convex cones onto the discriminant space, similar to a Fisher discriminant analysis. Finally, classification is based on the similarity between projected convex cones. The effectiveness of the proposed method was demonstrated experimentally using a private, multi-view hand shape dataset and two public databases.

In this paper, we propose a method for image-set classification based on convex cone models. Image set classification aims to classify a set of images, which were usually obtained from video frames or multi-view cameras, into a target object. To accurately and stably classify a set, it is essential to represent structural information of the set accurately. There are various representative image features, such as histogram based features, HLAC, and Convolutional Neural Network (CNN) features. We should note that most of them have non-negativity and thus can be effectively represented by a convex cone. This leads us to introduce the convex cone representation to image-set classification. To establish a convex cone based framework, we mathematically define multiple angles between two convex cones, and then define the geometric similarity between the cones using the angles. Moreover, to enhance the framework, we introduce a discriminant space that maximizes the between-class variance (gaps) and minimizes the within-class variance of the projected convex cones onto the discriminant space, similar to the Fisher discriminant analysis. Finally, the classification is performed based on the similarity between projected convex cones. The effectiveness of the proposed method is demonstrated experimentally by using five databases: CMU PIE dataset, ETH-80, CMU Motion of Body dataset, Youtube Celebrity dataset, and a private database of multi-view hand shapes.

Text classification has become indispensable due to the rapid increase of text in digital form. Over the past three decades, efforts have been made to approach this task using various learning algorithms and statistical models based on bag-of-words (BOW) features. Despite its simple implementation, BOW features lack semantic meaning representation. To solve this problem, neural networks started to be employed to learn word vectors, such as the word2vec. Word2vec embeds word semantic structure into vectors, where the angle between vectors indicates the meaningful similarity between words. To measure the similarity between texts, we propose the novel concept of word subspace, which can represent the intrinsic variability of features in a set of word vectors. Through this concept, it is possible to model text from word vectors while holding semantic information. To incorporate the word frequency directly in the subspace model, we further extend the word subspace to the term-frequency (TF) weighted word subspace. Based on these new concepts, text classification can be performed under the mutual subspace method (MSM) framework. The validity of our modeling is shown through experiments on the Reuters text database, comparing the results to various state-of-art algorithms.

Planar markers are useful in robotics and computer vision for mapping and localisation. Given a detected marker in an image, a frequent task is to estimate the 6DOF pose of the marker relative to the camera, which is an instance of planar pose estimation (PPE). Although there are mature techniques, PPE suffers from a fundamental ambiguity problem, in that there can be more than one plausible pose solutions for a PPE instance. Especially when localisation of the marker corners is noisy, it is often difficult to disambiguate the pose solutions based on reprojection error alone. Previous methods choose between the possible solutions using a heuristic criteria, or simply ignore ambiguous markers. We propose to resolve the ambiguities by examining the consistencies of a set of markers across multiple views. Our specific contributions include a novel rotation averaging formulation that incorporates long-range dependencies between possible marker orientation solutions that arise from PPE ambiguities. We analyse the combinatorial complexity of the problem, and develop a novel lifted algorithm to effectively resolve marker pose ambiguities, without discarding any marker observations. Results on real and synthetic data show that our method is able to handle highly ambiguous inputs, and provides more accurate and/or complete marker-based mapping and localisation.

The increasing use of multiple sensors requires more efficient methods to represent and classify multi-dimensional data, since these applications produce a large amount of data, demanding modern techniques for data processing. Considering these observations, we present in this paper a new method for multi-dimensional data classification which relies on two premises: 1) multi-dimensional data are usually represented by tensors, due to benefits from multilinear algebra and the established tensor factorization methods; and 2) this kind of data can be described by a subspace lying within a vector space. Subspace representation has been consistently employed for pattern-set recognition, and its tensor representation counterpart is also available in the literature. However, traditional methods do not employ discriminative information of the tensors, which degrades the classification accuracy. In this scenario, generalized difference subspace (GDS) may provide an enhanced subspace representation by reducing data redundancy and revealing discriminative structures. Since GDS is not able to directly handle tensor data, we propose a new projection called n-mode GDS, which efficiently handles tensor data. In addition, n-mode Fisher score is introduced as a class separability index and an improved metric based on the geodesic distance is provided to measure the similarity between tensor data. To confirm the advantages of the proposed method, we address the problem of representing and classifying tensor data for gesture and action recognition. The experimental results have shown that the proposed approach outperforms methods commonly used in the literature without adopting pre-trained models or transfer learning.