Models, code, and papers for "Aurobinda Routray":
Algorithms for accurate localization of pupil centre is essential for gaze tracking in real world conditions. Most of the algorithms fail in real world conditions like illumination variations, contact lenses, glasses, eye makeup, motion blur, noise, etc. We propose a new algorithm which improves the detection rate in real world conditions. The proposed algorithm uses both edges as well as intensity information along with a candidate filtering approach to identify the best pupil candidate. A simple tracking scheme has also been added which improves the processing speed. The algorithm has been evaluated in Labelled Pupil in the Wild (LPW) dataset, largest in its class which contains real world conditions. The proposed algorithm outperformed the state of the art algorithms while achieving real-time performance.
This paper presents a framework for recognition of human activity from egocentric video and eye tracking data obtained from a head-mounted eye tracker. Three channels of information such as eye movement, ego-motion, and visual features are combined for the classification of activities. Image features were extracted using a pre-trained convolutional neural network. Eye and ego-motion are quantized, and the windowed histograms are used as the features. The combination of features obtains better accuracy for activity classification as compared to individual features.
Iris centre localization in low-resolution visible images is a challenging problem in computer vision community due to noise, shadows, occlusions, pose variations, eye blinks, etc. This paper proposes an efficient method for determining iris centre in low-resolution images in the visible spectrum. Even low-cost consumer-grade webcams can be used for gaze tracking without any additional hardware. A two-stage algorithm is proposed for iris centre localization. The proposed method uses geometrical characteristics of the eye. In the first stage, a fast convolution based approach is used for obtaining the coarse location of iris centre (IC). The IC location is further refined in the second stage using boundary tracing and ellipse fitting. The algorithm has been evaluated in public databases like BioID, Gi4E and is found to outperform the state of the art methods.
Estimation eye gaze direction is useful in various human-computer interaction tasks. Knowledge of gaze direction can give valuable information regarding users point of attention. Certain patterns of eye movements known as eye accessing cues are reported to be related to the cognitive processes in the human brain. We propose a real-time framework for the classification of eye gaze direction and estimation of eye accessing cues. In the first stage, the algorithm detects faces using a modified version of the Viola-Jones algorithm. A rough eye region is obtained using geometric relations and facial landmarks. The eye region obtained is used in the subsequent stage to classify the eye gaze direction. A convolutional neural network is employed in this work for the classification of eye gaze direction. The proposed algorithm was tested on Eye Chimera database and found to outperform state of the art methods. The computational complexity of the algorithm is very less in the testing phase. The algorithm achieved an average frame rate of 24 fps in the desktop environment.
This paper proposes a novel framework for the use of eye movement patterns for biometric applications. Eye movements contain abundant information about cognitive brain functions, neural pathways, etc. In the proposed method, eye movement data is classified into fixations and saccades. Features extracted from fixations and saccades are used by a Gaussian Radial Basis Function Network (GRBFN) based method for biometric authentication. A score fusion approach is adopted to classify the data in the output layer. In the evaluation stage, the algorithm has been tested using two types of stimuli: random dot following on a screen and text reading. The results indicate the strength of eye movement pattern as a biometric modality. The algorithm has been evaluated on BioEye 2015 database and found to outperform all the other methods. Eye movements are generated by a complex oculomotor plant which is very hard to spoof by mechanical replicas. Use of eye movement dynamics along with iris recognition technology may lead to a robust counterfeit-resistant person identification system.
This thesis describes the development of fast algorithms for the computation of PERcentage CLOSure of eyes (PERCLOS) and Saccadic Ratio (SR). PERCLOS and SR are two ocular parameters reported to be measures of alertness levels in human beings. PERCLOS is the percentage of time in which at least 80% of the eyelid remains closed over the pupil. Saccades are fast and simultaneous movement of both the eyes in the same direction. SR is the ratio of peak saccadic velocity to the saccadic duration. This thesis addresses the issues of image based estimation of PERCLOS and SR, prevailing in the literature such as illumination variation, poor illumination conditions, head rotations etc. In this work, algorithms for real-time PERCLOS computation has been developed and implemented on an embedded platform. The platform has been used as a case study for assessment of loss of attention in automotive drivers. The SR estimation has been carried out offline as real-time implementation requires high frame rates of processing which is difficult to achieve due to hardware limitations. The accuracy in estimation of the loss of attention using PERCLOS and SR has been validated using brain signals, which are reported to be an authentic cue for estimating the state of alertness in human beings. The major contributions of this thesis include database creation, design and implementation of fast algorithms for estimating PERCLOS and SR on embedded computing platforms.
The alertness level of drivers can be estimated with the use of computer vision based methods. The level of fatigue can be found from the value of PERCLOS. It is the ratio of closed eye frames to the total frames processed. The main objective of the thesis is the design and implementation of real-time algorithms for measurement of PERCLOS. In this work we have developed a real-time system which is able to process the video onboard and to alarm the driver in case the driver is in alert. For accurate estimation of PERCLOS the frame rate should be greater than 4 and accuracy should be greater than 90%. For eye detection we have used mainly two approaches Haar classifier based method and Principal Component Analysis (PCA) based method for day time. During night time active Near Infra Red (NIR) illumination is used. Local Binary Pattern (LBP) histogram based method is used for the detection of eyes at night time. The accuracy rate of the algorithms was found to be more than 90% at frame rates more than 5 fps which was suitable for the application.
Facial expression recognition has many potential applications which has attracted the attention of researchers in the last decade. Feature extraction is one important step in expression analysis which contributes toward fast and accurate expression recognition. This paper represents an approach of combining the shape and appearance features to form a hybrid feature vector. We have extracted Pyramid of Histogram of Gradients (PHOG) as shape descriptors and Local Binary Patterns (LBP) as appearance features. The proposed framework involves a novel approach of extracting hybrid features from active facial patches. The active facial patches are located on the face regions which undergo a major change during different expressions. After detection of facial landmarks, the active patches are localized and hybrid features are calculated from these patches. The use of small parts of face instead of the whole face for extracting features reduces the computational cost and prevents the over-fitting of the features for classification. By using linear discriminant analysis, the dimensionality of the feature is reduced which is further classified by using the support vector machine (SVM). The experimental results on two publicly available databases show promising accuracy in recognizing all expression classes.
Extraction of discriminative features from salient facial patches plays a vital role in effective facial expression recognition. The accurate detection of facial landmarks improves the localization of the salient patches on face images. This paper proposes a novel framework for expression recognition by using appearance features of selected facial patches. A few prominent facial patches, depending on the position of facial landmarks, are extracted which are active during emotion elicitation. These active patches are further processed to obtain the salient patches which contain discriminative features for classification of each pair of expressions, thereby selecting different facial patches as salient for different pair of expression classes. One-against-one classification method is adopted using these features. In addition, an automated learning-free facial landmark detection technique has been proposed, which achieves similar performances as that of other state-of-art landmark detection methods, yet requires significantly less execution time. The proposed method is found to perform well consistently in different resolutions, hence, providing a solution for expression recognition in low resolution images. Experiments on CK+ and JAFFE facial expression databases show the effectiveness of the proposed system.
Hyperspectral images (HSI) contain a wealth of information over hundreds of contiguous spectral bands, making it possible to classify materials through subtle spectral discrepancies. However, the classification of this rich spectral information is accompanied by the challenges like high dimensionality, singularity, limited training samples, lack of labeled data samples, heteroscedasticity and nonlinearity. To address these challenges, we propose a semi-supervised graph based dimensionality reduction method named `semi-supervised spatial spectral regularized manifold local scaling cut' (S3RMLSC). The underlying idea of the proposed method is to exploit the limited labeled information from both the spectral and spatial domains along with the abundant unlabeled samples to facilitate the classification task by retaining the original distribution of the data. In S3RMLSC, a hierarchical guided filter (HGF) is initially used to smoothen the pixels of the HSI data to preserve the spatial pixel consistency. This step is followed by the construction of linear patches from the nonlinear manifold by using the maximal linear patch (MLP) criterion. Then the inter-patch and intra-patch dissimilarity matrices are constructed in both spectral and spatial domains by regularized manifold local scaling cut (RMLSC) and neighboring pixel manifold local scaling cut (NPMLSC) respectively. Finally, we obtain the projection matrix by optimizing the updated semi-supervised spatial-spectral between-patch and total-patch dissimilarity. The effectiveness of the proposed DR algorithm is illustrated with publicly available real-world HSI datasets.
This paper presents a database of human faces for persons wearing spectacles. The database consists of images of faces having significant variations with respect to illumination, head pose, skin color, facial expressions and sizes, and nature of spectacles. The database contains data of 60 subjects. This database is expected to be a precious resource for the development and evaluation of algorithms for face detection, eye detection, head tracking, eye gaze tracking, etc., for subjects wearing spectacles. As such, this can be a valuable contribution to the computer vision community.
Face detection is an essential step in many computer vision applications like surveillance, tracking, medical analysis, facial expression analysis etc. Several approaches have been made in the direction of face detection. Among them, Haar-like features based method is a robust method. In spite of the robustness, Haar - like features work with some limitations. However, with some simple modifications in the algorithm, its performance can be made faster and more robust. The present work refers to the increase in speed of operation of the original algorithm by down sampling the frames and its analysis with different scale factors. It also discusses the detection of tilted faces using an affine transformation of the input image.
Dimensionality reduction (DR) methods have attracted extensive attention to provide discriminative information and reduce the computational burden of the hyperspectral image (HSI) classification. However, the DR methods face many challenges due to limited training samples with high dimensional spectra. To address this issue, a graph-based spatial and spectral regularized local scaling cut (SSRLSC) for DR of HSI data is proposed. The underlying idea of the proposed method is to utilize the information from both the spectral and spatial domains to achieve better classification accuracy than its spectral domain counterpart. In SSRLSC, a guided filter is initially used to smoothen and homogenize the pixels of the HSI data in order to preserve the pixel consistency. This is followed by generation of between-class and within-class dissimilarity matrices in both spectral and spatial domains by regularized local scaling cut (RLSC) and neighboring pixel local scaling cut (NPLSC) respectively. Finally, we obtain the projection matrix by optimizing the updated spatial-spectral between-class and total-class dissimilarity. The effectiveness of the proposed DR algorithm is illustrated with two popular real-world HSI datasets.
The lack of proper class discrimination among the Hyperspectral (HS) data points poses a potential challenge in HS classification. To address this issue, this paper proposes an optimal geometry-aware transformation for enhancing the classification accuracy. The underlying idea of this method is to obtain a linear projection matrix by solving a nonlinear objective function based on the intrinsic geometrical structure of the data. The objective function is constructed to quantify the discrimination between the points from dissimilar classes on the projected data space. Then the obtained projection matrix is used to linearly map the data to more discriminative space. The effectiveness of the proposed transformation is illustrated with three benchmark real-world HS data sets. The experiments reveal that the classification and dimensionality reduction methods on the projected discriminative space outperform their counterpart in the original space.
In this paper, we propose an L1 normalized graph based dimensionality reduction method for Hyperspectral images, called as L1-Scaling Cut (L1-SC). The underlying idea of this method is to generate the optimal projection matrix by retaining the original distribution of the data. Though L2-norm is generally preferred for computation, it is sensitive to noise and outliers. However, L1-norm is robust to them. Therefore, we obtain the optimal projection matrix by maximizing the ratio of between-class dispersion to within-class dispersion using L1-norm. Furthermore, an iterative algorithm is described to solve the optimization problem. The experimental results of the HSI classification confirm the effectiveness of the proposed L1-SC method on both noisy and noiseless data.
Feature selection has been studied widely in the literature. However, the efficacy of the selection criteria for low sample size applications is neglected in most cases. Most of the existing feature selection criteria are based on the sample similarity. However, the distance measures become insignificant for high dimensional low sample size (HDLSS) data. Moreover, the variance of a feature with a few samples is pointless unless it represents the data distribution efficiently. Instead of looking at the samples in groups, we evaluate their efficiency based on pairwise fashion. In our investigation, we noticed that considering a pair of samples at a time and selecting the features that bring them closer or put them far away is a better choice for feature selection. Experimental results on benchmark data sets demonstrate the effectiveness of the proposed method with low sample size, which outperforms many other state-of-the-art feature selection methods.
This paper presents a novel pre-processing scheme to improve the prediction of sand fraction from multiple seismic attributes such as seismic impedance, amplitude and frequency using machine learning and information filtering. The available well logs along with the 3-D seismic data have been used to benchmark the proposed pre-processing stage using a methodology which primarily consists of three steps: pre-processing, training and post-processing. An Artificial Neural Network (ANN) with conjugate-gradient learning algorithm has been used to model the sand fraction. The available sand fraction data from the high resolution well logs has far more information content than the low resolution seismic attributes. Therefore, regularization schemes based on Fourier Transform (FT), Wavelet Decomposition (WD) and Empirical Mode Decomposition (EMD) have been proposed to shape the high resolution sand fraction data for effective machine learning. The input data sets have been segregated into training, testing and validation sets. The test results are primarily used to check different network structures and activation function performances. Once the network passes the testing phase with an acceptable performance in terms of the selected evaluators, the validation phase follows. In the validation stage, the prediction model is tested against unseen data. The network yielding satisfactory performance in the validation stage is used to predict lithological properties from seismic attributes throughout a given volume. Finally, a post-processing scheme using 3-D spatial filtering is implemented for smoothing the sand fraction in the volume. Prediction of lithological properties using this framework is helpful for Reservoir Characterization.
Facial expression analysis is one of the popular fields of research in human computer interaction (HCI). It has several applications in next generation user interfaces, human emotion analysis, behavior and cognitive modeling. In this paper, a facial expression classification algorithm is proposed which uses Haar classifier for face detection purpose, Local Binary Patterns (LBP) histogram of different block sizes of a face image as feature vectors and classifies various facial expressions using Principal Component Analysis (PCA). The algorithm is implemented in real time for expression classification since the computational complexity of the algorithm is small. A customizable approach is proposed for facial expression analysis, since the various expressions and intensity of expressions vary from person to person. The system uses grayscale frontal face images of a person to classify six basic emotions namely happiness, sadness, disgust, fear, surprise and anger.
Emotions are best way of communicating information; and sometimes it carry more information than words. Recently, there has been a huge interest in automatic recognition of human emotion because of its wide spread application in security, surveillance, marketing, advertisement, and human-computer interaction. To communicate with a computer in a natural way, it will be desirable to use more natural modes of human communication based on voice, gestures and facial expressions. In this paper, a holistic approach for facial expression recognition is proposed which captures the variation in facial features in temporal domain and classifies the sequence of images in different emotions. The proposed method uses Haar-like features to detect face in an image. The dimensionality of the eigenspace is reduced using Principal Component Analysis (PCA). By projecting the subsequent face images into principal eigen directions, the variation pattern of the obtained weight vector is modeled to classify it into different emotions. Owing to the variations of expressions for different people and its intensity, a person specific method for emotion recognition is followed. Using the gray scale images of the frontal face, the system is able to classify four basic emotions such as happiness, sadness, surprise, and anger.
In this paper, a modified algorithm for the detection of nasal and temporal eye corners is presented. The algorithm is a modification of the Santos and Proenka Method. In the first step, we detect the face and the eyes using classifiers based on Haar-like features. We then segment out the sclera, from the detected eye region. From the segmented sclera, we segment out an approximate eyelid contour. Eye corner candidates are obtained using Harris and Stephens corner detector. We introduce a post-pruning of the Eye corner candidates to locate the eye corners, finally. The algorithm has been tested on Yale, JAFFE databases as well as our created database.