Models, code, and papers for "Prerana Mukherjee":

AnimePose: Multi-person 3D pose estimation and animation

Feb 06, 2020
Laxman Kumarapu, Prerana Mukherjee

3D animation of humans in action is quite challenging, as it typically involves a large setup with several motion trackers placed all over the person's body to track the movement of every limb. This is time-consuming, and wearing exoskeleton body suits with motion sensors may cause the person discomfort. In this work, we present a simple yet effective solution for generating 3D animation of multiple persons from a 2D video using deep learning. Although significant progress has been made recently in 3D human pose estimation, most prior works perform well only for single-person pose estimation, and multi-person pose estimation remains a challenging problem. We propose a supervised multi-person 3D pose estimation and animation framework, AnimePose, for a given input RGB video sequence. The pipeline of the proposed system consists of several modules: i) person detection and segmentation, ii) depth map estimation, iii) lifting 2D to 3D information for person localization, and iv) person trajectory prediction and human pose tracking. Our proposed system produces results comparable to previous state-of-the-art 3D multi-person pose estimation methods on the publicly available MuCo-3DHP and MuPoTS-3D datasets, and it outperforms previous state-of-the-art human pose tracking methods by a significant margin of 11.7% in MOTA score on the PoseTrack 2018 dataset.

* arXiv admin note: text overlap with arXiv:1907.11346 by other authors 

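As a rough illustration of how such a pipeline fits together, the Python sketch below chains the four stages per frame with injected placeholder components; the function names and model choices (e.g. a Mask R-CNN-style detector, a monocular depth network) are assumptions for illustration, not the authors' code.

```python
import numpy as np

def lift_2d_to_3d(pose_2d, depth_map):
    """iii) Append a depth coordinate sampled at each 2D joint location."""
    z = np.array([depth_map[int(y), int(x)] for x, y in pose_2d])
    return np.column_stack([pose_2d, z])          # (N, 2) joints -> (N, 3)

def animate(frames, detector, depth_net, pose2d, tracker):
    """Chain the four stages; `detector`, `depth_net`, `pose2d` and
    `tracker` are injected models (detection/segmentation, depth
    estimation, 2D pose estimation, pose tracking)."""
    tracks = []
    for frame in frames:
        boxes = detector(frame)                   # i) detection + segmentation
        depth = depth_net(frame)                  # ii) depth map estimation
        poses3d = [lift_2d_to_3d(pose2d(frame, b), depth) for b in boxes]
        tracks = tracker.update(poses3d)          # iv) trajectory + tracking
    return tracks
```
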
Nrityantar: Pose oblivious Indian classical dance sequence classification system

Dec 13, 2018
Vinay Kaushik, Prerana Mukherjee, Brejesh Lall

In this paper, we attempt to advance research in human action recognition toward a rather specialized application, namely Indian Classical Dance (ICD) classification. The variation in such dance forms in terms of hand and body postures, facial expressions or emotions, and head orientation makes pose estimation an extremely challenging task. To circumvent this problem, we construct a pose-oblivious shape signature which is fed to a sequence learning framework. The pose signature is built in a two-fold process. First, we represent the person's pose in the first frame of a dance video using symmetric Spatial Transformer Networks (STN) to extract good person object proposals and a CNN-based parallel single-person pose estimator (SPPE). Next, the pose bases are converted to pose flows by assigning a similarity score between successive poses, followed by non-maximal suppression. Instead of feeding a simple chain of joints to the sequence learner, which generally hinders network performance, we construct a feature vector of normalized distance vectors, flow, and angles between anchor joints, which captures the adjacency configuration in the skeletal pattern. Thus, the kinematic relationship among body joints across frames, obtained via pose estimation, helps better establish the spatio-temporal dependencies. We present an exhaustive empirical evaluation of state-of-the-art deep network based methods for dance classification on the ICD dataset.

* Eleventh Indian Conference on Computer Vision, Graphics and Image Processing 2018 

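A minimal sketch of what such a pose-oblivious signature could look like for one frame, assuming a 2D joint array; the anchor indices, the scale reference, and the angle features are illustrative, not the paper's exact definition. Flow terms would be obtained by differencing the signatures of consecutive frames before feeding the sequence to the LSTM.

```python
import numpy as np

def pose_signature(joints, anchors=(0, 1, 8)):
    """Normalized distances and angles from every joint to a few anchor
    joints. `joints` is an (N, 2) array of 2D joint coordinates."""
    # Normalize by an anchor-to-anchor reference so the signature is
    # scale-invariant across dancers and camera distances.
    scale = np.linalg.norm(joints[anchors[0]] - joints[anchors[-1]]) + 1e-8
    feats = []
    for a in anchors:
        d = joints - joints[a]                           # vectors to anchor
        feats.append(np.linalg.norm(d, axis=1) / scale)  # normalized distances
        feats.append(np.arctan2(d[:, 1], d[:, 0]))       # angles to anchor
    return np.concatenate(feats)   # one fixed-length vector per frame
```
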
Object cosegmentation using deep Siamese network

Mar 08, 2018
Prerana Mukherjee, Brejesh Lall, Snehith Lattupally

Object cosegmentation addresses the problem of discovering similar objects across multiple images and segmenting them as foreground simultaneously. In this paper, we propose a novel end-to-end pipeline to segment similar objects simultaneously from a relevant set of images using supervised learning in a deep-learning framework. We experiment with multiple object proposal generation techniques and perform extensive numerical evaluations by training the Siamese network on the generated object proposals. Similar object proposals for the test images are retrieved using the ANNOY (Approximate Nearest Neighbors Oh Yeah) library, and deep semantic segmentation is performed on them. Finally, we form a collage from the segmented similar objects based on the relative importance of the objects.

* Appears in International Conference on Pattern Recognition and Artificial Intelligence (ICPRAI), 2018 

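The retrieval step maps naturally onto the real ANNOY API; in this sketch the embedding dimensionality and the random stand-in embeddings are assumptions, standing in for the outputs of the trained Siamese branch.

```python
import numpy as np
from annoy import AnnoyIndex

dim = 256                                        # Siamese embedding size (assumed)
proposal_embeddings = np.random.rand(1000, dim)  # stand-in for network outputs

index = AnnoyIndex(dim, "angular")               # cosine-style metric
for i, emb in enumerate(proposal_embeddings):
    index.add_item(i, emb)
index.build(50)                                  # 50 trees: precision/size trade-off

query = np.random.rand(dim)                      # a test-image proposal embedding
neighbor_ids = index.get_nns_by_vector(query, 10)  # 10 most similar proposals
```
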
SalProp: Salient object proposals via aggregated edge cues

Jun 14, 2017
Prerana Mukherjee, Brejesh Lall, Sarvaswa Tandon

In this paper, we propose a novel object proposal generation scheme by formulating a graph-based salient edge classification framework that utilizes edge context. In the proposed method, we construct a Bayesian probabilistic edge map that assigns a saliency value to edgelets by exploiting low-level edge features. A Conditional Random Field is then learned to effectively combine these features for edge classification with object/non-object labels. We propose an objectness score for the generated windows by analyzing the salient edge density inside the bounding box. Extensive experiments on the PASCAL VOC 2007 dataset demonstrate that the proposed method gives competitive performance against 10 popular generic object detection techniques while using fewer proposals.

* 5 pages, 4 figures, accepted at ICIP 2017 

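One plausible reading of such an objectness score, assuming a per-pixel map of salient-edge responses; the normalization by window area is illustrative and may differ from the paper's exact formulation.

```python
import numpy as np

def objectness(box, edge_saliency):
    """Salient-edge density inside a candidate window. `edge_saliency` is
    an H x W map of per-edgelet saliency values (zero off-edge);
    `box` is (x0, y0, x1, y1) in pixel coordinates."""
    x0, y0, x1, y1 = box
    area = max((x1 - x0) * (y1 - y0), 1)
    return edge_saliency[y0:y1, x0:x1].sum() / area

score = objectness((10, 10, 60, 40), np.random.rand(100, 100))  # toy example
```
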
Benchmarking KAZE and MCM for Multiclass Classification

May 20, 2015
Siddharth Srivastava, Prerana Mukherjee, Brejesh Lall

In this paper, we propose a novel approach to feature generation by appropriately fusing KAZE and SIFT features. We then use this feature set along with a Minimal Complexity Machine (MCM) for object classification. We show that KAZE and SIFT features are complementary. Experimental results indicate that an elementary integration of these techniques can outperform state-of-the-art approaches.


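Extracting both descriptor sets is straightforward with OpenCV; the mean-pooling fusion below is a toy stand-in (the paper's fusion may differ), shown only to make the idea of combining complementary features into one vector concrete. `cv2.SIFT_create()` assumes OpenCV >= 4.4.

```python
import cv2
import numpy as np

img = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)

kaze = cv2.KAZE_create()
sift = cv2.SIFT_create()

kp_k, desc_k = kaze.detectAndCompute(img, None)   # 64-D KAZE descriptors
kp_s, desc_s = sift.detectAndCompute(img, None)   # 128-D SIFT descriptors

# Toy fusion: pool each descriptor set to a fixed-length vector, then
# concatenate before handing the result to the MCM classifier.
feat = np.concatenate([desc_k.mean(axis=0), desc_s.mean(axis=0)])
```
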
Aerial multi-object tracking by detection using deep association networks

Sep 04, 2019
Ajit Jadhav, Prerana Mukherjee, Vinay Kaushik, Brejesh Lall

A great deal of research has focused on object detection, which has achieved significant advances with deep learning techniques in recent years. In spite of this, existing algorithms are usually not optimal for dealing with sequences or images captured by drone-based platforms, due to challenges such as viewpoint change, scale variation, dense object distribution, and occlusion. In this paper, we develop a model for detecting objects in drone images using the VisDrone2019 DET dataset. Using the RetinaNet model as our base, we modify the anchor scales to better handle the detection of densely distributed and small objects. We explicitly model channel interdependencies using "Squeeze-and-Excitation" (SE) blocks that adaptively recalibrate channel-wise feature responses. This brings significant improvements in performance at a slight additional computational cost. Using this architecture for object detection, we build a custom DeepSORT network for object tracking on the VisDrone2019 MOT dataset by training a custom deep association network for the algorithm.


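The SE block itself is the standard formulation of Hu et al.; a compact PyTorch version is below, with the reduction ratio as an assumed hyperparameter.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation block for channel-wise recalibration."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                           # x: (N, C, H, W)
        s = x.mean(dim=(2, 3))                      # squeeze: global avg pool
        w = self.fc(s).unsqueeze(-1).unsqueeze(-1)  # excitation weights
        return x * w                                # recalibrated features
```
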
VayuAnukulani: Adaptive Memory Networks for Air Pollution Forecasting

Apr 08, 2019
Divyam Madaan, Radhika Dua, Prerana Mukherjee, Brejesh Lall

Air pollution is the leading environmental health hazard globally, with sources that include factory emissions, car exhaust, and cooking stoves. As a precautionary measure, air pollution forecasts serve as a basis for taking effective pollution control measures, and accurate air pollution forecasting has become an important task. In this paper, we forecast fine-grained ambient air quality for 5 prominent locations in Delhi based on historical and real-time ambient air quality and meteorological data reported by the Central Pollution Control Board. We present the VayuAnukulani system, a novel end-to-end solution that predicts air quality for the next 24 hours by estimating the concentration and level of different air pollutants, including nitrogen dioxide ($NO_2$) and particulate matter ($PM_{2.5}$ and $PM_{10}$), for Delhi. Extensive experiments on data sources obtained in Delhi demonstrate that the proposed adaptive attention-based bidirectional LSTM network outperforms several baseline classification and regression models. The accuracy of the proposed adaptive system is $\sim 15 - 20\%$ better than that of the same model trained offline. We compare the proposed methodology against several competing baselines and show that the network outperforms conventional methods by $\sim 3 - 5\%$.


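A minimal PyTorch sketch of an attention-based bidirectional LSTM forecaster of this kind; the layer sizes, the single attention stage, and the 24-hour regression head are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class AttnBiLSTM(nn.Module):
    """Bi-LSTM with attention over time steps for pollutant forecasting."""
    def __init__(self, n_features, hidden=64, horizon=24):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True,
                            bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.head = nn.Linear(2 * hidden, horizon)  # next-24h concentrations

    def forward(self, x):                     # x: (batch, time, n_features)
        h, _ = self.lstm(x)                   # (batch, time, 2*hidden)
        a = torch.softmax(self.attn(h), dim=1)  # attention over time steps
        ctx = (a * h).sum(dim=1)              # attention-weighted context
        return self.head(ctx)
```
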
Enhanced Characterness for Text Detection in the Wild

Dec 04, 2017
Aarushi Agrawal, Prerana Mukherjee, Siddharth Srivastava, Brejesh Lall

Text spotting is an interesting research problem, as text may appear at any random place and in various forms. Moreover, the ability to detect text opens the horizons for improving many advanced computer vision problems. In this paper, we propose a novel language-agnostic text detection method that utilizes edge-enhanced Maximally Stable Extremal Regions in natural scenes by defining strong characterness measures. We show that a simple combination of characterness cues helps in rejecting non-text regions. These regions are further refined to reject non-textual neighboring regions. Comprehensive evaluation of the proposed scheme shows that it provides comparable to better generalization performance than traditional methods for this task.


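Using OpenCV's MSER detector together with an edge cue might look like the sketch below; the edge-density filter is a toy stand-in for the paper's characterness cues, and the threshold is purely illustrative.

```python
import cv2

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(img, 100, 200)               # edge cue for enhancement

mser = cv2.MSER_create()
regions, boxes = mser.detectRegions(img)

# Keep regions whose bounding box contains enough edge pixels; the paper
# combines several stronger characterness cues instead of this single one.
kept = []
for (x, y, w, h) in boxes:
    density = edges[y:y + h, x:x + w].mean() / 255.0
    if density > 0.1:                          # illustrative threshold
        kept.append((x, y, w, h))
```
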
Object Classification using Ensemble of Local and Deep Features

Dec 04, 2017
Siddharth Srivastava, Prerana Mukherjee, Brejesh Lall, Kamlesh Jaiswal

In this paper, we propose an ensemble of local and deep features for object classification. We also compare and contrast the feature representation capability of various layers of a convolutional neural network. We demonstrate with extensive experiments on object classification that the representation capability of features from deep networks can be complemented with information captured by local features. We also find that features from different deep convolutional networks encode distinctive characteristic information. We establish that, contrary to conventional practice, intermediate layers of deep networks can augment the classification capabilities of features obtained from fully connected layers.

* Accepted for publication at Ninth International Conference on Advances in Pattern Recognition 

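One way to harvest intermediate-layer CNN features for such an ensemble is a forward hook; the backbone (VGG-16) and the layer index below are assumptions for illustration, not the paper's exact setup.

```python
import torch
import torchvision.models as models

cnn = models.vgg16(pretrained=True).eval()
captured = {}

def hook(module, inp, out):
    captured["feat"] = out.flatten(1)          # (batch, C*H*W)

# features[28] is the last conv layer of VGG-16 (an "intermediate" layer
# relative to the fully connected classifier head).
cnn.features[28].register_forward_hook(hook)

with torch.no_grad():
    cnn(torch.randn(1, 3, 224, 224))           # dummy image batch
deep_feat = captured["feat"]  # concatenate with local (e.g. SIFT) features
```
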
A Light weight and Hybrid Deep Learning Model based Online Signature Verification

Jul 09, 2019
Chandra Sekhar V., Anoushka Doctor, Prerana Mukherjee, Viswanath Pulabaigiri

The increased usage of deep learning-based models for various AI-related problems is a result of modern architectures of greater depth and the availability of voluminous annotated datasets. Models based on these architectures incur huge training and storage costs, which makes them inefficient for critical applications like online signature verification (OSV) and for deployment on resource-constrained devices. As a solution, our contribution in this work is two-fold: 1) an efficient dimensionality reduction technique to reduce the number of features to be considered, and 2) a state-of-the-art CNN-LSTM based hybrid architecture for online signature verification. Thorough experiments on the publicly available datasets MCYT, SUSIG, and SVC confirm that the proposed model achieves better accuracy even with as few as one training sample. The proposed models yield state-of-the-art performance across various categories of all three datasets.

* accepted in ICDAR-WML: The 2nd International Workshop on Machine Learning 2019 

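A minimal sketch of a CNN-LSTM hybrid over pen-dynamics sequences: 1-D convolutions summarize local stroke dynamics, and an LSTM models the sequence. All dimensions and the sigmoid verification head are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class CNNLSTMVerifier(nn.Module):
    """Hybrid CNN-LSTM scoring a signature as genuine vs. forged."""
    def __init__(self, n_features=8, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=5, padding=2), nn.ReLU())
        self.lstm = nn.LSTM(32, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):              # x: (batch, time, n_features)
        z = self.conv(x.transpose(1, 2)).transpose(1, 2)  # local dynamics
        _, (h, _) = self.lstm(z)                          # sequence summary
        return torch.sigmoid(self.out(h[-1]))             # genuineness score
```
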
OSVNet: Convolutional Siamese Network for Writer Independent Online Signature Verification

May 21, 2019
Chandra Sekhar, Prerana Mukherjee, Devanur S Guru, Viswanath Pulabaigari

Online signature verification (OSV) is one of the most challenging tasks in writer identification and digital forensics. Owing to large intra-individual variability, there is a critical requirement to accurately learn the intra-personal variations of the signature to achieve higher classification accuracy. To this end, we propose an OSV framework based on a deep convolutional Siamese network (DCSN). The DCSN automatically extracts robust feature descriptions using a metric-based loss function that decreases intra-writer variability (genuine-genuine) and increases inter-individual variability (genuine-forgery), directing the DCSN toward effective discriminative representation learning for online signatures; we further extend it to a one-shot learning framework. Comprehensive experimentation on three widely accepted benchmark datasets, MCYT-100 (DB1), MCYT-330 (DB2) and SVC-2004-Task2, demonstrates the capability of our framework to distinguish genuine from forged samples. Experimental results confirm the efficiency of the deep convolutional Siamese network based OSV, which achieves a lower error rate compared to many recent and state-of-the-art OSV techniques.

* accepted in International Conference on Document Analysis and Recognition (ICDAR 2019), University of Technology Sydney (UTS), Australia 

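A metric-based objective of the kind described here is commonly realized as a contrastive loss over embedding pairs; a standard formulation follows, with the margin value as an assumption.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(emb_a, emb_b, same_writer, margin=1.0):
    """Pull genuine-genuine pairs together, push genuine-forgery pairs
    at least `margin` apart. `same_writer` is 1.0 for genuine-genuine
    pairs and 0.0 for genuine-forgery pairs."""
    d = F.pairwise_distance(emb_a, emb_b)
    return (same_writer * d.pow(2) +
            (1 - same_writer) * F.relu(margin - d).pow(2)).mean()
```
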
Multi-level Attention network using text, audio and video for Depression Prediction

Sep 03, 2019
Anupama Ray, Siddharth Kumar, Rutvik Reddy, Prerana Mukherjee, Ritu Garg

Depression is a leading cause of mental-health illness worldwide. Major depressive disorder (MDD) is a common mental health disorder that affects people both psychologically and physically and can lead to loss of life. Due to the lack of diagnostic tests and the subjectivity involved in detecting depression, there is growing interest in using behavioural cues to automate depression diagnosis and stage prediction. The absence of labelled behavioural datasets for such problems and the huge range of possible variations in behaviour make the problem more challenging. This paper presents a novel multi-level attention based network for multi-modal depression prediction that fuses features from audio, video and text modalities while learning the intra- and inter-modality relevance. The multi-level attention reinforces overall learning by selecting the most influential features within each modality for decision making. We perform exhaustive experimentation to create different regression models for the audio, video and text modalities. Several fusion models with different configurations are constructed to understand the impact of each feature and modality. We outperform the current baseline by 17.52% in terms of root mean squared error.

* in Proceedings of the 9th International Workshop on Audio/Visual Emotion Challenge, AVEC 2019, ACM Multimedia Workshop, Nice, France 

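A minimal sketch of the inter-modality attention stage, assuming each modality encoder already yields a fixed-size embedding; the intra-modality attention would sit inside those encoders, and the dimensions are illustrative.

```python
import torch
import torch.nn as nn

class ModalityFusion(nn.Module):
    """Attention-weighted fusion of audio, video, and text embeddings."""
    def __init__(self, dim=128):
        super().__init__()
        self.score = nn.Linear(dim, 1)
        self.reg = nn.Linear(dim, 1)      # depression-severity regression

    def forward(self, audio, video, text):        # each: (batch, dim)
        m = torch.stack([audio, video, text], dim=1)   # (batch, 3, dim)
        a = torch.softmax(self.score(m), dim=1)        # inter-modality weights
        fused = (a * m).sum(dim=1)                     # weighted combination
        return self.reg(fused)
```
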
Online Signature Verification Based on Writer Specific Feature Selection and Fuzzy Similarity Measure

May 21, 2019
Chandra Sekhar V, Prerana Mukherjee, D. S. Guru, Viswanath Pulabaigari

Online Signature Verification (OSV) is a widely used biometric attribute for verifying user behavioural characteristics in digital forensics. In this manuscript, owing to the large intra-individual variability, we propose a novel method for OSV based on an interval symbolic representation and a fuzzy similarity measure grounded in writer-specific parameter selection. Two parameters, namely the writer-specific acceptance threshold and the optimal feature set used for authenticating the writer, are selected based on the minimum equal error rate (EER) attained during a parameter fixation phase using the training signature samples. This is in contrast to current techniques for OSV, which are primarily writer-independent and choose a common set of features and a common acceptance threshold. To prove the robustness of our system, we exhaustively assess it on four standard datasets: MCYT-100 (DB1), MCYT-330 (DB2), the SUSIG-Visual corpus, and SVC-2004-Task2. Experimental outcomes confirm the effectiveness of fuzzy similarity metric-based writer-dependent parameter selection for OSV, achieving a lower error rate compared to many recent and state-of-the-art OSV models.

* accepted in Applications of Computer Vision and Pattern Recognition to Media Forensics, CVPRW, 2019, Long Beach, California 

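One plausible reading of the interval symbolic representation and fuzzy similarity, with an illustrative membership function; the paper's exact interval construction and similarity measure may differ.

```python
import numpy as np

def interval_representation(train_feats):
    """Per-feature interval [mean - std, mean + std] computed from a
    writer's training signatures (one plausible symbolic scheme)."""
    mu, sd = train_feats.mean(axis=0), train_feats.std(axis=0)
    return mu - sd, mu + sd

def fuzzy_similarity(test_feat, low, high):
    """Membership is 1 inside the interval and decays with distance
    outside it; this membership function is illustrative only."""
    dist = np.maximum(low - test_feat, 0) + np.maximum(test_feat - high, 0)
    return np.mean(1.0 / (1.0 + dist))

# Accept the signature if the similarity exceeds the writer-specific
# acceptance threshold fixed at the EER point on training data.
```
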
Attentional networks for music generation

Feb 06, 2020
Gullapalli Keerti, A N Vaishnavi, Prerana Mukherjee, A Sree Vidya, Gattineni Sai Sreenithya, Deeksha Nayab

Realistic music generation has always remained a challenging problem, as generated music may lack structure or coherence. In this work, we propose a deep learning based music generation method to produce old-style music, particularly jazz, with rehashed melodic structures, utilizing a Bi-directional Long Short-Term Memory (Bi-LSTM) neural network with attention. Owing to their success in modelling long-term temporal dependencies in sequential data, including video, Bi-LSTMs with attention are a natural choice for music generation. We validate in our experiments that Bi-LSTMs with attention are able to preserve the richness and technical nuances of the music performed.


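A small assumed decoding step for a generator of this kind: drawing the next note index from the network's softmax output, with a temperature parameter trading fidelity to the training melodies against novelty. This is illustrative, not the paper's sampling scheme.

```python
import numpy as np

def sample_next_note(prob, temperature=1.0):
    """Draw the next note index from a softmax distribution `prob`;
    lower temperature -> more conservative, repetitive melodies."""
    logits = np.log(prob + 1e-9) / temperature
    p = np.exp(logits) / np.exp(logits).sum()
    return np.random.choice(len(p), p=p)
```
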
DSAL-GAN: Denoising based Saliency Prediction with Generative Adversarial Networks

Apr 02, 2019
Prerana Mukherjee, Manoj Sharma, Megh Makwana, Ajay Pratap Singh, Avinash Upadhyay, Akkshita Trivedi, Brejesh Lall, Santanu Chaudhury

Synthesizing high quality saliency maps from noisy images is a challenging problem in computer vision with many practical applications. Samples generated by existing techniques for saliency detection do not handle noise perturbations smoothly and fail to delineate the salient objects present in the scene. In this paper, we present a novel end-to-end coupled Denoising based Saliency Prediction with Generative Adversarial Network (DSAL-GAN) framework to address the problem of salient object detection in noisy images. DSAL-GAN consists of two generative adversarial networks (GANs) trained end-to-end to perform denoising and saliency prediction jointly in a holistic manner. The first GAN consists of a generator that denoises the noisy input image and a discriminator that checks whether the output is a denoised image or the ground-truth original image. The second GAN predicts saliency maps from the raw pixels of the denoised input image using a data-driven, metric-based saliency prediction method with an adversarial loss. A cycle-consistency loss is also incorporated to further improve salient region prediction. We demonstrate with comprehensive evaluation that the proposed framework outperforms several baseline saliency models on various performance benchmarks.


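A rough sketch of one generator update for the coupled GANs, with placeholder module names and an L1 reconstruction term standing in for the cycle-consistency loss; the loss weighting is illustrative, not the paper's values.

```python
import torch
import torch.nn as nn

bce, l1 = nn.BCELoss(), nn.L1Loss()

def generator_step(G1, G2, D1, D2, noisy, clean, lam=10.0):
    """G1 denoises the input; G2 maps the denoised image to a saliency
    map; D1/D2 are the respective discriminators (all placeholders)."""
    denoised = G1(noisy)
    saliency = G2(denoised)
    p1, p2 = D1(denoised), D2(saliency)
    adv = (bce(p1, torch.ones_like(p1)) +     # fool the denoising critic
           bce(p2, torch.ones_like(p2)))      # fool the saliency critic
    recon = l1(denoised, clean)               # stand-in for cycle consistency
    return adv + lam * recon
```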