Models, code, and papers for "Dinesh Acharya":

Covariance Pooling For Facial Expression Recognition

May 13, 2018
Dinesh Acharya, Zhiwu Huang, Danda Paudel, Luc Van Gool

Classifying facial expressions into different categories requires capturing regional distortions of facial landmarks. We believe that second-order statistics such as covariance is better able to capture such distortions in regional facial fea- tures. In this work, we explore the benefits of using a man- ifold network structure for covariance pooling to improve facial expression recognition. In particular, we first employ such kind of manifold networks in conjunction with tradi- tional convolutional networks for spatial pooling within in- dividual image feature maps in an end-to-end deep learning manner. By doing so, we are able to achieve a recognition accuracy of 58.14% on the validation set of Static Facial Expressions in the Wild (SFEW 2.0) and 87.0% on the vali- dation set of Real-World Affective Faces (RAF) Database. Both of these results are the best results we are aware of. Besides, we leverage covariance pooling to capture the tem- poral evolution of per-frame features for video-based facial expression recognition. Our reported results demonstrate the advantage of pooling image-set features temporally by stacking the designed manifold network of covariance pool-ing on top of convolutional network layers.

  Access Model/Code and Paper
A New Similarity Measure for Taxonomy Based on Edge Counting

Nov 20, 2012
Manjula Shenoy. K, K. C. Shet, U. Dinesh Acharya

This paper introduces a new similarity measure based on edge counting in a taxonomy like WorldNet or Ontology. Measurement of similarity between text segments or concepts is very useful for many applications like information retrieval, ontology matching, text mining, and question answering and so on. Several measures have been developed for measuring similarity between two concepts: out of these we see that the measure given by Wu and Palmer [1] is simple, and gives good performance. Our measure is based on their measure but strengthens it. Wu and Palmer [1] measure has a disadvantage that it does not consider how far the concepts are semantically. In our measure we include the shortest path between the concepts and the depth of whole taxonomy together with the distances used in Wu and Palmer [1]. Also the measure has following disadvantage i.e. in some situations, the similarity of two elements of an IS-A ontology contained in the neighborhood exceeds the similarity value of two elements contained in the same hierarchy. Our measure introduces a penalization factor for this case based upon shortest length between the concepts and depth of whole taxonomy.

  Access Model/Code and Paper
Wasserstein Divergence for GANs

Sep 05, 2018
Jiqing Wu, Zhiwu Huang, Janine Thoma, Dinesh Acharya, Luc Van Gool

In many domains of computer vision, generative adversarial networks (GANs) have achieved great success, among which the family of Wasserstein GANs (WGANs) is considered to be state-of-the-art due to the theoretical contributions and competitive qualitative performance. However, it is very challenging to approximate the $k$-Lipschitz constraint required by the Wasserstein-1 metric~(W-met). In this paper, we propose a novel Wasserstein divergence~(W-div), which is a relaxed version of W-met and does not require the $k$-Lipschitz constraint. As a concrete application, we introduce a Wasserstein divergence objective for GANs~(WGAN-div), which can faithfully approximate W-div through optimization. Under various settings, including progressive growing training, we demonstrate the stability of the proposed WGAN-div owing to its theoretical and practical advantages over WGANs. Also, we study the quantitative and visual performance of WGAN-div on standard image synthesis benchmarks of computer vision, showing the superior performance of WGAN-div compared to the state-of-the-art methods.

* accepted by eccv_2018, correct minor errors 

  Access Model/Code and Paper
Improving Video Generation for Multi-functional Applications

Mar 14, 2018
Bernhard Kratzwald, Zhiwu Huang, Danda Pani Paudel, Acharya Dinesh, Luc Van Gool

In this paper, we aim to improve the state-of-the-art video generative adversarial networks (GANs) with a view towards multi-functional applications. Our improved video GAN model does not separate foreground from background nor dynamic from static patterns, but learns to generate the entire video clip conjointly. Our model can thus be trained to generate - and learn from - a broad set of videos with no restriction. This is achieved by designing a robust one-stream video generation architecture with an extension of the state-of-the-art Wasserstein GAN framework that allows for better convergence. The experimental results show that our improved video GAN model outperforms state-of-theart video generative models on multiple challenging datasets. Furthermore, we demonstrate the superiority of our model by successfully extending it to three challenging problems: video colorization, video inpainting, and future prediction. To the best of our knowledge, this is the first work using GANs to colorize and inpaint video clips.

  Access Model/Code and Paper
Sliced Wasserstein Generative Models

Apr 13, 2019
Jiqing Wu, Zhiwu Huang, Dinesh Acharya, Wen Li, Janine Thoma, Danda Pani Paudel, Luc Van Gool

In generative modeling, the Wasserstein distance (WD) has emerged as a useful metric to measure the discrepancy between generated and real data distributions. Unfortunately, it is challenging to approximate the WD of high-dimensional distributions. In contrast, the sliced Wasserstein distance (SWD) factorizes high-dimensional distributions into their multiple one-dimensional marginal distributions and is thus easier to approximate. In this paper, we introduce novel approximations of the primal and dual SWD. Instead of using a large number of random projections, as it is done by conventional SWD approximation methods, we propose to approximate SWDs with a small number of parameterized orthogonal projections in an end-to-end deep learning fashion. As concrete applications of our SWD approximations, we design two types of differentiable SWD blocks to equip modern generative frameworks---Auto-Encoders (AE) and Generative Adversarial Networks (GAN). In the experiments, we not only show the superiority of the proposed generative models on standard image synthesis benchmarks, but also demonstrate the state-of-the-art performance on challenging high resolution image and video generation in an unsupervised manner.

* This paper is submitted to arxiv twice, thus withdraw one of the versions. See arXiv:1706.02631 instead 

  Access Model/Code and Paper
DensePeds: Pedestrian Tracking in Dense Crowds Using Front-RVO and Sparse Features

Jul 27, 2019
Rohan Chandra, Uttaran Bhattacharya, Aniket Bera, Dinesh Manocha

We present a pedestrian tracking algorithm, DensePeds, that tracks individuals in highly dense crowds (greater than 2 pedestrians per square meter). Our approach is designed for videos captured from front-facing or elevated cameras. We present a new motion model called Front-RVO (FRVO) for predicting pedestrian movements in dense situations using collision avoidance constraints and combine it with state-of-the-art Mask R-CNN to compute sparse feature vectors that reduce the loss of pedestrian tracks (false negatives). We evaluate DensePeds on the standard MOT benchmarks as well as a new dense crowd dataset. In practice, our approach is 4.5 times faster than prior tracking algorithms on the MOT benchmark and we are state-of-the-art in dense crowd videos by over 2.6% on the absolute scale on average.

* added more refs 

  Access Model/Code and Paper
TraPHic: Trajectory Prediction in Dense and Heterogeneous Traffic Using Weighted Interactions

Dec 13, 2018
Rohan Chandra, Uttaran Bhattacharya, Aniket Bera, Dinesh Manocha

We present a new algorithm for predicting the near-term trajectories of road-agents in dense traffic videos. Our approach is designed for heterogeneous traffic, where the road-agents may correspond to buses, cars, scooters, bicycles, or pedestrians. We model the interactions between different road-agents using a novel LSTM-CNN hybrid network for trajectory prediction. In particular, we take into account heterogeneous interactions that implicitly accounts for the varying shapes, dynamics, and behaviors of different road agents. In addition, we model horizon-based interactions which are used to implicitly model the driving behavior of each road-agent. We evaluate the performance of our prediction algorithm, TraPHic, on the standard datasets and also introduce a new dense, heterogeneous traffic dataset corresponding to urban Asian videos and agent trajectories. We outperform state-of-the-art methods on dense traffic datasets by 30%.

* 8 pages 

  Access Model/Code and Paper
Mechanism Design for Cost Optimal PAC Learning in the Presence of Strategic Noisy Annotators

Oct 16, 2012
Dinesh Garg, Sourangshu Bhattacharya, S. Sundararajan, Shirish Shevade

We consider the problem of Probably Approximate Correct (PAC) learning of a binary classifier from noisy labeled examples acquired from multiple annotators (each characterized by a respective classification noise rate). First, we consider the complete information scenario, where the learner knows the noise rates of all the annotators. For this scenario, we derive sample complexity bound for the Minimum Disagreement Algorithm (MDA) on the number of labeled examples to be obtained from each annotator. Next, we consider the incomplete information scenario, where each annotator is strategic and holds the respective noise rate as a private information. For this scenario, we design a cost optimal procurement auction mechanism along the lines of Myerson's optimal auction design framework in a non-trivial manner. This mechanism satisfies incentive compatibility property, thereby facilitating the learner to elicit true noise rates of all the annotators.

* Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012) 

  Access Model/Code and Paper
Emotions Don't Lie: A Deepfake Detection Method using Audio-Visual Affective Cues

Mar 17, 2020
Trisha Mittal, Uttaran Bhattacharya, Rohan Chandra, Aniket Bera, Dinesh Manocha

We present a learning-based multimodal method for detecting real and deepfake videos. To maximize information for learning, we extract and analyze the similarity between the two audio and visual modalities from within the same video. Additionally, we extract and compare affective cues corresponding to emotion from the two modalities within a video to infer whether the input video is "real" or "fake". We propose a deep learning network, inspired by the Siamese network architecture and the triplet loss. To validate our model, we report the AUC metric on two large-scale, audio-visual deepfake detection datasets, DeepFake-TIMIT Dataset and DFDC. We compare our approach with several SOTA deepfake detection methods and report per-video AUC of 84.4% on the DFDC and 96.6% on the DF-TIMIT datasets, respectively.

  Access Model/Code and Paper
CMetric: A Driving Behavior Measure Using Centrality Functions

Mar 09, 2020
Rohan Chandra, Uttaran Bhattacharya, Trisha Mittal, Aniket Bera, Dinesh Manocha

We present a new measure, CMetric, to classify driver behaviors using centrality functions. Our formulation combines concepts from computational graph theory and social traffic psychology to quantify and classify the behavior of human drivers. CMetric is used to compute the probability of a vehicle executing a driving style, as well as the intensity used to execute the style. Our approach is designed for realtime autonomous driving applications, where the trajectory of each vehicle or road-agent is extracted from a video. We compute a dynamic geometric graph (DGG) based on the positions and proximity of the road-agents and centrality functions corresponding to closeness and degree. These functions are used to compute the CMetric based on style likelihood and style intensity estimates. Our approach is general and makes no assumption about traffic density, heterogeneity, or how driving behaviors change over time. We present efficient techniques to compute CMetric and demonstrate its performance on well-known autonomous driving datasets. We evaluate the accuracy of CMetric and compare with ground truth behavior labels and with that of a human observer by performing a user study over over a long vehicle trajectory.

* Submitted to IROS 2020 

  Access Model/Code and Paper
M3ER: Multiplicative Multimodal Emotion Recognition Using Facial, Textual, and Speech Cues

Nov 22, 2019
Trisha Mittal, Uttaran Bhattacharya, Rohan Chandra, Aniket Bera, Dinesh Manocha

We present M3ER, a learning-based method for emotion recognition from multiple input modalities. Our approach combines cues from multiple co-occurring modalities (such as face, text, and speech) and also is more robust than other methods to sensor noise in any of the individual modalities. M3ER models a novel, data-driven multiplicative fusion method to combine the modalities, which learn to emphasize the more reliable cues and suppress others on a per-sample basis. By introducing a check step which uses Canonical Correlational Analysis to differentiate between ineffective and effective modalities, M3ER is robust to sensor noise. M3ER also generates proxy features in place of the ineffectual modalities. We demonstrate the efficiency of our network through experimentation on two benchmark datasets, IEMOCAP and CMU-MOSEI. We report a mean accuracy of 82.7% on IEMOCAP and 89.0% on CMU-MOSEI, which, collectively, is an improvement of about 5% over prior work.

  Access Model/Code and Paper
RoadTrack: Realtime Tracking of Road Agents in Dense and Heterogeneous Environments

Jul 21, 2019
Rohan Chandra, Uttaran Bhattacharya, Tanmay Randhavane, Aniket Bera, Dinesh Manocha

We present a realtime tracking algorithm, RoadTrack, to track heterogeneous road-agents in dense traffic videos. Our approach is designed for traffic scenarios that consist of different road-agents such as pedestrians, two-wheelers, cars, buses, etc. sharing the road. We use the tracking-by-detection approach where we track a road-agent by matching the appearance or bounding box region in the current frame with the predicted bounding box region propagated from the previous frame. RoadTrack uses a novel motion model called the Simultaneous Collision Avoidance and Interaction (SimCAI) model to predict the motion of road-agents by modeling collision avoidance and interactions between the road-agents for the next frame. We demonstrate the advantage of RoadTrack on a dataset of dense traffic videos and observe an accuracy of 75.8% on this dataset, outperforming prior state-of-the-art tracking algorithms by at least 5.2%. RoadTrack operates in realtime at approximately 30 fps and is at least 4 times faster than prior tracking algorithms on standard tracking datasets.

* revisions include re-organization of the paper, new theoretical results, additional experiments 

  Access Model/Code and Paper
RobustTP: End-to-End Trajectory Prediction for Heterogeneous Road-Agents in Dense Traffic with Noisy Sensor Inputs

Jul 20, 2019
Rohan Chandra, Uttaran Bhattacharya, Christian Roncal, Aniket Bera, Dinesh Manocha

We present RobustTP, an end-to-end algorithm for predicting future trajectories of road-agents in dense traffic with noisy sensor input trajectories obtained from RGB cameras (either static or moving) through a tracking algorithm. In this case, we consider noise as the deviation from the ground truth trajectory. The amount of noise depends on the accuracy of the tracking algorithm. Our approach is designed for dense heterogeneous traffic, where the road agents corresponding to a mixture of buses, cars, scooters, bicycles, or pedestrians. RobustTP is an approach that first computes trajectories using a combination of a non-linear motion model and a deep learning-based instance segmentation algorithm. Next, these noisy trajectories are trained using an LSTM-CNN neural network architecture that models the interactions between road-agents in dense and heterogeneous traffic. Our trajectory prediction algorithm outperforms state-of-the-art methods for end-to-end trajectory prediction using sensor inputs. We achieve an improvement of upto 18% in average displacement error and an improvement ofup to 35.5% in final displacement error at the end of the prediction window (5 seconds) over the next best method. All experiments were set up on an Nvidia TiTan Xp GPU. Additionally, we release a software framework, TrackNPred. The framework consists of implementations of state-of-the-art tracking and trajectory prediction methods and tools to benchmark and evaluate them on real-world dense traffic datasets.

  Access Model/Code and Paper
RoadTrack: Tracking Road Agents in Dense and Heterogeneous Environments

Jun 25, 2019
Rohan Chandra, Uttaran Bhattacharya, Tanmay Randhavane, Aniket Bera, Dinesh Manocha

We present an algorithm to track traffic agents in dense videos. Our approach is designed for heterogeneous traffic scenarios that consist of different agents such as pedestrians, two-wheelers, cars, buses etc. sharing the road. We present a novel Heterogeneous Traffic Motion and Interaction model (HTMI) to predict the motion of agents by modeling collision avoidance and interactions between the agents. We implement HTMI within the tracking-by-detection paradigm and use background subtracted representations of traffic agents to extract binary tensors for accurate tracking. We highlight the performance on a dense traffic videos and observe an accuracy of 75.8%. We observe up to 4 times speedup over prior tracking algorithms on standard traffic datasets.

  Access Model/Code and Paper
EmotiCon: Context-Aware Multimodal Emotion Recognition using Frege's Principle

Mar 14, 2020
Trisha Mittal, Pooja Guhan, Uttaran Bhattacharya, Rohan Chandra, Aniket Bera, Dinesh Manocha

We present EmotiCon, a learning-based algorithm for context-aware perceived human emotion recognition from videos and images. Motivated by Frege's Context Principle from psychology, our approach combines three interpretations of context for emotion recognition. Our first interpretation is based on using multiple modalities(e.g. faces and gaits) for emotion recognition. For the second interpretation, we gather semantic context from the input image and use a self-attention-based CNN to encode this information. Finally, we use depth maps to model the third interpretation related to socio-dynamic interactions and proximity among agents. We demonstrate the efficiency of our network through experiments on EMOTIC, a benchmark dataset. We report an Average Precision (AP) score of 35.48 across 26 classes, which is an improvement of 7-8 over prior methods. We also introduce a new dataset, GroupWalk, which is a collection of videos captured in multiple real-world settings of people walking. We report an AP of 65.83 across 4 categories on GroupWalk, which is also an improvement over prior methods.

  Access Model/Code and Paper
The Liar's Walk: Detecting Deception with Gait and Gesture

Dec 20, 2019
Tanmay Randhavane, Uttaran Bhattacharya, Kyra Kapsaskis, Kurt Gray, Aniket Bera, Dinesh Manocha

We present a data-driven deep neural algorithm for detecting deceptive walking behavior using nonverbal cues like gaits and gestures. We conducted an elaborate user study, where we recorded many participants performing tasks involving deceptive walking. We extract the participants' walking gaits as series of 3D poses. We annotate various gestures performed by participants during their tasks. Based on the gait and gesture data, we train an LSTM-based deep neural network to obtain deep features. Finally, we use a combination of psychology-based gait, gesture, and deep features to detect deceptive walking with an accuracy of 93.4%. This is an improvement of 16.1% over handcrafted gait and gesture features and an improvement of 5.9% and 10.1% over classifiers based on the state-of-the-art emotion and action classification algorithms, respectively. Additionally, we present a novel dataset, DeceptiveWalk, that contains gaits and gestures with their associated deception labels. To the best of our knowledge, ours is the first algorithm to detect deceptive behavior using non-verbal cues of gait and gesture.

* 9 pages, 7 Figures 

  Access Model/Code and Paper