Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials
Oct 20, 2012
Philipp Krähenbühl, Vladlen Koltun
Most state-of-the-art techniques for multi-class image segmentation and labeling use conditional random fields defined over pixels or image regions. While region-level models often feature dense pairwise connectivity, pixel-level models are considerably larger and have only permitted sparse graph structures. In this paper, we consider fully connected CRF models defined on the complete set of pixels in an image. The resulting graphs have billions of edges, making traditional inference algorithms impractical. Our main contribution is a highly efficient approximate inference algorithm for fully connected CRF models in which the pairwise edge potentials are defined by a linear combination of Gaussian kernels. Our experiments demonstrate that dense connectivity at the pixel level substantially improves segmentation and labeling accuracy.
* Advances in Neural Information Processing Systems 24 (2011) 109-117
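The pairwise potentials in this model combine an appearance kernel over pixel positions and colors with a smoothness kernel over positions alone, and inference proceeds by mean-field updates. Below is a minimal Python sketch of that update using a naive O(N^2) message-passing step; the paper's efficiency comes instead from high-dimensional Gaussian filtering, and the kernel weights and bandwidths here are illustrative assumptions rather than the authors' settings.

```python
import numpy as np

def dense_crf_mean_field(unary, positions, colors,
                         w_app=10.0, w_smooth=3.0,
                         theta_alpha=60.0, theta_beta=20.0, theta_gamma=3.0,
                         n_iters=5):
    """unary: (N, L) negative log-probabilities; positions: (N, 2); colors: (N, 3)."""
    # Pairwise kernel: appearance term (position + color) plus smoothness term (position only).
    d_pos = np.sum((positions[:, None, :] - positions[None, :, :]) ** 2, axis=-1)
    d_col = np.sum((colors[:, None, :] - colors[None, :, :]) ** 2, axis=-1)
    K = (w_app * np.exp(-d_pos / (2 * theta_alpha ** 2) - d_col / (2 * theta_beta ** 2))
         + w_smooth * np.exp(-d_pos / (2 * theta_gamma ** 2)))
    np.fill_diagonal(K, 0.0)  # no message from a pixel to itself

    # Mean-field iterations with a Potts compatibility between labels.
    Q = np.exp(-unary)
    Q /= Q.sum(axis=1, keepdims=True)
    for _ in range(n_iters):
        msg = K @ Q                                      # Gaussian-weighted label beliefs
        pairwise = msg.sum(axis=1, keepdims=True) - msg  # cost of disagreeing with neighbors
        Q = np.exp(-unary - pairwise)
        Q /= Q.sum(axis=1, keepdims=True)
    return Q
```

The dense kernel matrix K is exactly what the paper avoids materializing: each message-passing step above is a Gaussian blur in a joint position-color space, which can be approximated in linear time.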
Bottom-up Object Detection by Grouping Extreme and Center Points
Feb 03, 2019
Xingyi Zhou, Jiacheng Zhuo, Philipp Krähenbühl
Adversarial Feature Learning
Jeff Donahue, Philipp Krähenbühl, Trevor Darrell
The ability of the Generative Adversarial Networks (GANs) framework to learn generative models mapping from simple latent distributions to arbitrarily complex data distributions has been demonstrated empirically, with compelling results showing that the latent space of such generators captures semantic variation in the data distribution. Intuitively, models trained to predict these semantic latent representations given data may serve as useful feature representations for auxiliary problems where semantics are relevant. However, in their existing form, GANs have no means of learning the inverse mapping -- projecting data back into the latent space. We propose Bidirectional Generative Adversarial Networks (BiGANs) as a means of learning this inverse mapping, and demonstrate that the resulting learned feature representation is useful for auxiliary supervised discrimination tasks, competitive with contemporary approaches to unsupervised and self-supervised feature learning.
* Published as a conference paper at ICLR 2017. Changelog: (v7) Table 2 results improved 1-2% due to averaging predictions over 10 crops at test time, as done in Noroozi & Favaro; Table 3 VOC classification results slightly improved due to minor bugfix. (See v6 changelog for previous versions.)
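A minimal sketch of the BiGAN objective described above, assuming generic generator G, encoder E, and joint discriminator D modules (the PyTorch framing and all names here are assumptions, not the authors' released code): the discriminator scores joint (data, latent) pairs, and G and E are trained together to fool it, which is what forces E to approximately invert G.

```python
import torch
import torch.nn.functional as F

def bigan_losses(G, E, D, x, z):
    """x: batch of real data; z: batch of latent codes drawn from the prior."""
    # The discriminator sees joint (data, latent) pairs rather than data alone.
    real_logits = D(x, E(x))      # real image paired with its inferred latent code
    fake_logits = D(G(z), z)      # generated image paired with the code that produced it

    bce = F.binary_cross_entropy_with_logits
    d_loss = bce(real_logits, torch.ones_like(real_logits)) + \
             bce(fake_logits, torch.zeros_like(fake_logits))
    # Generator and encoder are updated jointly with the opposite labels;
    # at the optimum, E learns the inverse mapping the abstract refers to.
    ge_loss = bce(real_logits, torch.zeros_like(real_logits)) + \
              bce(fake_logits, torch.ones_like(fake_logits))
    return d_loss, ge_loss
```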
Constrained Convolutional Neural Networks for Weakly Supervised Segmentation
Oct 18, 2015
Deepak Pathak, Philipp Krähenbühl, Trevor Darrell
* 12 pages, ICCV 2015
Learning Data-driven Reflectance Priors for Intrinsic Image Decomposition
Oct 08, 2015
Tinghui Zhou, Philipp Krähenbühl, Alexei A. Efros
* International Conference on Computer Vision (ICCV) 2015
Video Compression through Image Interpolation
Apr 18, 2018
Chao-Yuan Wu, Nayan Singhal, Philipp Krähenbühl
* Project page: https://chaoyuaw.github.io/vcii/
Data-dependent Initializations of Convolutional Neural Networks
Sep 22, 2016
Philipp Krähenbühl, Carl Doersch, Jeff Donahue, Trevor Darrell
Convolutional Neural Networks spread through computer vision like wildfire, impacting almost all visual tasks imaginable. Despite this, few researchers dare to train their models from scratch. Most work builds on one of a handful of ImageNet pre-trained models, and fine-tunes or adapts these for specific tasks. This is in large part due to the difficulty of properly initializing these networks from scratch. A small miscalibration of the initial weights leads to vanishing or exploding gradients, as well as poor convergence properties. In this work we present a fast and simple data-dependent initialization procedure that sets the weights of a network such that all units in the network train at roughly the same rate, avoiding vanishing or exploding gradients. Our initialization matches the current state-of-the-art unsupervised or self-supervised pre-training methods on standard computer vision tasks, such as image classification and object detection, while being roughly three orders of magnitude faster. When combined with pre-training methods, our initialization significantly outperforms prior work, narrowing the gap between supervised and unsupervised pre-training.
* ICLR 2016
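The abstract's central idea, initial weights calibrated so that every layer trains at roughly the same rate, can be pictured as a data-dependent rescaling pass. The sketch below normalizes each layer's pre-activations to unit standard deviation on a sample batch; it is a rough approximation under my own assumptions (a sequential PyTorch model, per-channel rescaling), not the authors' published procedure.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def data_dependent_init(model, sample_batch):
    """Rescale each Conv2d/Linear layer so its pre-activations have unit std on a data batch."""
    x = sample_batch
    for layer in model:                            # assumes an nn.Sequential-style model
        y = layer(x)
        if isinstance(layer, (nn.Conv2d, nn.Linear)):
            # Per-channel std of the pre-activations over batch (and spatial) dimensions.
            dims = [d for d in range(y.dim()) if d != 1]
            std = y.std(dim=dims).clamp(min=1e-8)
            layer.weight.div_(std.view([-1] + [1] * (layer.weight.dim() - 1)))
            if layer.bias is not None:
                layer.bias.div_(std)
            # Propagate the rescaled activations so later layers are calibrated too.
            y = y / std.view([1, -1] + [1] * (y.dim() - 2))
        x = y
    return model
```

Because the rescaling is driven by actual activations rather than a fixed fan-in formula, one forward pass over a single batch is enough, which is where the claimed speed advantage over pre-training comes from.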
Generative Visual Manipulation on the Natural Image Manifold
Sep 25, 2016
Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, Alexei A. Efros
* In European Conference on Computer Vision (ECCV 2016)
Constrained Structured Regression with Convolutional Neural Networks
Nov 23, 2015
Deepak Pathak, Philipp Krähenbühl, Stella X. Yu, Trevor Darrell
Learning a Discriminative Model for the Perception of Realism in Composite Images
Oct 02, 2015
Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, Alexei A. Efros
* International Conference on Computer Vision (ICCV) 2015
Sampling Matters in Deep Embedding Learning
Jan 16, 2018
Chao-Yuan Wu, R. Manmatha, Alexander J. Smola, Philipp Krähenbühl
Deep embeddings answer one simple question: How similar are two images? Learning these embeddings is the bedrock of verification, zero-shot learning, and visual search. The most prominent approaches optimize a deep convolutional network with a suitable loss function, such as contrastive loss or triplet loss. While a rich line of work focuses solely on the loss functions, we show in this paper that selecting training examples plays an equally important role. We propose distance weighted sampling, which selects more informative and stable examples than traditional approaches. In addition, we show that a simple margin based loss is sufficient to outperform all other loss functions. We evaluate our approach on the Stanford Online Products, CAR196, and the CUB200-2011 datasets for image retrieval and clustering, and on the LFW dataset for face verification. Our method achieves state-of-the-art performance on all of them.
* Added supplementary material. Published in ICCV 2017
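A rough sketch of the two ingredients the abstract highlights: sampling negatives inversely to the analytic density of pairwise distances between unit-norm embeddings, and a simple margin-based loss on the sampled pairs. The cutoff, margin, and dimension values below are illustrative assumptions, not the reference implementation.

```python
import numpy as np

def distance_weighted_negative(dists, dim=128, cutoff=0.5):
    """Sample one negative index given distances from the anchor to candidate negatives."""
    # Density of distances between random unit-norm points in `dim` dimensions:
    # q(d) ~ d^(dim-2) * (1 - d^2/4)^((dim-3)/2).  Sampling with weight ~ 1/q(d)
    # counteracts the concentration around d = sqrt(2); the lower clip avoids
    # pathologically hard negatives and the upper clip keeps the log finite.
    d = np.clip(dists, cutoff, 1.99)
    log_q = (dim - 2) * np.log(d) + ((dim - 3) / 2.0) * np.log(1.0 - 0.25 * d ** 2)
    w = np.exp(-log_q - np.max(-log_q))   # stabilize before normalizing
    w /= w.sum()
    return np.random.choice(len(dists), p=w)

def margin_loss(d_pos, d_neg, alpha=0.2, beta=1.2):
    """Margin-based loss on one triple of distances: pull positives inside, push negatives out."""
    return max(0.0, d_pos - beta + alpha) + max(0.0, beta - d_neg + alpha)
```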
Context Encoders: Feature Learning by Inpainting
Nov 21, 2016
Deepak Pathak, Philipp Krähenbühl, Jeff Donahue, Trevor Darrell, Alexei A. Efros
* CVPR 2016
* New results on ImageNet Generation
Learning Dense Correspondence via 3D-guided Cycle Consistency
Apr 18, 2016
Tinghui Zhou, Philipp Krähenbühl, Mathieu Aubry, Qixing Huang, Alexei A. Efros
Discriminative deep learning approaches have shown impressive results for problems where human-labeled ground truth is plentiful, but what about tasks where labels are difficult or impossible to obtain? This paper tackles one such problem: establishing dense visual correspondence across different object instances. For this task, although we do not know what the ground-truth is, we know it should be consistent across instances of that category. We exploit this consistency as a supervisory signal to train a convolutional neural network to predict cross-instance correspondences between pairs of images depicting objects of the same category. For each pair of training images we find an appropriate 3D CAD model and render two synthetic views to link in with the pair, establishing a correspondence flow 4-cycle. We use ground-truth synthetic-to-synthetic correspondences, provided by the rendering engine, to train a ConvNet to predict synthetic-to-real, real-to-real and real-to-synthetic correspondences that are cycle-consistent with the ground-truth. At test time, no CAD models are required. We demonstrate that our end-to-end trained ConvNet supervised by cycle-consistency outperforms state-of-the-art pairwise matching methods in correspondence-related tasks.
* To appear in CVPR 2016 (oral presentation)
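The correspondence-flow 4-cycle in this abstract can be written as a composition of dense flow fields: the predicted synthetic-to-real, real-to-real, and real-to-synthetic flows, composed, should match the rendered synthetic-to-synthetic ground truth. The sketch below uses nearest-neighbor lookup for flow composition and an L1 penalty; both choices are simplifying assumptions for illustration, not the trained ConvNet's differentiable version.

```python
import numpy as np

def compose_flow(flow_ab, flow_bc):
    """Compose dense flows of shape (H, W, 2), with channel 0 = dx and channel 1 = dy."""
    H, W, _ = flow_ab.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Where each pixel of A lands in B, rounded to the nearest pixel.
    xb = np.clip(np.rint(xs + flow_ab[..., 0]).astype(int), 0, W - 1)
    yb = np.clip(np.rint(ys + flow_ab[..., 1]).astype(int), 0, H - 1)
    return flow_ab + flow_bc[yb, xb]

def cycle_consistency_loss(f_sr, f_rr, f_rs, f_ss_gt):
    """Predicted synthetic->real, real->real, real->synthetic flows vs. rendered ground truth."""
    f_cycle = compose_flow(compose_flow(f_sr, f_rr), f_rs)
    return np.mean(np.abs(f_cycle - f_ss_gt))  # e.g. an L1 penalty on the closed cycle
```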
Long-Term Feature Banks for Detailed Video Understanding
Dec 12, 2018
Chao-Yuan Wu, Christoph Feichtenhofer, Haoqi Fan, Kaiming He, Philipp Krähenbühl, Ross Girshick
* Technical report
Assessing Generalization in Deep Reinforcement Learning
Oct 29, 2018
Charles Packer, Katelyn Gao, Jernej Kos, Philipp Krähenbühl, Vladlen Koltun, Dawn Song
* 18 pages, 6 figures
Compressed Video Action Recognition
Mar 29, 2018
Chao-Yuan Wu, Manzil Zaheer, Hexiang Hu, R. Manmatha, Alexander J. Smola, Philipp Krähenbühl
* CVPR 2018 (Selected for spotlight presentation)
Joint Monocular 3D Vehicle Detection and Tracking
Dec 02, 2018
Hou-Ning Hu, Qi-Zhi Cai, Dequan Wang, Ji Lin, Min Sun, Philipp Krähenbühl, Trevor Darrell, Fisher Yu
* 14 pages, 11 figures. Fixed misplaced table values and typos; all results unchanged