Models, code, and papers for "Martin C":

An iterative feature selection method for GRNs inference by exploring topological properties

Jul 25, 2011
Fabrício Martins Lopes, David C. Martins-Jr, Junior Barrera, Roberto M. Cesar-Jr

An important problem in bioinformatics is the inference of gene regulatory networks (GRN) from temporal expression profiles. In general, the main limitations faced by GRN inference methods is the small number of samples with huge dimensionalities and the noisy nature of the expression measurements. In face of these limitations, alternatives are needed to get better accuracy on the GRNs inference problem. This work addresses this problem by presenting an alternative feature selection method that applies prior knowledge on its search strategy, called SFFS-BA. The proposed search strategy is based on the Sequential Floating Forward Selection (SFFS) algorithm, with the inclusion of a scale-free (Barab\'asi-Albert) topology information in order to guide the search process to improve inference. The proposed algorithm explores the scale-free property by pruning the search space and using a power law as a weight for reducing it. In this way, the search space traversed by the SFFS-BA method combines a breadth-first search when the number of combinations is small (<k> <= 2) with a depth-first search when the number of combinations becomes explosive (<k> >= 3), being guided by the scale-free prior information. Experimental results show that the SFFS-BA provides a better inference similarities than SFS and SFFS, keeping the robustness of the SFS and SFFS methods, thus presenting very good results.

* 10 pages, 5 figures, SFFS search method based on scale-free network topology 

  Click for Model/Code and Paper
A Deep Neural Network for Pixel-Level Electromagnetic Particle Identification in the MicroBooNE Liquid Argon Time Projection Chamber

Aug 22, 2018
MicroBooNE collaboration, C. Adams, M. Alrashed, R. An, J. Anthony, J. Asaadi, A. Ashkenazi, M. Auger, S. Balasubramanian, B. Baller, C. Barnes, G. Barr, M. Bass, F. Bay, A. Bhat, K. Bhattacharya, M. Bishai, A. Blake, T. Bolton, L. Camilleri, D. Caratelli, I. Caro Terrazas, R. Carr, R. Castillo Fernandez, F. Cavanna, G. Cerati, Y. Chen, E. Church, D. Cianci, E. Cohen, G. H. Collin, J. M. Conrad, M. Convery, L. Cooper-Troendle, J. I. Crespo-Anadon, M. Del Tutto, D. Devitt, A. Diaz, K. Duffy, S. Dytman, B. Eberly, A. Ereditato, L. Escudero Sanchez, J. Esquivel, J. J. Evans, A. A. Fadeeva, R. S. Fitzpatrick, B. T. Fleming, D. Franco, A. P. Furmanski, D. Garcia-Gamez, G. T. Garvey, V. Genty, D. Goeldi, S. Gollapinni, O. Goodwin, E. Gramellini, H. Greenlee, R. Grosso, R. Guenette, P. Guzowski, A. Hackenburg, P. Hamilton, O. Hen, J. Hewes, C. Hill, G. A. Horton-Smith, A. Hourlier, E. -C. Huang, C. James, J. Jan de Vries, L. Jiang, R. A. Johnson, J. Joshi, H. Jostlein, Y. -J. Jwa, G. Karagiorgi, W. Ketchum, B. Kirby, M. Kirby, T. Kobilarcik, I. Kreslo, Y. Li, A. Lister, B. R. Littlejohn, S. Lockwitz, D. Lorca, W. C. Louis, M. Luethi, B. Lundberg, X. Luo, A. Marchionni, S. Marcocci, C. Mariani, J. Marshall, J. Martin-Albo, D. A. Martinez Caicedo, A. Mastbaum, V. Meddage, T. Mettler, G. B. Mills, K. Mistry, A. Mogan, J. Moon, M. Mooney, C. D. Moore, J. Mousseau, M. Murphy, R. Murrells, D. Naples, P. Nienaber, J. Nowak, O. Palamara, V. Pandey, V. Paolone, A. Papadopoulou, V. Papavassiliou, S. F. Pate, Z. Pavlovic, E. Piasetzky, D. Porzio, G. Pulliam, X. Qian, J. L. Raaf, A. Rafique, L. Rochester, M. Ross-Lonergan, C. Rudolf von Rohr, B. Russell, D. W. Schmitz, A. Schukraft, W. Seligman, M. H. Shaevitz, R. Sharankova, J. Sinclair, A. Smith, E. L. Snider, M. Soderberg, S. Soldner-Rembold, S. R. Soleti, P. Spentzouris, J. Spitz, J. St. John, T. Strauss, K. Sutton, S. Sword-Fehlberg, A. M. Szelc, N. Tagg, W. Tang, K. Terao, M. Thomson, R. T. Thornton, M. Toups, Y. -T. Tsai, S. Tufanli, T. Usher, W. Van De Pontseele, R. G. Van de Water, B. Viren, M. Weber, H. Wei, D. A. Wickremasinghe, K. Wierman, Z. Williams, S. Wolbers, T. Wongjirad, K. Woodruff, T. Yang, G. Yarbrough, L. E. Yates, G. P. Zeller, J. Zennamo, C. Zhang

We have developed a convolutional neural network (CNN) that can make a pixel-level prediction of objects in image data recorded by a liquid argon time projection chamber (LArTPC) for the first time. We describe the network design, training techniques, and software tools developed to train this network. The goal of this work is to develop a complete deep neural network based data reconstruction chain for the MicroBooNE detector. We show the first demonstration of a network's validity on real LArTPC data using MicroBooNE collection plane images. The demonstration is performed for stopping muon and a $\nu_\mu$ charged current neutral pion data samples.


  Click for Model/Code and Paper
The Power of Arc Consistency for CSPs Defined by Partially-Ordered Forbidden Patterns

Dec 25, 2017
Martin C. Cooper, Stanislav Živný

Characterising tractable fragments of the constraint satisfaction problem (CSP) is an important challenge in theoretical computer science and artificial intelligence. Forbidding patterns (generic sub-instances) provides a means of defining CSP fragments which are neither exclusively language-based nor exclusively structure-based. It is known that the class of binary CSP instances in which the broken-triangle pattern (BTP) does not occur, a class which includes all tree-structured instances, are decided by arc consistency (AC), a ubiquitous reduction operation in constraint solvers. We provide a characterisation of simple partially-ordered forbidden patterns which have this AC-solvability property. It turns out that BTP is just one of five such AC-solvable patterns. The four other patterns allow us to exhibit new tractable classes.

* Logical Methods in Computer Science, Volume 13, Issue 4 (December 27, 2017) lmcs:4168 

  Click for Model/Code and Paper
Tractable Triangles and Cross-Free Convexity in Discrete Optimisation

Jan 23, 2014
Martin C. Cooper, Stanislav Živný

The minimisation problem of a sum of unary and pairwise functions of discrete variables is a general NP-hard problem with wide applications such as computing MAP configurations in Markov Random Fields (MRF), minimising Gibbs energy, or solving binary Valued Constraint Satisfaction Problems (VCSPs). We study the computational complexity of classes of discrete optimisation problems given by allowing only certain types of costs in every triangle of variable-value assignments to three distinct variables. We show that for several computational problems, the only non- trivial tractable classes are the well known maximum matching problem and the recently discovered joint-winner property. Our results, apart from giving complete classifications in the studied cases, provide guidance in the search for hybrid tractable classes; that is, classes of problems that are not captured by restrictions on the functions (such as submodularity) or the structure of the problem graph (such as bounded treewidth). Furthermore, we introduce a class of problems with convex cardinality functions on cross-free sets of assignments. We prove that while imposing only one of the two conditions renders the problem NP-hard, the conjunction of the two gives rise to a novel tractable class satisfying the cross-free convexity property, which generalises the joint-winner property to problems of unbounded arity.

* Journal Of Artificial Intelligence Research, Volume 44, pages 455-490, 2012 
* arXiv admin note: text overlap with arXiv:1008.4035 by other authors 

  Click for Model/Code and Paper
A Dichotomy for 2-Constraint Forbidden CSP Patterns

Jan 18, 2012
Martin C. Cooper, Guillaume Escamocher

Although the CSP (constraint satisfaction problem) is NP-complete, even in the case when all constraints are binary, certain classes of instances are tractable. We study classes of instances defined by excluding subproblems. This approach has recently led to the discovery of novel tractable classes. The complete characterisation of all tractable classes defined by forbidding patterns (where a pattern is simply a compact representation of a set of subproblems) is a challenging problem. We demonstrate a dichotomy in the case of forbidden patterns consisting of either one or two constraints. This has allowed us to discover new tractable classes including, for example, a novel generalisation of 2SAT.

* 19 pages 

  Click for Model/Code and Paper
Hybrid tractability of soft constraint problems

Aug 24, 2010
Martin C. Cooper, Stanislav Zivny

The constraint satisfaction problem (CSP) is a central generic problem in computer science and artificial intelligence: it provides a common framework for many theoretical problems as well as for many real-life applications. Soft constraint problems are a generalisation of the CSP which allow the user to model optimisation problems. Considerable effort has been made in identifying properties which ensure tractability in such problems. In this work, we initiate the study of hybrid tractability of soft constraint problems; that is, properties which guarantee tractability of the given soft constraint problem, but which do not depend only on the underlying structure of the instance (such as being tree-structured) or only on the types of soft constraints in the instance (such as submodularity). We present several novel hybrid classes of soft constraint problems, which include a machine scheduling problem, constraint problems of arbitrary arities with no overlapping nogoods, and the SoftAllDiff constraint with arbitrary unary soft constraints. An important tool in our investigation will be the notion of forbidden substructures.

* Artificial Intelligence 175(9-10) 1555-1569 (2011) 
* A full version of a CP'10 paper, 26 pages 

  Click for Model/Code and Paper
Efficient Implementation of a Semantic-based Transfer Approach

Jul 09, 1996
Michael Dorna, Martin C. Emele

This article gives an overview of a new semantic-based transfer approach developed and applied within the Verbmobil Machine Translation project. We present the declarative transfer formalism and discuss its implementation.

* 5 pages (to appear in Proceedings of ECAI '96), uuencoded gzipped PS file 

  Click for Model/Code and Paper
Semantic-based Transfer

Jul 09, 1996
Michael Dorna, Martin C. Emele

This article presents a new semantic-based transfer approach developed and applied within the Verbmobil Machine Translation project. We give an overview of the declarative transfer formalism together with its procedural realization. Our approach is discussed and compared with several other approaches from the MT literature.

* 6 pages (to appear in Proceedings of COLING '96) 

  Click for Model/Code and Paper
Deep learning-based survival prediction for multiple cancer types using histopathology images

Dec 16, 2019
Ellery Wulczyn, David F. Steiner, Zhaoyang Xu, Apaar Sadhwani, Hongwu Wang, Isabelle Flament, Craig H. Mermel, Po-Hsuan Cameron Chen, Yun Liu, Martin C. Stumpe

Prognostic information at diagnosis has important implications for cancer treatment and monitoring. Although cancer staging, histopathological assessment, molecular features, and clinical variables can provide useful prognostic insights, improving risk stratification remains an active research area. We developed a deep learning system (DLS) to predict disease specific survival across 10 cancer types from The Cancer Genome Atlas (TCGA). We used a weakly-supervised approach without pixel-level annotations, and tested three different survival loss functions. The DLS was developed using 9,086 slides from 3,664 cases and evaluated using 3,009 slides from 1,216 cases. In multivariable Cox regression analysis of the combined cohort including all 10 cancers, the DLS was significantly associated with disease specific survival (hazard ratio of 1.58, 95% CI 1.28-1.70, p<0.0001) after adjusting for cancer type, stage, age, and sex. In a per-cancer adjusted subanalysis, the DLS remained a significant predictor of survival in 5 of 10 cancer types. Compared to a baseline model including stage, age, and sex, the c-index of the model demonstrated an absolute 3.7% improvement (95% CI 1.0-6.5) in the combined cohort. Additionally, our models stratified patients within individual cancer stages, particularly stage II (p=0.025) and stage III (p<0.001). By developing and evaluating prognostic models across multiple cancer types, this work represents one of the most comprehensive studies exploring the direct prediction of clinical outcomes using deep learning and histopathology images. Our analysis demonstrates the potential for this approach to provide prognostic information in multiple cancer types, and even within specific pathologic stages. However, given the relatively small number of clinical events, we observed wide confidence intervals, suggesting that future work will benefit from larger datasets.


  Click for Model/Code and Paper
On Backdoors To Tractable Constraint Languages

Oct 10, 2014
Clement Carbonnel, Martin C. Cooper, Emmanuel Hebrard

In the context of CSPs, a strong backdoor is a subset of variables such that every complete assignment yields a residual instance guaranteed to have a specified property. If the property allows efficient solving, then a small strong backdoor provides a reasonable decomposition of the original instance into easy instances. An important challenge is the design of algorithms that can find quickly a small strong backdoor if one exists. We present a systematic study of the parameterized complexity of backdoor detection when the target property is a restricted type of constraint language defined by means of a family of polymorphisms. In particular, we show that under the weak assumption that the polymorphisms are idempotent, the problem is unlikely to be FPT when the parameter is either r (the constraint arity) or k (the size of the backdoor) unless P = NP or FPT = W[2]. When the parameter is k+r, however, we are able to identify large classes of languages for which the problem of finding a small backdoor is FPT.

* 16 pages 

  Click for Model/Code and Paper
Comunication-Efficient Algorithms for Statistical Optimization

Oct 11, 2013
Yuchen Zhang, John C. Duchi, Martin Wainwright

We analyze two communication-efficient algorithms for distributed statistical optimization on large-scale data sets. The first algorithm is a standard averaging method that distributes the $N$ data samples evenly to $\nummac$ machines, performs separate minimization on each subset, and then averages the estimates. We provide a sharp analysis of this average mixture algorithm, showing that under a reasonable set of conditions, the combined parameter achieves mean-squared error that decays as $\order(N^{-1}+(N/m)^{-2})$. Whenever $m \le \sqrt{N}$, this guarantee matches the best possible rate achievable by a centralized algorithm having access to all $\totalnumobs$ samples. The second algorithm is a novel method, based on an appropriate form of bootstrap subsampling. Requiring only a single round of communication, it has mean-squared error that decays as $\order(N^{-1} + (N/m)^{-3})$, and so is more robust to the amount of parallelization. In addition, we show that a stochastic gradient-based method attains mean-squared error decaying as $O(N^{-1} + (N/ m)^{-3/2})$, easing computation at the expense of penalties in the rate of convergence. We also provide experimental evaluation of our methods, investigating their performance both on simulated data and on a large-scale regression problem from the internet search domain. In particular, we show that our methods can be used to efficiently solve an advertisement prediction problem from the Chinese SoSo Search Engine, which involves logistic regression with $N \approx 2.4 \times 10^8$ samples and $d \approx 740,000$ covariates.

* 44 pages, to appear in Journal of Machine Learning Research (JMLR) 

  Click for Model/Code and Paper
Memoryless Control Design for Persistent Surveillance under Safety Constraints

Nov 08, 2012
Eduardo Arvelo, Eric Kim, Nuno C. Martins

This paper deals with the design of time-invariant memoryless control policies for robots that move in a finite two- dimensional lattice and are tasked with persistent surveillance of an area in which there are forbidden regions. We model each robot as a controlled Markov chain whose state comprises its position in the lattice and the direction of motion. The goal is to find the minimum number of robots and an associated time-invariant memoryless control policy that guarantees that the largest number of states are persistently surveilled without ever visiting a forbidden state. We propose a design method that relies on a finitely parametrized convex program inspired by entropy maximization principles. Numerical examples are provided.


  Click for Model/Code and Paper
Divide and Conquer Kernel Ridge Regression: A Distributed Algorithm with Minimax Optimal Rates

Apr 29, 2014
Yuchen Zhang, John C. Duchi, Martin J. Wainwright

We establish optimal convergence rates for a decomposition-based scalable approach to kernel ridge regression. The method is simple to describe: it randomly partitions a dataset of size N into m subsets of equal size, computes an independent kernel ridge regression estimator for each subset, then averages the local solutions into a global predictor. This partitioning leads to a substantial reduction in computation time versus the standard approach of performing kernel ridge regression on all N samples. Our two main theorems establish that despite the computational speed-up, statistical optimality is retained: as long as m is not too large, the partition-based estimator achieves the statistical minimax rate over all estimators using the set of N samples. As concrete examples, our theory guarantees that the number of processors m may grow nearly linearly for finite-rank kernels and Gaussian kernels and polynomially in N for Sobolev spaces, which in turn allows for substantial reductions in computational cost. We conclude with experiments on both simulated data and a music-prediction task that complement our theoretical results, exhibiting the computational and statistical benefits of our approach.


  Click for Model/Code and Paper
Privacy Aware Learning

Oct 10, 2013
John C. Duchi, Michael I. Jordan, Martin J. Wainwright

We study statistical risk minimization problems under a privacy model in which the data is kept confidential even from the learner. In this local privacy framework, we establish sharp upper and lower bounds on the convergence rates of statistical estimation procedures. As a consequence, we exhibit a precise tradeoff between the amount of privacy the data preserves and the utility, as measured by convergence rate, of any statistical estimator or learning procedure.

* 60 pages 

  Click for Model/Code and Paper
A branch-and-bound feature selection algorithm for U-shaped cost functions

Oct 30, 2008
Marcelo Ris, Junior Barrera, David C. Martins Jr

This paper presents the formulation of a combinatorial optimization problem with the following characteristics: i.the search space is the power set of a finite set structured as a Boolean lattice; ii.the cost function forms a U-shaped curve when applied to any lattice chain. This formulation applies for feature selection in the context of pattern recognition. The known approaches for this problem are branch-and-bound algorithms and heuristics, that explore partially the search space. Branch-and-bound algorithms are equivalent to the full search, while heuristics are not. This paper presents a branch-and-bound algorithm that differs from the others known by exploring the lattice structure and the U-shaped chain curves of the search space. The main contribution of this paper is the architecture of this algorithm that is based on the representation and exploration of the search space by new lattice properties proven here. Several experiments, with well known public data, indicate the superiority of the proposed method to SFFS, which is a popular heuristic that gives good results in very short computational time. In all experiments, the proposed method got better or equal results in similar or even smaller computational time.


  Click for Model/Code and Paper
Steepest ascent can be exponential in bounded treewidth problems

Dec 02, 2019
David A. Cohen, Martin C. Cooper, Artem Kaznatcheev, Mark Wallace

We investigate the complexity of local search based on steepest ascent. We show that even when all variables have domains of size two and the underlying constraint graph of variable interactions has bounded treewidth (in our construction, treewidth 7), there are fitness landscapes for which an exponential number of steps may be required to reach a local optimum. This is an improvement on prior recursive constructions of long steepest ascents, which we prove to need constraint graphs of unbounded treewidth.

* 8 pages main text, 4 pages appendix, 1 page references; fixed error in f(a,b) to match code 

  Click for Model/Code and Paper
On Singleton Arc Consistency for CSPs Defined by Monotone Patterns

Aug 10, 2018
Clement Carbonnel, David A. Cohen, Martin C. Cooper, Stanislav Zivny

Singleton arc consistency is an important type of local consistency which has been recently shown to solve all constraint satisfaction problems (CSPs) over constraint languages of bounded width. We aim to characterise all classes of CSPs defined by a forbidden pattern that are solved by singleton arc consistency and closed under removing constraints. We identify five new patterns whose absence ensures solvability by singleton arc consistency, four of which are provably maximal and three of which generalise 2-SAT. Combined with simple counter-examples for other patterns, we make significant progress towards a complete classification.

* v4: Full version of a STACS'18 paper; improved presentation 

  Click for Model/Code and Paper
Variable and value elimination in binary constraint satisfaction via forbidden patterns

Feb 12, 2015
David A. Cohen, Martin C. Cooper, Guillaume Escamocher, Stanislav Zivny

Variable or value elimination in a constraint satisfaction problem (CSP) can be used in preprocessing or during search to reduce search space size. A variable elimination rule (value elimination rule) allows the polynomial-time identification of certain variables (domain elements) whose elimination, without the introduction of extra compensatory constraints, does not affect the satisfiability of an instance. We show that there are essentially just four variable elimination rules and three value elimination rules defined by forbidding generic sub-instances, known as irreducible existential patterns, in arc-consistent CSP instances. One of the variable elimination rules is the already-known Broken Triangle Property, whereas the other three are novel. The three value elimination rules can all be seen as strict generalisations of neighbourhood substitution.

* Journal of Computer and System Sciences 81(7) 1127-1143 (2015) 
* A full version of an IJCAI'13 paper to appear in Journal of Computer and System Sciences (JCSS) 

  Click for Model/Code and Paper
Randomized Smoothing for Stochastic Optimization

Apr 07, 2012
John C. Duchi, Peter L. Bartlett, Martin J. Wainwright

We analyze convergence rates of stochastic optimization procedures for non-smooth convex optimization problems. By combining randomized smoothing techniques with accelerated gradient methods, we obtain convergence rates of stochastic optimization procedures, both in expectation and with high probability, that have optimal dependence on the variance of the gradient estimates. To the best of our knowledge, these are the first variance-based rates for non-smooth optimization. We give several applications of our results to statistical estimation problems, and provide experimental results that demonstrate the effectiveness of the proposed algorithms. We also describe how a combination of our algorithm with recent work on decentralized optimization yields a distributed stochastic optimization algorithm that is order-optimal.

* 39 pages, 3 figures 

  Click for Model/Code and Paper