Models, code, and papers for "Renan Souza":
Multi-party Conversational Systems are systems with natural language interaction between one or more people or systems. From the moment an utterance is sent to a group to the moment a member replies to it in the group, the system must perform several activities: utterance understanding, information search, and reasoning, among others. In this paper we present the challenges of designing and building multi-party conversational systems, the state of the art, our proposed hybrid architecture using both rules and machine learning, and some insights gained from implementing and evaluating one in the finance domain.
Machine Learning (ML) has become essential in several industries. In Computational Science and Engineering (CSE), the complexity of the ML lifecycle comes from the large variety of data, scientists' expertise, tools, and workflows. If data are not tracked properly during the lifecycle, it becomes infeasible to recreate an ML model from scratch or to explain to stakeholders how it was created. The main limitation of provenance tracking solutions is that they cannot cope with provenance capture and integration of domain and ML data processed in the multiple workflows in the lifecycle while keeping the provenance capture overhead low. To handle this problem, in this paper we contribute a detailed characterization of provenance data in the ML lifecycle in CSE; a new provenance data representation, called PROV-ML, built on top of W3C PROV and ML Schema; and extensions to a system that tracks provenance from multiple workflows to address the characteristics of ML and CSE, and to allow for provenance queries with a standard vocabulary. We show a practical use in a real case in the Oil and Gas industry, along with its evaluation using 48 GPUs in parallel.
We use large-scale commonsense knowledge bases, e.g. ConceptNet, to provide context cues that establish semantic relationships among entities directly hypothesized from the video signal, such as putative object and action labels, and to infer a deeper interpretation of events than what is directly sensed. One approach is to learn semantic relationships between objects and actions from training annotations of videos; as such, it depends largely on the statistics of the vocabulary in these annotations. However, the use of prior encoded commonsense knowledge sources alleviates this dependence on large annotated training datasets. We represent interpretations using a connected structure of basic detected (grounded) concepts, such as objects and actions, that are bound by semantics with other background concepts not directly observed, i.e. contextualization cues. We express this mathematically using the language of Grenander's pattern generator theory. Concepts are basic generators, and the bonds are defined by the semantic relationships between concepts. We formulate an inference engine based on energy minimization using an efficient Markov chain Monte Carlo (MCMC) method that uses ConceptNet in its move proposals to find these structures. Using three different publicly available datasets, Breakfast, CMU Kitchen and MSVD, whose distribution of possible interpretations spans more than 150,000 possible solutions for over 5,000 videos, we show that the proposed model can generate video interpretations whose quality is comparable to or better than that reported by methods such as discriminative approaches, hidden Markov models, context-free grammars, deep learning models, and prior pattern theory approaches, all of which rely on learning from domain-specific training data.
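The energy-minimization idea in the abstract above can be illustrated with a very small sketch. This is not the authors' inference engine: it is a generic Metropolis sampler over a toy two-slot interpretation (one object, one action). The relatedness table, the detector scores, the energy form, and the function names are all hypothetical, and the proposals here are uniform rather than guided by ConceptNet.

```python
import math
import random

# Hypothetical semantic relatedness between an object and an action.
# These numbers are invented for illustration; they are not ConceptNet values.
RELATEDNESS = {
    ("egg", "crack"): 0.9,
    ("egg", "pour"): 0.3,
    ("milk", "pour"): 0.9,
    ("milk", "crack"): 0.1,
    ("bowl", "crack"): 0.2,
    ("bowl", "pour"): 0.5,
}

# Hypothetical detector confidences for the candidate labels.
OBJECT_SCORES = {"egg": 0.5, "milk": 0.45, "bowl": 0.4}
ACTION_SCORES = {"crack": 0.4, "pour": 0.5}


def energy(obj, act):
    """Lower energy means stronger detector support and a stronger semantic bond."""
    return -(OBJECT_SCORES[obj] + ACTION_SCORES[act] + RELATEDNESS[(obj, act)])


def propose(state, rng):
    """Resample either the object or the action uniformly (a symmetric proposal)."""
    obj, act = state
    if rng.random() < 0.5:
        obj = rng.choice(list(OBJECT_SCORES))
    else:
        act = rng.choice(list(ACTION_SCORES))
    return (obj, act)


def metropolis(steps=2000, temperature=0.2, seed=0):
    """Run a Metropolis chain and return the lowest-energy state it visited."""
    rng = random.Random(seed)
    state = ("bowl", "crack")  # arbitrary starting interpretation
    best = state
    for _ in range(steps):
        candidate = propose(state, rng)
        delta = energy(*candidate) - energy(*state)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            state = candidate
        if energy(*state) < energy(*best):
            best = state
    return best, energy(*best)


if __name__ == "__main__":
    print(metropolis())
```

On this toy table an exhaustive enumeration gives ("milk", "pour") as the lowest-energy pairing. The sampler only becomes useful when the space of interpretations is too large to enumerate, which is the setting the abstract describes.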