Models, code, and papers for "Suzanne R":
The United States is in the midst of an opioid epidemic with recent estimates indicating that more than 130 people die every day due to drug overdose. The over-prescription and addiction to opioid painkillers, heroin, and synthetic opioids, has led to a public health crisis and created a huge social and economic burden. Statistical learning methods that use data from multiple clinical centers across the US to detect opioid over-prescribing trends and predict possible opioid misuse are required. However, the semantic heterogeneity in the representation of clinical data across different centers makes the development and evaluation of such methods difficult and non-trivial. We create the Opioid Drug Knowledge Graph (ODKG) -- a network of opioid-related drugs, active ingredients, formulations, combinations, and brand names. We use the ODKG to normalize drug strings in a clinical data warehouse consisting of patient data from over 400 healthcare facilities in 42 different states. We showcase the use of ODKG to generate summary statistics of opioid prescription trends across US regions. These methods and resources can aid the development of advanced and scalable models to monitor the opioid epidemic and to detect illicit opioid misuse behavior. Our work is relevant to policymakers and pain researchers who wish to systematically assess factors that contribute to opioid over-prescribing and iatrogenic opioid addiction in the US.
Objectives: Most cancer data sources lack information on metastatic recurrence. Electronic medical records (EMRs) and population-based cancer registries contain complementary information on cancer treatment and outcomes, yet are rarely used synergistically. To enable detection of metastatic breast cancer (MBC), we applied a semi-supervised machine learning framework to linked EMR-California Cancer Registry (CCR) data. Materials and Methods: We studied 11,459 female patients treated at Stanford Health Care who received an incident breast cancer diagnosis from 2000-2014. The dataset consisted of structured data and unstructured free-text clinical notes from EMR, linked to CCR, a component of the Surveillance, Epidemiology and End Results (SEER) database. We extracted information on metastatic disease from patient notes to infer a class label and then trained a regularized logistic regression model for MBC classification. We evaluated model performance on a gold standard set of set of 146 patients. Results: There are 495 patients with de novo stage IV MBC, 1,374 patients initially diagnosed with Stage 0-III disease had recurrent MBC, and 9,590 had no evidence of metastatis. The median follow-up time is 96.3 months (mean 97.8, standard deviation 46.7). The best-performing model incorporated both EMR and CCR features. The area under the receiver-operating characteristic curve=0.925 [95% confidence interval: 0.880-0.969], sensitivity=0.861, specificity=0.878 and overall accuracy=0.870. Discussion and Conclusion: A framework for MBC case detection combining EMR and CCR data achieved good sensitivity, specificity and discrimination without requiring expert-labeled examples. This approach enables population-based research on how patients die from cancer and may identify novel predictors of cancer recurrence.
A primary motivation for research in Digital Ecosystems is the desire to exploit the self-organising properties of natural ecosystems. Ecosystems arc thought to be robust, scalable architectures that can automatically solve complex, dynamic problems. However, the biological processes that contribute to these properties have not been made explicit in Digital Ecosystem research. Here, we introduce how biological properties contribute to the self-organising features of natural ecosystems. These properties include populations of evolving agents, a complex dynamic environment, and spatial distributions which generate local interactions. The potential for exploiting these properties in artificial systems is then considered.
The present study is focused on the automatic identification and description of frozen similes in British and French novels written between the 19 th century and the beginning of the 20 th century. Two main patterns of frozen similes were considered: adjectival ground + simile marker + nominal vehicle (e.g. happy as a lark) and eventuality + simile marker + nominal vehicle (e.g. sleep like a top). All potential similes and their components were first extracted using a rule-based algorithm. Then, frozen similes were identified based on reference lists of existing similes and semantic distance between the tenor and the vehicle. The results obtained tend to confirm the fact that frozen similes are not used haphazardly in literary texts. In addition, contrary to how they are often presented, frozen similes often go beyond the ground or the eventuality and the vehicle to also include the tenor.
Similes play an important role in literary texts not only as rhetorical devices and as figures of speech but also because of their evocative power, their aptness for description and the relative ease with which they can be combined with other figures of speech (Israel et al. 2004). Detecting all types of simile constructions in a particular text therefore seems crucial when analysing the style of an author. Few research studies however have been dedicated to the study of less prominent simile markers in fictional prose and their relevance for stylistic studies. The present paper studies the frequency of adjective and verb simile markers in a corpus of British and French novels in order to determine which ones are really informative and worth including in a stylistic analysis. Furthermore, are those adjectives and verb simile markers used differently in both languages?
This work tackles the Pixel Privacy task put forth by MediaEval 2019. Our goal is to manipulate images in a way that conceals them from automatic scene classifiers while preserving the original image quality. We use the fast gradient sign method, which normally has a corrupting influence on image appeal, and devise two methods to minimize the damage. The first approach uses a map of pixel locations that are either salient or flat, and directs perturbations away from them. The second approach subtracts the gradient of an aesthetics evaluation model from the gradient of the attack model to guide the perturbations towards a direction that preserves appeal. We make our code available at: https://git.io/JesXr.
In contrast to many decades of research on oral code-switching, the study of written multilingual productions has only recently enjoyed a surge of interest. Many open questions remain regarding the sociolinguistic underpinnings of written code-switching, and progress has been limited by a lack of suitable resources. We introduce a novel, large, and diverse dataset of written code-switched productions, curated from topical threads of multiple bilingual communities on the Reddit discussion platform, and explore questions that were mainly addressed in the context of spoken language thus far. We investigate whether findings in oral code-switching concerning content and style, as well as speaker proficiency, are carried over into written code-switching in discussion forums. The released dataset can further facilitate a range of research and practical activities.
Recent work has attempted to characterize the structure of semantic memory and the search algorithms which, together, best approximate human patterns of search revealed in a semantic fluency task. There are a number of models that seek to capture semantic search processes over networks, but they vary in the cognitive plausibility of their implementation. Existing work has also neglected to consider the constraints that the incremental process of language acquisition must place on the structure of semantic memory. Here we present a model that incrementally updates a semantic network, with limited computational steps, and replicates many patterns found in human semantic fluency using a simple random walk. We also perform thorough analyses showing that a combination of both structural and semantic features are correlated with human performance patterns.
People exhibit a tendency to generalize a novel noun to the basic-level in a hierarchical taxonomy -- a cognitively salient category such as "dog" -- with the degree of generalization depending on the number and type of exemplars. Recently, a change in the presentation timing of exemplars has also been shown to have an effect, surprisingly reversing the prior observed pattern of basic-level generalization. We explore the precise mechanisms that could lead to such behavior by extending a computational model of word learning and word generalization to integrate cognitive processes of memory and attention. Our results show that the interaction of forgetting and attention to novelty, as well as sensitivity to both type and token frequencies of exemplars, enables the model to replicate the empirical results from different presentation timings. Our results reinforce the need to incorporate general cognitive processes within word learning models to better understand the range of observed behaviors in vocabulary acquisition.
Recent empirical and modeling research has focused on the semantic fluency task because it is informative about semantic memory. An interesting interplay arises between the richness of representations in semantic memory and the complexity of algorithms required to process it. It has remained an open question whether representations of words and their relations learned from language use can enable a simple search algorithm to mimic the observed behavior in the fluency task. Here we show that it is plausible to learn rich representations from naturalistic data for which a very simple search algorithm (a random walk) can replicate the human patterns. We suggest that explicitly structuring knowledge about words into a semantic network plays a crucial role in modeling human behavior in memory search and retrieval; moreover, this is the case across a range of semantic information sources.
Like any large system development effort, the construction of a complex belief network model requires systems engineering to manage the design and construction process. We propose a rapid prototyping approach to network engineering. We describe criteria for identifying network modules and the use of "stubs" to represent not-yet-constructed modules. We propose an object oriented representation for belief networks which captures the semantics of the problem in addition to conditional independencies and probabilities. Methods for evaluating complex belief network models are discussed. The ideas are illustrated with examples from a large belief network construction problem in the military intelligence domain.
In most current applications of belief networks, domain knowledge is represented by a single belief network that applies to all problem instances in the domain. In more complex domains, problem-specific models must be constructed from a knowledge base encoding probabilistic relationships in the domain. Most work in knowledge-based model construction takes the rule as the basic unit of knowledge. We present a knowledge representation framework that permits the knowledge base designer to specify knowledge in larger semantically meaningful units which we call network fragments. Our framework provides for representation of asymmetric independence and canonical intercausal interaction. We discuss the combination of network fragments to form problem-specific models to reason about particular problem instances. The framework is illustrated using examples from the domain of military situation awareness.
This paper describes a process for constructing situation-specific belief networks from a knowledge base of network fragments. A situation-specific network is a minimal query complete network constructed from a knowledge base in response to a query for the probability distribution on a set of target variables given evidence and context variables. We present definitions of query completeness and situation-specific networks. We describe conditions on the knowledge base that guarantee query completeness. The relationship of our work to earlier work on KBMC is also discussed.
This paper extends previous work with network fragments and situation-specific network construction. We formally define the asymmetry network, an alternative representation for a conditional probability table. We also present an object-oriented representation for partially specified asymmetry networks. We show that the representation is parsimonious. We define an algebra for the elements of the representation that allows us to 'factor' any CPT and to soundly combine the partially specified asymmetry networks.
We view Digital Ecosystems to be the digital counterparts of biological ecosystems. Here, we are concerned with the creation of these Digital Ecosystems, exploiting the self-organising properties of biological ecosystems to evolve high-level software applications. Therefore, we created the Digital Ecosystem, a novel optimisation technique inspired by biological ecosystems, where the optimisation works at two levels: a first optimisation, migration of agents which are distributed in a decentralised peer-to-peer network, operating continuously in time; this process feeds a second optimisation based on evolutionary computing that operates locally on single peers and is aimed at finding solutions to satisfy locally relevant constraints. The Digital Ecosystem was then measured experimentally through simulations, with measures originating from theoretical ecology, evaluating its likeness to biological ecosystems. This included its responsiveness to requests for applications from the user base, as a measure of the ecological succession (ecosystem maturity). Overall, we have advanced the understanding of Digital Ecosystems, creating Ecosystem-Oriented Architectures where the word ecosystem is more than just a metaphor.
Computational research on error detection in second language speakers has mainly addressed clear grammatical anomalies typical to learners at the beginner-to-intermediate level. We focus instead on acquisition of subtle semantic nuances of English indefinite pronouns by non-native speakers at varying levels of proficiency. We first lay out theoretical, linguistically motivated hypotheses, and supporting empirical evidence on the nature of the challenges posed by indefinite pronouns to English learners. We then suggest and evaluate an automatic approach for detection of atypical usage patterns, demonstrating that deep learning architectures are promising for this task involving nuanced semantic anomalies.
A fundamental question in language learning concerns the role of a speaker's first language in second language acquisition. We present a novel methodology for studying this question: analysis of eye-movement patterns in second language reading of free-form text. Using this methodology, we demonstrate for the first time that the native language of English learners can be predicted from their gaze fixations when reading English. We provide analysis of classifier uncertainty and learned features, which indicates that differences in English reading are likely to be rooted in linguistic divergences across native languages. The presented framework complements production studies and offers new ground for advancing research on multilingualism.
Children can use the statistical regularities of their environment to learn word meanings, a mechanism known as cross-situational learning. We take a computational approach to investigate how the information present during each observation in a cross-situational framework can affect the overall acquisition of word meanings. We do so by formulating various in-the-moment learning mechanisms that are sensitive to different statistics of the environment, such as counts and conditional probabilities. Each mechanism introduces a unique source of competition or mutual exclusivity bias to the model; the mechanism that maximally uses the model's knowledge of word meanings performs the best. Moreover, the gap between this mechanism and others is amplified in more challenging learning scenarios, such as learning from few examples.
Contemporary research on computational processing of linguistic metaphors is divided into two main branches: metaphor recognition and metaphor interpretation. We take a different line of research and present an automated method for generating conceptual metaphors from linguistic data. Given the generated conceptual metaphors, we find corresponding linguistic metaphors in corpora. In this paper, we describe our approach and its evaluation using English and Russian data.
This paper considers the problem of knowledge-based model construction in the presence of uncertainty about the association of domain entities to random variables. Multi-entity Bayesian networks (MEBNs) are defined as a representation for knowledge in domains characterized by uncertainty in the number of relevant entities, their interrelationships, and their association with observables. An MEBN implicitly specifies a probability distribution in terms of a hierarchically structured collection of Bayesian network fragments that together encode a joint probability distribution over arbitrarily many interrelated hypotheses. Although a finite query-complete model can always be constructed, association uncertainty typically makes exact model construction and evaluation intractable. The objective of hypothesis management is to balance tractability against accuracy. We describe an application to the problem of using intelligence reports to infer the organization and activities of groups of military vehicles. Our approach is compared to related work in the tracking and fusion literature.