Models, code, and papers for "Fiona L":
Conversational agents, also known as chatbots, are versatile tools that have the potential of being used in dialogical argumentation. They could possibly be deployed in tasks such as persuasion for behaviour change (e.g. persuading people to eat more fruit, to take regular exercise, etc.) However, to achieve this, there is a need to develop methods for acquiring appropriate arguments and counterargument that reflect both sides of the discussion. For instance, to persuade someone to do regular exercise, the chatbot needs to know counterarguments that the user might have for not doing exercise. To address this need, we present methods for acquiring arguments and counterarguments, and importantly, meta-level information that can be useful for deciding when arguments can be used during an argumentation dialogue. We evaluate these methods in studies with participants and show how harnessing these methods in a chatbot can make it more persuasive.
Much research in computational argumentation assumes that arguments and counterarguments can be obtained in some way. Yet, to improve and apply models of argument, we need methods for acquiring them. Current approaches include argument mining from text, hand coding of arguments by researchers, or generating arguments from knowledge bases. In this paper, we propose a new approach, which we call argument harvesting, that uses a chatbot to enter into a dialogue with a participant to get arguments and counterarguments from him or her. Because it is automated, the chatbot can be used repeatedly in many dialogues, and thereby it can generate a large corpus. We describe the architecture of the chatbot, provide methods for managing a corpus of arguments and counterarguments, and an evaluation of our approach in a case study concerning attitudes of women to participation in sport.
The key issues pertaining to collection of epidemic disease data for our analysis purposes are that it is a labour intensive, time consuming and expensive process resulting in availability of sparse sample data which we use to develop prediction models. To address this sparse data issue, we present novel Incremental Transductive methods to circumvent the data collection process by applying previously acquired data to provide consistent, confidence-based labelling alternatives to field survey research. We investigated various reasoning approaches for semisupervised machine learning including Bayesian models for labelling data. The results show that using the proposed methods, we can label instances of data with a class of vector density at a high level of confidence. By applying the Liberal and Strict Training Approaches, we provide a labelling and classification alternative to standalone algorithms. The methods in this paper are components in the process of reducing the proliferation of the Schistosomiasis disease and its effects.
Histopathological assessments, including surgical resection and core needle biopsy, are the standard procedures in the diagnosis of the prostate cancer. Current interpretation of the histopathology images includes the determination of the tumor area, Gleason grading, and identification of certain prognosis-critical features. Such a process is not only tedious, but also prune to intra/inter-observe variabilities. Recently, FDA cleared the marketing of the first whole slide imaging system for digital pathology. This opens a new era for the computer aided prostate image analysis and feature extraction based on the digital histopathology images. In this work, we present an analysis pipeline that includes localization of the cancer region, grading, area ratio of different Gleason grades, and cytological/architectural feature extraction. The proposed algorithm combines the human engineered feature extraction as well as those learned by the deep neural network. Moreover, the entire pipeline is implemented to directly operate on the whole slide images produced by the digital scanners and is therefore potentially easy to translate into clinical practices. The algorithm is tested on 368 whole slide images from the TCGA data set and achieves an overall accuracy of 75% in differentiating Gleason 3+4 with 4+3 slides.
Community annotation of biological entities with concepts from multiple bio-ontologies has created large and growing repositories of ontology-based annotation data with embedded implicit relationships among orthogonal ontologies. Development of efficient data mining methods and metrics to mine and assess the quality of the mined relationships has not kept pace with the growth of annotation data. In this study, we present a data mining method that uses ontology-guided generalization to discover relationships across ontologies along with a new interestingness metric based on information theory. We apply our data mining algorithm and interestingness measures to datasets from the Gene Expression Database at the Mouse Genome Informatics as a preliminary proof of concept to mine relationships between developmental stages in the mouse anatomy ontology and Gene Ontology concepts (biological process, molecular function and cellular component). In addition, we present a comparison of our interestingness metric to four existing metrics. Ontology-based annotation datasets provide a valuable resource for discovery of relationships across ontologies. The use of efficient data mining methods and appropriate interestingness metrics enables the identification of high quality relationships.
The energy landscape for the Low-Voltage (LV) networks are beginning to change; changes resulted from the increase penetration of renewables and/or the predicted increase of electric vehicles charging at home. The previously passive `fit-and-forget' approach to LV network management will be inefficient to ensure its effective operations. A more adaptive approach is required that includes the prediction of risk and capacity of the circuits. Many of the proposed methods require full observability of the networks, motivating the installations of smart meters and advance metering infrastructure in many countries. However, the expectation of `perfect data' is unrealistic in operational reality. Smart meter (SM) roll-out can have its issues, which may resulted in low-likelihood of full SM coverage for all LV networks. This, together with privacy requirements that limit the availability of high granularity demand power data have resulted in the low uptake of many of the presented methods. To address this issue, Deep Learning Neural Network is proposed to predict the voltage distribution with partial SM coverage. The results show that SM measurements from key locations are sufficient for effective prediction of voltage distribution.
The aim of behaviour change is to help people to change aspects of their behaviour for the better (e.g., to decrease calorie intake, to drink in moderation, to take more exercise, to complete a course of antibiotics once started, etc.). In current persuasion technology for behaviour change, the emphasis is on helping people to explore their issues (e.g., through questionnaires or game playing) or to remember to follow a behaviour change plan (e.g., diaries and email reminders). However, recent developments in computational persuasion are leading to an argument-centric approach to persuasion that can potentially be harnessed in behaviour change applications. In this paper, we review developments in computational persuasion, and then focus on domain modelling as a key component. We present a multi-dimensional approach to domain modelling. At the core of this proposal is an ontology which provides a representation of key factors, in particular kinds of belief, which we have identified in the behaviour change literature as being important in diverse behaviour change initiatives. Our proposal for domain modelling is intended to facilitate the acquisition and representation of the arguments that can be used in persuasion dialogues, together with meta-level information about them which can be used by the persuader to make strategic choices of argument to present.
Many industries are now investing heavily in data science and automation to replace manual tasks and/or to help with decision making, especially in the realm of leveraging computer vision to automate many monitoring, inspection, and surveillance tasks. This has resulted in the emergence of the 'data scientist' who is conversant in statistical thinking, machine learning (ML), computer vision, and computer programming. However, as ML becomes more accessible to the general public and more aspects of ML become automated, applications leveraging computer vision are increasingly being created by non-experts with less opportunity for regulatory oversight. This points to the overall need for more educated responsibility for these lay-users of usable ML tools in order to mitigate potentially unethical ramifications. In this paper, we undertake a SWOT analysis to study the strengths, weaknesses, opportunities, and threats of building usable ML tools for mass adoption for important areas leveraging ML such as computer vision. The paper proposes a set of data science literacy criteria for educating and supporting lay-users in the responsible development and deployment of ML applications.
As in many other scientific domains, we face a fundamental problem when using machine learning to identify proteins from mass spectrometry data: large ground truth datasets mapping inputs to correct outputs are extremely difficult to obtain. Instead, we have access to imperfect hand-coded models crafted by domain experts. In this paper, we apply deep neural networks to an important step of the protein identification problem, the pairing of mass spectra with short sequences of amino acids called peptides. We train our model to differentiate between top scoring results from a state-of-the art classical system and hard-negative second and third place results. Our resulting model is much better at identifying peptides with spectra than the model used to generate its training data. In particular, we achieve a 43% improvement over standard matching methods and a 10% improvement over a combination of the matching method and an industry standard cross-spectra reranking tool. Importantly, in a more difficult experimental regime that reflects current challenges facing biologists, our advantage over the previous state-of-the-art grows to 15% even after reranking. We believe this approach will generalize to other challenging scientific problems.
In this study we assessed the repeatability of the values of radiomics features for small prostate tumors using test-retest? Multiparametric Magnetic Resonance Imaging (mpMRI) images. The premise of radiomics is that quantitative image features can serve as biomarkers characterizing disease. For such biomarkers to be useful, repeatability is a basic requirement, meaning its value must remain stable between two scans, if the conditions remain stable. We investigated repeatability of radiomics features under various preprocessing and extraction configurations including various image normalization schemes, different image pre-filtering, 2D vs 3D texture computation, and different bin widths for image discretization. Image registration as means to re-identify regions of interest across time points was evaluated against human-expert segmented regions in both time points. Even though we found many radiomics features and preprocessing combinations with a high repeatability (Intraclass Correlation Coefficient (ICC) > 0.85), our results indicate that overall the repeatability is highly sensitive to the processing parameters (under certain configurations, it can be below 0.0). Image normalization, using a variety of approaches considered, did not result in consistent improvements in repeatability. There was also no consistent improvement of repeatability through the use of pre-filtering options, or by using image registration between timepoints to improve consistency of the region of interest localization. Based on these results we urge caution when interpreting radiomics features and advise paying close attention to the processing configuration details of reported results. Furthermore, we advocate reporting all processing details in radiomics studies and strongly recommend making the implementation available.