Maya Srikanth

An Empirical Study of Automated Mislabel Detection in Real World Vision Datasets

Dec 02, 2023
Maya Srikanth, Jeremy Irvin, Brian Wesley Hill, Felipe Godoy, Ishan Sabane, Andrew Y. Ng

Major advancements in computer vision can primarily be attributed to the use of labeled datasets. However, acquiring labels for datasets often results in errors which can harm model performance. Recent works have proposed methods to automatically identify mislabeled images, but strategies for effectively applying them to real world datasets have been sparsely explored. Towards improved data-centric methods for cleaning real world vision datasets, we first conduct more than 200 experiments carefully benchmarking recently developed automated mislabel detection methods on multiple datasets under a variety of synthetic and real noise settings with varying noise levels. We compare these methods to a Simple and Efficient Mislabel Detector (SEMD) that we craft, and find that SEMD performs similarly to or outperforms prior mislabel detection approaches. We then apply SEMD to multiple real world computer vision datasets and test how dataset size, mislabel removal strategy, and mislabel removal amount further affect model performance after retraining on the cleaned data. With careful design of the approach, we find that mislabel removal leads to per-class performance improvements of up to 8% for a retrained classifier in smaller data regimes.
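The abstract does not say how SEMD works, so the sketch below is not SEMD. It only illustrates the general detect-then-remove workflow with a generic confidence-thresholding baseline: an example is flagged when a trained classifier assigns low probability to its given label. The function names, the threshold, and the toy probabilities are all hypothetical.

```python
def flag_suspect_labels(probs, labels, threshold=0.5):
    """Return indices whose given label gets probability below threshold.

    probs[i] is a list of per-class probabilities for example i
    (assumed to come from some already-trained classifier).
    """
    return [i for i, (p, y) in enumerate(zip(probs, labels)) if p[y] < threshold]


def remove_flagged(items, flagged):
    """Drop the flagged indices, keeping the original order of the rest."""
    drop = set(flagged)
    return [x for i, x in enumerate(items) if i not in drop]


# Toy two-class data (hypothetical values, not results from the paper).
toy_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.05, 0.95]]
toy_labels = [0, 0, 1, 1]

suspects = flag_suspect_labels(toy_probs, toy_labels)
cleaned_labels = remove_flagged(toy_labels, suspects)
```

The cleaned set would then be used to retrain the classifier. How many flagged examples to remove is one of the variables the abstract says the study tests, so the fixed threshold here is only a placeholder.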

Dynamic Social Media Monitoring for Fast-Evolving Online Discussions

Feb 24, 2021
Maya Srikanth, Anqi Liu, Nicholas Adams-Cohen, Jian Cao, R. Michael Alvarez, Anima Anandkumar

Tracking and collecting fast-evolving online discussions provides vast data for studying social media usage and its role in people's public lives. However, collecting social media data using a static set of keywords fails to satisfy the growing need to monitor dynamic conversations and to study fast-changing topics. We propose a dynamic keyword search method to maximize the coverage of relevant information in fast-evolving online discussions. The method uses word embedding models to represent the semantic relations between keywords and predictive models to forecast the future time series. We also implement a visual user interface to aid in the decision-making process in each round of keyword updates. This allows for both human-assisted tracking and fully-automated data collection. In simulations using historical #MeToo data from 2017, our human-assisted tracking method significantly outperforms the traditional static baseline, with a 37.1% higher F-1 score in tracking the top trending keywords. We conduct a contemporary case study to cover dynamic conversations about the recent Presidential Inauguration and to test the dynamic data collection system. Our case studies reflect the effectiveness of our process and also point to potential challenges in future deployment.

* Preprint, Under Review
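As a rough illustration of the embedding-based step in the abstract above, the sketch below ranks candidate keywords by cosine similarity to the current seed keywords. It is a minimal sketch under stated assumptions, not the authors' system: the function names and the three-dimensional toy vectors are hypothetical, and the time-series forecasting and human-in-the-loop interface are not modeled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_keywords(seeds, embeddings, top_k=2):
    """Return the top_k non-seed words closest to any seed keyword."""
    scores = {}
    for word, vec in embeddings.items():
        if word in seeds:
            continue
        # Score each candidate by its best match against the seed set.
        scores[word] = max(cosine(vec, embeddings[s]) for s in seeds)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical toy embeddings; a real run would load trained word vectors.
TOY_EMBEDDINGS = {
    "metoo": [0.9, 0.1, 0.0],
    "timesup": [0.8, 0.2, 0.1],
    "harassment": [0.7, 0.3, 0.0],
    "football": [0.0, 0.1, 0.9],
    "weather": [0.1, 0.0, 0.8],
}

candidates = expand_keywords({"metoo"}, TOY_EMBEDDINGS, top_k=2)
```

In a human-assisted setting, the ranked candidates would be shown to an analyst for approval before being added to the tracked set. In a fully automated setting, they could be added directly once they pass a similarity cutoff.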

Finding Social Media Trolls: Dynamic Keyword Selection Methods for Rapidly-Evolving Online Debates

Nov 16, 2019
Anqi Liu, Maya Srikanth, Nicholas Adams-Cohen, R. Michael Alvarez, Anima Anandkumar

Online harassment is a significant social problem. Prevention of online harassment requires rapid detection of harassing, offensive, and negative social media posts. In this paper, we propose the use of word embedding models to identify offensive and harassing social media messages in two ways: detecting fast-changing topics for more effective data collection and representing word semantics in different domains. We demonstrate with preliminary results that using the GloVe (Global Vectors for Word Representation) model facilitates the discovery of new and relevant keywords to use for data collection and trolling detection. Our paper concludes with a discussion of a research agenda to further develop and test word embedding models for identification of social media harassment and trolling.

* AI for Social Good workshop at NeurIPS (2019)
