Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Kurt Partridge

Mixed Federated Learning: Joint Decentralized and Centralized Learning

May 26, 2022
Sean Augenstein, Andrew Hard, Lin Ning, Karan Singhal, Satyen Kale, Kurt Partridge, Rajiv Mathews

Figure 1 for Mixed Federated Learning: Joint Decentralized and Centralized Learning

Figure 2 for Mixed Federated Learning: Joint Decentralized and Centralized Learning

Figure 3 for Mixed Federated Learning: Joint Decentralized and Centralized Learning

Figure 4 for Mixed Federated Learning: Joint Decentralized and Centralized Learning

Federated learning (FL) enables learning from decentralized privacy-sensitive data, with computations on raw data confined to take place at edge clients. This paper introduces mixed FL, which incorporates an additional loss term calculated at the coordinating server (while maintaining FL's private data restrictions). There are numerous benefits. For example, additional datacenter data can be leveraged to jointly learn from centralized (datacenter) and decentralized (federated) training data and better match an expected inference data distribution. Mixed FL also enables offloading some intensive computations (e.g., embedding regularization) to the server, greatly reducing communication and client computation load. For these and other mixed FL use cases, we present three algorithms: PARALLEL TRAINING, 1-WAY GRADIENT TRANSFER, and 2-WAY GRADIENT TRANSFER. We state convergence bounds for each, and give intuition on which are suited to particular mixed FL problems. Finally we perform extensive experiments on three tasks, demonstrating that mixed FL can blend training data to achieve an oracle's accuracy on an inference distribution, and can reduce communication and computation overhead by over 90%. Our experiments confirm theoretical predictions of how algorithms perform under different mixed FL problem settings.

* 36 pages, 12 figures

Via

Access Paper or Ask Questions

Production federated keyword spotting via distillation, filtering, and joint federated-centralized training

Apr 11, 2022
Andrew Hard, Kurt Partridge, Neng Chen, Sean Augenstein, Aishanee Shah, Hyun Jin Park, Alex Park, Sara Ng, Jessica Nguyen, Ignacio Lopez Moreno, Rajiv Mathews, Françoise Beaufays

Figure 1 for Production federated keyword spotting via distillation, filtering, and joint federated-centralized training

Figure 2 for Production federated keyword spotting via distillation, filtering, and joint federated-centralized training

Figure 3 for Production federated keyword spotting via distillation, filtering, and joint federated-centralized training

Figure 4 for Production federated keyword spotting via distillation, filtering, and joint federated-centralized training

We trained a keyword spotting model using federated learning on real user devices and observed significant improvements when the model was deployed for inference on phones. To compensate for data domains that are missing from on-device training caches, we employed joint federated-centralized training. And to learn in the absence of curated labels on-device, we formulated a confidence filtering strategy based on user-feedback signals for federated distillation. These techniques created models that significantly improved quality metrics in offline evaluations and user-experience metrics in live A/B experiments.

* Submitted to Interspeech 2022

Via

Access Paper or Ask Questions

Jointly Learning from Decentralized (Federated) and Centralized Data to Mitigate Distribution Shift

Nov 23, 2021
Sean Augenstein, Andrew Hard, Kurt Partridge, Rajiv Mathews

Figure 1 for Jointly Learning from Decentralized (Federated) and Centralized Data to Mitigate Distribution Shift

With privacy as a motivation, Federated Learning (FL) is an increasingly used paradigm where learning takes place collectively on edge devices, each with a cache of user-generated training examples that remain resident on the local device. These on-device training examples are gathered in situ during the course of users' interactions with their devices, and thus are highly reflective of at least part of the inference data distribution. Yet a distribution shift may still exist; the on-device training examples may lack for some data inputs expected to be encountered at inference time. This paper proposes a way to mitigate this shift: selective usage of datacenter data, mixed in with FL. By mixing decentralized (federated) and centralized (datacenter) data, we can form an effective training data distribution that better matches the inference data distribution, resulting in more useful models while still meeting the private training data access constraints imposed by FL.

* 9 pages, 1 figure. Camera-ready NeurIPS 2021 DistShift workshop version

Via

Access Paper or Ask Questions

Training Keyword Spotting Models on Non-IID Data with Federated Learning

Jun 04, 2020
Andrew Hard, Kurt Partridge, Cameron Nguyen, Niranjan Subrahmanya, Aishanee Shah, Pai Zhu, Ignacio Lopez Moreno, Rajiv Mathews

Figure 1 for Training Keyword Spotting Models on Non-IID Data with Federated Learning

Figure 2 for Training Keyword Spotting Models on Non-IID Data with Federated Learning

Figure 3 for Training Keyword Spotting Models on Non-IID Data with Federated Learning

Figure 4 for Training Keyword Spotting Models on Non-IID Data with Federated Learning

We demonstrate that a production-quality keyword-spotting model can be trained on-device using federated learning and achieve comparable false accept and false reject rates to a centrally-trained model. To overcome the algorithmic constraints associated with fitting on-device data (which are inherently non-independent and identically distributed), we conduct thorough empirical studies of optimization algorithms and hyperparameter configurations using large-scale federated simulations. To overcome resource constraints, we replace memory intensive MTR data augmentation with SpecAugment, which reduces the false reject rate by 56%. Finally, to label examples (given the zero visibility into on-device data), we explore teacher-student training.

* Submitted to Interspeech 2020

Via

Access Paper or Ask Questions