Denis Tarasov

Distilling LLMs' Decomposition Abilities into Compact Language Models

Feb 02, 2024
Denis Tarasov, Kumar Shridhar

Large Language Models (LLMs) have demonstrated proficiency in their reasoning abilities, yet their large size presents scalability challenges and limits any further customization. In contrast, compact models offer customized training but often fall short in solving complex reasoning tasks. This study focuses on distilling the LLMs' decomposition skills into compact models using offline reinforcement learning. We leverage the advancements in the LLMs' capabilities to provide feedback and generate a specialized task-specific dataset for training compact models. The development of an AI-generated dataset and the establishment of baselines constitute the primary contributions of our work, underscoring the potential of compact models in replicating complex problem-solving skills.

* https://github.com/DT6A/GSM8K-AI-SubQ

Katakomba: Tools and Benchmarks for Data-Driven NetHack

Jun 14, 2023
Vladislav Kurenkov, Alexander Nikulin, Denis Tarasov, Sergey Kolesnikov

NetHack is known as the frontier of reinforcement learning research where learning-based methods still need to catch up to rule-based solutions. One of the promising directions for a breakthrough is using pre-collected datasets similar to recent developments in robotics, recommender systems, and more under the umbrella of offline reinforcement learning (ORL). Recently, a large-scale NetHack dataset was released; while it was a necessary step forward, it has yet to gain wide adoption in the ORL community. In this work, we argue that there are three major obstacles for adoption: tool-wise, implementation-wise, and benchmark-wise. To address them, we develop an open-source library that provides workflow fundamentals familiar to the ORL community: pre-defined D4RL-style tasks, uncluttered baseline implementations, and reliable evaluation tools with accompanying configs and logs synced to the cloud.

* Source code at https://github.com/tinkoff-ai/katakomba

Revisiting the Minimalist Approach to Offline Reinforcement Learning

May 16, 2023
Denis Tarasov, Vladislav Kurenkov, Alexander Nikulin, Sergey Kolesnikov

Recent years have witnessed significant advancements in offline reinforcement learning (RL), resulting in the development of numerous algorithms with varying degrees of complexity. While these algorithms have led to noteworthy improvements, many incorporate seemingly minor design choices that impact their effectiveness beyond core algorithmic advances. However, the effect of these design choices on established baselines remains understudied. In this work, we aim to bridge this gap by conducting a retrospective analysis of recent works in offline RL and propose ReBRAC, a minimalistic algorithm that integrates such design elements built on top of the TD3+BC method. We evaluate ReBRAC on 51 datasets with both proprioceptive and visual state spaces using D4RL and V-D4RL benchmarks, demonstrating its state-of-the-art performance among ensemble-free methods. To further illustrate the efficacy of these design choices, we perform a large-scale ablation study and hyperparameter sensitivity analysis on the scale of thousands of experiments.

* Source code: https://github.com/tinkoff-ai/ReBRAC
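
The abstract above names TD3+BC as the base method. As a rough illustration only, the sketch below computes the commonly cited TD3+BC actor objective (a normalized Q term plus a behavior-cloning penalty) on plain Python lists. It is not ReBRAC itself and is not taken from the linked repository; the function name, the argument layout, and the default `alpha=2.5` are assumptions made for this example.

```python
def td3_bc_actor_loss(q_values, policy_actions, dataset_actions, alpha=2.5):
    """Sketch of a TD3+BC-style actor loss (hypothetical helper, not ReBRAC).

    q_values:        list of Q(s, pi(s)) estimates, one per sample
    policy_actions:  list of action vectors proposed by the policy
    dataset_actions: list of action vectors stored in the offline dataset
    alpha:           trade-off coefficient (assumed default)
    """
    if not q_values or len(q_values) != len(policy_actions) != 0 and \
            len(policy_actions) != len(dataset_actions):
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(q_values)
    mean_abs_q = sum(abs(q) for q in q_values) / n
    if mean_abs_q == 0:
        raise ValueError("mean |Q| is zero, cannot normalize")

    # Normalize the Q term so its scale is comparable to the BC term.
    lam = alpha / mean_abs_q
    q_term = lam * sum(q_values) / n

    # Behavior-cloning penalty: mean squared error over all action dimensions.
    squared_errors = [
        (p - a) ** 2
        for pol, data in zip(policy_actions, dataset_actions)
        for p, a in zip(pol, data)
    ]
    bc_term = sum(squared_errors) / len(squared_errors)

    # The actor minimizes this value: high Q and small deviation are rewarded.
    return -q_term + bc_term
```

For instance, `td3_bc_actor_loss([1.0, 3.0], [[0.0], [1.0]], [[0.0], [0.0]])` combines a Q term of 2.5 with a behavior-cloning penalty of 0.5.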

Anti-Exploration by Random Network Distillation

Jan 31, 2023
Alexander Nikulin, Vladislav Kurenkov, Denis Tarasov, Sergey Kolesnikov

Despite the success of Random Network Distillation (RND) in various domains, it was shown to be not discriminative enough to be used as an uncertainty estimator for penalizing out-of-distribution actions in offline reinforcement learning. In this paper, we revisit these results and show that, with a naive choice of conditioning for the RND prior, it becomes infeasible for the actor to effectively minimize the anti-exploration bonus, and that discriminativity is not the issue. We show that this limitation can be avoided with conditioning based on Feature-wise Linear Modulation (FiLM), resulting in a simple and efficient ensemble-free algorithm based on Soft Actor-Critic. We evaluate it on the D4RL benchmark, showing that it is capable of achieving performance comparable to ensemble-based methods and outperforming ensemble-free approaches by a wide margin.

* Source code: https://github.com/tinkoff-ai/sac-rnd
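
To make the two ingredients named in the abstract above concrete, here is a minimal sketch of FiLM (an element-wise scale and shift of a feature vector) and of an RND-style bonus (squared distance between a predictor output and a frozen prior output). How the paper wires these together, and which input produces the modulation parameters, is not stated in the abstract; the function names and list-based signatures below are assumptions for illustration and do not come from the linked repository.

```python
def film(features, gamma, beta):
    """Feature-wise Linear Modulation: gamma * h + beta, element by element.

    In a real model gamma and beta would be produced by a network from the
    conditioning input; here they are passed in directly (assumption).
    """
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("features, gamma and beta must have equal length")
    return [g * h + b for h, g, b in zip(features, gamma, beta)]


def rnd_bonus(predictor_out, prior_out):
    """RND-style bonus: squared error between predictor and frozen prior."""
    if len(predictor_out) != len(prior_out):
        raise ValueError("outputs must have equal length")
    return sum((p - q) ** 2 for p, q in zip(predictor_out, prior_out))
```

A large bonus indicates an input the predictor has not learned to match, which an anti-exploration method can subtract from the value estimate to discourage out-of-distribution actions.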

Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size

Nov 20, 2022
Alexander Nikulin, Vladislav Kurenkov, Denis Tarasov, Dmitry Akimov, Sergey Kolesnikov

Training large neural networks is known to be time-consuming, with the learning duration taking days or even weeks. To address this problem, large-batch optimization was introduced. This approach demonstrated that scaling mini-batch sizes with appropriate learning rate adjustments can speed up the training process by orders of magnitude. While long training time was not typically a major issue for model-free deep offline RL algorithms, recently introduced Q-ensemble methods achieving state-of-the-art performance made this issue more relevant, notably extending the training duration. In this work, we demonstrate how this class of methods can benefit from large-batch optimization, which is commonly overlooked by the deep offline RL community. We show that scaling the mini-batch size and naively adjusting the learning rate allows for (1) a reduced size of the Q-ensemble, (2) stronger penalization of out-of-distribution actions, and (3) improved convergence time, effectively shortening training duration by 3-4x on average.

* Accepted at 3rd Offline Reinforcement Learning Workshop at Neural Information Processing Systems, 2022
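
The abstract above mentions adjusting the learning rate when the mini-batch size grows, without stating the rule. The sketch below shows two heuristics that are widely used in large-batch training, linear and square-root scaling. Which rule (if either) the paper applies is not given here, so the function, its name, and its `rule` parameter are purely illustrative assumptions.

```python
import math


def scale_learning_rate(base_lr, base_batch, new_batch, rule="sqrt"):
    """Rescale a learning rate for a new mini-batch size (hypothetical helper).

    rule="linear": lr grows proportionally to the batch-size ratio.
    rule="sqrt":   lr grows with the square root of the batch-size ratio.
    """
    if base_lr <= 0 or base_batch <= 0 or new_batch <= 0:
        raise ValueError("learning rate and batch sizes must be positive")

    ratio = new_batch / base_batch
    if rule == "linear":
        return base_lr * ratio
    if rule == "sqrt":
        return base_lr * math.sqrt(ratio)
    raise ValueError(f"unknown rule: {rule!r}")
```

Going from a batch of 256 to 1024, for example, multiplies the learning rate by 4 under the linear rule and by 2 under the square-root rule.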

Let Offline RL Flow: Training Conservative Agents in the Latent Space of Normalizing Flows

Nov 20, 2022
Dmitriy Akimov, Vladislav Kurenkov, Alexander Nikulin, Denis Tarasov, Sergey Kolesnikov

Offline reinforcement learning aims to train a policy on a pre-recorded and fixed dataset without any additional environment interactions. There are two major challenges in this setting: (1) extrapolation error caused by approximating the value of state-action pairs not well-covered by the training data and (2) distributional shift between behavior and inference policies. One way to tackle these problems is to induce conservatism - i.e., keeping the learned policies closer to the behavioral ones. To achieve this, we build upon recent works on learning policies in latent action spaces and use a special form of Normalizing Flows for constructing a generative model, which we use as a conservative action encoder. This Normalizing Flows action encoder is pre-trained in a supervised manner on the offline dataset, and then an additional policy model - controller in the latent space - is trained via reinforcement learning. This approach avoids querying actions outside of the training dataset and therefore does not require additional regularization for out-of-dataset actions. We evaluate our method on various locomotion and navigation tasks, demonstrating that our approach outperforms recently proposed algorithms with generative action models on a large portion of datasets.

* Accepted at 3rd Offline Reinforcement Learning Workshop at Neural Information Processing Systems, 2022

CORL: Research-oriented Deep Offline Reinforcement Learning Library

Oct 13, 2022
Denis Tarasov, Alexander Nikulin, Dmitry Akimov, Vladislav Kurenkov, Sergey Kolesnikov

CORL is an open-source library that provides single-file implementations of Deep Offline Reinforcement Learning algorithms. It emphasizes a simple developing experience with a straightforward codebase and a modern analysis tracking tool. In CORL, we isolate each method's implementation into a distinct single file, making performance-relevant details easier to recognise. Additionally, an experiment tracking feature is available to help log metrics, hyperparameters, dependencies, and more to the cloud. Finally, we have ensured the reliability of the implementations by benchmarking them on the commonly employed D4RL benchmark. The source code can be found at https://github.com/tinkoff-ai/CORL

Inception Architecture and Residual Connections in Classification of Breast Cancer Histology Images

Dec 10, 2019
Mohammad Ibrahim Sarker, Hyongsuk Kim, Denis Tarasov, Dinar Akhmetzanov

This paper presents the results of applying the Inception v4 deep convolutional neural network to the ICIAR-2018 Breast Cancer Classification Grand Challenge, part a. The Challenge task is to classify breast cancer biopsy results, presented in the form of hematoxylin and eosin stained images. Breast cancer classification is of primary interest to medical practitioners, and thus binary classification of breast cancer images has been under investigation by many researchers, but multi-class categorization of histology breast images has been challenging due to the subtle differences among the categories. In this work, extensive data augmentation is conducted to reduce overfitting, and the effectiveness of a committee of several Inception v4 networks is studied. We report 89% accuracy on the 4-class classification task and 93.7% on the carcinoma/non-carcinoma two-class classification task using our test set of 80 images.

* Achieved 23rd place out of 50 accepted positions (ICIAR Grand Challenge on Breast cancer histology images)
