Ahmed Abdelali

LLMeBench: A Flexible Framework for Accelerating LLMs Benchmarking

Aug 09, 2023
Fahim Dalvi, Maram Hasanain, Sabri Boughorbel, Basel Mousi, Samir Abdaljalil, Nizi Nazar, Ahmed Abdelali, Shammur Absar Chowdhury, Hamdy Mubarak, Ahmed Ali, Majd Hawasly, Nadir Durrani, Firoj Alam

The recent development and success of Large Language Models (LLMs) necessitate an evaluation of their performance across diverse NLP tasks in different languages. Although several frameworks have been developed and made publicly available, customizing them for specific tasks and datasets is often complex for users. In this study, we introduce the LLMeBench framework. Initially developed to evaluate Arabic NLP tasks using OpenAI's GPT and BLOOM models, it can be seamlessly customized for any NLP task and model, regardless of language. The framework also features zero- and few-shot learning settings. A new custom dataset can be added in less than 10 minutes, and users can use their own model API keys to evaluate the task at hand. The framework has already been tested on 31 unique NLP tasks using 53 publicly available datasets within 90 experimental setups, involving approximately 296K data points. We plan to open-source the framework for the community (https://github.com/qcri/LLMeBench/). A video demonstrating the framework is available online (https://youtu.be/FkQn4UjYA0s).

* Foundation Models, Large Language Models, NLP, ChatGPT Evaluation, LLMs Benchmark
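
The zero- and few-shot settings mentioned in the abstract differ only in whether labeled demonstrations are placed in the prompt before the test input. The following is a minimal sketch of that idea, not LLMeBench's actual API: the function `build_prompt`, the `Input:`/`Label:` template, and the example texts are all hypothetical and chosen purely for illustration.

```python
from typing import Mapping, Sequence


def build_prompt(
    instruction: str,
    test_input: str,
    examples: Sequence[Mapping[str, str]] = (),
) -> str:
    """Assemble a prompt; with no examples this is the zero-shot case."""
    parts = [instruction]
    # Few-shot: prepend each labeled demonstration before the test item.
    for ex in examples:
        parts.append(f"Input: {ex['input']}\nLabel: {ex['label']}")
    # The test item is left unlabeled so the model completes the label.
    parts.append(f"Input: {test_input}\nLabel:")
    return "\n\n".join(parts)


# Hypothetical task and data, for illustration only.
instruction = "Classify the sentiment as Positive or Negative."
demos = [
    {"input": "I loved this film.", "label": "Positive"},
    {"input": "The service was terrible.", "label": "Negative"},
]

zero_shot = build_prompt(instruction, "great movie")
few_shot = build_prompt(instruction, "great movie", demos)
```

The resulting string would then be sent to whichever model API is being evaluated; that call is deliberately left out because it depends on the provider.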

Benchmarking Arabic AI with Large Language Models

May 24, 2023
Ahmed Abdelali, Hamdy Mubarak, Shammur Absar Chowdhury, Maram Hasanain, Basel Mousi, Sabri Boughorbel, Yassine El Kheir, Daniel Izham, Fahim Dalvi, Majd Hawasly, Nizi Nazar, Yousseif Elshahawy, Ahmed Ali, Nadir Durrani, Natasa Milic-Frayling, Firoj Alam

With large Foundation Models (FMs), language technologies (AI in general) are entering a new paradigm: eliminating the need for developing large-scale task-specific datasets and supporting a variety of tasks through set-ups ranging from zero-shot to few-shot learning. However, understanding FMs' capabilities requires a systematic benchmarking effort that compares FMs' performance with state-of-the-art (SOTA) task-specific models. With that goal, past work focused on the English language and included a few efforts with multiple languages. Our study contributes to ongoing research by evaluating FMs' performance for standard Arabic NLP and speech processing, including a range of tasks from sequence tagging to content classification across diverse domains. We start with zero-shot learning using GPT-3.5-turbo, Whisper, and USM, addressing 33 unique tasks using 59 publicly available datasets, resulting in 96 test setups. For a few tasks, FMs perform on par with or exceed the SOTA models, but for the majority they under-perform. Given the importance of the prompt for FMs' performance, we discuss our prompt strategies in detail and elaborate on our findings. Our future work on Arabic AI will explore few-shot prompting, expand the range of tasks, and investigate additional open-source models.

* Foundation Models, Large Language Models, Arabic NLP, Arabic Speech, Arabic AI, ChatGPT Evaluation, USM Evaluation, Whisper Evaluation

Post-hoc analysis of Arabic transformer models

Oct 18, 2022
Ahmed Abdelali, Nadir Durrani, Fahim Dalvi, Hassan Sajjad

Arabic is a Semitic language that is widely spoken and has many dialects. Given the success of pre-trained language models, many transformer models trained on Arabic and its dialects have surfaced. While there has been extrinsic evaluation of these models with respect to downstream NLP tasks, no work has been carried out to analyze and compare their internal representations. We probe how linguistic information is encoded in transformer models trained on different Arabic dialects. We perform a layer and neuron analysis on the models using morphological tagging tasks for different dialects of Arabic and a dialectal identification task. Our analysis reveals interesting findings such as: i) word morphology is learned at the lower and middle layers, ii) while syntactic dependencies are predominantly captured at the higher layers, iii) despite a large overlap in their vocabulary, the MSA-based models fail to capture the nuances of Arabic dialects, iv) neurons in embedding layers are polysemous in nature, while the neurons in middle layers are exclusive to specific properties.

* BlackboxNLP 2022. arXiv admin note: substantial text overlap with arXiv:2201.07434

NatiQ: An End-to-end Text-to-Speech System for Arabic

Jun 15, 2022
Ahmed Abdelali, Nadir Durrani, Cenk Demiroglu, Fahim Dalvi, Hamdy Mubarak, Kareem Darwish

NatiQ is an end-to-end text-to-speech system for Arabic. Our speech synthesizer uses an encoder-decoder architecture with attention. We used both Tacotron-based models (Tacotron-1 and Tacotron-2) and the faster transformer model for generating mel-spectrograms from characters. We concatenated Tacotron-1 with the WaveRNN vocoder, Tacotron-2 with the WaveGlow vocoder, and the ESPnet transformer with the Parallel WaveGAN vocoder to synthesize waveforms from the spectrograms. To train our models, we used in-house speech data for two voices: 1) neutral male "Hamza", narrating general content and news, and 2) expressive female "Amina", narrating children's story books. Our best systems achieve an average Mean Opinion Score (MOS) of 4.21 and 4.40 for Amina and Hamza respectively. The objective evaluation of the systems using word and character error rate (WER and CER), as well as the response time measured by real-time factor, favored the end-to-end architecture ESPnet. A NatiQ demo is available online at https://tts.qcri.org
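
The objective metrics named in the abstract have standard definitions: WER and CER are the edit distance between a reference and a hypothesis transcript, normalized by the reference length in words or characters, and the real-time factor is processing time divided by audio duration. The sketch below illustrates those definitions only; it is not the authors' evaluation code, the function names are hypothetical, and how the hypothesis transcripts were obtained is not stated in the abstract.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Values below 1.0 mean synthesis runs faster than real time."""
    return processing_seconds / audio_seconds
```

For example, `wer("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words.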

Interpreting Arabic Transformer Models

Jan 19, 2022
Ahmed Abdelali, Nadir Durrani, Fahim Dalvi, Hassan Sajjad

Arabic is a Semitic language that is widely spoken and has many dialects. Given the success of pre-trained language models, many transformer models trained on Arabic and its dialects have surfaced. While these models have been compared with respect to downstream NLP tasks, no evaluation has been carried out to directly compare their internal representations. We probe how linguistic information is encoded in Arabic pretrained models trained on different varieties of the Arabic language. We perform a layer and neuron analysis on the models using three intrinsic tasks: two morphological tagging tasks based on MSA (Modern Standard Arabic) and dialectal POS-tagging, and a dialectal identification task. Our analysis reveals interesting findings such as: i) word morphology is learned at the lower and middle layers, ii) dialectal identification necessitates more knowledge and is hence preserved even in the final layers, iii) despite a large overlap in their vocabulary, the MSA-based models fail to capture the nuances of Arabic dialects, iv) neurons in embedding layers are polysemous in nature, while the neurons in middle layers are exclusive to specific properties.

* 11 pages, 6 figures, 4 tables

Code-Switching Text Augmentation for Multilingual Speech Processing

Jan 07, 2022
Amir Hussein, Shammur Absar Chowdhury, Ahmed Abdelali, Najim Dehak, Ahmed Ali

The pervasiveness of intra-utterance code-switching (CS) in spoken content has forced ASR systems to handle mixed input. Yet, designing a CS-ASR system has many challenges, mainly due to data scarcity, grammatical structure complexity, and mismatch along with unbalanced language usage distribution. Recent ASR studies showed the predominance of E2E-ASR using multilingual data to handle CS phenomena with little CS data. However, the dependency on CS data still remains. In this work, we propose a methodology to augment monolingual data for artificially generating spoken CS text to improve different speech modules. We based our approach on Equivalence Constraint theory while exploiting aligned translation pairs to generate grammatically valid CS content. Our empirical results show a relative gain of 29-34% in perplexity and around 2% in WER for two ecological and noisy CS test sets. Finally, the human evaluation suggests that 83.8% of the generated data is acceptable to humans.
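
A relative gain in perplexity is read here as a relative reduction, since lower perplexity is better: the drop from the baseline value divided by the baseline value. The sketch below shows the standard perplexity definition and that ratio. The function names and all numbers are hypothetical illustrations, not figures from the paper, and the abstract does not spell out exactly how its gain was computed.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """exp of the mean negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which a lower-is-better metric dropped from baseline."""
    return (baseline - improved) / baseline


# Hypothetical values: a model assigning probability 0.5 to each of
# four tokens has perplexity 2.
ppl = perplexity([math.log(0.5)] * 4)

# Hypothetical values: perplexity falling from 100 to 70 is a 30%
# relative reduction.
gain = relative_reduction(100.0, 70.0)
```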

Automatic Expansion and Retargeting of Arabic Offensive Language Training

Nov 18, 2021
Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish, Younes Samih

Rampant use of offensive language on social media has led to recent efforts on automatic identification of such language. Though offensive language has general characteristics, attacks on specific entities may exhibit distinct phenomena such as malicious alterations in the spelling of names. In this paper, we present a method for identifying entity-specific offensive language. We employ two key insights, namely that replies on Twitter often imply opposition and that some accounts are persistent in their offensiveness towards specific targets. Using our methodology, we are able to collect thousands of targeted offensive tweets. We show the efficacy of the approach on Arabic tweets with 13% and 79% relative F1-measure improvement in entity-specific offensive language detection when using deep-learning-based and support-vector-machine-based classifiers respectively. Further, expanding the training set with automatically identified offensive tweets directed at multiple entities can improve F1-measure by 48%.

Towards One Model to Rule All: Multilingual Strategy for Dialectal Code-Switching Arabic ASR

May 31, 2021
Shammur Absar Chowdhury, Amir Hussein, Ahmed Abdelali, Ahmed Ali

With the advent of globalization, there is an increasing demand for multilingual automatic speech recognition (ASR) that handles language and dialectal variation in spoken content. Recent studies show its efficacy over monolingual systems. In this study, we design a large multilingual end-to-end ASR using a self-attention-based conformer architecture. We trained the system using Arabic (Ar), English (En) and French (Fr) languages. We evaluate the system's performance in handling: (i) monolingual (Ar, En and Fr); (ii) multi-dialectal (Modern Standard Arabic, along with dialectal variation such as Egyptian and Moroccan); (iii) code-switching -- cross-lingual (Ar-En/Fr) and dialectal (MSA-Egyptian dialect) test cases, and compare with current state-of-the-art systems. Furthermore, we investigate the influence of different embedding/character representations, including character vs word-piece and shared vs distinct input symbols per language. Our findings demonstrate the strength of such a model by outperforming state-of-the-art monolingual dialectal Arabic and code-switching Arabic ASR.

* Submitted to INTERSPEECH 2021, Multilingual ASR, Multi-dialectal ASR, Code-Switching ASR, Arabic ASR, Conformer, Transformer, E2E ASR, Speech Recognition, ASR, Arabic, English, French

Pre-Training BERT on Arabic Tweets: Practical Considerations

Feb 21, 2021
Ahmed Abdelali, Sabit Hassan, Hamdy Mubarak, Kareem Darwish, Younes Samih

Pretraining Bidirectional Encoder Representations from Transformers (BERT) for downstream NLP tasks is a non-trivial task. We pretrained 5 BERT models that differ in the size of their training sets, mixture of formal and informal Arabic, and linguistic preprocessing. All are intended to support Arabic dialects and social media. The experiments highlight the centrality of data diversity and the efficacy of linguistically aware segmentation. They also highlight that more data or more training steps do not necessarily yield better models. Our new models achieve new state-of-the-art results on several downstream tasks. The resulting models are released to the community under the name QARiB.

* 6 pages, 5 figures

BERT Transformer model for Detecting Arabic GPT2 Auto-Generated Tweets

Jan 22, 2021
Fouzi Harrag, Maria Debbah, Kareem Darwish, Ahmed Abdelali

During the last two decades, we have progressively turned to the Internet and social media to find news, hold conversations and share opinions. Recently, OpenAI has developed a machine learning system called GPT-2 (Generative Pre-trained Transformer-2), which can produce deepfake texts. It can generate blocks of text based on brief writing prompts that look like they were written by humans, facilitating the spread of false or auto-generated text. In line with this progress, and in order to counteract potential dangers, several methods have been proposed for detecting text written by these language models. In this paper, we propose a transfer-learning-based model that can detect whether an Arabic sentence is written by humans or automatically generated by bots. Our dataset is based on tweets from a previous work, which we have crawled and extended using the Twitter API. We used GPT2-Small-Arabic to generate fake Arabic sentences. For evaluation, we compared different recurrent neural network (RNN) word-embedding-based baseline models, namely LSTM, BI-LSTM, GRU and BI-GRU, with a transformer-based model. Our new transfer-learning model has obtained an accuracy of up to 98%. To the best of our knowledge, this work is the first study where ARABERT and GPT2 were combined to detect and classify Arabic auto-generated texts.

* Proceedings of the Fifth Arabic Natural Language Processing Workshop (WANLP @ COLING 2020)
