Transformer Models in Education: Summarizing Science Textbooks with AraBART, MT5, AraT5, and mBART
- URL: http://arxiv.org/abs/2406.07692v1
- Date: Tue, 11 Jun 2024 20:14:09 GMT
- Title: Transformer Models in Education: Summarizing Science Textbooks with AraBART, MT5, AraT5, and mBART
- Authors: Sari Masri, Yaqeen Raddad, Fidaa Khandaqji, Huthaifa I. Ashqar, Mohammed Elhenawy
- Abstract summary: We have developed an advanced text summarization system targeting Arabic textbooks.
This system evaluates and extracts the most important sentences found in biology textbooks for the 11th and 12th grades in the Palestinian curriculum.
- Score: 4.214194481944042
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, with the rapid development of technology and the increasing amount of text available on the internet, it has become urgent to develop effective tools for processing and understanding texts in a way that summarizes the content without losing the fundamental essence of the information. Given this challenge, we have developed an advanced text summarization system targeting Arabic textbooks. Relying on modern natural language processing models such as MT5, AraBART, AraT5, and mBART50, this system evaluates and extracts the most important sentences found in biology textbooks for the 11th and 12th grades in the Palestinian curriculum, which enables students and teachers to obtain accurate and useful summaries that help them easily understand the content. We utilized the ROUGE metric to evaluate the performance of the trained models. Moreover, experts in education and textbook authoring assess the output of the trained models. This approach aims to identify the best solutions and clarify areas needing improvement. This research provides a solution for summarizing Arabic text and enriches the field by offering results that can open new horizons for research and development in technologies for understanding and generating the Arabic language. Additionally, it contributes to the field by creating and compiling schoolbook texts and building an Arabic dataset.
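As an illustration of the pipeline described above, here is a minimal sketch, not the authors' code: it summarizes one Arabic passage with an assumed public AraBART checkpoint and scores the output with ROUGE via Hugging Face libraries. The checkpoint name, generation settings, and placeholder texts are assumptions.
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import evaluate

# Assumed public checkpoint; mT5, AraT5, or mBART50 would be drop-in swaps.
checkpoint = "moussaKam/AraBART"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

source = "..."     # an Arabic biology passage (placeholder)
reference = "..."  # a gold summary for the same passage (placeholder)

inputs = tokenizer(source, return_tensors="pt", truncation=True, max_length=1024)
summary_ids = model.generate(**inputs, max_length=128, num_beams=4)
prediction = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

# Pass a whitespace tokenizer: the metric's default tokenizer keeps only
# ASCII alphanumeric tokens and would otherwise discard Arabic text.
rouge = evaluate.load("rouge")
print(rouge.compute(predictions=[prediction], references=[reference],
                    tokenizer=str.split))
```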
Related papers
- Gazelle: An Instruction Dataset for Arabic Writing Assistance [12.798604366250261]
We present Gazelle, a comprehensive dataset for Arabic writing assistance.
We also offer an evaluation framework designed to enhance Arabic writing assistance tools.
Our findings underscore the need for continuous model training and dataset enrichment.
arXiv Detail & Related papers (2024-10-23T17:51:58Z) - Integrating A.I. in Higher Education: Protocol for a Pilot Study with 'SAMCares: An Adaptive Learning Hub' [0.6990493129893112]
This research introduces an innovative study buddy called 'SAMCares'.
The system leverages a Large Language Model (LLM) and Retrieval-Augmented Generation (RAG) to offer real-time, context-aware, and adaptive educational support.
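A minimal retrieval-augmented sketch of that idea follows; it is not the SAMCares implementation. TF-IDF retrieval stands in for whatever retriever the system uses, the toy corpus is invented, and the final LLM call is left abstract.
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy course-note "knowledge base".
corpus = [
    "Photosynthesis converts light energy into chemical energy in chloroplasts.",
    "Mitosis produces two genetically identical daughter cells.",
    "Osmosis is the diffusion of water across a semipermeable membrane.",
]
question = "How do cells make copies of themselves?"

# Retrieve: rank passages by cosine similarity in TF-IDF space.
vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(corpus)
scores = cosine_similarity(vectorizer.transform([question]), doc_matrix)[0]
top = scores.argsort()[::-1][:2]

# Augment: ground the generator's prompt in the retrieved context.
context = "\n".join(corpus[i] for i in top)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # this prompt would then be passed to the LLM of choice
```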
arXiv Detail & Related papers (2024-05-01T05:39:07Z) - Text2Data: Low-Resource Data Generation with Textual Control [104.38011760992637]
Natural language serves as a common and straightforward control signal for humans to interact seamlessly with machines.
We propose Text2Data, a novel approach that utilizes unlabeled data to understand the underlying data distribution through an unsupervised diffusion model.
It undergoes controllable finetuning via a novel constraint optimization-based learning objective that ensures controllability and effectively counteracts catastrophic forgetting.
arXiv Detail & Related papers (2024-02-08T03:41:39Z) - Automatic and Human-AI Interactive Text Generation [27.05024520190722]
This tutorial aims to provide an overview of state-of-the-art natural language generation research.
Text-to-text generation tasks are more constrained in terms of semantic consistency and targeted language styles.
arXiv Detail & Related papers (2023-10-05T20:26:15Z) - A Benchmark for Text Expansion: Datasets, Metrics, and Baselines [87.47745669317894]
This work presents a new task of Text Expansion (TE), which aims to insert fine-grained modifiers into proper locations of the plain text.
We leverage four complementary approaches to construct a dataset with 12 million automatically generated instances and 2K human-annotated references.
On top of a pre-trained text-infilling model, we build both pipelined and joint Locate&Infill models, which demonstrate superiority over the Text2Text baselines.
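A toy analogue of the locate-and-infill idea, not the paper's model: pick an insertion point in plain text, then let a generic fill-mask model propose a fine-grained modifier. The model name is an assumption.
```python
from transformers import pipeline

# A generic fill-mask model stands in for the paper's text-infilling model.
fill = pipeline("fill-mask", model="bert-base-uncased")

plain = "The storm hit the coast."
# "Locate": choose an insertion point (a real system predicts this position),
# then "Infill": let the model propose a modifier for that slot.
masked = "The storm hit the [MASK] coast."
for candidate in fill(masked, top_k=3):
    print(candidate["sequence"], round(candidate["score"], 3))
```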
arXiv Detail & Related papers (2023-09-17T07:54:38Z) - TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z) - Uzbek text summarization based on TF-IDF [0.0]
This article presents an experiment on summarization task for Uzbek language.
The methodology was based on text abstracting using the TF-IDF algorithm.
The given text is summarized by applying an n-gram method to the most important parts of the whole text.
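A compact sketch in the spirit of that method: rank sentences by their total TF-IDF weight over word n-grams and keep the top ones. The n-gram range, scoring rule, and example sentences are assumptions, not the paper's exact recipe.
```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def summarize(sentences, k=2, ngram_range=(1, 2)):
    """Score each sentence by its total TF-IDF weight and return the
    top-k sentences in their original order."""
    tfidf = TfidfVectorizer(ngram_range=ngram_range).fit_transform(sentences)
    scores = np.asarray(tfidf.sum(axis=1)).ravel()
    top = sorted(np.argsort(-scores)[:k])
    return [sentences[i] for i in top]

document = [
    "Tashkent is the capital and largest city of Uzbekistan.",
    "The city has long been a major hub on the Silk Road.",
    "It is also the country's main economic and cultural centre.",
]
print(summarize(document))
```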
arXiv Detail & Related papers (2023-03-01T12:39:46Z) - A Survey on Arabic Named Entity Recognition: Past, Recent Advances, and Future Trends [15.302538985992518]
We provide a comprehensive review of the development of Arabic NER.
Traditional Arabic NER systems focus on feature engineering and designing domain-specific rules.
With the growth of pre-trained language models, Arabic NER systems achieve better performance.
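The shift the survey describes can be shown in a few lines: NER with a pre-trained Arabic model instead of hand-written rules. The checkpoint name is an assumption about a public model, not one taken from the survey.
```python
from transformers import pipeline

# Assumed public Arabic NER checkpoint.
ner = pipeline("token-classification",
               model="CAMeL-Lab/bert-base-arabic-camelbert-mix-ner",
               aggregation_strategy="simple")

# "Mahmoud Darwish was born in the village of al-Birwa in the Galilee."
print(ner("ولد محمود درويش في قرية البروة في الجليل."))
```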
arXiv Detail & Related papers (2023-02-07T14:56:52Z) - Pre-training Language Model Incorporating Domain-specific Heterogeneous Knowledge into A Unified Representation [49.89831914386982]
We propose a unified pre-trained language model (PLM) for all forms of text, including unstructured text, semi-structured text, and well-structured text.
Our approach outperforms plain-text pre-training while using only 1/4 of the data.
arXiv Detail & Related papers (2021-09-02T16:05:24Z) - A Survey of Knowledge-Enhanced Text Generation [81.24633231919137]
The goal of text generation is to make machines express in human language.
Various neural encoder-decoder models have been proposed to achieve the goal by learning to map input text to output text.
Because the input text alone often provides limited knowledge for generating the desired output, researchers have considered incorporating various forms of knowledge beyond the input text into the generation models.
arXiv Detail & Related papers (2020-10-09T06:46:46Z) - Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer [64.22926988297685]
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).
In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format.
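The text-to-text recipe fits in one usage example: every task is cast as "prefix: input" mapped to output text. A sketch with the public t5-small checkpoint:
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Summarization becomes just another text-to-text task via a task prefix.
text = ("summarize: Transfer learning first pre-trains a model on a "
        "data-rich task and then fine-tunes it on a downstream task, "
        "which has emerged as a powerful technique in NLP.")
ids = model.generate(**tokenizer(text, return_tensors="pt"), max_length=30)
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```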
arXiv Detail & Related papers (2019-10-23T17:37:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.