NLP for Local Governance Meeting Records: A Focus Article on Tasks, Datasets, Metrics and Benchmark
- URL: http://arxiv.org/abs/2602.08162v1
- Date: Sun, 08 Feb 2026 23:45:17 GMT
- Title: NLP for Local Governance Meeting Records: A Focus Article on Tasks, Datasets, Metrics and Benchmark
- Authors: Ricardo Campos, José Pedro Evans, José Miguel Isidro, Miguel Marques, Luís Filipe Cunha, Alípio Jorge, Sérgio Nunes, Nuno Guimarães
- Abstract summary: Local governance meeting records are official documents, in the form of minutes or transcripts, documenting how proposals, discussions, and procedural actions unfold during institutional meetings. These documents are often dense, bureaucratic, and highly heterogeneous across municipalities, exhibiting significant variation in language, terminology, structure, and overall organization. To address these challenges, computational methods can be employed to structure and interpret such complex documents. In this focus article, we review foundational NLP tasks that support the structuring of local governance meeting documents.
- Score: 4.320509359331248
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Local governance meeting records are official documents, in the form of minutes or transcripts, documenting how proposals, discussions, and procedural actions unfold during institutional meetings. While generally structured, these documents are often dense, bureaucratic, and highly heterogeneous across municipalities, exhibiting significant variation in language, terminology, structure, and overall organization. This heterogeneity makes them difficult for non-experts to interpret and challenging for intelligent automated systems to process, limiting public transparency and civic engagement. To address these challenges, computational methods can be employed to structure and interpret such complex documents. In particular, Natural Language Processing (NLP) offers well-established methods that can enhance the accessibility and interpretability of governmental records. In this focus article, we review foundational NLP tasks that support the structuring of local governance meeting documents. Specifically, we review three core tasks: document segmentation, domain-specific entity extraction and automatic text summarization, which are essential for navigating lengthy deliberations, identifying political actors and personal information, and generating concise representations of complex decision-making processes. In reviewing these tasks, we discuss methodological approaches, evaluation metrics, and publicly available resources, while highlighting domain-specific challenges such as data scarcity, privacy constraints, and source variability. By synthesizing existing work across these foundational tasks, this article provides a structured overview of how NLP can enhance the structuring and accessibility of local governance meeting records.
Related papers
- HiPS: Hierarchical PDF Segmentation of Textbooks [2.2903728931592395]
Legal textbooks contain layered knowledge essential for interpreting and applying legal norms. We examine a Table of Contents (TOC)-based technique and approaches that rely on open-source structural parsing tools. To enhance parsing accuracy, we incorporate preprocessing strategies such as OCR-based title detection, XML-derived features, and contextual text features.
arXiv Detail & Related papers (2025-08-31T15:40:43Z) - Beyond Chunking: Discourse-Aware Hierarchical Retrieval for Long Document Question Answering [51.7493726399073]
We present a discourse-aware hierarchical framework to enhance long document question answering. The framework involves three key innovations: specialized discourse parsing for lengthy documents, LLM-based enhancement of discourse relation nodes, and structure-guided hierarchical retrieval.
arXiv Detail & Related papers (2025-05-26T14:45:12Z) - Beyond Text: Characterizing Domain Expert Needs in Document Research [10.98467955215441]
We ask sixteen domain experts across two domains to understand their processes of document research. We find that our participants' processes are idiosyncratic, iterative, and rely extensively on the social context of a document. We call on the NLP community to more carefully consider the role of the document in building useful tools.
arXiv Detail & Related papers (2025-04-16T21:24:41Z) - StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization [94.31508613367296]
Retrieval-augmented generation (RAG) is a key means to effectively enhance large language models (LLMs).
We propose StructRAG, which can identify the optimal structure type for the task at hand, reconstruct original documents into this structured format, and infer answers based on the resulting structure.
Experiments show that StructRAG achieves state-of-the-art performance, particularly excelling in challenging scenarios.
arXiv Detail & Related papers (2024-10-11T13:52:44Z) - Document Structure in Long Document Transformers [64.76981299465885]
Long documents often exhibit structure with hierarchically organized elements of different functions, such as section headers and paragraphs.
Despite the omnipresence of document structure, its role in natural language processing (NLP) remains opaque.
Do long-document Transformer models acquire an internal representation of document structure during pre-training?
How can structural information be communicated to a model after pre-training, and how does it influence downstream performance?
arXiv Detail & Related papers (2024-01-31T08:28:06Z) - Identification of Regulatory Requirements Relevant to Business Processes: A Comparative Study on Generative AI, Embedding-based Ranking, Crowd and Expert-driven Methods [10.899912290518648]
This work examines how legal and domain experts can be assisted in the assessment of relevant requirements.
We compare an embedding-based NLP ranking method, a generative AI method using GPT-4, and a crowdsourced method with the purely manual method of creating labels by experts.
A gold standard is created for both BPMN2.0 processes and matched to real-world requirements from multiple regulatory documents.
arXiv Detail & Related papers (2024-01-02T12:08:31Z) - Leveraging Large Language Models for Topic Classification in the Domain of Public Affairs [65.9077733300329]
Large Language Models (LLMs) have the potential to greatly enhance the analysis of public affairs documents.
LLMs can be of great use to process domain-specific documents, such as those in the domain of public affairs.
arXiv Detail & Related papers (2023-06-05T13:35:01Z) - Advancing Topic Segmentation and Outline Generation in Chinese Texts: The Paragraph-level Topic Representation, Corpus, and Benchmark [44.06803331843307]
Paragraph-level topic structure can grasp and understand the overall context of a document from a higher level.
The lack of large-scale, high-quality Chinese paragraph-level topic structure corpora restrained research and applications.
We propose a hierarchical paragraph-level topic structure representation with three layers to guide the corpus construction.
We employ a two-stage man-machine collaborative annotation method to construct the largest Chinese paragraph-level Topic Structure corpus.
arXiv Detail & Related papers (2023-05-24T06:43:23Z) - MUG: A General Meeting Understanding and Generation Benchmark [60.09540662936726]
We build the AliMeeting4MUG Corpus, which consists of 654 recorded Mandarin meeting sessions with diverse topic coverage.
In this paper, we provide a detailed introduction of this corpus, SLP tasks and evaluation methods, baseline systems and their performance.
arXiv Detail & Related papers (2023-03-24T11:52:25Z) - Prompting Language Models for Linguistic Structure [73.11488464916668]
We present a structured prompting approach for linguistic structured prediction tasks.
We evaluate this approach on part-of-speech tagging, named entity recognition, and sentence chunking.
We find that while PLMs contain significant prior knowledge of task labels due to task leakage into the pretraining corpus, structured prompting can also retrieve linguistic structure with arbitrary labels.
arXiv Detail & Related papers (2022-11-15T01:13:39Z) - Predicting Themes within Complex Unstructured Texts: A Case Study on Safeguarding Reports [66.39150945184683]
We focus on the problem of automatically identifying the main themes in a safeguarding report using supervised classification approaches.
Our results show the potential of deep learning models to simulate subject-expert behaviour even for complex tasks with limited labelled data.
arXiv Detail & Related papers (2020-10-27T19:48:23Z) - Data Readiness for Natural Language Processing [3.6296396308298795]
This document concerns data readiness in the context of machine learning and Natural Language Processing.
It describes how an organization may proceed to identify, make available, validate, and prepare data to facilitate automated analysis methods.
arXiv Detail & Related papers (2020-09-04T07:53:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.