Pharos-ESG: A Framework for Multimodal Parsing, Contextual Narration, and Hierarchical Labeling of ESG Report
- URL: http://arxiv.org/abs/2511.16417v1
- Date: Thu, 20 Nov 2025 14:41:44 GMT
- Title: Pharos-ESG: A Framework for Multimodal Parsing, Contextual Narration, and Hierarchical Labeling of ESG Report
- Authors: Yan Chen, Yu Zou, Jialei Zeng, Haoran You, Xiaorui Zhou, Aixi Zhong,
- Abstract summary: Pharos-ESG is a framework that transforms ESG reports into structured representations through multimodal parsing, contextual narration, and hierarchical labeling. We release Aurora-ESG, the first large-scale public dataset of ESG reports, spanning Mainland China, Hong Kong, and U.S. markets.
- Score: 9.026784135029034
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Environmental, Social, and Governance (ESG) principles are reshaping the foundations of global financial governance, transforming capital allocation architectures, regulatory frameworks, and systemic risk coordination mechanisms. However, as the core medium for assessing corporate ESG performance, the ESG reports present significant challenges for large-scale understanding, due to chaotic reading order from slide-like irregular layouts and implicit hierarchies arising from lengthy, weakly structured content. To address these challenges, we propose Pharos-ESG, a unified framework that transforms ESG reports into structured representations through multimodal parsing, contextual narration, and hierarchical labeling. It integrates a reading-order modeling module based on layout flow, hierarchy-aware segmentation guided by table-of-contents anchors, and a multimodal aggregation pipeline that contextually transforms visual elements into coherent natural language. The framework further enriches its outputs with ESG, GRI, and sentiment labels, yielding annotations aligned with the analytical demands of financial research. Extensive experiments on annotated benchmarks demonstrate that Pharos-ESG consistently outperforms both dedicated document parsing systems and general-purpose multimodal models. In addition, we release Aurora-ESG, the first large-scale public dataset of ESG reports, spanning Mainland China, Hong Kong, and U.S. markets, featuring unified structured representations of multimodal content, enriched with fine-grained layout and semantic annotations to better support ESG integration in financial governance and decision-making.
Related papers
- LEC-KG: An LLM-Embedding Collaborative Framework for Domain-Specific Knowledge Graph Construction -- A Case Study on SDGs [2.3873490763985408]
LEC-KG integrates the semantic understanding of Large Language Models (LLMs) with the structural reasoning of Knowledge Graph Embeddings (KGE). Our framework reliably transforms unstructured policy text into validated knowledge graph triples.
arXiv Detail & Related papers (2026-02-02T13:37:17Z)
- FysicsWorld: A Unified Full-Modality Benchmark for Any-to-Any Understanding, Generation, and Reasoning [52.88164697048371]
We introduce FysicsWorld, the first unified full-modality benchmark that supports bidirectional input-output across image, video, audio, and text. FysicsWorld encompasses 16 primary tasks and 3,268 curated samples, aggregated from over 40 high-quality sources.
arXiv Detail & Related papers (2025-12-14T16:41:29Z)
- Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding [61.36285696607487]
Document understanding is critical for applications from financial analysis to scientific discovery. Current approaches, whether OCR-based pipelines feeding Large Language Models (LLMs) or native Multimodal LLMs (MLLMs), face key limitations. Retrieval-Augmented Generation (RAG) helps ground models in external data, but documents' multimodal nature, combining text, tables, charts, and layout, demands a more advanced paradigm: Multimodal RAG.
arXiv Detail & Related papers (2025-10-17T02:33:16Z)
- Domain-Specific Data Generation Framework for RAG Adaptation [58.20906914537952]
Retrieval-Augmented Generation (RAG) combines the language understanding and reasoning power of large language models with external retrieval to enable domain-grounded responses. We propose RAGen, a framework for generating domain-grounded question-answer-context (QAC) triples tailored to diverse RAG adaptation approaches.
arXiv Detail & Related papers (2025-10-13T09:59:49Z)
- Aligning ESG Controversy Data with International Guidelines through Semi-Automatic Ontology Construction [0.0]
We present a semi-automatic method for constructing structured knowledge representations of environmental, social, and governance events reported in the news. Our approach uses lightweight ontology design, formal pattern modeling, and large language models to convert normative principles into reusable templates. These templates are used to extract relevant information from news content and populate a structured knowledge graph that links reported incidents to specific framework principles.
arXiv Detail & Related papers (2025-09-13T17:49:59Z)
- MMESGBench: Pioneering Multimodal Understanding and Complex Reasoning Benchmark for ESG Tasks [56.350173737493215]
Environmental, Social, and Governance (ESG) reports are essential for evaluating sustainability practices, ensuring regulatory compliance, and promoting financial transparency. MMESGBench is a first-of-its-kind benchmark dataset to evaluate multimodal understanding and complex reasoning across structurally diverse and multi-source ESG documents. MMESGBench comprises 933 validated QA pairs derived from 45 ESG documents, spanning seven distinct document types and three major ESG source categories.
arXiv Detail & Related papers (2025-07-25T03:58:07Z)
- eSapiens: A Real-World NLP Framework for Multimodal Document Understanding and Enterprise Knowledge Processing [6.450269621190948]
We introduce eSapiens, a unified question-answering system designed for enterprise settings. eSapiens bridges structured databases and unstructured corpora via a dual-module architecture. We evaluate eSapiens on the RAGTruth benchmark, analyzing performance across key dimensions such as completeness, hallucination, and context utilization.
arXiv Detail & Related papers (2025-06-20T06:07:20Z)
- Graph Foundation Models: A Comprehensive Survey [66.74249119139661]
Graph Foundation Models (GFMs) aim to bring scalable, general-purpose intelligence to structured data. This survey provides a comprehensive overview of GFMs, unifying diverse efforts under a modular framework. GFMs are poised to become foundational infrastructure for open-ended reasoning over structured data.
arXiv Detail & Related papers (2025-05-21T05:08:00Z)
- Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension [31.952192907460713]
Relation-R1 is the first unified relation comprehension framework. It integrates cognitive chain-of-thought (CoT)-guided supervised fine-tuning (SFT) and group relative policy optimization (GRPO). Experiments on the widely used PSG and SWiG datasets demonstrate that Relation-R1 achieves state-of-the-art performance in both binary and N-ary relation understanding.
arXiv Detail & Related papers (2025-04-20T14:50:49Z)
- Universal Scene Graph Generation [77.53076485727414]
We present Universal SG (USG), a novel representation capable of characterizing comprehensive semantic scenes. We also introduce USG-Par, which effectively addresses two key bottlenecks of cross-modal object alignment and out-of-domain challenges.
arXiv Detail & Related papers (2025-03-19T08:55:06Z)
- Enhancing Retrieval for ESGLLM via ESG-CID -- A Disclosure Content Index Finetuning Dataset for Mapping GRI and ESRS [15.217878978015856]
Climate change has intensified the need for transparency and accountability in organizational practices. Frameworks like the Global Reporting Initiative (GRI) and the new European Sustainability Reporting Standards (ESRS) aim to standardize ESG reporting. Generating comprehensive reports remains challenging due to the considerable length of ESG documents and variability in company reporting styles.
arXiv Detail & Related papers (2025-03-10T18:07:33Z)
- Advanced Unstructured Data Processing for ESG Reports: A Methodology for Structured Transformation and Enhanced Analysis [20.038120319271773]
This study introduces an innovative methodology to transform ESG reports into structured, analyzable formats. Our approach offers high-precision text cleaning, adept identification and extraction of text from images, and standardization of tables within these reports. This research marks a substantial contribution to the fields of industrial ecology and corporate sustainability assessment.
arXiv Detail & Related papers (2024-01-04T06:26:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.