FuguReport

Better with Experience: Self-Evolving LLM Agents for Evidence-Grounded Health Community Notes

Authors Zihang Fu, Fanxiao Li, Jianyang Gu, Haonan Wang, Preslav Nakov, Bryan Hooi, Min-Yen Kan, Jiaying Wu
Affiliations National University of Singapore / Yunnan University / The Ohio State University / Mohamed bin Zayed University of Artificial Intelligence
Categories Method / LLM Agents / Self-evolving agent framework, Application / Health Informatics / Community note generation for health misinformation, Evaluation / Misinformation Correction / Effectiveness of evolving memory for misinformation
License CC BY 4.0

Abstract Overview

This paper introduces EvoNote, a self-evolving agent framework for generating evidence-grounded health Community Notes on social platforms. The central idea is to reuse experience from prior misinformation-correction episodes by converting trajectory-level feedback into phase-specific memory for claim analysis, evidence acquisition, and note writing. The authors also construct MM-HealthCN, a 1.2K-instance multimodal benchmark of user-flagged health posts paired with human-written notes and helpfulness labels. Their evaluation emphasizes hierarchical utility judgment and pairwise comparison against human-written notes and automated baselines.

Novelty

The distinctive contribution is a memory-based self-evolving design that distills feedback from past correction trajectories into actionable, phase-specific strategies rather than treating each post independently. The work also contributes a multimodal benchmark and a health-specific utility evaluation protocol tailored to Community Notes generation.

Results

On MM-HealthCN, notes generated by EvoNote were preferred over the corresponding human-written Community Notes in 89.6% of cases under the reported human-validated utility judge, and the method outperformed several web-search, Community Notes, and memory-augmented baselines. On unresolved "Needs More Ratings" posts, the system produced helpful notes in 82.0% of cases. The paper also reports reducing median candidate-correction time from over 13 hours in the human pipeline to under 2 minutes.
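To make the preference figure concrete, the following is a minimal sketch of how a pairwise preference rate could be computed from per-post judge verdicts. The function name, the verdict labels ("generated", "human", "tie"), and the sample data are illustrative assumptions, not taken from the paper, and the paper's own judge may handle ties differently.

```python
from collections import Counter


def preference_rate(judgments, target="generated"):
    """Return the fraction of pairwise verdicts that prefer `target`.

    `judgments` is a list of verdict labels, one per compared post.
    Ties count against the target here (an assumption of this sketch).
    """
    if not judgments:
        raise ValueError("no judgments to aggregate")
    counts = Counter(judgments)
    return counts[target] / len(judgments)


# Hypothetical verdicts for ten posts (not real data from the paper).
sample = ["generated"] * 7 + ["human"] * 2 + ["tie"]
print(f"{preference_rate(sample):.1%}")
```

Under this convention, a reported rate such as 89.6% would mean that the generated note was preferred in roughly nine of every ten pairwise comparisons.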

Key Points

  1. EvoNote uses a Social Utility Judge and Memory Evolver to turn completed note-generation trajectories into reusable memory for later cases.
  2. The authors introduce MM-HealthCN, a 1.2K multimodal benchmark spanning text, image, and video health misinformation posts with linked Community Notes data.
  3. Analyses attribute performance gains to stronger evidence use, including higher-quality and more diverse sources, and to explicit claim analysis combined with evolving memory.
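The phase-specific memory described in the key points above can be sketched roughly as follows. This is a hedged illustration only: the class `PhaseMemory`, the function `evolve_memory`, the phase keys, and the "most recent k" retrieval rule are all hypothetical names and choices, and the summary does not specify how EvoNote actually stores or retrieves memory.

```python
from dataclasses import dataclass, field

# The three phases named in the summary; the identifiers are assumptions.
PHASES = ("claim_analysis", "evidence_acquisition", "note_writing")


@dataclass
class PhaseMemory:
    """Stores reusable strategies separately for each phase."""
    strategies: dict = field(
        default_factory=lambda: {phase: [] for phase in PHASES}
    )

    def add(self, phase, strategy):
        if phase not in self.strategies:
            raise KeyError(f"unknown phase: {phase}")
        # Skip exact duplicates so repeated feedback does not bloat memory.
        if strategy not in self.strategies[phase]:
            self.strategies[phase].append(strategy)

    def retrieve(self, phase, k=3):
        # Simplest possible policy: return the k most recently added items.
        return self.strategies[phase][-k:]


def evolve_memory(memory, feedback):
    """Fold judge feedback (phase -> list of lessons) into the memory.

    In this sketch `feedback` stands in for the output of a utility judge
    applied to one completed note-generation trajectory.
    """
    for phase, lessons in feedback.items():
        for lesson in lessons:
            memory.add(phase, lesson)
    return memory


# Hypothetical feedback from one completed trajectory.
memory = evolve_memory(PhaseMemory(), {
    "evidence_acquisition": ["Prefer primary clinical sources"],
    "note_writing": ["State the correction in the first sentence"],
})
print(memory.retrieve("evidence_acquisition"))
```

Keeping one strategy list per phase means a later case can consult only the lessons relevant to the step it is currently performing, which is the property the summary attributes to the design.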

This page was created using generative AI such as GPT-5, Claude Opus 4, Gemini 3, Gemini 3.1 Flash Image, and their higher-end successor versions. No guarantee can be made regarding its contents.