P1: Mastering Physics Olympiads with Reinforcement Learning
- URL: http://arxiv.org/abs/2511.13612v1
- Date: Mon, 17 Nov 2025 17:18:13 GMT
- Title: P1: Mastering Physics Olympiads with Reinforcement Learning
- Authors: Jiacheng Chen, Qianjia Cheng, Fangchen Yu, Haiyuan Wan, Yuchen Zhang, Shenghe Zheng, Junchi Yao, Qingyang Zhang, Haonan He, Yun Luo, Yufeng Zhao, Futing Wang, Li Sheng, Chengxing Xie, Yuxin Zuo, Yizhuo Li, Wenxuan Zeng, Yulun Wu, Rui Huang, Dongzhan Zhou, Kai Chen, Yu Qiao, Lei Bai, Yu Cheng, Ning Ding, Bowen Zhou, Peng Ye, Ganqu Cui
- Abstract summary: We introduce P1, a family of open-source physics reasoning models trained entirely through reinforcement learning (RL). P1-235B-A22B is the first open-source model with Gold-medal performance at the latest International Physics Olympiad (IPhO 2025), and wins 12 gold medals out of 13 international/regional physics competitions in 2024/2025. P1-235B-A22B+PhysicsMinions achieves overall No. 1 on IPhO 2025, and obtains the highest average score over the 13 physics competitions.
- Score: 84.08897284032724
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent progress in large language models (LLMs) has moved the frontier from puzzle-solving to science-grade reasoning, the kind needed to tackle problems whose answers must stand against nature, not merely fit a rubric. Physics is the sharpest test of this shift: it binds symbols to reality in a fundamental way and serves as the cornerstone of most modern technologies. In this work, we advance physics reasoning by developing large language models with exceptional physics reasoning capabilities, particularly at solving Olympiad-level physics problems. We introduce P1, a family of open-source physics reasoning models trained entirely through reinforcement learning (RL). Among them, P1-235B-A22B is the first open-source model to achieve Gold-medal performance at the latest International Physics Olympiad (IPhO 2025), and it wins 12 gold medals out of 13 international/regional physics competitions in 2024/2025. P1-30B-A3B also surpasses almost all other open-source models on IPhO 2025, earning a silver medal. Further equipped with the agentic framework PhysicsMinions, P1-235B-A22B+PhysicsMinions achieves the overall No. 1 result on IPhO 2025 and obtains the highest average score over the 13 physics competitions. Beyond physics, the P1 models also perform strongly on other reasoning tasks such as math and coding, demonstrating the broad generalizability of the P1 series.
Related papers
- Perfect score on IPhO 2025 theory by Gemini agent [5.634825161148485]
The International Physics Olympiad (IPhO) is the world's most prestigious and renowned physics competition for pre-university students. On IPhO 2025 theory problems, while gold medal performance by AI models was reported previously, it falls behind the best human contestant. Here we build a simple agent with Gemini 3.1 Pro Preview.
arXiv Detail & Related papers (2026-02-26T18:53:05Z) - P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads [91.05736019384489]
We introduce P1-VL, a family of open-source vision-language models engineered for advanced scientific reasoning. Our flagship P1-VL-235B-A22B becomes the first open-source vision-language model to secure 12 gold medals and achieves state-of-the-art performance among open-source models.
arXiv Detail & Related papers (2026-02-10T06:28:08Z) - PhysicsMinions: Winning Gold Medals in the Latest Physics Olympiads with a Coevolutionary Multimodal Multi-Agent System [65.02248709992442]
Physics is central to understanding and shaping the real world, and the ability to solve physics problems is a key indicator of real-world physical intelligence. Existing approaches are predominantly single-model based, and open-source MLLMs rarely reach gold-medal-level performance. We propose PhysicsMinions, a coevolutionary multi-agent system for Physics Olympiads. Its architecture features three synergistic studios: a Visual Studio to interpret diagrams, a Logic Studio to formulate solutions, and a Review Studio to perform dual-stage verification.
arXiv Detail & Related papers (2025-09-29T14:40:53Z) - HiPhO: How Far Are (M)LLMs from Humans in the Latest High School Physics Olympiad Benchmark? [53.76627321546095]
HiPhO is the first benchmark dedicated to high school physics Olympiads with human-aligned evaluation. It compiles 13 of the latest Olympiad exams from 2024-2025, spanning both international and regional competitions. We assign gold, silver, and bronze medals to models based on official medal thresholds, thereby enabling direct comparison between (M)LLMs and human contestants.
arXiv Detail & Related papers (2025-09-09T16:24:51Z) - Physics Supernova: AI Agent Matches Elite Gold Medalists at IPhO 2025 [55.8464246603186]
We introduce Physics Supernova, an AI system with superior physics problem-solving abilities. Supernova attains 23.5/30 points, ranking 14th of 406 contestants and surpassing the median performance of human gold medalists. These results show that principled tool integration within agent systems can deliver competitive improvements.
arXiv Detail & Related papers (2025-09-01T17:59:13Z) - PhysUniBench: An Undergraduate-Level Physics Reasoning Benchmark for Multimodal Models [69.73115077227969]
We present PhysUniBench, a large-scale benchmark designed to evaluate and improve the reasoning capabilities of multimodal large language models (MLLMs). PhysUniBench consists of 3,304 physics questions spanning 8 major sub-disciplines of physics, each accompanied by one visual diagram. The benchmark's construction involved a rigorous multi-stage process, including multiple roll-outs, expert-level evaluation, automated filtering of easily solved problems, and a nuanced difficulty grading system with five levels.
arXiv Detail & Related papers (2025-06-21T09:55:42Z) - UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models [39.917074900737575]
Large language models (LLMs) have demonstrated remarkable capabilities in solving complex reasoning tasks. The domain of physics reasoning presents unique challenges that have received significantly less attention. Existing benchmarks often fall short in evaluating LLMs' abilities across the breadth and depth of undergraduate-level physics.
arXiv Detail & Related papers (2025-02-01T06:42:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.