The False Dawn: Reevaluating Google's Reinforcement Learning for Chip Macro Placement
- URL: http://arxiv.org/abs/2306.09633v9
- Date: Fri, 5 Jul 2024 13:35:54 GMT
- Title: The False Dawn: Reevaluating Google's Reinforcement Learning for Chip Macro Placement
- Authors: Igor L. Markov
- Abstract summary: Reinforcement learning for physical design of silicon chips in a Google 2021 Nature paper stirred controversy due to poorly documented claims.
We show that Google RL lags behind (i) human designers, (ii) a well-known algorithm (Simulated Annealing), and (iii) generally-available commercial software, while being slower.
Crosschecked data indicate that the integrity of the Nature paper is substantially undermined owing to errors in conduct, analysis and reporting.
- Score: 1.4803764446062861
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Reinforcement learning (RL) for physical design of silicon chips in a Google 2021 Nature paper stirred controversy due to poorly documented claims that raised eyebrows and drew critical media coverage. The paper withheld critical methodology steps and most inputs needed to reproduce results. Our meta-analysis shows how two separate evaluations filled in the gaps and demonstrated that Google RL lags behind (i) human designers, (ii) a well-known algorithm (Simulated Annealing), and (iii) generally-available commercial software, while being slower; and in a 2023 open research contest, RL methods weren't in top 5. Crosschecked data indicate that the integrity of the Nature paper is substantially undermined owing to errors in conduct, analysis and reporting. Before publishing, Google rebuffed internal allegations of fraud, which still stand. We note policy implications and conclusions for chip design.
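For context on the Simulated Annealing baseline the abstract mentions, here is a minimal, self-contained sketch of annealing-based macro placement that minimizes half-perimeter wirelength (HPWL) on a toy grid. The grid size, cooling schedule, move set, and random netlist are illustrative assumptions of this sketch, not the setup used in either evaluation.

```python
# Minimal simulated-annealing sketch for macro placement (illustrative;
# not the annealer or benchmarks used in the evaluations above).
import math
import random

def hpwl(placement, nets):
    """Total half-perimeter wirelength: for each net, the half-perimeter
    of the bounding box of its macros' positions."""
    total = 0.0
    for net in nets:
        xs = [placement[m][0] for m in net]
        ys = [placement[m][1] for m in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def anneal(macros, nets, grid=16, t0=10.0, cooling=0.995, steps=20000, seed=0):
    rng = random.Random(seed)
    # Random initial placement, one macro per grid slot.
    slots = rng.sample([(x, y) for x in range(grid) for y in range(grid)],
                       len(macros))
    placement = dict(zip(macros, slots))
    cost, t = hpwl(placement, nets), t0
    for _ in range(steps):
        a, b = rng.sample(macros, 2)  # propose a move: swap two macros
        placement[a], placement[b] = placement[b], placement[a]
        new_cost = hpwl(placement, nets)
        # Metropolis rule: keep improvements; keep uphill moves with
        # probability exp(-delta / t), which shrinks as t cools.
        if new_cost > cost and rng.random() >= math.exp((cost - new_cost) / t):
            placement[a], placement[b] = placement[b], placement[a]  # undo
        else:
            cost = new_cost
        t *= cooling  # geometric cooling schedule
    return placement, cost

if __name__ == "__main__":
    rng = random.Random(1)
    macros = [f"m{i}" for i in range(40)]
    nets = [rng.sample(macros, 4) for _ in range(60)]
    _, final_cost = anneal(macros, nets)
    print(f"final HPWL: {final_cost:.1f}")
```

Production placers add legalization, density and congestion terms, and richer move sets; the sketch only shows the Metropolis accept/reject mechanics that make annealing a strong, simple baseline.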
Related papers
- Reevaluating Policy Gradient Methods for Imperfect-Information Games [94.45878689061335]
We conduct the largest-ever exploitability comparison of DRL algorithms for imperfect-information games.
Over 5600 training runs, FP, DO, and CFR-based approaches fail to outperform generic policy gradient methods.
arXiv Detail & Related papers (2025-02-13T03:38:41Z)
- Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying [0.3659498819753633]
State-of-the-art large language models (LLMs) continue to struggle when performing logical and mathematical reasoning.
This paper makes use of the notion of critical questions from the literature on argumentation theory, focusing in particular on Toulmin's model of argumentation.
We show that employing these critical questions can improve the reasoning capabilities of LLMs.
arXiv Detail & Related papers (2024-12-19T18:51:30Z)
- Are We There Yet? Revealing the Risks of Utilizing Large Language Models in Scholarly Peer Review [66.73247554182376]
Advances in large language models (LLMs) have led to their integration into peer review.
The unchecked adoption of LLMs poses significant risks to the integrity of the peer review system.
We show that manipulating 5% of the reviews could potentially cause 12% of the papers to lose their position in the top 30% rankings.
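As a toy illustration of why rankings can be fragile to small manipulations (the 5%/12% figures and the LLM-based attack model are the paper's; the simple score-perturbation model below is this sketch's assumption):

```python
# Toy simulation: perturb a small fraction of review scores and measure how
# many papers fall out of the top 30%. The attack model here (forcing chosen
# reviews to the minimum score) is an illustrative assumption.
import random

rng = random.Random(0)
n_papers, reviews_per_paper = 1000, 3
quality = [rng.gauss(5.0, 1.0) for _ in range(n_papers)]
# Each review is the paper's latent quality plus reviewer noise.
reviews = [[q + rng.gauss(0.0, 0.7) for _ in range(reviews_per_paper)]
           for q in quality]

def top30(all_reviews):
    means = [sum(r) / len(r) for r in all_reviews]
    order = sorted(range(n_papers), key=lambda i: means[i], reverse=True)
    return set(order[: int(0.3 * n_papers)])

baseline = top30(reviews)

# Manipulate 5% of all reviews, chosen at random, down to the minimum score.
all_slots = [(i, j) for i in range(n_papers) for j in range(reviews_per_paper)]
for i, j in rng.sample(all_slots, int(0.05 * len(all_slots))):
    reviews[i][j] = 1.0

displaced = len(baseline - top30(reviews)) / len(baseline)
print(f"papers displaced from the top 30%: {displaced:.1%}")
```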
arXiv Detail & Related papers (2024-12-02T16:55:03Z)
- That Chip Has Sailed: A Critique of Unfounded Skepticism Around AI for Chip Design [6.383127282050037]
In 2020, we introduced a deep reinforcement learning method capable of generating superhuman chip layouts.
A non-peer-reviewed paper at ISPD 2023 questioned its performance claims, despite failing to run our method as described in Nature.
We publish this response to ensure that no one is wrongly discouraged from innovating in this impactful area.
arXiv Detail & Related papers (2024-11-15T09:11:10Z)
- JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking [81.88787401178378]
We introduce JudgeRank, a novel agentic reranker that emulates human cognitive processes when assessing document relevance.
We evaluate JudgeRank on the reasoning-intensive BRIGHT benchmark, demonstrating substantial performance improvements over first-stage retrieval methods.
In addition, JudgeRank performs on par with fine-tuned state-of-the-art rerankers on the popular BEIR benchmark, validating its zero-shot generalization capability.
arXiv Detail & Related papers (2024-10-31T18:43:12Z)
- Autonomous LLM-driven research from data to human-verifiable research papers [0.0]
We build an automation platform that guides interacting LLM agents through a complete, stepwise research process.
Provided with annotated data alone, the data-to-paper platform raised hypotheses, designed research plans, wrote and interpreted analysis code, and generated and interpreted results.
We demonstrate potential for AI-driven acceleration of scientific discovery while enhancing traceability, transparency and verifiability.
arXiv Detail & Related papers (2024-04-24T23:15:49Z)
- FABLES: Evaluating faithfulness and content selection in book-length summarization [55.50680057160788]
In this paper, we conduct the first large-scale human evaluation of faithfulness and content selection on book-length documents.
We collect FABLES, a dataset of annotations on 3,158 claims made in LLM-generated summaries of 26 books, at a cost of $5.2K USD.
An analysis of the annotations reveals that most unfaithful claims relate to events and character states, and they generally require indirect reasoning over the narrative to invalidate.
arXiv Detail & Related papers (2024-04-01T17:33:38Z)
- Second-Order Information Matters: Revisiting Machine Unlearning for Large Language Models [1.443696537295348]
In machine unlearning for large language models, privacy leakage and copyright violation remain underexplored.
Our unlearning algorithms are not only data-agnostic/model-agnostic but also proven robust in terms of utility preservation and privacy guarantees.
arXiv Detail & Related papers (2024-03-13T18:57:30Z)
- A LLM Assisted Exploitation of AI-Guardian [57.572998144258705]
We evaluate the robustness of AI-Guardian, a recent defense to adversarial examples published at IEEE S&P 2023.
We write none of the code to attack this model, and instead prompt GPT-4 to implement all attack algorithms following our instructions and guidance.
This process was surprisingly effective and efficient, with the language model at times producing code from ambiguous instructions faster than the author of this paper could have done.
arXiv Detail & Related papers (2023-07-20T17:33:25Z)
- A Gold Standard Dataset for the Reviewer Assignment Problem [117.59690218507565]
"Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper.
Our dataset consists of 477 self-reported expertise scores provided by 58 researchers.
For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
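A minimal sketch of one common way such a similarity score is computed automatically, via bag-of-words cosine similarity between a reviewer's past abstracts and the submission; this proxy formula is an assumption for illustration, not the dataset's self-reported scores or the paper's method.

```python
# Hypothetical similarity score: cosine between a reviewer profile (their
# past abstracts) and a submission, using raw word counts. An illustrative
# proxy only, not the paper's methodology.
import math
from collections import Counter

def vectorize(text):
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * \
           math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def similarity_score(reviewer_abstracts, submission_abstract):
    profile = Counter()
    for abstract in reviewer_abstracts:
        profile += vectorize(abstract)  # pool the reviewer's past work
    return cosine(profile, vectorize(submission_abstract))

reviewer = ["simulated annealing for vlsi placement",
            "wirelength optimization in physical design"]
paper = "reinforcement learning for chip macro placement"
print(f"similarity: {similarity_score(reviewer, paper):.3f}")
```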
arXiv Detail & Related papers (2023-03-23T16:15:03Z)
- Automated scholarly paper review: Concepts, technologies, and challenges [5.431798850623952]
Recent years have seen the application of artificial intelligence (AI) in assisting the peer review process.
So long as humans are involved in the review process, its limitations remain inevitable.
arXiv Detail & Related papers (2021-11-15T04:44:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.