A conference just tested AI agents’ ability to do science
AI promises to speed up scientific analysis and writing. However, AI agents struggled with accuracy and judgment.
 
“You have to be really careful when working with AI,” says one participant
In a first, scientists just convened to discuss science conducted by AI agents. While many of the papers covered topics in computer science, others delved into a range of fields, including economics, psychology and biology.
In a first, a scientific conference welcomed paper submissions from any area of science, but with one catch: AI had to do most of the work. Called Agents4Science 2025, the Oct. 22 virtual event focused on the work of artificial intelligence agents — systems that pair large language models with other tools or databases to perform multistep tasks.
From formulating hypotheses to analyzing data and providing the first round of peer reviews, AI agents took the lead. Human reviewers then stepped in to assess the top submissions. In all, 48 papers out of 314 made the cut. Each had to detail how people and AI collaborated on every stage of the research and writing process.
“We’re seeing this interesting paradigm shift,” said James Zou, a computer scientist at Stanford University who co-organized the conference. “People are starting to explore using AI as a co-scientist.”
Most scientific journals and meetings currently ban AI coauthors and prohibit peer reviewers from relying on AI. These policies aim to avoid hallucinations and other issues related to AI use. However, this approach makes it tough to learn how good AI is at science. That’s what Agents4Science aimed to explore, Zou said, calling the conference an experiment, with all the materials publicly available for anyone to study.
At the virtual meeting, humans presented AI-assisted work spanning fields such as economics, biology and engineering. Min Min Fong, an economist at the University of California, Berkeley, and her team collaborated with AI to study car-towing data from San Francisco. Their study found that waiving high towing fees helped low-income people keep their vehicles.
“AI was really great at helping us with computational acceleration,” Fong said. But, she found, “you have to be really careful when working with AI.”
As an example, the AI kept citing the wrong date for when San Francisco’s rule waiving towing fees went into effect. Fong had to check this in the original source to discover the error. “The core scientific work still remains human-driven,” she said.
For Risa Wechsler, a computational astrophysicist at Stanford who helped review submissions, the results were mixed. The papers she saw were technically correct, she said, “but they were neither interesting nor important.” She was excited about the potential of AI for research but remained unconvinced that today’s agents can “design robust scientific questions.” And, she added, the technical skill of AI can “mask poor scientific judgment.”
Still, the event included some glimmers of hope for the future of AI in science. Silvia Terragni, a machine learning engineer at the company Upwork in San Francisco, said that she gave ChatGPT some context about the kinds of problems her company deals with and asked the bot to propose paper ideas. “One of these was the winner,” she said, selected as one of the three top papers in the conference. It was a study about using AI reasoning in a job marketplace. “I think [AI] can actually come up with novel ideas,” she said.