It would take a human being years to read tens of thousands of scholarly articles, but an artificial intelligence system that can do it in a matter of minutes is about to go to work in the fight against COVID-19.

Ilya Safro, an associate professor of computer science at Clemson University, said that his team will soon roll out a new artificial intelligence system aimed at helping researchers explore the scientific literature as they strive for new discoveries to combat the novel coronavirus.

The system searches for words and phrases in the literature and looks for previously unknown connections between them. Then the system automatically generates hypotheses for scientists to test. 
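The general idea of linking terms through the literature can be illustrated with a toy sketch. This is not the team's system, whose internals the article does not describe. It is a minimal co-occurrence example, assuming each article has already been reduced to a set of terms, and all term names and the function names `cooccurrences` and `candidate_hypotheses` are hypothetical placeholders. Two terms that never appear together but share intermediate terms are flagged as a candidate connection.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrences(docs):
    """Map each term to the set of terms it shares at least one document with."""
    links = defaultdict(set)
    for terms in docs:
        for a, b in combinations(sorted(set(terms)), 2):
            links[a].add(b)
            links[b].add(a)
    return links


def candidate_hypotheses(docs, source):
    """Return (target, bridging terms) pairs for terms never seen with `source`.

    A target qualifies when it co-occurs with at least one term that also
    co-occurs with `source`. Results are sorted by number of bridging terms
    (most first), then alphabetically by target.
    """
    links = cooccurrences(docs)
    known = links.get(source, set())
    candidates = defaultdict(set)
    for bridge in known:
        for target in links[bridge]:
            if target != source and target not in known:
                candidates[target].add(bridge)
    return sorted(candidates.items(), key=lambda kv: (-len(kv[1]), kv[0]))


# Toy corpus with made-up placeholder terms; one set of terms per article.
docs = [
    {"drug_x", "protein_p"},
    {"protein_p", "virus_v"},
    {"drug_x", "gene_g"},
    {"gene_g", "virus_v"},
    {"drug_x", "side_effect_s"},
]

for target, bridges in candidate_hypotheses(docs, "drug_x"):
    print(f"drug_x may relate to {target} via {sorted(bridges)}")
```

In this toy corpus, `drug_x` and `virus_v` never share an article, yet both appear alongside `protein_p` and `gene_g`, so the pair surfaces as a candidate for a scientist to test. A real system would need ranking and filtering far beyond simple counting.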

The system can handle a very wide range of queries, covering everything from genes and proteins to drugs and their side effects. It will be able to search hundreds of millions of concepts and billions of potential connections, Safro said.

“What we really hope is that this system will help in the question of how drugs can be repurposed to treat COVID-19,” he said. “There is already knowledge of these drugs in the datasets. People have already collected the information. We can use this information to design a new drug or repurpose an existing drug for COVID-19.”

Safro said the system will be ready to go very soon and that his biggest challenge at this point is letting the scientific community know it will be available for use. Anyone with queries or questions can contact Safro at isafro@clemson.edu.

The system fuses together two databases, CORD-19 and PubMed. 

CORD-19 is a free, updated resource of more than 51,000 scholarly articles about the coronavirus family of viruses. It is offered to the global research community through the Allen Institute for AI.

PubMed contains more than 30 million citations and abstracts of peer-reviewed biomedical literature and is available through the National Institutes of Health. 

Safro is principal investigator on the new artificial-intelligence project. He is working on it with Ph.D. students in his lab, Justin Sybrandt and Ilya Tyagin, and co-principal investigator Michael Shtutman, an associate professor in the College of Pharmacy at the University of South Carolina.

Shtutman is helping make sure researchers ask the right questions and helping ensure they get biologically meaningful answers.

“The system will allow researchers to make fast clinical decisions,” Shtutman said. “You can’t imagine the amount of literature that is coming out now. The system will help to navigate through the literature.”

The team has received funding through the National Science Foundation’s Rapid Research Response program.

The new system builds on Safro’s previous work. An earlier system, created before COVID-19 emerged, searches PubMed and, in some cases, its sister archive, PubMed Central, to generate hypotheses.

Amy Apon, the C. Tycho Howle Director of the School of Computing, said that Safro is well suited to develop the system to combat COVID-19.

“Dr. Safro and his team are deeply experienced in applying big data to important healthcare decisions,” Apon said. “Their quick response to the COVID-19 situation is positioning them to have a major global impact as researchers seek new discoveries.”