Supercomputer speeds up genetics research at Clemson
CLEMSON — The next time you find yourself looking for a needle in a haystack, you might want to make friends with someone who can hook you up to a supercomputer. In the time it will take you to examine the first stem, the computer will have sorted through the entire stack, found the needle and returned everything to its original condition, neat as can be.
It just so happens that Clemson University has a supercomputer of its own. The Palmetto Cluster, the University’s primary high-performance computing resource, is the 89th fastest computer in the world and among the top five fastest supercomputers at public academic institutions in the United States that don’t have federally funded centers.
This is all good news for research scientists like Alex Feltus, associate professor of genetics and biochemistry. Feltus, one of more than 1,500 users of the machine since its inception in 2007, uses the supercomputer to find figurative needles in haystacks. In Feltus' case, the "needles" are valuable traits, such as biomass production and disease resistance, that are hidden deep within the immensely intricate genetic material of plants.
Without a supercomputer, Feltus and his fellow researchers could not function nearly as efficiently as they do.
“Only a supercomputer can handle the massive numbers of DNA sequences that are generated by new DNA sequencing technology,” said Feltus, a member of a research team that recently received a $1.4 million grant from the National Science Foundation to enhance the capacity of genomic databases to process mind-boggling bundles of data.
“All crops are very, very complex. For instance, there are 55,000 genes in rice. We’re trying to understand the genes and their pathways and how they are combining and interacting. You can’t examine this level of complexity without using a supercomputer.”
Feltus is one of the leaders in an emerging field called systems genetics, which studies how genes in crops collectively interact using new computational technologies that Clemson University’s Public Service and Agriculture programs are helping to develop.
“Most agricultural traits in plants are controlled not just by one gene but by many genes,” said Feltus. “We’re trying to find the complexity ‘sweet spot,’ the balance between being too reductionist and too holistic.”
The majority of Clemson's computer research and production systems are housed in the Information Technology Center at the innovation campus in Anderson. Jay Harris, director of data center facilities, explains that the cluster is not a single enormous computer but rather thousands of small computers stacked together like a really smart condominium complex. Linking all of those machines together is what gives the cluster its incredible speed.
“We have 100 gigabits per second connectivity to Internet2,” said Harris, who has helped oversee a continuous updating and redesigning of the data center that began in 2007. “To put that in perspective, a typical Internet service provider will sell you anywhere from 10 to 60 megabits per second for your residential service. Our 100 gigabits translates to 100,000 megabits per second. And that represents just one of our connections.”
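Harris's comparison can be checked with a few lines of arithmetic. The sketch below only converts the figures he quotes (100 gigabits per second against a 10 to 60 megabits per second home plan), assuming the decimal prefixes normally used for network speeds; the function name `gbps_to_mbps` is purely illustrative.

```python
# Figures quoted by Harris: a 100 Gbps Internet2 link versus a
# 10-60 Mbps residential plan.
MEGABITS_PER_GIGABIT = 1_000  # assumption: decimal (SI) prefixes, as used for network speeds


def gbps_to_mbps(gbps: float) -> float:
    """Convert gigabits per second to megabits per second (illustrative helper)."""
    return gbps * MEGABITS_PER_GIGABIT


link_mbps = gbps_to_mbps(100)  # 100,000 Mbps
for home_mbps in (10, 60):
    ratio = link_mbps / home_mbps
    print(f"{link_mbps:,.0f} Mbps is {ratio:,.0f}x a {home_mbps} Mbps home plan")
```

By this rough measure, a single one of Clemson's connections carries between about 1,667 and 10,000 times the bandwidth of a typical residential plan.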
In other words, waaaaaaaay fast.
"The speed of a supercomputer is measured in teraflops – trillions of floating point operations per second," said Harris. "We have one room with about 2,000 machines that can perform 550 trillion floating point operations in one second. And a second later, another 550 trillion and then another 550 trillion."
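To put Harris's numbers on a per-machine scale, the rough sketch below divides the room's 550 teraflops by its approximately 2,000 machines and totals the work done over a few seconds. It assumes the load is spread evenly, so the per-machine figure is only an average, not a measured rating for any individual node; `operations_in` is a hypothetical helper.

```python
# Harris's figures: about 2,000 machines together performing roughly
# 550 trillion floating point operations per second (550 teraflops).
TERA = 10**12
GIGA = 10**9

total_flops = 550 * TERA  # operations per second for the whole room
machines = 2_000          # "about 2,000", so the result is approximate

# Assumption: work is split evenly across machines, giving an average only.
per_machine_gflops = total_flops / machines / GIGA
print(f"~{per_machine_gflops:.0f} gigaflops per machine on average")


def operations_in(seconds: float) -> float:
    """Total operations completed at the quoted rate (hypothetical helper)."""
    return total_flops * seconds


# "And a second later, another 550 trillion and then another 550 trillion."
print(f"{operations_in(3):.3e} operations in three seconds")
```

Each machine therefore averages roughly 275 billion operations per second, and three seconds of running adds up to about 1.65 quadrillion operations.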
Internet2 – an international community of academic and government entities that includes Clemson University – plays a crucial role, as well. Compared with the ordinary Internet, Internet2 has a relatively tiny user base, which helps deliver the speed that 21st-century researchers and educators require.
“On one of my research projects, our group was able to transfer DNA files from the National Center for Biotechnology Information in Maryland to Clemson University much faster due to Internet2,” said Feltus. “That transfer of 7,500 files took eight days. The same transfer can now be done in half a day. These are huge files, and some of our research is devoted to getting collaborators to use this really fast connection to share data.”
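The improvement Feltus describes can be expressed as a simple speedup and a files-per-hour rate. This sketch uses only the numbers he gives (7,500 files, eight days versus half a day); because the article does not state the file sizes, it cannot estimate throughput in bytes.

```python
# Figures from Feltus: 7,500 files moved from NCBI to Clemson, first in
# eight days, later in half a day over Internet2.
HOURS_PER_DAY = 24

files = 7_500
before_hours = 8 * HOURS_PER_DAY   # 192 hours
after_hours = 0.5 * HOURS_PER_DAY  # 12 hours

speedup = before_hours / after_hours
print(f"speedup: {speedup:.0f}x")
print(f"files per hour before: {files / before_hours:.1f}")
print(f"files per hour after:  {files / after_hours:.1f}")
```

The same 7,500-file transfer runs about 16 times faster, rising from roughly 39 files an hour to 625.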
Because Clemson's supercomputer has attained world-renowned status, it more than pays for itself, garnering millions of dollars in funding and attracting top talent from a variety of fields. Grants that use the Palmetto Cluster have been awarded by the U.S. Department of Agriculture, the U.S. Department of Energy, the National Science Foundation and other federal agencies.
“If the data center hadn’t been here, I would never have come to Clemson,” said Feltus. “This is critical to my work. And it’s becoming critical to a lot of people’s work.”
This material is based upon work supported by the National Science Foundation (NSF) under Grant Nos. 1443040 and 1245936. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF.