Protein folding: Proving open-source compute for academia’s hardest problems
Our protein folding subnet is Macrocosmos' venture into academic use cases, highlighting how Bittensor is flexible enough to solve even the most complex problems.
by Dr. Steffen Cruz
‘Form follows function’: it’s a fundamental principle across design, science, and engineering. Yet in biochemistry generally, and protein folding specifically, the principle is more accurately stated as ‘form defines function’. It is the quest to determine the form - and thus the function - of proteins that makes this process so important to understand and simulate.
For decades, protein folding has been notorious for its complexity. That’s why we’ve designed Subnet X, a subnet dedicated to solving protein folding, as Bittensor’s first venture into academic use cases, demonstrating the network’s efficacy and flexibility. It uses the industry-standard GROMACS software to simulate the molecular dynamics of proteins. We take a known initial 3D structure, place it in a cell-like environment, and simulate it to determine its end form. This is an essential step in the protein folding process - and an entry point to many other higher-level techniques.
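To make the "known structure, cell-like environment, simulate" sequence concrete, here is a minimal sketch of how such a preparation might be scripted around the GROMACS command-line tools. It only builds and prints the commands; it does not run them. The file names, the force field and water model choices, and the `em.mdp` parameter file are illustrative assumptions, not the subnet's actual configuration, and a real setup would typically also add ions and run equilibration steps.

```python
import shlex
from pathlib import Path


def build_gromacs_commands(pdb_file, workdir="run",
                           force_field="amber99sb-ildn", water_model="tip3p"):
    """Return gmx command lines for a minimal solvate-and-minimize workflow.

    All paths and parameter choices here are hypothetical placeholders.
    """
    w = Path(workdir)
    processed = str(w / "processed.gro")
    boxed = str(w / "boxed.gro")
    solvated = str(w / "solvated.gro")
    topology = str(w / "topol.top")
    return [
        # 1. Convert the known 3D structure and generate a topology.
        ["gmx", "pdb2gmx", "-f", str(pdb_file), "-o", processed,
         "-p", topology, "-ff", force_field, "-water", water_model],
        # 2. Centre the protein in a cubic box with 1.0 nm of padding.
        ["gmx", "editconf", "-f", processed, "-o", boxed,
         "-c", "-d", "1.0", "-bt", "cubic"],
        # 3. Fill the box with water: the "cell-like environment".
        ["gmx", "solvate", "-cp", boxed, "-cs", "spc216.gro",
         "-o", solvated, "-p", topology],
        # 4. Combine structure, topology and run parameters into one input.
        #    em.mdp is an assumed, user-supplied parameter file.
        ["gmx", "grompp", "-f", "em.mdp", "-c", solvated,
         "-p", topology, "-o", str(w / "em.tpr")],
        # 5. Run the simulation engine on the prepared input.
        ["gmx", "mdrun", "-deffnm", str(w / "em")],
    ]


if __name__ == "__main__":
    for command in build_gromacs_commands("protein.pdb"):
        print(shlex.join(command))
```

Keeping the commands as plain lists makes it easy to hand each one to `subprocess.run` later, once the input files actually exist.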
Researchers and universities can use this subnet to solve almost any protein, on demand, for free. We want this subnet to empower researchers to conduct world-class research and publish in top journals, while demonstrating that decentralized systems are an economical and efficient alternative to traditional approaches.
The problem of proteins
Proteins are the biological molecules that ‘do’ things: the molecular machines of biochemistry. Enzymes that break down food, hemoglobin that carries oxygen in blood, and actin filaments that make muscles contract are all proteins. They are made from long chains of amino acids, and the sequence of these chains is the information that is stored in DNA.
However, the transformation of linear chains of amino acids into functional 3D structures is a highly complex, non-linear process. The process by which such a chain folds in on itself into a stable 3D shape in a cell is called protein folding. For the most part, this process happens naturally, and the end structure is in a much lower free energy state than the unfolded chain.
Simply knowing the elements that make up a protein doesn’t reveal how the protein works. Knowing the building blocks isn’t enough; it is the way they are put together that matters: a protein’s form defines its function. Since a protein can adopt countless potential shapes, traditional computational methods are insufficient to map each protein’s folding. The processes involved are complex and time-consuming, even with state-of-the-art systems.
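A rough back-of-the-envelope calculation, in the spirit of Levinthal's well-known thought experiment, shows why brute-force enumeration of shapes is hopeless. The numbers below are illustrative assumptions only: three possible states per amino acid and a sampling rate of 10^13 conformations per second.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def conformation_count(n_residues, states_per_residue=3):
    """Number of distinct shapes if every residue independently picks a state."""
    return states_per_residue ** n_residues


def exhaustive_search_years(n_residues, states_per_residue=3,
                            samples_per_second=1e13):
    """Years needed to visit every conformation once at the assumed rate."""
    seconds = conformation_count(n_residues, states_per_residue) / samples_per_second
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    for n in (10, 50, 100):
        print(f"{n:>3} residues: {conformation_count(n):.3e} conformations, "
              f"{exhaustive_search_years(n):.3e} years to enumerate")
```

Under these assumptions, a modest 100-residue chain has about 5 x 10^47 conformations, which would take on the order of 10^27 years to enumerate. Real proteins fold in fractions of a second, which is why simulations follow the physics of the system rather than trying every shape.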
Moreover, because proteins are so multifunctional, understanding how they work is critical in a wide variety of biotech applications - which means that, unless we overcome the complexity of their folding, breakthrough biotech innovations are blocked. For example, understanding how beta amyloid proteins fold, and misfold into plaques, is essential to understanding how Alzheimer's disease develops and to identifying potential treatment protocols. Folding@Home, a distributed computing community dedicated to simulating protein folding, was able to help design a treatment for COVID-19 by identifying a unique folding pattern in the spike protein of the SARS-CoV-2 virus that left it open to interference.
Given the complexity of the challenge and the potential rewards of overcoming it, it’s no surprise that protein folding has quickly become the acid test against which machine learning pioneers measure their algorithms. However, even though DeepMind accelerated the process through deep learning, the challenges are still significant: simulating protein folding remains extremely resource-intensive. In contrast to other subnets, a single mining task here can take hours.
Folding on the subnet
Despite the specificity of protein folding, participating in this subnet as either a miner or a validator does not require any background knowledge of molecular dynamics simulations. While there are algorithmic improvements that skilled miners can bring, the primary axis of competition is pure FLOPS. Thus, this is a specialized compute subnet.
Moreover, the mechanics of the subnet are especially well-suited to this challenge. An ideal incentive mechanism defines an asymmetric workload between the validators and miners. The necessary proof of work (PoW) for the miners must require substantial effort and should be impossible to circumvent. On the other hand, the validation and rewarding process should benefit from some kind of privileged position or vantage point so that an objective score can be assigned without excess work. Put simply, rewarding should be objective and adversarially robust.
Protein folding is a textbook example of this kind of asymmetry; the molecular dynamics simulation involves long and arduous calculations which apply the laws of physics to the system over and over again until an optimized configuration is obtained. There are no reasonable shortcuts. While the process of simulation is exceedingly compute-intensive, the evaluation process is surprisingly straightforward. The reward given to the miners is based on the ‘energy’ of their protein configuration (or shape). The energy value compactly represents the overall quality of their result, and this value is precisely what is decreased over the course of a molecular dynamics simulation.
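The asymmetry can be illustrated with a toy model. This is a sketch under stated assumptions, not the subnet's actual code: a short 2D bead chain with a made-up potential stands in for a protein, a simple steepest-descent loop stands in for the GROMACS simulation, and ranking by lowest energy is one plausible way to turn energies into rewards. All function names are hypothetical.

```python
import math
import random


def energy(coords):
    """Toy potential: harmonic bonds between neighbours, softened
    Lennard-Jones-style interaction between all other bead pairs."""
    total = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            if j == i + 1:
                total += (d - 1.0) ** 2           # bonded: prefer unit length
            else:
                r6 = (1.0 / (d * d + 0.1)) ** 3   # softened to stay finite
                total += r6 * r6 - r6
    return total


def numerical_gradient(coords, h=1e-5):
    """Central finite-difference gradient of the energy (slow but simple)."""
    grad = []
    for i, point in enumerate(coords):
        row = []
        for k in range(len(point)):
            plus = [list(p) for p in coords]
            minus = [list(p) for p in coords]
            plus[i][k] += h
            minus[i][k] -= h
            row.append((energy(plus) - energy(minus)) / (2 * h))
        grad.append(row)
    return grad


def miner_minimize(coords, steps=200, step_size=0.05):
    """Miner side: many gradient steps, accepting only moves that lower energy."""
    coords = [list(p) for p in coords]
    current = energy(coords)
    for _ in range(steps):
        grad = numerical_gradient(coords)
        trial = [[x - step_size * g for x, g in zip(p, gp)]
                 for p, gp in zip(coords, grad)]
        trial_energy = energy(trial)
        if trial_energy < current:
            coords, current = trial, trial_energy
            step_size *= 1.2    # be bolder after a successful step
        else:
            step_size *= 0.5    # back off after a failed step
    return coords


def validator_rank(submissions):
    """Validator side: one cheap energy evaluation per submission,
    then sort so that the lowest energy comes first."""
    scored = {miner: energy(c) for miner, c in submissions.items()}
    return sorted(scored.items(), key=lambda item: item[1])


def initial_chain(n_beads=6, seed=42):
    """A stretched, slightly jittered chain as the shared starting structure."""
    rng = random.Random(seed)
    return [[1.5 * i + rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1)]
            for i in range(n_beads)]


if __name__ == "__main__":
    start = initial_chain()
    submissions = {"lazy": start, "honest": miner_minimize(start)}
    for miner, e in validator_rank(submissions):
        print(f"{miner}: energy = {e:.4f}")
```

The miner performs hundreds of gradient evaluations, while the validator needs a single energy evaluation per submission to score it objectively. A miner that skips the work simply submits a higher-energy structure and ranks lower.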
When the simulations finally converge, they produce the form of the proteins as they are observed in real physical contexts, and this form gives rise to their biological function. Thus, the miners provide utility by preparing ready-for-study proteins on demand.
From deep learning to a deep blue sky
We believe that in the future, Bittensor will become an essential tool for researchers around the world. At Macrocosmos, we are developing powerful and reusable tools which enable us to reformulate hard research problems - from energy minimization to matrix diagonalization and constrained optimizations - as efficient and robust incentive mechanisms. By building a subnet to solve for protein folding, we are taking a step towards a generally applicable framework for conducting research on physical systems. We call this project ‘deep blue sky’.
Protein folding is a notoriously difficult research problem. But that’s why we chose it: to demonstrate that Bittensor can tackle the world’s hardest research problems, and to motivate academics and universities to begin building research subnets. We’re working with researchers around the world to help them benefit from the immense computational resources that can be procured from decentralized networks. By bringing together their subject matter expertise with our incentive mechanism and Bittensor expertise, our goal is to build subnets which drive cutting-edge research for the benefit of all of humanity.