2 months, 70,000 proteins folded: Our roadmap for scaling Subnet 25
Subnet 25 has seen rapid growth, and we’re just getting started. Here’s our outlook for new developments on the subnet.
By Macrocosmos
The protein folding subnet is growing fast. Since launch, Subnet 25 has completed 70,000 unique protein submissions, each undergoing 10 simulations, and hundreds more are added every day. To put that in context, the Protein Data Bank, one of the largest data sources for protein structures, contains 222,000 structures verified by laboratory work and 1 million computed structure models.
We’re now at the point where protein folding on Bittensor is out of the proof-of-concept stage: the subnet is responsible for a rapidly growing number of protein folding simulations. Our focus now is to improve the subnet further, addressing known pain points for miners and validators and further demonstrating how Bittensor can serve academic use cases.
To get there, we wanted to share our SN25 priorities for the rest of 2024.
Simulation stability and reproducibility
A clear requirement for academic use is reproducibility: ensuring that a miner’s output can be replicated easily. As it currently stands, GROMACS cannot provide reproducibility in a way that guarantees the validity of a miner’s work. For this reason, we are planning to replace GROMACS with another open-source alternative.
While reproducibility is valuable in its own right, the change also gives us the chance to improve the overall stability of the subnet and tighten up the codebase. For any new platform we introduce, we’re looking to implement direct pythonic interfaces, which reduce overhead, give us neater code and remove the need to shell out to the command line. It also gives us a chance to further improve Validator Trust on the subnet.
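To illustrate the kind of interface we have in mind, here is a minimal sketch of what a direct pythonic simulation setup could look like, using OpenMM purely as an example of an open-source engine with a native Python API; the file names, force field choices and run lengths are placeholders rather than our actual configuration:

```python
# Illustrative sketch only: OpenMM is one open-source engine with a direct Python API.
# File names, force field choices and run lengths below are placeholders.
from openmm import LangevinMiddleIntegrator, unit
from openmm.app import PDBFile, ForceField, Simulation, PME, HBonds, StateDataReporter

pdb = PDBFile("protein.pdb")  # hypothetical input structure
forcefield = ForceField("amber14-all.xml", "amber14/tip3pfb.xml")
system = forcefield.createSystem(
    pdb.topology,
    nonbondedMethod=PME,
    nonbondedCutoff=1.0 * unit.nanometer,
    constraints=HBonds,
)

# A fixed seed makes the stochastic integrator reproducible, which is the property
# validators need in order to re-run and verify a miner's work.
integrator = LangevinMiddleIntegrator(300 * unit.kelvin, 1.0 / unit.picosecond, 2.0 * unit.femtoseconds)
integrator.setRandomNumberSeed(42)

simulation = Simulation(pdb.topology, system, integrator)
simulation.context.setPositions(pdb.positions)
simulation.minimizeEnergy()

# Log energies so a re-run can be compared against the miner's reported trajectory.
simulation.reporters.append(StateDataReporter("energies.csv", 1000, step=True, potentialEnergy=True, temperature=True))
simulation.step(50_000)
```

Seeding the integrator covers the stochastic part of the dynamics; exact bit-for-bit reproducibility also depends on deterministic platform settings and comparable hardware, which is one of the things we’ll weigh up in any candidate replacement.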
Improving Validator Trust
The protein folding subnet has some unique features that affect how we evaluate and reward contributions. Protein folding simulations take considerably longer to complete than tasks on other subnets, often running for several hours, so each individual miner receives only a relatively small number of simulations. If a miner’s performance varies between simulations, or if miner-validator communication fails, there’s a greater risk that validators fall out of consensus. While this is a challenge for every subnet, it’s particularly relevant for Subnet 25: there is an inherent randomness in protein folding that affects miner performance from one task to the next. On the validator side, if an error occurs there is a chance the validation process ends and the miner’s contribution isn’t recognised. All of these factors make it harder for validators to agree on miner performance, which negatively impacts Validator Trust.
To resolve this, we’re rolling out a series of fixes across the subnet to strengthen the entire process. We’re increasing the number of simulations assigned to each miner, reworking the scoring method, and adjusting how miners are sampled via a new PingSynapse. We’re also improving error handling, giving validators the ability to halt a task if they encounter a systemic issue during processing. The net result should be that validators find it easier to consistently assess miner performance.
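As a rough illustration of the sampling idea, a lightweight ping could let a validator confirm that a miner is reachable and has capacity before committing an hours-long simulation to it. The sketch below assumes the standard bittensor Synapse/dendrite pattern; the class fields and helper function are hypothetical and not the subnet’s actual PingSynapse schema:

```python
import typing
import bittensor as bt


class PingSynapse(bt.Synapse):
    """Illustrative availability check; field names are hypothetical, not the subnet's real schema."""
    can_serve: bool = False                          # set by the miner if it has capacity for a new job
    available_compute: typing.Optional[str] = None   # e.g. free GPU memory, reported by the miner


async def sample_available_miners(dendrite: bt.dendrite, axons: list) -> list:
    """Query candidate miners with a cheap ping before assigning a long simulation."""
    responses = await dendrite(axons=axons, synapse=PingSynapse(), timeout=5)
    # Keep only miners that explicitly reported capacity; unreachable miners drop out
    # here instead of failing hours into a simulation.
    return [axon for axon, resp in zip(axons, responses) if resp.can_serve]
```

Filtering out unreachable or busy miners at this stage means a communication failure surfaces as a cheap skipped ping rather than as a lost multi-hour job.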
Launching the Subnet 25 product
Our overarching goal with the protein folding subnet is to reach the point where any researcher or organization can use our product to request a simulation with their required environmental and hyperparameter conditions. In the medium term, that means building out front-end features for our target user groups: the Bittensor community and research groups. Right now, we’re investing more effort into our dashboards and reporting, so that miners and validators have the information they need in an accessible format.
Our dashboards are the first step here, giving miners and validators access to overall subnet performance, as well as leaderboards and logged runs on the subnet. Over time, we’ll tailor the reporting we deliver more specifically to miners and validators, and deepen the level of data we provide.
Once we have delivered these improvements, we will be able to further demonstrate that Bittensor is the right platform for delivering protein folding at scale, and, by extension, that subnets can (and should) address academic use cases.