Task management: Redesigning SN1’s validation mechanisms
On SN1, the tasks set by validators were not effective at driving intelligence. So how can validation mechanisms be redesigned to generate better performance?
by Felix Quinque
As Bittensor’s inaugural subnet, SN1 pioneered much of what is now deemed standard across the ecosystem. However, the network’s inherent competitiveness means that things change fast.
SN1’s position at the forefront of Bittensor meant that the subnet served as a site for experimentation, and through SN1 we have consistently learnt by testing, observing, and evolving our approach. In particular, SN1’s tasks - the tests set by validators to stimulate innovation among competing miners - were no longer effective at driving intelligence.
Since June 2024, our engineers have been conducting a thorough audit and redesign of SN1, reviewing and rewriting the codebase. Of the seven different tasks set by validators, four were found to be unproductive and have been removed. The latest release improves the subnet’s design, adding two new tasks capable of stimulating continuous improvement from the models submitted by miners. This technical enhancement is emblematic of our approach to subnets as a whole.
At Macrocosmos, managing subnets has enhanced our appreciation of what is valuable, important, and desirable in such networks. Beyond SN1, we’ve also released refactors across all of our subnets over the last quarter, with further improvements coming in the near future. This exploration of network design and productive incentive mechanisms is pushing the boundaries of distributed intelligence, opening new possibilities for the Bittensor ecosystem as a whole.
The role of tasks
The tasks set on each subnet are supposed to channel miner activity towards improving the capabilities of submitted models. SN1’s previous design favored volume: by setting miners challenges across various types of task - math, data retrieval, inference, and so on - it sought to shape well-rounded agents.
However, in practice this range of tasks proved less effective at selecting for greater intelligence. Instead, miners developed sophisticated strategies for anticipating and solving particular problems.
Whenever a problem contradicted such expectations, even experienced miners did no better than newcomers - implying that the tasks intended to drive up intelligence were not doing so. Rather than improving within a category of cognition, miners were simply getting better and better at rapidly solving specific problems. For example, the previous data retrieval task was drawn from Wikipedia, which selected not for intelligent data retrieval but for Wikipedia retrieval.
Similarly, math tasks selected for particular types of math problem, not for mathematical skill in general. Miners would identify which task they had been sent and pass it on to a calculator; when presented with an unexpected math problem, they performed on par with beginners. Without effective tasks, models on SN1 would not noticeably improve in performance. Over time, this would lead to the subnet’s stagnation. Clearly, something had to be done.
The challenge of thought
Designing effective tasks reveals the deeper problems around selecting for intelligence in machine learning. Effective tasks must prevent the possibility of exploitation, especially by miners taking advantage of the validators’ own methodology for validating answers. For instance, we can’t use a fixed dataset for validation, because miners could mine it for the answers. In other words, if the validator can look up the answer, so can the miner.
That’s why the subnet design must incorporate a power imbalance. Asymmetry is critical to designing elegant incentive mechanisms that motivate miners to create intelligent systems surpassing anything the distributed validation system itself could contain.
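To make the failure mode concrete, here is a minimal sketch of the exploit a fixed dataset invites - a hypothetical miner that answers purely by memorization. The class and method names are illustrative, not drawn from SN1’s codebase.

```python
# Hypothetical sketch: why a fixed validation dataset is exploitable.
# A miner that has observed the dataset can answer by lookup alone,
# scoring perfectly without any intelligence.

class LookupMiner:
    def __init__(self):
        self.memory: dict[str, str] = {}

    def observe(self, question: str, answer: str) -> None:
        # Record every (question, answer) pair seen on the network.
        self.memory[question] = answer

    def respond(self, question: str) -> str:
        # If validators draw from a fixed set, the hit rate of this
        # lookup approaches 100% over time - no model required.
        return self.memory.get(question, "")
```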
Our programming task offers a good example. Code is sourced from GitHub; an LLM then renames its variables and functions and cuts it off midway through, asking the miners to complete it. As it’s impossible to match the obfuscated snippet back to the original files, the miners must write the rest of the code from scratch.
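As a rough illustration of the shape of this pipeline - a sketch only, substituting a deterministic AST rename for the LLM step SN1 actually uses, and glossing over details like preserving imported names - the challenge might be built like this:

```python
import ast
import builtins

_RESERVED = set(dir(builtins))  # leave print, len, etc. untouched

class _Renamer(ast.NodeTransformer):
    """Replace function, argument, and variable names with opaque tokens."""

    def __init__(self):
        self.mapping: dict[str, str] = {}

    def _alias(self, name: str) -> str:
        if name not in self.mapping:
            self.mapping[name] = f"sym_{len(self.mapping)}"
        return self.mapping[name]

    def visit_FunctionDef(self, node):
        node.name = self._alias(node.name)
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._alias(node.arg)
        return node

    def visit_Name(self, node):
        if node.id not in _RESERVED:
            node.id = self._alias(node.id)
        return node

def make_challenge(source: str, keep_fraction: float = 0.5) -> str:
    """Obfuscate identifiers, then cut the code off midway.

    The truncated, renamed snippet can no longer be matched against the
    original repository, so the only way to answer is to write the
    completion yourself.
    """
    tree = _Renamer().visit(ast.parse(source))
    lines = ast.unparse(tree).splitlines()
    return "\n".join(lines[: max(1, int(len(lines) * keep_fraction))])
```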
This design ensures both that miners cannot simply retrieve the answer, and that they must instead channel their efforts towards intelligence - in this case, programming - in order to be rewarded. Without the power imbalance derived from renaming and interrupting the code midway, miners would inevitably prefer to retrieve the codebase rather than program from scratch, which would fail to select for the task’s goal of enhancing intelligent programming.
Improving inference
The second updated task is inference. The inference task allows SN1 to produce consistent, high-quality outputs that can be put to use in a variety of forms, such as supporting products like Corcel, Taobot, and our internal chat product, Chattensor. Effective inference allows us to power Chattensor with much higher throughput and a general, all-purpose model, making it a more powerful and effective application.
SN1’s new inference task stimulates miners to run open-source models: validators rotate the models they load, send inference requests to miners, and check whether the loaded model agrees with each miner’s response. Through seeding, we can check that the miner’s response matches exactly. Not only does this effectively test for inference skill, it also deploys the models submitted by miners in the programming task.
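A minimal sketch of how seeded verification can work, assuming both parties run the same open-source model with identical weights and sampling parameters (the function names and transformers-style usage here are illustrative, and exact reproducibility additionally assumes matching hardware and library versions):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def reference_completion(model, tokenizer, prompt: str, seed: int) -> str:
    # Fixing the RNG seed makes sampled generation reproducible,
    # so an honest miner's output can be checked byte for byte.
    torch.manual_seed(seed)
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(
        **inputs,
        do_sample=True,
        temperature=0.7,
        max_new_tokens=128,
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)

def verify(miner_response: str, model, tokenizer, prompt: str, seed: int) -> bool:
    # Same weights + same seed + same sampling parameters => same text.
    return miner_response == reference_completion(model, tokenizer, prompt, seed)

# Illustrative usage with a rotated open-source model:
# tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
# model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
# ok = verify(response, model, tokenizer, challenge_prompt, seed=42)
```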
Our next release will build upon this by adding an ensembling task. Ensembling refers to the selective merging of competing responses to a query through an ensembling workflow, drawing upon the most relevant aspects of each answer to form a composite that is more accurate than any single source from which it is derived. This leads to better output while also driving miners to submit models of greater intelligence.
More broadly, ensembling can drive Mixture of Agents approaches, which, as Together AI explain in this paper, can yield significant leaps in performance through layered architectures. Our goal is to outperform state-of-the-art, centralized models through ensembling.
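A sketch of what one ensembling step might look like, in the spirit of the Mixture-of-Agents pattern - the aggregator prompt and the `llm` callable are placeholders, not SN1’s implementation:

```python
from typing import Callable

AGGREGATOR_PROMPT = (
    "You are given several candidate answers to the same question. "
    "Merge their most accurate and relevant parts into a single answer.\n\n"
    "Question: {question}\n\nCandidates:\n{candidates}"
)

def ensemble(question: str, candidates: list[str], llm: Callable[[str], str]) -> str:
    # Number the candidate responses so the aggregator can reference them.
    numbered = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    prompt = AGGREGATOR_PROMPT.format(question=question, candidates=numbered)
    # The aggregator model synthesizes a composite answer.
    return llm(prompt)

# In a layered Mixture-of-Agents setup, the composite can itself be fed
# back in as a candidate for a further round of aggregation.
```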
System, redesigned
Beyond the specific tasks, SN1’s architecture has been reshaped to promote greater competitiveness. One of our big challenges in driving innovation is maintaining the power imbalance described above. This can be addressed by encouraging miners to upload their content so that it can be seen by all.
In turn, this allows us to set effective limitations upon how miners derive answers. For instance, restricting internet access enables us to scrape a news article from a website and challenge miners to recreate it, with their LLMs confined to non-internet sources.
Alternatively, we could compete with a 70B-parameter Llama model while limiting miners to 5B parameters, incentivizing them to find efficient ways to reach higher levels of performance with fewer parameters. However, miners are understandably protective of their success and must be properly incentivized to reveal their code. Otherwise, the risk is that miners would obfuscate their codebase to discourage other miners from building upon their work.
Hypothetically, changing the reward curve - for example, through GitHub-based branching - can ensure that miners capture more reward for sharing their codebase. Starting from a baseline model that miners must fork, miners will initially experience roughly equal loss and therefore receive roughly equal reward. If one builds poorly on the forked model, no one will build on top of it; but miners who build constructively, writing clean code, will see other miners build on top of their work.
In the proposed system, each subsequent fork of the effective model bumps the decaying reward, incentivizing miners to share effective code and driving up intelligence across the competition. Much remains to be done before such a system could be deployed, not least in finding effective approaches for assessing the point at which a forked model constitutes a new model entirely.
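To illustrate one possible shape for such a curve - the functional form, half-life, and bonus constants below are entirely illustrative assumptions, not a deployed mechanism - a reward that decays after release but is bumped by each fork might look like this:

```python
def fork_bumped_reward(
    base_reward: float,
    blocks_since_release: int,
    n_forks: int,
    half_life: int = 7200,     # decay half-life in blocks (illustrative)
    fork_bonus: float = 0.25,  # fractional bump per fork (illustrative)
) -> float:
    """One possible shape for a fork-bumped decaying reward.

    Reward decays exponentially after a model is published, but every
    miner who forks the code pushes it back up, so sharing clean,
    buildable code pays more than hoarding it.
    """
    decay = 0.5 ** (blocks_since_release / half_life)
    return base_reward * decay * (1.0 + fork_bonus * n_forks)
```

Under a shape like this, a hoarded model’s reward halves every half-life, while a widely forked one sustains its earnings - aligning each miner’s individual incentive with the competition’s shared progress.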
Nonetheless, these proposals demonstrate the flexibility of the Bittensor protocol, and the extent to which subnets can drive intelligent machine learning through the effectiveness of their design.