Fine-tuning, finely tuned: How SN37 is delivering SOTA fine-tuning on Bittensor
Taoverse has resurrected fine-tuning in the Bittensor ecosystem. Here’s how they’re working with us on Subnet 37.
The most exciting thing about Bittensor is how subnets can connect by sharing information, strengthening one another as each improves individually. Our vision has always been to facilitate end-to-end AI creation in Bittensor. With Taoverse’s launch of Subnet 37, we’re closing the loop: dataset creation, pre-training, fine-tuning, and development each reinforcing the others within the Bittensor ecosystem.
After the launch of SN25, we wanted to share our vision for the subnet and what we hoped to achieve with it, so we recently published the weight request form that we submitted to the OpenTensor Foundation. We’re continuing that tradition with the latest subnet we’re involved with: supporting Taoverse on their launch of SN37.
Why do we need a fine-tuning subnet?
Fine-tuning is a critical step in bringing the entire AI development pipeline into Bittensor. We believe it will strongly support Bittensor’s founding vision: that AI can be built in an economical, safe, and decentralized way.
Fine-tuning is costly, time-consuming, and highly limited by available expertise. It requires hundreds of GPU hours, typically on SOTA hardware, and, perhaps most importantly, it requires expert engineers with the know-how to do it well. Those people are hard to find.
With SN37, we’re looking to address these challenges by creating a subnet that outsources the procurement of the necessary computational resources and incentivizes the best AI developers in the world to monetize their skills and experience by competing to produce the best models.
What is SN37?
Subnet 37 is a framework for running multiple fine-tuning competitions in parallel. Over time, we will build a catalog of open-source SOTA models, each optimized for specialized tasks such as chatbots, reasoning systems, programming assistants, and web-enabled agents. Models in the catalog will be publicly available on Hugging Face for download and hosted on UI front-ends for users to directly interact with.
Furthermore, we aim to integrate the models we create into Subnet 1 as base models for future agentic assistants. We see Subnets 37 and 1 evolving together to produce ever-improving AI assistants. This will provide the additional benefit of real user feedback through Subnet 1’s chat application, which will be used to continuously refine the models and their capabilities.
How Subnet 37 functions
Subnet 37 rewards miners for producing fine-tuned models according to the competition parameters. It acts like a continuous benchmark, rewarding miners for achieving the best losses on randomly sampled data from the competition dataset.
The subnet architecture is similar to SN9, the pre-training subnet within Bittensor. The subnet is launching with a competition based on synthetic data that is continuously produced by SN18. Future competitions will draw from a wider range of data sources, depending on the competition’s aims. SN37 will also build on models pre-trained on SN9 for the initial subnet competition.
Communication on SN37
Miners on the subnet fine-tune models offline, then upload the model to Hugging Face and commit metadata to the chain identifying the competition they are entering, along with the Hugging Face repository, commit, and a hash of the model. Chain uploads are rate-limited by Bittensor to one every ~20 minutes per hotkey. The payload itself is quite small: mostly a pointer to the Hugging Face repository and a hash for verification.
The subnet uses chain metadata to pass verified, secure information between miners and validators. Validators will regularly poll the chain for new metadata uploaded by miners so they can download the models and evaluate them. This occurs in a separate thread from the evaluation loop so that new models get picked up as fast as possible.
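To make the exchange concrete, here’s a minimal sketch of the kind of payload a miner might commit on-chain and a validator might parse back. The field names, wire format, and helper methods are illustrative assumptions, not the subnet’s actual schema:

```python
from dataclasses import dataclass

@dataclass
class ModelMetadata:
    """On-chain payload: a small pointer to the model, not the model itself."""
    competition_id: int
    hf_repo: str      # hypothetical repo name, e.g. "my-org/my-finetune"
    hf_commit: str    # Hugging Face commit SHA being evaluated
    secure_hash: str  # hash binding the model to the uploader (see below)

    def serialize(self) -> str:
        # Compact, order-dependent encoding; the real wire format may differ.
        return ":".join([str(self.competition_id), self.hf_repo,
                         self.hf_commit, self.secure_hash])

    @classmethod
    def parse(cls, raw: str) -> "ModelMetadata":
        comp, repo, commit, digest = raw.split(":", 3)
        return cls(int(comp), repo, commit, digest)

# A miner commits serialize()'s output (rate-limited to ~every 20 minutes per
# hotkey); a polling validator parses it back and fetches the model from
# Hugging Face for evaluation.
raw = ModelMetadata(1, "my-org/my-finetune", "a1b2c3d", "deadbeef").serialize()
print(ModelMetadata.parse(raw))
```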
The hash for the model is encrypted with the hotkey of the uploader to ensure that attackers can’t copy commits directly from the chain. Models are also uploaded to a private repository by default to ensure that attackers can’t monitor Hugging Face repositories for updates.
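A minimal sketch of that copy-protection idea, assuming the hash is computed over the model contents combined with the uploader’s hotkey (the exact construction is defined in the subnet’s codebase):

```python
import hashlib

def secure_model_hash(model_bytes: bytes, hotkey: str) -> str:
    """Bind the model hash to the uploader's hotkey, so a commit copied
    verbatim under another hotkey fails verification. Illustrative only."""
    return hashlib.sha256(model_bytes + hotkey.encode()).hexdigest()

# A validator recomputes this from the downloaded model and the committing
# hotkey; any mismatch means the commit was copied or tampered with.
print(secure_model_hash(b"model weights...", "5F3sa..."))
```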
In addition, each competition defines a minimum delay between a model’s upload time and when it can first be evaluated. This guarantees that the model is evaluated on data generated after it was uploaded, so it can’t be overfit to the evaluation data.
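In code, that eligibility rule reduces to a simple block-height check (the delay parameter is illustrative; each competition sets its own):

```python
def eligible_for_eval(upload_block: int, current_block: int,
                      min_delay_blocks: int = 100) -> bool:
    """A model may only be scored once the competition's minimum delay has
    elapsed, guaranteeing the evaluation data postdates the upload."""
    return current_block - upload_block >= min_delay_blocks
```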
Fine-tuning competitions
Like SN9, SN37 will operate on a winner-takes-all basis within each competition on the subnet, to incentivize the development of the best models for each use case. Competitions will be specified independently with a defined split of emissions from the subnet.
The reward mechanism works as follows:
Miners train and periodically publish competition-specific models to Hugging Face and commit the metadata for that model to the Bittensor chain.
Validators download the models from Hugging Face for each miner based on the Bittensor chain metadata and continuously evaluate them. For each competition, only the top model will receive incentive. Validators will also log results to wandb.
Competitions each have unique parameters that define the model(s), tokenizer(s), size(s), and sequence length(s) that miners will be evaluated against.
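To make the mechanism concrete, here’s a simplified sketch of a validator’s scoring pass, with per-model losses averaged over freshly sampled batches. Model loading, wandb logging, and tie-breaking are omitted, and stand-in loss functions take the place of running the actual models:

```python
import random

def score_competition(loss_fns: dict, dataset: list, n_batches: int = 4) -> dict:
    """loss_fns maps miner uid -> a function returning that model's loss on a
    batch. Winner-takes-all: the lowest average loss gets the full weight."""
    batches = random.sample(dataset, n_batches)  # fresh random sample each pass
    avg_loss = {uid: sum(fn(b) for b in batches) / n_batches
                for uid, fn in loss_fns.items()}
    winner = min(avg_loss, key=avg_loss.get)
    return {uid: 1.0 if uid == winner else 0.0 for uid in avg_loss}

# Toy demo with constant losses standing in for real model evaluations.
print(score_competition({11: lambda b: 2.1, 42: lambda b: 1.8}, list(range(100))))
# -> {11: 0.0, 42: 1.0}
```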
The subnet is launching with a competition to produce the best chatbot by fine-tuning the best pre-trained model from Subnet 9. The evaluation is performed on fresh, authenticated, synthetic data generated by Subnet 18.
Future competitions will be drawn from various other well-vetted data sources like other subnets (such as Subnet 1) or high quality Hugging Face datasets of sufficient size.
Managing data quality and security
For the initial competition, validators evaluate the quality of miner-submitted models by checking their performance against a randomly generated dataset, built from a random sample of synthetic prompt/responses from Subnet 18. To ensure data quality and integrity, only data that passes the following criteria is eligible to be part of the dataset (a sketch of these checks appears after the list):
The data must be submitted by a Subnet 18 validator with >100k stake. The data is signed with the validator’s hotkey to ensure authenticity.
The data must have been generated within the past 4 hours.
The prompt/response must not contain an undesirable phrase. This will filter out samples which do not bring value to the dataset.
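Here is the promised sketch of those eligibility checks. The stake and age thresholds come from the criteria above; the field names and phrase list are illustrative assumptions:

```python
import time
from dataclasses import dataclass

MIN_STAKE = 100_000          # minimum SN18 validator stake to accept data from
MAX_AGE_SECONDS = 4 * 3600   # data must have been generated in the past 4 hours
BLOCKED_PHRASES = ("as an ai language model",)  # illustrative filter list

@dataclass
class Sample:
    prompt: str
    response: str
    validator_stake: float
    signature_valid: bool  # verified against the SN18 validator's hotkey
    created_at: float      # unix timestamp

def is_eligible(s: Sample, now: float | None = None) -> bool:
    """Apply the three eligibility criteria described above."""
    now = time.time() if now is None else now
    if s.validator_stake <= MIN_STAKE or not s.signature_valid:
        return False
    if now - s.created_at > MAX_AGE_SECONDS:
        return False
    text = (s.prompt + " " + s.response).lower()
    return not any(phrase in text for phrase in BLOCKED_PHRASES)
```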
The criteria above protect against all known vulnerabilities that miners could exploit to inject favorable or malicious data into the fine-tuning validators’ validation dataset. These checks are our contribution to the design of the fine-tuning subnet, improving both its security and overall quality compared to the previous implementation of fine-tuning in Bittensor.
Furthermore, we have created a flexible interface which enables other subnet data stores to be straightforwardly incorporated into the incentive mechanism. Currently, we are working with the Subnet 1 team to refine their synthetic data generation process for use as an additional data source in the future. The benefit of using Subnet 1 data is that it contains tasks beyond straightforward question/answer pairs, which in turn will improve the fine-tuned models.
Compute requirements for SN37
Miners participating in this subnet require a strong mix of AI training expertise and available compute, because they need to beat all other miners in the competition to receive incentive. By our estimates, top teams can achieve efficiency gains of 10x or more by utilizing their training expertise, which creates a highly competitive arena for the subnet.
Since the protocol for this subnet lives entirely on the chain, anyone can participate by reading the metadata and downloading the associated models. Mining and Validation APIs have also been released to make it as easy as possible for contributors to build on top of the subnet.
There is also a live leaderboard that regularly updates and shows the performance of the top models on the subnet. There are future plans to expose an API or website that will run the current best model for each competition in a way that allows for direct interaction.
Validators on SN37 will require:
3TB of disk space
GPU with at least 48 GB of VRAM and at least 38 TFLOPs of half-precision (bfloat16) throughput. (This is a mid-range card such as an A40, as opposed to an H100; a quick way to sanity-check a card is sketched below.)
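As a rough sanity check, a prospective validator can read off a card’s VRAM and estimate its bfloat16 throughput with a timed matrix multiply. This sketch assumes PyTorch with CUDA available; measured throughput varies with kernel and shape:

```python
import time
import torch

props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1e9:.0f} GB VRAM (want >= 48)")

# Rough bf16 throughput estimate: time a large matrix multiply.
n = 8192
a = torch.randn(n, n, device="cuda", dtype=torch.bfloat16)
b = torch.randn(n, n, device="cuda", dtype=torch.bfloat16)
torch.cuda.synchronize()
start = time.time()
for _ in range(10):
    a @ b
torch.cuda.synchronize()
tflops = 10 * 2 * n**3 / (time.time() - start) / 1e12
print(f"~{tflops:.0f} TFLOPs bf16 (want >= 38)")
```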
Working with the Bittensor ecosystem
This subnet is, to our knowledge, the most integrated subnet in Bittensor to date. At launch, it uses:
Synthetic data from Subnet 18 as the training and validation dataset for the initial competition.
Pre-trained models from Subnet 9 as the foundation models to fine-tune.
In the near future we plan to integrate with Subnet 1 as an additional provider of training and evaluation data. We are also exploring the use of Subnet 37 models as the base models in Subnet 1.
Once miners have produced a high-quality model, it’ll be integrated into a UI frontend, with further opportunities for collaboration there.
What’s coming next?
Since launch, we have already expanded the competition framework on SN37 to create a flexible Competition Schedule that allows for adjusting competition constraints, reward ratios, and evaluation strategies at specified future blocks.
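As an illustration, a schedule entry might look something like the following. All field names and values are hypothetical; the actual schema lives in the SN37 codebase:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CompetitionEntry:
    competition_id: int
    reward_ratio: float   # share of subnet emissions for this competition
    max_model_bytes: int  # size constraint miners must satisfy
    eval_strategy: str    # e.g. "avg_loss" or "multiple_choice"

# Maps an activation block to the competitions live from that block onward,
# so constraints and reward splits can change at specified future blocks.
SCHEDULE = {
    3_000_000: [CompetitionEntry(1, 1.0, 15_000_000_000, "avg_loss")],
    3_100_000: [CompetitionEntry(1, 0.6, 15_000_000_000, "avg_loss"),
                CompetitionEntry(2, 0.4, 15_000_000_000, "multiple_choice")],
}

def active_competitions(block: int) -> list:
    """Return the most recent schedule entry at or before the given block."""
    past = [b for b in SCHEDULE if b <= block]
    return SCHEDULE[max(past)] if past else []

print(active_competitions(3_050_000))  # the single avg_loss competition
```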
In the short term, we’ll be working with the Taoverse team on the early steps needed to scale the subnet and improve its codebase. We also plan to build on the work of the Nous Research team on SN6, as well as our experience running SN9.
Our medium-term goal is to expand into multiple competitions within the subnet, so we can deliver an increasing range of models that are fine-tuned to different use cases. Instead of using synthetic Q&A data from Subnet 18, the evaluation dataset will be a synthetically generated multiple-choice dataset, produced by Subnet 1. The dataset will:
Include a diverse set of topics
Include a variety of tasks: Q&A based on Wikipedia information, mathematical questions, and sentiment analysis.
Be in multiple-choice format, allowing for much more objective evaluation of model performance.
The goal of this competition is to more closely align the incentives with model performance by using an evaluation technique similar to the open LLM leaderboard on Hugging Face. The top models from each competition will also be hosted directly on a leaderboard so anyone can interact with them.
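A simplified sketch of that kind of multiple-choice scoring, assuming each sample carries one labeled answer and the model exposes a choice-picking function (all names are illustrative, not the subnet’s actual API):

```python
def mc_accuracy(pick, samples) -> float:
    """pick(question, choices) -> index of the model's chosen answer.
    Accuracy over labeled samples gives an objective, leaderboard-style score."""
    correct = sum(pick(q, choices) == answer for q, choices, answer in samples)
    return correct / len(samples)

# Toy demo with a stand-in model that always picks the first choice.
samples = [("2+2?", ["4", "5"], 0), ("Capital of France?", ["Lyon", "Paris"], 1)]
print(mc_accuracy(lambda q, c: 0, samples))  # 0.5
```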
In the long-term, we’re looking to build a directable marketplace for AI fine-tuning. One where anyone can submit the scope for a competition and see miners respond with the best-performing models that are fine-tuned to that use case.
We also see SN37 completing the loop of AI model development within Bittensor. Already, SN37 is the most integrated subnet on Bittensor, relying on synthetic data from SN18 and pre-trained models from SN9. Integration with SN1 is a priority for us as well, using SN1 both as an additional data source and as a concrete use case for the models developed on SN37.
Ultimately, we see SN37 becoming the foundation for other subnets as the Bittensor ecosystem expands and starts to address new use cases. That virtuous cycle of model-building across multiple subnets is critical for delivering an AI system-of-systems on Bittensor.