Is Llama 3.1 a baby step towards an intelligence explosion (AI improving AI)?

Aug 18, 2024

While reading the [Llama 3.1 paper], I kept coming back to the same question: can AI be used to improve AI?

A recurring theme in the Llama 3.1 paper is the use of earlier language models to improve the next generation. For instance, Llama 2 was used to filter the data that was then used to train Llama 3. This is just one of many examples of Meta using its previous generation of language models as teachers of a sort. Given this pattern, it seems likely that Llama 3.1 will be used in some capacity to help train the next generation of language models at Meta.

“To train a quality classifier based on Llama 2, we create a training set of cleaned web documents, describe the quality requirements, and instruct Llama 2’s chat model to determine if the documents meets these requirements.”
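The labeling step described in this quote can be sketched roughly as follows. This is only an illustration under stated assumptions, not Meta's actual pipeline: the requirement text, the prompt wording, and the `judge` callable (a stand-in for a call to a chat model such as Llama 2's) are all hypothetical, and `toy_judge` is a trivial rule used so that the example runs without a model.

```python
from typing import Callable, List, Tuple

# Hypothetical wording; the paper does not give the exact requirements.
QUALITY_REQUIREMENTS = (
    "The document is coherent, informative, and free of spam or boilerplate."
)


def build_prompt(document: str, requirements: str = QUALITY_REQUIREMENTS) -> str:
    """Describe the quality requirements and ask for a YES/NO verdict."""
    return (
        f"Quality requirements: {requirements}\n\n"
        f"Document:\n{document}\n\n"
        "Does the document meet these requirements? Answer YES or NO."
    )


def label_documents(
    documents: List[str], judge: Callable[[str], str]
) -> List[Tuple[str, int]]:
    """Ask the judge model about each document; return (document, label) pairs.

    Label 1 means the judge answered YES, label 0 means anything else.
    """
    labeled = []
    for doc in documents:
        answer = judge(build_prompt(doc)).strip().upper()
        labeled.append((doc, 1 if answer.startswith("YES") else 0))
    return labeled


def toy_judge(prompt: str) -> str:
    """Stand-in for a chat model (assumption): YES if the document has >= 8 words."""
    body = prompt.split("Document:\n", 1)[1].split("\n\nDoes the document", 1)[0]
    return "YES" if len(body.split()) >= 8 else "NO"


docs = [
    "buy now click here",
    "Transformers use attention to weigh every token against every other token.",
]
training_set = label_documents(docs, toy_judge)
```

The resulting `training_set` of (document, label) pairs is the kind of data a smaller, cheaper quality classifier could then be trained on, so that the large model does not have to score every web document itself.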

This looks like a baby step towards the intelligence explosion described by I. J. Good, although Meta seems to be approaching it systematically so that responsible AI standards can be maintained (a very generous and positive presumption). The idea is not new in the field of large language models, as many next-generation models leverage their predecessors in at least some capacity. However, Llama 3.1 did it a bit differently: it trained a code expert model to help collect the highest-quality human annotations for code.

“Expert training. We train a code expert which we use to collect high quality human annotations for code throughout subsequent rounds of post-training.” Llama 3.1 paper

and,

“we train a multilingual expert by branching off the pre-training run and continuing to pre-train on a data mix that consists of 90% multilingual tokens. We then perform post-training on this expert following Section 4.1. This expert model is then used to collect higher quality annotations in non-English languages until pre-training was fully complete.” Llama 3.1 paper

Similarly, for non-English languages, they trained a multilingual expert model to collect higher-quality annotations. This fits with Meta's decision to allow the use of its frontier model for generating synthetic data, so researchers can use the generative model to train or fine-tune their own smaller models. To the best of my knowledge, no competitor of Meta, including OpenAI, provided such an allowance. It also opens new avenues and paradigms for using AI to train and improve AI.

Although the idea of using AI to improve AI sounds interesting and catchy, it does not always help, as the Llama 3.1 paper itself notes.

“However, our initial experiments revealed that training Llama 3 405B on its own generated data is not helpful (and can even degrade performance)”. Llama 3.1 paper

Do note that this case is different from the previous two examples: here the same model is trained on its own generated data. However, once they introduced execution feedback, the model was able to learn from its own mistakes. Llama 3 did incorporate a verifier-style approach during training. In coding, for example, only generations that passed syntax checking and unit tests were used for fine-tuning.
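To make the execution-feedback idea concrete, here is a minimal sketch of such a filter. It is an assumption-laden toy, not the paper's implementation: the function names are made up for illustration, the "unit tests" are plain `assert` statements, and the candidates are run with `exec` in the current process, whereas a real pipeline would need sandboxing and timeouts for untrusted generated code.

```python
import ast
from typing import Dict, List


def passes_syntax_check(source: str) -> bool:
    """Return True if the candidate parses as valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def passes_unit_tests(source: str, tests: str) -> bool:
    """Run the candidate, then the tests, in one shared namespace.

    Any exception (including a failed assert) counts as a failure.
    Warning: exec runs arbitrary code; this toy version has no sandbox.
    """
    namespace: Dict[str, object] = {}
    try:
        exec(source, namespace)
        exec(tests, namespace)
        return True
    except Exception:
        return False


def filter_generations(candidates: List[str], tests: str) -> List[str]:
    """Keep only candidates that parse and pass every test."""
    return [
        c for c in candidates
        if passes_syntax_check(c) and passes_unit_tests(c, tests)
    ]


# Three hypothetical model generations for the same task.
candidates = [
    "def add(a, b):\n    return a + b\n",   # correct
    "def add(a, b):\n    return a - b\n",   # parses, but fails the tests
    "def add(a, b) return a + b\n",         # syntax error
]
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"

kept = filter_generations(candidates, tests)
```

Only the first candidate survives in `kept`, and only surviving generations would be passed on as fine-tuning data. The filter is what separates this from plain training on self-generated output: an external check, not the model itself, decides what is kept.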

Written by Sander Ali Khowaja