Meta Launches Its Largest Open-Source AI Model to Date

Yesterday, Meta announced the release of its latest open-source AI model, Llama 3.1 405B, which boasts a staggering 405 billion parameters. Parameters are the internal values a model learns during training and serve as a rough proxy for its problem-solving ability; models with more parameters generally perform better than smaller ones.

While Llama 3.1 405B isn’t the largest open-source model ever released, it is the largest in recent years. Trained on 16,000 Nvidia H100 GPUs, it utilises modern training and development techniques that Meta asserts make it competitive with leading proprietary models such as OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet, albeit with some limitations.

As with previous Meta models, Llama 3.1 405B can be downloaded or accessed via cloud platforms such as AWS, Azure, and Google Cloud. It is also being deployed on WhatsApp and Meta.ai, where it supports a chatbot experience for users in the U.S.
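
For developers taking the download route, the sketch below shows roughly what querying a Llama 3.1 checkpoint locally looks like with the Hugging Face transformers library. It uses the smaller 8B variant (the 405B model needs a multi-GPU server), and the repository name, prompt, and generation settings are illustrative assumptions rather than an official quickstart; the cloud platforms listed above expose the same models behind managed endpoints instead of local weights.

```python
# Rough sketch: querying a Llama 3.1 checkpoint via the Hugging Face transformers pipeline.
# The repo id is an assumption (the models are gated and require approved access);
# the 405B variant uses the same API but needs a multi-GPU server.
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed repo id; check the model card
    device_map="auto",                         # spread weights across available GPUs
    torch_dtype="auto",
)

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarise the idea of a context window in two sentences."},
]

# Recent transformers versions accept chat-style message lists for instruct models.
result = chat(messages, max_new_tokens=128)
print(result[0]["generated_text"][-1]["content"])
```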

What Can Llama 3.1 405B Do?

Like other generative AI models, Llama 3.1 405B can handle a variety of tasks, from coding and solving basic maths problems to summarising documents in eight languages (English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai). It’s a text-only model, so it can’t process images, but it can manage text-based workloads such as analysing PDFs and spreadsheets.

Meta is actively experimenting with multimodality. In a paper released yesterday, company researchers noted their work on developing Llama models that can understand images, videos, and speech. However, these models are not yet available to the public.

How Was Llama 3.1 405B Trained?

To develop Llama 3.1 405B, Meta used a dataset of 15 trillion tokens dated up to 2024; a token is a small fragment of text, typically around three-quarters of a word, so this works out to roughly 11 trillion words. While the dataset is not entirely new, Meta refined its data curation processes and adopted more rigorous quality assurance and filtering methods for this model.

The model was also fine-tuned using synthetic data generated by other AI models. This approach is common among major AI vendors, although some experts caution that synthetic data should be a last resort due to its potential to introduce bias.
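
For readers unfamiliar with the pattern, here is a minimal sketch of how synthetic fine-tuning data is typically produced: a stronger "teacher" model answers seed prompts and the resulting pairs are written out as a supervised fine-tuning dataset. The generate() helper and the JSONL field names are illustrative assumptions, not Meta's actual pipeline.

```python
# Sketch of the generic synthetic-data recipe: a teacher model answers seed prompts
# and the pairs are saved as a supervised fine-tuning dataset (JSONL).
# `generate` is a hypothetical stand-in for whichever model endpoint is used.
import json

def generate(prompt: str) -> str:
    # Placeholder: in practice, call a teacher endpoint (e.g. a hosted Llama 3.1 405B).
    return "<teacher model response would go here>"

seed_prompts = [
    "Explain the difference between a list and a tuple in Python.",
    "Write a SQL query that returns the ten most recent orders per customer.",
]

with open("synthetic_sft.jsonl", "w", encoding="utf-8") as f:
    for prompt in seed_prompts:
        answer = generate(prompt)  # teacher model produces the target response
        # One instruction/response pair per line, the usual SFT format.
        f.write(json.dumps({"prompt": prompt, "response": answer}) + "\n")
```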

What Are the Model’s Capabilities?

Llama 3.1 405B features a larger context window than its predecessors, accommodating 128,000 tokens, roughly 100,000 words, or about the length of a full-length novel. A larger context window allows the model to summarise longer text passages in a single pass and to keep more of a conversation's history in view in chatbot applications.
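
As a rough illustration of what that 128,000-token budget means in practice, the snippet below counts a document's tokens with the model's tokenizer and checks whether it fits in the window with headroom left for the reply; the tokenizer repo id, file name, and the 4,000-token reply budget are assumptions.

```python
# Sketch: check whether a document fits in the 128K-token context window,
# leaving headroom for the model's reply. The tokenizer repo id is an assumption.
from transformers import AutoTokenizer

CONTEXT_WINDOW = 128_000   # tokens, per Meta's Llama 3.1 announcement
REPLY_BUDGET = 4_000       # arbitrary headroom reserved for the generated answer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

with open("report.txt", encoding="utf-8") as f:
    document = f.read()

n_tokens = len(tokenizer.encode(document))
if n_tokens + REPLY_BUDGET <= CONTEXT_WINDOW:
    print(f"{n_tokens} tokens: fits in a single prompt.")
else:
    print(f"{n_tokens} tokens: too long, split or summarise in chunks.")
```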

Meta also introduced two smaller models, Llama 3.1 8B and Llama 3.1 70B, which share the 128,000-token context window. These models are updated versions of Llama 3 8B and Llama 3 70B, originally released in April, and represent a significant upgrade in context capacity.

How Does Llama 3.1 405B Fit Into the AI Ecosystem?

Llama 3.1 405B can utilise third-party tools, apps, and APIs to complete tasks, similar to models from Anthropic and OpenAI. Out of the box, it is integrated with Brave Search for recent event queries, the Wolfram Alpha API for maths and science questions, and a Python interpreter for code validation.
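
In practice this works as a tool-use loop: the model emits a structured tool call, the application executes it, and the result is fed back so the model can finish its answer. The sketch below shows that loop in a framework-agnostic way; the call_model() stub and the JSON call format are illustrative assumptions, not Llama 3.1's exact tool-calling syntax.

```python
# Generic tool-use loop: the model proposes a tool call, the app runs it,
# and the result is handed back for the final answer.
import json

# Tool implementations the model is allowed to request (placeholders for real APIs).
def brave_search(query: str) -> str:
    return f"[search results for: {query}]"

def wolfram_alpha(expression: str) -> str:
    return f"[computed value of: {expression}]"

TOOLS = {"brave_search": brave_search, "wolfram_alpha": wolfram_alpha}

def call_model(messages):
    """Placeholder for the real model endpoint. For illustration it asks for a
    web search on the first turn and answers plainly once it has the tool result."""
    if not any(m["role"] == "tool" for m in messages):
        return json.dumps({"tool": "brave_search", "arguments": {"query": "Llama 3.1 release"}})
    return "Meta released Llama 3.1 405B, its largest open model so far."

messages = [{"role": "user", "content": "What did Meta announce this week?"}]

reply = call_model(messages)
try:
    request = json.loads(reply)                      # structured output means a tool call
    result = TOOLS[request["tool"]](**request["arguments"])
    messages.append({"role": "tool", "content": result})
    reply = call_model(messages)                     # second pass: answer using the tool result
except (json.JSONDecodeError, KeyError):
    pass                                             # plain text means the model answered directly

print(reply)
```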

According to benchmarks, Llama 3.1 405B is highly capable, performing on par with OpenAI’s GPT-4 and achieving mixed results against GPT-4o and Claude 3.5 Sonnet. It excels at executing code and generating plots but is weaker at multilingual tasks and general reasoning.

What’s Meta’s Strategy with Llama 3.1?

Meta is pushing its smaller models, Llama 3.1 8B and Llama 3.1 70B, for general-purpose applications like chatbots and code generation, while reserving Llama 3.1 405B for tasks such as model distillation and generating synthetic data.
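
Distillation here means training a smaller "student" model to imitate the 405B "teacher". As a minimal, hedged sketch of the core idea (not Meta's training code), a student can be fitted to the teacher's softened output distribution by minimising a KL-divergence loss over their logits:

```python
# Minimal knowledge-distillation sketch in PyTorch (illustrative, not Meta's pipeline):
# the student is trained to match the teacher's softened next-token distribution.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student token distributions."""
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradients keep a comparable magnitude across temperatures.
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

# Toy shapes: batch of 2 sequences, 16 tokens each, vocabulary of 32,000.
student_logits = torch.randn(2, 16, 32_000, requires_grad=True)
teacher_logits = torch.randn(2, 16, 32_000)  # would come from the frozen 405B teacher

loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
print(loss.item())
```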

To support synthetic data applications, Meta has updated Llama’s licence, allowing developers to use outputs from the Llama 3.1 family to create third-party generative AI models. However, this licence still restricts deployment for app developers with over 700 million monthly users, who must obtain a special licence from Meta.

In a letter published yesterday, Meta CEO Mark Zuckerberg emphasised the goal of making AI tools and models widely accessible to developers worldwide. This strategy aims to establish Meta as a leader in the AI space by fostering an ecosystem of tools and models.

Meta’s Llama models have already garnered significant attention, with over 300 million downloads and more than 20,000 derived models created to date. Despite the challenges, including energy-related reliability issues during training, Meta is committed to refining and scaling its AI models to maintain its competitive edge.