Meta Llama 3 AI Models With 8B and 70B Parameters Launched, Said to Outperform Google’s Gemini 1.5 Pro

Meta introduced the next generation of its artificial intelligence (AI) models, Llama 3 8B and 70B, on Thursday. Llama, short for Large Language Model Meta AI, gains improved capabilities over its predecessor in this release. The company also adopted new training methods to optimise the efficiency of the models. With Llama 2, the largest model had 70 billion parameters, but this time the company said its largest models will contain more than 400 billion parameters. Notably, a report last week said that Meta would unveil its smaller AI models in April and its larger models later in the summer.

Those interested in trying out the new AI models are in luck, as Meta is taking a community-first approach with Llama 3. The new foundation models will be open source, just like the previous models. Meta stated in its blog post, “Llama 3 models will soon be available on AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM WatsonX, Microsoft Azure, NVIDIA NIM, and Snowflake, and with support from hardware platforms offered by AMD, AWS, Dell, Intel, NVIDIA, and Qualcomm.”

The list includes all major cloud, hosting, and hardware platforms, which should make it easier for enthusiasts to get their hands on the AI models. Meta has also integrated Llama 3 with its own Meta AI, which can be accessed via Facebook Messenger, Instagram, and WhatsApp in supported countries.

On performance, the social media giant shared benchmark scores of Llama 3 for both its pre-trained and instruct models. For reference, the pre-trained model is the base model trained on general text, whereas the instruct models are fine-tuned to follow instructions and hold conversations. The pre-trained Llama 3 70B model outscored Google’s Gemini 1.0 Pro in the MMLU (79.5 vs 71.8), BIG-Bench Hard (81.3 vs 75.0), and DROP (79.7 vs 74.1) benchmarks, whereas the 70B Instruct model outscored the Gemini 1.5 Pro model in the MMLU, HumanEval, and GSM-8K benchmarks, based on data shared by the company.

Meta has opted for a decoder-only transformer architecture for the new AI models but has made several improvements over the predecessor. Llama 3 now uses a tokeniser with a vocabulary of 128K tokens, and the company has adopted grouped query attention (GQA) to improve inference efficiency. In GQA, groups of query heads share a single set of key and value heads, which reduces the amount of key and value data the model has to store and read while generating a response. The social media giant has pre-trained the models with more than 15T tokens, which it claims to have sourced from publicly available data.
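For readers curious about how grouped query attention saves memory, here is a minimal pure-Python sketch of the general technique. It is not Meta's implementation: the function names and the model dimensions in the usage note (32 layers, 32 query heads, 8 key/value heads, head size 128) are illustrative assumptions that do not come from the article.

```python
import math


def kv_head_for_query_head(q_head, n_q_heads, n_kv_heads):
    """Return the index of the key/value head that query head `q_head` shares."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def grouped_query_attention(q, k, v):
    """Attention for one new token.

    q:    [n_q_heads][head_dim]           queries for the new token
    k, v: [n_kv_heads][seq_len][head_dim] cached keys and values
    Returns [n_q_heads][head_dim].
    """
    n_q_heads, n_kv_heads = len(q), len(k)
    head_dim = len(q[0])
    out = []
    for h in range(n_q_heads):
        # Several query heads read the same cached keys and values.
        g = kv_head_for_query_head(h, n_q_heads, n_kv_heads)
        scores = [
            sum(a * b for a, b in zip(q[h], key)) / math.sqrt(head_dim)
            for key in k[g]
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * val[d] for w, val in zip(weights, v[g])) for d in range(head_dim)]
        )
    return out


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Size of the key/value cache; the leading 2 counts keys plus values."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Illustrative (assumed) dimensions: 32 query heads sharing 8 key/value heads.
full = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=8192)
grouped = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=8192)
cache_reduction = full // grouped
```

Under these assumed dimensions, the cache holds a quarter as many key/value heads, so it is a quarter of the size it would be if every query head kept its own keys and values. The model still computes attention over the same context; what changes is how much data is stored and read per generated token.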

