MLNews

Mistral AI’s Open-Source Victory Over Llama 2

Mistral AI, based in Paris and co-founded by former DeepMind and Meta researchers, has debuted its first large language model, Mistral 7B. The model is freely available via GitHub or as a 13.4-gigabyte torrent. Even before shipping a product, Mistral AI attracted considerable seed funding. Mistral 7B, its 7-billion-parameter model, exceeds Llama 2 13B on numerous benchmarks and even outperforms Llama 1 34B on many of them.

Large Language Models (LLMs) have recently drawn widespread attention thanks to remarkable systems such as ChatGPT. The introduction of Meta’s Llama models reignited interest in open-source LLMs: models cheap and open enough for anyone to build on, yet able to compete with top-tier systems such as GPT-4 without the exorbitant prices or complexity.

This combination of affordability and effectiveness not only opens new opportunities for researchers and developers but also clears the path for further advances in natural language processing.

Generative AI companies have recently attracted significant funding. Mistral AI raised $20 million in seed funding to build its open-source models; in June, Anthropic received $450 million, and Cohere raised $270 million in collaboration with Google Cloud.

Compared with models such as Llama 2, Mistral 7B offers comparable or superior capabilities while demanding far less compute. Larger models such as GPT-4 can do more, but they are more expensive and less convenient because they are mostly available through APIs. Mistral 7B competes effectively with CodeLlama 7B on coding tasks and, at 13.4 GB, is small enough to run on ordinary machines.
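As a rough sketch, this is how the released weights can be loaded locally with the Hugging Face transformers library. The model ID mistralai/Mistral-7B-v0.1 is the published base checkpoint; the precision and device settings are assumptions that depend on your hardware.

```python
# Minimal sketch: load Mistral 7B locally with Hugging Face transformers.
# Assumes a GPU with roughly 16 GB of memory when using float16;
# device_map="auto" additionally requires the accelerate package.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # published base checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit consumer GPUs
    device_map="auto",          # spread layers across available devices
)

inputs = tokenizer("Mistral 7B is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```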

Mistral 7B Instruct, a variant fine-tuned on instruction datasets and published on Hugging Face, also performs well. On MT-Bench it outperforms other 7B models and competes with 13B chat models. In a detailed performance comparison, Mistral 7B clearly outperforms Llama 2 13B across multiple benchmarks and matches Llama 1 34B on many of them, excelling in particular on code and reasoning benchmarks.
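As an illustration, here is a minimal sketch of querying the instruct variant through the transformers chat-template API, which wraps the user turn in Mistral’s [INST] … [/INST] markers. The checkpoint ID mistralai/Mistral-7B-Instruct-v0.1 is the published instruct model; the prompt itself is illustrative.

```python
# Minimal sketch: prompt Mistral 7B Instruct with its chat format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"  # published instruct checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# apply_chat_template inserts Mistral's [INST] ... [/INST] markers.
messages = [
    {"role": "user", "content": "Summarize sliding window attention in one sentence."}
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=80)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```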

Several factors contribute to Mistral 7B’s success. One important aspect is the model’s use of attention mechanisms, which let it focus on the most relevant parts of the input, producing coherent and contextually accurate outputs.
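For intuition, here is a minimal NumPy sketch of the scaled dot-product attention at the heart of transformer models like Mistral 7B; the shapes and names are chosen purely for illustration.

```python
# Minimal sketch: scaled dot-product attention in NumPy.
import numpy as np

def attention(Q, K, V):
    """Each query attends to every key; the softmax weights
    emphasize the most relevant positions."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # query-key similarity
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over keys
    return weights @ V                             # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(attention(Q, K, V).shape)  # (4, 8): one output vector per token
```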

The model also incorporates grouped-query attention (GQA), which speeds up inference while retaining quality. Furthermore, sliding window attention limits each token’s attention to a fixed window of recent tokens, managing long sequences efficiently and making Mistral 7B a powerful yet economical model.
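To make the sliding window idea concrete, here is a toy sketch of the corresponding attention mask, where each token may attend only to itself and a few preceding tokens. The window size is illustrative, not Mistral’s actual configuration.

```python
# Toy sketch: causal sliding-window attention mask.
# Token i may attend to tokens j with i - window < j <= i.
import numpy as np

def sliding_window_mask(seq_len, window):
    i = np.arange(seq_len)[:, None]  # query positions (rows)
    j = np.arange(seq_len)[None, :]  # key positions (columns)
    return (j <= i) & (j > i - window)

print(sliding_window_mask(seq_len=6, window=3).astype(int))
# Each row has at most 3 ones, so attention cost scales with the
# window size rather than with the full sequence length.
```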

The rise of open-source large language models such as Mistral 7B represents a fundamental shift in the AI industry, bringing high-quality language models to a wider audience. Mistral AI’s design choices deliver cost efficiency without sacrificing quality.

As the landscape evolves, balancing the power of these models with ethical considerations and safety safeguards will remain a focus. Mistral AI’s roadmap includes releasing progressively larger models, with the aim of becoming a leading player in the industry within its first year.


