Mistral Unveils New Local Models: Le Chat and Mistral Large 2

Mistral AI releases powerful local models including Le Chat for private deployment and Mistral Large 2, bringing advanced reasoning and multilingual capabilities to edge devices.

Mistral AI continues to push the boundaries of open-weight language models with two significant new releases: **Le Chat**, a lightweight, locally runnable conversational AI, and **Mistral Large 2**, a flagship model designed for advanced reasoning and coding tasks. The aim is to make high-quality models usable on hardware you control, although Mistral Large 2 still needs workstation- or server-class memory. In this article, we’ll explore their capabilities, walk through installation with Ollama and Hugging Face, and work through practical usage examples.

Requirements

Before diving into installation, ensure your system meets the following minimum requirements for running these models locally:

Hardware Requirements

  • **CPU**: 4+ cores (x86_64 or ARM64)
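  • **RAM**: 8 GB for Le Chat; 64 GB or more of combined RAM and VRAM for a 4-bit quantized Mistral Large 2 (its 123B parameters take roughly 62 GB at 4 bits each, before runtime overhead)
  • **Storage**: 10 GB free for Le Chat, roughly 75 GB for a 4-bit Mistral Large 2 download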
  • **GPU (optional but recommended)**: NVIDIA GPU with 6+ GB VRAM (e.g., RTX 3060 or higher) for accelerated inference

Software Requirements

  • **Operating System**: Linux (Ubuntu 22.04+), macOS 12+, or Windows 10/11 with WSL2
  • **Python**: 3.10 or later
  • **Dependencies**: Ollama (for simple deployment) or Hugging Face `transformers` (for advanced integration)
  • **Internet**: Required for downloading model weights
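A quick way to sanity-check memory requirements is the rule of thumb that weight memory is roughly the parameter count times the bytes stored per parameter. The sketch below assumes that rule and ignores KV cache, activations, and runtime overhead; the function name is illustrative and not part of any library, and the 2.7B figure is the one this article gives for Le Chat.

```python
def estimate_weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in decimal gigabytes."""
    bytes_per_param = bits_per_param / 8
    # (params_billions * 1e9 params) * bytes_per_param / (1e9 bytes per GB):
    # the two 1e9 factors cancel.
    return params_billions * bytes_per_param


# Parameter counts in billions; edit these to match the model you plan to run.
models = [("Le Chat (2.7B, per this article)", 2.7), ("Mistral Large 2 (123B)", 123.0)]

for name, size_b in models:
    for bits in (16, 8, 4):
        gb = estimate_weight_memory_gb(size_b, bits)
        print(f"{name}: {bits}-bit weights ~ {gb:.1f} GB")
```

Treat the result as a lower bound: real usage is higher once the context cache and framework overhead are included.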

Step-by-Step Installation

We’ll cover two approaches: using Ollama for a hassle-free setup and using Hugging Face for more control over model parameters.

Method 1: Installing via Ollama

Ollama simplifies running local models with a single command. Install Ollama first:

# Download and install Ollama (Linux/macOS)
curl -fsSL https://ollama.com/install.sh | sh

Verify the installation:

ollama --version
# Prints the installed version, e.g. "ollama version is 0.3.0"

Now pull Mistral’s Le Chat model (2.7B parameters, optimized for chat):

ollama pull mistral:le-chat

For Mistral Large 2 (123B parameters, requires far more memory):

ollama pull mistral-large

If your system has limited RAM, pull an explicitly 4-bit quantized tag of Mistral Large 2:

ollama pull mistral-large:123b-instruct-2407-q4_0
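After pulling, it can be useful to confirm from a script which models are available locally. The sketch below assumes an Ollama server running at its default address, `http://localhost:11434`, and uses the `GET /api/tags` endpoint; the helper names and the sample payload are illustrative. Only the offline parsing step runs at import time, so the block works even without a server.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def extract_model_names(payload: dict) -> list[str]:
    """Pull the 'name' field out of each entry in an /api/tags response."""
    return [entry["name"] for entry in payload.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL) -> list[str]:
    """Return the names of locally available models (needs a running server)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return extract_model_names(json.load(resp))


# Offline demonstration with a hand-written payload in the same shape.
# The model names here are placeholders, not real tags.
sample = {"models": [{"name": "example-model:latest"}, {"name": "another:7b"}]}
print(extract_model_names(sample))
```

Calling `list_local_models()` on a machine with Ollama running returns the same kind of list for the models you have actually pulled.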

Method 2: Installing via Hugging Face Transformers

For developers who need fine-grained control, use the Hugging Face `transformers` library. Create a Python virtual environment first:

python3 -m venv mistral_env
source mistral_env/bin/activate  # On Windows: mistral_env\Scripts\activate

Install the required packages:

pip install torch transformers accelerate bitsandbytes

Download the Le Chat model weights:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Le-Chat-2.7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
print("Model loaded successfully")

For Mistral Large 2 (requires a Hugging Face token with access granted by Mistral):

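import torch
from huggingface_hub import login
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

login()  # Enter your token when prompted

model_name = "mistralai/Mistral-Large-Instruct-2407"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 4-bit quantization via bitsandbytes; the weights still need roughly 62 GB
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto",
)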

Usage Examples

Example 1: Chat with Le Chat via Ollama

Start an interactive session:

ollama run mistral:le-chat

You’ll see a prompt. Try a conversational query:

>>> Write a short Python function to reverse a string.

Le Chat responds:

def reverse_string(s):
    return s[::-1]

# Example usage
print(reverse_string("hello"))  # Output: "olleh"
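The same prompt can be sent from a script instead of the interactive prompt. This is a minimal sketch that assumes a local Ollama server at the default port and uses its `/api/generate` endpoint with streaming disabled; the helper names are illustrative, and the model tag is simply the one used in this article. The block only builds and prints the request, so nothing is sent until you call `generate()`.

```python
import json
import urllib.request

GENERATE_URL = "http://localhost:11434/api/generate"  # default local endpoint


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        GENERATE_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["response"]


# Inspect the request without contacting the server.
req = build_generate_request(
    "mistral:le-chat",  # tag as given in this article
    "Write a short Python function to reverse a string.",
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

With `"stream": False` the server returns a single JSON object, which keeps the client code short; for long generations a streaming client gives faster feedback.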

Example 2: Batch Inference with Mistral Large 2

Use Mistral Large 2 for a complex reasoning task. Create a Python script `reasoning.py`:

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# Load model and tokenizer with 4-bit quantization (bitsandbytes)
model_name = "mistralai/Mistral-Large-Instruct-2407"
tokenizer = AutoTokenizer.from_pretrained(model_name)
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto",
)

# Prepare a prompt for mathematical reasoning
prompt = """Solve step by step:
Two trains leave at the same time and travel toward each other. One leaves station A at 60 mph
and the other leaves station B at 90 mph. The stations are 300 miles apart. When will they meet?

Reason step by step:"""

# Tokenize and move the inputs to the device that holds the model's first layers
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,  # passes input_ids and attention_mask
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True,
)

# Decode and print (the decoded text includes the prompt)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Run the script:

python reasoning.py

Because sampling is enabled, the exact wording varies between runs, but the output should include a step-by-step breakdown along these lines:

Step 1: Combined speed = 60 + 90 = 150 mph
Step 2: Time = Distance / Speed = 300 / 150 = 2 hours
Answer: They meet after 2 hours.
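The model's answer can be checked directly. This short sketch assumes the two trains depart at the same time and travel toward each other, which is the reading under which the 2-hour answer holds; the function name is illustrative.

```python
def meeting_time_hours(distance_miles: float, speed_a_mph: float, speed_b_mph: float) -> float:
    """Time until two trains moving toward each other meet (simultaneous departure)."""
    closing_speed = speed_a_mph + speed_b_mph  # the gap shrinks at the sum of both speeds
    return distance_miles / closing_speed


t = meeting_time_hours(300, 60, 90)
print(f"They meet after {t} hours, {60 * t} miles from station A")
# They meet after 2.0 hours, 120.0 miles from station A
```

A check like this is a cheap guard when a sampled response is used downstream, since generated arithmetic is not guaranteed to be correct.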

Example 3: Code Completion with Le Chat

Le Chat excels at code generation. Use Ollama for a quick code completion:

ollama run mistral:le-chat

Input:

>>> Complete this JavaScript function:
function fibonacci(n) {
  if (n <= 1) return n;

Le Chat completes:

  return fibonacci(n - 1) + fibonacci(n - 2);
}

Example 4: RAG Pipeline with Mistral Large 2 (Advanced)

Integrate Mistral Large 2 into a retrieval-augmented generation pipeline using Hugging Face and FAISS:

from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
from transformers import pipeline

# Load embedding model and create index
embedder = SentenceTransformer("all-MiniLM-L6-v2")
documents = [
    "Mistral Large 2 supports 128K context window.",
    "Le Chat is optimized for low-latency chat.",
    "Mistral Large 2 is released under the Mistral Research License."
]
embeddings = embedder.encode(documents)
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(np.array(embeddings))

# Query
query = "What is the context window of Mistral Large 2?"
query_embedding = embedder.encode([query])
distances, indices = index.search(np.array(query_embedding), k=1)
retrieved_doc = documents[indices[0][0]]

# Generate answer with Mistral Large 2
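# device_map="auto" spreads the model across available GPUs and CPU memory
generator = pipeline("text-generation", model="mistralai/Mistral-Large-Instruct-2407", torch_dtype="auto", device_map="auto")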
prompt = f"Based on this document: {retrieved_doc}\nAnswer: {query}"
result = generator(prompt, max_new_tokens=100)
print(result[0]["generated_text"])
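To see what the retrieval step does, here is a dependency-free sketch of exact nearest-neighbour search by squared L2 distance, the same metric `IndexFlatL2` reports. The three-dimensional vectors are made-up values standing in for real embeddings, and the helper names are illustrative.

```python
def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(index_vectors: list[list[float]], query: list[float], k: int = 1):
    """Exact search: score every stored vector, keep the k closest."""
    scored = sorted((squared_l2(vec, query), i) for i, vec in enumerate(index_vectors))
    distances = [d for d, _ in scored[:k]]
    indices = [i for _, i in scored[:k]]
    return distances, indices


# Toy "embeddings" (hypothetical values, not real model output)
doc_vectors = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.9],
]
query_vector = [0.8, 0.2, 0.1]

distances, indices = search(doc_vectors, query_vector, k=2)
print(indices)  # [0, 1]
```

A flat index scans every vector on each query, which is fine for a handful of documents; larger corpora usually call for an approximate index.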

Performance Considerations

  • **Le Chat (2.7B)**: Runs on CPU with 8 GB RAM at ~10 tokens/second. With GPU acceleration (e.g., RTX 3060), speeds reach 50+ tokens/second.
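  • **Mistral Large 2 (123B)**: The weights alone take roughly 246 GB in 16-bit precision and roughly 62 GB with 4-bit quantization, so the model does not fit on a single 24 GB card such as an RTX 4090. Plan for multiple GPUs or CPU offload, and expect throughput to depend heavily on how much of the model sits in VRAM.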
  • **Context Window**: Both models support up to 128K tokens, but KV-cache memory grows linearly with context length: roughly 0.5 GB per 32K tokens for Le Chat and roughly 4 GB per 32K tokens for Mistral Large 2.
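The linear growth of context memory comes from the key-value cache: each token stores one key and one value vector per layer. The sketch below uses the standard size formula with a hypothetical architecture (32 layers, 8 key-value heads, head dimension 128, 2 bytes per value); those numbers are assumptions chosen to show the scaling, not the published configuration of either model.

```python
def kv_cache_gb(tokens: int, n_layers: int, n_kv_heads: int, head_dim: int,
                bytes_per_value: int = 2) -> float:
    """Approximate KV-cache size in decimal GB for a single sequence."""
    # Keys and values (factor 2), per layer, n_kv_heads * head_dim values per token
    bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return tokens * bytes_per_token / 1e9


# Hypothetical architecture, used only to illustrate the scaling
config = dict(n_layers=32, n_kv_heads=8, head_dim=128)

for tokens in (32_768, 65_536, 131_072):
    print(f"{tokens:>7} tokens -> {kv_cache_gb(tokens, **config):.2f} GB")
```

Doubling the context doubles the cache, so a full 128K-token window costs four times as much as 32K on the same model.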

Conclusion

Mistral’s new local models, Le Chat and Mistral Large 2, make capable language models practical to run on your own hardware. Le Chat delivers a responsive, lightweight assistant suited to everyday tasks and code generation, while Mistral Large 2 brings stronger reasoning to local setups that have the memory for it. With Ollama and Hugging Face, developers can have either model running shortly after the download finishes. Check the license before production use: Mistral Large 2’s weights are released under the Mistral Research License, which covers research and non-commercial use.

The key takeaway: you no longer need cloud APIs to access state-of-the-art language models. With proper hardware and the steps outlined here, you can run Mistral’s latest innovations entirely offline, ensuring data privacy and low latency. As Mistral continues to refine these models based on community feedback, expect even tighter integration with local development workflows. Start experimenting today—your local machine is more powerful than you think.
