Build Your Own Local AI Coding Agent with Gemma 4 and OpenCode

Learn to create a free, private AI coding agent on your own machine using Gemma 4 and OpenCode. This guide covers setup, configuration, and practical examples for automated code generation and debugging.

The landscape of AI-assisted software development is shifting rapidly. While cloud-based coding assistants like GitHub Copilot and ChatGPT have become indispensable tools, they come with inherent trade-offs: latency, subscription costs, and—most critically—privacy concerns when sending proprietary code to external servers.

Enter the era of local AI coding agents. With the release of Gemma 4—a powerful open-weight language model optimized for code generation and reasoning—and OpenCode, an open-source framework for building autonomous coding agents, you can now run a fully functional AI coding assistant entirely on your own hardware. This article provides a practical, step-by-step guide to building your own local AI coding agent using these two technologies.

Why Build a Local Coding Agent?

Before diving into the technical details, it's worth understanding the advantages of running a coding agent locally:

  • **Data Privacy**: Your code never leaves your machine. This is essential for enterprises with strict compliance requirements or for developers working on proprietary projects.
  • **No Network Latency**: No network round-trips. Response time depends only on how fast your own hardware can run the model.
  • **No Subscription Fees**: Once you have the hardware, the software is free and open-source.
  • **Full Control**: You can fine-tune the model, customize the agent's behavior, and integrate it with your existing development workflow without API rate limits.

The primary trade-off is hardware requirements: running a capable model like Gemma 4 locally demands a modern GPU with sufficient VRAM (ideally 16GB or more) and a decent CPU.

Requirements

To follow this guide, you will need:

Hardware

  • **GPU**: NVIDIA GPU with at least 12GB VRAM (16GB+ recommended for Gemma 4 9B). GPUs with less VRAM may still be able to run quantized versions.
  • **RAM**: 32GB system RAM (16GB minimum).
  • **Storage**: 20GB free disk space for models and tools.
  • **Internet**: Required only for initial downloads.

Software

  • **Operating System**: Linux (Ubuntu 22.04/24.04 recommended) or Windows with WSL2.
  • **Python**: 3.10 or later.
  • **CUDA**: 12.1 or later (if using NVIDIA GPU).
  • **Git**: For cloning repositories.
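
As a quick sanity check before installing anything, the requirements above can be tested with a short script. This is a minimal sketch, not part of OpenCode or Gemma tooling: the function names are made up for illustration, the thresholds are copied from the lists above, and the GPU check only works once PyTorch is installed (it reports "not detected" otherwise).

```python
import shutil
import sys

MIN_PYTHON = (3, 10)     # from the Software list in this guide
MIN_FREE_DISK_GB = 20    # from the Hardware list in this guide


def python_ok(version_info=sys.version_info):
    """Return True if the interpreter meets the guide's minimum version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def free_disk_gb(path="."):
    """Return free disk space at `path` in gigabytes (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def gpu_vram_gb():
    """Return total VRAM of GPU 0 in GB, or None if PyTorch/CUDA is unavailable."""
    try:
        import torch  # optional: only present after the install step
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1e9


if __name__ == "__main__":
    print(f"Python >= 3.10: {python_ok()}")
    print(f"Free disk: {free_disk_gb():.1f} GB (guide suggests {MIN_FREE_DISK_GB})")
    vram = gpu_vram_gb()
    print("GPU VRAM:", f"{vram:.1f} GB" if vram is not None else "not detected")
```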

Knowledge

  • Basic familiarity with the command line.
  • Understanding of Python virtual environments.

Step-by-Step Installation

We will install the libraries needed to run Gemma 4 through Hugging Face's `transformers`, download the model weights, and then set up OpenCode as the agent framework.

Step 1: Set Up the Python Environment

Create a dedicated virtual environment to avoid dependency conflicts.

python3 -m venv gemma-opencode-env
source gemma-opencode-env/bin/activate

This creates and activates a clean Python environment. All subsequent installations will be isolated within it.
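
To confirm the environment is actually active before installing packages, a small check like the following can help. It is only a sketch: `in_virtualenv` is a hypothetical helper name, and it relies on the standard behavior that `sys.prefix` differs from `sys.base_prefix` inside a `venv`.

```python
import sys


def in_virtualenv():
    """Return True when running inside a venv (prefix differs from the base install)."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

If this prints `False`, re-run the `source` command above in the same shell before continuing.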

Step 2: Install Core Dependencies

Install the main libraries required for running the model and the agent framework.

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install transformers accelerate bitsandbytes
pip install opencode

The first line installs PyTorch built against CUDA 12.1. The second line installs Hugging Face's `transformers` (for loading Gemma 4), `accelerate` (for automatic placement of the model across GPU and CPU memory), and `bitsandbytes` (for 4-bit quantization to reduce VRAM usage). The third line installs the OpenCode framework.
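
After the installs finish, it can be worth checking that the model-side libraries are importable from the active environment. The sketch below uses only the standard library; `missing_packages` is a hypothetical helper, and the list covers only the packages whose import names are certain (OpenCode is left out because its import name is not stated in this guide).

```python
import importlib.util

# Top-level import names for the model-side dependencies installed above.
REQUIRED = ["torch", "transformers", "accelerate", "bitsandbytes"]


def missing_packages(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All model dependencies are importable.")
```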

Step 3: Download the Gemma 4 Model

Gemma 4 is available on Hugging Face under a permissive license. You need to accept the terms of use on the model card before downloading.

First, install the Hugging Face CLI and log in:

pip install huggingface-hub
huggingface-cli login

This will prompt you for an access token. Generate one from your Hugging Face account settings (Settings → Access Tokens). Then, download the model:

huggingface-cli download google/gemma-4-9b-it --local-dir ./gemma-4-9b-it

This downloads the instruction-tuned version of Gemma 4 (9B parameters). If you have limited VRAM, consider using the 2B version (`google/gemma-4-2b-it`) or applying quantization during loading.
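
To get a feel for how model size and quantization interact, a rough estimate of weight memory is parameters times bits per weight, divided by eight. The sketch below applies that arithmetic to the 9B and 2B sizes named in this guide. It is an approximation under stated assumptions: it counts weights only and ignores the KV cache, activations, and quantization overhead, so real usage will be higher.

```python
def weight_memory_gb(n_params_billion, bits_per_weight):
    """Approximate memory for the weights alone, in GB (10**9 bytes)."""
    return n_params_billion * bits_per_weight / 8


if __name__ == "__main__":
    for size in (9, 2):
        for bits in (16, 8, 4):
            gb = weight_memory_gb(size, bits)
            print(f"{size}B model at {bits}-bit: ~{gb:.1f} GB for weights")
```

By this estimate the 9B weights need about 18 GB at 16-bit but about 4.5 GB at 4-bit, which is why quantization makes the difference on a 12GB card.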

Step 4: Configure OpenCode

OpenCode uses a YAML configuration file. Create a file named `opencode_config.yaml` in your working directory:

model:
  name: "google/gemma-4-9b-it"
  load_in_4bit: true
  device_map: "auto"

agent:
  max_tokens: 2048
  temperature: 0.2
  top_p: 0.95
  system_prompt: "You are an expert software engineer. Write clean, efficient, and well-documented code. Always explain your reasoning briefly."
  
tools:
  - file_reader
  - file_writer
  - shell_executor
  - code_analyzer

This configuration tells OpenCode to:

  • Load Gemma 4 in 4-bit quantized mode (cuts weight memory roughly 4x relative to 16-bit precision).
  • Use a low temperature (0.2) so that code generation is more predictable and close to deterministic.
  • Enable file manipulation and shell execution tools.
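
To see why a low temperature makes output more predictable, the sketch below shows the general idea of temperature scaling on a toy set of logits. It illustrates the sampling concept only; it is not OpenCode's or Gemma's actual implementation, and the logit values are invented for the example.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
    for t in (1.0, 0.2):
        probs = softmax_with_temperature(toy_logits, t)
        print(f"temperature={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

At the lower temperature almost all of the probability mass moves to the highest-scoring token, so repeated runs tend to pick the same continuation.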

Step 5: Launch the Agent

With everything configured, start the agent:

opencode --config opencode_config.yaml

You should see output indicating the model is loading and the agent is ready. The first launch may take a few minutes as the model is loaded into memory and quantized.

Usage Examples

Once the agent is running, you can interact with it through a terminal interface. Here are practical examples of what you can ask it to do.

Example 1: Generate a Python Script

**Prompt:** "Create a Python script that reads a CSV file, cleans missing values by filling them with the column mean, and saves the cleaned data to a new file."

The agent will internally call the `file_writer` tool to create the script, then optionally use `shell_executor` to test it. You'll see output like:

Agent: I'll create a script called `clean_csv.py`. Here's the implementation:

import sys

import pandas as pd


def clean_csv(input_file, output_file):
    """Fill missing values in numeric columns with the column mean."""
    df = pd.read_csv(input_file)
    for col in df.columns:
        # Only numeric columns have a meaningful mean.
        if pd.api.types.is_numeric_dtype(df[col]):
            # Assign the result back; in-place fillna on a column is unreliable in recent pandas.
            df[col] = df[col].fillna(df[col].mean())
    df.to_csv(output_file, index=False)
    print(f"Cleaned data saved to {output_file}")


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python clean_csv.py <input.csv> <output.csv>")
        sys.exit(1)
    clean_csv(sys.argv[1], sys.argv[2])

Example 2: Debug a Code Snippet

**Prompt:** "Here's a buggy function. Find and fix the issue: `def add(a, b): return a - b`"

The agent will analyze the code and respond:

Agent: The bug is a logical error in the arithmetic operation. The function is named `add` but performs subtraction. Here's the corrected version:

def add(a, b):
    return a + b

Example 3: Refactor a Codebase

**Prompt:** "Refactor the following JavaScript function to use modern ES6+ syntax: `function multiply(x, y) { return x * y; }`"

Agent: Here's the ES6+ refactored version using an arrow function:

const multiply = (x, y) => x * y;

The agent can also handle multi-file projects by using its file reading and writing tools in sequence.

Example 4: Automated Test Generation

**Prompt:** "Generate unit tests in pytest for the `clean_csv` function we created earlier."

The agent will create a `test_clean_csv.py` file with appropriate test cases, covering edge cases like empty files, missing values, and non-numeric columns.
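
The exact tests will vary from run to run, but a generated file might resemble the sketch below. This is a hand-written illustration, not captured agent output. It assumes pandas and pytest are installed, and it inlines a copy of `clean_csv` (written with plain assignment) so the file runs on its own; in a real project the test file would import the function from `clean_csv.py` instead.

```python
import pandas as pd


def clean_csv(input_file, output_file):
    """Copy of the guide's function, inlined so this file is self-contained."""
    df = pd.read_csv(input_file)
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].mean())
    df.to_csv(output_file, index=False)


def test_fills_missing_numeric_values_with_mean(tmp_path):
    # tmp_path is pytest's built-in temporary directory fixture.
    src = tmp_path / "in.csv"
    dst = tmp_path / "out.csv"
    src.write_text("a,b\n1,x\n,y\n3,z\n")
    clean_csv(src, dst)
    result = pd.read_csv(dst)
    # The missing value in column "a" becomes the mean of 1 and 3.
    assert result["a"].tolist() == [1.0, 2.0, 3.0]


def test_leaves_non_numeric_columns_untouched(tmp_path):
    src = tmp_path / "in.csv"
    dst = tmp_path / "out.csv"
    src.write_text("name,score\nann,10\n,20\n")
    clean_csv(src, dst)
    result = pd.read_csv(dst)
    # The blank name is still missing; the numeric column is unchanged.
    assert result["name"].isna().tolist() == [False, True]
    assert result["score"].tolist() == [10, 20]
```

Run it with `pytest test_clean_csv.py`. It is worth reading generated tests critically, since a model can write assertions that simply mirror a bug in the code under test.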

Performance Tuning

If you encounter performance issues (slow responses or out-of-memory errors), try these adjustments:

Reduce Model Size

Use the 2B parameter version instead of 9B:

model:
  name: "google/gemma-4-2b-it"

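Adjust Quantization

4-bit is the most memory-efficient setting used in this guide. If you have VRAM to spare and want slightly better output quality, you can use 8-bit in place of `load_in_4bit: true`. Note that this raises VRAM usage, so it will not help with out-of-memory errors:

load_in_8bit: true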

Limit Response Length

Reduce `max_tokens` in the agent configuration to 1024 or 512. Shorter generations finish sooner, which helps on small tasks that do not need long answers.

CPU Offloading

If you have limited VRAM but ample system RAM, enable CPU offloading:

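# In a custom launch script
from transformers import AutoModelForCausalLM

# device_map="auto" fills the GPU first, then spills remaining layers to CPU RAM.
# offload_folder is where weights go if they do not fit in CPU RAM either.
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-9b-it",
    device_map="auto",
    offload_folder="./offload",
)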
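
If the automatic split still runs out of memory, one option is to cap how much each device may hold with the `max_memory` argument that `from_pretrained` accepts alongside `device_map`. The sketch below is an assumption-laden example: `build_max_memory` and `load_with_offload` are hypothetical helper names, the 10 GiB and 24 GiB limits are placeholders to adjust for your machine, and the model path is whatever you downloaded earlier.

```python
def build_max_memory(gpu_gib, cpu_gib, gpu_index=0):
    """Build a per-device memory cap, e.g. {0: "10GiB", "cpu": "24GiB"}."""
    return {gpu_index: f"{gpu_gib}GiB", "cpu": f"{cpu_gib}GiB"}


def load_with_offload(model_path, gpu_gib=10, cpu_gib=24, offload_dir="./offload"):
    """Load a model with explicit GPU/CPU limits; layers beyond both go to disk."""
    # Imported lazily so build_max_memory can be used without transformers installed.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_path,
        device_map="auto",
        max_memory=build_max_memory(gpu_gib, cpu_gib),
        offload_folder=offload_dir,
    )


if __name__ == "__main__":
    # Placeholder limits; leave some headroom below your actual VRAM.
    print(build_max_memory(10, 24))
```

Layers placed on the CPU or on disk run much more slowly than layers on the GPU, so expect a noticeable drop in generation speed.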

Integrating with Your IDE

For a seamless development experience, you can connect OpenCode to your editor. The simplest method is to use a terminal multiplexer like `tmux` or `screen` to run the agent alongside your editor.

For VS Code users, install the "OpenCode VS Code Extension" (if available) or use the built-in terminal panel to interact with the agent without leaving your editor.

Conclusion

Building your own local AI coding agent with Gemma 4 and OpenCode is not only feasible but surprisingly straightforward. By following the steps outlined in this article, you now have a fully functional, privacy-preserving, and cost-free coding assistant running on your own hardware.

The key takeaways are:

  • **Gemma 4** provides state-of-the-art code generation capabilities in an open-weight format.
  • **OpenCode** offers a flexible framework for creating autonomous agents with tool-use capabilities.
  • **Local deployment** eliminates privacy concerns and network latency, while giving you complete control.

As the open-source AI ecosystem continues to mature, tools like Gemma 4 and OpenCode are democratizing access to advanced AI assistance. The ability to run capable coding agents locally is no longer a futuristic vision—it's a practical reality available to any developer with a decent GPU.

Start experimenting today. Your local coding agent is just a few commands away.
