NAME
qwen-cli-zig — An attempt to make Gemini 2.5 create a simple chat program with Zig
SYNOPSIS
pip install -rINFO
DESCRIPTION
An attempt to make Gemini 2.5 create a simple chat program with Zig
README
Qwen CLI Zig
This is an attempt at making Gemini 2.5 Pro Exp write usable Zig code. The veredict: do not try unless you have a robust RAG or LoRa solution for the most current Zig documentation or source code.
It will try to write Zig but it doesn´t know what changed in the most current version. If you try to import C header files, it has a very hard time with type casting, pointer casting and makes frequent type and syntax errors. It is a true knighmare.
After spending upwards of USD 50 from OpenRouter.ai, I gave up. This program do load the Qwen3 model in the GPU memory using CUDA, but it's unable to retrieve responses and crashes. And even after dozens of attemps, Gemini was unable to add print/debug probes to figure out the problem.
Introduction
A simple command-line interface written in Zig to interact with Qwen language models (like Qwen3-14B) using the llama.cpp library.
Note: This is currently a skeleton application. The core logic for interacting with the llama.cpp C API still needs to be implemented in src/qwen_cli.zig.
Prerequisites
- Zig Compiler: Ensure you have a recent version of the Zig compiler installed. Check the Ziglang Download Page.
- C/C++ Toolchain: You need
make,cmake, and a C/C++ compiler (gccorclang). - Git: Required for cloning
llama.cpp. - (Optional) GPU SDKs: If you want GPU acceleration:
- NVIDIA: CUDA Toolkit (check Arch Wiki/Manjaro repos for
cuda,nvidia-utils). - AMD: ROCm SDK (check Arch Wiki/Manjaro repos for
rocm-hip-sdk,rocm-opencl-sdk).
- NVIDIA: CUDA Toolkit (check Arch Wiki/Manjaro repos for
Setup
1. Install Build Dependencies (Arch Linux / Manjaro Example)
# Install essential build tools sudo pacman -Syu --needed base-devel cmake gitOptional: Install GPU SDKs if needed (check package names)
sudo pacman -Syu --needed cuda nvidia-utils # For NVIDIA
sudo pacman -Syu --needed rocm-hip-sdk rocm-opencl-sdk # For AMD
2. Prepare llama.cpp Dependency using build.sh
This project includes a helper script build.sh to automate the cloning and building of the required llama.cpp library.
First, make the script executable:
chmod +x build.sh
Then, run the script:
./build.sh
This script will:
- Check for required tools (
git,cmake,cc). - Clone the
llama.cpprepository into avendor/llama.cppsubdirectory if it doesn't exist. - Configure and build
llama.cppas a shared library insidevendor/llama.cpp/build. By default, it builds with NVIDIA CUDA and Flash Attention enabled. Ensure you have the CUDA Toolkit installed (see Prerequisites). The shared library (libllama.so) will be placed invendor/llama.cpp/build/bin. - Print the
zig buildcommand needed to build the main application.
Customizing the llama.cpp Build (Optional):
If you want to disable GPU support or add other specific llama.cpp features (like ROCm for AMD GPUs), you can edit the CMAKE_FLAGS variable near the top of the build.sh script before running it.
- To disable CUDA and build for CPU only:
# build.sh # ... CMAKE_FLAGS="-DBUILD_SHARED_LIBS=ON" # Remove CUDA flags # ... - Example for AMD ROCm (ensure ROCm SDK is installed):
# build.sh # ... CMAKE_FLAGS="-DBUILD_SHARED_LIBS=ON -DLLAMA_ROCM=ON" # Add ROCm flag # ...
Remember to clean the build directory (rm -rf vendor/llama.cpp/build vendor/llama.cpp/bin) before running ./build.sh if you change the CMAKE_FLAGS.
# build.sh
# ...
# Example: CMAKE_FLAGS="-DLLAMA_CUDA=ON -DLLAMA_FLASH_ATTN=ON"
CMAKE_FLAGS="-DBUILD_SHARED_LIBS=ON -DLLAMA_CUDA=ON" # Add your desired flags here
# ...
3. Download Model Files & Convert to GGUF
This application requires the language model to be in the GGUF format to be loaded by llama.cpp.
The provided download_model.sh script helps download the original model files (configuration, tokenizer, and Safetensors weights) from the official Hugging Face repository (Qwen/Qwen3-14B).
Make the script executable:
chmod +x download_model.shRun the script:
./download_model.shThis will download the necessary files into the
models/Qwen3-14B-hfdirectory.Convert the downloaded files to GGUF format:
- IMPORTANT: The downloaded files are NOT directly usable. You must convert them using the
convert.pyscript from thellama.cppproject. - Make sure you have Python installed along with the necessary libraries (usually listed in
llama.cpp'srequirements.txt, e.g.,pip install -r requirements.txtwithin thellama.cppdirectory). - Navigate to your separate
llama.cppclone directory. - Run the conversion script, pointing it to the downloaded model directory and specifying an output file path and type. Example:
(Adjust paths and# Assuming your qwen-cli-zig project is one level up from llama.cpp python convert.py ../qwen-cli-zig/models/Qwen3-14B-hf/ \ --outfile ../qwen-cli-zig/models/qwen3-14b-converted.gguf \ --outtype q5_k_m--outtype(quantization level) as needed. Common types includef16,q4_k_m,q5_k_m,q8_0). Refer tollama.cppdocumentation for details onconvert.py.
- IMPORTANT: The downloaded files are NOT directly usable. You must convert them using the
4. Build qwen_cli
After running ./build.sh successfully to prepare the llama.cpp dependency, simply run the Zig build command:
zig build
The build.zig file is configured to automatically find the llama.cpp headers and library within the vendor directory created by build.sh.
If the build is successful, the executable will be at ./zig-out/bin/qwen_cli.
Running
Execute the compiled program, pointing it to the GGUF file you created during the conversion step:
# Example using the converted file name from the instructions above: ./zig-out/bin/qwen_cli --model models/qwen3-14b-converted.ggufOr provide the full path otherwise:
./zig-out/bin/qwen_cli --model /path/to/your/qwen3-14b-converted.gguf
You can also specify other options:
./zig-out/bin/qwen_cli \
--model /path/to/your/qwen3-14b.Q5_K_M.gguf \
--system "You are a concise assistant." \
--temp 0.5
Type your messages and press Enter. Use Ctrl+D or type /quit to exit.
Development
The main logic for interacting with llama.cpp needs to be implemented in the // TODO: sections within src/qwen_cli.zig. This involves:
- Using the
@cImportedllamafunctions. - Initializing the backend and model context.
- Tokenizing the formatted prompt.
- Setting up sampling parameters.
- Running the inference loop (
llama_decode/llama_eval,llama_sampling_sample,llama_token_to_piece). - Managing context and history.
- Cleaning up resources (
llama_free,llama_free_model,llama_backend_free). Refer to thellama.cppexamples (likeexamples/main/main.cpp) for guidance on using the C API.