GEMINI-CLI-OPENAI(1)

NAME

gemini-cli-openaiExpose Gemini CLI endpoints as OpenAI API with Cloudflare Workers

SYNOPSIS

$npm install -g wrangler

INFO

865 stars
209 forks
0 views
TypeScriptAI & LLM

DESCRIPTION

Expose Gemini CLI endpoints as OpenAI API with Cloudflare Workers

README

🚀 Gemini CLI OpenAI Worker

Transform Google's Gemini models into OpenAI-compatible endpoints using Cloudflare Workers. Access Google's most advanced AI models through familiar OpenAI API patterns, powered by OAuth2 authentication and the same infrastructure that drives the official Gemini CLI.

✨ Features

  • 🔐 OAuth2 Authentication - No API keys required, uses your Google account
  • 🎯 OpenAI-Compatible API - Drop-in replacement for OpenAI endpoints
  • 📚 OpenAI SDK Support - Works with official OpenAI SDKs and libraries
  • 🖼️ Vision Support - Multi-modal conversations with images (base64 & URLs)
  • 🔧 Tool Calling Support - Function calling with Gemini API integration
  • 🧠 Advanced Reasoning - Support for Gemini's thinking capabilities with effort controls
  • 🛡️ Content Safety - Configurable Gemini moderation settings
  • 🌐 Third-party Integration - Compatible with Open WebUI, ChatGPT clients, and more
  • Cloudflare Workers - Global edge deployment with low latency
  • 🔄 Smart Token Caching - Intelligent token management with KV storage
  • 🆓 Free Tier Access - Leverage Google's free tier through Code Assist API
  • 📡 Real-time Streaming - Server-sent events for live responses with token usage
  • 🎭 Multiple Models - Access to latest Gemini models including experimental ones

🤖 Supported Models

Model IDContext WindowMax TokensThinking SupportDescription
gemini-2.5-pro1M65KLatest Gemini 2.5 Pro model with reasoning capabilities
gemini-2.5-flash1M65KFast Gemini 2.5 Flash model with reasoning capabilities
gemini-2.5-flash-lite1M65KLightweight version of Gemini 2.5 Flash model with reasoning capabilities

Note: Gemini 2.5 models have thinking enabled by default. The API automatically manages this:

  • When real thinking is disabled (environment), thinking budget is set to 0 to disable it
  • When real thinking is enabled (environment), thinking budget defaults to -1 (dynamic allocation by Gemini)

Thinking support has two modes:

  • Fake thinking: Set ENABLE_FAKE_THINKING=true to generate synthetic reasoning text (good for testing)
  • Real thinking: Set ENABLE_REAL_THINKING=true to use Gemini's native reasoning capabilities

Real thinking is controlled entirely by the ENABLE_REAL_THINKING environment variable. You can optionally set a "thinking_budget" in your request (token limit for reasoning, -1 for dynamic allocation, 0 to disable thinking entirely).

  • Reasoning Effort Support: You can control the reasoning effort of thinking models by including reasoning_effort in the request body (e.g., extra_body or model_params). This parameter allows you to fine-tune the model's internal reasoning process, balancing between speed and depth of thought.
    • none: Disables thinking (thinking_budget = 0).
    • low: Sets thinking_budget = 1024.
    • medium: Sets thinking_budget = 12288 for flash models, 16384 for other models.
    • high: Sets thinking_budget = 24576 for flash models, 32768 for other models.

      Set STREAM_THINKING_AS_CONTENT=true to stream reasoning as content with <thinking> tags (DeepSeek R1 style) instead of using the reasoning field.

�🛠️ Setup

Prerequisites

  1. Google Account with access to Gemini
  2. Cloudflare Account with Workers enabled
  3. Wrangler CLI installed (npm install -g wrangler)

Step 1: Get OAuth2 Credentials

You need OAuth2 credentials from a Google account that has accessed Gemini. The easiest way to get these is through the official Gemini CLI.

Using Gemini CLI

  1. Install Gemini CLI:

    npm install -g @google/gemini-cli
    
  2. Start the Gemini CLI:

    gemini
    
  3. Authenticate with Google:

    Select ● Login with Google.

    A browser window will now open prompting you to login with your Google account.

  4. Locate the credentials file:

    Windows:

    C:\Users\USERNAME\.gemini\oauth_creds.json
    

    macOS/Linux:

    ~/.gemini/oauth_creds.json
    
  5. Copy the credentials: The file contains JSON in this format:

    {
      "access_token": "ya29.a0AS3H6Nx...",
      "refresh_token": "1//09FtpJYpxOd...",
      "scope": "https://www.googleapis.com/auth/cloud-platform ...",
      "token_type": "Bearer",
      "id_token": "eyJhbGciOiJSUzI1NiIs...",
      "expiry_date": 1750927763467
    }
    

Step 2: Create KV Namespace

# Create a KV namespace for token caching
wrangler kv namespace create "GEMINI_CLI_KV"

Note the namespace ID returned. Update wrangler.toml with your KV namespace ID:

kv_namespaces = [
  { binding = "GEMINI_CLI_KV", id = "your-kv-namespace-id" }
]

Step 3: Environment Setup

Create a .dev.vars file:

# Required: OAuth2 credentials JSON from Gemini CLI authentication
GCP_SERVICE_ACCOUNT={"access_token":"ya29...","refresh_token":"1//...","scope":"...","token_type":"Bearer","id_token":"eyJ...","expiry_date":1750927763467}

Optional: Google Cloud Project ID (auto-discovered if not set)

GEMINI_PROJECT_ID=your-project-id

Optional: API key for authentication (if not set, API is public)

When set, clients must include "Authorization: Bearer <your-api-key>" header

Example: sk-1234567890abcdef1234567890abcdef

OPENAI_API_KEY=sk-your-secret-api-key-here

For production, set the secrets:

wrangler secret put GCP_SERVICE_ACCOUNT
wrangler secret put OPENAI_API_KEY  # Optional, only if you want authentication

Step 4: Deploy

# Install dependencies
npm install

Deploy to Cloudflare Workers

npm run deploy

Or run locally for development

npm run dev

🔧 Configuration

Environment Variables

Core Configuration

VariableRequiredDescription
GCP_SERVICE_ACCOUNTOAuth2 credentials JSON string.
GEMINI_PROJECT_IDGoogle Cloud Project ID (auto-discovered if not set).
OPENAI_API_KEYAPI key for authentication. If not set, the API is public.

Thinking & Reasoning

VariableDescription
ENABLE_FAKE_THINKINGEnable synthetic thinking output for testing (set to "true").
ENABLE_REAL_THINKINGEnable real Gemini thinking output (set to "true").
STREAM_THINKING_AS_CONTENTStream thinking as content with <thinking> tags (DeepSeek R1 style).

Model & Feature Flags

VariableDescription
ENABLE_AUTO_MODEL_SWITCHINGEnable automatic fallback from pro to flash models on rate limits (set to "true").
ENABLE_GEMINI_NATIVE_TOOLSMaster switch to enable all native tools (set to "true").
ENABLE_GOOGLE_SEARCHEnable Google Search native tool (set to "true").
ENABLE_URL_CONTEXTEnable URL Context native tool (set to "true").
GEMINI_TOOLS_PRIORITYSet tool priority: "native_first" or "custom_first".
ALLOW_REQUEST_TOOL_CONTROLAllow request parameters to override tool settings (set to "false" to disable).
ENABLE_INLINE_CITATIONSInject markdown citations for search results (set to "true" to enable).
INCLUDE_GROUNDING_METADATAInclude raw grounding metadata in the stream (set to "false" to disable).

Content Safety

VariableDescription
GEMINI_MODERATION_HARASSMENT_THRESHOLDSets the moderation threshold for harassment content.
GEMINI_MODERATION_HATE_SPEECH_THRESHOLDSets the moderation threshold for hate speech content.
GEMINI_MODERATION_SEXUALLY_EXPLICIT_THRESHOLDSets the moderation threshold for sexually explicit content.
GEMINI_MODERATION_DANGEROUS_CONTENT_THRESHOLDSets the moderation threshold for dangerous content.

For safety thresholds, valid options are: BLOCK_NONE, BLOCK_FEW, BLOCK_SOME, BLOCK_ONLY_HIGH, HARM_BLOCK_THRESHOLD_UNSPECIFIED.

Authentication Security:

  • When OPENAI_API_KEY is set, all /v1/* endpoints require authentication.
  • Clients must include the header: Authorization: Bearer <your-api-key>.
  • Without this environment variable, the API is publicly accessible.
  • Recommended format: sk- followed by a random string (e.g., sk-1234567890abcdef...).

Thinking Models:

  • Fake Thinking: When ENABLE_FAKE_THINKING is set to "true", models marked with thinking: true will generate synthetic reasoning text before their actual response.
  • Real Thinking: When ENABLE_REAL_THINKING is set to "true", requests with include_reasoning: true will use Gemini's native thinking capabilities.
  • Real thinking provides genuine reasoning from Gemini and requires thinking-capable models (like Gemini 2.5 Pro/Flash).
  • You can control the reasoning token budget with the thinking_budget parameter.
  • By default, reasoning output is streamed as reasoning chunks in the OpenAI-compatible response format.
  • When STREAM_THINKING_AS_CONTENT is also set to "true", reasoning will be streamed as regular content wrapped in <thinking></thinking> tags (DeepSeek R1 style).
  • Optimized UX: The </thinking> tag is only sent when the actual LLM response begins, eliminating awkward pauses between thinking and response.
  • If neither thinking mode is enabled, thinking models will behave like regular models.

Auto Model Switching:

  • When ENABLE_AUTO_MODEL_SWITCHING is set to "true", the system will automatically fall back from gemini-2.5-pro to gemini-2.5-flash when encountering rate limit errors (HTTP 429 or 503).
  • This provides seamless continuity when the Pro model quota is exhausted.
  • The fallback is indicated in the response with a notification message.
  • Only applies to supported model pairs (currently: pro → flash).
  • Works for both streaming and non-streaming requests.

KV Namespaces

BindingPurpose
GEMINI_CLI_KVToken caching and session management

🚨 Troubleshooting

Common Issues

401 Authentication Error

  • Check if your OAuth2 credentials are valid
  • Ensure the refresh token is working
  • Verify the credentials format matches exactly

Token Refresh Failed

  • Credentials might be from wrong OAuth2 client
  • Refresh token might be expired or revoked
  • Check the debug cache endpoint for token status

Project ID Discovery Failed

  • Set GEMINI_PROJECT_ID environment variable manually
  • Ensure your Google account has access to Gemini

💻 Usage Examples

Cline Integration

Cline is a powerful AI assistant extension for VS Code. You can easily configure it to use your Gemini models:

  1. Install Cline in VS Code from the Extensions marketplace

  2. Configure OpenAI API settings:

    • Open Cline settings
    • Set API Provider to "OpenAI"
    • Set Base URL to: https://your-worker.workers.dev/v1
    • Set API Key to: sk-your-secret-api-key-here (use your OPENAI_API_KEY if authentication is enabled)
  3. Select a model:

    • Choose gemini-2.5-pro for complex reasoning tasks
    • Choose gemini-2.5-flash for faster responses

Open WebUI Integration

  1. Add as OpenAI-compatible endpoint:

    • Base URL: https://your-worker.workers.dev/v1
    • API Key: sk-your-secret-api-key-here (use your OPENAI_API_KEY if authentication is enabled)
  2. Configure models: Open WebUI will automatically discover available Gemini models through the /v1/models endpoint.

  3. Start chatting: Use any Gemini model just like you would with OpenAI models!

LiteLLM Integration

LiteLLM works seamlessly with this worker, especially when using the DeepSeek R1-style thinking streams:

import litellm

Configure LiteLLM to use your worker

litellm.api_base = "https://your-worker.workers.dev/v1" litellm.api_key = "sk-your-secret-api-key-here"

Use thinking models with LiteLLM

response = litellm.completion( model="gemini-2.5-flash", messages=[ {"role": "user", "content": "Solve this step by step: What is 15 * 24?"} ], stream=True )

for chunk in response: if chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="")

Pro Tip: Set STREAM_THINKING_AS_CONTENT=true for optimal LiteLLM compatibility. The <thinking> tags format works better with LiteLLM's parsing and various downstream tools.

OpenAI SDK (Python)

from openai import OpenAI

Initialize with your worker endpoint

client = OpenAI( base_url="https://your-worker.workers.dev/v1", api_key="sk-your-secret-api-key-here" # Use your OPENAI_API_KEY if authentication is enabled )

Chat completion

response = client.chat.completions.create( model="gemini-2.5-flash", messages=[ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Explain machine learning in simple terms"} ], stream=True )

for chunk in response: if chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="")

Real thinking mode

response = client.chat.completions.create( model="gemini-2.5-pro", messages=[ {"role": "user", "content": "Solve this step by step: What is the derivative of x^3 + 2x^2 - 5x + 3?"} ], extra_body={ "include_reasoning": True, "thinking_budget": 1024 }, stream=True )

for chunk in response: # Real thinking appears in the reasoning field if hasattr(chunk.choices[0].delta, 'reasoning') and chunk.choices[0].delta.reasoning: print(f"[Thinking] {chunk.choices[0].delta.reasoning}") if chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="")

OpenAI SDK (JavaScript/TypeScript)

import OpenAI from 'openai';

const openai = new OpenAI({ baseURL: 'https://your-worker.workers.dev/v1&#39;, apiKey: 'sk-your-secret-api-key-here', // Use your OPENAI_API_KEY if authentication is enabled });

const stream = await openai.chat.completions.create({ model: 'gemini-2.5-flash', messages: [ { role: 'user', content: 'Write a haiku about coding' } ], stream: true, });

for await (const chunk of stream) { const content = chunk.choices[0]?.delta?.content || ''; process.stdout.write(content); }

cURL

curl -X POST https://your-worker.workers.dev/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-secret-api-key-here" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [
      {"role": "user", "content": "Explain quantum computing"}
    ]
  }'

Raw JavaScript/TypeScript

const response = await fetch('https://your-worker.workers.dev/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'gemini-2.5-flash',
    messages: [
      { role: 'user', content: 'Hello, world!' }
    ]
  })
});

const reader = response.body.getReader(); const decoder = new TextDecoder();

while (true) { const { done, value } = await reader.read(); if (done) break;

const chunk = decoder.decode(value); const lines = chunk.split('\n');

for (const line of lines) { if (line.startsWith('data: ') && line !== 'data: [DONE]') { const data = JSON.parse(line.substring(6)); const content = data.choices[0]?.delta?.content; if (content) { console.log(content); } } } }

Raw Python (without SDK)

import requests
import json

url = "https://your-worker.workers.dev/v1/chat/completions" data = { "model": "gemini-2.5-flash", "messages": [ {"role": "user", "content": "Write a Python function to calculate fibonacci"} ] }

response = requests.post(url, json=data, stream=True)

for line in response.iter_lines(): if line and line.startswith(b'data: '): try: chunk = json.loads(line[6:].decode()) content = chunk['choices'][0]['delta'].get('content', '') if content: print(content, end='') except json.JSONDecodeError: continue

� Tool Calling Support

The worker supports OpenAI-compatible tool calling (function calling) with seamless integration to Gemini's function calling capabilities.

Using Tool Calls

Include tools and optionally tool_choice in your request:

const response = await fetch('/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'gemini-2.5-pro',
    messages: [
      { role: 'user', content: 'What is the weather in New York?' }
    ],
    tools: [
      {
        type: 'function',
        function: {
          name: 'get_weather',
          description: 'Get weather information for a location',
          parameters: {
            type: 'object',
            properties: {
              location: { type: 'string', description: 'City name' }
            },
            required: ['location']
          }
        }
      }
    ],
    tool_choice: 'auto'
  })
});

Tool Choice Options

  • auto: Let the model decide whether to call a function
  • none: Disable function calling
  • {"type": "function", "function": {"name": "function_name"}}: Force a specific function call

🛡️ Content Safety Settings

Configure Gemini's built-in safety filters using environment variables in the dev.vars:

# Safety threshold options: BLOCK_NONE, BLOCK_FEW, BLOCK_SOME, BLOCK_ONLY_HIGH, HARM_BLOCK_THRESHOLD_UNSPECIFIED
GEMINI_MODERATION_HARASSMENT_THRESHOLD=BLOCK_NONE
GEMINI_MODERATION_HATE_SPEECH_THRESHOLD=BLOCK_NONE  
GEMINI_MODERATION_SEXUALLY_EXPLICIT_THRESHOLD=BLOCK_SOME
GEMINI_MODERATION_DANGEROUS_CONTENT_THRESHOLD=BLOCK_ONLY_HIGH

Safety Categories:

  • HARASSMENT: Content that promotes hatred or violence against individuals/groups
  • HATE_SPEECH: Derogatory or demeaning language targeting specific groups
  • SEXUALLY_EXPLICIT: Content containing sexual or adult material
  • DANGEROUS_CONTENT: Content promoting dangerous or harmful activities

📡 API Endpoints

Base URL

https://your-worker.your-subdomain.workers.dev

List Models

GET /v1/models

Response:

{
  "object": "list",
  "data": [
    {
      "id": "gemini-2.5-pro",
      "object": "model",
      "created": 1708976947,
      "owned_by": "google-gemini-cli"
    }
  ]
}

Chat Completions

POST /v1/chat/completions
Content-Type: application/json

{ "model": "gemini-2.5-flash", "messages": [ { "role": "system", "content": "You are a helpful assistant." }, { "role": "user", "content": "Hello! How are you?" } ] }

Thinking Mode (Real Reasoning)

For models that support thinking, you can enable real reasoning from Gemini:

POST /v1/chat/completions
Content-Type: application/json

{ "model": "gemini-2.5-pro", "messages": [ { "role": "user", "content": "Solve this math problem step by step: What is 15% of 240?" } ], "include_reasoning": true, "thinking_budget": 1024 }

The include_reasoning parameter enables Gemini's native thinking mode, and thinking_budget sets the token limit for reasoning.

Response (Streaming):

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1708976947,"model":"gemini-2.5-flash","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1708976947,"model":"gemini-2.5-flash","choices":[{"index":0,"delta":{"content":"! I'm"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1708976947,"model":"gemini-2.5-flash","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":22,"completion_tokens":553,"total_tokens":575}}

data: [DONE]

Debug Endpoints

Check Token Cache

GET /v1/debug/cache

Test Authentication

POST /v1/token-test
POST /v1/test

Image Support (Vision)

The worker supports multimodal conversations with images for vision-capable models. Images can be provided as base64-encoded data URLs or as external URLs.

Supported Image Formats

  • JPEG, PNG, GIF, WebP
  • Base64 encoded (recommended for reliability)
  • External URLs (may have limitations with some services)

Vision-Capable Models

  • gemini-2.5-pro
  • gemini-2.5-flash
  • gemini-2.0-flash-001
  • gemini-2.0-flash-lite-preview-02-05
  • gemini-2.0-pro-exp-02-05

Example with Base64 Image

from openai import OpenAI
import base64

Encode your image

with open("image.jpg", "rb") as image_file: base64_image = base64.b64encode(image_file.read()).decode('utf-8')

client = OpenAI( base_url="https://your-worker.workers.dev/v1", api_key="sk-your-secret-api-key-here" )

response = client.chat.completions.create( model="gemini-2.5-flash", messages=[ { "role": "user", "content": [ { "type": "text", "text": "What do you see in this image?" }, { "type": "image_url", "image_url": { "url": f"data:image/jpeg;base64,{base64_image}" } } ] } ] )

print(response.choices[0].message.content)

Example with Image URL

curl -X POST https://your-worker.workers.dev/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-secret-api-key-here" \
  -d '{
    "model": "gemini-2.5-pro",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in detail."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/image.jpg",
              "detail": "high"
            }
          }
        ]
      }
    ]
  }'

Multiple Images

You can include multiple images in a single message:

{
  "model": "gemini-2.5-pro",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Compare these two images."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/jpeg;base64,..."
          }
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,..."
          }
        }
      ]
    }
  ]
}

Debug Commands

# Check KV cache status
curl https://your-worker.workers.dev/v1/debug/cache

Test authentication only

curl -X POST https://your-worker.workers.dev/v1/token-test

Test full flow

curl -X POST https://your-worker.workers.dev/v1/test

🏗️ How It Works

graph TD
    A[Client Request] --> B[Cloudflare Worker]
    B --> C{Token in KV Cache?}
    C -->|Yes| D[Use Cached Token]
    C -->|No| E[Check Environment Token]
    E --> F{Token Valid?}
    F -->|Yes| G[Cache & Use Token]
    F -->|No| H[Refresh Token]
    H --> I[Cache New Token]
    D --> J[Call Gemini API]
    G --> J
    I --> J
    J --> K[Stream Response]
    K --> L[OpenAI Format]
    L --> M[Client Response]

The worker acts as a translation layer, converting OpenAI API calls to Google's Code Assist API format while managing OAuth2 authentication automatically.

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

📄 License

This codebase is provided for personal use and self-hosting only.

Redistribution of the codebase, whether in original or modified form, is not permitted without prior written consent from the author.

You may fork and modify the repository solely for the purpose of running and self-hosting your own instance.

Any other form of distribution, sublicensing, or commercial use is strictly prohibited unless explicitly authorized.

🙏 Acknowledgments


⚠️ Important: This project uses Google's Code Assist API which may have usage limits and terms of service. Please ensure compliance with Google's policies when using this worker.

Star History Chart

SEE ALSO

clihub3/4/2026GEMINI-CLI-OPENAI(1)