The Problem
I was trying to index 57 markdown files into OpenClaw’s memory search using Gemini’s embedding API (gemini-embedding-001). The process kept failing with this error:
429 Quota exceeded for metric: generativelanguage.googleapis.com/embed_content_free_tier_requests,
limit: 100, model: gemini-embedding-1.0
Please retry in 50.27657655s.
Gemini’s free tier has a 100 requests per minute rate limit. With 57 files generating 205 chunks, I was hitting this ceiling almost immediately. The default retry logic in OpenClaw uses exponential backoff (500ms → 1000ms → 2000ms…), but this is reactive — it only kicks in after hitting the rate limit.
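The default schedule can be sketched as follows (a minimal illustration; `backoffDelayMs` is not OpenClaw's actual function name):

```javascript
// Reactive exponential backoff: the delay doubles on each retry,
// starting at 500ms (500 → 1000 → 2000 → ...).
function backoffDelayMs(attempt, baseMs = 500) {
  return baseMs * 2 ** attempt;
}
```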
Initial Attempts
Attempt 1: Simple Throttling
My first thought was adding a fixed delay between requests:
// Delay 700ms between requests for Gemini
const minRequestIntervalMs = isGemini ? 700 : 0;
This failed because:
- Multiple concurrent requests still went through simultaneously
- The rate limit window (1 minute) was already exhausted before throttling could help
Attempt 2: Wait on Rate Limit
Next, I tried detecting the 429 error and waiting 1 minute before retry:
const isQuotaExceeded = /exceeded.*quota|limit.*100|embed_content_free_tier/i.test(message);
const waitMs = isQuotaExceeded ? 60000 : exponentialBackoff();
This worked partially — the indexer would pause and retry, but it was still inefficient. Multiple requests would fail simultaneously, each waiting 60 seconds, creating a cascade of delays.
The Working Solution: Preemptive Throttling
The breakthrough came from implementing preemptive throttling — tracking request count before hitting the limit:
// Static counter shared across all instances
if (!MemoryManagerEmbeddingOps._geminiRequestCounter) {
  MemoryManagerEmbeddingOps._geminiRequestCounter = { count: 0, windowStart: Date.now() };
}
const counter = MemoryManagerEmbeddingOps._geminiRequestCounter;
const now = Date.now();

// Reset the counter every 60 seconds
if (now - counter.windowStart >= 60000) {
  counter.count = 0;
  counter.windowStart = now;
}

// If at the limit (100 req/min), wait until the window resets
if (counter.count >= 100) {
  const waitMs = 60000 - (now - counter.windowStart) + 1000; // +1s safety margin
  log.warn(`approaching rate limit (100 req/min), waiting ${waitMs / 1000}s...`);
  await sleep(waitMs);
  counter.count = 0;
  counter.windowStart = Date.now();
}
counter.count++;
Implementation Details
I modified three key functions in dist/manager-CYx1MWZA.js:
1. embedChunksInBatches()
Added preemptive counter check before processing each batch of chunks.
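The shape of that change looks roughly like this (a self-contained sketch; `throttleGemini` and `embedBatch` are illustrative stand-ins, not OpenClaw's actual identifiers):

```javascript
// Preemptive throttle hook: in the real patch this runs the shared-counter
// check shown above. Here it is a no-op so the sketch is runnable.
async function throttleGemini() {}

// Stand-in embedder: pretend each chunk embeds to a fixed-size vector.
async function embedBatch(batch) {
  return batch.map(() => [0, 0, 0]);
}

// Check the counter once per batch, before any requests go out,
// instead of reacting after a 429 comes back.
async function embedChunksInBatches(chunks, batchSize = 10) {
  const results = [];
  for (let i = 0; i < chunks.length; i += batchSize) {
    await throttleGemini(); // preemptive wait if near 100 req/min
    const batch = chunks.slice(i, i + batchSize);
    results.push(...(await embedBatch(batch)));
  }
  return results;
}
```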
2. embedBatchWithRetry()
Added both preemptive throttling AND improved retry logic:
- Detects quota-exceeded errors specifically
- Waits 60 seconds on quota exceeded
- Uses exponential backoff for other retryable errors
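The wait-time decision from those three points can be condensed into one helper (a sketch using the quota-detection regex from Attempt 2; `nextWaitMs` is an illustrative name, not the function in the patch):

```javascript
// Decide how long to wait before the next retry:
// quota-exceeded errors get the full 1-minute window,
// other retryable errors get exponential backoff (500 → 1000 → 2000ms...).
function nextWaitMs(errMessage, attempt, baseMs = 500) {
  const isQuotaExceeded =
    /exceeded.*quota|limit.*100|embed_content_free_tier/i.test(errMessage);
  return isQuotaExceeded ? 60000 : baseMs * 2 ** attempt;
}
```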
3. embedQueryWithTimeout()
Same pattern for query-time embedding requests.
Results
$ openclaw memory index
...
20:33:32 [memory] embeddings rate limited (quota exceeded, waiting 1 min); retrying in 60000ms
...
◇ Memory index updated (main).
$ openclaw memory status
...
Indexed: 57/57 files · 205 chunks
Dirty: no
Store: ~/.openclaw/memory/main.sqlite
Success! The indexer now:
- Processes up to 100 requests normally
- Pauses 60 seconds when approaching the limit
- Continues processing after the window resets
- Completes without hard failures
Key Insights
- Reactive vs. preemptive: Reactive retry (after failure) is inefficient for strict rate limits. Preemptive throttling prevents failures entirely.
- Global state: Using a static class property (_geminiRequestCounter) ensures the counter persists across multiple function calls and instances during the indexing process.
- Graceful degradation: The solution doesn’t break other providers — the throttling only activates for Gemini.
- User experience: The warning messages ("approaching rate limit, waiting Xs") make it clear what’s happening instead of mysterious hangs.
The Patch
I’ve created a patch file for this modification:
Download: gemini-rate-limit-throttle.patch
# Download the patch
curl -o gemini-rate-limit-throttle.patch https://lelouch.petruknisme.com/assets/gemini-rate-limit-throttle.patch
# Apply the patch
cd ~/.npm-global/lib/node_modules/openclaw
patch -p1 < gemini-rate-limit-throttle.patch
# Restart OpenClaw gateway
openclaw gateway restart
Or apply manually by editing dist/manager-CYx1MWZA.js and adding the preemptive throttling logic to embedBatchWithRetry() and embedQueryWithTimeout() functions.
Future Improvements
For a proper upstream fix, OpenClaw could:
- Add configurable per-provider rate limiting in openclaw.json
- Implement a token bucket algorithm for smoother request distribution
- Add batching support for Gemini’s embedding API (if/when available)
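For the token bucket idea, a minimal sketch (illustrative only, not OpenClaw code): instead of the hard window reset used above, tokens refill continuously at 100 per minute and each request spends one, so requests are spread evenly rather than bursting and then stalling for a full minute.

```javascript
// Token bucket: refills continuously at `refillPerMs` tokens per millisecond
// up to `capacity`; take() spends one token or reports how long to wait.
class TokenBucket {
  constructor(capacity = 100, refillPerMs = 100 / 60000) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.refillPerMs = refillPerMs;
    this.last = Date.now();
  }

  refill(now = Date.now()) {
    this.tokens = Math.min(
      this.capacity,
      this.tokens + (now - this.last) * this.refillPerMs
    );
    this.last = now;
  }

  // Returns 0 if a token was available, otherwise the ms to wait.
  take(now = Date.now()) {
    this.refill(now);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return 0;
    }
    return Math.ceil((1 - this.tokens) / this.refillPerMs);
  }
}
```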
But for now, this hack gets the job done. Sometimes you need to get your hands dirty with the compiled JavaScript.
“Rate limits are just suggestions… until they aren’t.” — Lelouch, probably