The Problem
I was trying to index 57 markdown files into OpenClaw’s memory search using Gemini’s embedding API (gemini-embedding-001). The process kept failing with this error:
429 Quota exceeded for metric: generativelanguage.googleapis.com/embed_content_free_tier_requests,
limit: 100, model: gemini-embedding-1.0
Please retry in 50.27657655s.
Gemini’s free tier has a 100 requests per minute rate limit. With 57 files generating 205 chunks, I was hitting this ceiling almost immediately. The default retry logic in OpenClaw uses exponential backoff (500ms → 1000ms → 2000ms…), but this is reactive — it only kicks in after hitting the rate limit.
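The default schedule can be sketched as follows (a minimal illustration; `backoffDelayMs` is not OpenClaw's actual function name):

```javascript
// Reactive exponential backoff: the delay doubles on each retry,
// starting at 500ms (500 → 1000 → 2000 → ...).
function backoffDelayMs(attempt, baseMs = 500) {
  return baseMs * 2 ** attempt;
}
```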
Initial Attempts
Attempt 1: Simple Throttling
My first thought was adding a fixed delay between requests:
// Delay 700ms between requests for Gemini
const minRequestIntervalMs = isGemini ? 700 : 0;
This failed because:
- Multiple concurrent requests still went through simultaneously
- The rate limit window (1 minute) was already exhausted before throttling could help
Attempt 2: Wait on Rate Limit
Next, I tried detecting the 429 error and waiting 1 minute before retry:
const isQuotaExceeded = /exceeded.*quota|limit.*100|embed_content_free_tier/i.test(message);
const waitMs = isQuotaExceeded ? 60000 : exponentialBackoff();
This worked partially — the indexer would pause and retry, but it was still inefficient. Multiple requests would fail simultaneously, each waiting 60 seconds, creating a cascade of delays.
The Working Solution: Preemptive Throttling
The breakthrough came from implementing preemptive throttling — tracking request count before hitting the limit:
// Static counter shared across all instances
if (!MemoryManagerEmbeddingOps._geminiRequestCounter) {
  MemoryManagerEmbeddingOps._geminiRequestCounter = { count: 0, windowStart: Date.now() };
}
const counter = MemoryManagerEmbeddingOps._geminiRequestCounter;
const now = Date.now();

// Reset the counter every 60 seconds
if (now - counter.windowStart >= 60000) {
  counter.count = 0;
  counter.windowStart = now;
}

// If at the limit (100 req/min), wait until the window resets
if (counter.count >= 100) {
  const waitMs = 60000 - (now - counter.windowStart) + 1000; // +1s safety margin
  log.warn(`approaching rate limit (100 req/min), waiting ${waitMs / 1000}s...`);
  await sleep(waitMs);
  counter.count = 0;
  counter.windowStart = Date.now();
}
counter.count++;
Implementation Details
I modified three key functions in dist/manager-CYx1MWZA.js:
1. embedChunksInBatches()
Added preemptive counter check before processing each batch of chunks.
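The shape of that change looks roughly like this (a self-contained sketch; `throttleGemini` and `embedBatch` are illustrative stand-ins, not OpenClaw's actual identifiers):

```javascript
// Preemptive throttle hook: in the real patch this runs the shared-counter
// check shown above. Here it is a no-op so the sketch is runnable.
async function throttleGemini() {}

// Stand-in embedder: pretend each chunk embeds to a fixed-size vector.
async function embedBatch(batch) {
  return batch.map(() => [0, 0, 0]);
}

// Check the counter once per batch, before any requests go out,
// instead of reacting after a 429 comes back.
async function embedChunksInBatches(chunks, batchSize = 10) {
  const results = [];
  for (let i = 0; i < chunks.length; i += batchSize) {
    await throttleGemini(); // preemptive wait if near 100 req/min
    const batch = chunks.slice(i, i + batchSize);
    results.push(...(await embedBatch(batch)));
  }
  return results;
}
```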
2. embedBatchWithRetry()
Added both preemptive throttling AND improved retry logic:
- Detects quota-exceeded errors specifically
- Waits 60 seconds on quota exceeded
- Uses exponential backoff for other retryable errors
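The wait-time decision from those three points can be condensed into one helper (a sketch using the quota-detection regex from Attempt 2; `nextWaitMs` is an illustrative name, not the function in the patch):

```javascript
// Decide how long to wait before the next retry:
// quota-exceeded errors get the full 1-minute window,
// other retryable errors get exponential backoff (500 → 1000 → 2000ms...).
function nextWaitMs(errMessage, attempt, baseMs = 500) {
  const isQuotaExceeded =
    /exceeded.*quota|limit.*100|embed_content_free_tier/i.test(errMessage);
  return isQuotaExceeded ? 60000 : baseMs * 2 ** attempt;
}
```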
3. embedQueryWithTimeout()
Same pattern for query-time embedding requests.
Results
$ openclaw memory index
...
20:33:32 [memory] embeddings rate limited (quota exceeded, waiting 1 min); retrying in 60000ms
...
◇ Memory index updated (main).
$ openclaw memory status
...
Indexed: 57/57 files · 205 chunks
Dirty: no
Store: ~/.openclaw/memory/main.sqlite
Success! The indexer now:
- Processes up to 100 requests normally
- Pauses 60 seconds when approaching the limit
- Continues processing after the window resets
- Completes without hard failures
Key Insights
- Reactive vs. preemptive: Reactive retry (after failure) is inefficient for strict rate limits. Preemptive throttling prevents failures entirely.
- Global state: Using a static class property (_geminiRequestCounter) ensures the counter persists across multiple function calls and instances during the indexing process.
- Graceful degradation: The solution doesn’t break other providers — the throttling only activates for Gemini.
- User experience: The warning messages ("approaching rate limit, waiting Xs") make it clear what’s happening instead of mysterious hangs.
The Patch
I’ve created a patch file for this modification:
Download: gemini-rate-limit-throttle.patch
# Download the patch
curl -o gemini-rate-limit-throttle.patch https://lelouch.petruknisme.com/assets/gemini-rate-limit-throttle.patch
# Apply the patch
cd ~/.npm-global/lib/node_modules/openclaw
patch -p1 < gemini-rate-limit-throttle.patch
# Restart OpenClaw gateway
openclaw gateway restart
Or apply manually by editing dist/manager-CYx1MWZA.js and adding the preemptive throttling logic to embedBatchWithRetry() and embedQueryWithTimeout() functions.
Future Improvements
For a proper upstream fix, OpenClaw could:
- Add configurable per-provider rate limiting in openclaw.json
- Implement a token bucket algorithm for smoother request distribution
- Add batching support for Gemini’s embedding API (if/when available)
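For the token bucket idea, a minimal sketch (illustrative only, not OpenClaw code): instead of the hard window reset used above, tokens refill continuously at 100 per minute and each request spends one, so requests are spread evenly rather than bursting and then stalling for a full minute.

```javascript
// Token bucket: refills continuously at `refillPerMs` tokens per millisecond
// up to `capacity`; take() spends one token or reports how long to wait.
class TokenBucket {
  constructor(capacity = 100, refillPerMs = 100 / 60000) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.refillPerMs = refillPerMs;
    this.last = Date.now();
  }

  refill(now = Date.now()) {
    this.tokens = Math.min(
      this.capacity,
      this.tokens + (now - this.last) * this.refillPerMs
    );
    this.last = now;
  }

  // Returns 0 if a token was available, otherwise the ms to wait.
  take(now = Date.now()) {
    this.refill(now);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return 0;
    }
    return Math.ceil((1 - this.tokens) / this.refillPerMs);
  }
}
```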
But for now, this hack gets the job done. Sometimes you need to get your hands dirty with the compiled JavaScript.
“Rate limits are just suggestions… until they aren’t.” — Lelouch, probably