Rate Limiting
Upstash-style rate limiting with Convex-first storage and middleware-friendly DX.
In this guide, we'll set up kitcn/ratelimit with an Upstash-parity API, Convex-first storage, and a UI-friendly hook.
We'll keep it practical: a local ratelimit plugin, middleware wiring, algorithm choices, and client-side button disabling with live countdowns.
Approach
We'll build this in four steps so you can ship quickly and keep things maintainable:
- Scaffold the local schema extension so internal storage tables exist.
- Add one local ratelimit plugin that owns buckets and policy.
- Wire that plugin into middleware so handlers stay focused.
- Add optional client-side UX with `useRatelimit()` for disabled states and retry timers.
We'll end with a full API reference so you can tune behavior without guesswork.
Why this package
kitcn/ratelimit is designed as a hard cutover from component-driven APIs.
| What you get | Why it matters |
|---|---|
| Upstash-style API | Easier migration if your team already knows Upstash (limit, check, getRemaining, resetUsedTokens) |
| Convex-first tables | No component registration and no component migration path to manage |
| Read dedupe helpers | Common repeated reads may reuse cached results and reduce duplicate DB fetches |
| React hook support | hookAPI() + useRatelimit() gives accurate client countdown and button states |
| Fail-closed default | Safer behavior under pressure (failureMode: "closed") |
Install
```sh
bun add kitcn
```

Scaffold the starter (required)
Rate limiting is opt-in, so scaffold the full starter once:
```sh
npx kitcn add ratelimit
```

That creates:

- `convex/lib/plugins/ratelimit/schema.ts`
- `convex/lib/plugins/ratelimit/plugin.ts`

and registers `ratelimitExtension()` in `convex/functions/schema.ts`.
```ts
import { defineSchema } from 'kitcn/orm';
import { ratelimitExtension } from '../lib/plugins/ratelimit/schema';

export const tables = {
  // your tables...
};

export default defineSchema(tables).extend(ratelimitExtension());
```

Create a local ratelimit plugin
Scaffold a local plugin and keep one default bucket for every mutation. Add named buckets only when a procedure genuinely needs stricter behavior.
```ts
import { getSessionNetworkSignals } from 'kitcn/auth';
import { MINUTE, Ratelimit, RatelimitPlugin } from 'kitcn/ratelimit';
import type { MutationCtx } from '../../../functions/generated/server';
import type { Select } from '../../../shared/api';

const fixed = (rate: number) => Ratelimit.fixedWindow(rate, MINUTE);

export const ratelimitBuckets = {
  default: {
    public: fixed(30),
    free: fixed(60),
    premium: fixed(200),
  },
} as const;

type RatelimitTier = keyof (typeof ratelimitBuckets)['default'];
export type RatelimitBucket = keyof typeof ratelimitBuckets;

type RatelimitUser = {
  id: string;
  isAdmin?: boolean;
  plan?: 'premium' | 'team' | null;
  session?: Select<'session'> | null;
};

type RatelimitCtx = MutationCtx & {
  user?: RatelimitUser | null;
};

type RatelimitMeta = {
  ratelimit?: RatelimitBucket;
};

export function getUserTier(user: RatelimitUser | null): RatelimitTier {
  if (!user) return 'public';
  if (user.isAdmin || user.plan) return 'premium';
  return 'free';
}

export const ratelimit = RatelimitPlugin.configure({
  buckets: ratelimitBuckets,
  getBucket: ({ meta }: { meta: RatelimitMeta }) => meta.ratelimit ?? 'default',
  getUser: ({ ctx }: { ctx: RatelimitCtx }) => ctx.user ?? null,
  getIdentifier: ({ user }: { user: RatelimitUser | null }) =>
    user?.id ?? 'anonymous',
  getTier: getUserTier,
  getSignals: ({ ctx, user }: { ctx: RatelimitCtx; user: RatelimitUser | null }) =>
    getSessionNetworkSignals(ctx, user?.session ?? null),
  prefix: ({ bucket, tier }) => `ratelimit:${bucket}:${tier}`,
  failureMode: 'closed',
  enableProtection: true,
  denyListThreshold: 30,
});
```

`getSessionNetworkSignals()` returns `{}` when no session exists and fills `ip` / `userAgent` when Better Auth session data is available.
Wire it into middleware
Apply the plugin once in your mutation builders. The default bucket covers normal writes. meta.ratelimit is an optional named-bucket override.
```ts
import { ratelimit, type RatelimitBucket } from './plugins/ratelimit/plugin';

const c = initCRPC
  .meta<{ ratelimit?: RatelimitBucket }>()
  .create();

export const publicMutation = c.mutation.use(ratelimit.middleware());
```

Normal writes use the default bucket:
```ts
export const createTodo = authMutation
  .input(z.object({ title: z.string().min(1) }))
  .mutation(async ({ ctx, input }) => {
    // business logic
  });
```

Add a named bucket only for exceptions:
```ts
export const stressTest = publicMutation
  // assumes an 'interactive' bucket has been added to ratelimitBuckets
  .meta({ ratelimit: 'interactive' })
  .input(z.object({ id: z.string() }))
  .mutation(async ({ ctx, input }) => {
    // stricter bucket
  });
```

Choose your algorithm
Start simple and pick based on workload shape.
Fixed window
Best when hard windows are acceptable. Tokens reset at the start of each window.
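The bookkeeping behind this is simple. As a minimal illustrative sketch (not the library's internals; it ignores shards, capacity, and reservations): derive a window index from the timestamp, reset the count whenever the index changes, and reject once the count would exceed the limit.

```typescript
// Illustrative sketch of fixed-window accounting (not kitcn/ratelimit internals).
type FixedWindowState = { windowIndex: number; used: number };

function fixedWindowAllow(
  state: FixedWindowState,
  now: number,
  limit: number,
  windowMs: number,
  count = 1,
): { ok: boolean; state: FixedWindowState; reset: number } {
  const windowIndex = Math.floor(now / windowMs);
  // Entering a new window resets the used-token count.
  const used = windowIndex === state.windowIndex ? state.used : 0;
  const ok = used + count <= limit;
  return {
    ok,
    state: { windowIndex, used: ok ? used + count : used },
    // Tokens replenish at the start of the next window.
    reset: (windowIndex + 1) * windowMs,
  };
}
```

The hard reset at each window boundary is exactly the trade-off called out above: cheap and predictable, but bursty at the edges.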
```ts
const limiter = new Ratelimit({
  db: ctx.db,
  prefix: 'post:create',
  limiter: Ratelimit.fixedWindow(10, '1 m'),
});
```

Sliding window
Best when you want smoother request shaping without hard resets. Weighs the previous window proportionally so you don't get bursts at window boundaries.
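The proportional weighting can be sketched like this (illustrative only, not the library's internals): the previous window's count is scaled by how much of it still overlaps the sliding window, so the effective count decays smoothly instead of dropping to zero at a boundary.

```typescript
// Illustrative sketch of sliding-window weighting (not kitcn/ratelimit internals).
function slidingWindowCount(
  prevCount: number, // requests counted in the previous fixed window
  currCount: number, // requests counted in the current fixed window
  now: number,
  windowMs: number,
): number {
  const elapsed = now % windowMs; // time elapsed in the current window
  // Fraction of the previous window still inside the sliding window.
  const prevWeight = (windowMs - elapsed) / windowMs;
  return prevCount * prevWeight + currCount;
}
```

Halfway into the current window, only half of the previous window's requests still count toward the limit.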
```ts
const limiter = new Ratelimit({
  db: ctx.db,
  prefix: 'search',
  limiter: Ratelimit.slidingWindow(50, '1 m'),
});
```

Token bucket
Best for burst-friendly throughput with long-term control. Tokens refill at a steady rate up to maxTokens. Use maxReserved to allow requests to "borrow" from future tokens when the bucket is empty.
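The refill arithmetic can be pictured as follows (an illustrative sketch under the standard token-bucket definition, not the library's internals; reservation via maxReserved is omitted): tokens accrue continuously at refillRate per interval, capped at maxTokens.

```typescript
// Illustrative sketch of token-bucket refill (not kitcn/ratelimit internals).
type BucketState = { tokens: number; ts: number };

function tokenBucketTake(
  state: BucketState,
  now: number,
  refillRate: number,
  intervalMs: number,
  maxTokens: number,
  count = 1,
): { ok: boolean; state: BucketState } {
  // Refill proportionally to elapsed time, capped at bucket capacity.
  const refilled = ((now - state.ts) / intervalMs) * refillRate;
  const tokens = Math.min(maxTokens, state.tokens + refilled);
  const ok = tokens >= count;
  return { ok, state: { tokens: ok ? tokens - count : tokens, ts: now } };
}
```

Because tokens accumulate up to maxTokens during idle periods, short bursts are absorbed while the long-term rate stays bounded by refillRate.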
```ts
const limiter = new Ratelimit({
  db: ctx.db,
  prefix: 'llm:tokens',
  limiter: Ratelimit.tokenBucket(1000, '1 m', 1000, { maxReserved: 3000 }),
});
```

Algorithm options
All three algorithm builders accept an optional options object as the last argument.
| Option | Type | Default | Description |
|---|---|---|---|
| shards | number | 1 | Number of shards for write distribution. Higher values reduce contention at the cost of less precise counts (see Sharding). |
| maxReserved | number | undefined | Maximum tokens a request can "borrow" from future capacity. Only applies to fixedWindow and tokenBucket. Not supported by slidingWindow. |
| capacity | number | limit | Maximum stored tokens. Only applies to fixedWindow. Useful when you want a higher burst capacity than the per-window refill. |
| start | number | 0 | Epoch offset (ms) for window alignment. Only applies to fixedWindow. Aligns windows to a custom origin instead of epoch zero. |
Duration formats
Every window or interval parameter accepts a Duration — either a raw millisecond number or a human-readable string.
String format: "<number> <unit>" or "<number><unit>". Both '1 m' and '1m' work.
| Unit | Meaning | Example |
|---|---|---|
| ms | milliseconds | '500 ms' |
| s | seconds | '30 s' |
| m | minutes | '1 m' |
| h | hours | '1 h' |
| d | days | '1 d' |
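A parser for this format fits in a few lines. This is an illustrative sketch of the documented grammar (the real parsing lives inside kitcn/ratelimit):

```typescript
// Illustrative Duration parser for the documented "<number> <unit>" format.
type Duration = number | string;

const UNIT_MS: Record<string, number> = {
  ms: 1,
  s: 1_000,
  m: 60_000,
  h: 3_600_000,
  d: 86_400_000,
};

function toMs(duration: Duration): number {
  if (typeof duration === 'number') return duration; // raw milliseconds
  // Accept both '1 m' and '1m' (optional single space before the unit).
  const match = /^(\d+(?:\.\d+)?)\s?(ms|s|m|h|d)$/.exec(duration.trim());
  if (!match) throw new Error(`Invalid duration: ${duration}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}
```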
You can also use the pre-defined constants from kitcn/ratelimit:
```ts
import { SECOND, MINUTE, HOUR, DAY, WEEK } from 'kitcn/ratelimit';

Ratelimit.fixedWindow(100, MINUTE); // 60_000 ms
Ratelimit.slidingWindow(50, 30 * SECOND); // 30_000 ms
Ratelimit.tokenBucket(10, HOUR, 100); // 3_600_000 ms
```

Done. You now have deterministic, application-layer limits with one API surface.
Add a client-side limiter UX
Server enforcement is mandatory. Client checks are for better UX — disabled buttons, countdowns, and retry hints.
Expose the hook API
First, export the hook API from a Convex file. The hookAPI() method returns a getRatelimit query and a getServerTime mutation that the React hook consumes.
```ts
import { Ratelimit } from 'kitcn/ratelimit';

const limiter = new Ratelimit({
  limiter: Ratelimit.fixedWindow(3, '30 s'),
});

export const { getRatelimit, getServerTime } = limiter.hookAPI({
  identifier: async (_ctx, fromClient) => fromClient ?? 'anonymous',
  sampleShards: 1,
});
```

The identifier option can be a static string, or an async callback that receives (ctx, fromClient). Use the callback to resolve the identifier server-side (e.g. from auth) while still accepting a client-provided fallback.
sampleShards controls how many shards to read when estimating the remaining count. Set it to 1 for low-cost reads, or increase it for more accurate estimates on high-shard configs.
Use the React hook
Then wire it up in your component with useRatelimit:
```ts
import { useRatelimit } from 'kitcn/ratelimit/react';

const ratelimitRef = 'ratelimitDemo:getInteractiveRatelimit' as const;
const serverTimeRef = 'ratelimitDemo:getInteractiveServerTime' as const;

const { status, check } = useRatelimit(ratelimitRef, {
  identifier: sessionId,
  count: 1,
  getServerTimeMutation: serverTimeRef,
});

const blocked = status?.ok === false;
const retryAt = status?.retryAt;
```

useRatelimit accepts either:
- a Convex function path string (`'module:functionName'`), which is what the `/ratelimit` demo uses.
- a generated `FunctionReference` from `api`.
The hook returns:
| Field | Type | Description |
|---|---|---|
| status | HookStatus \| undefined | undefined while loading. { ok: true } when allowed, { ok: false, retryAt: number } when blocked. Auto-updates when retryAt passes. |
| check | (ts?, count?) => HookCheckValue \| undefined | Manual projection function. Call it with a timestamp and count to get a precise snapshot for custom gauges or progress bars. |
The HookCheckValue returned by check() has this shape:
| Field | Type | Description |
|---|---|---|
| value | number | Projected remaining tokens (negative means over-limit) |
| ts | number | Timestamp of the projection (client time) |
| config | ResolvedAlgorithm | The algorithm config for further calculations |
| shard | number | Which shard was sampled |
| ok | boolean | true when value >= 0 |
| retryAt | number \| undefined | Client timestamp when tokens become available |
If you need precise projected values (for custom gauges), call check(ts, count).
Protection and deny lists
When enableProtection is on, the limiter tracks repeated failures per identifier, IP, user-agent, and country. Once a value reaches denyListThreshold, it gets blocked for 24 hours — without even checking the database.
You can also provide static deny lists to block known bad actors immediately.
```ts
const limiter = new Ratelimit({
  db: ctx.db,
  prefix: 'api',
  limiter: Ratelimit.fixedWindow(100, '1 m'),
  failureMode: 'closed',
  enableProtection: true,
  denyListThreshold: 30,
  denyList: {
    identifiers: ['known-bad-user-id'],
    ips: ['203.0.113.0'],
    userAgents: ['BadBot/1.0'],
    countries: ['XX'],
  },
});
```

To trigger deny-list matching on request metadata, pass ip, userAgent, or country in the limit() call:
```ts
const result = await limiter.limit(userId, {
  ip: request.headers.get('x-forwarded-for') ?? undefined,
  userAgent: request.headers.get('user-agent') ?? undefined,
  country: request.headers.get('x-country') ?? undefined,
});
```

Important: Deny-list state is in-memory and non-durable. It can survive across warm runtime requests, but is lost on cold starts/deploys. For persistent blocking, use an external deny list or database-backed blocklist.
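The threshold mechanic can be pictured as a per-value failure counter with a 24-hour block once the threshold is reached. This is an illustrative sketch, not the limiter's actual implementation; class and method names here are hypothetical.

```typescript
// Illustrative sketch of threshold-based deny-listing (not kitcn/ratelimit internals).
const DAY_MS = 24 * 60 * 60 * 1000;

class DenyTracker {
  private failures = new Map<string, number>();
  private blockedUntil = new Map<string, number>();
  private threshold: number;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  isBlocked(value: string, now: number): boolean {
    const until = this.blockedUntil.get(value);
    return until !== undefined && now < until;
  }

  // Record a rate-limit failure; block the value for 24h once it hits the threshold.
  recordFailure(value: string, now: number): void {
    const n = (this.failures.get(value) ?? 0) + 1;
    this.failures.set(value, n);
    if (n >= this.threshold) this.blockedUntil.set(value, now + DAY_MS);
  }
}
```

In the real limiter this tracking applies independently per identifier, IP, user-agent, and country, and the blocked check runs before any database read.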
Dynamic limits
Dynamic limits let you change rate limits at runtime — useful for feature flags, admin overrides, or gradual rollouts. Enable them with dynamicLimits: true in the constructor.
```ts
const limiter = new Ratelimit({
  db: ctx.db,
  prefix: 'api:search',
  limiter: Ratelimit.fixedWindow(100, '1 m'),
  dynamicLimits: true,
});
```

Then use setDynamicLimit to override the configured limit at runtime:
```ts
// Double the limit during a sale
await limiter.setDynamicLimit({ limit: 200 });

// Read the current override
const { dynamicLimit } = await limiter.getDynamicLimit();
// dynamicLimit === 200

// Remove the override (reverts to configured limit)
await limiter.setDynamicLimit({ limit: false });
```

The dynamic limit overrides the limit field of the algorithm. For token bucket, it overrides refillRate (and maxTokens if they were originally equal).
Limits and mitigations you should know
Important: This is application-layer limiting. It protects business logic and expensive downstream work, but it is not a network firewall or DDoS shield.
Recommended production posture:
- Enforce auth early and reject fast.
- Protect anonymous flows with captcha + validated session IDs.
- Put network-layer controls (Cloudflare or equivalent) in front when IP-based mitigation is required.
- Alert on request spikes and fail safely (`failureMode: 'closed'` by default).
API Reference
Constructor options
Create a Ratelimit instance with a config object:
```ts
const limiter = new Ratelimit(config: RatelimitConfig);
```

| Option | Type | Default | Description |
|---|---|---|---|
| db | ctx.db | — | Convex database context. Required for limit, check, getRemaining, getValue, resetUsedTokens, setDynamicLimit, getDynamicLimit. Not needed for hookAPI() (it receives db from the query/mutation context). |
| limiter | ResolvedAlgorithm | — | Required. Algorithm created by Ratelimit.fixedWindow(), Ratelimit.slidingWindow(), or Ratelimit.tokenBucket(). |
| prefix | string | 'kitcn/ratelimit' | Namespaces stored state in the database. Use unique prefixes for different rate limit scopes. |
| dynamicLimits | boolean | false | Enables setDynamicLimit() / getDynamicLimit(). |
| failureMode | 'closed' \| 'open' | 'closed' | Behavior on timeout. 'closed' rejects, 'open' allows. |
| timeout | number | 5000 | Milliseconds before triggering failureMode behavior. |
| enableProtection | boolean | false | Enables deny-list tracking on repeated failures. |
| denyListThreshold | number | 30 | Consecutive failures before an identifier is blocked (24h). Requires enableProtection: true. |
| denyList | ProtectionLists | undefined | Static deny lists. See Protection and deny lists. |
| ephemeralCache | Map<string, number> \| false | new Map() | In-memory block cache. Shared across requests in the same Convex invocation. Pass false to disable. |
Algorithm builders
All builders are available as static methods on Ratelimit.
Ratelimit.fixedWindow(limit, window, options?)
```ts
fixedWindow(limit: number, window: Duration, options?: AlgorithmOptions): FixedWindowAlgorithm
```

| Parameter | Type | Description |
|---|---|---|
| limit | number | Tokens replenished per window |
| window | Duration | Window length (number in ms, or string like '1 m') |
| options.shards | number | Write distribution shards (default 1) |
| options.maxReserved | number | Max tokens that can be borrowed from future windows |
| options.capacity | number | Max stored tokens (default = limit) |
| options.start | number | Epoch offset for window alignment |
Ratelimit.slidingWindow(limit, window, options?)
```ts
slidingWindow(limit: number, window: Duration, options?: AlgorithmOptions): SlidingWindowAlgorithm
```

| Parameter | Type | Description |
|---|---|---|
| limit | number | Max requests in the sliding window |
| window | Duration | Window length |
| options.shards | number | Write distribution shards (default 1) |
Note: reserve is not supported with sliding window. The algorithm needs both current and previous window counts, which makes reservation impractical.
Ratelimit.tokenBucket(refillRate, interval, maxTokens, options?)
```ts
tokenBucket(refillRate: number, interval: Duration, maxTokens: number, options?: AlgorithmOptions): TokenBucketAlgorithm
```

| Parameter | Type | Description |
|---|---|---|
| refillRate | number | Tokens added per interval |
| interval | Duration | Refill interval |
| maxTokens | number | Maximum bucket capacity |
| options.shards | number | Write distribution shards (default 1) |
| options.maxReserved | number | Max tokens that can be borrowed from future refills |
Core methods
limit(identifier, options?)
Consume tokens and return a response. This is the primary method for enforcing rate limits.
```ts
limit(identifier: string, options?: LimitRequest): Promise<RatelimitResponse>
```

check(identifier, options?)
Evaluate without consuming tokens. Use this for read-only checks (e.g. showing a warning before the user submits).
```ts
check(identifier: string, options?: CheckRequest): Promise<RatelimitResponse>
```

getRemaining(identifier)
Return the remaining tokens, reset time, and limit for an identifier.
```ts
getRemaining(identifier: string): Promise<RemainingResponse>
```

getValue(identifier, options?)
Return a raw snapshot for custom projections and UI calculations.
```ts
getValue(identifier: string, options?: { sampleShards?: number }): Promise<RatelimitSnapshot>
```

resetUsedTokens(identifier)
Clear all stored state for an identifier. Useful for admin resets.
```ts
resetUsedTokens(identifier: string): Promise<void>
```

setDynamicLimit(options)
Override the configured limit at runtime. Pass { limit: false } to remove the override. Requires dynamicLimits: true.
```ts
setDynamicLimit(options: { limit: number | false }): Promise<void>
```

getDynamicLimit()
Read the current dynamic override. Returns { dynamicLimit: number | null }. Requires dynamicLimits: true.
```ts
getDynamicLimit(): Promise<DynamicLimitResponse>
```

hookAPI(options?)
Export a getRatelimit query and getServerTime mutation for the React hook.
```ts
hookAPI(options?: HookAPIOptions): {
  getRatelimit: FunctionReference<'query'>;
  getServerTime: FunctionReference<'mutation'>;
}
```

Request options
LimitRequest
Pass these options to limit() to customize behavior per-call.
| Field | Type | Default | Description |
|---|---|---|---|
| rate | number | 1 | Alias for count. Tokens to consume. |
| count | number | 1 | Tokens to consume. Takes precedence if both rate and count are set. |
| reserve | boolean | false | Allow borrowing from future capacity (up to maxReserved). Not supported by slidingWindow. |
| ip | string | — | IP address for deny-list matching |
| userAgent | string | — | User-agent for deny-list matching |
| country | string | — | Country code for deny-list matching |
| geo | unknown | — | Reserved for future geo-based rules |
CheckRequest
Same fields as LimitRequest, but since check() is read-only, no tokens are consumed.
Response types
RatelimitResponse
Returned by limit() and check().
| Field | Type | Description |
|---|---|---|
| success | boolean | true if the request was allowed |
| ok | boolean | Alias for success (Convex DX parity) |
| limit | number | Maximum tokens for this algorithm |
| remaining | number | Tokens left after this request (floored to 0) |
| reset | number | Epoch ms when tokens will be available |
| pending | Promise<unknown> | Resolves when async side-effects complete |
| reason | 'timeout' \| 'cacheBlock' \| 'denyList' | Present when a reason applies. Note: failureMode: 'open' can return success: true with reason: 'timeout'. |
| deniedValue | string | Present only when reason === 'denyList'. The value that triggered the block. |
RemainingResponse
Returned by getRemaining().
| Field | Type | Description |
|---|---|---|
| remaining | number | Tokens available |
| reset | number | Epoch ms of next replenishment |
| limit | number | Maximum tokens |
RatelimitSnapshot
Returned by getValue(). Used for custom projections and the React hook.
| Field | Type | Description |
|---|---|---|
| value | number | Current token count |
| ts | number | Timestamp of last state update |
| shard | number | Which shard was read |
| config | ResolvedAlgorithm | Full algorithm config for calculateRatelimit() |
Hook API
HookAPIOptions
Options for hookAPI().
| Field | Type | Default | Description |
|---|---|---|---|
| identifier | string \| (ctx, fromClient?) => string \| Promise<string> | — | How to resolve the identifier. A string uses it directly. A callback receives the Convex context and the optional client-provided identifier. |
| sampleShards | number | 1 | How many shards to sample when reading. Higher = more accurate, more reads. |
UseRatelimitOptions
Options for the useRatelimit() React hook.
```ts
useRatelimit(
  getRatelimitValueQuery: FunctionReference<'query'> | string,
  options?: UseRatelimitOptions
)
```

| Field | Type | Default | Description |
|---|---|---|---|
| identifier | string | — | Passed to the getRatelimit query |
| count | number | 1 | Tokens to project for status calculation |
| sampleShards | number | — | Override sampleShards from the hook API |
| getServerTimeMutation | FunctionReference \| string | — | Enables clock-skew correction between client and server |
Time constants
Pre-defined millisecond constants exported from kitcn/ratelimit:
| Constant | Value |
|---|---|
| SECOND | 1_000 |
| MINUTE | 60_000 |
| HOUR | 3_600_000 |
| DAY | 86_400_000 |
| WEEK | 604_800_000 |
Internal tables
The rate limiter stores state in three local schema keys. These come from your local convex/lib/plugins/ratelimit/schema.ts extension, so do not define these keys twice. The underlying Convex storage table names stay underscored.
| Schema key | Purpose |
|---|---|
| ratelimitState | Per-identifier, per-shard token state |
| ratelimitDynamicLimit | Dynamic limit overrides per prefix |
| ratelimitProtectionHit | Protection tracking (hits, blocks) per prefix |
Advanced notes
calculateRatelimit
The calculateRatelimit function is exported for custom projections and UI calculations. It takes a state snapshot, algorithm config, current timestamp, and count, and returns the evaluated result without touching the database.
```ts
import { calculateRatelimit } from 'kitcn/ratelimit';

const result = calculateRatelimit(
  { value: 8, ts: Date.now() - 30_000 },
  Ratelimit.fixedWindow(10, '1 m'),
  Date.now(),
  1
);
// result.remaining, result.reset, result.retryAfter
```

Sharding
When shards > 1, each limit() call picks a random shard (or two, using power-of-two-choices when shards >= 3) to reduce write contention. The trade-off: reads (check, getRemaining, getValue) only sample a subset of shards, so remaining counts are approximate. For most use cases, shards: 1 (the default) is fine. Increase shards only when you see write contention on hot identifiers.
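Power-of-two-choices can be sketched as follows (illustrative only, not the library's internals): sample two random shards and write to the less-loaded of the two, which keeps load markedly more even than a single random pick.

```typescript
// Illustrative power-of-two-choices shard selection (not kitcn/ratelimit internals).
function pickShard(
  shardLoads: number[],                 // used tokens per shard
  random: () => number = Math.random,   // injectable for testing
): number {
  const n = shardLoads.length;
  if (n < 3) return Math.floor(random() * n); // plain random pick for few shards
  const a = Math.floor(random() * n);
  const b = Math.floor(random() * n);
  // Prefer the less-loaded of the two sampled shards.
  return shardLoads[a] <= shardLoads[b] ? a : b;
}
```

Reads still have to sample shards the same way, which is why remaining counts become approximate as the shard count grows.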
Ephemeral cache
The ephemeral block cache is an in-memory Map<string, number> that caches "blocked until" timestamps. When a limit() call fails, subsequent calls for the same identifier skip the database read entirely until the block expires. The cache is per-Ratelimit instance and resets on each Convex function invocation. Pass ephemeralCache: false to disable it, or pass a shared Map across multiple Ratelimit instances to share the cache.
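Conceptually the cache is just a map from cache key to a block-expiry timestamp. A sketch of the short-circuit logic (illustrative; function names here are hypothetical, not the library's API):

```typescript
// Illustrative ephemeral block cache (not kitcn/ratelimit internals).
const ephemeralCache = new Map<string, number>(); // key -> blocked-until (epoch ms)

function isEphemerallyBlocked(key: string, now: number): boolean {
  const blockedUntil = ephemeralCache.get(key);
  if (blockedUntil === undefined) return false;
  if (now >= blockedUntil) {
    ephemeralCache.delete(key); // block expired; fall through to the real check
    return false;
  }
  return true; // still blocked: skip the database read entirely
}

function recordBlock(key: string, reset: number): void {
  ephemeralCache.set(key, reset); // remember when tokens become available again
}
```

Because the map lives only for the current invocation, it saves repeated database reads within one request burst without introducing any durable state.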
ok alias
The response includes both success and ok. They are always identical. ok exists for Convex DX parity with patterns like if (!result.ok) throw ....