API Rate Limiting: Protect Your SaaS from Abuse
Rate limiting controls how many requests a client can make to your API in a given time window. Common strategies are token bucket, fixed window, and sliding window. Start with per-user limits based on your API's capacity, then add per-IP limits as a secondary layer for unauthenticated endpoints. Always return 429 Too Many Requests with a Retry-After header so clients can back off gracefully.
Why Rate Limiting Matters for Your SaaS
Without rate limiting, a single misbehaving client can degrade performance for everyone. Common abuse scenarios include brute-force login attempts, scraping your entire product catalog, or repeatedly hitting a costly search endpoint. Rate limiting protects your infrastructure, keeps response times predictable, and prevents runaway cloud bills.
It also helps you maintain fair usage across free and paid tiers. For example, you might give free-tier users 100 requests per hour and paid users 10,000 requests per hour. Enforcing these limits programmatically stops one customer from accidentally or maliciously consuming resources meant for others.
Three Common Rate Limiting Algorithms
Choose an algorithm based on your traffic patterns and how strict you need the limit to be.
- Fixed Window: Count requests in a fixed time bucket, e.g., 100 requests per hour. Simple to implement, but can allow bursts at the window boundary. Example: reset counter every hour on the hour.
- Sliding Window: Tracks requests in a rolling time window, e.g., the last 60 minutes. More accurate, but requires storing timestamps per client. Example: using a sorted set in Redis with timestamps as scores.
- Token Bucket: A bucket holds tokens that refill at a constant rate. Each request consumes one token. Allows short bursts up to the bucket size. Example: 10 tokens with a refill of 1 token per second gives a sustained rate of 1 req/s with bursts up to 10.
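As a minimal sketch of the token bucket described in the list above, the following Python class uses the same numbers as the example (10 tokens, refilled at 1 token per second). The class name `TokenBucket`, its parameters, and the injectable `clock` are illustrative assumptions, not part of any particular library, and the sketch tracks a single client in one process.

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `refill_rate` tokens per second."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock              # injectable so tests can control time
        self.tokens = float(capacity)   # start full so an initial burst is allowed
        self.last_refill = clock()

    def allow(self):
        """Consume one token and return True, or return False if none are left."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Add the tokens earned since the last call, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Usage with a fake clock so the behavior is deterministic.
fake_now = [0.0]
bucket = TokenBucket(capacity=10, refill_rate=1.0, clock=lambda: fake_now[0])

burst = [bucket.allow() for _ in range(11)]      # 10 allowed, the 11th rejected
fake_now[0] += 3.0                               # three seconds pass, 3 tokens refill
after_wait = [bucket.allow() for _ in range(4)]  # 3 allowed, the 4th rejected
```

In a real service each client would need its own bucket state, typically kept in a shared store rather than in process memory.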
For most SaaS APIs, sliding window or token bucket is a good choice. Fixed window is simpler, but a client can send up to twice the limit by bursting just before and just after a window boundary.
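To make the boundary weakness of fixed windows concrete, here is a small single-client comparison, assuming a limit of 100 requests per hour. Both classes and the `count_allowed` helper are hypothetical names written for this illustration; the sliding window variant keeps a log of timestamps, which is one of several ways to implement it.

```python
from collections import deque


class FixedWindowLimiter:
    """Counts requests per fixed bucket; the counter resets at each boundary."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.window_id = None
        self.count = 0

    def allow(self, now):
        window_id = int(now // self.window_seconds)
        if window_id != self.window_id:
            # A new window has started, so the counter starts over.
            self.window_id = window_id
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


class SlidingWindowLimiter:
    """Keeps one timestamp per accepted request within the rolling window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def allow(self, now):
        # Drop timestamps that have aged out of the rolling window.
        while self.timestamps and self.timestamps[0] <= now - self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False


def count_allowed(limiter, times):
    """Replay request times (in seconds) and count how many are accepted."""
    return sum(limiter.allow(t) for t in times)


# 100 requests one second before the hour boundary, then 100 right after it.
burst_times = [3599.0] * 100 + [3600.0] * 100
fixed_allowed = count_allowed(FixedWindowLimiter(100, 3600), burst_times)
sliding_allowed = count_allowed(SlidingWindowLimiter(100, 3600), burst_times)
```

Under these assumptions the fixed window accepts all 200 requests within about one second, while the sliding window accepts only the first 100.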
Where to Apply Limits: Per User vs Per IP
Per-user limits are the primary strategy for authenticated endpoints. Use the API key or session token to identify the client. This ties limits to a specific account, so abusive behavior can be traced and blocked without affecting other users behind the same NAT.
Per-IP limits are a secondary layer for unauthenticated endpoints (like login or signup). They stop abuse coming from a single source, but they are weaker against distributed attacks spread across many IPs, and they can accidentally block legitimate users sharing a public IP (office, coffee shop). For general traffic, set per-IP limits higher than per-user limits to reduce false positives; sensitive endpoints such as login can carry a tighter, dedicated limit.
Example: allow 1000 requests per hour per user across the API, and 100 login attempts per hour per IP.
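One way to express the per-user versus per-IP choice in code is to pick the limiter key and quota from the request itself. The sketch below assumes a hypothetical `Request` shape and key format, and reuses the numbers from the example above; a real application would read these values from its own framework and configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Request:
    """Hypothetical request shape; real frameworks expose these differently."""
    path: str
    client_ip: str
    api_key: Optional[str] = None


@dataclass(frozen=True)
class LimitRule:
    key: str             # counter key in the rate limit store
    limit: int           # maximum requests per window
    window_seconds: int


def select_rule(request):
    """Prefer the account identity; fall back to the IP for anonymous traffic."""
    if request.api_key is not None:
        # Authenticated: the limit follows the account, not the network address.
        return LimitRule(f"user:{request.api_key}", 1000, 3600)
    # Unauthenticated (e.g. login): limit per IP and per endpoint.
    return LimitRule(f"ip:{request.client_ip}:{request.path}", 100, 3600)


authed = select_rule(Request("/v1/items", "203.0.113.7", api_key="key_abc"))
anonymous = select_rule(Request("/login", "203.0.113.7"))
```

Keying on an internal account ID rather than the raw API key would keep secrets out of the store; the raw key is used here only to keep the sketch short.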
How to Communicate Rate Limits to Clients
Always return HTTP 429 when a client exceeds the limit. Include these response headers so clients can adjust their behavior:
| Header | Example Value | Purpose |
|---|---|---|
| Retry-After | 3600 | Seconds until the limit resets |
| X-RateLimit-Limit | 1000 | Maximum requests allowed in the window |
| X-RateLimit-Remaining | 42 | Requests left in the current window |
| X-RateLimit-Reset | 1700000000 | Unix timestamp when the window resets |
Document these headers in your API docs so developers can build retry logic. A well-behaved client will back off and retry after the Retry-After time.
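The headers in the table can be assembled from three values the limiter already knows: the limit, the remaining count, and the reset time. The following is a framework-neutral sketch; the function names and the `(status, headers)` return shape are assumptions made for illustration.

```python
import math


def rate_limit_headers(limit, remaining, reset_epoch):
    """Informational headers that can accompany any response."""
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(int(reset_epoch)),
    }


def throttled_response(limit, reset_epoch, now_epoch):
    """Build the status and headers for a request that exceeded the limit."""
    headers = rate_limit_headers(limit, 0, reset_epoch)
    # Retry-After is expressed in whole seconds, rounded up so the client
    # does not retry slightly too early.
    headers["Retry-After"] = str(max(0, math.ceil(reset_epoch - now_epoch)))
    return 429, headers


status, headers = throttled_response(
    limit=1000, reset_epoch=1700000000, now_epoch=1699996400
)
```

With these inputs the window resets one hour after the rejected request, so `Retry-After` comes out as 3600, matching the example value in the table.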
Implementation Tips for Production
Use a fast, shared store like Redis or Memcached to track counters. Do not rely on in-memory counters in a single process: once you scale to multiple instances, each instance sees only its own share of the traffic and the counts will be wrong.
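A shared fixed-window counter can be sketched with just two store operations: increment a key and give it an expiry. The function below assumes a `store` object exposing `incr(name)` and `expire(name, seconds)`, which is how the redis-py client names these commands; the `InMemoryStore` class is only a stand-in so the sketch runs without a server, and the key format is hypothetical.

```python
class InMemoryStore:
    """Stand-in exposing the two calls used below; not a real Redis client."""

    def __init__(self):
        self.values = {}
        self.ttls = {}

    def incr(self, name):
        self.values[name] = self.values.get(name, 0) + 1
        return self.values[name]

    def expire(self, name, seconds):
        self.ttls[name] = seconds  # recorded only; this stand-in never evicts
        return True


def allow_request(store, client_id, limit, window_seconds, now):
    """Count the request in the shared store; True if it is within the limit."""
    window_id = int(now // window_seconds)
    key = f"ratelimit:{client_id}:{window_id}"
    count = store.incr(key)
    if count == 1:
        # First request in this window: let the key clean itself up later.
        store.expire(key, window_seconds)
    return count <= limit


# Every application instance would call allow_request against the same store.
store = InMemoryStore()
results = [allow_request(store, "user-42", 3, 3600, now=1000) for _ in range(4)]
next_window = allow_request(store, "user-42", 3, 3600, now=3600)
```

Because the increment and the expiry are separate calls, a crash between them could leave a key without a TTL; running both in one transaction or server-side script is a common way to close that gap.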
Start with generous limits and tighten them based on real traffic. Monitor the number of 429 responses. If legitimate users hit the limit often, raise it enough to cover normal behavior.
Exempt certain endpoints from rate limiting if needed. For example, your health check endpoint should not be limited, and webhook endpoints that receive bursts may need a higher limit.
Log and alert when a client hits the limit repeatedly. That could indicate a malicious actor or a misconfigured integration. Contact the customer before it escalates.
Testing Your Rate Limiting Implementation
Before deploying, test that limits are enforced correctly. Write automated tests that send requests at the boundary: just under the limit, exactly at the limit, and one over. Verify that the 429 response includes the right headers.
Also test that limits reset after the window expires. For sliding windows, test that a burst sent late in one period still counts against requests made shortly afterward, so a client cannot exceed the limit by straddling a boundary.
Automated penetration testing tools can help you verify that rate limiting holds up under attack patterns, such as rapid sequential requests from a single IP.
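The boundary and reset checks can be written as ordinary tests if the limiter reads time from an injectable clock. The sketch below uses a hypothetical `LimitedEndpoint` guarded by a simple fixed-window limit and a `FakeClock` helper; both names are assumptions, and a real test suite would target your actual handler instead.

```python
import math


class FakeClock:
    """Controllable time source so tests do not need to sleep."""

    def __init__(self, now=0.0):
        self.now = now

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


class LimitedEndpoint:
    """Hypothetical handler returning (status, headers) under a fixed window."""

    def __init__(self, limit, window_seconds, clock):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.window_start = clock()
        self.count = 0

    def handle(self):
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            # The window has expired: start a new one at the current time.
            self.window_start = now
            self.count = 0
        if self.count >= self.limit:
            wait = math.ceil(self.window_start + self.window_seconds - now)
            return 429, {"Retry-After": str(wait)}
        self.count += 1
        return 200, {}


def test_boundary_and_headers():
    clock = FakeClock()
    endpoint = LimitedEndpoint(limit=3, window_seconds=60, clock=clock)
    # Just under and exactly at the limit: every request succeeds.
    assert [endpoint.handle()[0] for _ in range(3)] == [200, 200, 200]
    clock.advance(15)
    # One over the limit: rejected, with the remaining wait in Retry-After.
    status, headers = endpoint.handle()
    assert status == 429
    assert headers["Retry-After"] == "45"


def test_limit_resets_after_window():
    clock = FakeClock()
    endpoint = LimitedEndpoint(limit=3, window_seconds=60, clock=clock)
    for _ in range(3):
        endpoint.handle()
    assert endpoint.handle()[0] == 429
    clock.advance(60)
    assert endpoint.handle()[0] == 200


test_boundary_and_headers()
test_limit_resets_after_window()
```

Injecting the clock keeps these tests fast and deterministic; the same pattern carries over to a test runner such as pytest.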
Rate Limiting as Part of a Defense-in-Depth Strategy
Rate limiting is one layer. Combine it with input validation, authentication, and proper access controls. No single measure stops all abuse.
If you are building a SaaS product, consider using a service like Kyro to continuously test your API for vulnerabilities, including broken rate limiting. Start a free scan to see if your API has any gaps.
Frequently asked questions
What is the best rate limiting algorithm for a SaaS API?
Token bucket or sliding window is generally best. Token bucket allows short bursts, which matches natural user behavior. Sliding window enforces a fixed rate more accurately. Fixed window is simpler but can let through up to double the limit around a window boundary.
Should I rate limit by IP or by user?
Use per-user limits for authenticated endpoints and per-IP limits for unauthenticated ones. Per-user limits tie abuse to an account; per-IP limits catch anonymous attacks but can block legitimate users sharing an IP.
What HTTP status code should I return when rate limiting?
Return 429 Too Many Requests. Include a Retry-After header with the number of seconds the client should wait before retrying.
How do I test if my rate limiting works?
Write automated tests that send requests up to the limit and verify the 429 response. Use tools like Kyro to simulate attack patterns and check that limits are enforced correctly.