Rate limiting and how to handle it on the client side


When publishing an API on the internet, we open ourselves up to a wonderful world of users, but also a world full of threats. Proper design patterns, good security practices, and audits and penetration tests help mitigate many of the risks associated with open API access. There are many design patterns and best practices for writing a globally accessible API, but in this particular post, I would like to focus on one – rate limiting, also known as rate throttling. As you probably know, one of the most common and easiest types of attacks on a network service is a DoS (Denial of Service) or the more dangerous DDoS (Distributed Denial of Service). The foundation for protecting against this type of attack is the early detection of an increase in request volume, diverting queries from potentially suspicious traffic sources, and applying hard limits on the number of queries a specific client can make to the API.


Traffic limiting is often implemented by dividing queries into pools and assigning each pool a certain number of tokens to be used within a specific time window. Typically, query limits are expressed in RPS (requests per second). The simplest approach is to assign a pool per IP address of the "caller," which is not very effective because of widely deployed NAT (Network Address Translation): many users can be hidden behind a single IP address visible to the service. A more effective approach is to assign a separate pool to each authorized user and a common pool to those who are not authorized (anonymous).


In either case, when our application makes too many queries within a specified time window (e.g., a second), the API usually returns the HTTP 429 Too Many Requests status.
Many times in my career, I have encountered clients that are not able to process this status correctly, usually reporting an enigmatic error on the API side. This is not a mistake, but a form of protection telling the client application that it needs to temporarily restrain its activity because it has exceeded its assigned limits.
How do we deal with this type of status on the client side?


The solution is very simple – backoff wrapping. The client makes a query to the API using an HTTP client. The HTTP client allows you to check the status of each query, and often the code using the client checks whether the query was successful (HTTP 200) and, if not, returns an error to the caller. To solve this problem easily, the developer should "wrap" the HTTP client call in a way that allows the query to be repeated.


The following code in Go illustrates the simplest way to solve this problem:

func DoRetry(request *http.Request, retries uint8) (*http.Response, error) {
    if retries == 0 {
        return nil, fmt.Errorf("error calling HTTP URL `%s`: too many retries", request.URL.String())
    }

    response, err := http.DefaultClient.Do(request)
    if err != nil {
        return response, err
    }

    if response.StatusCode == http.StatusTooManyRequests {
        // Close the body of the discarded response to avoid leaking connections.
        response.Body.Close()

        time.Sleep(1 * time.Second)
        return DoRetry(request, retries-1)
    }

    return response, err
}


In the example above, the HTTP client code is wrapped in an additional function that retries the query X times before returning an error code to the parent function. It is a recursive function, which makes the query repeat “transparently” for those utilizing this functionality.
The Go code is just an example but similar functionality can be implemented in any other language, reducing the complexity of the client code.

A more complex solution, which also retries on server errors (HTTP 5xx) and applies an exponential backoff, could look like this:

func DoRetry(request *http.Request, retries uint8, currentRetry uint8) (*http.Response, error) {
	if currentRetry >= retries {
		return nil, fmt.Errorf("error calling HTTP URL `%s`: too many retries", request.URL.String())
	}

	response, err := http.DefaultClient.Do(request)
	if err != nil {
		return nil, err // on network errors etc.
	}

	if response.StatusCode == http.StatusTooManyRequests || response.StatusCode >= 500 {
		// Make sure to close the response body when it's not going to be used
		response.Body.Close()

		// Exponential backoff
		sleepDuration := time.Duration(math.Pow(2, float64(currentRetry))) * time.Second
		time.Sleep(sleepDuration)

		return DoRetry(request, retries, currentRetry+1)
	}

	return response, err
}

In the code above, I added two major changes: an exponential delay between attempts, and retrying not only on HTTP 429 but also on server errors (HTTP 5xx).
