OneApi is a globally deployed load balancer that distributes traffic across all API endpoints of the WAX Mainnet based on the geographical location of the user. It supports all major API types (Chain, History, Hyperion & Wallet API) and was built with performance and reliability in mind.
Feel free to test it out yourself. Just make API calls as you would against any other endpoint:
WAX Mainnet Endpoint:
Currently, just a handful of guilds handle the majority of the traffic. This not only leaves many API endpoints underutilized, but the downtime of a single guild could break major services and websites.
OneApi solves these problems by offering a single endpoint URL. Requests are automatically routed to the optimal endpoint based on geographical location. This not only improves performance for the user; traffic is also dynamically rerouted to other endpoints if an endpoint fails.
The service is deployed via Cloudflare Workers. This means that the code is simultaneously deployed in ~40 data centers around the world.
This has two main advantages over a more traditional bare-metal approach:
- Performance: Deployment in ~40 data centers means there will always be a data center close to the user, reducing latency and improving speeds. Additionally, the service is automatically scaled by Cloudflare, preventing potential bottlenecks.
- Reliability: When building OneApi, one of the highest priorities was to decouple it as much as possible from any single guild. Using independent infrastructure such as Cloudflare Workers decouples the service as much as possible from my guild. It should be noted that there is still some code running on my servers (more on that later).
How it works:
When OneApi receives a request, it forwards it to the optimal endpoint based on:
- geographical region
- required feature set (Chain, Hyperion, etc.)
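The selection step above can be sketched roughly as follows. This is a minimal illustration, not OneApi's actual code: the endpoint record shape, the region names, and the `pickEndpoint` helper are all assumptions for the example.

```typescript
// Hypothetical endpoint record; OneApi's real data model may differ.
interface Endpoint {
  url: string;
  region: "americas" | "europe-africa" | "asia-pacific";
  features: string[]; // e.g. ["chain", "history", "hyperion", "wallet"]
}

// Pick the first endpoint in the caller's region that supports the
// required feature; fall back to any region if no regional match exists.
function pickEndpoint(
  pool: Endpoint[],
  region: Endpoint["region"],
  feature: string
): Endpoint | undefined {
  const candidates = pool.filter((e) => e.features.includes(feature));
  return candidates.find((e) => e.region === region) ?? candidates[0];
}
```

The cross-region fallback is a design choice for the sketch: a slower answer from a distant region beats no answer at all.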
The country from which the request originated is matched to one of the following three load balancing regions:
- Americas
- Europe & Africa
- Asia-Pacific
Traffic is only forwarded to API endpoints that have successfully passed a set of validations. These validations are performed by the Validationcore, which runs on Blacklusion’s servers and validates all API endpoints every 10 minutes.
The pool of API endpoints used by OneApi is also refreshed every 10 minutes by calling the Validationcore API. However, an endpoint has to pass at least 6 validations in a row (~60 minutes) to be considered healthy; passing just the last validation is not enough. The Validationcore API also performs a location lookup based on the IP to match each API endpoint to the correct load balancing region.
Currently, OneApi offers two main middleware services:
- Route checking/blocking: The list of allowed routes is hardcoded, and only requests that match this list are forwarded to an API endpoint. For example, requests for the producer API or for routes that don’t exist are blocked directly by OneApi.
- JSON check: If a request has a body, OneApi checks whether the body can be parsed as valid JSON before forwarding it to an API endpoint.
In both cases, a standardized error message is returned and no API endpoint receives the faulty request.
One unfortunate byproduct of Cloudflare Workers is a couple of headers that Cloudflare adds to the response. Currently, there is no way to remove them. However, some custom headers are added as well:
- “x-handled-by”: indicates which API endpoint processed the request.
- “x-rejected-by”: only set if one of the middleware features is triggered and the request is rejected directly by OneApi. In this case, no x-handled-by header is set.
OneApi automatically times out an API request after 2000ms. If a request times out or certain HTTP errors are returned (e.g. 403 Forbidden or rate-limit-related errors), a second request is sent to a different API endpoint, and the result of that backup request is returned to the user. This reduces the chance of a user not getting a response. The API endpoint that did not respond successfully is then excluded from the locally cached list, and no further requests are forwarded to it.
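The timeout-and-backup behaviour can be sketched as a race against a 2000 ms timer. The `Fetcher` abstraction and the retryable status list below are simplifications for the example; real Workers code would use `fetch` with an `AbortController`, and OneApi's exact set of retried errors is not spelled out here.

```typescript
type Fetcher = () => Promise<{ status: number; body: string }>;

// Example statuses that trigger a backup request to a different endpoint
// (403 Forbidden and 429 Too Many Requests as rate-limit stand-ins).
const RETRY_STATUSES = new Set([403, 429]);

// Reject the given promise if it does not settle within `ms` milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    p,
    new Promise<T>((_, reject) =>
      setTimeout(() => reject(new Error("timeout")), ms)
    ),
  ]);
}

// Try each endpoint in order; on timeout or a retryable HTTP error,
// move on to the next one and return its result instead.
async function fetchWithFailover(
  fetchers: Fetcher[],
  timeoutMs = 2000
): Promise<{ status: number; body: string }> {
  let lastError: unknown = new Error("no endpoints available");
  for (const fetcher of fetchers) {
    try {
      const res = await withTimeout(fetcher(), timeoutMs);
      if (!RETRY_STATUSES.has(res.status)) return res;
      lastError = new Error(`retryable status ${res.status}`);
    } catch (err) {
      lastError = err; // timed out; try the next endpoint
    }
  }
  throw lastError;
}
```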
As mentioned in the beginning, one of the main objectives was to reduce the dependency on single guilds. Unfortunately, the Validationcore has to run on Blacklusion’s servers. To mitigate the dependency on my guild, the current API list is stored in three separate locations:
The OneApi worker keeps a locally cached list of the API endpoints in the worker's memory. The worker can edit that list to detect and remove non-functioning endpoints. This also has major performance benefits, since a call to the Cloudflare KV storage costs ~100ms; with a locally cached list, the worker can handle a request entirely without accessing the KV or other APIs.
Once per minute (or whenever the locally cached list is empty), the latest list is pulled from the KV storage. This mechanism overwrites the local list and adds endpoints back into the pool that may have been removed, e.g. because of rate-limiting errors.
It should be noted that the worker is not able to edit the list in the KV storage itself. The list in the KV storage is only updated when a new list is requested from the Validationcore API. This update process runs on a different worker and is decoupled from the actual OneApi worker.
In a worst-case scenario where Blacklusion’s servers stop functioning, the OneApi service would still keep running: the workers can pull the last list from the KV storage and prune non-functioning endpoints by themselves. And since the list in the KV storage cannot be edited by the workers, a worker cannot remove endpoints from the shared list that may still work for other workers. Of course, this reduces performance and should not happen in the first place, but there is some room for failure by design. And in case everything falls apart, every worker is also deployed with a hardcoded fallback list. This list kicks in if the list in the KV storage is empty or cannot be accessed.
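The three storage layers described above form a resolution chain: local cache first, then the shared KV list, then the hardcoded fallback. A minimal sketch of that chain, assuming a KV interface reduced to a single `get` method and a made-up fallback URL:

```typescript
// Hardcoded last-resort list shipped with the worker (placeholder URL).
const FALLBACK_ENDPOINTS = ["https://wax.example-guild.com"];

// Minimal stand-in for the Cloudflare Workers KV binding.
interface KvLike {
  get(key: string): Promise<string | null>;
}

// Resolve the endpoint list: prefer the local in-memory cache, then the
// shared KV list, then the hardcoded fallback.
async function resolveEndpoints(
  localCache: string[],
  kv: KvLike
): Promise<string[]> {
  if (localCache.length > 0) return localCache;
  try {
    const stored = await kv.get("endpoints");
    if (stored) {
      const parsed = JSON.parse(stored) as string[];
      if (parsed.length > 0) return parsed;
    }
  } catch {
    // KV unreachable: fall through to the hardcoded list
  }
  return FALLBACK_ENDPOINTS;
}
```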