The official AWS documentation does not provide guidance on which API Gateway endpoint type to use. This becomes relevant when one wants to run their entire website behind CloudFront. One suggestion is to use a REGIONAL endpoint if you also have a CloudFront distribution, but further detail is lacking.


For this test, AWS CDK was used to set up all of the infrastructure. It makes it trivial to iterate over a list of options and regions to generate everything quickly (and to update it all when there was a bug). C# was selected as the language purely as a change of pace from the Java and TypeScript used at work, while still preserving static typing.
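As a rough illustration of the "iterate over a list of options" idea, here is a minimal sketch in plain JavaScript rather than the C# CDK code actually used. It only builds the list of configurations to generate, using the endpoint type and compression options tested in this post. The names `buildTestMatrix` and `id` are hypothetical and not taken from the original code.

```javascript
// Options under test: API Gateway endpoint type and response compression.
const ENDPOINT_TYPES = ['EDGE', 'REGIONAL'];
const COMPRESSION_OPTIONS = [true, false];

// Hypothetical helper: returns one config object per combination of options.
function buildTestMatrix() {
  const matrix = [];
  for (const endpointType of ENDPOINT_TYPES) {
    for (const compression of COMPRESSION_OPTIONS) {
      matrix.push({
        // Illustrative identifier, e.g. "regional-compressed".
        id: `${endpointType.toLowerCase()}-${compression ? 'compressed' : 'uncompressed'}`,
        endpointType,
        compression,
      });
    }
  }
  return matrix;
}

// Each entry would drive the creation of one API Gateway plus one distribution.
console.log(buildTestMatrix().map((config) => config.id));
```

Adding another option or region is then a one-line change to a list, which is what makes fixing a bug and regenerating everything cheap.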

All of the infrastructure under test was created in us-east-1. Four identical CloudFront distributions were created with no caching and response compression enabled. Each distribution had one origin pointing to an API Gateway. Each API Gateway was generated with a different combination of endpoint type (EDGE or REGIONAL) and response compression (enabled or disabled), so that all four combinations were tested. Each API had a single Lambda integration, all pointing to the same Lambda function. The Lambda was written in JavaScript using the Node.js 12 runtime and configured with 125MB of RAM. It would immediately respond with 256KB of random text (to reflect real-world non-cacheable responses).

To collect the test data, a Lambda was deployed to every AWS commercial region. The test Lambda was again written in JavaScript, using the Node.js 12 runtime with 125MB of memory configured. During the test, it made GET requests to the given distribution and waited for each response to finish downloading. The load time measured on the client side was emitted to CloudWatch as a custom metric.
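The client-side measurement can be sketched roughly as follows, using only Node.js built-ins. This is not the original test Lambda: `measureLoadTime` is a hypothetical name, the use of a fresh connection per request (`agent: false`) is an assumption, and emitting the result to CloudWatch is left out.

```javascript
const http = require('http');
const https = require('https');

// Hypothetical helper: times a GET from the start of the request until the
// response body has been fully downloaded.
function measureLoadTime(url, headers = {}) {
  const client = url.startsWith('https:') ? https : http;
  return new Promise((resolve, reject) => {
    const start = process.hrtime.bigint();
    // Assumption: agent: false opens a new connection for every request,
    // so connection setup is included in the measured time.
    const req = client.get(url, { agent: false, headers }, (res) => {
      let bytes = 0;
      res.on('data', (chunk) => { bytes += chunk.length; });
      res.on('end', () => {
        const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
        resolve({ statusCode: res.statusCode, bytes, elapsedMs });
      });
      res.on('error', reject);
    });
    req.on('error', reject);
  });
}
```

Passing `{ 'Accept-Encoding': 'gzip' }` as `headers` would ask for a compressed response. In that case `bytes` counts the compressed bytes on the wire, since the sketch does not decompress the body.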

aws architecture diagram


All tests were run asynchronously at the same time using the Lambda CLI on 2020-09-20 21:00:00 UTC. Each test Lambda ran for 15 minutes, and no errors were reported.

test script used


clientside response time metrics

source dataset csv

additional Cloudfront distribution metrics

Origin latency: The total time spent from when CloudFront receives a request to when it starts providing a response to the network (not the viewer), for requests that are served from the origin, not the CloudFront cache. This is also known as first byte latency, or time-to-first-byte. (src)

p50 origin latency

p90 origin latency


REGIONAL is the preferred API Gateway endpoint type when behind a custom CloudFront distribution

The response times for all regions were lower when the endpoint type was REGIONAL. This is expected because REGIONAL endpoints have one fewer hop (no built-in CloudFront distribution) to go through to reach the Lambda integration. A ~10% response time improvement was observed during the test when using a REGIONAL endpoint.

API Gateway compression is only suggested for far away users

Nearby users will see a small performance hit, but faraway users will see a larger performance gain. Ideally, one would place another API Gateway closer to their faraway users if there were enough of them to justify the cost and complexity.

Additionally, compression only added to the response time of EDGE endpoints.


github repo with all related code + results dataset