Cowboy is a small, modular web server written in Erlang. It is at the heart of Elixir’s Phoenix web application framework, as well as many other projects in the Erlang and Elixir ecosystem. Cowboy has evolved over the years from version 1.0, released in 2014, to the latest version, 2.7. The change from 1.x to 2.x was the most significant and required changes to user code. Cowboy itself is built on two main components: Cowlib and Ranch. Cowlib is a generic parser and builder for HTTP-based protocols. Ranch is a socket acceptor pool that sits directly on top of the TCP/IP and TLS functionality exposed by Erlang OTP. Cowlib is sufficiently abstract to be shared between HTTP servers and clients such as Gun, while Ranch may underpin other servers that implement protocols on top of TLS and TCP/IP.
In this article, we benchmark the performance of Cowboy for all releases from 1.1.2 to 2.7. In most cases, we use the Ranch and Cowlib versions that each release depends on by default. For 2.7, we benchmark with the default Ranch 1.7.1 and, separately, with the newest Ranch 2.0.0-rc2.
To simulate generic web application client and server behavior, we devised the following synthetic workload. The client device opens a connection and sends 100 requests, waiting 900 ms ±5% between consecutive requests. The server handles each request by sleeping for 100 ms ±5%, to simulate a backend database request, and then returns 1 kB of payload. Without additional delays, this results in an average connection lifetime of 100 seconds and an average load of 1 request per second per device. The following Stressgrid script represents the client side of the workload.
    0..100 |> Enum.each(fn _ ->
      get("/")
      delay(900, 0.05)
    end)
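The server side of the workload is described above only in prose. As a rough illustration, a handler with that behavior might look like the following minimal sketch for Cowboy 2.x. The module name `SketchHandler`, the uniform jitter distribution, the 1024-byte reading of "1 kB", and the payload contents are all assumptions made for the example, not the handler used in the benchmark. Cowboy 1.x uses a different handler interface, so this sketch does not apply to 1.1.2.

```elixir
defmodule SketchHandler do
  # Hypothetical handler: names and payload contents are illustrative only.
  @mean_sleep_ms 100
  @jitter 0.05
  # Assumes "1 kB" means 1024 bytes.
  @payload :binary.copy("x", 1024)

  # Cowboy 2.x calls init/2 once for every request.
  def init(req, state) do
    # Simulate a backend database request.
    Process.sleep(sleep_ms())
    req = :cowboy_req.reply(200, %{"content-type" => "text/plain"}, @payload, req)
    {:ok, req, state}
  end

  # Delay in milliseconds, drawn uniformly from 95..105 (100 ms within 5%).
  def sleep_ms do
    spread = round(@mean_sleep_ms * @jitter)
    @mean_sleep_ms - spread + :rand.uniform(2 * spread + 1) - 1
  end

  # Exposed so the payload size can be checked without a running server.
  def payload, do: @payload
end
```

Sleeping inside the request process blocks only that process, which is how a slow database call would typically behave in an Erlang or Elixir application.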
In this benchmark, we tested against a c5.2xlarge AWS instance with 8 vCPUs and 16 GiB of RAM. We used Ubuntu 18.04.3 with the 4.15.0-1054-aws kernel and the following sysctl overrides.
    fs.file-max = 10000000
    net.core.somaxconn = 1024
During the 30-minute test, the load increased linearly from 0 to 100k devices, which corresponds to 100k requests per second. We deliberately chose a maximum load beyond the capacity of the c5.2xlarge instance so that we could observe the saturation point for each version of Cowboy.
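The load figures quoted so far follow from simple arithmetic, which the sketch below spells out. It uses only numbers stated in the text and assumes no delay other than the client pause and the server sleep. The module and function names are illustrative.

```elixir
defmodule LoadRamp do
  # Figures taken from the workload and test description.
  @client_delay_ms 900
  @server_sleep_ms 100
  @requests_per_connection 100
  @ramp_duration_s 30 * 60
  @max_devices 100_000

  # Mean duration of one request cycle, ignoring network and queueing delays.
  def cycle_ms, do: @client_delay_ms + @server_sleep_ms

  # Mean connection lifetime in seconds.
  def connection_lifetime_s, do: @requests_per_connection * cycle_ms() / 1000

  # Mean request rate generated by a single device.
  def rps_per_device, do: 1000 / cycle_ms()

  # Offered load, in requests per second, t_s seconds into the linear ramp.
  def offered_rps(t_s) when t_s >= 0 and t_s <= @ramp_duration_s do
    @max_devices * t_s / @ramp_duration_s * rps_per_device()
  end

  # Seconds into the test at which the offered load reaches `rps`.
  def time_to_reach_s(rps) when rps >= 0 do
    rps / rps_per_device() * @ramp_duration_s / @max_devices
  end
end

IO.inspect(LoadRamp.connection_lifetime_s())        # 100.0 seconds
IO.inspect(LoadRamp.rps_per_device())               # 1.0 request per second
IO.inspect(round(LoadRamp.offered_rps(15 * 60)))    # 50000 at the halfway point
IO.inspect(LoadRamp.time_to_reach_s(80_000) / 60)   # 24.0 minutes
```

Because the ramp is linear, the time at which the response rate stops following the offered load maps directly to the saturation point of the server under test.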
In the first series of tests, we used Erlang OTP 22.1.8.
This graph shows the response rate observed by clients, in other words, completed requests per second. The big difference is between 1.1.2 and all 2.x versions. Version 1.1.2 is the surprising leader, peaking at over 80k requests per second. All 2.x versions stayed between 50k and 60k. The Ranch version had no significant effect on performance.
The 99th percentile connection and response latencies tell a similar story, with 1.1.2 significantly outperforming all 2.x versions.
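For readers unfamiliar with the metric, the sketch below shows one common way to compute a 99th percentile, the nearest-rank method. This is only an illustration of what the number means. How the benchmark tool actually aggregates latencies is not stated here, and the sample values are made up.

```elixir
defmodule Percentile do
  # Nearest-rank percentile for an integer p in 1..100: the smallest sample
  # such that at least p percent of all samples are less than or equal to it.
  def nearest_rank(samples, p)
      when samples != [] and is_integer(p) and p >= 1 and p <= 100 do
    sorted = Enum.sort(samples)
    # Integer form of ceil(p / 100 * n), which avoids floating-point rounding.
    rank = div(p * length(sorted) + 99, 100)
    Enum.at(sorted, rank - 1)
  end
end

# Made-up latencies in milliseconds: 98 fast responses and 2 slow ones.
latencies = List.duplicate(10, 98) ++ [500, 500]
IO.inspect(Percentile.nearest_rank(latencies, 50))  # 10
IO.inspect(Percentile.nearest_rank(latencies, 99))  # 500
```

The example shows why the 99th percentile is sensitive to saturation: a small share of slow responses is enough to move it, while the median stays unchanged.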
In the second series of tests, we used Cowboy 1.1.2 and 2.7 and varied the OTP version across three releases: OTP 20, OTP 21, and OTP 22.1.8.
OTP 21 combined with Cowboy 1.1.2 shows the best performance, peaking at almost 90k requests per second, with OTP 22 a close second. With Cowboy 2.7, the difference between OTP 21 and OTP 22 is less significant. With both versions of Cowboy, OTP 20 performs the worst.
The surprising conclusion of this benchmark is that the combination of pre-2.x Cowboy and the relatively old OTP 21 demonstrated the best performance. Another surprise was the absence of any significant difference between the various 2.x versions of Cowboy, as well as between Ranch 1.x and 2.0.0-rc2.
UPDATE: In part 2, we analyze the root cause of the performance degradation in Cowboy 2.x. We also point to some solutions and show their effect on the test results.