The Digital Cat (https://www.thedigitalcatonline.com/) - Adventures of a curious cat in the land of programming
Design and implement a flexible VPC on AWS - Leonardo Giordani - 2023-10-16T11:00:00+01:00<p>An example of network design with an implementation using AWS VPC</p><p>Designing networks is not an easy task, and while cloud computing removes the hassle (and also a bit of the fun) of moving around switches and cables, it leaves untouched the complexity of planning a good structure. But what does "good structure" mean?</p><p>I think this is a crucial question in engineering. A well-designed (or well-architected) system cannot be defined once and for all, because its nature, structure, and components depend on the requirements. Hence the usual answer: "it depends".</p><p>In this post I want to give an example of some business requirements and of the structure of a network that might satisfy them.</p><p>For the implementation I will work with <a href="https://aws.amazon.com/vpc/">AWS VPC</a>, which has several advantages. First of all, AWS is one of the major cloud providers, and this might help beginners to better understand how it works. Second, most of the components of a VPC are free of charge, which means that anyone can apply the structure I will show without having to pay. The only component that AWS will charge you for is a NAT gateway, but the price is around $0.045/hour, which means that with a single dollar you can enjoy a trip on a well-architected VPC for approximately 22 hours.
After that time you can always remove the NAT and keep working on the free part of your VPC.</p><h2 id="a-quick-recap-of-ip-and-cidrs-4f94">A quick recap of IP and CIDRs<a class="headerlink" href="#a-quick-recap-of-ip-and-cidrs-4f94" title="Permanent link">¶</a></h2><p>You should be familiar with the IP protocol to use VPC effectively, but if your knowledge is rusty here is a quick recap.</p><p>IPv4 addresses are made of 32 bits, thus spanning 2<sup>32</sup> values, between 0 and 4,294,967,295 (2<sup>32</sup>-1). To simplify their usage, we split IPv4 addresses into 4 chunks of 8 bits (octets) and convert each into a decimal number which is thus between 0 and 255 (2<sup>8</sup>-1). The classic form of an IPv4 address is thus <code>A.B.C.D</code>, e.g. <code>1.2.3.4</code>, <code>134.32.175.52</code>, <code>255.255.0.0</code>.</p><p>When considering ranges of addresses, giving the first and last address might be tedious and difficult to read. The CIDR notation was introduced to simplify this. A CIDR (Classless Inter-Domain Routing) block is expressed in the form <code>A.B.C.D/N</code>, where <code>N</code> is a number of bits between 0 and 32 and represents how many bits of the address remain fixed. A CIDR like <code>134.73.28.196/32</code> represents only the address <code>134.73.28.196</code>, as 32 bits out of 32 are fixed. Conversely, the CIDR <code>0.0.0.0/0</code> represents all IPv4 addresses, as 0 bits out of 32 are fixed.</p><p>The range of addresses corresponding to a CIDR is in general not easy to compute manually, but those aligned with the 4 octets are trivial:</p><ul><li>The CIDR <code>A.B.C.D/32</code> corresponds to the address <code>A.B.C.D</code>.</li><li>The CIDR <code>A.B.C.0/24</code> corresponds to the addresses between <code>A.B.C.0</code> and <code>A.B.C.255</code> (256 addresses, 2<sup>32-24</sup> or 2<sup>8</sup>).
Here, the first 24 bits (the 3 octets <code>A</code>, <code>B</code>, and <code>C</code>) are fixed.</li><li>The CIDR <code>A.B.0.0/16</code> corresponds to the addresses between <code>A.B.0.0</code> and <code>A.B.255.255</code> (65,536 addresses, 2<sup>32-16</sup> or 2<sup>16</sup>). Here, the first 16 bits (the 2 octets <code>A</code> and <code>B</code>) are fixed.</li><li>The CIDR <code>A.0.0.0/8</code> corresponds to the addresses between <code>A.0.0.0</code> and <code>A.255.255.255</code> (16,777,216 addresses, 2<sup>32-8</sup> or 2<sup>24</sup>). Here, the first 8 bits (the octet <code>A</code>) are fixed.</li></ul><p>Please note that by convention we set the variable octets to 0. The CIDR <code>A.B.C.0/24</code> is exactly the same as the CIDR <code>A.B.C.D/24</code>, as the octet <code>D</code> is not fixed. For this reason, setting it to anything other than 0 is misleading and pointless. For example, I would never write <code>153.23.95.34/24</code>, as this means all addresses between <code>153.23.95.0</code> and <code>153.23.95.255</code>, so the final <code>34</code> is just misleading. <code>153.23.95.0/24</code> is much better in this case.</p><p>You can use the <a href="https://jodies.de/ipcalc">IP Calculator</a> by Krischan Jodies to explore CIDRs.</p><p>As the number of IPv4 addresses quickly proved to be insufficient we developed IPv6, but in the meantime we also created private network spaces. In IPv4 there are 3 different ranges of addresses that are considered "private", which means that they can be duplicated and that they are not reachable from the Internet.
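</p><p>If you prefer to verify these figures offline, Python's standard <code>ipaddress</code> module can do the same job as the calculator. This is just a throwaway sketch, not part of the VPC design:</p>

```python
import ipaddress

# Octet-aligned CIDRs are easy to read off: a /24 fixes the first three octets.
net = ipaddress.ip_network("10.10.10.0/24")
print(net[0], net[-1], net.num_addresses)  # 10.10.10.0 10.10.10.255 256

# The second private range runs from 172.16 to 172.31 only.
print(ipaddress.ip_address("172.17.123.45").is_private)  # True
print(ipaddress.ip_address("172.32.123.45").is_private)  # False: this one is public
```

<p>Note that <code>ip_network</code> rejects a deceiving CIDR like <code>153.23.95.34/24</code> with an error unless you pass <code>strict=False</code>.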
The difference between public and private addresses is like the difference between "London, UK" (there is only one in the world) and "kitchen" (every house has one).</p><p>The three private ranges in IPv4 are:</p><ul><li><code>192.168.0.0/16</code> - 65,536 addresses between <code>192.168.0.0</code> and <code>192.168.255.255</code></li><li><code>172.16.0.0/12</code> - 1,048,576 addresses between <code>172.16.0.0</code> and <code>172.31.255.255</code> (this is not easily computed manually because 12 is not a multiple of 8)</li><li><code>10.0.0.0/8</code> - 16,777,216 addresses between <code>10.0.0.0</code> and <code>10.255.255.255</code></li></ul><p>This means that IP addresses like <code>192.168.6.1</code>, <code>172.17.123.45</code>, and <code>10.34.168.20</code> are all private. Take care with the second range, as it goes from <code>172.16</code> to <code>172.31</code>, so an address like <code>172.32.123.45</code> is definitely public.</p><p>Now that your knowledge of IP addresses has been fully restored we can dive into network design.</p><h2 id="requirements-dd57">Requirements<a class="headerlink" href="#requirements-dd57" title="Permanent link">¶</a></h2><p>As I mentioned in the introduction, the most important part of a system design is the requirements, both the present and the future ones.</p><p>If you design a road, it is crucial to understand how many vehicles will travel on it per minute (or per hour, day, month) and the type of vehicle. I'm not an expert in highway engineering, but I'm sure a road for mining trucks has to be different from a cycle lane, and the same is true for a computer system. Surely you want to store information in a database, but its size and type depend on the amount of data you have, the usage pattern, the required reliability, and so on.</p><p>We are designing a network that will host cloud computing resources such as computing instances, databases, load balancers, and so on.
I will collectively refer to them as <em>resources</em> or <em>instances</em>, without paying too much attention to the concrete nature of each of them. From the networking point of view they are all just a bunch of network cards.</p><p>As an example, we have the following business requirements for a company called ZooSoft:</p><ul><li>There are currently <strong>three main products</strong>: Alligator Accounting, Barracuda Blogging, Coyote CAD.</li><li>There might be <strong>more products</strong> in the future; we are in the first design stages of Dragonfly Draw and Echidna Email.</li><li>We need <strong>four environments</strong> for each product: Live, Staging, Demo, UAT.<ul><li>Live is the application accessed by clients</li><li>Staging is a clone of Live that is used to run extensive pre-release tests and to perform initial support debugging</li><li>Demo runs the application with the same configuration as Live but with fake data, used to showcase the application to new customers</li><li>UAT contains on-demand instances used by developers and QA to test new features</li></ul></li><li>Some data or services are <strong>shared among the products</strong>, and the infrastructure team needs a space in which to deploy their tools.</li></ul><h2 id="initial-analysis-c228">Initial analysis<a class="headerlink" href="#initial-analysis-c228" title="Permanent link">¶</a></h2><p>As you see, I highlighted some of the most important points we need to keep in mind.</p><ul><li><strong>There are currently 3 products</strong>. Not a single one, not one hundred. It is important to understand this number because we probably want to have separate spaces for each product, with different teams working on each one. If the company had one single product we might expect it to create a new one in the future, but it might not be that urgent to have space to grow.
On the other hand, if the company already had 100 products we might want to design things with a completely different approach.</li><li><strong>There might be more products in the future</strong>. Again, it is important to have a good idea of the future requirements, as most of the problems of a system will come when the usage patterns change. It's generally a good idea to leave space for growth, but overdoing it might lead to a waste of resources and ultimately money. Understanding the growth expectation is paramount to finding a good balance between inflexibility and waste of resources.</li><li><strong>There are 4 different usage patterns for each application</strong>, each one with its own requirements. The Live environment clearly needs a lot of power and redundancy to provide a stable service for users, while environments like UAT and Demo will certainly have more relaxed parameters in terms of availability or reliability.</li><li><strong>We need space to deploy internal tools</strong> used to monitor the application and to test new solutions. The architecture of the application might change in the future so we need space to try out different structures and products.</li></ul><p>In general, it's a good idea to <strong>isolate anything that doesn't need to be shared</strong> across teams or products, as it reduces the risk of errors and exposes fewer resources to attacks. In AWS, the concept of account allows us to completely separate environments at the infrastructure level. Resources in separate accounts can still communicate, but this requires a certain amount of work to set up the connection, which ultimately promotes isolation.</p><p>So, the initial idea might be to give each product a different account. However, we also have 4 different environments for each product, and given the relative simplicity involved in the creation of an AWS account it sounds like a good idea to have one of them for each combination of product and environment.
AWS provides a tool called <a href="https://aws.amazon.com/controltower/">Control Tower</a> that can greatly simplify the creation and management of accounts, which makes this choice even more reasonable.</p><p>A VPC (Virtual Private Cloud) is, as the name suggests, a private network that allows different products to use the same IP address pool without clashing, which will not be new to anyone who is familiar with private IP address spaces. This means that we could easily create in each account a VPC with a CIDR <code>10.0.0.0/8</code> that grants 2<sup>24</sup> (more than 16M) different IP addresses, more than enough to host instances and databases for any type of application.</p><p>However, it might be useful in the future to connect different VPCs, for example to perform data migrations, and this is done in AWS through <a href="https://docs.aws.amazon.com/vpc/latest/peering/what-is-vpc-peering.html">VPC peering</a>. In simple words, this is a way to create a single network out of two different VPCs, but it can't be done if the two VPCs have overlapping CIDR blocks. This means that while we keep VPCs separate in different accounts, we might also want to assign different CIDRs to each one.</p><p>Avoiding overlap clearly reduces the size of a VPC, so let's have a look at some figures to have an idea of what we can create.</p><p>If we assign to each account a CIDR <code>10.X.0.0/16</code>, with X being a number assigned to the specific account, we can create up to 256 different accounts (from <code>10.0.0.0/16</code> to <code>10.255.0.0/16</code>). Out of an abundance of caution we might reserve the first 10 CIDRs for future use and internal needs, which leaves us with 246 non-overlapping CIDRs (from <code>10.10.0.0/16</code> to <code>10.255.0.0/16</code>).
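</p><p>As a sanity check, here is a small sketch using Python's standard <code>ipaddress</code> module that builds this allocation and confirms the blocks do not overlap, which is exactly the property VPC peering needs:</p>

```python
import ipaddress

# One 10.X.0.0/16 block per account; X = 0..9 is reserved for future use.
accounts = [ipaddress.ip_network(f"10.{x}.0.0/16") for x in range(10, 256)]
print(len(accounts), accounts[0], accounts[-1])  # 246 10.10.0.0/16 10.255.0.0/16

# No two blocks overlap, so any pair of VPCs could be peered later.
overlapping = any(a.overlaps(b) for a in accounts for b in accounts if a != b)
print(overlapping)  # False
```

<p>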
This means that we have space for several combinations of product/environment, for example we might have up to 41 products with 6 environments each or 30 products with 8 environments each (with leftovers).</p><p>Since at the moment we have 3 products with 4 environments each, this choice looks reasonable. At the same time, a <code>/16</code> CIDR grants us space for 2<sup>16</sup> (65,536) resources, which again looks more than enough to host a standard web application.</p><h2 id="assignment-plan-3693">Assignment plan<a class="headerlink" href="#assignment-plan-3693" title="Permanent link">¶</a></h2><p>To simplify the schema, let's grant space for 20 products and group CIDRs by environment. This means we will have 20 CIDRs for Live environments, 20 for Staging, and so on. The assignment plan is then</p><div class="code"><div class="content"><div class="highlight"><pre>10.0.0.0/16 reserved
...
10.9.0.0/16 reserved
10.10.0.0/16 alligator-accounting-live
10.11.0.0/16 barracuda-blogging-live
10.12.0.0/16 coyote-cad-live
...
10.30.0.0/16 alligator-accounting-staging
10.31.0.0/16 barracuda-blogging-staging
10.32.0.0/16 coyote-cad-staging
...
10.50.0.0/16 alligator-accounting-demo
10.51.0.0/16 barracuda-blogging-demo
10.52.0.0/16 coyote-cad-demo
...
10.70.0.0/16 alligator-accounting-uat
10.71.0.0/16 barracuda-blogging-uat
10.72.0.0/16 coyote-cad-uat
...
10.250.0.0/16 infrastructure-team
...
10.255.0.0/16 infrastructure-team
</pre></div> </div> </div><p>As I mentioned, the initial CIDRs are reserved for future use, but we also kept the final 6 CIDRs for the needs of the infrastructure team. Keep in mind that this is only an example and that we are clearly free to change any of these figures to match our needs more closely. Each one of these CIDRs will be assigned to a specific account.</p><p>Should we create new products we will continue with the same pattern, e.g.</p><div class="code"><div class="content"><div class="highlight"><pre>10.0.0.0/16 reserved
...
10.10.0.0/16 alligator-accounting-live
10.11.0.0/16 barracuda-blogging-live
10.12.0.0/16 coyote-cad-live
<span class="hll">10.13.0.0/16 dragonfly-draw-live
</span>...
10.30.0.0/16 alligator-accounting-staging
10.31.0.0/16 barracuda-blogging-staging
10.32.0.0/16 coyote-cad-staging
<span class="hll">10.33.0.0/16 dragonfly-draw-staging
</span>...
10.50.0.0/16 alligator-accounting-demo
10.51.0.0/16 barracuda-blogging-demo
10.52.0.0/16 coyote-cad-demo
<span class="hll">10.53.0.0/16 dragonfly-draw-demo
</span>...
10.70.0.0/16 alligator-accounting-uat
10.71.0.0/16 barracuda-blogging-uat
10.72.0.0/16 coyote-cad-uat
<span class="hll">10.73.0.0/16 dragonfly-draw-uat
</span>...
10.250.0.0/16 infrastructure-team
...
</pre></div> </div> </div><p>We are planning to use IaC tools to implement this, but it's nevertheless interesting to spot the patterns in this schema that make it easier to debug network connections.</p><p>All environments of a certain type belong to a specific range, so an address like <code>10.15.123.45</code> is definitely in a Live environment. At the same time, IP addresses for the same product share the final digit of the second octet, so if <code>10.12.45.67</code> is a Live instance, the corresponding Staging instance will have an address like <code>10.32.X.Y</code>.</p><p>While this is not crucial, I wouldn't underestimate the value of a regular structure that can give precious information at a glance. While debugging during an emergency, things like this might be a blessing.</p><p>The last thing to note is that in this schema the 160 CIDRs between <code>10.90.0.0/16</code> and <code>10.249.0.0/16</code> are not allocated. This might give you a better idea of how wide a <code>10.0.0.0/8</code> network space is! Such accounts can be used to host up to 8 more environments for each product.</p><h2 id="address-space-bad2">Address space<a class="headerlink" href="#address-space-bad2" title="Permanent link">¶</a></h2><p>Let's focus on a single CIDR in the form <code>10.N.0.0/16</code>. As we know this provides 65,536 addresses (2<sup>16</sup>) that we need to split into subnets. In AWS, subnets correspond to different Availability Zones, which are "distinct locations within an AWS Region that are engineered to be isolated from failures in other Availability Zones" (from the docs). In other words, they are separate data centres built so that if one blows up the others should be unaffected.
I guess this depends on the size of the explosion, but within reason this is the idea.</p><p>So, each account gets 65,536 addresses (<code>/16</code>), split into:</p><ul><li>1 public subnet for the NAT gateway to live in (<code>nat</code>).</li><li>3 private subnets for the computing resources (<code>private_a</code>, <code>private_b</code>, <code>private_c</code>).</li><li>3 public subnets for the load balancer (<code>public_a</code>, <code>public_b</code>, <code>public_c</code>).</li><li>1 public subnet for the bastion instance (<code>bastion</code>).</li></ul><p>Now, if you are not familiar with subnets, they are the simplest of concepts. You get the address space of a network (say for example <code>10.10.0.0/16</code>, that is, the addresses from <code>10.10.0.0</code> to <code>10.10.255.255</code>) and split it into chunks. The fact that each chunk is assigned to a different data centre is an AWS addition and is not part of the definition of subnet in principle. However, the reason behind subnetting is exactly to create small <em>physical</em> networks that are therefore more efficient. If two computers are on the same subnet the routing of IP packets exchanged by them is simpler and thus faster. For similar reasons, and to increase security, it's a good idea to keep your subnets as small as possible.</p><p>In this case, we might create subnets in a <code>/23</code> space (512 addresses each), which looks wide enough to host the web applications of ZooSoft. Before we have a look at the actual figures let me clarify what this means. I assume each application (Alligator Accounting, Barracuda Blogging, and so on) has been containerised, maybe using ECS or EKS, which, however, means that there are EC2 instances behind the scenes running the containers.
If we are using Fargate we do not provide EC2 instances, and in that case we might set up our network in a different way.</p><p>EC2 instances are computers, and they all have at least one network interface, which corresponds to an IP address. So, when I say that a subnet contains 512 addresses I mean that in a single subnet I can run up to 507 EC2 instances (remember that AWS reserves some addresses, see <a href="https://docs.aws.amazon.com/vpc/latest/userguide/subnet-sizing.html">https://docs.aws.amazon.com/vpc/latest/userguide/subnet-sizing.html</a>). Assuming instances with 8 GiB of memory each (e.g. <code>m7g.large</code>) and containers that require 1 GiB of memory, we can easily host 3042 containers (507*6), leaving 2 GiB free on each instance to host newly created containers (for example to run blue-green deployments). These are clearly examples and you have to adapt them to the requirements of your specific application, but I hope you get an idea of how to roughly estimate these sorts of quantities.</p><p>Remember that in AWS the difference between public and private networks is only in the gateway they are connected to. Public networks are connected to an Internet Gateway and thus are reachable from the Internet, while private networks are either disconnected from the Internet or connected through a NAT, which allows them to access the Internet but not to be accessed from outside.</p><p>The <code>bastion</code> subnet might or might not be useful. In general, <code>bastion</code> hosts are very secure instances that can be accessed using SSH, and from which you can access the rest of the instances. Since, from the point of view of security, they are a weak point of the whole infrastructure, you might not want to have them, or you might replace them with more ephemeral solutions. In any case, I left the subnet there as an example of a space that hosts tools not directly connected with the application.</p><p>Let's have a deeper look at the figures.
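</p><p>Both the capacity estimate above and the subnet splits we are about to discuss are easy to verify with Python's standard <code>ipaddress</code> module; the per-instance figures are just the assumptions made in the text:</p>

```python
import ipaddress

vpc = ipaddress.ip_network("10.10.0.0/16")

# The /16 space splits into 128 /23 subnets (512 addresses each)
# or 32 /21 subnets (2048 addresses each).
print(len(list(vpc.subnets(new_prefix=23))))  # 128
print(len(list(vpc.subnets(new_prefix=21))))  # 32

# Capacity estimate from the text: AWS reserves 5 addresses per subnet,
# and each 8 GiB instance is assumed to run six 1 GiB containers.
instances = 512 - 5
print(instances, instances * 6)  # 507 3042
```

<p>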
A <code>/16</code> space can be split into 128 <code>/23</code> spaces (2<sup>23-16</sup>), but given the list of subnets I showed before we need only 8 of them, which again leaves a lot of space for further expansion, and there are two types of expansion we might consider. One is increasing the number of subnets, the other is increasing the size of the subnets themselves. With the amount of space granted by the current size of the networks we have plenty of options to cover both cases. We might reach a good balance between the size of the network and the number of networks by increasing the potential size to <code>/21</code> (2048 addresses), which grants us space for 32 subnetworks.</p><p>Here, I show a possible schema for the account <code>alligator-accounting-live</code> that is granted the space <code>10.10.0.0/16</code>.</p><div class="code"><div class="content"><div class="highlight"><pre>NAME CIDR ADDRESSES NUM ADDRESSES
reserved 10.10.0.0/21 (10.10.0.0 - 10.10.7.255) {2048}
nat 10.10.8.0/23 (10.10.8.0 - 10.10.9.255) {512}
expandable to
10.10.8.0/21 (10.10.8.0 - 10.10.15.255) {2048}
PUBLIC
reserved 10.10.16.0/21 (10.10.16.0 - 10.10.23.255) {2048}
reserved 10.10.24.0/21 (10.10.24.0 - 10.10.31.255) {2048}
private-a 10.10.32.0/23 (10.10.32.0 - 10.10.33.255) {512}
expandable to
10.10.32.0/21 (10.10.32.0 - 10.10.39.255) {2048}
private-b 10.10.40.0/23 (10.10.40.0 - 10.10.41.255) {512}
expandable to
10.10.40.0/21 (10.10.40.0 - 10.10.47.255) {2048}
private-c 10.10.48.0/23 (10.10.48.0 - 10.10.49.255) {512}
expandable to
10.10.48.0/21 (10.10.48.0 - 10.10.55.255) {2048}
reserved 10.10.56.0/21 (10.10.56.0 - 10.10.63.255) {2048}
RESERVED PRIVATE 4
reserved 10.10.64.0/21 (10.10.64.0 - 10.10.71.255) {2048}
RESERVED PRIVATE 5
...
reserved 10.10.128.0/21 (10.10.128.0 - 10.10.135.255) {2048}
RESERVED PRIVATE 13
public-a 10.10.136.0/23 (10.10.136.0 - 10.10.137.255) {512}
expandable to
10.10.136.0/21 (10.10.136.0 - 10.10.143.255) {2048}
PUBLIC
public-b 10.10.144.0/23 (10.10.144.0 - 10.10.145.255) {512}
expandable to
10.10.144.0/21 (10.10.144.0 - 10.10.151.255) {2048}
PUBLIC
public-c 10.10.152.0/23 (10.10.152.0 - 10.10.153.255) {512}
expandable to
10.10.152.0/21 (10.10.152.0 - 10.10.159.255) {2048}
PUBLIC
reserved 10.10.160.0/21 (10.10.160.0 - 10.10.167.255) {2048}
RESERVED PUBLIC 4
reserved 10.10.168.0/21 (10.10.168.0 - 10.10.175.255) {2048}
RESERVED PUBLIC 5
...
reserved 10.10.232.0/21 (10.10.232.0 - 10.10.239.255) {2048}
RESERVED PUBLIC 13
bastion 10.10.240.0/23 (10.10.240.0 - 10.10.241.255) {512}
expandable to
10.10.240.0/21 (10.10.240.0 - 10.10.247.255) {2048}
PUBLIC
reserved 10.10.248.0/21 (10.10.248.0 - 10.10.255.255) {2048}
</pre></div> </div> </div><h2 id="routing-4daa">Routing<a class="headerlink" href="#routing-4daa" title="Permanent link">¶</a></h2><p>The routing of each VPC is very simple:</p><ul><li>All resources in the private subnets will be routed into the NAT, which grants them Internet access while isolating them from the outside.</li><li>All resources in the public subnets will be routed into the default Internet Gateway.</li><li>The NAT subnet has to be public, so it is routed into the default Internet Gateway.</li><li>The bastion subnet is public, so it is routed into the default Internet Gateway.</li></ul><p>The NAT is a device that translates network addresses, hiding the internal ones through some clever hacking of TCP/IP. This means that it has to live in a public network so that it can access the Internet.</p><p>The bastion (if present) is a machine that can be accessed from the Internet, so it has to be in a public subnet. It's customary to grant access to the bastion to a specific set of IPs (e.g. the personal IPs of some developers), but this is done through Security Groups.</p><h2 id="relevant-figures-92bd">Relevant figures<a class="headerlink" href="#relevant-figures-92bd" title="Permanent link">¶</a></h2><p>In summary, the current design grants us the following:</p><ul><li>246 non-overlapping accounts</li><li>20 different products, each with 12 environments</li><li>32 subnets for each account</li><li>512 addresses per subnet, upgradable to 2048 without overlapping</li></ul><h2 id="a-simple-terraform-module-d757">A simple Terraform module<a class="headerlink" href="#a-simple-terraform-module-d757" title="Permanent link">¶</a></h2><p>The following code is a simple Terraform module intended to showcase how to create a well-designed VPC with that tool. I decided to avoid using complex loops or other clever hacks to keep it simple and accessible to anyone who might be taking their first steps into AWS, VPC, network design, and Terraform.
You are clearly free to build on top of it and to come up with a different or more clever implementation.</p><p>Remember that the NAT is the only resource that is not free of charge, so don't leave it up and running if you don't use it. Don't be afraid of creating one and having a look in the AWS console though.</p><p>I assume the following files are all created in the same directory that I will conventionally call <code>modules/vpc</code>.</p><h3 id="vpc-2303">VPC</h3><div class="code"><div class="title"><code>modules/vpc/vpc.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_vpc"</span><span class="w"> </span><span class="nv">"main"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">cidr_block</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.cidr_prefix}.0.0/16"</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">var.name</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div> </div> </div><div class="code"><div class="title"><code>modules/vpc/variables.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">variable</span><span class="w"> </span><span class="nv">"cidr_prefix"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">description</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"The first two octets of the CIDR, e.g. 10.10 (will become 10.10.0.0/16)"</span>
<span class="w"> </span><span class="na">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kt">string</span>
<span class="p">}</span>
<span class="kr">variable</span><span class="w"> </span><span class="nv">"name"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">description</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"The name of this VPC and the prefix/tag for its related resources"</span>
<span class="w"> </span><span class="na">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kt">string</span>
<span class="p">}</span>
</pre></div> </div> </div><p>When you call the module you will have to pass these two variables, e.g.</p><div class="code"><div class="title"><code>alligator-accounting-live/vpc/main.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">module</span><span class="w"> </span><span class="nv">"vpc"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">source</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"../../modules/vpc"</span>
<span class="w"> </span><span class="na">cidr_prefix</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"10.10"</span>
<span class="w"> </span><span class="na">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"alligator-accounting-live"</span>
<span class="p">}</span>
</pre></div> </div> </div><h3 id="internet-gateway-e563">Internet Gateway </h3><div class="code"><div class="title"><code>modules/vpc/gateway.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_internet_gateway"</span><span class="w"> </span><span class="nv">"main"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">vpc_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_vpc.main.id</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">var.name</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div> </div> </div><h3 id="subnets-03e5">Subnets</h3><div class="code"><div class="title"><code>modules/vpc/subnets.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_subnet"</span><span class="w"> </span><span class="nv">"nat"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">vpc_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_vpc.main.id</span>
<span class="w"> </span><span class="na">cidr_block</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.cidr_prefix}.8.0/23"</span>
<span class="w"> </span><span class="na">availability_zone</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"eu-west-1a"</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.name}-nat"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_subnet"</span><span class="w"> </span><span class="nv">"private_a"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">vpc_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_vpc.main.id</span>
<span class="w"> </span><span class="na">cidr_block</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.cidr_prefix}.32.0/23"</span>
<span class="w"> </span><span class="na">availability_zone</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"eu-west-1a"</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.name}-private-a"</span>
<span class="w"> </span><span class="na">Tier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"private"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_subnet"</span><span class="w"> </span><span class="nv">"private_b"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">vpc_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_vpc.main.id</span>
<span class="w"> </span><span class="na">cidr_block</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.cidr_prefix}.40.0/23"</span>
<span class="w"> </span><span class="na">availability_zone</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"eu-west-1b"</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.name}-private-b"</span>
<span class="w"> </span><span class="na">Tier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"private"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_subnet"</span><span class="w"> </span><span class="nv">"private_c"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">vpc_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_vpc.main.id</span>
<span class="w"> </span><span class="na">cidr_block</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.cidr_prefix}.48.0/23"</span>
<span class="w"> </span><span class="na">availability_zone</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"eu-west-1c"</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.name}-private-c"</span>
<span class="w"> </span><span class="na">Tier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"private"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_subnet"</span><span class="w"> </span><span class="nv">"public_a"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">vpc_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_vpc.main.id</span>
<span class="w"> </span><span class="na">cidr_block</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.cidr_prefix}.136.0/23"</span>
<span class="w"> </span><span class="na">availability_zone</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"eu-west-1a"</span>
<span class="w"> </span><span class="na">map_public_ip_on_launch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="no">true</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.name}-public-a"</span>
<span class="w"> </span><span class="na">Tier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"public"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_subnet"</span><span class="w"> </span><span class="nv">"public_b"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">vpc_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_vpc.main.id</span>
<span class="w"> </span><span class="na">cidr_block</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.cidr_prefix}.144.0/23"</span>
<span class="w"> </span><span class="na">availability_zone</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"eu-west-1b"</span>
<span class="w"> </span><span class="na">map_public_ip_on_launch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="no">true</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.name}-public-b"</span>
<span class="w"> </span><span class="na">Tier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"public"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_subnet"</span><span class="w"> </span><span class="nv">"public_c"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">vpc_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_vpc.main.id</span>
<span class="w"> </span><span class="na">cidr_block</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.cidr_prefix}.152.0/23"</span>
<span class="w"> </span><span class="na">availability_zone</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"eu-west-1c"</span>
<span class="w"> </span><span class="na">map_public_ip_on_launch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="no">true</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.name}-public-c"</span>
<span class="w"> </span><span class="na">Tier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"public"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_subnet"</span><span class="w"> </span><span class="nv">"bastion"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">vpc_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_vpc.main.id</span>
<span class="w"> </span><span class="na">cidr_block</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.cidr_prefix}.240.0/23"</span>
<span class="w"> </span><span class="na">availability_zone</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"eu-west-1a"</span>
<span class="w"> </span><span class="na">map_public_ip_on_launch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="no">true</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.name}-bastion"</span>
<span class="w"> </span><span class="na">Tier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"bastion"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
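</pre></div> </div> </div><p>The subnets above are spelled out one by one to keep the example readable. As a sketch of a more compact alternative (reusing the same <code>var.cidr_prefix</code> and <code>var.name</code> variables, with a hypothetical local map), the three private subnets might be generated with <code>for_each</code></p><div class="code"><div class="content"><div class="highlight"><pre># Sketch: generate the private subnets from a map instead of
# repeating the resource three times.
locals {
  private_subnets = {
    a = { cidr = "${var.cidr_prefix}.32.0/23", az = "eu-west-1a" }
    b = { cidr = "${var.cidr_prefix}.40.0/23", az = "eu-west-1b" }
    c = { cidr = "${var.cidr_prefix}.48.0/23", az = "eu-west-1c" }
  }
}

resource "aws_subnet" "private" {
  for_each          = local.private_subnets
  vpc_id            = aws_vpc.main.id
  cidr_block        = each.value.cidr
  availability_zone = each.value.az

  tags = {
    Name = "${var.name}-private-${each.key}"
    Tier = "private"
  }
}

# References then change from aws_subnet.private_a.id
# to aws_subnet.private["a"].id.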
</pre></div> </div> </div><p>We need a route table that sends outbound traffic to the Internet Gateway, associated with the public subnets and with the bastion subnet, which is public as well.</p><div class="code"><div class="title"><code>modules/vpc/gateway.tf</code></div><div class="content"><div class="highlight"><pre><span class="p">[...]</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_route_table"</span><span class="w"> </span><span class="nv">"main"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">vpc_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_vpc.main.id</span>
<span class="w"> </span><span class="nb">route</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">cidr_block</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"0.0.0.0/0"</span>
<span class="w"> </span><span class="na">gateway_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_internet_gateway.main.id</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.name} Internet Gateway"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_route_table_association"</span><span class="w"> </span><span class="nv">"public_a_to_igw"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">subnet_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_subnet.public_a.id</span>
<span class="w"> </span><span class="na">route_table_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_route_table.main.id</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_route_table_association"</span><span class="w"> </span><span class="nv">"public_b_to_igw"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">subnet_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_subnet.public_b.id</span>
<span class="w"> </span><span class="na">route_table_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_route_table.main.id</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_route_table_association"</span><span class="w"> </span><span class="nv">"public_c_to_igw"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">subnet_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_subnet.public_c.id</span>
<span class="w"> </span><span class="na">route_table_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_route_table.main.id</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_route_table_association"</span><span class="w"> </span><span class="nv">"bastion_to_igw"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">subnet_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_subnet.bastion.id</span>
<span class="w"> </span><span class="na">route_table_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_route_table.main.id</span>
<span class="p">}</span>
</pre></div> </div> </div><h3 id="nat-gateway-3743">NAT Gateway</h3><p>We need a NAT gateway to grant instances in the private subnets outbound access to the Internet without making them reachable from outside.</p><div class="code"><div class="title"><code>modules/vpc/nat.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_eip"</span><span class="w"> </span><span class="nv">"nat_gateway"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">vpc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="no">true</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">var.name</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_route_table_association"</span><span class="w"> </span><span class="nv">"nat_to_igw"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">subnet_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_subnet.nat.id</span>
<span class="w"> </span><span class="na">route_table_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_route_table.main.id</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_nat_gateway"</span><span class="w"> </span><span class="nv">"nat_gateway"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">allocation_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_eip.nat_gateway.id</span>
<span class="w"> </span><span class="na">subnet_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_subnet.nat.id</span>
<span class="w"> </span><span class="na">depends_on</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="nv">aws_internet_gateway.main</span><span class="p">]</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">var.name</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_route_table"</span><span class="w"> </span><span class="nv">"nat_gateway"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">vpc_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_vpc.main.id</span>
<span class="w"> </span><span class="nb">route</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">cidr_block</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"0.0.0.0/0"</span>
<span class="w"> </span><span class="na">nat_gateway_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_nat_gateway.nat_gateway.id</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.name} NAT Gateway"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_route_table_association"</span><span class="w"> </span><span class="nv">"private_a_to_nat"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">subnet_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_subnet.private_a.id</span>
<span class="w"> </span><span class="na">route_table_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_route_table.nat_gateway.id</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_route_table_association"</span><span class="w"> </span><span class="nv">"private_b_to_nat"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">subnet_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_subnet.private_b.id</span>
<span class="w"> </span><span class="na">route_table_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_route_table.nat_gateway.id</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_route_table_association"</span><span class="w"> </span><span class="nv">"private_c_to_nat"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">subnet_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_subnet.private_c.id</span>
<span class="w"> </span><span class="na">route_table_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_route_table.nat_gateway.id</span>
<span class="p">}</span>
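</pre></div> </div> </div><p>Note that a single NAT gateway is a cost/availability trade-off: it keeps the bill down, but it makes <code>eu-west-1a</code> a single point of failure for the outbound traffic of all private subnets. As a sketch (reusing the resources above; names are hypothetical), a second NAT gateway serving <code>private_b</code> might look like this</p><div class="code"><div class="content"><div class="highlight"><pre># Sketch: a second NAT gateway in eu-west-1b, placed in the public
# subnet of that AZ since the dedicated nat subnet lives in eu-west-1a.
# This doubles the NAT gateway cost.
resource "aws_eip" "nat_gateway_b" {
  vpc = true
}

resource "aws_nat_gateway" "nat_gateway_b" {
  allocation_id = aws_eip.nat_gateway_b.id
  subnet_id     = aws_subnet.public_b.id
  depends_on    = [aws_internet_gateway.main]
}

resource "aws_route_table" "nat_gateway_b" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.nat_gateway_b.id
  }
}

# private_b would then use this association instead of
# aws_route_table_association.private_b_to_nat above.
resource "aws_route_table_association" "private_b_to_nat_b" {
  subnet_id      = aws_subnet.private_b.id
  route_table_id = aws_route_table.nat_gateway_b.id
}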
</pre></div> </div> </div><h2 id="subnet-groups-bcd9">Subnet groups<a class="headerlink" href="#subnet-groups-bcd9" title="Permanent link">¶</a></h2><p>As an optional step, you might want to create <em>subnet groups</em> for RDS. In AWS, you can create RDS instances in public networks out of the box, but if you want to put them in a private network (and <em>you want</em> to put them there) you need to build a subnet group. See <a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_VPC.WorkingWithRDSInstanceinaVPC.html">the documentation</a>.</p><div class="code"><div class="title"><code>modules/vpc/subnets.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_db_subnet_group"</span><span class="w"> </span><span class="nv">"rds_group"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"rds_private"</span>
<span class="w"> </span><span class="na">subnet_ids</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="nv">aws_subnet.private_a.id</span><span class="p">,</span>
<span class="w"> </span><span class="nv">aws_subnet.private_b.id</span><span class="p">,</span>
<span class="w"> </span><span class="nv">aws_subnet.private_c.id</span>
<span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.name} RDS subnet group"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
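</pre></div> </div> </div><p>To show where the group comes into play, this is a sketch of an RDS instance that would be placed in the private subnets through <code>db_subnet_group_name</code>. Everything else here (identifier, engine, sizes, and the <code>var.db_password</code> variable) is a placeholder, not part of the module above</p><div class="code"><div class="content"><div class="highlight"><pre># Sketch: an RDS instance using the subnet group. All values are
# placeholders; var.db_password is a hypothetical variable.
resource "aws_db_instance" "example" {
  identifier           = "${var.name}-example"
  engine               = "postgres"
  instance_class       = "db.t3.micro"
  allocated_storage    = 20
  username             = "example"
  password             = var.db_password
  db_subnet_group_name = aws_db_subnet_group.rds_group.name
  publicly_accessible  = false
  skip_final_snapshot  = true
}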
</pre></div> </div> </div><h2 id="final-words-9803">Final words<a class="headerlink" href="#final-words-9803" title="Permanent link">¶</a></h2><p>I hope this was a useful and interesting trip into network design. This example might sound trivial compared to what is needed in certain contexts, but it is definitely a good setup that you can build on. I think VPC is often overlooked because it is assumed that developers are familiar with networks. As networking is a crucial part of a system and will pop up in other technologies like Docker or Kubernetes, I recommend that any mid-level or senior developer make sure they are familiar with the main concepts of IP. Happy learning!</p><h2 id="feedback-d845">Feedback<a class="headerlink" href="#feedback-d845" title="Permanent link">¶</a></h2><p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>From Docker CLI to Docker Compose2022-02-19T15:00:00+01:002022-03-17T10:00:00+00:00Leonardo Giordanitag:www.thedigitalcatonline.com,2022-02-19:/blog/2022/02/19/from-docker-cli-to-docker-compose/<p> A hands-on post that shows how to build a system with Docker and which problems Docker Compose solves</p><p>In this post I will show you how and why Docker Compose is useful, building a simple application written in Python that uses PostgreSQL. I think it is worth going through such an exercise to see how technologies that we might already be familiar with actually simplify workflows that would otherwise be much more complicated.</p><p>The name of the demo application I will develop is a very unimaginative <code>whale</code>, which shouldn't clash with any other name introduced by the tools I will use. 
Every time you see something with <code>whale</code> in it you know that I am referring to a value that you can change according to your setup.</p><p>Before we start, please create a directory to host all the files we will create. I will refer to this directory as the "project directory". </p><h2 id="postgresql-090e">PostgreSQL<a class="headerlink" href="#postgresql-090e" title="Permanent link">¶</a></h2><p>Since the application will connect to a PostgreSQL database, the first thing we can explore is how to run it in a Docker container.</p><p>The official Postgres image can be found <a href="https://hub.docker.com/_/postgres">here</a>, and I highly recommend taking the time to properly read the documentation, as it contains a myriad of details that you should be familiar with.</p><p>For the time being, let's focus on the environment variables that the image requires you to set.</p><h3 id="password-cd2a">Password</h3><p>The first variable is <code>POSTGRES_PASSWORD</code>, which is the only mandatory configuration value (unless you disable authentication, which is not recommended). Indeed, if you run the image without setting this value, you get this message</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker run postgres
Error: Database is uninitialized and superuser password is not specified.
You must specify POSTGRES_PASSWORD to a non-empty value for the
superuser. For example, "-e POSTGRES_PASSWORD=password" on "docker run".
You may also use "POSTGRES_HOST_AUTH_METHOD=trust" to allow all
connections without a password. This is *not* recommended.
See PostgreSQL documentation about "trust":
https://www.postgresql.org/docs/current/auth-trust.html
</pre></div> </div> </div><p>This value is very interesting because it's a secret. So, while I will treat it as a simple configuration value in the first stages of the setup, later we will need to discuss how to manage it properly.</p><h3 id="superuser-93cc">Superuser</h3><p>Being a production-grade database, Postgres allows you to specify users, groups, and permissions in a fine-grained fashion. I won't go into that as it's usually more a matter of database administration and application development, but we need to define at least the superuser. The default value for this image is <code>postgres</code>, but you can change it by setting <code>POSTGRES_USER</code>.</p><h3 id="database-name-796b">Database name</h3><p>If you do not specify the value of <code>POSTGRES_DB</code>, this image will create a default database with the name of the superuser.</p><hr><p>A word of warning here. If you omit both the database name and the user, you will end up with the superuser <code>postgres</code> and database <code>postgres</code>. The <a href="https://www.postgresql.org/docs/current/creating-cluster.html">official documentation</a> states that</p><div class="code"><div class="content"><div class="highlight"><pre>After initialization, a database cluster will contain a database named
postgres, which is meant as a default database for use by utilities,
users and third party applications. The database server itself does not
require the postgres database to exist, but many external utility programs
assume it exists.
</pre></div> </div> </div><p>This means that it is not ideal to use it as the database for our application. So, unless you are just trying out a quick piece of code, my recommendation is to always configure all three values: <code>POSTGRES_PASSWORD</code>, <code>POSTGRES_USER</code>, and <code>POSTGRES_DB</code>.</p><p>We can run the image with</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker run -d \
-e POSTGRES_PASSWORD=whale_password \
-e POSTGRES_DB=whale_db \
-e POSTGRES_USER=whale_user \
postgres:13
</pre></div> </div> </div><p>As you can see I run the image in <a href="https://docs.docker.com/engine/reference/run/#detached--d">detached mode</a>. This image is not meant to be interactive, as Postgres is by its very nature a daemon. To connect in an interactive way we need to use the tool <code>psql</code>, which is provided by this image. Please note that I'm running <code>postgres:13</code> only to keep the post consistent with what you will see if you read it in the future; you are of course free to use any version of the engine.</p><p>The ID of the container is returned by <code>docker run</code>, but we can retrieve it at any time by running <code>docker ps</code>. Using IDs is however pretty inconvenient, and when looking at the command history it is not immediately clear what you were doing at a certain point in time. For this reason, it's a good idea to name the containers.</p><p>Stop the previous container and run it again with</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker run -d \
--name whale-postgres \
-e POSTGRES_PASSWORD=whale_password \
-e POSTGRES_DB=whale_db \
-e POSTGRES_USER=whale_user \
postgres:13
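</pre></div> </div> </div><p>Once the container is named, later commands can refer to it by name instead of by ID. For example, you can tail the server logs, or use <code>pg_isready</code> (a utility shipped with the image) to check that the server accepts connections</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker logs whale-postgres
$ docker exec whale-postgres pg_isready -U whale_user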
</pre></div> </div> </div><div class="infobox"><i class="fa fa-info-circle"></i><div class="title">Stopping containers</div><div><p>You can stop containers using <code>docker stop ID</code>. This <a href="https://docs.docker.com/engine/reference/commandline/stop/#extended-description">gives containers a grace period</a> to react to the <code>SIGTERM</code> signal, for example to properly close files and terminate connections, and then terminates it with <code>SIGKILL</code>. You can also force it to stop unconditionally using <code>docker kill ID</code> which sends <code>SIGKILL</code> immediately.</p>
<p>In either case, however, you might want to remove the container, which otherwise will be kept indefinitely by Docker. This can become a problem when containers are named, as you can't reuse a name that is currently assigned to a container.</p>
<p>To remove a container you have to run <code>docker rm ID</code>, but you can leverage the fact that both <code>docker stop</code> and <code>docker kill</code> return the ID of the container to pipe the termination and the removal</p>
<div class="code"><div class="content"><div class="highlight"><pre>$ docker stop ID | xargs docker rm
</pre></div> </div> </div>
<p>Otherwise, you can use <code>docker rm -f ID</code>, which corresponds to <code>docker kill</code> followed by <code>docker rm</code>. If you name a container, however, you can use its name instead of the ID.</p></div></div><hr><p>Now we can connect to the database using the executable <code>psql</code> provided in the image itself. To execute a command inside a container we use <code>docker exec</code>, and this time we will specify <code>-it</code> to open an interactive session. By default <code>psql</code> connects with the name of the current system user (<code>root</code>, in this case) both as the user name and as the database name, so we need to specify both explicitly. The header informs me that the image is running PostgreSQL 13.5 on Debian.</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker exec -it whale-postgres psql -U whale_user whale_db
psql (13.5 (Debian 13.5-1.pgdg110+1))
Type "help" for help.
whale_db=#
</pre></div> </div> </div><div class="infobox"><i class="fa fa-info-circle"></i><div class="title">Postgres trust</div><div><p>You might be surprised by the fact that <code>psql</code> didn't ask for the password that we set when we ran the container. This happens because the server trusts local connections, and when we run <code>psql</code> inside the container we are on <code>localhost</code>.</p>
<p>If you are curious about trust in Postgres you can see the configuration file with</p>
<div class="code"><div class="content"><div class="highlight"><pre>$ docker exec -it whale-postgres \
cat /var/lib/postgresql/data/pg_hba.conf
</pre></div> </div> </div>
<p>where you can spot the lines</p>
<div class="code"><div class="content"><div class="highlight"><pre># TYPE DATABASE USER ADDRESS METHOD
# "local" is for Unix domain socket connections only
local all all trust
</pre></div> </div> </div>
<p>You can find more information about Postgres trust in <a href="https://www.postgresql.org/docs/current/auth-trust.html">the official documentation</a>.</p></div></div><p>Here, I can list all the databases with <code>\l</code>. You can see all <code>psql</code> commands and the rest of the documentation at <a href="https://www.postgresql.org/docs/current/app-psql.html">https://www.postgresql.org/docs/current/app-psql.html</a>.</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker exec -it whale-postgres psql -U whale_user whale_db
psql (13.5 (Debian 13.5-1.pgdg110+1))
Type "help" for help.
whale_db=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+------------+----------+------------+------------+---------------------------
postgres | whale_user | UTF8 | en_US.utf8 | en_US.utf8 |
template0 | whale_user | UTF8 | en_US.utf8 | en_US.utf8 | =c/whale_user +
| | | | | whale_user=CTc/whale_user
template1 | whale_user | UTF8 | en_US.utf8 | en_US.utf8 | =c/whale_user +
| | | | | whale_user=CTc/whale_user
whale_db | whale_user | UTF8 | en_US.utf8 | en_US.utf8 |
(4 rows)
whale_db=#
</pre></div> </div> </div><p>As you can see, the database called <code>postgres</code> has been created as part of the initialisation, as clarified previously. You can exit <code>psql</code> with <code>Ctrl-D</code> or <code>\q</code>.</p><hr><p>If we want the database to be accessible from outside we need to publish a port. The image <strong>exposes</strong> port 5432 (see the <a href="https://github.com/docker-library/postgres/blob/master/13/alpine/Dockerfile#L190">source code</a>), which tells us where the server is listening. To <strong>publish</strong> the port towards the host system we can add <code>-p 5432:5432</code>. Please remember that exposing a port in Docker basically means adding some metadata that informs the user of the image; it doesn't affect the way the container runs.</p><p>Stop the container (you can use its name now) and run it again with</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker run -d \
--name whale-postgres \
-e POSTGRES_PASSWORD=whale_password \
-e POSTGRES_DB=whale_db \
-e POSTGRES_USER=whale_user \
-p 5432:5432 postgres:13
</pre></div> </div> </div><p>Running <code>docker ps</code> we can see that the container publishes the port now (<code>0.0.0.0:5432->5432/tcp</code>). We can double-check it with <code>ss</code> ("socket statistics")</p><div class="code"><div class="content"><div class="highlight"><pre>$ ss -nulpt | grep 5432
tcp LISTEN 0 4096 0.0.0.0:5432 0.0.0.0:*
tcp LISTEN 0 4096 [::]:5432 [::]:*
</pre></div> </div> </div><p>Please note that usually <code>ss</code> won't tell you the name of the process using that port, because that process belongs to <code>root</code> and <code>ss</code> is not running with enough privileges. If you run <code>ss</code> with <code>sudo</code> you will see it</p><div class="code"><div class="content"><div class="highlight"><pre>$ sudo ss -nulpt | grep 5432
tcp LISTEN 0 4096 0.0.0.0:5432 0.0.0.0:* users:(("docker-proxy",pid=1262717,fd=4))
tcp LISTEN 0 4096 [::]:5432 [::]:* users:(("docker-proxy",pid=1262724,fd=4))
</pre></div> </div> </div><p>Unfortunately, <code>ss</code> is not available on macOS. On that platform (and on Linux as well) you can use <code>lsof</code> with <code>grep</code></p><div class="code"><div class="content"><div class="highlight"><pre>$ sudo lsof -i -P -n | grep 5432
docker-pr 219643 root 4u IPv4 2945982 0t0 TCP *:5432 (LISTEN)
docker-pr 219650 root 4u IPv6 2952986 0t0 TCP *:5432 (LISTEN)
</pre></div> </div> </div><p>or directly using the option <code>-i</code></p><div class="code"><div class="content"><div class="highlight"><pre>$ sudo lsof -i :5432
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
docker-pr 219643 root 4u IPv4 2945982 0t0 TCP *:postgresql (LISTEN)
docker-pr 219650 root 4u IPv6 2952986 0t0 TCP *:postgresql (LISTEN)
</pre></div> </div> </div><p>Please note that <code>docker-pr</code> in the output above is just <code>docker-proxy</code> truncated, matching what we saw with <code>ss</code> previously.</p><p>If you want to publish the container's port 5432 to a different port on the host you can just use <code>-p ANY_NUMBER:5432</code>. Remember however that port numbers under 1024 are <em>privileged</em> or <em>well-known</em>, which means that they are assigned by default to specific services (<a href="https://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers#Well-known_ports">listed here</a>).</p><p>This means that in theory you can use <code>-p 80:5432</code> for your database container, exposing it on port 80 of your host. In practice this will result in a lot of headaches and a bunch of developers chasing you with spikes and shovels.</p><hr><p>Now that we have published a port we can connect to the database by running <code>psql</code> in an ephemeral container. "Ephemeral" means that a resource (in this case a Docker container) is run just for the time necessary to serve a specific purpose, as opposed to "permanent". This way we can simulate someone who tries to connect to the Docker container from a different computer on the network.</p><p>Since <code>psql</code> is provided by the image <code>postgres</code> we can in theory run it, passing the hostname with <code>-h localhost</code>, but if you try it you will be disappointed.</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker run -it postgres:13 psql -h localhost -U whale_user whale_db
psql: error: connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
connection to server at "localhost" (::1), port 5432 failed: Cannot assign requested address
Is the server running on that host and accepting TCP/IP connections?
</pre></div> </div> </div><p>This is expected, as that container runs in Docker's default bridge network, where <code>localhost</code> is the container itself. To make it work we need to run the container as part of the host network (that is, the same network our computer is running on). This can be done with <code>--network=host</code></p><div class="code"><div class="content"><div class="highlight"><pre>$ docker run -it \
--network=host postgres:13 \
psql -h localhost -U whale_user whale_db
Password for user whale_user:
psql (13.5 (Debian 13.5-1.pgdg110+1))
Type "help" for help.
whale_db=#
</pre></div> </div> </div><p>Please note that now <code>psql</code> asks for a password (that you know because you set it when we ran the container <code>whale-postgres</code>). This happens because the tool is not run on the same node as the database server any more, so PostgreSQL doesn't trust it.</p><h2 id="volumes-0cfc">Volumes<a class="headerlink" href="#volumes-0cfc" title="Permanent link">¶</a></h2><p>If we used a structured framework in Python, we could leverage an ORM like SQLAlchemy to map classes to database tables. The model definitions (or changes) can be captured into little scripts called migrations that are applied to the database, and those can also be used to insert some initial data. For this example I will go a simpler route, that is to initialise the database using SQL directly.</p><p>I do not recommend this approach for a real project but it should be good enough in this case. In particular, it will allow me to demonstrate how to use volumes in Docker.</p><p>Make sure the container <code>whale-postgres</code> is running (with or without publishing the port, it's not important at the moment). Connect to the container using <code>psql</code> and run the following two SQL commands (make sure you are connected to the database <code>whale_db</code>)</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">recipes</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">recipe_id</span><span class="w"> </span><span class="nb">INT</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span><span class="p">,</span>
<span class="w"> </span><span class="n">recipe_name</span><span class="w"> </span><span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">30</span><span class="p">)</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span><span class="p">,</span>
<span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="w"> </span><span class="p">(</span><span class="n">recipe_id</span><span class="p">),</span>
<span class="w"> </span><span class="k">UNIQUE</span><span class="w"> </span><span class="p">(</span><span class="n">recipe_name</span><span class="p">)</span>
<span class="p">);</span>
<span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">recipes</span><span class="w"> </span>
<span class="w"> </span><span class="p">(</span><span class="n">recipe_id</span><span class="p">,</span><span class="w"> </span><span class="n">recipe_name</span><span class="p">)</span><span class="w"> </span>
<span class="k">VALUES</span><span class="w"> </span>
<span class="w"> </span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="s1">'Tacos'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="s1">'Tomato Soup'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mi">3</span><span class="p">,</span><span class="s1">'Grilled Cheese'</span><span class="p">);</span>
</pre></div> </div> </div><p>This code creates a table called <code>recipes</code> and inserts 3 rows with an <code>id</code> and a <code>name</code>. The output of the above commands should be</p><div class="code"><div class="content"><div class="highlight"><pre>CREATE TABLE
INSERT 0 3
</pre></div> </div> </div><p>You can double check that the database contains the table with <code>\dt</code></p><div class="code"><div class="content"><div class="highlight"><pre>whale_db=# \dt
           List of relations
 Schema |  Name   | Type  |   Owner
--------+---------+-------+------------
 public | recipes | table | whale_user
(1 row)
</pre></div> </div> </div><p>and that the table contains three rows with a <code>select</code>.</p><div class="code"><div class="content"><div class="highlight"><pre>whale_db=# select * from recipes;
 recipe_id |  recipe_name
-----------+----------------
         1 | Tacos
         2 | Tomato Soup
         3 | Grilled Cheese
(3 rows)
</pre></div> </div> </div><p>Now, the problem with containers is that they do not store data permanently. While the container is running there are no issues; as a matter of fact, you can terminate <code>psql</code>, connect again, and run the <code>select</code>, and you will see the same data.</p><p>If we stop the container and run it again, though, we will quickly realise that the values stored in the database are gone.</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker stop whale-postgres | xargs docker rm
whale-postgres
$ docker run -d \
--name whale-postgres \
-e POSTGRES_PASSWORD=whale_password \
-e POSTGRES_DB=whale_db \
-e POSTGRES_USER=whale_user \
-p 5432:5432 postgres:13
4a647ebef78e32bb4733484a6e435780e17a69b643e872613ca50115d60d54ce
$ docker exec -it whale-postgres \
psql -U whale_user whale_db -c "select * from recipes"
ERROR: relation "recipes" does not exist
LINE 1: select * from recipes
                      ^
</pre></div> </div> </div><hr><p>Containers have been created with isolation in mind, which is why by default nothing that happens inside a container is connected with the host or preserved when the container is destroyed.</p><p>As happened with ports, however, we need to establish some communication between containers and the host system, and we also want to keep data after the container has been destroyed. The solution in Docker is to use volumes.</p><p>There are three types of volumes in Docker: <em>host</em>, <em>anonymous</em>, and <em>named</em>. Host volumes are a way to mount inside the container a path on the host's filesystem, and while they are useful to exchange data between the host and the container, they also often have permissions issues. Generally speaking, containers define users whose IDs are not mapped to the host's ones, which means that the files written by the container might end up belonging to non-existing users.</p><p>Anonymous and named volumes are simply virtual filesystems created and managed independently from containers. These can be connected with a running container so the latter can use the data contained in them and store data that will survive its termination. The only difference between named and anonymous volumes is the name, which allows you to manage them easily. For this reason, I think it's not really useful to consider anonymous volumes, which is why I will focus on named ones.</p><p>You can manage volumes using <code>docker volume</code>, which provides several subcommands such as <code>create</code> and <code>rm</code>. You can then <a href="https://docs.docker.com/engine/reference/run/#volume-shared-filesystems">attach a named volume to a container</a> when you run it using the option <code>-v</code> of <code>docker run</code>.
This creates the volume if it doesn't already exist, which is why this is how volumes are usually created.</p><p>Stop and remove the running Postgres container and run it again with a named volume</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker stop whale-postgres | xargs docker rm
$ docker run -d \
--name whale-postgres \
-e POSTGRES_PASSWORD=whale_password \
-e POSTGRES_DB=whale_db \
-e POSTGRES_USER=whale_user \
-p 5432:5432 \
-v whale_dbdata:/var/lib/postgresql/data \
postgres:13
</pre></div> </div> </div><p>This will create the volume named <code>whale_dbdata</code> and connect it to the path <code>/var/lib/postgresql/data</code> in the container that we are running. That path happens to be the one where Postgres stores the actual database, as you can see from <a href="https://www.postgresql.org/docs/current/storage-file-layout.html">the official documentation</a>. There is a specific reason why I used the prefix <code>whale_</code> for the name of the volume, which will become clear later when we introduce Docker Compose.</p><p><code>docker ps</code> doesn't give any information on volumes, so to see what is connected to your container you need to use <code>docker inspect</code></p><div class="code"><div class="content"><div class="highlight"><pre>$ docker inspect whale-postgres
[...]
        "Mounts": [
            {
                "Type": "volume",
                "Name": "whale_dbdata",
                "Source": "/var/lib/docker/volumes/whale_dbdata/_data",
                "Destination": "/var/lib/postgresql/data",
                "Driver": "local",
                "Mode": "z",
                "RW": true,
                "Propagation": ""
            }
        ],
[...]
</pre></div> </div> </div><p>The value for <code>"Source"</code> is where the volume is stored in the host, that is on your computer, but generally speaking you can ignore that detail. You can see all volumes using <code>docker volume ls</code> (using <code>grep</code> if the list is long as it is in my case)</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker volume ls | grep whale
local whale_dbdata
</pre></div> </div> </div><p>Now that the container is running and is connected to a volume, we can try to initialise the database again. Connect with <code>psql</code> using the command line we developed before and run the SQL commands that create the table <code>recipes</code> and insert three rows.</p><p>The whole point of using a volume is to make information permanent, so now terminate and remove the Postgres container, and run it again using the same volume. You can check that the database still contains data using the query shown previously.</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker rm -f whale-postgres
whale-postgres
$ docker run -d \
--name whale-postgres \
-e POSTGRES_PASSWORD=whale_password \
-e POSTGRES_DB=whale_db \
-e POSTGRES_USER=whale_user \
-p 5432:5432 \
-v whale_dbdata:/var/lib/postgresql/data \
postgres:13
893378f044204e5c1a87473a038b615a08ad08e5da9225002a470caeac8674a8
$ docker exec -it whale-postgres \
psql -U whale_user whale_db \
-c "select * from recipes"
 recipe_id |  recipe_name
-----------+----------------
         1 | Tacos
         2 | Tomato Soup
         3 | Grilled Cheese
(3 rows)
</pre></div> </div> </div><h2 id="python-application-4d3a">Python application<a class="headerlink" href="#python-application-4d3a" title="Permanent link">¶</a></h2><p>Great! Now that we have a database that can be restarted without losing data we can create a Python application that interacts with it. Again, please remember that the goal of this post is to show what container orchestration is and how Docker Compose can simplify it, so the application developed in this section is absolutely minimal.</p><p>I will first create the application and run it on the host, leveraging the port published by the container to connect to the database. Later, I will move the application into its own container.</p><p>To create the application, first create a Python virtual environment using your preferred method. I currently use <code>pyenv</code> (<a href="https://github.com/pyenv/pyenv">https://github.com/pyenv/pyenv</a>).</p><div class="code"><div class="content"><div class="highlight"><pre>pyenv virtualenv whale_docker
pyenv activate whale_docker
</pre></div> </div> </div><p>Now we need to put our requirements in a file and install them. I prefer to keep things tidy from day zero, so create the directory <code>whaleapp</code> in the project directory and inside it the file <code>requirements.txt</code>.</p><div class="code"><div class="content"><div class="highlight"><pre>mkdir whaleapp
touch whaleapp/requirements.txt
</pre></div> </div> </div><p>The only requirement we have for this simple application is <code>psycopg2</code>, so I add it to the file and then install it. Since we are installing requirements, it is useful to update <code>pip</code> as well.</p><div class="code"><div class="content"><div class="highlight"><pre>echo "psycopg2" >> whaleapp/requirements.txt
pip install -U pip
pip install -r whaleapp/requirements.txt
</pre></div> </div> </div><hr><p>Now create the file <code>whaleapp/whaleapp.py</code> and put this code in it</p><div class="code"><div class="title">whaleapp/whaleapp.py</div><div class="content"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">time</span>
<span class="kn">import</span> <span class="nn">psycopg2</span>
<span class="n">connection_data</span> <span class="o">=</span> <span class="p">{</span> <span class="callout">1</span>
<span class="s2">"host"</span><span class="p">:</span> <span class="s2">"localhost"</span><span class="p">,</span>
<span class="s2">"database"</span><span class="p">:</span> <span class="s2">"whale_db"</span><span class="p">,</span>
<span class="s2">"user"</span><span class="p">:</span> <span class="s2">"whale_user"</span><span class="p">,</span>
<span class="s2">"password"</span><span class="p">:</span> <span class="s2">"whale_password"</span><span class="p">,</span>
<span class="p">}</span>
<span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">conn</span> <span class="o">=</span> <span class="kc">None</span>
<span class="c1"># Connect to the PostgreSQL server</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"Connecting to the PostgreSQL database..."</span><span class="p">)</span>
<span class="n">conn</span> <span class="o">=</span> <span class="n">psycopg2</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="o">**</span><span class="n">connection_data</span><span class="p">)</span> <span class="callout">2</span>
<span class="c1"># Create a cursor</span>
<span class="n">cur</span> <span class="o">=</span> <span class="n">conn</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span>
<span class="c1"># Execute the query</span>
<span class="n">cur</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="s2">"select * from recipes"</span><span class="p">)</span> <span class="callout">3</span>
<span class="c1"># Fetch all results</span>
<span class="n">results</span> <span class="o">=</span> <span class="n">cur</span><span class="o">.</span><span class="n">fetchall</span><span class="p">()</span>
<span class="nb">print</span><span class="p">(</span><span class="n">results</span><span class="p">)</span> <span class="callout">4</span>
<span class="c1"># Close the connection</span>
<span class="n">cur</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
<span class="k">except</span> <span class="p">(</span><span class="ne">Exception</span><span class="p">,</span> <span class="n">psycopg2</span><span class="o">.</span><span class="n">DatabaseError</span><span class="p">)</span> <span class="k">as</span> <span class="n">error</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="n">error</span><span class="p">)</span>
<span class="k">finally</span><span class="p">:</span>
<span class="k">if</span> <span class="n">conn</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
<span class="n">conn</span><span class="o">.</span><span class="n">close</span><span class="p">()</span> <span class="callout">5</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"Database connection closed."</span><span class="p">)</span>
<span class="c1"># Wait three seconds</span>
<span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
</pre></div> </div> </div><p>As you can see, the code is not complicated. The application is an endless <code>while</code> loop that every 3 seconds establishes a connection with the DB <span class="callout">2</span> using the configuration in <span class="callout">1</span>. After this, the query <code>select * from recipes</code> is run <span class="callout">3</span>, all the results are printed on the standard output <span class="callout">4</span>, and the connection is closed <span class="callout">5</span>.</p><p>If the Postgres container is running and publishing port 5432, this application can be run directly on the host</p><div class="code"><div class="content"><div class="highlight"><pre>$ python whaleapp.py
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.
</pre></div> </div> </div><p>and will go on indefinitely until we press <code>Ctrl-C</code> to stop it.</p><hr><p>For the same reasons of isolation and security that we discussed previously, we want to run the application in a Docker container. This can be done pretty easily, but we will run into the same issues that we had when we were trying to run <code>psql</code> in a separate container. At the moment, the application tries to connect to the database on <code>localhost</code>, which is fine while the application is running on the host directly, but won't work any more once it is moved into a Docker container.</p><p>To face one problem at a time, let's first containerise the application and run it using the <code>host</code> network. Once this works, we can see how to solve the communication problem between containers.</p><p>The easiest way to containerise a Python application is to create a new image starting from the image <code>python:3</code>. The following <code>Dockerfile</code> goes into the application directory</p><div class="code"><div class="title"><code>whaleapp/Dockerfile</code></div><div class="content"><div class="highlight"><pre><span class="k">FROM</span><span class="w"> </span><span class="s">python:3</span> <span class="callout">1</span>
<span class="k">WORKDIR</span><span class="w"> </span><span class="s">/usr/src/app</span> <span class="callout">2</span>
<span class="k">COPY</span><span class="w"> </span>requirements.txt<span class="w"> </span>. <span class="callout">3</span>
<span class="k">RUN</span><span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span>--no-cache-dir<span class="w"> </span>-r<span class="w"> </span>requirements.txt <span class="callout">4</span>
<span class="k">COPY</span><span class="w"> </span>.<span class="w"> </span>. <span class="callout">5</span>
<span class="k">CMD</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="s2">"python"</span><span class="p">,</span><span class="w"> </span><span class="s2">"-u"</span><span class="p">,</span><span class="w"> </span><span class="s2">"./whaleapp.py"</span><span class="w"> </span><span class="p">]</span> <span class="callout">6</span>
</pre></div> </div> </div><p>A Dockerfile contains the description of the layers that build an image. Here, we start from the official Python 3 image <span class="callout">1</span> (<a href="https://hub.docker.com/_/python">https://hub.docker.com/_/python</a>), set a working directory <span class="callout">2</span>, copy the requirements file <span class="callout">3</span> and install the requirements <span class="callout">4</span>, then copy the rest of the application <span class="callout">5</span>, and run the application <span class="callout">6</span>. The Python option <code>-u</code> avoids output buffering, see <a href="https://docs.python.org/3/using/cmdline.html#cmdoption-u">https://docs.python.org/3/using/cmdline.html#cmdoption-u</a>.</p><p>It is important to keep in mind the layered nature of Docker images, as this can lead to simple optimisation tricks. In this case, copying the requirements file and installing the requirements creates a layer out of a file that doesn't change very often, while the layer created at <span class="callout">5</span> is probably changing very quickly while we develop the application. If we run something like</p><div class="code"><div class="content"><div class="highlight"><pre><span class="o">[</span>...<span class="o">]</span>
<span class="k">COPY</span><span class="w"> </span>.<span class="w"> </span>.
<span class="k">RUN</span><span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span>--no-cache-dir<span class="w"> </span>-r<span class="w"> </span>requirements.txt
<span class="k">CMD</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="s2">"python"</span><span class="p">,</span><span class="w"> </span><span class="s2">"-u"</span><span class="p">,</span><span class="w"> </span><span class="s2">"./whaleapp.py"</span><span class="w"> </span><span class="p">]</span>
</pre></div> </div> </div><p>we would have to install the requirements every time we change the application code, as this would rebuild the <code>COPY</code> layer and thus invalidate the layer containing the <code>RUN</code> command.</p><p>Once the <code>Dockerfile</code> is in place we can build the image</p><div class="code"><div class="content"><div class="highlight"><pre>$ cd whaleapp
$ docker build -t whaleapp .
Sending build context to Docker daemon 6.144kB
Step 1/6 : FROM python:3
---> 768307cdb962
Step 2/6 : WORKDIR /usr/src/app
---> Using cache
---> b00189756ddb
Step 3/6 : COPY requirements.txt .
---> a7aef12f562c
Step 4/6 : RUN pip install --no-cache-dir -r requirements.txt
---> Running in 153a3ca6a1b2
Collecting psycopg2
Downloading psycopg2-2.9.3.tar.gz (380 kB)
Building wheels for collected packages: psycopg2
Building wheel for psycopg2 (setup.py): started
Building wheel for psycopg2 (setup.py): finished with status 'done'
Created wheel for psycopg2: filename=psycopg2-2.9.3-cp39-cp39-linux_x86_64.whl size=523502 sha256=1a3aac3cf72cc86b63a3e0f42b9b788c5237c3e5d23df649ca967b29bf89ecf5
Stored in directory: /tmp/pip-ephem-wheel-cache-ow3d1yop/wheels/b3/a1/6e/5a0e26314b15eb96a36263b80529ce0d64382540ac7b9544a9
Successfully built psycopg2
Installing collected packages: psycopg2
Successfully installed psycopg2-2.9.3
WARNING: You are using pip version 20.2.4; however, version 21.3.1 is available.
You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.
Removing intermediate container 153a3ca6a1b2
---> b18aead1ef15
Step 5/6 : COPY . .
---> be7c3c11e608
Step 6/6 : CMD [ "python", "-u", "./whaleapp.py" ]
---> Running in 9e2f4f30b59e
Removing intermediate container 9e2f4f30b59e
---> b735eece4f86
Successfully built b735eece4f86
Successfully tagged whaleapp:latest
</pre></div> </div> </div><p>You can see the layers being built one by one (marked as <code>Step x/6</code> here). Once the image has been built you should be able to see it in the list of images present in your system</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker image ls | grep whale
whaleapp latest 969b15466905 9 minutes ago 894MB
</pre></div> </div> </div><div class="infobox"><i class="fa fa-info-circle"></i><div class="title">Size of containers</div><div><p>You might want to observe 1 minute of silence meditating on the fact that we used almost 900 megabytes of space to run 40 lines of Python. As you can see benefits come with a cost, and you should not underestimate those. 900 megabytes might not seem a lot nowadays, but if you keep building images you will soon use up the space on your hard drive or end up paying a lot for the space on your remote repository.</p>
<p>By the way, this is the reason why Docker splits images into layers and reuses them. For now we can ignore this part of the game, but remember that keeping the system clean and removing past artefacts is important.</p></div></div><p>As I mentioned before, we can run this image, but we need to use the <code>host</code> network configuration.</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker run -it --rm --network=host --name whale-app whaleapp
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.
</pre></div> </div> </div><p>Please note that I used <code>--rm</code> to make Docker remove the container automatically when it is terminated. This way I can run it again with the same name without having to explicitly remove the past container with <code>docker rm</code>.</p>
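<p>Before moving on, note that it is the hardcoded <code>"host": "localhost"</code> that ties the application to the host network. A small refactoring reads the connection settings from the environment instead, keeping the values used so far as defaults. This is a sketch of mine, not part of the original application, and the variable names <code>DB_HOST</code>, <code>DB_NAME</code>, <code>DB_USER</code>, and <code>DB_PASSWORD</code> are assumptions:</p>

```python
import os

# Hypothetical refactoring of the connection settings in whaleapp.py:
# read each value from an environment variable, falling back to the
# values used so far in this post.
connection_data = {
    "host": os.environ.get("DB_HOST", "localhost"),
    "database": os.environ.get("DB_NAME", "whale_db"),
    "user": os.environ.get("DB_USER", "whale_user"),
    "password": os.environ.get("DB_PASSWORD", "whale_password"),
}
```

<p>With this change the same image can be pointed at a different database host simply by passing, for example, <code>-e DB_HOST=whale-postgres</code> to <code>docker run</code>, without rebuilding the image.</p>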
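<p>The application currently relies on the name <code>localhost</code> being resolvable. A quick way to check what a given hostname resolves to from where your code runs is the standard library's <code>socket.gethostbyname</code>; the small probe below is my own sketch, not part of the post's application:</p>

```python
import socket

def resolve(name):
    """Return the IPv4 address a name resolves to, or None if it doesn't."""
    try:
        return socket.gethostbyname(name)
    except socket.gaierror:
        return None

print(resolve("localhost"))        # 127.0.0.1
print(resolve("whale-postgres"))   # None on the host
```

<p>Run on the host, <code>whale-postgres</code> does not resolve to anything; run inside a container attached to a custom bridge network together with the database container, Docker's embedded DNS resolves the name to that container's address.</p>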
<h2 id="run-containers-in-the-same-network-deb7">Run containers in the same network<a class="headerlink" href="#run-containers-in-the-same-network-deb7" title="Permanent link">¶</a></h2><p>Docker containers are isolated from the host and from other containers by default. This, however, doesn't mean that they can't communicate with each other if we run them in a specific configuration. In particular, an important part in Docker networking is played by bridge networks.</p><p>Whenever containers run in the same custom bridge network, Docker provides them with DNS resolution based on the container names. This means that we can make the application communicate with the database without having to run the former in the host network.</p><p>A custom network can be created using <code>docker network</code></p><div class="code"><div class="content"><div class="highlight"><pre>$ docker network create whale
</pre></div> </div> </div><p>As always, Docker will return the ID of the object it just created, but we can ignore it for now, as we can refer to the network by name.</p><p>Stop and remove the Postgres container, and run it again using the network <code>whale</code></p><div class="code"><div class="content"><div class="highlight"><pre>$ docker rm -f whale-postgres
whale-postgres
$ docker run -d \
--name whale-postgres \
-e POSTGRES_PASSWORD=whale_password \
-e POSTGRES_DB=whale_db \
-e POSTGRES_USER=whale_user \
--network=whale \
-v whale_dbdata:/var/lib/postgresql/data \
postgres:13
</pre></div> </div> </div><p>Please note that there is no need to publish the port 5432 in this setup, as the host doesn't need to access the container. Should this be a requirement, add the option <code>-p 5432:5432</code> again.</p><p>As happened with volumes, <code>docker ps</code> doesn't give information about the network that containers are using, so you have to use <code>docker inspect</code> again</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker inspect whale-postgres
[...]
    "NetworkSettings": {
        "Networks": {
            "whale": {
[...]
</pre></div> </div> </div><div class="infobox"><i class="fa fa-info-circle"></i><div class="title">Docker network management</div><div><p>The command <code>docker network</code> can be used to change the network configuration of <em>running</em> containers.</p>
<p>You can disconnect a running container from a network with</p>
<div class="code"><div class="content"><div class="highlight"><pre>$ docker network disconnect NETWORK_ID CONTAINER_ID
</pre></div> </div> </div>
<p>and connect it with</p>
<div class="code"><div class="content"><div class="highlight"><pre>$ docker network connect NETWORK_ID CONTAINER_ID
</pre></div> </div> </div>
<p>You can see which containers are using a given network inspecting it</p>
<div class="code"><div class="content"><div class="highlight"><pre>$ docker network inspect NETWORK_ID
</pre></div> </div> </div>
<p>Remember that disconnecting a container from a network makes it unreachable, so while it is good that we can do this on running containers, maintenance should always be carefully planned to avoid unexpected downtime.</p></div></div><p>As I mentioned before, Docker bridge networks provide DNS resolution using the container's name. We can double-check this by running a container and using <code>ping</code>.</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker run -it --rm --network=whale whaleapp ping whale-postgres
PING whale-postgres (172.19.0.2) 56(84) bytes of data.
64 bytes from whale-postgres.whale (172.19.0.2): icmp_seq=1 ttl=64 time=0.064 ms
64 bytes from whale-postgres.whale (172.19.0.2): icmp_seq=2 ttl=64 time=0.100 ms
64 bytes from whale-postgres.whale (172.19.0.2): icmp_seq=3 ttl=64 time=0.115 ms
64 bytes from whale-postgres.whale (172.19.0.2): icmp_seq=4 ttl=64 time=0.101 ms
^C
--- whale-postgres ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 80ms
rtt min/avg/max/mdev = 0.064/0.095/0.115/0.018 ms
</pre></div> </div> </div><p>What I did here was to run the image <code>whaleapp</code> that we built previously, but overriding the default command and running <code>ping whale-postgres</code> instead. This is a good way to check if a host can resolve a name on the network (<code>dig</code> is another useful tool but is not installed by default in that image).</p><p>As you can see the Postgres container is reachable and we also know that it currently runs with the IP <code>172.19.0.2</code>. This value might be different on your system, but it will match the information you get if you run <code>docker network inspect whale</code>.</p><p>The point of all this talk about DNS is that we can now change the code of the Python application so that it connects to <code>whale-postgres</code> instead of <code>localhost</code></p><div class="code"><div class="content"><div class="highlight"><pre><span class="n">connection_data</span> <span class="o">=</span> <span class="p">{</span>
<span class="hll"> <span class="s2">"host"</span><span class="p">:</span> <span class="s2">"whale-postgres"</span><span class="p">,</span>
</span> <span class="s2">"database"</span><span class="p">:</span> <span class="s2">"whale_db"</span><span class="p">,</span>
<span class="s2">"user"</span><span class="p">:</span> <span class="s2">"whale_user"</span><span class="p">,</span>
<span class="s2">"password"</span><span class="p">:</span> <span class="s2">"whale_password"</span><span class="p">,</span>
<span class="p">}</span>
</pre></div> </div> </div><p>Once this is done, rebuild the image and run it in the <code>whale</code> network</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker build -t whaleapp .
[...]
$ docker run -it --rm --network=whale --name whale-app whaleapp
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.
</pre></div> </div> </div><p>You can also take the network directly from another container, which is a useful shortcut.</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker build -t whaleapp .
[...]
$ docker run -it --rm \
--network=container:whale-postgres \
--name whale-app whaleapp
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.
</pre></div> </div> </div><h2 id="run-time-configuration-7c07">Run time configuration<a class="headerlink" href="#run-time-configuration-7c07" title="Permanent link">¶</a></h2><p>Hardcoding configuration values into the application is never a great idea, and while this is a very simple example it is worth pushing the setup a bit further to make it tidy.</p><p>In particular, we can replace the connection data <code>host</code>, <code>database</code>, and <code>user</code> with environment variables, which allow us to reuse the application by configuring it at run time. For simplicity's sake I will store the password in an environment variable as well, and pass it in clear text when we run the container. See the box for more information about how to manage secret values.</p><p>Reading values from environment variables is easy in Python</p><div class="code"><div class="content"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">time</span>
<span class="kn">import</span> <span class="nn">psycopg2</span>
<span class="n">DB_HOST</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">"WHALEAPP__DB_HOST"</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
<span class="n">DB_NAME</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">"WHALEAPP__DB_NAME"</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
<span class="n">DB_USER</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">"WHALEAPP__DB_USER"</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
<span class="n">DB_PASSWORD</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">"WHALEAPP__DB_PASSWORD"</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
<span class="n">connection_data</span> <span class="o">=</span> <span class="p">{</span>
<span class="s2">"host"</span><span class="p">:</span> <span class="n">DB_HOST</span><span class="p">,</span>
<span class="s2">"database"</span><span class="p">:</span> <span class="n">DB_NAME</span><span class="p">,</span>
<span class="s2">"user"</span><span class="p">:</span> <span class="n">DB_USER</span><span class="p">,</span>
<span class="s2">"password"</span><span class="p">:</span> <span class="n">DB_PASSWORD</span><span class="p">,</span>
<span class="p">}</span>
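
# --- Added sketch, not part of the original code ---
# With plain os.environ.get() a missing variable silently becomes None and
# the error only surfaces later, at connection time. A hypothetical helper
# can fail fast instead:
def require_env(name):
    """Return the value of an environment variable or raise immediately."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"Missing environment variable: {name}")
    return value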
</pre></div> </div> </div><p>Please note that I prefixed all environment variables with <code>WHALEAPP__</code>. This is not mandatory, and has no special meaning for the operating system. In my experience, complicated systems can have many environment variables, and using prefixes is a simple and effective way to keep track of which part of the system needs that particular value.</p><p>We already know how to pass environment variables to Docker containers, as we did when we ran the Postgres container. Build the image again, and then run it, passing the correct variables</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker build -t whaleapp .
[...]
$ docker run -it --rm --network=whale \
-e WHALEAPP__DB_HOST=whale-postgres \
-e WHALEAPP__DB_NAME=whale_db \
-e WHALEAPP__DB_USER=whale_user \
-e WHALEAPP__DB_PASSWORD=whale_password \
--name whale-app whaleapp
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.
</pre></div> </div> </div><div class="infobox"><i class="fa fa-info-circle"></i><div class="title">Managing secrets</div><div><p>A "secret" is a value that should never be shown in plain text, as it is used to grant access to a system. This can be a password or a private key such as the ones you have to run SSH, and as happens with everything related to security, managing them is complicated. Please keep in mind that security is hard and that the best attitude to have is: <em>every time you think something in security is straightforward this means you got it wrong</em>.</p>
<p>Generally speaking, you want secrets to be encrypted and stored in a safe place where access is granted to a narrow set of people. These secrets should be accessible to your application in a secure way, and it shouldn't be possible to access the secrets hosted in the memory of the application.</p>
<p>For example, many posts online show how you can use AWS Secrets Manager to store your secrets and access them from your application using <a href="https://stedolan.github.io/jq/">jq</a> to fetch them at run time. While this works, if the JSON secret contains a syntax error, <code>jq</code> dumps the whole value in the standard output of the application, which means that the logs contain the secret in plain text.</p>
<p><a href="https://hub.docker.com/_/vault">Vault</a> is a tool created by Hashicorp that many use to store secrets needed by containers. It is interesting to read in the description of the image that with a specific configuration the container prevents memory from being swapped to disk, which would leak the unencrypted values. As you see, security is hard.</p>
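<p>As an illustration (this is a sketch of mine, not code from the article): the official <code>postgres</code> image supports a <code>POSTGRES_PASSWORD_FILE</code> variable, and the same <code>*_FILE</code> convention works well with orchestrator secrets, which are typically mounted as files (e.g. under <code>/run/secrets</code>). A small Python helper can support both styles</p><div class="code"><div class="content"><div class="highlight"><pre>import os

def read_secret(name):
    """Return a secret from the file named in NAME_FILE, or from NAME."""
    # Prefer the *_FILE variant, which points at a mounted secret file.
    file_path = os.environ.get(f"{name}_FILE")
    if file_path:
        with open(file_path) as secret_file:
            return secret_file.read().strip()
    # Fall back to the plain environment variable.
    return os.environ.get(name)
</pre></div> </div> </div>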
<p>Orchestration tools always provide a way to manage secrets and to pass them to containers. For example, see <a href="https://docs.docker.com/engine/swarm/secrets/">Docker Swarm secrets</a>, <a href="https://kubernetes.io/docs/concepts/configuration/secret/">Kubernetes secrets</a>, and <a href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/specifying-sensitive-data-secrets.html">secrets for AWS Elastic Container Service</a>.</p></div></div><h2 id="enter-docker-compose-58a7">Enter Docker Compose<a class="headerlink" href="#enter-docker-compose-58a7" title="Permanent link">¶</a></h2><p>The setup we created in the past sections is good, but is far from being optimal. We had to create a custom bridge network and then start the Postgres and the application containers connected to it. To stop the system we need to terminate containers manually and to remember to remove them to avoid blocking the container name. We also have to manually remove the network if we want to keep the system clean.</p><p>The next step would then be to create a bash script, then to evolve it to a Makefile or similar solution. Fortunately, Docker provides a better solution with Docker Compose.</p><p>Docker Compose can be described as a single-host orchestration tool. Orchestration tools are pieces of software that allow us to deal with the problems described previously, such as starting and terminating multiple containers, creating networks and volumes, managing secrets, and so on. 
Docker Compose works in a single-host mode, so it's a great solution for development environments, while for multi-host production environments it's better to move to more advanced tools such as AWS ECS or Kubernetes.</p><p>Docker Compose reads the configuration of a system from the file <code>docker-compose.yml</code> (the default name, which can be changed) that captures all we did manually in the previous sections in a compact and readable way.</p><p>To install Docker Compose follow the instructions you find at <a href="https://docs.docker.com/compose/install/">https://docs.docker.com/compose/install/</a>. Before we start using Docker Compose make sure you kill the Postgres container if it is still running, and remove the network we created</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker rm -f whale-postgres
whale-postgres
$ docker network remove whale
whale
</pre></div> </div> </div><p>Then create the file <code>docker-compose.yml</code> in the project directory (not the app directory) and put the following code in it</p><div class="code"><div class="title"><code>docker-compose.yml</code></div><div class="content"><div class="highlight"><pre><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s">'3.8'</span>
<span class="nt">services</span><span class="p">:</span>
</pre></div> </div> </div><p>This is not a valid Docker Compose file, yet, but you can see that there is a value that specifies the syntax version and one that lists services. You can find the Compose file reference at <a href="https://docs.docker.com/compose/compose-file/">https://docs.docker.com/compose/compose-file/</a>, together with a detailed description of the various versions.</p><p>The first service we want to run is Postgres, and a basic configuration for that is</p><div class="code"><div class="title"><code>docker-compose.yml</code></div><div class="content"><div class="highlight"><pre><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s">'3.8'</span>
<span class="nt">services</span><span class="p">:</span>
<span class="w"> </span><span class="nt">db</span><span class="p">:</span>
<span class="w"> </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">postgres:13</span>
<span class="w"> </span><span class="nt">environment</span><span class="p">:</span>
<span class="w"> </span><span class="nt">POSTGRES_DB</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_db</span>
<span class="w"> </span><span class="nt">POSTGRES_PASSWORD</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_password</span>
<span class="w"> </span><span class="nt">POSTGRES_USER</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_user</span>
<span class="w"> </span><span class="nt">volumes</span><span class="p">:</span> <span class="callout">2</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">dbdata:/var/lib/postgresql/data</span>
<span class="nt">volumes</span><span class="p">:</span> <span class="callout">1</span>
<span class="w"> </span><span class="nt">dbdata</span><span class="p">:</span>
</pre></div> </div> </div><p>As you can see, this file contains the environment variables that we passed to the Postgres container and the volume configuration. The final <code>volumes</code> <span class="callout">1</span> declares which volumes have to be present (so it creates them if they are not), while <code>volumes</code> <span class="callout">2</span> inside the service <code>db</code> creates the connection just like the option <code>-v</code> did previously.</p><p>Now, from the project directory, you can run Docker Compose with</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker-compose -p whale up -d
Creating network "whale_default" with the default driver
Creating whale_db_1 ... done
</pre></div> </div> </div><p>The option <code>-p</code> sets the name of the project, which would otherwise default to the name of the directory you are currently in (which might or might not be meaningful), while the command <code>up -d</code> starts all the containers in detached mode.</p><p>As you can see from the output, Docker Compose creates a (bridge) network called <code>whale_default</code>. Normally, you would see a message like <code>Creating volume "whale_dbdata" with default driver</code> as well, but in this case the volume is already present as we created it previously. Both the network and the volume are prefixed with <code>PROJECTNAME_</code>, and this is the reason why when we first created the volume I named it <code>whale_dbdata</code>. Keep in mind however that all these default behaviours can be customised in the Compose file.</p><p>If you run <code>docker ps</code> you will see that the container is named <code>whale_db_1</code>. This comes from the project name (<code>whale_</code>), the service name in the Compose file (<code>db_</code>) and the container number, which is 1 because at the moment we are running only one container for that service.</p><p>To stop the services you have to run</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker-compose -p whale down
Stopping whale_db_1 ... done
Removing whale_db_1 ... done
Removing network whale_default
</pre></div> </div> </div><p>As you can see from the output, Docker Compose stops and removes the container, then removes the network. This is very convenient, as it already removes a lot of the work we had to do manually earlier.</p><hr><p>We can now add the application container to the Compose file</p><div class="code"><div class="content"><div class="highlight"><pre><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s">'3.8'</span>
<span class="nt">services</span><span class="p">:</span>
<span class="w"> </span><span class="nt">db</span><span class="p">:</span>
<span class="w"> </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">postgres:13</span>
<span class="w"> </span><span class="nt">environment</span><span class="p">:</span>
<span class="w"> </span><span class="nt">POSTGRES_DB</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_db</span>
<span class="w"> </span><span class="nt">POSTGRES_PASSWORD</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_password</span>
<span class="w"> </span><span class="nt">POSTGRES_USER</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_user</span>
<span class="w"> </span><span class="nt">volumes</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">dbdata:/var/lib/postgresql/data</span>
<span class="hll"><span class="w"> </span><span class="nt">app</span><span class="p">:</span>
</span><span class="hll"><span class="w"> </span><span class="nt">build</span><span class="p">:</span>
</span><span class="hll"><span class="w"> </span><span class="nt">context</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whaleapp</span>
</span><span class="hll"><span class="w"> </span><span class="nt">dockerfile</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Dockerfile</span>
</span><span class="hll"><span class="w"> </span><span class="nt">environment</span><span class="p">:</span>
</span><span class="hll"><span class="w"> </span><span class="nt">WHALEAPP__DB_HOST</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">db</span>
</span><span class="hll"><span class="w"> </span><span class="nt">WHALEAPP__DB_NAME</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_db</span>
</span><span class="hll"><span class="w"> </span><span class="nt">WHALEAPP__DB_USER</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_user</span>
</span><span class="hll"><span class="w"> </span><span class="nt">WHALEAPP__DB_PASSWORD</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_password</span>
</span>
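  # (Added note, not from the original article.) Instead of listing the
  # variables inline, a service can read them from a file of KEY=value
  # pairs with the env_file option, e.g.:
  #
  #   env_file:
  #     - whaleapp.env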
<span class="nt">volumes</span><span class="p">:</span>
<span class="w"> </span><span class="nt">dbdata</span><span class="p">:</span>
</pre></div> </div> </div><p>This definition is slightly different, as the application container has to be built using the Dockerfile we created. Docker Compose allows us to store the build configuration here so that we don't need to pass all the options to <code>docker build</code> manually, but please note that configuring the build here doesn't mean that Docker Compose will build the image for you every time. You still need to run <code>docker-compose -p whale build</code> every time you need to rebuild it. </p><p>Please note that the variable <code>WHALEAPP__DB_HOST</code> is set to the service name, and not to the container name. Now, when we run Docker Compose we get</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker-compose -p whale up -d
Creating network "whale_default" with the default driver
Creating whale_db_1 ... done
Creating whale_app_1 ... done
</pre></div> </div> </div><p>and the output tells us that this time the container <code>whale_app_1</code> has also been created. We can see the logs of a container with <code>docker logs</code>, but using <code>docker-compose</code> allows us to call services by name instead of by ID</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker-compose -p whale logs -f app
Attaching to whale_app_1
app_1 | Connecting to the PostgreSQL database...
app_1 | [(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
app_1 | Database connection closed.
app_1 | Connecting to the PostgreSQL database...
app_1 | [(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
app_1 | Database connection closed.
</pre></div> </div> </div><h2 id="health-checks-and-dependencies-bc9b">Health checks and dependencies<a class="headerlink" href="#health-checks-and-dependencies-bc9b" title="Permanent link">¶</a></h2><p>You might have noticed that at the very beginning of the application logs there are some connection errors, and that after a while the application manages to connect to the database</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker-compose -p whale logs -f app
Attaching to whale_app_1
app_1 | Connecting to the PostgreSQL database...
app_1 | could not translate host name "db" to address: Name or service not known
app_1 |
app_1 | Connecting to the PostgreSQL database...
app_1 | could not translate host name "db" to address: Name or service not known
app_1 |
app_1 | Connecting to the PostgreSQL database...
app_1 | Connecting to the PostgreSQL database...
app_1 | could not connect to server: Connection refused
app_1 | Is the server running on host "db" (172.31.0.3) and accepting
app_1 | TCP/IP connections on port 5432?
app_1 |
app_1 | Connecting to the PostgreSQL database...
app_1 | [(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
app_1 | Database connection closed.
app_1 | Connecting to the PostgreSQL database...
app_1 | [(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
app_1 | Database connection closed.
</pre></div> </div> </div><p>These errors come from the fact that the application container is up and running before the database is ready to serve connections. In a production setup this usually doesn't happen because the database is up and running long before the application gets deployed for the first time, and then runs (hopefully) without interruption. In a development environment, instead, such a situation is normal.</p><p>Please note that this might not happen in your setup, as this is tightly connected with the speed of Docker Compose and the containers. Time-sensitive bugs are one of the worst types to deal with, and this is the reason why managing distributed systems is hard. It is important that you realise that even though this might work now on your system, the problem is there and we need to find a solution.</p><p>The standard solution when part of a system depends on another is to create a <em>health check</em> that periodically tests the first service, and to start the second service only when the check is successful. We can do this in the Compose file using <code>healthcheck</code> and <code>depends_on</code></p><div class="code"><div class="content"><div class="highlight"><pre><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s">'3.8'</span>
<span class="nt">services</span><span class="p">:</span>
<span class="w"> </span><span class="nt">db</span><span class="p">:</span>
<span class="w"> </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">postgres:13</span>
<span class="w"> </span><span class="nt">environment</span><span class="p">:</span>
<span class="w"> </span><span class="nt">POSTGRES_DB</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_db</span>
<span class="w"> </span><span class="nt">POSTGRES_PASSWORD</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_password</span>
<span class="w"> </span><span class="nt">POSTGRES_USER</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_user</span>
<span class="w"> </span><span class="nt">volumes</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">dbdata:/var/lib/postgresql/data</span>
<span class="hll"><span class="w"> </span><span class="nt">healthcheck</span><span class="p">:</span>
</span><span class="hll"><span class="w"> </span><span class="nt">test</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">[</span><span class="s">"CMD-SHELL"</span><span class="p p-Indicator">,</span><span class="w"> </span><span class="s">"pg_isready"</span><span class="p p-Indicator">]</span>
</span><span class="hll"><span class="w"> </span><span class="nt">interval</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">10s</span>
</span><span class="hll"><span class="w"> </span><span class="nt">timeout</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">5s</span>
</span><span class="hll"><span class="w"> </span><span class="nt">retries</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">5</span>
</span><span class="w"> </span><span class="nt">app</span><span class="p">:</span>
<span class="w"> </span><span class="nt">build</span><span class="p">:</span>
<span class="w"> </span><span class="nt">context</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whaleapp</span>
<span class="w"> </span><span class="nt">dockerfile</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Dockerfile</span>
<span class="w"> </span><span class="nt">environment</span><span class="p">:</span>
<span class="w"> </span><span class="nt">WHALEAPP__DB_HOST</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">db</span>
<span class="w"> </span><span class="nt">WHALEAPP__DB_NAME</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_db</span>
<span class="w"> </span><span class="nt">WHALEAPP__DB_USER</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_user</span>
<span class="w"> </span><span class="nt">WHALEAPP__DB_PASSWORD</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_password</span>
<span class="hll"><span class="w"> </span><span class="nt">depends_on</span><span class="p">:</span>
</span><span class="hll"><span class="w"> </span><span class="nt">db</span><span class="p">:</span>
</span><span class="hll"><span class="w"> </span><span class="nt">condition</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">service_healthy</span>
</span>
<span class="nt">volumes</span><span class="p">:</span>
<span class="w"> </span><span class="nt">dbdata</span><span class="p">:</span>
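
# (Added note, not from the original article.) Compose file versions 3.4
# and later also support a start_period option in healthcheck, which gives
# the container time to boot before failed checks count against retries:
#
#   healthcheck:
#     test: ["CMD-SHELL", "pg_isready"]
#     interval: 10s
#     timeout: 5s
#     retries: 5
#     start_period: 30s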
</pre></div> </div> </div><p>The health check for the Postgres container leverages the command-line tool <code>pg_isready</code>, which succeeds only when the database is ready to accept connections, retrying every 10 seconds up to 5 times. Now, when you run <code>up -d</code> this time you should notice a clear delay before the application is run, but the logs won't contain any connection errors.</p><h2 id="final-words-9803">Final words<a class="headerlink" href="#final-words-9803" title="Permanent link">¶</a></h2><p>Well, this was a long one, but I hope you enjoyed the trip and ended up with a better picture of the problems Docker Compose solves, along with a feeling of how complicated it might be to design an architecture. Everything we did was for a "simple" development environment with a couple of containers, so you can imagine what is involved when we get to live environments.</p><h2 id="updates-0083">Updates<a class="headerlink" href="#updates-0083" title="Permanent link">¶</a></h2><p>2022-03-17: Thanks to my colleague Joanna Stadnik for a thorough review, for spotting typos, and for giving me several suggestions based on her experience. Thank you!</p><h2 id="feedback-d845">Feedback<a class="headerlink" href="#feedback-d845" title="Permanent link">¶</a></h2><p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. 
The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>AWS Log Insights as CloudWatch metrics with Python and Terraform2021-03-22T17:00:00+01:002021-03-22T17:00:00+01:00Leonardo Giordanitag:www.thedigitalcatonline.com,2021-03-22:/blog/2021/03/22/aws-log-insights-as-cloudwatch-metrics-with-python-and-terraform/<p> A step-by-step report on how to build a Lambda function with Terraform and Python to convert Log Insights queries into CloudWatch metrics</p><p>Recently I started using <a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AnalyzingLogData.html">AWS CloudWatch Log Insights</a> and I find the tool really useful to extract data about the systems I'm running without having to set up dedicated monitoring tools, which come with their own set of permissions, rules, configuration language, and so forth.</p><p>Log Insights allow you to query log outputs with a language based on regular expressions with hints of SQL and to produce tables or graphs of quantities that you need to monitor. For example, the system I am monitoring runs Celery in ECS containers that log received tasks with a line like the following</p><div class="code"><div class="content"><div class="highlight"><pre>16:39:11,156 [32mINFO [0m [34m[celery.worker.strategy][0m [01mReceived task: lib.tasks.lists.trigger_list_log_notification[9b33b464-d4f9-4909-8d4e-1a3134fead97] [0m
</pre></div> </div> </div><p>In this case the specific function in the system that was triggered is <code>lib.tasks.lists.trigger_list_log_notification</code>, and I'm interested in knowing which functions are called the most, so I can easily count them with</p><div class="code"><div class="content"><div class="highlight"><pre>parse @message /\[celery\.(?<source>[a-z.]+)\].*Received task: (?<task>[a-z._]+)\[/
| filter not isblank(source)
| stats count(*) as number by task
| sort number desc
| limit 9
</pre></div> </div> </div><p>This gives me a nice table of the top 9 <code>task</code> functions and the number of times each was submitted, and the time frame can be adjusted with the usual CloudWatch controls</p><div class="code"><div class="content"><div class="highlight"><pre>1 lib.tasks.lists.trigger_list_log_notification 4559
2 lib.tasks.notify.notify_recipient 397
3 lib.message._send_mobile_push_notification 353
4 lib.tasks.jobs.check_job_cutoffs 178
5 lib.tasks.notify.check_message_cutoffs 177
6 lib.tasks.notify.check_notification_retry 177
7 lib.tasks.notify.async_list_response 81
8 lib.tasks.hmrc_poll.govtalk_periodic_poll 59
9 lib.tasks.lists.recalculate_list_entry 56
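</pre></div> </div> </div><p>Before relying on a query like this in an automated pipeline, the parse step's regular expression can be checked locally. This is a hedged sketch: Log Insights and Python's <code>re</code> module spell named groups slightly differently, so the pattern below uses Python's syntax while matching the same log line shown above.</p><div class="code"><div class="content"><div class="highlight"><pre>import re

# Same pattern as the Log Insights parse step, rewritten with Python's
# named-group syntax (?P<name>...) instead of (?<name>...)
pattern = re.compile(
    r"\[celery\.(?P<source>[a-z.]+)\].*Received task: (?P<task>[a-z._]+)\["
)

line = (
    "16:39:11,156 INFO [celery.worker.strategy] "
    "Received task: lib.tasks.lists.trigger_list_log_notification"
    "[9b33b464-d4f9-4909-8d4e-1a3134fead97]"
)

match = pattern.search(line)
print(match.group("source"), match.group("task"))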
</pre></div> </div> </div><p>Using time bins, quantities can also be easily plotted. For example, I can process and visualise the number of received tasks with</p><div class="code"><div class="content"><div class="highlight"><pre>parse @message /\[celery\.(?<source>[a-z.]+)\].*Received task: (?<task>[a-z._]+)\[/
| filter not isblank(source)
| stats count(*) by bin(30s)
</pre></div> </div> </div><p>Unfortunately I quickly discovered an important limitation of Log Insights, namely that <strong>queries are not metrics</strong>, which immediately implies that I can't set up alarms on those queries. As fun as it is to look at nice plots, I need something automatic that sends me messages or scales up systems in reaction to specific events such as "too many submitted tasks".</p><p>The standard solution to this problem suggested by AWS is to write a Lambda that runs the query and stores the value into a custom CloudWatch metric, which I can then use to satisfy my automation needs. I did it, and in this post I will show you exactly how, using Terraform, Python and Zappa, CloudWatch, and DynamoDB. At the end of the post I will also briefly discuss the cost of the solution.</p><h2 id="the-big-picture-f6bc">The big picture<a class="headerlink" href="#the-big-picture-f6bc" title="Permanent link">¶</a></h2><p>Before I get into the details of the specific tools or solutions that I decided to implement, let me have a look at the bigger picture. The initial idea is very simple: a Lambda function can run a specific Log Insights query and store the results in a custom metric, which can in turn be used to trigger alarms and other actions.</p><p>For a single system I already have 4 or 5 of these queries that I'd like to run, and I have multiple systems, so I'd prefer to have a solution that doesn't require me to deploy and maintain a different Lambda for each query. Maintenance can clearly be automated as well, but such a solution smells of duplicated code miles away, and if there is no specific reason to go down that road I prefer to avoid it.</p><p>Since Log Insights queries are just strings of code, however, we can store them somewhere and then simply loop on all of them within the same Lambda function.
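</p><p>The publishing half of such a Lambda is small. This is a minimal sketch, not the final code: the helper names and the <code>Count</code> unit are my own choices.</p><div class="code"><div class="content"><div class="highlight"><pre>import datetime

try:
    import boto3  # preinstalled in the AWS Lambda Python runtime
except ImportError:
    boto3 = None  # not needed for the pure helper below

def build_datapoint(metric_name, value, timestamp):
    # A single entry of the MetricData list accepted by put_metric_data
    return {
        "MetricName": metric_name,
        "Timestamp": timestamp,
        "Value": float(value),
        "Unit": "Count",
    }

def publish(namespace, metric_name, value):
    # Store one value in a custom CloudWatch metric
    boto3.client("cloudwatch").put_metric_data(
        Namespace=namespace,
        MetricData=[build_datapoint(metric_name, value,
                                    datetime.datetime.utcnow())],
    )</pre></div> </div> </div><p>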
To implement this, I created a DynamoDB table in which every item contains all the data I need to run each query, such as the log group that I want to investigate and the name of the target metric.</p><h2 id="terraform-a3cb">Terraform<a class="headerlink" href="#terraform-a3cb" title="Permanent link">¶</a></h2><p>In the following sections I will discuss the main components of the solution from the infrastructural point of view, showing how I created them with Terraform. The four main AWS services that I will use are: <a href="https://aws.amazon.com/dynamodb/">DynamoDB</a>, <a href="https://aws.amazon.com/lambda/">Lambda</a>, <a href="https://aws.amazon.com/iam/">IAM</a>, <a href="https://aws.amazon.com/cloudwatch/">CloudWatch</a>.</p><p>I put the bulk of the code in a module so that I can easily create the same structure for multiple AWS accounts. While my current setup is a bit more complicated than that, the structure of the code can be simplified as</p><div class="code"><div class="content"><div class="highlight"><pre>+ common
+ lambda-loginsights2metrics
+ cloudwatch.tf
+ dynamodb.tf
+ iam.tf
+ lambda.tf
+ variables.tf
+ account1
+ lambda-loginsights2metrics
+ main.tf
+ variables.tf
</pre></div> </div> </div><h3 id="variables-7edf">Variables</h3><p>Since I will refer to them in the following sections, let me show you the four variables I defined for this module.</p><p>First I need to receive the items that I need to store in the DynamoDB table</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/variables.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">variable</span><span class="w"> </span><span class="nv">"items"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"list"</span>
<span class="w"> </span><span class="na">default</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[]</span>
<span class="p">}</span>
</pre></div> </div> </div><p>I prefer to have a prefix in front of my components that allows me to duplicate them without clashes</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/variables.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">variable</span><span class="w"> </span><span class="nv">"prefix"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"string"</span>
<span class="w"> </span><span class="na">default</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"loginsights2metrics"</span>
<span class="p">}</span>
</pre></div> </div> </div><p>The Lambda function will require a list of security groups that grant access to specific network components</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/variables.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">variable</span><span class="w"> </span><span class="nv">"security_groups"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"list"</span>
<span class="w"> </span><span class="na">default</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[]</span>
<span class="p">}</span>
</pre></div> </div> </div><p>Finally, Lambda functions need to be told which VPC subnets they can use to run</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/variables.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">variable</span><span class="w"> </span><span class="nv">"vpc_subnets"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"list"</span>
<span class="w"> </span><span class="na">default</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[]</span>
<span class="p">}</span>
</pre></div> </div> </div><h4 id="resources-8ec3">Resources</h4><ul><li><a href="https://www.terraform.io/docs/configuration-0-11/variables.html">Terraform variables</a>.</li><li>An <a href="https://spacelift.io/blog/how-to-use-terraform-variables">in-depth post</a> that explains how to use variables in Terraform, by Sumeet Ninawe</li></ul><h3 id="dynamodb-55e8">DynamoDB</h3><p>Let's start with the cornerstone, which is the DynamoDB table that contains data for the queries. As DynamoDB is not a SQL database we don't need to define columns in advance. This clearly might get us into trouble later, so we need to be careful and be consistent when we write items, adding everything that is needed by the Lambda code.</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/dynamodb.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_dynamodb_table"</span><span class="w"> </span><span class="nv">"loginsights2metrics"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.prefix}-items"</span>
<span class="w"> </span><span class="na">billing_mode</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"PAY_PER_REQUEST"</span>
<span class="w"> </span><span class="na">hash_key</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"SlotName"</span>
<span class="w"> </span><span class="nb">attribute</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"SlotName"</span>
<span class="w"> </span><span class="na">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"S"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
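</pre></div> </div> </div><p>The Lambda will later read this table back with a <code>Scan</code>, which returns at most 1 MB of data per call, so pagination has to be followed. This is a hedged sketch with the paging logic kept as a pure function; the helper names are my own.</p><div class="code"><div class="content"><div class="highlight"><pre>try:
    import boto3  # preinstalled in the AWS Lambda Python runtime
except ImportError:
    boto3 = None  # the paging helper below is pure Python

def collect_scan(scan_page):
    # scan_page(exclusive_start_key) must return one page of results;
    # keep scanning until DynamoDB stops returning LastEvaluatedKey
    items, key = [], None
    while True:
        page = scan_page(key)
        items.extend(page["Items"])
        key = page.get("LastEvaluatedKey")
        if key is None:
            return items

def load_query_items(table_name):
    table = boto3.resource("dynamodb").Table(table_name)
    def scan_page(key):
        return table.scan(ExclusiveStartKey=key) if key else table.scan()
    return collect_scan(scan_page)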
</pre></div> </div> </div><p>Speaking of items, I assume I will pass them when I call the module, so here I just need to loop on the input variable <code>items</code></p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/dynamodb.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_dynamodb_table_item"</span><span class="w"> </span><span class="nv">"item"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">count</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">length</span><span class="p">(</span><span class="nv">var.items</span><span class="p">)</span>
<span class="w"> </span><span class="na">table_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_dynamodb_table.loginsights2metrics.name</span>
<span class="w"> </span><span class="na">hash_key</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_dynamodb_table.loginsights2metrics.hash_key</span>
<span class="w"> </span><span class="na">item</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">jsonencode</span><span class="p">(</span><span class="nf">element</span><span class="p">(</span><span class="nv">var.items</span><span class="p">,</span><span class="w"> </span><span class="nv">count.index</span><span class="p">))</span>
<span class="p">}</span>
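</pre></div> </div> </div><p>Note that <code>jsonencode</code> serialises each element verbatim, so every element of <code>items</code> must already use DynamoDB's attribute-value JSON format. Here is a hypothetical item checked in Python; every field except <code>SlotName</code>, the hash key, is a name I made up for illustration.</p><div class="code"><div class="content"><div class="highlight"><pre>import json

# Hypothetical item; only SlotName is mandated by the table definition
item = {
    "SlotName": {"S": "celery-received-tasks"},
    "LogGroup": {"S": "/ecs/celery"},
    "MetricName": {"S": "ReceivedTasks"},
    "Query": {"S": "parse @message ... | stats count(*) as Value by bin(1m)"},
}

encoded = json.dumps(item)  # the equivalent of Terraform's jsonencode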
</pre></div> </div> </div><p>Since the query is written as a Terraform string and will be read from Python there are two small caveats here. To be consistent with Terraform's syntax we need to escape double quotes in the query, and to avoid fights with Python we need to escape backslashes. So for example a valid query like</p><div class="code"><div class="content"><div class="highlight"><pre>parse @message /\[celery\.(?<source>[a-z.]+)\].*Received task: (?<task>[a-z._]+)\[/
| filter not isblank(source)
| stats count(*) as Value by bin(1m)
</pre></div> </div> </div><p>will be stored as</p><div class="code"><div class="content"><div class="highlight"><pre>"parse @message /\\[celery\\.(?<source>[a-z.]+)\\].*Received task: (?<task>[a-z._]+)\\[/ | filter not isblank(source) | stats count(*) as Value by bin(1m)"
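</pre></div> </div> </div><p>The stored form is itself a valid JSON string literal, so decoding it restores the single backslashes that the Log Insights engine expects. A quick check in Python:</p><div class="code"><div class="content"><div class="highlight"><pre>import json

# The doubled backslashes survive until JSON decoding, which turns them
# back into the single backslashes of the original query
stored = (
    r'"parse @message /\\[celery\\.(?<source>[a-z.]+)\\]'
    r'.*Received task: (?<task>[a-z._]+)\\[/'
    r' | filter not isblank(source)'
    r' | stats count(*) as Value by bin(1m)"'
)

query = json.loads(stored)
print(query)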
</pre></div> </div> </div><p>Another remark is that the Lambda I will write in Python will read data plotted with the name <code>Value</code> on bins of 1 minute, so the query should end with <code>stats X as Value by bin(1m)</code> where <code>X</code> is a specific stat, for example <code>stats count(*) as Value by bin(1m)</code>.</p><p>The reason behind 1 minute is that the maximum standard resolution of CloudWatch metrics is 1 minute. Should you want more you need to have a look at <a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/publishingMetrics.html#high-resolution-metrics">CloudWatch High-Resolution Metrics</a>.</p><h4 id="resources-8ec3">Resources</h4><ul><li><a href="https://aws.amazon.com/dynamodb/">Amazon DynamoDB</a></li><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/dynamodb_table">aws_dynamodb_table documentation</a></li><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/dynamodb_table_item">aws_dynamodb_table_item documentation</a></li></ul><h3 id="iam-part-1-cde2">IAM part 1</h3><p>IAM roles are central in AWS. In this specific case we have the so-called <a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-intro-execution-role.html">Lambda execution role</a>, which is the IAM role that the Lambda assumes when you run it. In AWS users or services (that is humans or AWS components) <em>assume</em> a role, receiving the permissions connected with it. 
To assume a role, however, they need to be explicitly allowed to do so by a document attached to the role itself, the so-called <em>trust policy</em>.</p><p>Let's define a trust policy that allows the Lambda service to assume the role that we will define</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/iam.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">data</span><span class="w"> </span><span class="nc">"aws_iam_policy_document"</span><span class="w"> </span><span class="nv">"trust"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">statement</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">actions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="s2">"sts:AssumeRole"</span><span class="p">]</span>
<span class="w"> </span><span class="nb">principals</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Service"</span>
<span class="w"> </span><span class="na">identifiers</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="s2">"lambda.amazonaws.com"</span>
<span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
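</pre></div> </div> </div><p>Rendered, this is essentially the standard Lambda trust policy. The JSON produced by <code>aws_iam_policy_document</code> should look roughly like the following; the exact output may differ in details such as field order or an empty <code>Sid</code>.</p><div class="code"><div class="content"><div class="highlight"><pre>{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      }
    }
  ]
}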
</pre></div> </div> </div><p>and after that the role in question</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/iam.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_iam_role"</span><span class="w"> </span><span class="nv">"loginsights2metrics"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">var.prefix</span>
<span class="w"> </span><span class="na">assume_role_policy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">data.aws_iam_policy_document.trust.json</span>
<span class="p">}</span>
</pre></div> </div> </div><p>To run, Lambdas need an initial set of permissions which can be found in the canned policy <code>AWSLambdaVPCAccessExecutionRole</code>. You can see the content of the policy in the IAM console or dumping it with <code>aws iam get-policy</code> and <code>aws iam get-policy-version</code></p><div class="code"><div class="content"><div class="highlight"><pre>$ aws iam get-policy --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole
{
"Policy": {
"PolicyName": "AWSLambdaVPCAccessExecutionRole",
"PolicyId": "ANPAJVTME3YLVNL72YR2K",
"Arn": "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole",
"Path": "/service-role/",
"DefaultVersionId": "v2",
"AttachmentCount": 0,
"PermissionsBoundaryUsageCount": 0,
"IsAttachable": true,
"Description": "Provides minimum permissions for a Lambda function to execute while accessing a resource within a VPC - create, describe, delete network interfaces and write permissions to CloudWatch Logs. ",
"CreateDate": "2016-02-11T23:15:26Z",
"UpdateDate": "2020-10-15T22:53:03Z"
}
}
$ aws iam get-policy-version --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole --version-id v2
{
"PolicyVersion": {
"Document": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents",
"ec2:CreateNetworkInterface",
"ec2:DescribeNetworkInterfaces",
"ec2:DeleteNetworkInterface",
"ec2:AssignPrivateIpAddresses",
"ec2:UnassignPrivateIpAddresses"
],
"Resource": "*"
}
]
},
"VersionId": "v2",
"IsDefaultVersion": true,
"CreateDate": "2020-10-15T22:53:03Z"
}
}
</pre></div> </div> </div><p>Attaching a canned policy is just a matter of creating a specific <code>aws_iam_role_policy_attachment</code> resource</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/iam.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_iam_role_policy_attachment"</span><span class="w"> </span><span class="nv">"loginsights2metrics-"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">role</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_iam_role.loginsights2metrics.name</span>
<span class="w"> </span><span class="na">policy_arn</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole"</span>
<span class="p">}</span>
</pre></div> </div> </div><p>Now that we have the IAM role and the basic policy we can assign custom permissions to it. We need to grant the Lambda permissions on other AWS components, namely CloudWatch to run Log Insights queries and to store metrics and DynamoDB to retrieve all the items from the queries table.</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/iam.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">data</span><span class="w"> </span><span class="nc">"aws_iam_policy_document"</span><span class="w"> </span><span class="nv">"loginsights2metrics"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">statement</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">actions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="s2">"cloudwatch:PutMetricData"</span><span class="p">,</span>
<span class="w"> </span><span class="s2">"cloudwatch:PutMetricAlarm"</span><span class="p">,</span>
<span class="w"> </span><span class="s2">"logs:StartQuery"</span><span class="p">,</span>
<span class="w"> </span><span class="s2">"logs:GetQueryResults"</span><span class="p">,</span>
<span class="w"> </span><span class="s2">"logs:GetLogEvents"</span><span class="p">,</span>
<span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="na">resources</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="s2">"*"</span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nb">statement</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">actions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="s2">"dynamodb:Scan"</span>
<span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="na">resources</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="nv">aws_dynamodb_table.loginsights2metrics.arn</span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
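</pre></div> </div> </div><p>These actions map directly onto the boto3 calls the Lambda performs. This is a hedged sketch of running one query and extracting the <code>Value</code> column from the results; the function names are my own, not the final code.</p><div class="code"><div class="content"><div class="highlight"><pre>import time

try:
    import boto3  # preinstalled in the AWS Lambda Python runtime
except ImportError:
    boto3 = None  # the pure helper below works without it

def run_query(log_group, query, start, end):
    # Start the Log Insights query, poll until it leaves the
    # Scheduled/Running states, then return the result rows
    logs = boto3.client("logs")
    query_id = logs.start_query(
        logGroupName=log_group,
        queryString=query,
        startTime=start,
        endTime=end,
    )["queryId"]
    while True:
        response = logs.get_query_results(queryId=query_id)
        if response["status"] not in ("Scheduled", "Running"):
            return response["results"]
        time.sleep(1)

def extract_values(results):
    # Each row is a list of {"field": ..., "value": ...} mappings; keep
    # only the rows that carry the Value stat produced by the query
    rows = ({f["field"]: f["value"] for f in row} for row in results)
    return [float(r["Value"]) for r in rows if "Value" in r]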
</pre></div> </div> </div><p>Through <code>aws_iam_role_policy</code> we can create and assign the policy out of a <code>data</code> structure</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/iam.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_iam_role_policy"</span><span class="w"> </span><span class="nv">"loginsights2metrics"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">var.prefix</span>
<span class="w"> </span><span class="na">role</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_iam_role.loginsights2metrics.name</span>
<span class="w"> </span><span class="na">policy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">data.aws_iam_policy_document.loginsights2metrics.json</span>
<span class="p">}</span>
</pre></div> </div> </div><h4 id="resources-8ec3">Resources</h4><ul><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document">aws_iam_policy_document documentation</a></li><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role">aws_iam_role documentation</a></li><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy_attachment">aws_iam_role_policy_attachment documentation</a></li><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy">aws_iam_role_policy documentation</a></li><li><a href="https://docs.aws.amazon.com/cli/latest/reference/iam/get-policy.html">AWS CLI iam get-policy documentation</a></li><li><a href="https://docs.aws.amazon.com/cli/latest/reference/iam/get-policy-version.html">AWS CLI iam get-policy-version documentation</a></li></ul><h3 id="lambda-0ea2">Lambda</h3><p>We can now create the Lambda function container. I do not use Terraform as a deployer, as I think it should be used to define static infrastructure only, so I will use a dummy function here and later deploy the real code using the AWS CLI.</p><p>The dummy function can be easily created with</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/lambda.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">data</span><span class="w"> </span><span class="nc">"archive_file"</span><span class="w"> </span><span class="nv">"dummy"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"zip"</span>
<span class="w"> </span><span class="na">output_path</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${path.module}/lambda.zip"</span>
<span class="w"> </span><span class="nb">source</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">content</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"dummy"</span>
<span class="w"> </span><span class="na">filename</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"dummy.txt"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div> </div> </div><p>The Lambda function is a bit more complicated. As I mentioned, I'll use Zappa to package the function, so the <code>handler</code> has to be <code>"zappa.handler.lambda_handler"</code>. The IAM role given to the function is the one we defined previously, while <code>memory_size</code> and <code>timeout</code> clearly depend on the specific function. Lambdas should run in private networks, and I won't cover the steps to create them here. The AWS docs contain a lot of details on this topic, e.g. <a href="https://aws.amazon.com/premiumsupport/knowledge-center/internet-access-lambda-function/">https://aws.amazon.com/premiumsupport/knowledge-center/internet-access-lambda-function/</a>.</p><p>The environment variables allow me to inject the name of the DynamoDB table so that I don't need to hardcode it. I also pass another variable, the <a href="https://sentry.io/welcome/">Sentry DSN</a> that I use in my configuration. This is not essential for the problem at hand, but I left it there to show how to pass such values.</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/lambda.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_lambda_function"</span><span class="w"> </span><span class="nv">"loginsights2metrics"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">function_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"loginsights2metrics"</span>
<span class="w"> </span><span class="na">handler</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"zappa.handler.lambda_handler"</span>
<span class="w"> </span><span class="na">runtime</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"python3.8"</span>
<span class="w"> </span><span class="na">filename</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">data.archive_file.dummy.output_path</span>
<span class="w"> </span><span class="na">role</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_iam_role.loginsights2metrics.arn</span>
<span class="w"> </span><span class="na">memory_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">128</span>
<span class="w"> </span><span class="na">timeout</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">300</span>
<span class="w"> </span><span class="nb">vpc_config</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">subnet_ids</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">var.vpc_subnets</span>
<span class="w"> </span><span class="na">security_group_ids</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">var.security_groups</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nb">environment</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">variables</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"SENTRY_DSN"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"https://XXXXXX:@sentry.io/YYYYYY"</span><span class="p">,</span>
<span class="w"> </span><span class="s2">"DYNAMODB_TABLE"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_dynamodb_table.loginsights2metrics.name</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nb">lifecycle</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">ignore_changes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="nb">last_modified, filename</span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div> </div> </div><p>Please note that I instructed Terraform to ignore changes to the two attributes <code>last_modified</code> and <code>filename</code>, and that I haven't used any <code>source_code_hash</code>. This way I can safely apply Terraform to change parameters like <code>memory_size</code> or <code>timeout</code> without affecting what I deployed with the CI.</p><p>Since I want to trigger the function from AWS CloudWatch Events I need to grant the service <code>events.amazonaws.com</code> the <code>lambda:InvokeFunction</code> permission.</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/lambda.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_lambda_permission"</span><span class="w"> </span><span class="nv">"loginsights2metrics"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">statement_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"AllowExecutionFromCloudWatch"</span>
<span class="w"> </span><span class="na">action</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"lambda:InvokeFunction"</span>
<span class="w"> </span><span class="na">function_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_lambda_function.loginsights2metrics.function_name</span>
<span class="w"> </span><span class="na">principal</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"events.amazonaws.com"</span>
<span class="w"> </span><span class="na">source_arn</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_cloudwatch_event_rule.rate.arn</span>
<span class="p">}</span>
</pre></div> </div> </div><h4 id="resources-8ec3">Resources</h4><ul><li><a href="https://registry.terraform.io/providers/hashicorp/archive/latest/docs/data-sources/archive_file">archive_file documentation</a></li><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_function">aws_lambda_function documentation</a></li><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_permission">aws_lambda_permission documentation</a></li></ul><h3 id="iam-part-2-7f1e">IAM part 2</h3><p>Since 2018 Lambdas have a maximum execution time of 15 minutes (900 seconds), which is more than enough for many services, but to be conservative I preferred to leverage Zappa's asynchronous calls and to make the main Lambda call itself for each query. Clearly the Lambda doesn't call the same Python function (it's not recursive), but from AWS's point of view we have a Lambda that calls itself, so we need to give it a specific permission to do this.</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/iam.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">data</span><span class="w"> </span><span class="nc">"aws_iam_policy_document"</span><span class="w"> </span><span class="nv">"loginsights2metrics_exec"</span><span class="w"> </span><span class="p">{</span>
<span class="w">  </span><span class="nb">statement</span><span class="w"> </span><span class="p">{</span>
<span class="w">    </span><span class="na">actions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span>
<span class="w">      </span><span class="s2">"lambda:InvokeAsync"</span><span class="p">,</span>
<span class="w">      </span><span class="s2">"lambda:InvokeFunction"</span>
<span class="w">    </span><span class="p">]</span>
<span class="w">    </span><span class="na">resources</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="nv">aws_lambda_function.loginsights2metrics.arn</span><span class="p">]</span>
<span class="w">  </span><span class="p">}</span>
<span class="p">}</span>
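</pre></div> </div> </div><p>To make the permission concrete: under the hood, Zappa's asynchronous tasks boil down to a <code>lambda:InvokeFunction</code> call with the <code>Event</code> invocation type, which is exactly what the policy above allows. The following sketch builds the parameters such a call would pass to Boto3's <code>invoke</code>; the function name and payload shape are illustrative, not Zappa's actual wire format.</p><div class="code"><div class="content"><div class="highlight"><pre>
```python
import json

def build_async_invocation(function_name, payload):
    # Parameters for an asynchronous self-invocation, as allowed by the
    # IAM policy above. With boto3 they would be passed to
    # boto3.client("lambda").invoke(**params); InvocationType="Event"
    # makes the call asynchronous (fire and forget).
    return {
        "FunctionName": function_name,
        "InvocationType": "Event",
        "Payload": json.dumps(payload),
    }

# Hypothetical payload: the keys are illustrative only.
params = build_async_invocation(
    "loginsights2metrics",
    {"task_path": "main.put_metric_data", "args": [], "kwargs": {}},
)
print(params["InvocationType"])  # → Event
```
</pre></div> </div> </div><div class="code"><div class="content"><div class="highlight"><pre>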
</pre></div> </div> </div><p>I could not define this policy when I defined the rest of the IAM components because it references the Lambda, even though the resource lives in the same file. Terraform doesn't care about the order or the location of resource definitions, as long as there are no loops between them.</p><p>We can now attach the newly created policy document to the IAM role we created previously</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/iam.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_iam_role_policy"</span><span class="w"> </span><span class="nv">"loginsights2metrics_exec"</span><span class="w"> </span><span class="p">{</span>
<span class="w">  </span><span class="na">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.prefix}-exec"</span>
<span class="w">  </span><span class="na">role</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_iam_role.loginsights2metrics.name</span>
<span class="w">  </span><span class="na">policy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">data.aws_iam_policy_document.loginsights2metrics_exec.json</span>
<span class="p">}</span>
</pre></div> </div> </div><h4 id="resources-8ec3">Resources</h4><ul><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document">aws_iam_policy_document documentation</a></li><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy">aws_iam_role_policy documentation</a></li></ul><h3 id="cloudwatch-518e">CloudWatch</h3><p>Whenever you need to run Lambdas (or other things) periodically, the standard AWS solution is to use CloudWatch Events, which work as the AWS cron system. CloudWatch Events are made of rules and targets, so first of all I defined a rule that gets triggered every 2 minutes</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/cloudwatch.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_cloudwatch_event_rule"</span><span class="w"> </span><span class="nv">"rate"</span><span class="w"> </span><span class="p">{</span>
<span class="w">  </span><span class="c1"># Zappa requires the name to match the processing function</span>
<span class="w">  </span><span class="na">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"main.loginsights2metrics"</span>
<span class="w">  </span><span class="na">description</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Trigger Lambda ${var.prefix}"</span>
<span class="w">  </span><span class="na">schedule_expression</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"rate(2 minutes)"</span>
<span class="p">}</span>
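</pre></div> </div> </div><p>When a fixed rate is not flexible enough, schedule expressions also come in a <code>cron()</code> flavour. As far as I can tell, the rule above could equivalently be written with AWS's six-field cron syntax; this fragment is illustrative only, and the Zappa naming constraint on the rule still applies.</p><div class="code"><div class="content"><div class="highlight"><pre>
```hcl
# Equivalent schedule in AWS's six-field cron syntax
# (minutes hours day-of-month month day-of-week year):
# every 2 minutes, starting at minute 0.
schedule_expression = "cron(0/2 * * * ? *)"
```
</pre></div> </div> </div><div class="code"><div class="content"><div class="highlight"><pre>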
</pre></div> </div> </div><p>Please note that Zappa has a specific requirement for CloudWatch Events, so I left a comment to clarify this to my future self. The second part of the event is the target, which is the Lambda function that we defined in the previous section.</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/cloudwatch.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_cloudwatch_event_target"</span><span class="w"> </span><span class="nv">"lambda"</span><span class="w"> </span><span class="p">{</span>
<span class="w">  </span><span class="na">rule</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_cloudwatch_event_rule.rate.name</span>
<span class="w">  </span><span class="na">target_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.prefix}-target"</span>
<span class="w">  </span><span class="na">arn</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_lambda_function.loginsights2metrics.arn</span>
<span class="p">}</span>
</pre></div> </div> </div><h4 id="resources-8ec3">Resources</h4><ul><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_event_rule">aws_cloudwatch_event_rule documentation</a></li><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_event_target">aws_cloudwatch_event_target documentation</a></li></ul><h3 id="using-the-module-5d88">Using the module</h3><p>Now the module is finished, so I just need to create some items for the DynamoDB table and to call the module itself</p><div class="code"><div class="title"><code>account1/lambda-loginsights2metrics/main.tf</code></div><div class="content"><div class="highlight"><pre><span class="nb">locals</span><span class="w"> </span><span class="p">{</span>
<span class="w">  </span><span class="na">items</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span>
<span class="w">    </span><span class="p">{</span>
<span class="w">      </span><span class="s2">"SlotName"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w">        </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"Celery Logs submitted tasks"</span>
<span class="w">      </span><span class="p">},</span>
<span class="w">      </span><span class="s2">"LogGroup"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w">        </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"mycluster/celery"</span><span class="p">,</span>
<span class="w">      </span><span class="p">},</span>
<span class="w">      </span><span class="s2">"ClusterName"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w">        </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"mycluster"</span>
<span class="w">      </span><span class="p">},</span>
<span class="w">      </span><span class="s2">"Query"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w">        </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"parse @message /\\[celery\\.(?<source>[a-z.]+)\\].*Received task: (?<task>[a-z._]+)\\[/ | filter not isblank(source) | stats count(*) as Value by bin(1m)"</span><span class="p">,</span>
<span class="w">      </span><span class="p">},</span>
<span class="w">      </span><span class="s2">"Namespace"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w">        </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"Custom"</span>
<span class="w">      </span><span class="p">},</span>
<span class="w">      </span><span class="s2">"MetricName"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w">        </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"Submitted tasks"</span>
<span class="w">      </span><span class="p">}</span>
<span class="w">    </span><span class="p">},</span>
<span class="w">    </span><span class="p">{</span>
<span class="w">      </span><span class="s2">"SlotName"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w">        </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"Celery Logs succeeded tasks"</span>
<span class="w">      </span><span class="p">},</span>
<span class="w">      </span><span class="s2">"LogGroup"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w">        </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"mycluster/celery"</span><span class="p">,</span>
<span class="w">      </span><span class="p">},</span>
<span class="w">      </span><span class="s2">"ClusterName"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w">        </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"mycluster"</span>
<span class="w">      </span><span class="p">},</span>
<span class="w">      </span><span class="s2">"Query"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w">        </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"parse @message /\\[celery.(?<source>[a-z\\._]+)].*Task (?<task>[a-z\\._]+)\\[.*\\] (?<event>[a-z]+)/ | filter source = \"app.trace\" | filter event = \"succeeded\" | stats count(*) as Value by bin(1m)"</span><span class="p">,</span>
<span class="w">      </span><span class="p">},</span>
<span class="w">      </span><span class="s2">"Namespace"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w">        </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"Custom"</span>
<span class="w">      </span><span class="p">},</span>
<span class="w">      </span><span class="s2">"MetricName"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w">        </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"Succeeded tasks"</span>
<span class="w">      </span><span class="p">}</span>
<span class="w">    </span><span class="p">},</span>
<span class="w">    </span><span class="p">{</span>
<span class="w">      </span><span class="s2">"SlotName"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w">        </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"Celery Logs retried tasks"</span>
<span class="w">      </span><span class="p">},</span>
<span class="w">      </span><span class="s2">"LogGroup"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w">        </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"mycluster/celery"</span><span class="p">,</span>
<span class="w">      </span><span class="p">},</span>
<span class="w">      </span><span class="s2">"ClusterName"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w">        </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"mycluster"</span>
<span class="w">      </span><span class="p">},</span>
<span class="w">      </span><span class="s2">"Query"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w">        </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"parse @message /\\[celery.(?<source>[a-z\\._]+)].*Task (?<task>[a-z\\._]+)\\[.*\\] (?<event>[a-z]+)/ | filter source = \"app.trace\" | filter event = \"retry\" | stats count(*) as Value by bin(1m)"</span><span class="p">,</span>
<span class="w">      </span><span class="p">},</span>
<span class="w">      </span><span class="s2">"Namespace"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w">        </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"Custom"</span>
<span class="w">      </span><span class="p">},</span>
<span class="w">      </span><span class="s2">"MetricName"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w">        </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"Retried tasks"</span>
<span class="w">      </span><span class="p">}</span>
<span class="w">    </span><span class="p">}</span>
<span class="w">  </span><span class="p">]</span>
<span class="p">}</span>
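</pre></div> </div> </div><p>Note that the items above are written in DynamoDB's low-level attribute-value format, where every value is wrapped in a type descriptor (<code>"S"</code> for string). The resource-level API used later by the Python code returns plain values instead. A minimal sketch of the unwrapping (Boto3's <code>TypeDeserializer</code> does this for real, and handles all the other types too):</p><div class="code"><div class="content"><div class="highlight"><pre>
```python
def unwrap(item):
    # Convert DynamoDB's low-level {"S": "value"} wrapping into plain
    # strings. Only the "S" (string) descriptor used by these items is
    # handled here.
    return {key: descriptor["S"] for key, descriptor in item.items()}

# A trimmed-down version of the first item defined above.
item = {
    "SlotName": {"S": "Celery Logs submitted tasks"},
    "LogGroup": {"S": "mycluster/celery"},
}
print(unwrap(item)["LogGroup"])  # → mycluster/celery
```
</pre></div> </div> </div><div class="code"><div class="content"><div class="highlight"><pre>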
</pre></div> </div> </div><p>I need to provide a security group for the Lambda, and in this case I can safely use the default one provided by the VPC</p><div class="code"><div class="title"><code>account1/lambda-loginsights2metrics/main.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">data</span><span class="w"> </span><span class="nc">"aws_security_group"</span><span class="w"> </span><span class="nv">"default"</span><span class="w"> </span><span class="p">{</span>
<span class="w">  </span><span class="na">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"default"</span>
<span class="w">  </span><span class="na">vpc_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">var.vpc_id</span>
<span class="p">}</span>
</pre></div> </div> </div><p>And I can finally call the module</p><div class="code"><div class="title"><code>account1/lambda-loginsights2metrics/main.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">module</span><span class="w"> </span><span class="nv">"loginsights2metrics"</span><span class="w"> </span><span class="p">{</span>
<span class="w">  </span><span class="na">source</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"../../common/lambda-loginsights2metrics"</span>
<span class="w">  </span><span class="na">items</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">local.items</span>
<span class="w">  </span><span class="na">security_groups</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="nv">data.aws_security_group.default.id</span><span class="p">]</span>
<span class="w">  </span><span class="na">vpc_subnets</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">var.vpc_private_subnets</span>
<span class="p">}</span>
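</pre></div> </div> </div><p>The module's input variables are not shown in the post. Assuming they simply mirror the arguments passed above (plus the <code>prefix</code> used throughout the module), a hypothetical <code>variables.tf</code> for the module might look like this:</p><div class="code"><div class="content"><div class="highlight"><pre>
```hcl
# Hypothetical variables.tf sketch: names mirror the arguments
# used in the module call above.
variable "prefix" {
  type = string
}

variable "items" {
  # Each item is a map of DynamoDB attribute-value maps.
  type = any
}

variable "security_groups" {
  type = list(string)
}

variable "vpc_subnets" {
  type = list(string)
}
```
</pre></div> </div> </div><div class="code"><div class="content"><div class="highlight"><pre>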
</pre></div> </div> </div><p>Please note that the variable <code>vpc_private_subnets</code> is a list of subnet IDs that I created in another module.</p><h4 id="resources-8ec3">Resources</h4><ul><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/security_group">aws_security_group documentation</a></li><li><a href="https://www.terraform.io/docs/language/modules/develop/index.html">Creating Terraform modules</a></li></ul><h2 id="python-43d2">Python<a class="headerlink" href="#python-43d2" title="Permanent link">¶</a></h2><p>As I mentioned before, the Python code of the Lambda function is contained in a different repository and deployed by the CI using <a href="https://github.com/zappa/Zappa">Zappa</a>. Since we are interacting with AWS, I am of course using Boto3, the <a href="https://boto3.amazonaws.com/v1/documentation/api/latest/index.html">AWS SDK for Python</a>. The code was developed locally without Zappa's support, to test out the Boto3 functions I wanted to use, then quickly adjusted to be executed in a Lambda.</p><p>I think the code is pretty straightforward, but I left my original comments to be sure everything is clear.</p><div class="code"><div class="content"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">time</span>
<span class="kn">import</span> <span class="nn">json</span>
<span class="kn">from</span> <span class="nn">datetime</span> <span class="kn">import</span> <span class="n">datetime</span><span class="p">,</span> <span class="n">timedelta</span>
<span class="kn">import</span> <span class="nn">boto3</span>
<span class="kn">from</span> <span class="nn">zappa.asynchronous</span> <span class="kn">import</span> <span class="n">task</span>

<span class="c1"># CONFIG</span>
<span class="n">logs</span> <span class="o">=</span> <span class="n">boto3</span><span class="o">.</span><span class="n">client</span><span class="p">(</span><span class="s2">"logs"</span><span class="p">,</span> <span class="n">region_name</span><span class="o">=</span><span class="s2">"eu-west-1"</span><span class="p">)</span>
<span class="n">cw</span> <span class="o">=</span> <span class="n">boto3</span><span class="o">.</span><span class="n">client</span><span class="p">(</span><span class="s2">"cloudwatch"</span><span class="p">,</span> <span class="n">region_name</span><span class="o">=</span><span class="s2">"eu-west-1"</span><span class="p">)</span>
<span class="n">dynamodb</span> <span class="o">=</span> <span class="n">boto3</span><span class="o">.</span><span class="n">resource</span><span class="p">(</span><span class="s2">"dynamodb"</span><span class="p">,</span> <span class="n">region_name</span><span class="o">=</span><span class="s2">"eu-west-1"</span><span class="p">)</span>

<span class="nd">@task</span>
<span class="k">def</span> <span class="nf">put_metric_data</span><span class="p">(</span><span class="n">item</span><span class="p">):</span> <span class="callout">3</span>
    <span class="n">slot_name</span> <span class="o">=</span> <span class="n">item</span><span class="p">[</span><span class="s2">"SlotName"</span><span class="p">]</span>
    <span class="n">log_group</span> <span class="o">=</span> <span class="n">item</span><span class="p">[</span><span class="s2">"LogGroup"</span><span class="p">]</span>
    <span class="n">cluster_name</span> <span class="o">=</span> <span class="n">item</span><span class="p">[</span><span class="s2">"ClusterName"</span><span class="p">]</span>
    <span class="n">query</span> <span class="o">=</span> <span class="n">item</span><span class="p">[</span><span class="s2">"Query"</span><span class="p">]</span>
    <span class="n">namespace</span> <span class="o">=</span> <span class="n">item</span><span class="p">[</span><span class="s2">"Namespace"</span><span class="p">]</span>
    <span class="n">metric_name</span> <span class="o">=</span> <span class="n">item</span><span class="p">[</span><span class="s2">"MetricName"</span><span class="p">]</span>
    <span class="c1"># This runs the Log Insights query fetching data</span>
    <span class="c1"># for the last 15 minutes.</span>
    <span class="c1"># As we deal with logs processing it's entirely possible</span>
    <span class="c1"># for the metric to be updated, for example because</span>
    <span class="c1"># a log was received a bit later.</span>
    <span class="c1"># When we put multiple values for the same timestamp</span>
    <span class="c1"># in the metric CW can show max, min, avg, and percentiles.</span>
    <span class="c1"># Since this is an update of a count we should then always</span>
    <span class="c1"># use "max".</span>
    <span class="n">start_query_response</span> <span class="o">=</span> <span class="n">logs</span><span class="o">.</span><span class="n">start_query</span><span class="p">(</span> <span class="callout">4</span>
        <span class="n">logGroupName</span><span class="o">=</span><span class="n">log_group</span><span class="p">,</span>
        <span class="n">startTime</span><span class="o">=</span><span class="nb">int</span><span class="p">((</span><span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">()</span> <span class="o">-</span> <span class="n">timedelta</span><span class="p">(</span><span class="n">minutes</span><span class="o">=</span><span class="mi">15</span><span class="p">))</span><span class="o">.</span><span class="n">timestamp</span><span class="p">()),</span>
        <span class="n">endTime</span><span class="o">=</span><span class="nb">int</span><span class="p">(</span><span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">()</span><span class="o">.</span><span class="n">timestamp</span><span class="p">()),</span>
        <span class="n">queryString</span><span class="o">=</span><span class="n">query</span><span class="p">,</span>
    <span class="p">)</span>
    <span class="n">query_id</span> <span class="o">=</span> <span class="n">start_query_response</span><span class="p">[</span><span class="s2">"queryId"</span><span class="p">]</span>
    <span class="c1"># Just polling the API. 5 seconds seems to be a good</span>
    <span class="c1"># compromise between not pestering the API and not paying</span>
    <span class="c1"># too much for the Lambda.</span>
    <span class="n">response</span> <span class="o">=</span> <span class="kc">None</span>
    <span class="k">while</span> <span class="n">response</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">or</span> <span class="n">response</span><span class="p">[</span><span class="s2">"status"</span><span class="p">]</span> <span class="o">==</span> <span class="s2">"Running"</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">slot_name</span><span class="si">}</span><span class="s2">: waiting for query to complete ..."</span><span class="p">)</span>
        <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
        <span class="n">response</span> <span class="o">=</span> <span class="n">logs</span><span class="o">.</span><span class="n">get_query_results</span><span class="p">(</span><span class="n">queryId</span><span class="o">=</span><span class="n">query_id</span><span class="p">)</span>
    <span class="c1"># Data comes in a strange format, a dictionary of</span>
    <span class="c1"># {"field":name,"value":actual_value}, so this converts</span>
    <span class="c1"># it into something that can be accessed through keys</span>
    <span class="n">data</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">response</span><span class="p">[</span><span class="s2">"results"</span><span class="p">]:</span> <span class="callout">5</span>
        <span class="n">sample</span> <span class="o">=</span> <span class="p">{}</span>
        <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">d</span><span class="p">:</span>
            <span class="n">field</span> <span class="o">=</span> <span class="n">i</span><span class="p">[</span><span class="s2">"field"</span><span class="p">]</span>
            <span class="n">value</span> <span class="o">=</span> <span class="n">i</span><span class="p">[</span><span class="s2">"value"</span><span class="p">]</span>
            <span class="n">sample</span><span class="p">[</span><span class="n">field</span><span class="p">]</span> <span class="o">=</span> <span class="n">value</span>
        <span class="n">data</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">sample</span><span class="p">)</span>
    <span class="c1"># Now that we have the data, let's put them into a metric.</span>
    <span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">data</span><span class="p">:</span>
        <span class="n">timestamp</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">strptime</span><span class="p">(</span><span class="n">d</span><span class="p">[</span><span class="s2">"bin(1m)"</span><span class="p">],</span> <span class="s2">"%Y-%m-</span><span class="si">%d</span><span class="s2"> %H:%M:%S.000"</span><span class="p">)</span>
        <span class="n">value</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">d</span><span class="p">[</span><span class="s2">"Value"</span><span class="p">])</span>
        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">slot_name</span><span class="si">}</span><span class="s2">: putting </span><span class="si">{</span><span class="n">value</span><span class="si">}</span><span class="s2"> on </span><span class="si">{</span><span class="n">timestamp</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
        <span class="n">cw</span><span class="o">.</span><span class="n">put_metric_data</span><span class="p">(</span> <span class="callout">6</span>
            <span class="n">Namespace</span><span class="o">=</span><span class="n">namespace</span><span class="p">,</span>
            <span class="n">MetricData</span><span class="o">=</span><span class="p">[</span>
                <span class="p">{</span>
                    <span class="s2">"MetricName"</span><span class="p">:</span> <span class="n">metric_name</span><span class="p">,</span>
                    <span class="s2">"Dimensions"</span><span class="p">:</span> <span class="p">[{</span><span class="s2">"Name"</span><span class="p">:</span> <span class="s2">"Cluster"</span><span class="p">,</span> <span class="s2">"Value"</span><span class="p">:</span> <span class="n">cluster_name</span><span class="p">}],</span>
                    <span class="s2">"Timestamp"</span><span class="p">:</span> <span class="n">timestamp</span><span class="p">,</span>
                    <span class="s2">"Value"</span><span class="p">:</span> <span class="n">value</span><span class="p">,</span>
                    <span class="s2">"Unit"</span><span class="p">:</span> <span class="s2">"None"</span><span class="p">,</span>
                <span class="p">}</span>
            <span class="p">],</span>
        <span class="p">)</span>

<span class="k">def</span> <span class="nf">loginsights2metrics</span><span class="p">(</span><span class="n">event</span><span class="p">,</span> <span class="n">context</span><span class="p">):</span> <span class="callout">1</span>
    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"package_info.json"</span><span class="p">,</span> <span class="s2">"r"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
        <span class="n">package_info</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
    <span class="n">build_timestamp</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">package_info</span><span class="p">[</span><span class="s2">"build_time"</span><span class="p">])</span>
    <span class="n">build_datetime</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">fromtimestamp</span><span class="p">(</span><span class="n">build_timestamp</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"###################################"</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span>
        <span class="s2">"LogInsights2Metrics - Build date: "</span>
        <span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">build_datetime</span><span class="o">.</span><span class="n">strftime</span><span class="p">(</span><span class="s2">"%Y/%m/</span><span class="si">%d</span><span class="s2"> %H:%M:%S"</span><span class="p">)</span><span class="si">}</span><span class="s1">'</span>
    <span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"###################################"</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'Reading task from DynamoDB table </span><span class="si">{</span><span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s2">"DYNAMODB_TABLE"</span><span class="p">]</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
    <span class="n">table</span> <span class="o">=</span> <span class="n">dynamodb</span><span class="o">.</span><span class="n">Table</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s2">"DYNAMODB_TABLE"</span><span class="p">])</span>
    <span class="c1"># This is the simplest way to get all entries in the table</span>
    <span class="c1"># The next loop will asynchronously call `put_metric_data`</span>
    <span class="c1"># on each entry.</span>
    <span class="n">response</span> <span class="o">=</span> <span class="n">table</span><span class="o">.</span><span class="n">scan</span><span class="p">(</span><span class="n">Select</span><span class="o">=</span><span class="s2">"ALL_ATTRIBUTES"</span><span class="p">)</span> <span class="callout">2</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">response</span><span class="p">[</span><span class="s2">"Items"</span><span class="p">]:</span>
        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"* Processing item </span><span class="si">{</span><span class="n">i</span><span class="p">[</span><span class="s1">'SlotName'</span><span class="p">]</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
        <span class="n">put_metric_data</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
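</pre></div> </div> </div><p>The comment in <code>put_metric_data</code> about always using the "max" statistic can be demonstrated with a tiny self-contained example (the timestamps and counts are made up): overlapping query windows report the same 1-minute bin several times, and since a count can only grow as late log lines arrive, the highest sample is the most complete one.</p><div class="code"><div class="content"><div class="highlight"><pre>
```python
from collections import defaultdict

# Made-up samples: the same 1-minute bin is reported by several
# overlapping 15-minute query runs.
samples = [
    ("2023-10-16 10:00:00", 3),  # first run
    ("2023-10-16 10:00:00", 5),  # later run, two more log lines arrived
    ("2023-10-16 10:01:00", 2),
]

per_bin = defaultdict(list)
for timestamp, value in samples:
    per_bin[timestamp].append(value)

# CloudWatch's "Maximum" statistic picks the most complete count.
best = {ts: max(values) for ts, values in per_bin.items()}
print(best["2023-10-16 10:00:00"])  # → 5
```
</pre></div> </div> </div><div class="code"><div class="content"><div class="highlight"><pre>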
</pre></div> </div> </div><p>So, when the Lambda is executed, the entry point is the function <code>loginsights2metrics</code> <span class="callout">1</span> which queries the DynamoDB table <span class="callout">2</span> and loops over all the items contained in it. The loop executes the function <code>put_metric_data</code> <span class="callout">3</span> which, being a Zappa <code>task</code>, runs in a new Lambda invocation. This function runs the Log Insights query <span class="callout">4</span>, adjusts Boto3's output <span class="callout">5</span>, and finally puts the values in the custom metric <span class="callout">6</span>.</p><p>The problem I mention in the comment just before I run <code>logs.start_query</code> is interesting. Log Insights are queries, and since they extract data from logs the result can change between two calls of the same query. This means that, since there is an overlap between calls (we run a query on the last 15 minutes every 2 minutes), the function will put multiple values in the same bin of the metric. This is perfectly normal, and it's the reason why CloudWatch allows you to show the maximum, minimum, average, or various percentiles of the same metric. When it comes to counting events, the number can only increase or stay constant in time, but never decrease, so it's sensible to look at the maximum. This is not true if you are looking at execution times, for example, so pay attention to the nature of the underlying query when you graph the metric.</p><p>The Zappa settings I use for the function are</p><div class="code"><div class="title"><code>zappa_settings.json</code></div><div class="content"><div class="highlight"><pre><span class="p">{</span>
<span class="w">    </span><span class="nt">"main"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w">        </span><span class="nt">"app_module"</span><span class="p">:</span><span class="w"> </span><span class="s2">"main"</span><span class="p">,</span>
<span class="w">        </span><span class="nt">"app_function"</span><span class="p">:</span><span class="w"> </span><span class="s2">"main.loginsights2metrics"</span><span class="p">,</span>
<span class="w">        </span><span class="nt">"runtime"</span><span class="p">:</span><span class="w"> </span><span class="s2">"python3.8"</span><span class="p">,</span>
<span class="w">        </span><span class="nt">"log_level"</span><span class="p">:</span><span class="w"> </span><span class="s2">"WARNING"</span><span class="p">,</span>
<span class="w">        </span><span class="nt">"xray_tracing"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span>
<span class="w">        </span><span class="nt">"exception_handler"</span><span class="p">:</span><span class="w"> </span><span class="s2">"zappa_sentry.unhandled_exceptions"</span>
<span class="w">    </span><span class="p">}</span>
<span class="p">}</span>
</pre></div> </div> </div><p>And the requirements are</p><div class="code"><div class="title"><code>requirements.txt</code></div><div class="content"><div class="highlight"><pre>zappa
zappa-sentry
</pre></div> </div> </div><p>Please note that, as I mentioned before, <code>zappa-sentry</code> is not a strict requirement for this solution.</p><p>The code can be packaged and deployed with a simple bash script like</p><div class="code"><div class="content"><div class="highlight"><pre><span class="ch">#!/bin/bash</span>
<span class="nv">VENV_DIRECTORY</span><span class="o">=</span>venv
<span class="nv">LAMBDA_PACKAGE</span><span class="o">=</span>lambda.zip
<span class="nv">REGION</span><span class="o">=</span>eu-west-1
<span class="nv">FUNCTION_NAME</span><span class="o">=</span>loginsights2metrics
<span class="k">if</span><span class="w"> </span><span class="o">[[</span><span class="w"> </span>-d<span class="w"> </span><span class="si">${</span><span class="nv">VENV_DIRECTORY</span><span class="si">}</span><span class="w"> </span><span class="o">]]</span><span class="p">;</span><span class="w"> </span><span class="k">then</span><span class="w"> </span>rm<span class="w"> </span>-fR<span class="w"> </span><span class="si">${</span><span class="nv">VENV_DIRECTORY</span><span class="si">}</span><span class="p">;</span><span class="w"> </span><span class="k">fi</span>
<span class="k">if</span><span class="w"> </span><span class="o">[[</span><span class="w"> </span>-f<span class="w"> </span><span class="si">${</span><span class="nv">LAMBDA_PACKAGE</span><span class="si">}</span><span class="w"> </span><span class="o">]]</span><span class="p">;</span><span class="w"> </span><span class="k">then</span><span class="w"> </span>rm<span class="w"> </span>-fR<span class="w"> </span><span class="si">${</span><span class="nv">LAMBDA_PACKAGE</span><span class="si">}</span><span class="p">;</span><span class="w"> </span><span class="k">fi</span>
python<span class="w"> </span>-m<span class="w"> </span>venv<span class="w"> </span><span class="si">${</span><span class="nv">VENV_DIRECTORY</span><span class="si">}</span>
<span class="nb">source</span><span class="w"> </span><span class="si">${</span><span class="nv">VENV_DIRECTORY</span><span class="si">}</span>/bin/activate
pip<span class="w"> </span>install<span class="w"> </span>-r<span class="w"> </span>requirements.txt
zappa<span class="w"> </span>package<span class="w"> </span>main<span class="w"> </span>-o<span class="w"> </span><span class="si">${</span><span class="nv">LAMBDA_PACKAGE</span><span class="si">}</span>
rm<span class="w"> </span>-fR<span class="w"> </span><span class="si">${</span><span class="nv">VENV_DIRECTORY</span><span class="si">}</span>
aws<span class="w"> </span>--region<span class="o">=</span><span class="si">${</span><span class="nv">REGION</span><span class="si">}</span><span class="w"> </span>lambda<span class="w"> </span>update-function-code<span class="w"> </span>--function-name<span class="w"> </span><span class="si">${</span><span class="nv">FUNCTION_NAME</span><span class="si">}</span><span class="w"> </span>--zip-file<span class="w"> </span><span class="s2">"fileb://</span><span class="si">${</span><span class="nv">LAMBDA_PACKAGE</span><span class="si">}</span><span class="s2">"</span>
</pre></div> </div> </div>
<h2 id="costs-dbe1">Costs<a class="headerlink" href="#costs-dbe1" title="Permanent link">¶</a></h2><p>I will follow here the <a href="https://aws.amazon.com/lambda/pricing/">AWS guide on Lambda pricing</a> and the calculations published in 2018 by my colleague João Neves on <a href="https://silvaneves.org/how-much-does-a-lambda-cost.html">his blog</a>.</p><p>I assume the following:</p><ul><li>The Lambda runs 4 queries, so we have 5 invocations (1 for the main Lambda and 4 asynchronous tasks)</li><li>Each invocation runs for 5 seconds. The current average time of each invocation in my AWS accounts is 4.6 seconds</li><li>I run the Lambda every 2 minutes</li></ul><p>Requests: <code>5 invocations/event * 30 events/hour * 24 hours/day * 31 days/month = 111600 requests</code></p><p>Duration: <code>0.128 GB/request * 111600 requests * 5 seconds = 71424 GB-second</code></p><p>Total: <code>$0.20 * 111600 / 10^6 + $0.0000166667 * 71424 ~= $1.22/month</code></p><p>As you can see, for applications like this it's extremely convenient to use a serverless solution like Lambda functions.</p><h2 id="feedback-d845">Feedback<a class="headerlink" href="#feedback-d845" title="Permanent link">¶</a></h2><p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>Dissecting a Web stack2020-02-16T15:00:00+00:002020-10-27T08:30:00+00:00Leonardo Giordanitag:www.thedigitalcatonline.com,2020-02-16:/blog/2020/02/16/dissecting-a-web-stack/<p>A layer-by-layer review of the components of a web stack and the reasons behind them</p><blockquote>
<p>It was gross. They wanted me to dissect a frog.</p>
<p>(Beetlejuice, 1988)</p>
</blockquote>
<h2 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">¶</a></h2>
<p>Having recently worked with young web developers who were exposed for the first time to proper production infrastructure, I received many questions about the various components that one can find in the architecture of a "Web service". These questions clearly expressed the confusion (and sometimes the frustration) of developers who understand how to create endpoints in a high-level language such as Node.js or Python, but were never introduced to the complexity of what happens between the user's browser and their framework of choice. Most of the time they don't know why the framework itself is there in the first place.</p>
<p>The challenge is clear if we just list (in random order), some of the words we use when we discuss (Python) Web development: HTTP, cookies, web server, Websockets, FTP, multi-threaded, reverse proxy, Django, nginx, static files, POST, certificates, framework, Flask, SSL, GET, WSGI, session management, TLS, load balancing, Apache.</p>
<p>In this post, I want to review all the words mentioned above (and a couple more) trying to build a production-ready web service from the ground up. I hope this might help young developers to get the whole picture and to make sense of these "obscure" names that senior developers like me tend to drop in everyday conversations (sometimes arguably out of turn).</p>
<p>As the focus of the post is the global architecture and the reasons behind the presence of specific components, the example service I will use will be a basic HTML web page. The reference language will be Python but the overall discussion applies to any language or framework.</p>
<p>My approach will be that of first stating the rationale and then implementing a possible solution. After this, I will point out missing pieces or unresolved issues and move on with the next layer. At the end of the process, the reader should have a clear picture of why each component has been added to the system.</p>
<h2 id="the-perfect-architecture">The perfect architecture<a class="headerlink" href="#the-perfect-architecture" title="Permanent link">¶</a></h2>
<p>A very important underlying concept of system architectures is that there is no <em>perfect solution</em> devised by some wise genius that we just need to apply. Unfortunately, people often mistake design patterns for such a "magic solution". The original "Design Patterns" book, however, states that</p>
<blockquote>
<p>Your design should be specific to the problem at hand but also general enough to address future problems and requirements. You also want to avoid redesign, or at least minimize it.</p>
</blockquote>
<p>And later</p>
<blockquote>
<p>Design patterns make it easier to reuse successful designs and architectures. [...] Design patterns help you choose design alternatives that make a system reusable and avoid alternatives that compromise reusability.</p>
</blockquote>
<p>The authors of the book are discussing Object-oriented Programming, but these sentences can be applied to any architecture. As you can see, we have a "problem at hand" and "design alternatives", which means that the most important thing to understand is the requirements, both the present and future ones. Only with clear requirements in mind, one can effectively design a solution, possibly tapping into the great number of patterns that other designers already devised.</p>
<p>One last remark. A web stack is a complex beast, made of several components and software packages developed by different programmers with different goals in mind. It is perfectly understandable, then, that such components have some degree of overlap. While the dividing line between theoretical layers is usually very clear, in practice the separation is often blurry. Expect this a lot, and you will never feel lost in a web stack again.</p>
<h2 id="some-definitions">Some definitions<a class="headerlink" href="#some-definitions" title="Permanent link">¶</a></h2>
<p>Let's briefly review some of the most important concepts involved in a Web stack, the protocols.</p>
<h3 id="tcpip">TCP/IP<a class="headerlink" href="#tcpip" title="Permanent link">¶</a></h3>
<p>TCP/IP is a network protocol, that is, a <em>set of established rules</em> two computers have to follow to get connected over a physical network to exchange messages. TCP/IP is composed of two different protocols covering two different layers of the OSI stack, namely the Transport (TCP) and the Network (IP) ones. TCP/IP can be implemented on top of any physical interface (Data Link and Physical OSI layers), such as Ethernet and Wireless. Actors in a TCP/IP network are identified by a <em>socket</em>, which is a tuple made of an IP address and a port number.</p>
<p>As far as we are concerned when developing a Web service, however, we need to be aware that TCP/IP is a <em>reliable</em> protocol, which in telecommunications means that the protocol itself takes care of retransmissions when packets get lost. In other words, while the speed of the communication is not guaranteed, we can be sure that once a message is sent it will reach its destination without errors.</p>
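<p>As a toy illustration of this reliable byte stream, Python's standard library can create a pair of already-connected sockets on the local machine. This sketch only demonstrates the abstraction and is not part of the servers built later in this post.</p>

```python
import socket

# Create two connected sockets; bytes written to one side can be
# read, in order and without errors, from the other side.
sender, receiver = socket.socketpair()

sender.sendall(b"a message over a reliable stream")
data = receiver.recv(1024)
print(data.decode("utf-8"))

sender.close()
receiver.close()
```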
<h3 id="http">HTTP<a class="headerlink" href="#http" title="Permanent link">¶</a></h3>
<p>TCP/IP can guarantee that the raw bytes one computer sends will reach their destination, but this leaves completely untouched the problem of how to send meaningful information. In particular, in 1989 the problem Tim Berners-Lee wanted to solve was how to uniquely name hypertext resources in a network and how to access them.</p>
<p>HTTP is the protocol that was devised to solve such a problem and has since greatly evolved. With the help of other protocols such as WebSocket, HTTP invaded areas of communication for which it was originally considered unsuitable such as real-time communication or gaming.</p>
<p>At its core, HTTP is a protocol that states the format of a text request and the possible text responses. The initial version 0.9 published in 1991 defined the concept of URL and allowed only the GET operation that requested a specific resource. HTTP 1.0 and 1.1 added crucial features such as headers, more methods, and important performance optimisations. At the time of writing the adoption of HTTP/2 is around 45% of the websites in the world, and HTTP/3 is still a draft.</p>
<p>The most important feature of HTTP we need to keep in mind as developers is that it is a <em>stateless</em> protocol. This means that the protocol doesn't require the server to keep track of the state of the communication between requests, basically leaving session management to the developer of the service itself.</p>
<p>Session management is crucial nowadays because you usually want to have an authentication layer in front of a service, where a user provides credentials and accesses some private data. It is, however, useful in other contexts such as visual preferences or choices made by the user and re-used in later accesses to the same website. Typical solutions to the session management problem of HTTP involve the use of cookies or session tokens.</p>
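<p>For example, a cookie-based session might look like the following exchange, where the server assigns an identifier in its response and the client sends it back with later requests. The header values here are made up for illustration.</p>

```
HTTP/1.1 200 OK
Set-Cookie: sessionid=38afes7a8; HttpOnly; Path=/

GET /profile HTTP/1.1
Host: localhost
Cookie: sessionid=38afes7a8
```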
<h3 id="https">HTTPS<a class="headerlink" href="#https" title="Permanent link">¶</a></h3>
<p>Security has become a very important word in recent years, and with good reason. The amount of sensitive data we exchange on the Internet or store on digital devices is increasing exponentially, but unfortunately so is the number of malicious attackers and the level of damage they can cause with their actions.</p>
<p>HTTP is inherently insecure, being a plain text communication between two parties that usually happens on a completely untrusted network such as the Internet. While security wasn't an issue when the protocol was initially conceived, it is nowadays a problem of paramount importance, as we exchange private information, often vital for people's security or for businesses. We need to be sure we are sending information to the correct server and that the data we send cannot be intercepted.</p>
<p>HTTPS solves both the problem of tampering and eavesdropping, encrypting HTTP with the Transport Layer Security (TLS) protocol, which also enforces the usage of digital certificates issued by a trusted authority. At the time of writing, approximately 80% of websites loaded by Firefox use HTTPS by default. When a server receives an HTTPS connection and transforms it into an HTTP one it is usually said that it <em>terminates TLS</em> (or SSL, the old name of TLS).</p>
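<p>Python's standard library reflects this trust model: a default client-side TLS context requires a valid certificate chain and a matching hostname before it accepts a connection. This is only a sketch, unrelated to the code developed in this post.</p>

```python
import ssl

# A default client context validates the server's certificate chain
# against the system's trusted certificate authorities.
context = ssl.create_default_context()

print(context.verify_mode == ssl.CERT_REQUIRED)  # certificate validation is on
print(context.check_hostname)                    # the hostname must match too
```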
<h3 id="websocket">WebSocket<a class="headerlink" href="#websocket" title="Permanent link">¶</a></h3>
<p>One great disadvantage of HTTP is that communication is always initiated by the client and that the server can send data only when this is explicitly requested. Polling can be implemented to provide an initial solution, but it cannot guarantee the performance of proper full-duplex communication, where a channel is kept open between server and client and both can send data without being requested. Such a channel is provided by the WebSocket protocol.</p>
<p>WebSocket is a killer technology for applications like online gaming, real-time feeds like financial tickers or sports news, or multimedia communication like conferencing or remote education.</p>
<p>It is important to understand that WebSocket is not HTTP, and can exist without it. It is also true that this new protocol was designed to be used on top of an existing HTTP connection, so a WebSocket communication is often found in parts of a Web page, which was originally retrieved using HTTP in the first place.</p>
<h2 id="implementing-a-service-over-http">Implementing a service over HTTP<a class="headerlink" href="#implementing-a-service-over-http" title="Permanent link">¶</a></h2>
<p>Let's finally start discussing bits and bytes. The starting point for our journey is a service over HTTP, which means there is an HTTP request-response exchange. As an example, let us consider a GET request, the simplest of the HTTP methods.</p>
<div class="highlight"><pre><span></span><code><span class="nf">GET</span> <span class="nn">/</span> <span class="kr">HTTP</span><span class="o">/</span><span class="m">1.1</span>
<span class="na">Host</span><span class="o">:</span> <span class="l">localhost</span>
<span class="na">User-Agent</span><span class="o">:</span> <span class="l">curl/7.65.3</span>
<span class="na">Accept</span><span class="o">:</span> <span class="l">*/*</span>
</code></pre></div>
<p>As you can see, the client is sending a pure text message to the server, with the format specified by the HTTP protocol. The first line contains the method name (<code>GET</code>), the URL (<code>/</code>) and the protocol we are using, including its version (<code>HTTP/1.1</code>). The remaining lines are called <em>headers</em> and contain metadata that can help the server to manage the request. The complete value of the <code>Host</code> header is in this case <code>localhost:80</code>, but as the standard port for HTTP services is 80, we don't need to specify it.</p>
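<p>To make this structure concrete, here is a minimal sketch of how such a request could be split into its parts in Python. The <code>parse_request</code> helper is hypothetical and is not part of the code developed in this post.</p>

```python
def parse_request(raw: str):
    """Split a raw HTTP request into method, URL, protocol, and headers."""
    lines = raw.splitlines()
    # The first line contains the method, the URL, and the protocol version
    method, url, protocol = lines[0].split(" ")
    # The remaining non-empty lines are "Name: value" headers
    headers = dict(line.split(": ", 1) for line in lines[1:] if line)
    return method, url, protocol, headers

raw = "GET / HTTP/1.1\r\nHost: localhost\r\nUser-Agent: curl/7.65.3\r\nAccept: */*\r\n"
print(parse_request(raw))
```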
<p>If the server <code>localhost</code> is <em>serving</em> HTTP (i.e. running some software that understands HTTP) on port 80 the response we might get is something similar to</p>
<div class="highlight"><pre><span></span><code><span class="kr">HTTP</span><span class="o">/</span><span class="m">1.0</span> <span class="m">200</span> <span class="ne">OK</span>
<span class="na">Date</span><span class="o">:</span> <span class="l">Mon, 10 Feb 2020 08:41:33 GMT</span>
<span class="na">Content-type</span><span class="o">:</span> <span class="l">text/html</span>
<span class="na">Content-Length</span><span class="o">:</span> <span class="l">26889</span>
<span class="na">Last-Modified</span><span class="o">:</span> <span class="l">Mon, 10 Feb 2020 08:41:27 GMT</span>
<span class="cp"><!DOCTYPE HTML></span>
<span class="p"><</span><span class="nt">html</span><span class="p">></span>
...
<span class="p"></</span><span class="nt">html</span><span class="p">></span>
</code></pre></div>
<p>As happened for the request, the response is a text message, formatted according to the standard. The first line mentions the protocol and the status of the request (<code>200</code> in this case, which means success), while the following lines contain metadata in various headers. Finally, after an empty line, the message contains the resource the client asked for, in this case the source code of the website's base URL. Since this HTML page probably contains references to other resources like CSS, JS, images, and so on, the browser will send several other requests to gather all the data it needs to properly show the page to the user.</p>
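<p>A minimal sketch of how a client could separate such a response into status line, headers, and body (this is illustrative code, not part of the server we are about to build):</p>

```python
# A raw HTTP response: status line, headers, an empty line, then the body.
raw_response = (
    "HTTP/1.0 200 OK\r\n"
    "Content-type: text/html\r\n"
    "\r\n"
    "<!DOCTYPE HTML><html>...</html>"
)

# The first empty line separates the headers from the body.
head, body = raw_response.split("\r\n\r\n", 1)

# The status line contains the protocol, the status code, and a reason phrase.
protocol, status, reason = head.splitlines()[0].split(" ", 2)

print(protocol, status, reason)  # HTTP/1.0 200 OK
```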
<p>So, the first problem we have is that of implementing a server that understands this protocol and sends a proper response when it receives an HTTP request. We should try to load the requested resource and return either a success (HTTP 200) if we can find it, or a failure (HTTP 404) if we can't.</p>
<h2 id="1-sockets-and-parsers">1 Sockets and parsers<a class="headerlink" href="#1-sockets-and-parsers" title="Permanent link">¶</a></h2>
<h3 id="11-rationale">1.1 Rationale<a class="headerlink" href="#11-rationale" title="Permanent link">¶</a></h3>
<p>TCP/IP is a network protocol that works with <em>sockets</em>. A socket is a tuple of an IP address (unique in the network) and a port (unique for a specific IP address) that the computer uses to communicate with others. A socket is a file-like object in an operating system, which can thus be <em>opened</em> and <em>closed</em>, and which we can <em>read</em> from or <em>write</em> to. Socket programming is a pretty low-level approach to the network, but you need to be aware that every piece of software on your computer that provides network access ultimately has to deal with sockets (most probably through some library, though).</p>
<p>Since we are building things from the ground up, let's implement a small Python program that opens a socket connection, receives an HTTP request, and sends an HTTP response. As port 80 is a "low port" (a number smaller than 1024), we usually don't have permissions to open sockets there, so I will use port 8080. This is not a problem for now, as HTTP can be served on any port.</p>
<h3 id="12-implementation">1.2 Implementation<a class="headerlink" href="#12-implementation" title="Permanent link">¶</a></h3>
<p>Create the file <code>server.py</code> and type this code. Yes, <strong>type it</strong>, don't just copy and paste, you will not learn anything otherwise.</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">socket</span>
<span class="c1">## Create a socket instance</span>
<span class="c1">## AF_INET: use IP protocol version 4</span>
<span class="c1">## SOCK_STREAM: full-duplex byte stream</span>
<span class="n">s</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">socket</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_INET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SOCK_STREAM</span><span class="p">)</span>
<span class="c1">## Allow reuse of addresses</span>
<span class="n">s</span><span class="o">.</span><span class="n">setsockopt</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">SOL_SOCKET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SO_REUSEADDR</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="c1">## Bind the socket to any address, port 8080, and listen</span>
<span class="n">s</span><span class="o">.</span><span class="n">bind</span><span class="p">((</span><span class="s1">''</span><span class="p">,</span> <span class="mi">8080</span><span class="p">))</span>
<span class="n">s</span><span class="o">.</span><span class="n">listen</span><span class="p">()</span>
<span class="c1">## Serve forever</span>
<span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
    <span class="c1"># Accept the connection</span>
    <span class="n">conn</span><span class="p">,</span> <span class="n">addr</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">accept</span><span class="p">()</span>
    <span class="c1"># Receive data from this socket using a buffer of 1024 bytes</span>
    <span class="n">data</span> <span class="o">=</span> <span class="n">conn</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="mi">1024</span><span class="p">)</span>
    <span class="c1"># Print out the data</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="s1">'utf-8'</span><span class="p">))</span>
    <span class="c1"># Close the connection</span>
    <span class="n">conn</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</code></pre></div>
<p>This little program accepts a connection on port 8080 and prints the received data on the terminal. You can test it by executing it and then running <code>curl localhost:8080</code> in another terminal. You should see something like</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>python3<span class="w"> </span>server.py<span class="w"> </span>
GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1
Host:<span class="w"> </span>localhost:8080
User-Agent:<span class="w"> </span>curl/7.65.3
Accept:<span class="w"> </span>*/*
</code></pre></div>
<p>The server keeps running the code in the <code>while</code> loop, so if you want to terminate it you have to do it with Ctrl+C. So far so good, but this is not an HTTP server yet, as it sends no response; you should actually receive an error message from curl that says <code>curl: (52) Empty reply from server</code>.</p>
<p>Sending back a standard response is very simple: we just need to call <code>conn.sendall</code>, passing the raw bytes. A minimal HTTP response contains the protocol and the status, an empty line, and the actual content, for example</p>
<div class="highlight"><pre><span></span><code><span class="kr">HTTP</span><span class="o">/</span><span class="m">1.1</span> <span class="m">200</span> <span class="ne">OK</span>
Hi there!
</code></pre></div>
<p>Our server then becomes</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">socket</span>
<span class="c1">## Create a socket instance</span>
<span class="c1">## AF_INET: use IP protocol version 4</span>
<span class="c1">## SOCK_STREAM: full-duplex byte stream</span>
<span class="n">s</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">socket</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_INET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SOCK_STREAM</span><span class="p">)</span>
<span class="c1">## Allow reuse of addresses</span>
<span class="n">s</span><span class="o">.</span><span class="n">setsockopt</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">SOL_SOCKET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SO_REUSEADDR</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="c1">## Bind the socket to any address, port 8080, and listen</span>
<span class="n">s</span><span class="o">.</span><span class="n">bind</span><span class="p">((</span><span class="s1">''</span><span class="p">,</span> <span class="mi">8080</span><span class="p">))</span>
<span class="n">s</span><span class="o">.</span><span class="n">listen</span><span class="p">()</span>
<span class="c1">## Serve forever</span>
<span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
    <span class="c1"># Accept the connection</span>
    <span class="n">conn</span><span class="p">,</span> <span class="n">addr</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">accept</span><span class="p">()</span>
    <span class="c1"># Receive data from this socket using a buffer of 1024 bytes</span>
    <span class="n">data</span> <span class="o">=</span> <span class="n">conn</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="mi">1024</span><span class="p">)</span>
    <span class="c1"># Print out the data</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="s1">'utf-8'</span><span class="p">))</span>
    <span class="n">conn</span><span class="o">.</span><span class="n">sendall</span><span class="p">(</span><span class="nb">bytes</span><span class="p">(</span><span class="s2">"HTTP/1.1 200 OK</span><span class="se">\n\n</span><span class="s2">Hi there!</span><span class="se">\n</span><span class="s2">"</span><span class="p">,</span> <span class="s1">'utf-8'</span><span class="p">))</span>
    <span class="c1"># Close the connection</span>
    <span class="n">conn</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</code></pre></div>
<p>At this point, we are not really responding to the user's request, however. Try different curl command lines like <code>curl localhost:8080/index.html</code> or <code>curl localhost:8080/main.css</code> and you will always receive the same response. We should try to find the resource the user is asking for and send that back in the response content.</p>
<p>This version of the HTTP server properly extracts the resource and tries to load it from the current directory, returning either a success or a failure</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">socket</span>
<span class="kn">import</span> <span class="nn">re</span>
<span class="c1">## Create a socket instance</span>
<span class="c1">## AF_INET: use IP protocol version 4</span>
<span class="c1">## SOCK_STREAM: full-duplex byte stream</span>
<span class="n">s</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">socket</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_INET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SOCK_STREAM</span><span class="p">)</span>
<span class="c1">## Allow reuse of addresses</span>
<span class="n">s</span><span class="o">.</span><span class="n">setsockopt</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">SOL_SOCKET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SO_REUSEADDR</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="c1">## Bind the socket to any address, port 8080, and listen</span>
<span class="n">s</span><span class="o">.</span><span class="n">bind</span><span class="p">((</span><span class="s1">''</span><span class="p">,</span> <span class="mi">8080</span><span class="p">))</span>
<span class="n">s</span><span class="o">.</span><span class="n">listen</span><span class="p">()</span>
<span class="n">HEAD_200</span> <span class="o">=</span> <span class="s2">"HTTP/1.1 200 OK</span><span class="se">\n\n</span><span class="s2">"</span>
<span class="n">HEAD_404</span> <span class="o">=</span> <span class="s2">"HTTP/1.1 404 Not Found</span><span class="se">\n\n</span><span class="s2">"</span>
<span class="c1"># Serve forever</span>
<span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
    <span class="c1"># Accept the connection</span>
    <span class="n">conn</span><span class="p">,</span> <span class="n">addr</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">accept</span><span class="p">()</span>
    <span class="c1"># Receive data from this socket using a buffer of 1024 bytes</span>
    <span class="n">data</span> <span class="o">=</span> <span class="n">conn</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="mi">1024</span><span class="p">)</span>
    <span class="n">request</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="s1">'utf-8'</span><span class="p">)</span>
    <span class="c1"># Print out the data</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>
    <span class="n">resource</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="sa">r</span><span class="s1">'GET /(.*) HTTP'</span><span class="p">,</span> <span class="n">request</span><span class="p">)</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">resource</span><span class="p">,</span> <span class="s1">'r'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
            <span class="n">content</span> <span class="o">=</span> <span class="n">HEAD_200</span> <span class="o">+</span> <span class="n">f</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'Resource </span><span class="si">{}</span><span class="s1"> correctly served'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">resource</span><span class="p">))</span>
    <span class="k">except</span> <span class="ne">FileNotFoundError</span><span class="p">:</span>
        <span class="n">content</span> <span class="o">=</span> <span class="n">HEAD_404</span> <span class="o">+</span> <span class="s2">"Resource /</span><span class="si">{}</span><span class="s2"> cannot be found</span><span class="se">\n</span><span class="s2">"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">resource</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'Resource </span><span class="si">{}</span><span class="s1"> cannot be loaded'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">resource</span><span class="p">))</span>
    <span class="nb">print</span><span class="p">(</span><span class="s1">'--------------------'</span><span class="p">)</span>
    <span class="n">conn</span><span class="o">.</span><span class="n">sendall</span><span class="p">(</span><span class="nb">bytes</span><span class="p">(</span><span class="n">content</span><span class="p">,</span> <span class="s1">'utf-8'</span><span class="p">))</span>
    <span class="c1"># Close the connection</span>
    <span class="n">conn</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</code></pre></div>
<p>As you can see, this implementation is extremely simple. If you create a local file named <code>index.html</code> with this content</p>
<div class="highlight"><pre><span></span><code><span class="p"><</span><span class="nt">head</span><span class="p">></span>
<span class="p"><</span><span class="nt">title</span><span class="p">></span>This is my page<span class="p"></</span><span class="nt">title</span><span class="p">></span>
<span class="p"><</span><span class="nt">link</span> <span class="na">rel</span><span class="o">=</span><span class="s">"stylesheet"</span> <span class="na">href</span><span class="o">=</span><span class="s">"main.css"</span><span class="p">></span>
<span class="p"></</span><span class="nt">head</span><span class="p">></span>
<span class="p"><</span><span class="nt">html</span><span class="p">></span>
<span class="p"><</span><span class="nt">p</span><span class="p">></span>Some random content<span class="p"></</span><span class="nt">p</span><span class="p">></span>
<span class="p"></</span><span class="nt">html</span><span class="p">></span>
</code></pre></div>
<p>and run <code>curl localhost:8080/index.html</code> you will see the content of the file. At this point, you can even use your browser to open <code>http://localhost:8080/index.html</code> and you will see the title of the page and the content. A Web browser is a piece of software capable of sending HTTP requests and of interpreting the content of the responses when it is HTML (or one of many other file types like images or videos), so it can <em>render</em> the content of the message. The browser is also responsible for retrieving the missing resources needed for the rendering, so when you provide links to style sheets or JS scripts with the <code><link></code> or the <code><script></code> tags in the HTML code of a page, you are instructing the browser to send an HTTP GET request for those files as well.</p>
<p>The output of <code>server.py</code> when I access <code>http://localhost:8080/index.html</code> is</p>
<div class="highlight"><pre><span></span><code><span class="nf">GET</span> <span class="nn">/index.html</span> <span class="kr">HTTP</span><span class="o">/</span><span class="m">1.1</span>
<span class="na">Host</span><span class="o">:</span> <span class="l">localhost:8080</span>
<span class="na">User-Agent</span><span class="o">:</span> <span class="l">Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0</span>
<span class="na">Accept</span><span class="o">:</span> <span class="l">text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8</span>
<span class="na">Accept-Language</span><span class="o">:</span> <span class="l">en-GB,en;q=0.5</span>
<span class="na">Accept-Encoding</span><span class="o">:</span> <span class="l">gzip, deflate</span>
<span class="na">Connection</span><span class="o">:</span> <span class="l">keep-alive</span>
<span class="na">Upgrade-Insecure-Requests</span><span class="o">:</span> <span class="l">1</span>
<span class="na">Pragma</span><span class="o">:</span> <span class="l">no-cache</span>
<span class="na">Cache-Control</span><span class="o">:</span> <span class="l">no-cache</span>
Resource index.html correctly served
--------------------
GET /main.css HTTP/1.1
Host: localhost:8080
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0
Accept: text/css,*/*;q=0.1
Accept-Language: en-GB,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Referer: http://localhost:8080/index.html
Pragma: no-cache
Cache-Control: no-cache
Resource main.css cannot be loaded
--------------------
GET /favicon.ico HTTP/1.1
Host: localhost:8080
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0
Accept: image/webp,*/*
Accept-Language: en-GB,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Resource favicon.ico cannot be loaded
--------------------
</code></pre></div>
<p>As you can see the browser sends rich HTTP requests, with a lot of headers, automatically requesting the CSS file mentioned in the HTML code and automatically trying to retrieve a favicon image.</p>
<h3 id="13-resources">1.3 Resources<a class="headerlink" href="#13-resources" title="Permanent link">¶</a></h3>
<p>These resources provide more detailed information on the topics discussed in this section</p>
<ul>
<li><a href="https://docs.python.org/3/howto/sockets.html">Python 3 Socket Programming HOWTO</a></li>
<li><a href="https://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5">HTTP/1.1 Request format</a></li>
<li><a href="https://www.w3.org/Protocols/rfc2616/rfc2616-sec6.html#sec6">HTTP/1.1 Response format</a></li>
<li>The source code of this example is available <a href="https://github.com/lgiordani/dissecting-a-web-stack-code/tree/master/1_sockets_and_parsers">here</a></li>
</ul>
<h3 id="14-issues">1.4 Issues<a class="headerlink" href="#14-issues" title="Permanent link">¶</a></h3>
<p>There is a certain satisfaction in building something from scratch and discovering that it works smoothly with full-fledged software like the browser you use every day. I also find it very interesting to discover that technologies like HTTP, which basically run the world nowadays, are at their core very simple.</p>
<p>That said, there are many features of HTTP that we didn't cover with our simple socket programming. For starters, HTTP/1.0 introduced other methods besides GET, such as POST, which is of paramount importance for today's websites, where users keep sending information to servers through forms. To implement all 9 HTTP methods we need to properly parse the incoming request and add the relevant functions to our code.</p>
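<p>As a rough idea of what that first step looks like, the request-line parsing of our toy server could be extended to extract the method as well. This is a sketch of mine, not part of the original server, and a real parser would also have to validate the HTTP version and reject malformed requests:</p>

```python
import re

def parse_request_line(request):
    """Extract method and resource from the first line of a raw HTTP request.

    Toy sketch: returns (None, None) on anything it does not recognise.
    """
    match = re.match(r'([A-Z]+) /(\S*) HTTP/', request)
    if match is None:
        return None, None
    return match.group(1), match.group(2)

method, resource = parse_request_line("POST /submit HTTP/1.1\r\nHost: example.com\r\n\r\n")
print(method, resource)  # POST submit
```

With the method in hand, the server could dispatch to a different function per method instead of always serving a file.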
<p>At this point, however, you might notice that we are dealing a lot with low-level details of the protocol, which is usually not the core of our business. When we build a service over HTTP we believe that we have the knowledge to properly implement some code that can simplify a certain process, be it searching for other websites, shopping for books or sharing pictures with friends. We don't want to spend our time understanding the subtleties of the TCP/IP sockets and writing parsers for request-response protocols. It is nice to see how these technologies work, but on a daily basis, we need to focus on something at a higher level.</p>
<p>The situation of our small HTTP server is possibly worsened by the fact that HTTP is a stateless protocol. The protocol doesn't provide any way to connect two successive requests, thus keeping track of the <em>state</em> of the communication, which is the cornerstone of the modern Internet. Every time we authenticate on a website and want to visit other pages we need the server to remember who we are, and this implies keeping track of the state of the connection.</p>
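<p>To give an idea of what state tracking involves, here is a toy sketch of session cookies (the names and structure are mine, not a real implementation): the server issues a session identifier through a <code>Set-Cookie</code> header and recognises the client when the identifier comes back in the <code>Cookie</code> header of later requests.</p>

```python
import uuid

# Toy session store: maps session ids to per-client state.
sessions = {}

def get_session_id(cookie_header):
    """Return the session id found in a Cookie header, or create a new one.

    In a real server the returned id would be sent back to the client
    in a 'Set-Cookie: session=<id>' response header.
    """
    for part in cookie_header.split(';'):
        name, _, value = part.strip().partition('=')
        if name == 'session' and value in sessions:
            return value
    # Unknown client: start a fresh session
    session_id = uuid.uuid4().hex
    sessions[session_id] = {}
    return session_id

first = get_session_id('')                   # new client, new session
second = get_session_id('session=' + first)  # same client recognised
assert first == second
```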
<p>Long story short: to work as a proper HTTP server, our code should at this point implement all HTTP methods and cookie management. We also need to support other protocols like Websockets. These are anything but trivial tasks, so we definitely need to add a component to the whole system that lets us focus on the business logic and not on the low-level details of application protocols.</p>
<h2 id="2-web-framework">2 Web framework<a class="headerlink" href="#2-web-framework" title="Permanent link">¶</a></h2>
<h3 id="21-rationale">2.1 Rationale<a class="headerlink" href="#21-rationale" title="Permanent link">¶</a></h3>
<p>Enter the Web framework!</p>
<p>As I discussed many times (see <a href="https://www.thedigitalcatonline.com/blog/2018/12/20/cabook/">the book on clean architectures</a> or <a href="https://www.thedigitalcatonline.com/blog/2016/11/14/clean-architectures-in-python-a-step-by-step-example/">the relative post</a>) the role of the Web framework is that of <em>converting HTTP requests into function calls</em>, and function return values into HTTP responses. The framework's true nature is that of a layer that connects a working business logic to the Web, through HTTP and related protocols. The framework takes care of session management for us and maps URLs to functions, allowing us to focus on the application logic.</p>
<p>In the grand scheme of an HTTP service, this is what the framework is supposed to do. Everything the framework provides out of this scope, like layers to access DBs, template engines, and interfaces to other systems, is an addition that you, as a programmer, may find useful, but is not in principle part of the reason why we added the framework to the system. We add the framework because it acts as a layer between our business logic and HTTP.</p>
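<p>The mechanism of "converting HTTP requests into function calls" can be sketched in a handful of lines. This toy router is my own illustration, not Flask's actual implementation, but it shows the idea behind the <code>@application.route</code> decorator used in the next section: a decorator registers functions in a table, and a dispatcher looks up the function for a given path.</p>

```python
# Toy URL router: a sketch of how a framework maps URLs to functions.
routes = {}

def route(path):
    """Register the decorated function as the handler for path."""
    def decorator(func):
        routes[path] = func
        return func
    return decorator

@route('/')
@route('/index')
def index():
    return "Hello, world!"

def dispatch(path):
    """Call the function registered for path, or return a 404 body."""
    handler = routes.get(path)
    if handler is None:
        return "404 Not Found"
    return handler()

print(dispatch('/'))         # Hello, world!
print(dispatch('/missing'))  # 404 Not Found
```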
<h3 id="22-implementation">2.2 Implementation<a class="headerlink" href="#22-implementation" title="Permanent link">¶</a></h3>
<p>Thanks to Miguel Grinberg and his <a href="https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world">amazing Flask mega-tutorial</a> I can set up Flask in seconds. I will not run through the tutorial here, as you can follow it on Miguel's website. I will only use the content of the first article (out of 23!) to create an extremely simple "Hello, world" application.</p>
<p>To run the following example you will need a virtual environment and you will have to <code>pip install flask</code>. Follow Miguel's tutorial if you need more details on this.</p>
<p>The <code>app/__init__.py</code> file is</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">flask</span> <span class="kn">import</span> <span class="n">Flask</span>
<span class="n">application</span> <span class="o">=</span> <span class="n">Flask</span><span class="p">(</span><span class="vm">__name__</span><span class="p">)</span>
<span class="kn">from</span> <span class="nn">app</span> <span class="kn">import</span> <span class="n">routes</span>
</code></pre></div>
<p>and the <code>app/routes.py</code> file is</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">app</span> <span class="kn">import</span> <span class="n">application</span>
<span class="nd">@application</span><span class="o">.</span><span class="n">route</span><span class="p">(</span><span class="s1">'/'</span><span class="p">)</span>
<span class="nd">@application</span><span class="o">.</span><span class="n">route</span><span class="p">(</span><span class="s1">'/index'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">index</span><span class="p">():</span>
<span class="k">return</span> <span class="s2">"Hello, world!"</span>
</code></pre></div>
<p>You can already see here the power of a framework in action. We defined an <code>index</code> function and connected it with two different URLs (<code>/</code> and <code>/index</code>) in 3 lines of Python. This leaves us time and energy to properly work on the business logic, which in this case is a revolutionary "Hello, world!". Nobody ever did this before.</p>
<p>Finally, the <code>service.py</code> file is</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">app</span> <span class="kn">import</span> <span class="n">application</span>
</code></pre></div>
<p>Flask comes with a so-called development web server (do these words ring a bell now?) that we can run in a terminal</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span><span class="nv">FLASK_APP</span><span class="o">=</span>service.py<span class="w"> </span>flask<span class="w"> </span>run
<span class="w"> </span>*<span class="w"> </span>Serving<span class="w"> </span>Flask<span class="w"> </span>app<span class="w"> </span><span class="s2">"service.py"</span>
<span class="w"> </span>*<span class="w"> </span>Environment:<span class="w"> </span>production
<span class="w"> </span>WARNING:<span class="w"> </span>This<span class="w"> </span>is<span class="w"> </span>a<span class="w"> </span>development<span class="w"> </span>server.<span class="w"> </span>Do<span class="w"> </span>not<span class="w"> </span>use<span class="w"> </span>it<span class="w"> </span><span class="k">in</span><span class="w"> </span>a<span class="w"> </span>production<span class="w"> </span>deployment.
<span class="w"> </span>Use<span class="w"> </span>a<span class="w"> </span>production<span class="w"> </span>WSGI<span class="w"> </span>server<span class="w"> </span>instead.
<span class="w"> </span>*<span class="w"> </span>Debug<span class="w"> </span>mode:<span class="w"> </span>off
<span class="w"> </span>*<span class="w"> </span>Running<span class="w"> </span>on<span class="w"> </span>http://127.0.0.1:5000/<span class="w"> </span><span class="o">(</span>Press<span class="w"> </span>CTRL+C<span class="w"> </span>to<span class="w"> </span>quit<span class="o">)</span>
</code></pre></div>
<p>You can now visit the given URL with your browser and see that everything works properly. Remember that 127.0.0.1 is the special IP address that refers to "this computer"; the name <code>localhost</code> is usually created by the operating system as an alias for that, so the two are interchangeable. As you can see, the default port for Flask's development server is 5000, so you have to mention it explicitly, otherwise your browser would try to access port 80 (the default HTTP one). When you connect with the browser you will see some log messages about the HTTP requests</p>
<div class="highlight"><pre><span></span><code><span class="m">127</span>.0.0.1<span class="w"> </span>-<span class="w"> </span>-<span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020<span class="w"> </span><span class="m">14</span>:54:27<span class="o">]</span><span class="w"> </span><span class="s2">"GET / HTTP/1.1"</span><span class="w"> </span><span class="m">200</span><span class="w"> </span>-
<span class="m">127</span>.0.0.1<span class="w"> </span>-<span class="w"> </span>-<span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020<span class="w"> </span><span class="m">14</span>:54:28<span class="o">]</span><span class="w"> </span><span class="s2">"GET /favicon.ico HTTP/1.1"</span><span class="w"> </span><span class="m">404</span><span class="w"> </span>-
</code></pre></div>
<p>You can recognise both now, as these are the same requests we got with our little server in the previous part of the article.</p>
<h3 id="23-resources">2.3 Resources<a class="headerlink" href="#23-resources" title="Permanent link">¶</a></h3>
<p>These resources provide more detailed information on the topics discussed in this section</p>
<ul>
<li><a href="https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world">Miguel Grinberg's amazing Flask mega-tutorial</a></li>
<li><a href="https://en.wikipedia.org/wiki/Localhost">What is localhost</a></li>
<li>The source code of this example is available <a href="https://github.com/lgiordani/dissecting-a-web-stack-code/tree/master/2_web_framework">here</a></li>
</ul>
<h3 id="24-issues">2.4 Issues<a class="headerlink" href="#24-issues" title="Permanent link">¶</a></h3>
<p>Apparently, we solved all our problems, and many programmers just stop here. They learn how to use the framework (which is a big achievement!), but as we will shortly discover, this is not enough for a production system. Let's have a closer look at the output of the Flask server. It clearly says, among other things</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span>WARNING:<span class="w"> </span>This<span class="w"> </span>is<span class="w"> </span>a<span class="w"> </span>development<span class="w"> </span>server.<span class="w"> </span>Do<span class="w"> </span>not<span class="w"> </span>use<span class="w"> </span>it<span class="w"> </span><span class="k">in</span><span class="w"> </span>a<span class="w"> </span>production<span class="w"> </span>deployment.
<span class="w"> </span>Use<span class="w"> </span>a<span class="w"> </span>production<span class="w"> </span>WSGI<span class="w"> </span>server<span class="w"> </span>instead.
</code></pre></div>
<p>The main issue we have to face in any production system is performance. Think about what we do with JavaScript when we minimise the code: we consciously obfuscate the code to make the file smaller, for the sole purpose of making it faster to retrieve.</p>
<p>For HTTP servers the story is not very different. The Web framework usually provides a development Web server, as Flask does, which properly implements HTTP but does so in a very inefficient way. For starters, this is a <em>blocking</em> server, which means that if a request takes seconds to be served (for example because the endpoint retrieves data from a very slow database), any other request will have to wait in a queue. That ultimately means that the user will see a spinner in the browser's tab and just shake their head thinking that we can't build a modern website. Other performance concerns might be connected with memory management or disk caches, but in general, it is safe to say that this web server cannot handle any production load (i.e. multiple users accessing the website at the same time and expecting a good quality of service).</p>
<p>This is hardly surprising. After all, we didn't want to deal with TCP/IP connections so that we could focus on our business, so we delegated this to other coders who maintain the framework. The framework's authors, in turn, want to focus on things like middleware, routes, and proper handling of HTTP methods. They don't want to spend time trying to optimise the performance of the "multi-user" experience. This is especially true in the Python world (and somewhat less true for Node.js, for example): Python is not heavily concurrency-oriented, and neither the style of programming nor the performance of the language favours fast, non-blocking applications. This is changing lately, with async support and improvements in the interpreter, but I leave this for another post.</p>
<p>So, now that we have a full-fledged HTTP service, we need to make it so fast that users won't even notice this is not running locally on their computer.</p>
<h2 id="3-concurrency-and-facades">3 Concurrency and façades<a class="headerlink" href="#3-concurrency-and-facades" title="Permanent link">¶</a></h2>
<h3 id="31-rationale">3.1 Rationale<a class="headerlink" href="#31-rationale" title="Permanent link">¶</a></h3>
<p>Well, whenever you have performance issues, just go for concurrency. Now you have many problems!
(see <a href="https://twitter.com/davidlohr/status/288786300067270656?lang=en">here</a>)</p>
<p>Yes, concurrency solves many problems, and it is the source of just as many, so we need to find a way to use it in the safest and least complicated way. We basically want to add a layer that runs the framework in some concurrent way, without requiring us to change anything in the framework itself.</p>
<p>And whenever you have to homogenise different things, just create a layer of indirection. This solves any problem but one. (see <a href="https://en.wikipedia.org/wiki/Fundamental_theorem_of_software_engineering">here</a>)</p>
<p>So we need to create a layer that runs our service in a concurrent way, but we also want to keep it detached from the specific implementation of the service, that is, independent of the framework or library that we are using.</p>
<h3 id="32-implementation">3.2 Implementation<a class="headerlink" href="#32-implementation" title="Permanent link">¶</a></h3>
<p>In this case, the solution is to give a <em>specification</em> of the API that web frameworks have to expose in order to be usable by independent third-party components. In the Python world, this set of rules has been named WSGI, the Web Server Gateway Interface, but such interfaces exist for other languages such as Java or Ruby. The "gateway" mentioned here is the part of the system outside the framework, which in this discussion is the part that deals with production performance. Through WSGI we define a way for frameworks to expose a common interface, leaving people interested in concurrency free to implement something independently.</p>
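<p>The WSGI contract itself is small: the application side must expose a callable that receives a dictionary describing the request and a <code>start_response</code> function, and returns an iterable of bytes. A minimal application, written by hand without any framework, looks like this:</p>

```python
def application(environ, start_response):
    """A minimal WSGI application.

    environ is a dict describing the request; start_response sets the
    status line and the headers of the response.
    """
    body = "Hello from {}".format(environ.get('PATH_INFO', '/')).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]
```

This callable can be served with the standard library's <code>wsgiref.simple_server</code> for local experiments, or handed to a production server such as Gunicorn, which is exactly the decoupling WSGI was designed for.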
<p>If the framework is compatible with the gateway interface, we can add software that deals with concurrency and uses the framework through the compatibility layer. Such a component is a production-ready HTTP server, and two common choices in the Python world are Gunicorn and uWSGI.</p>
<p>A production-ready HTTP server is software that understands HTTP as the development server already did, but at the same time pushes performance in order to sustain a bigger workload, and as we said before this is done through concurrency.</p>
<p>Flask is compatible with WSGI, so we can make it work with Gunicorn. To install it in our virtual environment run <code>pip install gunicorn</code>, then set it up by creating a file named <code>wsgi.py</code> with the following content</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">app</span> <span class="kn">import</span> <span class="n">application</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">"__main__"</span><span class="p">:</span>
<span class="n">application</span><span class="o">.</span><span class="n">run</span><span class="p">()</span>
</code></pre></div>
<p>To run Gunicorn specify the number of concurrent instances and the external port</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>gunicorn<span class="w"> </span>--workers<span class="w"> </span><span class="m">3</span><span class="w"> </span>--bind<span class="w"> </span><span class="m">0</span>.0.0.0:8000<span class="w"> </span>wsgi
<span class="o">[</span><span class="m">2020</span>-02-12<span class="w"> </span><span class="m">18</span>:39:07<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">13393</span><span class="o">]</span><span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Starting<span class="w"> </span>gunicorn<span class="w"> </span><span class="m">20</span>.0.4
<span class="o">[</span><span class="m">2020</span>-02-12<span class="w"> </span><span class="m">18</span>:39:07<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">13393</span><span class="o">]</span><span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Listening<span class="w"> </span>at:<span class="w"> </span>http://0.0.0.0:8000<span class="w"> </span><span class="o">(</span><span class="m">13393</span><span class="o">)</span>
<span class="o">[</span><span class="m">2020</span>-02-12<span class="w"> </span><span class="m">18</span>:39:07<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">13393</span><span class="o">]</span><span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Using<span class="w"> </span>worker:<span class="w"> </span>sync
<span class="o">[</span><span class="m">2020</span>-02-12<span class="w"> </span><span class="m">18</span>:39:07<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">13396</span><span class="o">]</span><span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Booting<span class="w"> </span>worker<span class="w"> </span>with<span class="w"> </span>pid:<span class="w"> </span><span class="m">13396</span>
<span class="o">[</span><span class="m">2020</span>-02-12<span class="w"> </span><span class="m">18</span>:39:07<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">13397</span><span class="o">]</span><span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Booting<span class="w"> </span>worker<span class="w"> </span>with<span class="w"> </span>pid:<span class="w"> </span><span class="m">13397</span>
<span class="o">[</span><span class="m">2020</span>-02-12<span class="w"> </span><span class="m">18</span>:39:07<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">13398</span><span class="o">]</span><span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Booting<span class="w"> </span>worker<span class="w"> </span>with<span class="w"> </span>pid:<span class="w"> </span><span class="m">13398</span>
</code></pre></div>
<p>As you can see, Gunicorn has the concept of <em>workers</em>, which are a generic way to express concurrency. Specifically, Gunicorn implements a pre-fork worker model, which means that it (pre)creates a different Unix process for each worker. You can check this by running <code>ps</code></p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>ps<span class="w"> </span>ax<span class="w"> </span><span class="p">|</span><span class="w"> </span>grep<span class="w"> </span>gunicorn
<span class="m">14919</span><span class="w"> </span>pts/1<span class="w"> </span>S+<span class="w"> </span><span class="m">0</span>:00<span class="w"> </span>~/venv3/bin/python3<span class="w"> </span>~/venv3/bin/gunicorn<span class="w"> </span>--workers<span class="w"> </span><span class="m">3</span><span class="w"> </span>--bind<span class="w"> </span><span class="m">0</span>.0.0.0:8000<span class="w"> </span>wsgi
<span class="m">14922</span><span class="w"> </span>pts/1<span class="w"> </span>S+<span class="w"> </span><span class="m">0</span>:00<span class="w"> </span>~/venv3/bin/python3<span class="w"> </span>~/venv3/bin/gunicorn<span class="w"> </span>--workers<span class="w"> </span><span class="m">3</span><span class="w"> </span>--bind<span class="w"> </span><span class="m">0</span>.0.0.0:8000<span class="w"> </span>wsgi
<span class="m">14923</span><span class="w"> </span>pts/1<span class="w"> </span>S+<span class="w"> </span><span class="m">0</span>:00<span class="w"> </span>~/venv3/bin/python3<span class="w"> </span>~/venv3/bin/gunicorn<span class="w"> </span>--workers<span class="w"> </span><span class="m">3</span><span class="w"> </span>--bind<span class="w"> </span><span class="m">0</span>.0.0.0:8000<span class="w"> </span>wsgi
<span class="m">14924</span><span class="w"> </span>pts/1<span class="w"> </span>S+<span class="w"> </span><span class="m">0</span>:00<span class="w"> </span>~/venv3/bin/python3<span class="w"> </span>~/venv3/bin/gunicorn<span class="w"> </span>--workers<span class="w"> </span><span class="m">3</span><span class="w"> </span>--bind<span class="w"> </span><span class="m">0</span>.0.0.0:8000<span class="w"> </span>wsgi
</code></pre></div>
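<p>The pre-fork idea itself fits in a few lines of Unix system programming. This is a toy sketch of mine, not Gunicorn's code: the parent forks a fixed number of children and then waits for them; in a real pre-fork server each child would inherit the listening socket and call <code>accept()</code> in a loop.</p>

```python
import os

def spawn_workers(n):
    """Fork n worker processes and wait for them to exit (Unix only)."""
    pids = []
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            # Child process: a real worker would accept() connections here
            os._exit(0)
        pids.append(pid)
    # Parent process: reap the children
    for pid in pids:
        os.waitpid(pid, 0)
    return pids

workers = spawn_workers(3)
assert len(set(workers)) == 3  # three distinct worker processes
```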
<p>Using processes is just one of the two ways to implement concurrency in a Unix system, the other being threads. The merits and demerits of each solution are outside the scope of this post, however. For the time being, just remember that you are dealing with multiple workers that process incoming requests concurrently, thus implementing a non-blocking server, ready to accept multiple connections.</p>
<h3 id="33-resources">3.3 Resources<a class="headerlink" href="#33-resources" title="Permanent link">¶</a></h3>
<p>These resources provide more detailed information on the topics discussed in this section</p>
<ul>
<li>The <a href="https://wsgi.readthedocs.io/en/latest/index.html">WSGI official documentation</a> and the <a href="https://en.wikipedia.org/wiki/Web_Server_Gateway_Interface">Wikipedia page
</a></li>
<li>The homepages of <a href="https://gunicorn.org/">Gunicorn</a> and <a href="https://uwsgi-docs.readthedocs.io/en/latest/">uWSGI</a></li>
<li>A good entry point for your journey into the crazy world of concurrency: <a href="https://en.wikipedia.org/wiki/Multithreading_(computer_architecture)">multithreading</a>.</li>
<li>The source code of this example is available <a href="https://github.com/lgiordani/dissecting-a-web-stack-code/tree/master/3_concurrency_and_facades">here</a></li>
</ul>
<h3 id="34-issues">3.4 Issues<a class="headerlink" href="#34-issues" title="Permanent link">¶</a></h3>
<p>With Gunicorn we now have a production-ready HTTP server, and apparently we have implemented everything we need. There are still many considerations and missing pieces, though.</p>
<h4 id="performances-again">Performance (again)<a class="headerlink" href="#performances-again" title="Permanent link">¶</a></h4>
<p>Are 3 workers enough to sustain the load of our new killer mobile application? We expect thousands of visitors per minute, so maybe we should add some. But while we increase the number of workers, we have to keep in mind that the machine we are using has a finite amount of CPU power and memory. So, once again, we have to focus on performance, and in particular on scalability: how can we keep adding workers without having to stop the application, replace the machine with a more powerful one, and restart the service?</p>
<h4 id="embrace-change">Embrace change<a class="headerlink" href="#embrace-change" title="Permanent link">¶</a></h4>
<p>This is not the only problem we have to face in production. An important aspect of technology is that it changes over time, as new and (hopefully) better solutions become widespread. We usually design systems dividing them as much as possible into communicating layers exactly because we want to be free to replace a layer with something else, be it a simpler component or a more advanced one, one with better performances or maybe just a cheaper one. So, once again, we want to be able to evolve the underlying system keeping the same interface, exactly as we did in the case of web frameworks.</p>
<h4 id="https_1">HTTPS<a class="headerlink" href="#https_1" title="Permanent link">¶</a></h4>
<p>Another missing part of the system is HTTPS. Gunicorn and uWSGI do not understand the HTTPS protocol, so we need something in front of them that will deal with the "S" part of the protocol, leaving the "HTTP" part to the internal layers.</p>
<h4 id="load-balancers">Load balancers<a class="headerlink" href="#load-balancers" title="Permanent link">¶</a></h4>
<p>In general, a <em>load balancer</em> is just a component in a system that distributes work among a pool of workers. Gunicorn is already distributing load among its workers, so this is not a new concept, but we generally want to do it on a bigger level, among machines or among entire systems. Load balancing can be hierarchical and be structured on many levels. We can also assign more importance to some components of the system, flagging them as ready to accept more load (for example because their hardware is better). Load balancers are extremely important in network services, and the definition of load can be extremely different from system to system: generally speaking, in a Web service the number of connections is the standard measure of the load, as we assume that on average all connections bring the same amount of work to the system.</p>
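<p>The core idea can be modelled in a few lines of Python: a balancer that hands each incoming request to the next worker in the pool. This is a toy sketch of the concept, not how Gunicorn or nginx actually implement it:</p>

```python
import itertools


class RoundRobinBalancer:
    """Distribute requests evenly across a pool of workers."""

    def __init__(self, workers):
        # cycle() yields the workers in order, restarting at the end
        self._pool = itertools.cycle(workers)

    def route(self, request):
        # assign the request to the next worker in the rotation
        return next(self._pool), request


balancer = RoundRobinBalancer(["app1:8000", "app2:8000"])
print([balancer.route(i)[0] for i in range(4)])
# -> ['app1:8000', 'app2:8000', 'app1:8000', 'app2:8000']
```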
<h4 id="reverse-proxies">Reverse proxies<a class="headerlink" href="#reverse-proxies" title="Permanent link">¶</a></h4>
<p>Load balancers are forward proxies, as they allow a client to contact any server in a pool. At the same time, a <em>reverse proxy</em> allows a client to retrieve data produced by several systems through the same entry point. Reverse proxies are a perfect way to route HTTP requests to sub-systems that can be implemented with different technologies. For example, you might want to have part of the system implemented with Python, using Django and Postgres, and another part served by an AWS Lambda function written in Go and connected with a non-relational database such as DynamoDB. Usually, in HTTP services this choice is made according to the URL (for example routing every URL that begins with <code>/api/</code>).</p>
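<p>URL-based routing can be sketched as a longest-prefix lookup; the backend names below are invented for illustration:</p>

```python
def pick_backend(path, routes):
    # choose the backend with the longest matching URL prefix,
    # mirroring how a reverse proxy dispatches requests to sub-systems
    matches = [prefix for prefix in routes if path.startswith(prefix)]
    if not matches:
        raise LookupError("no backend for " + path)
    return routes[max(matches, key=len)]


routes = {
    "/": "django-backend:8000",
    "/api/": "go-lambda",
}
print(pick_backend("/api/users", routes))  # -> go-lambda
print(pick_backend("/home", routes))       # -> django-backend:8000
```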
<h4 id="logic">Logic<a class="headerlink" href="#logic" title="Permanent link">¶</a></h4>
<p>We also want a layer that can implement a certain amount of logic, to manage simple rules that are not related to the service we implemented. A typical example is that of HTTP redirections: what happens if a user accesses the service with an <code>http://</code> prefix instead of <code>https://</code>? The correct way to deal with this is through an HTTP 301 code, but you don't want such a request to reach your framework, wasting resources for such a simple task.</p>
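<p>As a sketch, the redirection rule can be expressed on top of the WSGI <code>environ</code> dictionary (the function name is mine; in production this check lives in the web server, not in the framework):</p>

```python
def redirect_to_https(environ):
    # return a 301 response for plain-HTTP requests, None otherwise
    if environ.get("wsgi.url_scheme") != "http":
        return None
    location = "https://" + environ["HTTP_HOST"] + environ.get("PATH_INFO", "/")
    return "301 Moved Permanently", [("Location", location)]


status, headers = redirect_to_https(
    {"wsgi.url_scheme": "http", "HTTP_HOST": "example.com", "PATH_INFO": "/page"}
)
print(status)                     # -> 301 Moved Permanently
print(dict(headers)["Location"])  # -> https://example.com/page
```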
<h2 id="4-the-web-server">4 The Web server<a class="headerlink" href="#4-the-web-server" title="Permanent link">¶</a></h2>
<h3 id="41-rationale">4.1 Rationale<a class="headerlink" href="#41-rationale" title="Permanent link">¶</a></h3>
<p>The general label of <em>Web server</em> is given to software that performs the tasks we discussed. Two very common choices for this part of the system are nginx and Apache, two open source projects that are currently leading the market. With different technical approaches, they both implement all the features we discussed in the previous section (and many more).</p>
<h3 id="42-implementation">4.2 Implementation<a class="headerlink" href="#42-implementation" title="Permanent link">¶</a></h3>
<p>To test nginx without having to fight with the OS and install too many packages we can use Docker. Docker is useful to simulate a multi-machine environment, but it might also be your technology of choice for the actual production environment (AWS ECS works with Docker containers, for example).</p>
<p>The base configuration that we will run is very simple. One container will contain the Flask code and run the framework with Gunicorn, while the other container will run nginx. Gunicorn will serve HTTP on the internal port 8000, not exposed by Docker and thus not reachable from our browser, while nginx will expose port 80, the traditional HTTP port.</p>
<p>In the same directory as the file <code>wsgi.py</code>, create a <code>Dockerfile</code></p>
<div class="highlight"><pre><span></span><code><span class="k">FROM</span><span class="w"> </span><span class="s">python:3.6</span>
<span class="k">ADD</span><span class="w"> </span>app<span class="w"> </span>/app
<span class="k">ADD</span><span class="w"> </span>wsgi.py<span class="w"> </span>/
<span class="k">WORKDIR</span><span class="w"> </span><span class="s">.</span>
<span class="k">RUN</span><span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span>flask<span class="w"> </span>gunicorn
<span class="k">EXPOSE</span><span class="w"> </span><span class="s">8000</span>
</code></pre></div>
<p>This starts from a Python Docker image, adds the <code>app</code> directory and the <code>wsgi.py</code> file, and installs Flask and Gunicorn. Now create a configuration for nginx in a file called <code>nginx.conf</code> in the same directory</p>
<div class="highlight"><pre><span></span><code><span class="k">server</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kn">listen</span><span class="w"> </span><span class="mi">80</span><span class="p">;</span>
<span class="w"> </span><span class="kn">server_name</span><span class="w"> </span><span class="s">localhost</span><span class="p">;</span>
<span class="w"> </span><span class="kn">location</span><span class="w"> </span><span class="s">/</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kn">proxy_pass</span><span class="w"> </span><span class="s">http://application:8000/</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>This defines a server that listens on port 80 and forwards all the URLs starting with <code>/</code> to a server called <code>application</code> on port 8000, which is the container running Gunicorn.</p>
<p>Last, create a file <code>docker-compose.yml</code> that will describe the configuration of the containers.</p>
<div class="highlight"><pre><span></span><code>version: "3.7"
services:
  application:
    build:
      context: .
      dockerfile: Dockerfile
    command: gunicorn --workers 3 --bind 0.0.0.0:8000 wsgi
    expose:
      - 8000
  nginx:
    image: nginx
    volumes:
      - ./nginx.conf:/etc/nginx/conf.d/default.conf
    ports:
      - 8080:80
    depends_on:
      - application
</code></pre></div>
<p>As you can see the name <code>application</code> that we mentioned in the nginx configuration file is not a magic string, but is the name we assigned to the Gunicorn container in the Docker Compose configuration. Please note that nginx listens on port 80 inside the container, but the port is published as 8080 on the host.</p>
<p>To create this infrastructure we need to install Docker Compose in our virtual environment through <code>pip install docker-compose</code>. I also created a file named <code>.env</code> with the name of the project</p>
<div class="highlight"><pre><span></span><code><span class="nv">COMPOSE_PROJECT_NAME</span><span class="o">=</span>service
</code></pre></div>
<p>At this point you can run Docker Compose with <code>docker-compose up -d</code></p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>docker-compose<span class="w"> </span>up<span class="w"> </span>-d
Creating<span class="w"> </span>network<span class="w"> </span><span class="s2">"service_default"</span><span class="w"> </span>with<span class="w"> </span>the<span class="w"> </span>default<span class="w"> </span>driver
Creating<span class="w"> </span>service_application_1<span class="w"> </span>...<span class="w"> </span><span class="k">done</span>
Creating<span class="w"> </span>service_nginx_1<span class="w"> </span>...<span class="w"> </span><span class="k">done</span>
</code></pre></div>
<p>If everything is working correctly, opening the browser and visiting <code>localhost:8080</code> should show you the HTML page Flask is serving.</p>
<p>Through <code>docker-compose logs</code> we can check what the services are doing. We can recognise the output of Gunicorn in the logs of the service named <code>application</code></p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>docker-compose<span class="w"> </span>logs<span class="w"> </span>application
Attaching<span class="w"> </span>to<span class="w"> </span>service_application_1
application_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">2020</span>-02-14<span class="w"> </span><span class="m">08</span>:35:42<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">1</span><span class="o">]</span><span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Starting<span class="w"> </span>gunicorn<span class="w"> </span><span class="m">20</span>.0.4
application_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">2020</span>-02-14<span class="w"> </span><span class="m">08</span>:35:42<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">1</span><span class="o">]</span><span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Listening<span class="w"> </span>at:<span class="w"> </span>http://0.0.0.0:8000<span class="w"> </span><span class="o">(</span><span class="m">1</span><span class="o">)</span>
application_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">2020</span>-02-14<span class="w"> </span><span class="m">08</span>:35:42<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">1</span><span class="o">]</span><span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Using<span class="w"> </span>worker:<span class="w"> </span>sync
application_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">2020</span>-02-14<span class="w"> </span><span class="m">08</span>:35:42<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">8</span><span class="o">]</span><span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Booting<span class="w"> </span>worker<span class="w"> </span>with<span class="w"> </span>pid:<span class="w"> </span><span class="m">8</span>
application_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">2020</span>-02-14<span class="w"> </span><span class="m">08</span>:35:42<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">9</span><span class="o">]</span><span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Booting<span class="w"> </span>worker<span class="w"> </span>with<span class="w"> </span>pid:<span class="w"> </span><span class="m">9</span>
application_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">2020</span>-02-14<span class="w"> </span><span class="m">08</span>:35:42<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">10</span><span class="o">]</span><span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Booting<span class="w"> </span>worker<span class="w"> </span>with<span class="w"> </span>pid:<span class="w"> </span><span class="m">10</span>
</code></pre></div>
<p>but the one we are mostly interested in now is the service named <code>nginx</code>, so let's follow the logs in real-time with <code>docker-compose logs -f nginx</code>. Refresh the <code>localhost:8080</code> page you visited with the browser, and the container should output something like</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>docker-compose<span class="w"> </span>logs<span class="w"> </span>-f<span class="w"> </span>nginx
Attaching<span class="w"> </span>to<span class="w"> </span>service_nginx_1
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">192</span>.168.192.1<span class="w"> </span>-<span class="w"> </span>-<span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:08:42:20<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span><span class="s2">"GET / HTTP/1.1"</span><span class="w"> </span><span class="m">200</span><span class="w"> </span><span class="m">13</span><span class="w"> </span><span class="s2">"-"</span><span class="w"> </span><span class="s2">"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0"</span><span class="w"> </span><span class="s2">"-"</span>
</code></pre></div>
<p>which is the standard log format of nginx. It shows the IP address of the client (<code>192.168.192.1</code>), the connection timestamp, the HTTP request and the response status code (200), plus other information on the client itself.</p>
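<p>If you ever need to post-process such logs, the fields can be extracted with a regular expression. A sketch for the default "combined" format shown above:</p>

```python
import re

# nginx "combined" format: address, user fields, timestamp, request,
# status code, and body size (referrer and user agent follow)
LOG_RE = re.compile(
    r'(?P<addr>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)

line = ('192.168.192.1 - - [14/Feb/2020:08:42:20 +0000] '
        '"GET / HTTP/1.1" 200 13 "-" "Mozilla/5.0" "-"')
fields = LOG_RE.match(line).groupdict()
print(fields["addr"], fields["request"], fields["status"])
# -> 192.168.192.1 GET / HTTP/1.1 200
```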
<p>Let's now increase the number of application containers, to see the load balancing mechanism in action. To do this, we first need to change the log format of nginx to show the IP address of the machine that served the request. Change the <code>nginx.conf</code> file adding the <code>log_format</code> and <code>access_log</code> options</p>
<div class="highlight"><pre><span></span><code><span class="k">log_format</span><span class="w"> </span><span class="s">upstreamlog</span><span class="w"> </span><span class="s">'[</span><span class="nv">$time_local]</span><span class="w"> </span><span class="nv">$host</span><span class="w"> </span><span class="s">to:</span><span class="w"> </span><span class="nv">$upstream_addr:</span><span class="w"> </span><span class="nv">$request</span><span class="w"> </span><span class="nv">$status'</span><span class="p">;</span>
<span class="k">server</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kn">listen</span><span class="w"> </span><span class="mi">80</span><span class="p">;</span>
<span class="w"> </span><span class="kn">server_name</span><span class="w"> </span><span class="s">localhost</span><span class="p">;</span>
<span class="w"> </span><span class="kn">location</span><span class="w"> </span><span class="s">/</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kn">proxy_pass</span><span class="w"> </span><span class="s">http://application:8000</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kn">access_log</span><span class="w"> </span><span class="s">/var/log/nginx/access.log</span><span class="w"> </span><span class="s">upstreamlog</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>
<p>The <code>$upstream_addr</code> variable is the one that contains the IP address of the server proxied by nginx. Now run <code>docker-compose down</code> to stop all containers and then <code>docker-compose up -d --scale application=3</code> to start them again</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>docker-compose<span class="w"> </span>down
Stopping<span class="w"> </span>service_nginx_1<span class="w"> </span>...<span class="w"> </span><span class="k">done</span>
Stopping<span class="w"> </span>service_application_1<span class="w"> </span>...<span class="w"> </span><span class="k">done</span>
Removing<span class="w"> </span>service_nginx_1<span class="w"> </span>...<span class="w"> </span><span class="k">done</span>
Removing<span class="w"> </span>service_application_1<span class="w"> </span>...<span class="w"> </span><span class="k">done</span>
Removing<span class="w"> </span>network<span class="w"> </span>service_default
$<span class="w"> </span>docker-compose<span class="w"> </span>up<span class="w"> </span>-d<span class="w"> </span>--scale<span class="w"> </span><span class="nv">application</span><span class="o">=</span><span class="m">3</span>
Creating<span class="w"> </span>network<span class="w"> </span><span class="s2">"service_default"</span><span class="w"> </span>with<span class="w"> </span>the<span class="w"> </span>default<span class="w"> </span>driver
Creating<span class="w"> </span>service_application_1<span class="w"> </span>...<span class="w"> </span><span class="k">done</span>
Creating<span class="w"> </span>service_application_2<span class="w"> </span>...<span class="w"> </span><span class="k">done</span>
Creating<span class="w"> </span>service_application_3<span class="w"> </span>...<span class="w"> </span><span class="k">done</span>
Creating<span class="w"> </span>service_nginx_1<span class="w"> </span>...<span class="w"> </span><span class="k">done</span>
</code></pre></div>
<p>As you can see, Docker Compose now runs 3 containers for the <code>application</code> service. If you open the logs stream and visit the page in the browser you will now see a slightly different output</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>docker-compose<span class="w"> </span>logs<span class="w"> </span>-f<span class="w"> </span>nginx
Attaching<span class="w"> </span>to<span class="w"> </span>service_nginx_1
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:09:00:16<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">192</span>.168.240.4:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
</code></pre></div>
<p>where you can spot <code>to: 192.168.240.4:8000</code> which is the IP address of one of the application containers. Please note that the IP address you see might be different, as it depends on the Docker network settings. If you now visit the page again multiple times you should notice a change in the upstream address, something like</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>docker-compose<span class="w"> </span>logs<span class="w"> </span>-f<span class="w"> </span>nginx
Attaching<span class="w"> </span>to<span class="w"> </span>service_nginx_1
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:09:00:16<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">192</span>.168.240.4:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:09:00:17<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">192</span>.168.240.2:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:09:00:17<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">192</span>.168.240.3:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:09:00:17<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">192</span>.168.240.4:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:09:00:17<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">192</span>.168.240.2:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
</code></pre></div>
<p>This shows that nginx is performing load balancing, but to tell the truth this is happening through Docker's DNS, and not through an explicit action performed by the web server. We can verify this by accessing the nginx container and running <code>dig application</code> (you need to run <code>apt update</code> and <code>apt install dnsutils</code> to install <code>dig</code>)</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>docker-compose<span class="w"> </span><span class="nb">exec</span><span class="w"> </span>nginx<span class="w"> </span>/bin/bash
root@99c2f348140e:/#<span class="w"> </span>apt<span class="w"> </span>update
root@99c2f348140e:/#<span class="w"> </span>apt<span class="w"> </span>install<span class="w"> </span>-y<span class="w"> </span>dnsutils
root@99c2f348140e:/#<span class="w"> </span>dig<span class="w"> </span>application
<span class="p">;</span><span class="w"> </span><<>><span class="w"> </span>DiG<span class="w"> </span><span class="m">9</span>.11.5-P4-5.1-Debian<span class="w"> </span><<>><span class="w"> </span>application
<span class="p">;;</span><span class="w"> </span>global<span class="w"> </span>options:<span class="w"> </span>+cmd
<span class="p">;;</span><span class="w"> </span>Got<span class="w"> </span>answer:
<span class="p">;;</span><span class="w"> </span>->>HEADER<<-<span class="w"> </span>opcode:<span class="w"> </span>QUERY,<span class="w"> </span>status:<span class="w"> </span>NOERROR,<span class="w"> </span>id:<span class="w"> </span><span class="m">7221</span>
<span class="p">;;</span><span class="w"> </span>flags:<span class="w"> </span>qr<span class="w"> </span>rd<span class="w"> </span>ra<span class="p">;</span><span class="w"> </span>QUERY:<span class="w"> </span><span class="m">1</span>,<span class="w"> </span>ANSWER:<span class="w"> </span><span class="m">3</span>,<span class="w"> </span>AUTHORITY:<span class="w"> </span><span class="m">0</span>,<span class="w"> </span>ADDITIONAL:<span class="w"> </span><span class="m">0</span>
<span class="p">;;</span><span class="w"> </span>QUESTION<span class="w"> </span>SECTION:
<span class="p">;</span>application.<span class="w"> </span>IN<span class="w"> </span>A
<span class="p">;;</span><span class="w"> </span>ANSWER<span class="w"> </span>SECTION:
application.<span class="w"> </span><span class="m">600</span><span class="w"> </span>IN<span class="w"> </span>A<span class="w"> </span><span class="m">192</span>.168.240.2
application.<span class="w"> </span><span class="m">600</span><span class="w"> </span>IN<span class="w"> </span>A<span class="w"> </span><span class="m">192</span>.168.240.4
application.<span class="w"> </span><span class="m">600</span><span class="w"> </span>IN<span class="w"> </span>A<span class="w"> </span><span class="m">192</span>.168.240.3
<span class="p">;;</span><span class="w"> </span>Query<span class="w"> </span>time:<span class="w"> </span><span class="m">1</span><span class="w"> </span>msec
<span class="p">;;</span><span class="w"> </span>SERVER:<span class="w"> </span><span class="m">127</span>.0.0.11#53<span class="o">(</span><span class="m">127</span>.0.0.11<span class="o">)</span>
<span class="p">;;</span><span class="w"> </span>WHEN:<span class="w"> </span>Fri<span class="w"> </span>Feb<span class="w"> </span><span class="m">14</span><span class="w"> </span><span class="m">09</span>:57:24<span class="w"> </span>UTC<span class="w"> </span><span class="m">2020</span>
<span class="p">;;</span><span class="w"> </span>MSG<span class="w"> </span>SIZE<span class="w"> </span>rcvd:<span class="w"> </span><span class="m">110</span>
</code></pre></div>
<p>To see load balancing performed by nginx we can explicitly define two services and assign them different weights. Run <code>docker-compose down</code> and change the nginx configuration to</p>
<div class="highlight"><pre><span></span><code><span class="k">upstream</span><span class="w"> </span><span class="s">app</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kn">server</span><span class="w"> </span><span class="n">application1</span><span class="p">:</span><span class="mi">8000</span><span class="w"> </span><span class="s">weight=3</span><span class="p">;</span>
<span class="w"> </span><span class="kn">server</span><span class="w"> </span><span class="n">application2</span><span class="p">:</span><span class="mi">8000</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">log_format</span><span class="w"> </span><span class="s">upstreamlog</span><span class="w"> </span><span class="s">'[</span><span class="nv">$time_local]</span><span class="w"> </span><span class="nv">$host</span><span class="w"> </span><span class="s">to:</span><span class="w"> </span><span class="nv">$upstream_addr:</span><span class="w"> </span><span class="nv">$request</span><span class="w"> </span><span class="nv">$status'</span><span class="p">;</span>
<span class="k">server</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kn">listen</span><span class="w"> </span><span class="mi">80</span><span class="p">;</span>
<span class="w"> </span><span class="kn">server_name</span><span class="w"> </span><span class="s">localhost</span><span class="p">;</span>
<span class="w"> </span><span class="kn">location</span><span class="w"> </span><span class="s">/</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kn">proxy_pass</span><span class="w"> </span><span class="s">http://app</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kn">access_log</span><span class="w"> </span><span class="s">/var/log/nginx/access.log</span><span class="w"> </span><span class="s">upstreamlog</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>
<p>We defined here an <code>upstream</code> structure that lists two different services, <code>application1</code> and <code>application2</code>, giving the first one a weight of 3. This means that out of every 4 requests, 3 will be routed to the first service and 1 to the second one. Now nginx is not just relying on the DNS, but consciously choosing between two different services.</p>
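<p>The effect of the weights can be sketched with a simple generator that repeats each server proportionally to its weight (nginx uses a smoother scheduling algorithm, but the resulting proportions are the same):</p>

```python
def weighted_cycle(servers):
    # yield each server name as many times as its weight, forever
    while True:
        for name, weight in servers:
            for _ in range(weight):
                yield name


pool = weighted_cycle([("application1:8000", 3), ("application2:8000", 1)])
batch = [next(pool) for _ in range(4)]
print(batch.count("application1:8000"), batch.count("application2:8000"))
# -> 3 1
```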
<p>Let's define the services accordingly in the Docker Compose configuration file</p>
<div class="highlight"><pre><span></span><code>version: "3"
services:
  application1:
    build:
      context: .
      dockerfile: Dockerfile
    command: gunicorn --workers 6 --bind 0.0.0.0:8000 wsgi
    expose:
      - 8000
  application2:
    build:
      context: .
      dockerfile: Dockerfile
    command: gunicorn --workers 3 --bind 0.0.0.0:8000 wsgi
    expose:
      - 8000
  nginx:
    image: nginx
    volumes:
      - ./nginx.conf:/etc/nginx/conf.d/default.conf
    ports:
      - 80:80
    depends_on:
      - application1
      - application2
</code></pre></div>
<p>I basically duplicated the definition of <code>application</code>, but the first service now runs 6 workers, just for the sake of showing a possible difference between the two. Now run <code>docker-compose up -d</code> and <code>docker-compose logs -f nginx</code>. If you refresh the page in the browser multiple times you will see something like</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>docker-compose<span class="w"> </span>logs<span class="w"> </span>-f<span class="w"> </span>nginx
Attaching<span class="w"> </span>to<span class="w"> </span>service_nginx_1
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:11:03:25<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">172</span>.18.0.2:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:11:03:25<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">172</span>.18.0.2:8000:<span class="w"> </span>GET<span class="w"> </span>/favicon.ico<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">404</span>
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:11:03:30<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">172</span>.18.0.3:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:11:03:31<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">172</span>.18.0.2:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:11:03:32<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">172</span>.18.0.2:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:11:03:33<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">172</span>.18.0.2:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:11:03:33<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">172</span>.18.0.3:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:11:03:34<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">172</span>.18.0.2:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:11:03:34<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">172</span>.18.0.2:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:11:03:35<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">172</span>.18.0.2:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:11:03:35<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">172</span>.18.0.3:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
</code></pre></div>
<p>where you can clearly notice the load balancing between <code>172.18.0.2</code> (<code>application1</code>) and <code>172.18.0.3</code> (<code>application2</code>) in action.</p>
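<p>The alternation between the two backends is nginx's default round-robin strategy. As a purely illustrative sketch (not nginx's actual implementation), the core idea can be expressed in a few lines of Python; note that real nginx balancing also accounts for weights and connection reuse, which is why the log above does not alternate strictly.</p>

```python
from itertools import cycle

# The two application containers seen in the nginx log above.
backends = ["172.18.0.2:8000", "172.18.0.3:8000"]

# Round-robin: each request goes to the next backend in turn.
round_robin = cycle(backends)

def pick_backend():
    """Return the backend that should serve the next request."""
    return next(round_robin)

# Simulate four incoming requests.
for request_number in range(4):
    print(f"request {request_number} -> {pick_backend()}")
```
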
<p>I will not show here an example of a reverse proxy or of HTTPS, to prevent this post from becoming too long. You can find resources on those topics in the next section.</p>
<h3 id="43-resources">4.3 Resources<a class="headerlink" href="#43-resources" title="Permanent link">¶</a></h3>
<p>These resources provide more detailed information on the topics discussed in this section.</p>
<ul>
<li>Docker Compose <a href="https://docs.docker.com/compose/">official documentation</a></li>
<li>nginx <a href="http://nginx.org/en/docs/">documentation</a>: in particular the sections about <a href="http://nginx.org/en/docs/http/ngx_http_log_module.html#log_format">log_format</a> and <a href="http://nginx.org/en/docs/http/ngx_http_upstream_module.html#upstream">upstream</a> directives</li>
<li>How to <a href="https://docs.nginx.com/nginx/admin-guide/monitoring/logging/">configure logging</a> in nginx</li>
<li>How to <a href="https://docs.nginx.com/nginx/admin-guide/load-balancer/http-load-balancer/">configure load balancing</a> in nginx</li>
<li><a href="https://docs.nginx.com/nginx/admin-guide/security-controls/terminating-ssl-http/">Setting up an HTTPS Server</a> with nginx and <a href="https://www.humankode.com/ssl/create-a-selfsigned-certificate-for-nginx-in-5-minutes">how to create self-signed certificates</a></li>
<li>How to <a href="https://docs.nginx.com/nginx/admin-guide/web-server/reverse-proxy/">create a reverse proxy</a> with nginx, the documentation of the <a href="http://nginx.org/en/docs/http/ngx_http_core_module.html#location"><code>location</code></a> directive and <a href="https://www.digitalocean.com/community/tutorials/understanding-nginx-server-and-location-block-selection-algorithms">some insights</a> on the location choosing algorithms (one of the most complex parts of nginx)</li>
<li>The source code of this example is available <a href="https://github.com/lgiordani/dissecting-a-web-stack-code/tree/master/4_the_web_server">here</a></li>
</ul>
<h3 id="44-issues">4.4 Issues<a class="headerlink" href="#44-issues" title="Permanent link">¶</a></h3>
<p>Well, finally we can say that the job is done. Now we have a production-ready web server in front of our multi-threaded web framework and we can focus on writing Python code instead of dealing with HTTP headers.</p>
<p>Using a web server allows us to scale the infrastructure by simply adding new instances behind it, without interrupting the service. The concurrent HTTP server runs multiple instances of our framework, and the framework itself abstracts HTTP, mapping it to our high-level language.</p>
<h2 id="bonus-cloud-infrastructures">Bonus: cloud infrastructures<a class="headerlink" href="#bonus-cloud-infrastructures" title="Permanent link">¶</a></h2>
<p>Back in the early years of the Internet, companies used to have their own on-premises servers, and system administrators ran the whole stack directly on the bare operating system. Needless to say, this was complicated, expensive, and failure-prone.</p>
<p>Nowadays "the cloud" is the way to go, so I want to briefly mention some components that can help you run such a web stack on AWS, which is the platform I know best and, at the time of writing, the most widespread cloud provider in the world.</p>
<h3 id="elastic-beanstalk">Elastic Beanstalk<a class="headerlink" href="#elastic-beanstalk" title="Permanent link">¶</a></h3>
<p>This is the entry-level solution for simple applications: a managed infrastructure that provides load balancing, auto-scaling, and monitoring. You can use several programming languages (among which Python and Node.js) and choose between different web servers such as Apache or nginx. The components of an EB service are not hidden, but you don't have direct access to them, and you have to rely on configuration files to change the way they work. It's a good solution for simple services, but you will probably soon need more control.</p>
<p><a href="https://aws.amazon.com/elasticbeanstalk">Go to Elastic Beanstalk</a></p>
<h3 id="elastic-container-service-ecs">Elastic Container Service (ECS)<a class="headerlink" href="#elastic-container-service-ecs" title="Permanent link">¶</a></h3>
<p>With ECS you can run Docker containers, grouping them in clusters and setting up auto-scaling policies connected to metrics coming from CloudWatch. You can run them either on EC2 instances (virtual machines) managed by you or on a serverless infrastructure called Fargate. ECS will run your Docker containers, but you still have to create DNS entries and load balancers on your own. You can also run your containers on Kubernetes using EKS (Elastic Kubernetes Service).</p>
<p><a href="https://aws.amazon.com/ecs/">Go to Elastic Container Service</a></p>
<h3 id="elastic-compute-cloud-ec2">Elastic Compute Cloud (EC2)<a class="headerlink" href="#elastic-compute-cloud-ec2" title="Permanent link">¶</a></h3>
<p>This is the bare metal of AWS, where you spin up stand-alone virtual machines or auto-scaling groups of them. You can SSH into these instances and provide scripts to install and configure software. Here you can install your application, web servers, databases, whatever you want. While this used to be the way to go at the very beginning of the cloud computing age, I don't think you should go for it today. A cloud provider can give you so much in terms of associated services like logging and monitoring, and in terms of performance, that it doesn't make sense to avoid using them. EC2 is still there, anyway, and if you run ECS on top of it you need to know what you can and can't do with it.</p>
<p><a href="https://aws.amazon.com/ec2/">Go to Elastic Compute Cloud</a></p>
<h3 id="elastic-load-balancing">Elastic Load Balancing<a class="headerlink" href="#elastic-load-balancing" title="Permanent link">¶</a></h3>
<p>While Network Load Balancers (NLBs) manage pure TCP/IP connections, Application Load Balancers (ALBs) are dedicated to HTTP, and they can perform many of the services we need. They can act as reverse proxies through rules (which were recently improved) and they can terminate TLS, using certificates created in ACM (AWS Certificate Manager). As you can see, ALBs are a good replacement for a web server, even though they clearly lack the extreme configurability of dedicated software. You can, however, use them as the first layer of load balancing, keeping nginx or Apache behind them if you need some of the features those servers provide.</p>
<p><a href="https://aws.amazon.com/elasticloadbalancing/">Go to Elastic Load Balancing</a></p>
<h3 id="cloudfront">CloudFront<a class="headerlink" href="#cloudfront" title="Permanent link">¶</a></h3>
<p>CloudFront is a Content Delivery Network (CDN), that is, a geographically distributed cache that provides faster access to your content. While CDNs are not part of the stack discussed in this post, I think it is worth mentioning CloudFront, as it can speed up the delivery of any static content and also terminate TLS in connection with AWS Certificate Manager.</p>
<p><a href="https://aws.amazon.com/cloudfront/">Go to CloudFront</a></p>
<h2 id="conclusion">Conclusion<a class="headerlink" href="#conclusion" title="Permanent link">¶</a></h2>
<p>As you can see, a web stack is a pretty rich set of components, and the reason behind them is often related to performance. There are a lot of technologies that we take for granted, and that fortunately have become easier to deploy, but I still believe a full-stack engineer should be aware not only of the existence of such layers, but also of their purpose and at least their basic configuration.</p>
<h2 id="feedback">Feedback<a class="headerlink" href="#feedback" title="Permanent link">¶</a></h2>
<p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>