Highly-available, secure and scalable website hosting using AWS

31 Jan 2020

It’s been a busy few months at BIR Towers. Thankfully the team continues to grow to meet an ever increasing demand, so we figured we’d share some of our recent news and get blogging again! Today we’re pleased to introduce our new web-hosting infrastructure built on Amazon Web Services (AWS).

It’s hard to know where to begin when talking about website hosting as there are so many things to consider. So let’s start at the beginning and ask a fairly simple question: what do we expect of a webserver? The answer is almost always “guaranteed uptime”. The website should always be online.

You’d think this is fairly simple to achieve too, right? Not quite.

While technology has come on in leaps and bounds, if you want to truly deliver highly available (HA) website hosting, you need to ensure you have multiple copies of the website in different places. In principle, this means that if one server encounters a hardware or software issue, the other server(s) keep the website up.

The theory is fairly straightforward: have multiple servers (aka hosts), each with a copy of the website(s), so that if one host fails the others keep things ticking along without any downtime.

This naturally introduces the next question, how do you ensure each host has an exact replica of each website available all the time? This is where things begin to get a little tricky.

Acronyms abound in AWS: EC2, ELB, RDS and S3!

A website typically comprises two main elements: a bunch of files, such as images, PDFs, PHP files, HTML and CSS, which all sit in a folder on the server somewhere; and a database of some description, which is where all of the content, options, usernames/passwords and other related information is stored. Combine the files and a database and you end up with a website.

If we, or a client, log into a website and upload a PDF, it needs to be instantly available across all of the hosts in the cluster. It’s the same if the text on a page changes; that text is contained within a database, and the change needs to be instantly reflected across all of the available hosts.

One solution is to “sync” changes between hosts. So if I log into server 1 and make some changes, these are synced and replicated to the other servers in the cluster. The downside with this sort of architecture is the time it takes for the syncing to happen. Any lag at all means you end up with different versions of a website until the changes are replicated, which is less than ideal – especially on a busy production environment hosting hundreds of websites which are constantly being updated via the CMS by multiple people.

Thankfully, AWS provides the solutions.

The Amazon Relational Database Service (RDS) solves the database side of things very neatly. Traditionally, the webserver hosts the database AND files on the same “box”, but RDS provides a separate cloud-based database server which connects to each of our webserver hosts in parallel. In this sense, when some text is updated, or something in the database changes, it’s instantly available to all copies of the site on every host without waiting for anything to sync. Neat! It also takes some of the strain by separating database processing away from the hosts themselves – many hands make light work. It’s easy to administer, lightning fast, highly scalable, highly available and incredibly robust and durable.
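
To make that concrete, here’s a minimal sketch in Python (using PyMySQL) of what “every host talks to the same database” looks like in practice. The endpoint, credentials and query are placeholders rather than our actual configuration; WordPress does this through its own config file, but the idea is identical.

```python
# Minimal sketch: every web host connects to the same RDS endpoint
# instead of a local database, so a change written by one host is
# immediately visible to the others. Endpoint and credentials are
# placeholders, not real values.
import pymysql

RDS_ENDPOINT = "example-cluster.abc123.eu-west-1.rds.amazonaws.com"  # hypothetical endpoint

def get_connection():
    """Open a connection to the shared RDS database."""
    return pymysql.connect(
        host=RDS_ENDPOINT,
        user="webapp",          # placeholder credentials
        password="change-me",
        database="client_site",
        connect_timeout=5,
    )

# Any host in the cluster reads the same, current data:
conn = get_connection()
try:
    with conn.cursor() as cur:
        cur.execute("SELECT post_title FROM wp_posts ORDER BY post_date DESC LIMIT 1")
        print(cur.fetchone())
finally:
    conn.close()
```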

The Amazon Simple Storage Service (or S3, as it’s commonly known) solves the equivalent problem for media content such as images, PDFs, PPTs and other files. It provides a centralised “asset repository” which is instantly available to all of the connected webhosts, removing the need to wait for content to sync between them once it is uploaded.
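
As an illustration, here’s a hedged sketch using boto3 (the AWS SDK for Python) of pushing an upload into a shared bucket. The bucket name and key layout are made up for the example, not our production setup.

```python
# Minimal sketch: an upload lands in a central S3 bucket rather than on
# one host's local disk, so every web host can reference it straight away.
# Bucket name and key prefix are hypothetical.
import boto3

s3 = boto3.client("s3")
BUCKET = "bir-client-assets"  # hypothetical bucket name

def store_upload(local_path: str, client_slug: str, filename: str) -> str:
    """Push an uploaded file to the shared asset repository and return its key."""
    key = f"{client_slug}/uploads/{filename}"
    s3.upload_file(local_path, BUCKET, key)
    return key

# Example: a PDF uploaded through the CMS on any host
key = store_upload("/tmp/annual-report.pdf", "example-client", "annual-report.pdf")
print(f"Stored as s3://{BUCKET}/{key}")
```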

Both of these technologies working in unison allow us to host client websites across multiple servers, and content related changes are centralised and immediately accessible to all of the connected hosts, meaning we don’t have to wait for anything to sync or update between them.

Amazon’s Elastic Compute Cloud (known as EC2) is where we host the websites themselves (PHP, HTML, CSS files etc.). AWS is the world’s largest cloud-hosting provider and has been successfully powering our IR tools platform Polaris for some time now, so it was the obvious choice for the new website servers. There’s a myriad of packages to choose from, but having multiple EC2 instances all interconnected allows us to scale horizontally AND vertically.

For example, right now we have 3 primary boxes – a “Master” server, plus a couple of “Slave” servers. Each of these boxes has a certain amount of allocated resource in terms of raw CPU processing power, RAM and hard-disk space, all of which is completely configurable.

So, if we need more horsepower, we can scale the infrastructure vertically by simply adding more allocated resource to each existing box. Equally, we can just add more Slave boxes to the configuration and scale horizontally, too. The latter has the added benefit of increasing overall resiliency – put simply, the more boxes you have hosting and serving a copy of the site, the less chance there is of a site going offline. Right now all three of our boxes would have to go down at once for our client sites to go offline.
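
For the curious, here’s a rough sketch of what vertical scaling looks like through the AWS API (boto3). The instance ID and target type are placeholders, and in reality a resize like this happens during a planned maintenance window.

```python
# Minimal sketch of scaling a box vertically: stop it, change its
# instance type, start it again. The instance ID and target type are
# placeholders, not our real configuration.
import boto3

ec2 = boto3.client("ec2")
INSTANCE_ID = "i-0123456789abcdef0"   # hypothetical instance

def resize_instance(instance_id: str, new_type: str) -> None:
    """Give an existing EC2 instance more (or less) CPU and RAM."""
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": new_type},
    )

    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])

resize_instance(INSTANCE_ID, "m5.xlarge")
```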

Well, almost. At this point we have cloud-based data storage in RDS, cloud-based asset storage in S3, and multiple EC2 boxes sitting ‘above’ them to serve the websites themselves. But how do we manage traffic between it all?

That’s where Amazon’s Elastic Load Balancing (ELB) comes into play. Put simply, traffic arrives at the ELB, which decides which EC2 instance it is sent to for processing. It can be incredibly clever; paired with geo-location routing (something we’ll roll out as we move into new markets), it can serve content to users from the servers closest to them. At its most basic level, though, the ELB is there to connect the EC2 instances together and balance the load between them. This means we don’t end up with a Master server doing 99% of the work while the Slaves just sit there doing nothing; the load is evenly shared at all times.
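
Under the hood, “connecting the instances together” boils down to registering each box with the load balancer’s target group and letting health checks decide who receives traffic. Here’s an illustrative boto3 sketch; the ARN and instance IDs are placeholders.

```python
# Minimal sketch: attach the EC2 boxes to a load balancer's target group
# so the ELB spreads incoming requests across them. The ARN and instance
# IDs below are placeholders.
import boto3

elb = boto3.client("elbv2")

TARGET_GROUP_ARN = (
    "arn:aws:elasticloadbalancing:eu-west-1:123456789012:"
    "targetgroup/web-hosts/abc123def456"  # hypothetical target group
)
INSTANCE_IDS = [
    "i-0aaaa1111bbbb2222c",  # master (placeholder ID)
    "i-0cccc3333dddd4444e",  # slave 1 (placeholder ID)
    "i-0ffff5555aaaa6666b",  # slave 2 (placeholder ID)
]

# Register every host so traffic is shared between them
elb.register_targets(
    TargetGroupArn=TARGET_GROUP_ARN,
    Targets=[{"Id": instance_id} for instance_id in INSTANCE_IDS],
)

# Check that each host is passing its health checks
health = elb.describe_target_health(TargetGroupArn=TARGET_GROUP_ARN)
for target in health["TargetHealthDescriptions"]:
    print(target["Target"]["Id"], target["TargetHealth"]["State"])
```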

But what if the ELB goes down? Well, that’s why we have multiple ELBs! It removes the potential for a “single point of failure” and if the ELB were to encounter an issue, another can take its place.

All in all we’ve split the load across a number of processing units, each of which has failsafes and redundancy built in, meaning we can guarantee greater uptimes and faster page speeds. Instead of having one or two boxes doing all the work for the sites which reside on them, we’ve shared the load and increased performance in the process.

Security, Firewalls and DDoS Protection

We’ve talked about website security in a previous blog post or two, and hosting is naturally a major component of overall security. It’s no good dead-bolting the front door if you’re going to leave the back door unlocked, so to speak.

The main thing a webhost needs is some sort of Web Application Firewall (WAF). This is designed to catch malicious traffic and stop it dead in its tracks, either before it happens (because it’s coming from a known bad source), or as it happens (because it’s broken one of a series of rules).

We’re big fans of the WordPress content management system and use it for the majority of our client work. WordPress is also the world’s most popular CMS and this makes it a likely target for would-be attackers, so it pays to have good security in place. One of the most popular WAFs for WordPress is the WordFence plugin, a kind of hive-mind for protecting sites against known exploits and bad sources. But it comes with limitations.

WordFence is great at stopping an automated attack against one site; but there’s nothing to stop the same attacker from having a go at the next site on the same server, and the next, and the next… essentially working their way through all of your hosted websites looking for a way in. Even if they don’t find one, such attacks use valuable server resources and computing power.

This is why we’re now favouring Amazon’s Web Application Firewall. Sitting at the top level, the WAF inspects traffic before it reaches the EC2 boxes and protects our servers based on a set of curated rules.

The WAF allows us to block common attack patterns, SQL injection attempts and cross-site scripting (XSS), among others. We’ve implemented a series of common rulesets, including those from the Open Web Application Security Project (OWASP), alongside more customised rules designed to protect us against common WordPress attacks. If a request breaks a rule, the IP is blocked at the highest level and the attacker is stopped from moving onto other sites as a consequence.
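
For illustration, here’s roughly what attaching managed rulesets looks like through the current WAF API (boto3 / WAFv2). The ACL name is made up, and the specific rule groups and priorities shown are examples rather than our production configuration.

```python
# Minimal sketch using the current WAFv2 API: a web ACL that allows
# traffic by default and layers on AWS managed rule groups, including a
# WordPress-specific set. Names and metric names are placeholders.
import boto3

waf = boto3.client("wafv2", region_name="eu-west-1")

def managed_rule(name: str, priority: int) -> dict:
    """Build a rule entry that delegates to an AWS managed rule group."""
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {
            "ManagedRuleGroupStatement": {"VendorName": "AWS", "Name": name},
        },
        "OverrideAction": {"None": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }

waf.create_web_acl(
    Name="web-hosting-acl",            # hypothetical ACL name
    Scope="REGIONAL",                  # regional scope, for use with load balancers
    DefaultAction={"Allow": {}},
    Rules=[
        managed_rule("AWSManagedRulesCommonRuleSet", 0),     # broad OWASP-style protections
        managed_rule("AWSManagedRulesSQLiRuleSet", 1),       # SQL injection attempts
        managed_rule("AWSManagedRulesWordPressRuleSet", 2),  # WordPress-specific attacks
    ],
    VisibilityConfig={
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "web-hosting-acl",
    },
)
```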

We don’t just rely on the WAF either, with similar rulesets in operation on each of the EC2 boxes acting as a ‘failsafe’ for any traffic that manages to bypass the high-level WAF, perhaps by targeting each individual EC2 instance on its own. It’s essentially another layer of security which helps protect the sites we host.

It doesn’t stop there either. AWS Shield is a managed Distributed Denial of Service (DDoS) protection service designed to safeguard all applications running on AWS. As you can imagine, it’s in AWS’s own interest to stop this kind of attack, and it’s great to have it covered out of the box as part of the overall package. Combined with Amazon CloudFront, a fast content delivery network (CDN), it offers superb levels of protection for us and our clients.

Migrating existing clients

With the new infrastructure set up we’re busy planning migration, and we aim to move all of our legacy hosted clients (around 100 live websites) onto AWS over the next six months.

In a general sense the process should be relatively painless. The legacy and new infrastructures use the same underlying administration software which makes transfers between the two fairly straightforward, but there’s a little more work involved to offload existing website assets into S3 storage, and we’ll also need to thoroughly test each site as we go to ensure there are no hiccups.

Another new technology we’ve employed is LiteSpeed – a drop-in replacement for the traditional Apache HTTP server with a built-in caching engine. It speeds up websites immensely, but we did have to write some custom drivers and plugins to ensure caches are purged across all webhosts when changes are made, so that everything stays in sync.
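
To give a flavour of the idea, here’s a simplified Python sketch of the fan-out: when something changes, every host is told to drop its cached copy. The /purge-cache endpoint, hostnames and shared secret are hypothetical stand-ins for whatever hook the caching layer actually exposes, not our custom plugin itself.

```python
# Minimal sketch: when content changes, ask every web host in the
# cluster to purge its locally cached copy of the affected page.
# The endpoint, hostnames and token below are hypothetical.
import requests

WEB_HOSTS = [
    "https://host1.example.internal",
    "https://host2.example.internal",
    "https://host3.example.internal",
]
PURGE_SECRET = "change-me"  # placeholder shared secret

def purge_everywhere(path: str) -> None:
    """Ask each host in the cluster to drop its cached copy of a page."""
    for host in WEB_HOSTS:
        try:
            resp = requests.post(
                f"{host}/purge-cache",            # hypothetical purge endpoint
                json={"path": path},
                headers={"X-Purge-Token": PURGE_SECRET},
                timeout=5,
            )
            resp.raise_for_status()
        except requests.RequestException as exc:
            # Log and carry on; a failed purge shouldn't take a host offline.
            print(f"Purge failed on {host}: {exc}")

purge_everywhere("/news/latest-article/")
```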

We also expect a number of clients will take the opportunity to carry out their own security penetration testing to satisfy their internal IT departments, and as with any DNS change or host migration, there will be a number of questions along the way. Hopefully some of our clients will be reading this article in response to those questions though!

So there you have it. We’ve spent a lot of time architecting what we believe to be a scalable, flexible, resilient and highly available hosting infrastructure using the world’s most popular cloud-hosting provider, with enterprise-level security and features designed to make our lives easier, help us and our clients sleep better at night, and open the door to scaling the business in future. If you’re looking for a new website, or just better hosting, give us a call and maybe we can help.

We can’t finish this post without special thanks to Ragu, a member of our front-end development team, who single-handedly made the entire project possible. He overcame every hurdle along the way without breaking a sweat and we’re all in awe of what he’s achieved!
