We believe very strongly in continuous improvement at Elcom as it forms a core value of our Agile development process. Anthony hinted at the significant changes to our website hosting infrastructure we have been making as part of our commitment to continuous improvement. These changes are mostly transparent to our clients, and have so far involved larger uninterruptible power supplies (UPS), more powerful servers and a rollout of Windows Server 2008 across our hosting servers.
The largest changes are still ahead of us and will involve measures designed to greatly improve the redundancy, stability and performance of our hosting infrastructure. The aim is to give clients a hosting service that is more fault tolerant, so that equipment failures do not affect website serving, user sessions or client data.
The first step is to create a completely redundant network infrastructure that supports transparent failover. If one part of the network (a cable, router, switch, firewall, Internet connection or network card) fails, there is a seamless transition to the redundant hardware or software, with at worst a slight drop in throughput or peak capacity. In fact, overall performance will improve as we remove some unnecessary network layers from our infrastructure. This step will be completed in the next few weeks, unfortunately with one or two very brief (2-5 minute) outages across all hosted websites. (Key gains: additional redundancy, performance improvements)
The next step is to implement a high-availability, load-balanced web server cluster that will serve all Community Manager.NET websites that are on our current shared hosting servers.
- High-availability
A cluster is said to be "high-availability" when it can lose a server without affecting the overall job of the cluster. We will start with three servers in this cluster, which means that even if two of the servers have a problem simultaneously, we will still be able to serve client websites as if nothing happened.
- Load-balanced
Load-balancing means that peaks and troughs in traffic and workload are shared across servers. If one client's website gets a sudden deluge of traffic, the load is spread across all three web servers, so there is less net effect on our capacity to serve clients' websites (both theirs and others').
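To make the two ideas above concrete, here is a minimal sketch in Python of a round-robin balancer that skips servers marked as down. It is an illustration only, not the load balancer we run: the class name `RoundRobinBalancer` and the server names `web1` to `web3` are made up for the example, and a real balancer would detect failures with health checks rather than manual calls.

```python
class RoundRobinBalancer:
    """Hands out healthy servers in rotation (hypothetical example class)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the next server to try

    def mark_down(self, server):
        # In practice a health check would call this automatically.
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Try each server at most once, starting from the rotation pointer.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["web1", "web2", "web3"])

# Load balancing: consecutive requests are spread across all three servers.
normal_traffic = [balancer.pick() for _ in range(3)]

# High availability: two servers fail, the remaining one keeps serving.
balancer.mark_down("web1")
balancer.mark_down("web2")
degraded_traffic = [balancer.pick() for _ in range(3)]

print(normal_traffic)    # ['web1', 'web2', 'web3']
print(degraded_traffic)  # ['web3', 'web3', 'web3']
```

Round-robin is only one possible policy, chosen here because it is the simplest to read; the same structure works if `pick` chooses the least-loaded healthy server instead.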
Another big bonus of this change is that it includes implementing "out of process" user session handling. This means we will be able to upgrade Community Manager.NET websites without requiring an outage, as users' sessions will not be affected by a restart of the website on any one server. This is an exciting development, as it will mean fewer website outages in the future as we continue to grow and expand our hosting service. (Key gains: additional redundancy, greater stability)
The last step is for us to upgrade the current server that mirrors our main database server to have the same specification as that server (which is by far the biggest server we have). This will help ensure that database performance in a failover situation stays at its normal level. (Key gains: additional redundancy, performance improvements)
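The following sketch shows, in simplified Python, why keeping session data outside the web server process lets a site restart without logging users out. Everything in it is hypothetical (`SessionStore`, `WebServer`, the `cart` key); in ASP.NET the equivalent is configured through session state modes such as StateServer or SQLServer, and this post does not say which one we use.

```python
import uuid


class SessionStore:
    """Stands in for a session service running outside the web server process."""

    def __init__(self):
        self._data = {}

    def create(self):
        session_id = uuid.uuid4().hex
        self._data[session_id] = {}
        return session_id

    def load(self, session_id):
        return dict(self._data[session_id])  # return a copy

    def save(self, session_id, values):
        self._data[session_id] = dict(values)


class WebServer:
    """A web server that keeps nothing important in its own memory."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.local_cache = {}  # in-process state, lost on restart

    def handle(self, session_id, key=None, value=None):
        session = self.store.load(session_id)
        if key is not None:
            session[key] = value
            self.store.save(session_id, session)
        self.local_cache[session_id] = session
        return session

    def restart(self):
        # Simulates recycling the website: in-process memory is wiped.
        self.local_cache.clear()


store = SessionStore()
web1 = WebServer("web1", store)
web2 = WebServer("web2", store)

sid = store.create()
web1.handle(sid, "cart", ["item-42"])

web1.restart()                      # e.g. a site upgrade on this server
after_restart = web1.handle(sid)    # session is still there
on_other_server = web2.handle(sid)  # and visible from any server in the cluster

print(after_restart)  # {'cart': ['item-42']}
```

Because every server reads and writes sessions through the shared store, the same design also lets a load balancer send a user's next request to a different server without the user noticing.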
Our strategy in the future will be to add web servers to the cluster, and bring online new mirrored database servers as demand for our hosting services continues to grow. We will also be upgrading our staging environment to have a similar (but less powerful) web server cluster to improve the performance of staging sites.
Do you want to know more about Elcom's website hosting service?