The date is July 24, 2018. Here is the message we had to deliver to our customers:
"A power incident at Cyxtera caused some of CenturyLink's network devices to go down. UPS/power maintenance was under way at the datacenter, and due to an incident during the maintenance, some power lines were unavailable for a short period of time. Some of our power lines were also affected, but as our servers are connected to multiple power feeds, none of our equipment went down. However, CenturyLink's network devices were affected. It took nearly 60 minutes for the power to be restored and for these devices to boot and return to an operational state."
There is so much pain and emotion behind these sentences: 48,000 websites were down for an hour, and we had to wait hours on end for an explanation from our provider. This incident reminds us how vulnerable we are, because every one of us depends not just on many other people, services and businesses, but also on devices, cables, plugs, and all sorts of small gadgets. What is worse, the number of people and gadgets we depend on keeps growing day by day. How do we cope with this?
Life, or more specifically the part of it spent managing a hosting company for over 18 years, has taught me that downtime is more of a psychological issue than a technical one. In the IT world, “uptime” is the share of time that a device or service is in operation and available, usually expressed as a percentage. Nowadays everyone in the industry guarantees 99.9% or even 100% uptime, so we have come to accept it as the standard.
Well, it seems I am the only one in the industry who cannot guarantee anything… or perhaps I’m the only one who is honest to the bone. I can tell you what we’ve achieved so far: 99.998% average uptime, measured across 500 servers over 18 years. If you think the reason for this achievement is that we are "IT SuperPeople", you’re wrong. The primary reason is that we have been lucky: lucky that no disaster has struck the data centers where we keep our servers, and lucky that hackers have left us alone for these 18 years. You are lucky too, just for having access to this article and the whole Internet at your fingertips.
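To put these percentages in perspective, here is a small Python sketch that converts an uptime percentage into the downtime it still allows over a year. It assumes a 365-day year measured around the clock, and the helper name `allowed_downtime_minutes` is hypothetical, used only for illustration.

```python
# Hypothetical helper: convert an uptime percentage into the downtime it allows.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(uptime_percent: float, period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Return the minutes of downtime permitted by a given uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_minutes * (100 - uptime_percent) / 100


if __name__ == "__main__":
    # Compare a common industry guarantee, our measured average, and "100%".
    for uptime in (99.9, 99.998, 100.0):
        print(f"{uptime}% uptime -> {allowed_downtime_minutes(uptime):.1f} minutes of downtime per year")
```

Under these assumptions, a 99.9% guarantee still permits about 525.6 minutes (almost nine hours) of downtime per year, while 99.998% leaves room for only about 10.5 minutes.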
Consider the following fact: the software used by every hosting provider has many weaknesses. New ones are disclosed almost daily, and there are often time gaps between their discovery, their public disclosure, and the official fix. Like every hosting provider, we have been exposed to such weaknesses, so it has been pure luck that hackers never targeted us while we were vulnerable. All service operators are vulnerable at times - whether you’re a one-server SOHO web host or CIA Headquarters - you’re either hacked or you’re lucky. Not everyone has been as lucky as we have. Of course, we have still had our fair share of problems. Years ago, our datacenter provider in the US had no connectivity for ten hours straight. I was younger then and much less wise than I am today, so the stress of that experience shortened my life by a decade.
We always choose the best providers, and they usually end up being the most expensive, too. We carefully study what they offer, who they are, and what track record they have, but sometimes even the best providers fail. Everyone fails sometimes. One day ICDSoft might fail as well - just give us unlimited time, and we will fail for sure. That is one thing I can guarantee. And that's why it is very important to be honest and kind to those around you no matter what. Because when you’re the one who’s failing, you want the people around you to return the favor: to offer empathy and understanding when you need it most, not to revel in your failure.
Technical talk aside, the heart of the uptime issue boils down to what’s become a universally human one: we’ve become addicted to being online. We have inherited the fear of rejection from our ancestors, and since everyone is constantly online nowadays, we confuse the state of being offline with rejection.
For primates, being exiled from the group is a real death sentence. You do not have to go to the jungle to witness what a primate does when ostracized: just cut any human's connectivity and watch their behavior!
In the real world, one gets completely disconnected only in very rare and unfortunate cases, such as going to the hospital or to jail, and others notice your absence. But in the virtual world, few if any will notice, and this terrifies us even more than the downtime itself. It reminds us how unimportant we are. Yes, the world will keep spinning without us, in the same beautiful way it always has.
But when it comes to a business website, the implications of downtime are a bit different. A website being offline is the equivalent of a brick-and-mortar shop being closed: customers can't get in and spend their money there. Disaster? Yes, for those who cannot accept the fact that this happens sometimes. In fact, it happens to everyone, including your competitors. I would understand if you found it arrogant and even cynical that someone running a successful web hosting company claims it’s no big drama if your website goes offline for a minute or five. And obviously, I can’t guarantee that such a situation would be drama-free. So instead of promising you the impossible, I will share with you some drama-free ways to cope when it happens:
If your business experiences an unexpected one-hour closure out of the roughly 176 working hours in a month, and this is enough to turn your financial result from positive to negative, then the outage is definitely not the biggest problem your business has. Consider these matters instead:
The perspective of your loyal customers. If the customers who could not enter your shop during that hour refuse to wait and buy from your competitor instead, they were not loyal customers to begin with. Loyal customers will check again in an hour or a day, but will not betray you, because they don't want to compromise on quality. I know this does not hold for every business; no one would wait for a restaurant to reopen during a lunch break… but they would still likely go back to that restaurant another day for another meal. Generally speaking, though, it is illogical for a loyal customer to take their business elsewhere. I would not buy a pink T-shirt simply because the blue T-shirt shop is temporarily closed - because I need a blue T-shirt. If you are the blue T-shirt shop, you can rest easy and be confident in your product.
The financial stability of your business. If you have not yet achieved financial stability and a good balance, then everything gets the blame: the five-second power outage, the rainy weather, all of your vendors, and your own customers, too. A business should generate profit at healthy levels. Next time, before you blame someone else for pausing your business for 15 minutes and thereby ruining it, think about why the remaining 43,185 minutes of a 30-day month were not enough to make up for such a minor lapse.
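The arithmetic behind these figures can be checked with a short Python sketch. It assumes a 30-day month (43,200 minutes in total) and the 176 working hours mentioned above; `outage_share_percent` is a hypothetical helper name used only for illustration.

```python
# Hypothetical helper: what share of a period does an outage represent?
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes


def outage_share_percent(outage: float, period: float) -> float:
    """Return the outage as a percentage of the period (both in the same units)."""
    if period <= 0:
        raise ValueError("period must be positive")
    return 100 * outage / period


if __name__ == "__main__":
    # A 15-minute outage in a 30-day month.
    print(f"{outage_share_percent(15, MINUTES_PER_30_DAY_MONTH):.3f}%")
    # A one-hour closure out of 176 working hours.
    print(f"{outage_share_percent(1, 176):.2f}%")
```

On these assumptions, a 15-minute outage is about 0.035% of the month, and a one-hour closure is under 0.6% of a month's working hours.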
Many people are so terrified by the prospect of failing that they always have an excuse lined up. Whenever such a business owner sees an event as a potential excuse, their reaction is severe. Cut the connectivity of a group of business owners and watch how they react towards their providers. Then you will know whose business will last and whose will fail. Mark my words - I put my name to them.
I pay $80,000 per month to my colocation providers for their service, and I hope for 100% uptime. But should something unexpected happen on their side and my corporate site go down for hours, or for a day, I will roll with the punches, as it will not affect my profit much, if at all. And no, I was not born with this attitude. In fact, ten years ago such a situation would have driven me out of my mind. But experience has taught me better.
So while I cannot promise eternal uptime, I can say that we do, and always will, do our best to keep all servers up and running, without compromising on anything, through careful planning of every detail and constant vigilance. Whenever something goes down, we give our utmost effort to bring it back up as soon as possible. If you still want numbers instead of empty words, here’s a number for you: we give 100% effort, 100% of the time, to keep all websites up, and our own corporate site is always in the same boat as our customers' websites - so you know we care.