Why does ̶l̶o̶v̶e̶ high availability have to be so complicated?
It’s the Hallmark movie season, I mean Christmas season, I mean Hallmark Christmas movie season… (don’t judge too harshly, I’m a father of six young ladies, a hopeless romantic, and married to an amazing spouse who enjoys a good holiday laugh and happy ending). If you are in the Hallmark movie season, you know that it is highly likely that you’ll hear the phrase, “Why is love so complicated?” It will be spoken just after the heartbroken young person has developed feelings for a new love interest, and is ready to dance the night away in their arms, just as the old flame walks into the party. If you aren’t into the Hallmark holiday romances, maybe it isn’t love that you are wondering about. Perhaps you want to know: “Why does high availability have to be so complex?”
Ten reasons that high availability is so ‘gosh darn’ complicated:
1. The speed of innovation. Cloud computing, edge computing, hyperconverged infrastructure, multi-cloud, containers, and machine learning are changing the landscape of enterprise availability at a blistering pace. By conservative estimates, AWS currently has over 175 services, and “provides a highly reliable, scalable, low-cost infrastructure platform in the cloud that powers hundreds of thousands of businesses in 190 countries around the world.” Choosing an HA solution that allows consistent management across all of these environments, with infrastructure and application awareness, is an important way to reduce complexity.
2. Randomness of disasters. Someone once said, “make your solution disaster proof, and the universe will build a better disaster.” Not only are we seeing innovations in the realm of technology, but also in the world of disasters. Resource starvation, cooling system disasters, natural disasters, power grid failures, and a host of new and random disasters often make it harder to insulate the entirety of your enterprise. Last year’s solutions will likely need updates to handle this year’s unprecedented outages. It’s important to work with a vendor that has focused on high availability for many years — who has firsthand experience with finding solutions to the randomness of disasters.
3. Application complexity. As technology moves ahead in the realm of virtualization and cloud computing, applications are following suit. As application vendors add new options to take advantage of the cloud, they are also adding complexity. Your applications should be protected by solutions designed for higher availability and clustering in AWS, Azure, GCP or other environments. Look for vendors who provide greater application awareness and an understanding of best practices, and who deliver availability solutions that take into account how the application was architected and can optimize the application’s orchestration in the cloud.
4. Advances in threats. The threats to your enterprise also impact your availability. Systems have always had to handle attacks from intruders, hackers, and even the self-inflicted. These attacks have become more sophisticated, and the solutions and methods to avoid being victimized often impact the layout, architecture, and software that is deployed within your organization. This software has to “play nice” with your availability solution and your applications. As VP of Customer Experience for SIOS Technology, I have seen how an overly aggressive virus scanner can impact your application and your availability solution. Ensure you understand the impact of your security systems on your HA/DR environment and choose an HA solution that works with, not against, your security goals.
5. Regulatory requirements. Data breaches impact the architecture for your application, hypervisor and environment, but so too do regulatory requirements. Businesses that have become global now have to make sure they are compliant with data handling regulations in multiple countries. This can impact what region your solutions can be deployed in, and how many zones you can use for redundancy. Additionally, regulatory requirements can limit which teams can support your organization, which may in turn affect your choices for availability software and support.
6. Shrinking windows. In the world of 24/7 searches, shopping, gaming, banking, and research, the windows are shrinking. Queries must run faster and take less time. Responses have to be quicker and have better data. This means that the allowable downtime for your environment is shrinking faster than you previously imagined. It also means that maintenance windows are tighter and more tightly packed, and have to be optimized and highly coordinated. Work with an HA vendor that can provide guidance on optimizing your cluster configuration for both application performance and fast recovery time.
7. Increasing competitive pressure. I grew up in a small town. The hardware store had one competitor. The grocery store had one competitor. The bookstore, antique shop, car dealership, rental office, and bank all had one competitor. Today, you have thousands upon thousands of competitors who want nothing more than to see your customers in their checkout carts. This competition impacts the complexity of your entire business, and weighs heavily on what can and cannot be done in maintenance windows, with upgrades, and at what speed you innovate. Environments that may have been refreshed once every five years have moved to the cloud where optimizations and advancements in processor speed and memory can be had in seconds or minutes. Systems that once had a single run book covering a simple list of applications now look closer to “War and Peace” and cover the growing number of processes, products, services and intelligence being added to increase profits while simultaneously working to reduce risks and downtime.
8. High availability solution costs. We all wish we had an unlimited budget, but the reality is that what you have available is often somewhere between a little and not enough. Teams are often forced to balance consumption versus fixed cost, license costs for applications on the standby clusters, and associated costs for availability software. Enterprise licenses often add a ‘tough to swallow’ price tag for a standby server in an availability environment. Architecting an availability solution is never free, even if you are a hard-core ‘DIY’ team. DIY comes with additional costs in maintenance, management, source control, testing, deployment, version management and version control, patches, and patch management. While your team of experts may clearly be up for the challenge, your business would likely prefer that their highly valued talents be applied to creating more revenue opportunities.
9. Business growth. Growth of your business due to innovation means that your teams are now responsible for more critical applications, more sites, more offices, and more data that needs to be accessible and highly available. As your business grows and thrives, the challenges that come with scaling up and scaling out add to the complexities mentioned previously, and also expand what you have to prepare and plan for.
10. Team turnover. The complexity of the environments, speed of innovation, growth of your business, advances in the application tier, and growth in the competitive landscape bring with them the challenge of retaining top talent to keep your infrastructure running smoothly. Most companies understand that availability is a merger of people, process, product, and architecture, among other things. So finding ways to reduce the complexity of clustering environments, through automated configuration, documented run books, and products with consistent HA strategies across the infrastructure, is key both to retaining the talent that installs and manages your infrastructure and to reducing the risks and heavy lifting for those responsible for the key components of availability.
Let’s face it, love takes hard work, good communication, time, investment, skill and determination, and there are no shortcuts to a successful relationship. The same can be said about achieving the best outcomes in an ever-emerging, increasingly complex, and fluid technology space within your enterprise. Availability, clustering, disaster recovery and uptime are so ‘gosh darn’ hard because they require a serious, dedicated, non-stop, top-to-bottom cultural shift that accounts for the speed of innovation, the complexity of applications and orchestration, competition and growth, and the other components of keeping applications, databases, and critical infrastructure available to those who need them, when they need them.