Self-Healing
Linked to availability
Self-healing goes beyond graceful failure handling: it is the ability to detect and fix problems automatically, without human intervention.
It is difficult and costly to build.
Minimizing the mean time to recovery and automating the repair process is what self-healing is all about.
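To see why recovery time matters so much, here is a small worked calculation using the standard steady-state formula availability = MTBF / (MTBF + MTTR). The failure interval and repair times are made-up numbers chosen only for illustration.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up, given mean time between
    failures (MTBF) and mean time to recovery (MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical numbers: one failure every 1000 hours on average.
manual = availability(1000, 4)        # a human is paged, repair takes 4 hours
automated = availability(1000, 0.05)  # automated repair takes 3 minutes

print(f"manual repair:    {manual:.4%}")
print(f"automated repair: {automated:.4%}")
```

With the failure rate unchanged, shrinking the repair time alone moves the system from roughly "two nines" to roughly "four nines" of availability.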
You want your system to appear as if all of its components were functioning perfectly, even when things break and during maintenance.
So in times of failure, the system should still be functioning.
As you scale out, failures become a much more frequent occurrence.
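A rough way to quantify this, assuming component failures are independent (a simplification) and using a made-up per-component failure probability: the chance that at least one of n components fails is 1 - (1 - p)^n.

```python
def p_any_failure(p_single: float, n: int) -> float:
    """Probability that at least one of n independent components fails,
    when each fails with probability p_single in the same time window."""
    return 1 - (1 - p_single) ** n


# Hypothetical: each server has a 0.1% chance of failing on a given day.
for n in (1, 10, 100, 1000):
    print(f"{n:>5} servers: {p_any_failure(0.001, n):.1%} chance of a failure today")
```

Under these assumptions a failure is a rare event for a single server but close to a daily expectation for a fleet of a thousand.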
Be able to handle power outages, human error, and network failures.
Always prepare for these failures.
Trigger them yourself to test your self-healing, responsiveness, etc.
Example: Netflix's Chaos Monkey, which randomly terminates instances in production.
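The idea of triggering failures yourself can be sketched as a small in-memory simulation. This is not Chaos Monkey's API; `Worker`, `Supervisor`, and `unleash_chaos` are hypothetical names, and a real setup would kill actual processes or instances rather than flip a flag.

```python
import random


class Worker:
    """Hypothetical stand-in for a server or process."""
    def __init__(self, name):
        self.name = name
        self.alive = True


class Supervisor:
    """Detects dead workers and replaces them without human intervention."""
    def __init__(self, size):
        self.workers = [Worker(f"w{i}") for i in range(size)]
        self.replaced = 0

    def heal(self):
        for i, worker in enumerate(self.workers):
            if not worker.alive:                       # detect
                self.workers[i] = Worker(worker.name)  # repair
                self.replaced += 1


def unleash_chaos(supervisor, rng):
    """Kill one randomly chosen worker, in the spirit of a chaos tool."""
    victim = rng.choice(supervisor.workers)
    victim.alive = False
    return victim.name


rng = random.Random(42)  # fixed seed so the experiment is repeatable
pool = Supervisor(size=5)
for _ in range(10):
    unleash_chaos(pool, rng)
    pool.heal()
    # The experiment's success criterion: the pool is whole again.
    assert all(w.alive for w in pool.workers), "self-healing failed"

print(f"workers replaced: {pool.replaced}")
```

The point of the loop is that the failure injection and the health assertion run together, so a regression in the healing logic shows up during testing rather than during a real outage.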
Crash-Only
The system should always be ready to crash, and whenever it reboots, it should be able to continue working without human interaction.
The system needs to be able to detect its failure, fix broken data if necessary, and resume work as normal.
There is no separate shutdown procedure: if you want to shut the system down, you simply terminate it.
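A minimal sketch of the crash-only idea, assuming a toy service whose only state is a counter stored in a JSON file. `Counter` and the file layout are hypothetical. Startup and recovery share one code path, and writes go through a temporary file plus an atomic rename so that a crash mid-write cannot leave half-written state behind.

```python
import json
import os
import tempfile


class Counter:
    """Hypothetical crash-only service: every start is a recovery."""

    def __init__(self, path):
        self.path = path
        self.value = self._recover()

    def _recover(self):
        # Runs on every start; there is no separate "clean start" path.
        try:
            with open(self.path) as f:
                return json.load(f)["value"]
        except (FileNotFoundError, ValueError, KeyError, TypeError):
            # Missing or damaged state: fall back to a known-good default.
            # (Assumption: resetting to 0 is acceptable for this toy example.)
            return 0

    def increment(self):
        self.value += 1
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"value": self.value}, f)
            f.flush()
            os.fsync(f.fileno())    # make sure the bytes reach the disk
        os.replace(tmp, self.path)  # atomic swap: old state or new, never half


path = os.path.join(tempfile.mkdtemp(), "counter.json")
svc = Counter(path)
svc.increment()
svc.increment()
del svc              # simulate a crash: no shutdown hook runs
svc = Counter(path)  # reboot
print(svc.value)
```

Because no work is deferred to a shutdown hook, terminating the process at any point is safe, which is what makes "terminate it" a valid way to stop the system.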
Ensuring high availability is mainly about removing single points of failure and failing over gracefully.
A single point of failure is any piece of infrastructure that is necessary for the system to work properly and has no redundant backup, so its failure alone takes the system down.
Once you identify your single points of failure, you need to decide with your business team whether it is a good investment to put redundancy in place.
Redundancy is having more than one copy of each piece of data or each component of the infrastructure.
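Redundancy and graceful failover can be illustrated with a small sketch in which a client tries redundant replicas in order. `make_replica`, `call_with_failover`, and `ReplicaDown` are hypothetical names; in practice this logic usually lives in a load balancer or client library rather than in application code.

```python
class ReplicaDown(Exception):
    """Raised by a replica that cannot serve requests."""


def make_replica(name, up=True):
    """Build a hypothetical replica as a simple callable."""
    def handle(request):
        if not up:
            raise ReplicaDown(name)
        return f"{name} served {request}"
    return handle


def call_with_failover(replicas, request):
    """Try each redundant replica in turn; fail only if all of them fail."""
    failed = []
    for replica in replicas:
        try:
            return replica(request)
        except ReplicaDown as exc:
            failed.append(str(exc))  # remember the failure, try the next copy
    raise RuntimeError(f"all replicas failed: {failed}")


replicas = [make_replica("primary", up=False), make_replica("standby")]
print(call_with_failover(replicas, "GET /"))
```

With a single replica in the list, that replica is a single point of failure: one outage surfaces directly as an error to the caller.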
Systems that are not redundant need special attention, and it is a best practice to prepare a disaster recovery plan (sometimes called a business continuity plan) with recovery procedures for all critical pieces of infrastructure.