Mindset for building HA and scalable system

Last updated 6 years ago

Goal: the system or infrastructure must have

  • Fault tolerance

  • No single point of failure

  • Multiple layers of security

  • Auto-failover without requiring human intervention

  • Heartbeat monitoring on all running components

  • Infrastructure as code

1. Fault tolerance

Fault tolerance is the property that enables a system to continue operating properly when some of its components fail (or contain one or more faults). If its operating quality decreases at all, the decrease is proportional to the severity of the failure, whereas in a naively designed system even a small failure can cause a total breakdown. Fault tolerance is particularly sought after in high-availability and life-critical systems.

  • Distributed reads/writes across a MySQL replication cluster

  • CDN such as CloudFront/Cloudflare

  • Microservices, with separate databases for the larger components
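To make "the decrease is proportional to the severity of the failure" concrete for the MySQL read/write item, here is a minimal sketch of a read/write router. It assumes a single primary plus replicas, and that a query is a write if it starts with one of a few SQL verbs. `Node`, `ReadWriteRouter` and `WRITE_VERBS` are hypothetical names, not part of any MySQL driver or proxy.

```python
import random
from dataclasses import dataclass, field

# Hypothetical, simplified classification: real proxies parse far more cases.
WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE"}


@dataclass
class Node:
    """A hypothetical database endpoint with a health flag."""
    name: str
    healthy: bool = True


@dataclass
class ReadWriteRouter:
    """Send writes to the primary and spread reads over healthy replicas."""
    primary: Node
    replicas: list[Node] = field(default_factory=list)

    def route(self, query: str) -> Node:
        words = query.split(maxsplit=1)
        verb = words[0].upper() if words else ""
        if verb in WRITE_VERBS:
            if not self.primary.healthy:
                raise RuntimeError("primary unavailable: writes cannot be served")
            return self.primary
        healthy_replicas = [r for r in self.replicas if r.healthy]
        if healthy_replicas:
            return random.choice(healthy_replicas)
        if self.primary.healthy:
            # Degraded mode: every replica is down, so the primary also serves reads.
            return self.primary
        raise RuntimeError("no healthy node available for reads")
```

Losing one replica only shrinks the read pool; losing all replicas pushes reads onto the primary instead of failing them. The system keeps working at reduced capacity rather than breaking down completely.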

2. Single point of failure (SPOF)

A single point of failure (SPOF) is a part of a system that, if it fails, will stop the entire system from working. SPOFs are undesirable in any system with a goal of high availability or reliability, be it a business practice, software application, or other industrial system.

  • MySQL multi-master: Galera Cluster

  • AWS RDS Multi-AZ feature

  • Elasticsearch with multiple master-eligible nodes

  • Redis Sentinel
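A quick way to see why one SPOF matters is to compute availability along a request path. The sketch below treats the path as tiers in series, each tier up while at least one of its instances is up. It assumes instance failures are independent, which makes the numbers optimistic. The tier names and the 99% per-instance figure are illustrative assumptions, not measurements.

```python
from math import prod

# Each tier maps to (availability of a single instance, number of instances).
Tiers = dict[str, tuple[float, int]]


def tier_availability(single: float, instances: int) -> float:
    """A tier is up while at least one of its instances is up.

    Assumes instance failures are independent; shared racks, AZs or a bad
    config push correlate failures in practice, so this is an optimistic estimate.
    """
    return 1 - (1 - single) ** instances


def system_availability(tiers: Tiers) -> float:
    """Tiers sit in series: a request needs every tier on its path to be up."""
    return prod(tier_availability(a, n) for a, n in tiers.values())


def find_spofs(tiers: Tiers) -> list[str]:
    """A tier that runs a single instance is a single point of failure."""
    return [name for name, (_, n) in tiers.items() if n < 2]
```

With `{"load_balancer": (0.99, 2), "app": (0.99, 3), "database": (0.99, 1)}`, `find_spofs` reports `["database"]`, and the whole path rounds to 0.9899, which is below the 99% of any single instance. Giving the database a second instance raises it to about 0.9998.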

3. Defense in depth

Defense in depth (also known as the Castle Approach) is an information assurance (IA) concept in which multiple layers of security controls (defenses) are placed throughout an information technology (IT) system. Its intent is to provide redundancy in case a security control fails or a vulnerability is exploited. The layers can cover personnel, procedural, technical, and physical aspects for the whole life cycle of the system.

  • Cloudflare anti-DDoS layer

  • iptables / AWS security groups

  • VPN

  • Snort / OSSEC
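The layering idea can be sketched as a chain of independent checks, where any single layer can stop a request. The sketch assumes two hypothetical layers: a network allowlist standing in for iptables or a security group, and a token check standing in for application auth. `Request`, `network_allowlist`, `token_auth` and `evaluate` are illustrative names, not a real firewall or framework API.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Request:
    source_ip: str
    token: Optional[str]


# A layer returns None to let the request through, or a reason to block it.
Layer = Callable[[Request], Optional[str]]


def network_allowlist(allowed_cidr: str) -> Layer:
    """Hypothetical stand-in for iptables / a security group rule."""
    net = ipaddress.ip_network(allowed_cidr)

    def check(req: Request) -> Optional[str]:
        if ipaddress.ip_address(req.source_ip) not in net:
            return "blocked by network layer"
        return None
    return check


def token_auth(valid_tokens: set[str]) -> Layer:
    """Hypothetical stand-in for application-level authentication."""
    def check(req: Request) -> Optional[str]:
        if req.token not in valid_tokens:
            return "blocked by auth layer"
        return None
    return check


def evaluate(req: Request, layers: list[Layer]) -> str:
    """Run every layer in order; any single layer can reject the request."""
    for layer in layers:
        reason = layer(req)
        if reason is not None:
            return reason
    return "allowed"
```

Each layer covers a different failure. A leaked token is useless from outside the allowed network, and a too-permissive firewall rule still leaves the auth layer in place.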

4. Failover

Failover is a method of protecting computer systems from failure in which standby equipment automatically takes over when the main system fails. In computing, failover means switching to a redundant or standby server, system, hardware component, or network when the previously active one fails or terminates abnormally. Failover and switchover are essentially the same operation, except that failover is automatic and usually happens without warning, while switchover requires human intervention.

  • HAProxy / AWS ALB & ELB

  • Automatic replica promotion in MySQL replication
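Here is a minimal sketch of the "auto promote" idea. It assumes a health probe has already set each node's `healthy` flag and reported its replication lag. The manager promotes only after several consecutive failed checks, and it picks the healthy replica with the least lag. `DbNode` and `FailoverManager` are hypothetical. A real setup would also fence the old primary and repoint the remaining replicas, which this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass
class DbNode:
    name: str
    healthy: bool = True
    replication_lag_s: float = 0.0  # seconds behind the primary (replicas only)


class FailoverManager:
    """Promote a replica after the primary fails several consecutive checks.

    Requiring `max_failures` misses in a row avoids promoting on a single
    dropped probe; a flapping network would otherwise cause needless failovers.
    """

    def __init__(self, primary: DbNode, replicas: list[DbNode], max_failures: int = 3):
        self.primary = primary
        self.replicas = list(replicas)
        self.max_failures = max_failures
        self.failures = 0

    def check(self) -> DbNode:
        """Run one health check and return whichever node is primary afterwards."""
        if self.primary.healthy:
            self.failures = 0
            return self.primary
        self.failures += 1
        if self.failures >= self.max_failures:
            self._promote()
        return self.primary

    def _promote(self) -> None:
        candidates = [r for r in self.replicas if r.healthy]
        if not candidates:
            raise RuntimeError("primary down and no healthy replica to promote")
        # Pick the replica closest to the old primary to minimise lost writes.
        new_primary = min(candidates, key=lambda r: r.replication_lag_s)
        self.replicas = [r for r in self.replicas if r is not new_primary]
        self.primary = new_primary
        self.failures = 0
```

The consecutive-failure threshold is the main design knob. If it is too low, you fail over on noise. If it is too high, downtime stretches before anyone takes over.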

5. Heartbeat

In computer science, a heartbeat is a periodic signal generated by hardware or software to indicate normal operation or to synchronize other parts of a computer system. A heartbeat is usually sent between machines at a regular interval on the order of seconds. If the receiving endpoint gets no heartbeat for a while (usually a few heartbeat intervals), the machine that should have sent it is assumed to have failed.

  • Uptime tools (Monit, New Relic Synthetics, AWS load balancer health checks)

  • Percona pt-heartbeat (measures MySQL replication delay)
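The "missed a few intervals" rule above translates directly into a timeout check. The sketch keeps the last heartbeat time per node and flags any node silent for more than `interval_s * missed_intervals` seconds. `HeartbeatMonitor` is a hypothetical class. The clock is injectable so the logic can be tested without sleeping; it defaults to `time.monotonic`, which is unaffected by wall-clock changes.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Track the last heartbeat per node and flag nodes that went quiet."""

    def __init__(self, interval_s: float, missed_intervals: int = 3,
                 clock: Callable[[], float] = time.monotonic):
        # A node is declared failed after this many seconds without a heartbeat.
        self.timeout_s = interval_s * missed_intervals
        self.clock = clock
        self.last_seen: dict[str, float] = {}

    def beat(self, node: str) -> None:
        """Record a heartbeat received from `node` at the current clock time."""
        self.last_seen[node] = self.clock()

    def failed_nodes(self) -> list[str]:
        """Return, sorted, the nodes whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout_s)
```

Waiting several intervals instead of one trades detection speed for fewer false alarms caused by a single delayed or dropped heartbeat.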

6. Infrastructure as code

All configuration is defined in executable configuration definition files, such as shell scripts, Ansible playbooks, Chef recipes, or Puppet manifests ...

  • Infra & network layer: Terraform, CloudFormation

  • Application layer: Ansible playbooks, Puppet, Chef, SaltStack
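Declarative IaC tools share one core idea: diff the desired state against the actual state, then apply only the difference, so re-running the same definition is safe. The sketch below illustrates that idea in memory, loosely in the spirit of a plan/apply workflow. It is not how Terraform or Ansible are implemented, and the resource names and attributes are made up.

```python
from typing import Any

# Resource name -> attributes. Stands in for real cloud resources.
Resources = dict[str, dict[str, Any]]


def plan(desired: Resources, actual: Resources) -> list[tuple[str, str]]:
    """Diff desired vs actual state into (action, resource) steps."""
    steps: list[tuple[str, str]] = []
    for name, spec in desired.items():
        if name not in actual:
            steps.append(("create", name))
        elif actual[name] != spec:
            steps.append(("update", name))
    for name in actual:
        if name not in desired:
            steps.append(("delete", name))
    return steps


def apply(desired: Resources, actual: Resources) -> Resources:
    """Apply the plan to an in-memory copy of the actual state."""
    new_state = dict(actual)
    for action, name in plan(desired, actual):
        if action == "delete":
            del new_state[name]
        else:
            new_state[name] = dict(desired[name])
    return new_state
```

After one `apply`, running `plan` again against the new state returns an empty list. That idempotence is what lets the definition files, rather than manual changes, be the source of truth.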

kubernetes