GitLab database incident
For your information:
Lessons learned:
Engineers should get more sleep
Restore strategy is more important than backup strategy
Test the backup plans. If we don't test backups, we don't have them. We must recheck backup/restore plans on a fixed schedule: monthly, quarterly or yearly
Always be careful: double/triple check any command run with sudo
Change the terminal PS1 format/colors to make it clear whether you're using production or staging:
RED for production
Blue/green for staging
Show the full hostname in the bash prompt for all users by default (e.g. db1.staging.gitlab.com instead of just db1)
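The prompt advice above could be sketched roughly as follows. This is a minimal illustration, not GitLab's actual configuration: the helper names prompt_color_for_host and build_ps1 are hypothetical, it assumes staging hosts have ".staging." in their fully qualified name, and it picks green (rather than blue) for staging. Anything that does not look like staging is colored red, so an unrecognized host is treated as production.

```shell
# Hypothetical helper: choose an ANSI color code from the full hostname.
# Assumption: staging hosts contain ".staging." in their name.
prompt_color_for_host() {
  case "$1" in
    *.staging.*) printf '32' ;;  # green for staging
    *)           printf '31' ;;  # red for production (the safe default)
  esac
}

# Hypothetical helper: build a PS1 string with user, full hostname, and directory.
# \[ and \] mark the color escapes as zero-width so bash wraps lines correctly.
build_ps1() {
  printf '\\[\\e[1;%sm\\]\\u@%s\\[\\e[0m\\]:\\w\\$ ' \
    "$(prompt_color_for_host "$1")" "$1"
}

# hostname -f prints the fully qualified name on most Linux systems;
# fall back to plain hostname if -f is not supported.
PS1="$(build_ps1 "$(hostname -f 2>/dev/null || hostname)")"
```

To apply this to all users by default, the snippet could be placed in a system-wide bash startup file such as /etc/bash.bashrc or a script under /etc/profile.d/; which file is read depends on the distribution. Bash's built-in \H escape also prints the full hostname, but only as far as the system hostname is itself set to the fully qualified name.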