diff --git a/README.md b/README.md
index e2b51f3..32e8071 100644
--- a/README.md
+++ b/README.md
@@ -1,353 +1,401 @@
 # Micro Backup script
-This is a keep it simple and stupid backup script.
+This is a keep it simple and stupid backup script with de-duplication in mind, completely written in Bash.
+
+Authored by Valerio Bozzolan under a Free license.
+
+## Features
+
+- Unix-way
+  - support on-site backups based on simple filesystem directories
+  - easily creates a backup tower where nodes can push with limited privileges
+- MySQL/MariaDB databases
+  - support local or remote database servers
+  - ability to dump all databases (without knowing them a priori)
+  - ability to dump all databases but skip some of them
+  - ability to dump only specific databases
+  - ability to skip the data of specific tables, with regex support
+  - ability to stop a systemd service and start it again around a database dump
+  - ability to easily dump Phorge/Phabricator databases (https://we.phorge.it/) (60+ databases),
+    automatically activating/de-activating the maintenance read-only mode before/after the dump
+- filesystem
+  - ability to preserve all file attributes
+  - create an on-site backup and specify the destination in one configuration
+  - ability to backup a file, or a directory, with the same syntax
+  - ability to backup only the last lines of a very big log file
+- transfer
+  - ability to transfer the on-site backup whenever you want, with network compression
+- scalability
+  - designed to be easily replicated on several other nodes
+- time mess
+  - ability to prevent parallel executions, for example to prevent common issues during daylight saving time changes
+- Antani
+  - we 100% support Antani but only in particular conditions. Contact sales.

 ## Installation

+You should "install" (download) this thing on every host that should have on-site backups, or push its backup.
+
+Installing this makes sense in all my use cases, so on-site backups can be executed even if the host goes offline.
+
 Choose a directory and then clone this repository:

 ```
 git clone URL
 ```

 Example:

 ```
 sudo -i
 cd /opt
 git clone https://gitpull.it/source/micro-backup-script/
 ```

 Please change the URL to point to your own fork (if you have one).

 ## Configuration

 Enter the cloned directory and run:

 ```
 cp backup-instructions-example.conf backup-instructions.conf
 cp options-example.conf options.conf
 ```

 Then edit these files with your favorite text editor:

 * `options.conf`
 * `backup-instructions.conf`

 ## Usage

 After you have configured the script for your needs, there are no arguments. Just run this:

 ```
 sudo ./backup-everything.sh
 ```

 You can schedule it from your crontab. Example:

 ```
 $ sudo crontab -e

 # on-site backup at night
 #m h dom mon dow command
 00 1 * * * /opt/micro-backup-script/backup-everything.sh
 ```

 ## Options documentation (`options.conf`)

 The file `options.conf` is designed to store important options (like `BOX` and `BASE`).

 The file `options.conf` can be copied from the example called `options-example.conf`.

 ### Option `BOX`

 The `BOX` option sets the human name for your local computer. Default: current `hostname`.

 The `BOX` option is recommended in order to be independent from the hostname and have more stable backup pathnames.

 The `BOX` option is used to create a directory with the same name.

 Here are 3 examples:

 ```
 BOX=gargantua
 BOX=my-local-nice-machine
 BOX=server001.example.com
 ```

 NOTE: The `BOX` option is used to build the final pathname of your on-site backups. See below.

 ### Option `BASE`

 The `BASE` option sets the base pathname for on-site backups. Default: `/home/backups` for historical reasons.

 The `BASE` option is strongly suggested in order to be independent from the system default.

 The `BASE` option should **not** end with a slash.

 The `BASE` option should contain a valid filesystem pathname.
 Here are 3 examples:

 ```
 BASE=/var/backups/stark-industries
 BASE=/mnt/stark-industries
 BASE=/tmp/test-backups
 ```

 For example if you set `BASE=/var/backups/stark-industries` and `BOX=gargantua`, your on-site backups will be placed in `/var/backups/stark-industries/gargantua`.

 ### Option `PORCELAIN`

 The option `PORCELAIN` allows you to debug everything while doing nothing. It's useful to see what would be done without executing any command.

 The option `PORCELAIN` accepts an empty value (default) or `1`. When `1` the flag is activated.

 Here are 2 examples:

 ```
 # run all instructions normally (default)
 PORCELAIN=

 # do not run any instruction but just print them
 PORCELAIN=1
 ```

 The option `PORCELAIN` will skip the following actions when set to `1`:

 * skip any database dump (avoiding running `mysqldump`)
 * skip any data transfer via rsync (adding a `--dry-run`)
 * skip any systemd stop/start command

 ### Option `NO_DISSERVICE`

 The option `NO_DISSERVICE` is a flag that can avoid any command related to a systemd service. It's disabled by default.

 The option `NO_DISSERVICE` is only useful if you use some backup instructions related to systemd and you want to debug them.

 The option `NO_DISSERVICE` accepts an empty value (default) or `1`. When `1` the flag is activated.

 Here are 2 examples:

 ```
 # run all systemd-related instructions normally (default)
 NO_DISSERVICE=

 # do not run any systemd-related instruction
 NO_DISSERVICE=1
 ```

 ### Option `HOURS_INTERVAL`

 The option `HOURS_INTERVAL` can be used to set a desired minimum time window (in hours) between each execution.

 The option `HOURS_INTERVAL` is particularly effective to mitigate daylight saving issues and race conditions in general.

 The option `HOURS_INTERVAL` defaults to `12` hours.

 The option `HOURS_INTERVAL` can be disabled with a value of `0` to run a backup every time you call the script.
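 To picture how `HOURS_INTERVAL` works: the script records a Unix timestamp after each run and refuses to start again until enough hours have passed. Here is a minimal standalone sketch of such a guard (the variable names and the stamp path are illustrative, not the script's actual internals):

 ```shell
 #!/bin/bash
 # Sketch of a minimum-interval guard based on a timestamp file.
 HOURS_INTERVAL=12
 STAMP="${TMPDIR:-/tmp}/microbackup-demo.timestamp"   # illustrative path

 enough_time_passed() {
 	# if the stamp was never written, enough time has passed by definition
 	[ -f "$STAMP" ] || return 0
 	local now last
 	now=$(date +%s)
 	last=$(<"$STAMP")
 	[ $((now - last)) -ge $((HOURS_INTERVAL * 3600)) ]
 }

 if enough_time_passed; then
 	date +%s > "$STAMP"   # record this run first, then do the real work
 	echo "running backup"
 else
 	echo "skipping: last run was less than $HOURS_INTERVAL hours ago"
 fi
 ```

 Writing the timestamp *before* doing the work (as `rotate.sh` also does) means a second invocation started a moment later will see a fresh stamp and skip, which is what defuses accidental parallel runs.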
 ## Instructions documentation (`backup-instructions.conf`)

 The file `backup-instructions.conf` contains the backup commands (which databases should be saved, which pathnames, etc.)

 The file `backup-instructions.conf` can be copied from an example called `backup-instructions-example.conf`.

 ### Instruction `backup_path`

 The instruction `backup_path` does an on-site copy of a directory or a single file.

 Examples:

 ```
 # save the Unix configuration files
 backup_path /etc

 # save all user homes
 backup_path /home

 # save a copy of this specific log file
 backup_path /mnt/something/log.err
 ```

 The data will be saved in a sub-directory of `$BASE/$BOX/daily/files`, keeping the original structure. For example the path `/mnt/something/log.err` will be stored under `$BASE/$BOX/daily/files/mnt/something/log.err`.

 ### Instruction `backup_paths`

 The instruction `backup_paths` (note it ends with an "s") allows you to save multiple pathnames or use Bash globs to capture multiple pathnames.

 Example:

 ```
 # backup only the log files that start with "Kern"
 backup_paths /var/log/Kern*

 # backup these pathnames
 backup_paths /home/mario /home/wario
 ```

 ### Instruction `backup_last_log_lines`

 The instruction `backup_last_log_lines` saves the last lines of a long text file.

 Example:

 ```
 backup_last_log_lines /var/log/secure
 ```

 ### Instruction `backup_database`

 The instruction `backup_database` runs a `mysqldump` on a specific database and compresses it with `gzip`.

 The instruction `backup_database` saves the database under `$BASE/$BOX/daily/databases/$DATABASE_NAME.sql.gzip`.

 Examples:

 ```
 # first, backup a database with a specific name
 backup_database wordpress_testing

 # then, backup another database
 backup_database wordpress_production
 ```

 ### Instruction `backup_every_database`

 The instruction `backup_every_database` runs a `mysqldump` for every database (but not on the skipped ones).
 The instruction `backup_every_database` skips `information_schema` and `performance_schema` by default, and more databases can be ignored using the instruction `skip_database`.

 Example:

 ```
 # skip 2 databases
 skip_database "^BIG_DATABASE_PRODUCTION_ALPHA$"
 skip_database "^BIG_DATABASE_PRODUCTION_BETA$"

 # backup all the others
 backup_every_database
 ```

 ### Instruction `skip_database`

 The instruction `skip_database` adds another database name to the exclusion list of `backup_every_database`.

 The instruction `skip_database` accepts only one argument expressed as a regular expression, and has no effect if it's executed after `backup_every_database` or without a `backup_every_database` in your instructions.

 Examples:

 ```
 # skip this specific database
 skip_database "^BIG_DATABASE_PRODUCTION_ALPHA$"

 # skip also all databases starting with the prefix 'OLD_'
 skip_database "^OLD_.*$"

 # backup all the remaining databases
 backup_every_database
 ```

 ### Instruction `skip_database_table_data`

 The instruction `skip_database_table_data` can be used to skip the data of a table. The system automagically saves that table's schema (and table triggers, etc.) in a separate file.

 The instruction can be used multiple times to specify more tables to be ignored.

 The instruction must be specified before any `backup_every_database` and before `backup_database` (on that specific database at least).

 ```
 skip_database_table_data DBNAME1 TABLENAME1
 skip_database_table_data DBNAME1 TABLENAME2
 skip_database_table_data DBNAME2 TABLENAMEANOTHER
 ```

 ### Instruction `backup_service_and_database`

 The instruction `backup_service_and_database` can be used to stop a service, backup its database, and restart the service.

 The instruction `backup_service_and_database` does not try to stop a service that is not running, and does not try to start it if it was not running. In that case it just backs up the specified database.
 Example:

 ```
 backup_service_and_database tomcat9.service WEBAPPDB
 ```

 ### Instruction `backup_phabricator`

 The instruction `backup_phabricator` puts a Phabricator installation in maintenance mode, then it dumps all its databases, and then it removes the maintenance mode.

 Example:

 ```
 backup_phabricator /var/www/phabricator DATABASEPREFIX_
 ```

 ### Instruction `push_path_host`

 The instruction `push_path_host` sends local files (e.g. `/home/foo`) to a remote host (e.g. `example.com`).

 Example:

 ```
 push_path_host /home/foo backupuser@example.com:/var/backups/remote-destination
 ```

 The instruction `push_path_host` works by running an `rsync` command and connecting via SSH to the remote host. So your local Unix user will run `ssh backupuser@example.com`.

 So, if it does not work, and if you have no idea how SSH works, just run these and press enter 10 times from your local Unix user:

 ```
 ssh-keygen
 ssh-copy-id backupuser@example.com
 ```

 If it still does not work and you don't know how to configure SSH or how to use rsync, trust me, RTFM about SSH and rsync.

 ### Instruction `push_daily_directory`

 The instruction `push_daily_directory` sends your local daily backup to a remote host (e.g. `example.com`).

 Example:

 ```
 push_daily_directory backupuser@example.com:/var/backups/remote-destination
 ```

 The instruction `push_daily_directory` internally uses `push_path_host`, passing the pathname `$BASE/$BOX/daily` as its first argument.

 ## Utility `rotate.sh`

+The utility `rotate.sh` rotates your backups, de-duplicating unchanged files thanks to hard-links.
+
+INFO: This is totally compatible with the default behavior of rsync. So if your backup is created with rsync, that's perfectly fine.
+
+IMPORTANT: The basic assumption is that you do NOT overwrite source files in place, but you just delete and replace them. DO NOT OVERWRITE source files, so as not to compromise your backups. So DO NOT use `rsync --inplace` LOL!
+
 The utility `rotate.sh` is like logrotate applied to a directory.
 It's designed to configure backup data retention.

 The utility `rotate.sh` does not suffer from timezone changes. It has a mechanism to avoid being launched twice by mistake.

 The utility `rotate.sh` has this help menu:

 ```
 ./rotate.sh PATH DAYS MAX_ROTATIONS
 ```

 * `PATH`: the directory to be rotated
 * `DAYS`: the minimum amount of days between each rotation
 * `MAX_ROTATIONS`: the maximum allowed rotation (the next one will be dropped)

 The utility `rotate.sh` creates directories named like `PATH` but with a suffix like `.1` and `.2` etc. up to `MAX_ROTATIONS`.

 The utility `rotate.sh` can be used to rotate a directory (e.g. `/var/backups`) every 1 day up to 30 days and automatically drop older rotations. Example:

 ```
 $ sudo crontab -e

 # every 1 day at 2:00 rotate my latest backups (/var/backups) for max 30 times
 # NOTE: this creates /var/backups.{1..30} where .1 is the most recent and .30 the oldest
 #m h dom mon dow command
 0 2 * * * /opt/micro-backup-script/rotate.sh /var/backups 1 30
 ```

 The utility `rotate.sh` also allows longer intervals between rotations:

 ```
 $ sudo crontab -e
 00 1 * * * /opt/my-rotate.sh
 ```

 ```
 name=/opt/my-rotate.sh
 #!/bin/sh

 # rotate my backups every day for 30 days to have /var/backups.{1..30}
 # NOTE: this creates /var/backups.{1..30}
 /opt/micro-backup-script/rotate.sh /var/backups 1 30

 # then rotate the oldest backup (/var/backups.30) every week for 10 times
 # NOTE: this creates /var/backups.30.{1..10}
 /opt/micro-backup-script/rotate.sh /var/backups.30 7 10
 ```

+## Before Adoption
+
+- if you don't have `/bin/bash` (really?) in production, this tool is __not__ for you. lol
+  Premising that if you are able to define arrays and run some regexes in `/bin/sh`, patches welcome.
+- if you need something with an active community, this tool is __not__ for you. lol
+  I don't expect this thing to become mainstream. Anyway, I have some free time to assist newcomers.
+  Just email me, or something. You can find my email in the git log. I don't bite.
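 To see why the no-overwrite rule from the `rotate.sh` section matters, here is a small self-contained demonstration (file names are illustrative) of how a hard-linked "backup" behaves when the source is edited in place versus deleted and replaced:

 ```shell
 #!/bin/bash
 # Hard links share one inode: editing in place corrupts every link,
 # while delete-and-replace (what rsync does by default) keeps the old copy.
 set -e
 dir=$(mktemp -d)

 echo "v1" > "$dir/data"
 ln "$dir/data" "$dir/data.backup"   # hard link: same inode, no extra space

 # WRONG: appending in place writes through the shared inode,
 # so the "backup" silently changes too
 echo "v2" >> "$dir/data"
 cat "$dir/data.backup"              # shows v1 and v2 - the backup is gone

 # RIGHT: delete and replace, so the backup keeps its own inode
 rm "$dir/data"
 echo "v3" > "$dir/data"
 cat "$dir/data.backup"              # still the old content, untouched
 ```

 This is exactly why `rsync --inplace` is forbidden here: the default rsync behavior writes a temporary file and renames it over the destination, which breaks the hard link instead of writing through it.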
+
 ## License

 2020-2024 Valerio Bozzolan, ER Informatica, contributors

 MIT License

 https://mit-license.org/

 ## Contact

 For EVERY question, feel free to contact Valerio Bozzolan:

 https://boz.reyboz.it
diff --git a/bootstrap.sh b/bootstrap.sh
index d86c0a9..58009c0 100644
--- a/bootstrap.sh
+++ b/bootstrap.sh
@@ -1,313 +1,345 @@
 #!/bin/bash
 ###
 # Part of a stupid script to backup some stuff
 #
 # This bootstrap.sh file does nothing by itself but loads useful stuff.
 #
 # This file is loaded from 'backup-everything.sh' or 'rotate.sh'
 #
 # Author: 2020-2024 Valerio Bozzolan, contributors
 # License: MIT
 ##

 # current directory
 export DIR="${BASH_SOURCE%/*}"
 if [[ ! -d "$DIR" ]]; then DIR="$PWD"; fi

 # check if the standard input is not a terminal
 export INTERACTIVE=
 if [ -t 0 ]; then
 	INTERACTIVE=1
 fi

 #
 # Check if this is the quiet mode
 #
 # Default - not quiet.
 #
 # Actually we are in quiet mode if it's not interactive.
 # This lazy behavior is to avoid stupid emails from the crontab
 # without the need to specify some --quiet etc.
 # Note that in quiet mode only WARN and ERROR messages are shown.
 # I've not created a --quiet flag because nobody needs it.
 #
 # Edit your options - do not edit here.
 #
 #QUIET=

 #
 # Optionally write a log file
 #
 # Default - write a log file.
 #
 # Edit your options - do not edit here.
 #
 export WRITELOG=1

 # path to the instructions file
 export INSTRUCTIONS="$DIR/backup-instructions.conf"

 # path to the configuration file
 export CONFIG="$DIR/options.conf"

 # no config no party
 if [ ! -f "$CONFIG" ]; then
 	echo "missing options expected in $CONFIG"
 	exit 1
 fi

 # default mysql commands
 # --batch:  avoid fancy columns (auto-enabled, but better to specify it)
 # --silent: avoid including the column names
 export MYSQL="mysql --batch --silent"
 export MYSQLDUMP="mysqldump --routines --triggers"

 # default rsync command
 # --archive:       Try to keep all the properties
 # --fuzzy:         Try to check if a file was renamed, instead of deleting it and downloading a new one.
 #                  It's efficient for example with rotated log files.
 # --delete:        Delete the destination files if not present in the source
 #                  NOTE: we want this behaviour but it's not a good idea together with --fuzzy,
 #                  that's why we do not use --delete but we use the next flags
 # --delay-updates: Put all updated files into place at the end (useful with fuzzy and delete modes)
 # --delete-delay:  Delete after everything (useful with fuzzy and delete modes)
 #                  NOTE: sometimes some data is kept in damn .~tmp~ directories.
 #                  So we are deprecating --delete-delay, going back to --delete,
 #                  and so removing --fuzzy
 # --hard-links:    Try to look for hard links during the transfer, to avoid copying them as separate files
 #RSYNC="rsync --archive --fuzzy --delay-updates --delete-delay --hard-links"

 # default rsync command
 # --archive:    Try to keep all the properties
 # --delete:     Delete the destination files if not present in the source
 # --hard-links: Try to look for hard links during the transfer, to avoid copying them as separate files
 export RSYNC="rsync --archive --delete --hard-links"

 # rsync used in remote transfers
 # --compress: Use more CPU to save network bandwidth
 export RSYNC_REMOTE="$RSYNC --compress"

 # default base backup directory for all backups
 export BASE="/home/backups"

 # default box name
 BOX="$(hostname)"
 export BOX

 # set to 1 to avoid any disservice (e.g. systemctl stop/start)
 export NO_DISSERVICE=

 # set to 1 to do nothing
 export PORCELAIN=

 # How many hours should pass between each execution.
 # This is just a sane default to avoid daylight saving issues.
 export HOURS_INTERVAL=12

 # include the configuration to eventually override some options
 # shellcheck source=config.sh
 . "$CONFIG"

 # as default, if not interactive, set quiet mode
 if [ -z "$QUIET" ] && [ "$INTERACTIVE" != 1 ]; then
 	QUIET=1
 fi

 # full pathnames to the backup directories
 export BASEBOX="$BASE/$BOX"
 export DAILY="$BASEBOX/daily"
 export DAILY_FILES="$DAILY/files"
 export DAILY_DATABASES="$DAILY/databases"
 export DAILY_LASTLOG="$DAILY/last.log"
 export DAILY_LASTTIME="$DAILY/last.timestamp"
 export DAILY_STARTTIME="$DAILY/start.timestamp"

 # apply the porcelain to the rsync command
 if [ "$PORCELAIN" = 1 ]; then
 	RSYNC="$RSYNC --dry-run"
 	RSYNC_REMOTE="$RSYNC_REMOTE --dry-run"
 fi

 # set default backup_last_log() lines
 if [ -z "$BACKUP_LAST_LOG_LINES" ]; then
 	BACKUP_LAST_LOG_LINES=8000
 fi

 ##
 # Receive in input a file path and a number of hours, and check whether
 # enough time (in hours) has passed or not.
 #
 # If the file was never created, we assume that enough time has passed.
 #
 # @param string timestamp_file
 # @param int hours
 #
 function are_enough_hours_passed() {

 	# No args, no party.
 	# Note that the file argument will be checked later.
 	local timestamp_file="$1"
 	local expected_hours="$2"
 	if [ -z "$expected_hours" ]; then
 		echo "Error: Missing argument expected hours."
 		exit 2
 	fi

 	local expected_seconds=$((expected_hours * 3600))

 	are_enough_seconds_passed "$timestamp_file" "$expected_seconds"
 }

 ##
 # Receive in input a file path and a number of seconds, and check whether
 # enough time (in seconds) has passed or not.
 #
 # If the file was never created, we assume that enough time has passed.
 #
 # @param string timestamp_file
 # @param int seconds
 #
 function are_enough_seconds_passed() {

 	# No args, no party.
 	local timestamp_file="$1"
 	local expected_seconds="$2"
 	if [ -z "$timestamp_file" ]; then
 		echo "Error: Missing argument timestamp file."
 		exit 2
 	fi
 	if [ -z "$expected_seconds" ]; then
 		echo "Error: Missing argument expected seconds."
 		exit 2
 	fi

 	if [ -f "$timestamp_file" ]; then

 		# Read the file, if it makes sense.
 		local last_timestamp=$(<"$timestamp_file")
 		if [ "$last_timestamp" -lt 1000 ]; then
 			echo "Error: Bad format in file $timestamp_file"
 			exit 2
 		fi

 		local current_timestamp=$(date +%s)
 		local diff_seconds=$((current_timestamp - last_timestamp))

 		# If enough time has passed, return true.
 		[ "$diff_seconds" -ge "$expected_seconds" ];
 	fi

 	# The file doesn't exist. Return nothing special (0, that is True).
 }

 ##
 # Receive in input a file path, and write there the current timestamp.
 #
 # @param string timestamp_file
 #
 function write_timestamp() {

 	# No arg, no party.
 	local timestamp_file="$1"
 	if [ -z "$timestamp_file" ]; then
 		echo "Error: Missing timestamp file argument."
 		exit 1
 	fi

 	# Write the current Unix timestamp.
 	date +%s > "$timestamp_file"
 }

 ###
 # Print something
 #
 # It also puts the message in the backup directory
 #
 # @param string severity
 # @param string message
 #
 function printthis() {

 	local msg
 	msg="[$(date)][$1] $2"

 	# print to standard output if it's not in quiet mode
 	if [ "$QUIET" != 1 ]; then
 		printf "%s\n" "$msg"
 	fi

 	# put in the log file if possible
 	if [ -f "$DAILY_LASTLOG" ] && [ "$WRITELOG" = 1 ]; then
 		printf "%s\n" "$msg" >> "$DAILY_LASTLOG"
 	fi
 }

 ###
-# Run an rsync
+# Run an rsync archive copy
+# @param string source
+# @param string destination
 #
 function copy() {

 	# show what we are doing
 	log "copy $*"

 	# run the rsync command
 	if [ "$PORCELAIN" != 1 ]; then
 		$RSYNC $@
 	fi
 }

+###
+# Run an rsync archive copy,
+# creating hard-links into the destination,
+# instead of copying files.
+# This saves a lot of space but you MUST pay attention
+# to never touch the source files, or you will overwrite
+# destination files.
+# @param string source
+# @param string destination
+#
+function copy_using_hard_links() {
+
+	if [ "$#" != 2 ]; then
+		echo "Bad usage of $0. Must have 2 arguments. Current arguments: $@"
+		exit 3
+	fi
+
+	local source="$1"
+	local dest="$2"
+	local source_abs=$(realpath "$source")
+
+	# Show what we are doing, without being verbose.
+	log "copy using hard links from $source to $dest"
+
+	# run the rsync command
+	if [ "$PORCELAIN" != 1 ]; then
+		$RSYNC --link-dest="$source_abs" "$source" "$dest"
+	fi
+}
+
 ###
 # Remove a pathname
 #
 function drop() {

 	# show what we are doing
 	log "drop $*"

 	# well, proceed... fingers crossed... with some protections
 	if [ "$PORCELAIN" != 1 ]; then
 		rm --recursive --force --one-file-system --preserve-root -- $@
 	fi
 }

 ###
 # Move something somewhere
 #
 function move() {

 	# show what we are doing
 	log "move $*"

 	if [ "$PORCELAIN" != 1 ]; then
 		mv --force $@
 	fi
 }

 ###
 # Print an information message
 #
 # @param msg Message
 #
 function log() {
 	printthis INFO "$1"
 }

 ###
 # Print a warning message
 #
 # @param msg Message
 #
 function warn() {
 	printthis WARN "$1"
 }

 ###
 # Print an error message
 #
 # @param msg Message
 #
 function error() {
 	printthis ERROR "$1"
 }
diff --git a/rotate.sh b/rotate.sh
index 5f5701e..8a1b4eb 100755
--- a/rotate.sh
+++ b/rotate.sh
@@ -1,157 +1,163 @@
 #!/bin/bash
 ###
-# Stupid script to rotate a backup
+# Stupid script to rotate a backup,
+# creating incremental backups thanks to
+# hard-links.
 #
 # Author: Valerio B.
 # Date: Wed 4 Aug 2020
 # License: CC 0 - public domain
 ##

 # do not proceed in case of errors
 set -e

 # current directory
 MYDIR="$(dirname "$(realpath "$0")")"

 # as default don't be quiet while rotating
 QUIET=0

 # as default don't write in the log while rotating
 WRITELOG=0

 #
 # Maximum time that your rotation could last
 #
 # Right now this should be a good default since it doesn't make much sense
 # for a rotation to take more than this number of hours.
 #
 # If the script takes longer than this, the next rotation may not run.
 #
 # Note: at the moment this must be shorter than a single day.
 #
 # Current default: 6 hours (6 * 60 * 60 = 21600 seconds)
 MAX_ROTATE_SECONDS=21600

 # include all the stuff and useful functions
 . "$MYDIR"/bootstrap.sh

 # arguments
 place="$1"
 days="$2"
 max="$3"

 # expected file containing the last timestamp
 last_timestamp_file="$place.timestamp"

 # show usage
 function show_help_rotate() {
 	echo "USAGE"
 	echo "  $0 PATH DAYS MAX_ROTATIONS"
 	echo "EXAMPLE"
 	echo "  $0 /home/backups 1 30"
 }

 function harden() {

 	local harden_path="$1"

 	# no path no party
 	if [ -z "$harden_path" ]; then
 		echo "Wrong usage of harden"
 		exit 2
 	fi

 	# Harden rotations
 	#
 	# Note that non-privileged users should be able to push their last copy,
 	# but MUST not in any way be able to touch older copies
 	chown root:root "$harden_path"
 	chmod 600 "$harden_path"
 }

 # all the arguments must exist (just check the last one)
 if [ -z "$max" ]; then
 	echo "Bad usage"
 	show_help_rotate
 	exit 1
 fi

 # the place to be rotated must exist
 if [ ! -e "$place" ]; then
 	error "nonexistent directory '$place'"
 	exit 2
 fi

 # validate the max parameter
 if [ "$max" -lt 2 ]; then
 	echo "The MAX parameter must be greater than 1"
 	show_help_rotate
 	exit 3
 fi

 # expected seconds from the last rotation before continuing
 # NOTE: leave the star quoted to avoid a syntax error in expr
 expected_seconds=$(expr "$days" "*" 86400)

 # check if the duration in seconds is a day or more
 if [ "$expected_seconds" -ge 86400 ]; then

 	# the expected time since the last execution is never exactly the number of days in seconds
 	# Solution: remove a few hours from the expected time (6 hours, see MAX_ROTATE_SECONDS)
 	expected_seconds=$(expr "$expected_seconds" - "$MAX_ROTATE_SECONDS")
 fi

 # do not proceed if not enough time has passed since the last execution on that directory
 # this avoids daylight saving time change problems
 # this also avoids race conditions when starting parallel executions by mistake
 if ! are_enough_seconds_passed "$last_timestamp_file" "$expected_seconds"; then
 	warn "doing nothing: last rotation was executed too recently on $place: now-last $(date +%s)-$(< "$last_timestamp_file") - expected at least $expected_seconds seconds"
 	exit 0
 fi

 # save the last timestamp before rotating everything
 # this will also avoid parallel rotations
 write_timestamp "$last_timestamp_file"

 # eventually drop the last backup step
 # if it does not exist, don't care
 max_path="$place.$max"
 drop "$max_path"

 # shift all the backups
 after="$max"
 while [[ "$after" -gt 1 ]]; do

 	before=$(expr "$after" - 1)

 	# do not process the root directory for no reason in the world if you type that by mistake
 	# the --preserve-root is already implicit but... let's be sure! asd
 	before_path="$place.$before"
 	after_path="$place.$after"

 	# the source must exist. asd
 	if [ -e "$before_path" ]; then

 		# the trailing slash means: copy the files and not just the directory
 		move "$before_path/" "$after_path"
 		harden "$after_path"
 	fi

 	# next
 	after="$before"
 done

 # at the end, move the base forward
 # the trailing slash means: copy the files and not just the directory
-copy "$place/" "$place.1"
-harden "$place.1"
+copy_using_hard_links "$place/" "$place.1"
+
+# Make sure that other users cannot see this path.
+# This is useful since you may want to preserve the original permissions on YOUR files,
+# while enforcing strict permissions on OUR generated parent directory.
+harden "$place.1"

 # yeah!
 log "rotation concluded"