diff --git a/README.md b/README.md
index 32e8071..4dd9535 100644
--- a/README.md
+++ b/README.md
@@ -1,401 +1,404 @@
# Micro Backup script

This is a keep-it-simple-and-stupid backup script with de-duplication in mind, completely written in Bash.

Authored by Valerio Bozzolan under a Free license.

## Features

- Unix-way
  - support on-site backups based on simple filesystem directories
  - easily creates a backup tower where nodes can make a push with limited privileges
- MySQL/MariaDB databases
  - support local or remote database servers
  - ability to dump all databases (without knowing them a priori)
  - ability to dump all databases but skip some of them
  - ability to dump only specific databases
  - ability to not dump the data of some specific table names, with regex support
  - ability to stop a systemd service and start it again exactly around a database dump
  - ability to easily dump Phorge/Phabricator databases (https://we.phorge.it/) (60+ databases), automatically activating/de-activating the maintenance read-only mode before/after the dump
- filesystem
  - ability to preserve all file attributes
  - create an on-site backup and specify the destination in one configuration
  - ability to backup a file, or a directory, with the same syntax
  - ability to backup only the last lines of a very big log file
- transfer
  - ability to transfer the on-site backup whenever you want, with network compression
- scalability
  - designed to be easily replicated on several other nodes
- time mess
  - ability to prevent parallel executions, for example to prevent common issues during daylight saving time changes
- Antani
  - we 100% support Antani but only in particular conditions. Contact sales.

## Installation

+You may want to install the `rdfind` utility before running this script. That tool is available in most package managers.
+
You should "install" (download) this thing for every host that should have on-site backups, or push its backup.
Installing this makes sense in all my use cases, so on-site backups can be executed even if the host goes offline.

Choose a directory and then clone this repository:

```
git clone URL
```

Example:

```
sudo -i
cd /opt
git clone https://gitpull.it/source/micro-backup-script/
```

Please change the URL to point to your own fork (if you have one).

## Configuration

Enter the cloned directory and run:

```
cp backup-instructions-example.conf backup-instructions.conf
cp options-example.conf options.conf
```

Then edit these files with your favorite text editor:

* `options.conf`
* `backup-instructions.conf`

## Usage

After you have configured the script for your needs, there are no arguments. Just run this:

```
sudo ./backup-everything.sh
```

You can schedule it from your crontab. Example:

```
$ sudo crontab -e

# on-site backup at night
#m h dom mon dow command
00 1 * * * /opt/micro-backup-script/backup-everything.sh
```

## Options documentation (`options.conf`)

The file `options.conf` is designed to store important options (like `BOX` and `BASE`).

The file `options.conf` can be copied from the example called `options-example.conf`.

### Option `BOX`

The `BOX` option sets the human name for your local computer. Default: current `hostname`.

The `BOX` option is recommended in order to be independent from the hostname and have more stable backup pathnames.

The `BOX` option is used to create a directory with the same name. Here are 3 examples:

```
BOX=gargantua
BOX=my-local-nice-machine
BOX=server001.example.com
```

NOTE: The `BOX` option is used to build the final pathname of your on-site backups. See below.

### Option `BASE`

The `BASE` option sets the base pathname for on-site backups. Default: `/home/backups` for historical reasons.

The `BASE` option is strongly suggested in order to be independent from the system default.

The `BASE` option should **not** end with a slash.

The `BASE` option should contain a valid filesystem pathname.
Here are 3 examples:

```
BASE=/var/backups/stark-industries
BASE=/mnt/stark-industries
BASE=/tmp/test-backups
```

For example, if you set `BASE=/var/backups/stark-industries` and `BOX=gargantua`, your on-site backups will be placed in `/var/backups/stark-industries/gargantua`.

### Option `PORCELAIN`

The option `PORCELAIN` allows you to debug everything and do nothing. It's useful to see what would be done without executing any command.

The option `PORCELAIN` accepts an empty value (default) or `1`. When `1`, the flag is activated. Here are 2 examples:

```
# run all instructions normally (default)
PORCELAIN=

# do not run any instruction but just print them
PORCELAIN=1
```

The option `PORCELAIN` will skip the following actions when set to `1`:

* skip any database dump (avoiding running `mysqldump`)
* skip any data transfer via rsync (adding a `--dry-run`)
* skip any systemd stop/start command

### Option `NO_DISSERVICE`

The option `NO_DISSERVICE` is a flag that avoids any command related to a systemd service. It's disabled by default.

The option `NO_DISSERVICE` is only useful if you use some backup instructions related to systemd and you want to debug them.

The option `NO_DISSERVICE` accepts an empty value (default) or `1`. When `1`, the flag is activated. Here are 2 examples:

```
# run all systemd-related instructions normally (default)
NO_DISSERVICE=

# do not run any systemd-related instruction
NO_DISSERVICE=1
```

### Option `HOURS_INTERVAL`

The option `HOURS_INTERVAL` can be used to set a desired minimum time window (in hours) between each execution.

The option `HOURS_INTERVAL` is particularly effective to mitigate daylight saving issues and race conditions in general.

The option `HOURS_INTERVAL` defaults to `12` hours.

The option `HOURS_INTERVAL` can be disabled with a value of `0` to always run a backup every time you call the script.
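This kind of minimum-interval guard can be implemented with a timestamp file. Here is a minimal illustrative sketch, not the script's actual code (names like `may_run` and `stamp` are made up for the example):

```shell
#!/bin/bash
# Illustrative sketch of a minimum-interval guard based on a timestamp file.
# Names (may_run, stamp) are invented; the real script uses its own helpers.

HOURS_INTERVAL=12
stamp="$(mktemp -u)"            # pathname for the timestamp file (not created yet)

# succeed (and record this run) only if at least HOURS_INTERVAL hours
# passed since the previous recorded run
may_run() {
    local now last
    now=$(date +%s)
    last=0
    [ -f "$stamp" ] && last=$(< "$stamp")
    if (( now - last < HOURS_INTERVAL * 3600 )); then
        return 1                # too recent: skip this execution
    fi
    echo "$now" > "$stamp"      # record this run before proceeding
}

may_run && first=ran  || first=skipped
may_run && second=ran || second=skipped
echo "first=$first second=$second"   # the immediate second call is refused
```

Setting `HOURS_INTERVAL=0` would correspond to skipping the time comparison entirely, so every invocation proceeds.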
## Instructions documentation (`backup-instructions.conf`)

The file `backup-instructions.conf` contains the backup commands (which databases should be saved, which pathnames, etc.)

The file `backup-instructions.conf` can be copied from an example called `backup-instructions-example.conf`.

### Instruction `backup_path`

The instruction `backup_path` does an on-site copy of a directory or a single file. Examples:

```
# save the Unix configuration files
backup_path /etc

# save all user homes
backup_path /home

# save a copy of this specific log file
backup_path /mnt/something/log.err
```

The data will be saved in a sub-directory of `$BASE/$BOX/daily/files`, keeping the original structure. For example the path `/mnt/something/log.err` will be stored under `$BASE/$BOX/daily/files/mnt/something/log.err`.

### Instruction `backup_paths`

The instruction `backup_paths` (note it ends with an "s") allows you to save multiple pathnames or use Bash globs to capture multiple pathnames. Example:

```
# backup only the log files that start with "Kern"
backup_paths /var/log/Kern*

# backup these pathnames
backup_paths /home/mario /home/wario
```

### Instruction `backup_last_log_lines`

The instruction `backup_last_log_lines` saves the last lines of a long text file. Example:

```
backup_last_log_lines /var/log/secure
```

### Instruction `backup_database`

The instruction `backup_database` runs a `mysqldump` on a specific database and compresses it with `gzip`.

The instruction `backup_database` saves the database under `$BASE/$BOX/daily/databases/$DATABASE_NAME.sql.gzip`. Examples:

```
# first, backup a database with a specific name
backup_database wordpress_testing

# then, backup another database
backup_database wordpress_production
```

### Instruction `backup_every_database`

The instruction `backup_every_database` runs a `mysqldump` for every database (but not on the skipped ones).
The instruction `backup_every_database` skips `information_schema` and `performance_schema` by default, and more databases can be ignored using the instruction `skip_database`. Example:

```
# skip 2 databases
skip_database "^BIG_DATABASE_PRODUCTION_ALPHA$"
skip_database "^BIG_DATABASE_PRODUCTION_BETA$"

# backup all the others
backup_every_database
```

### Instruction `skip_database`

The instruction `skip_database` adds another database name to the exclusion list of `backup_every_database`.

The instruction `skip_database` accepts only one argument, expressed as a regular expression, and has no effect if it's executed after `backup_every_database` or without a `backup_every_database` in your instructions. Examples:

```
# skip this specific database
skip_database "^BIG_DATABASE_PRODUCTION_ALPHA$"

# skip also all databases starting with the prefix 'OLD_'
skip_database "^OLD_.*$"

# backup all the remaining databases
backup_every_database
```

### Instruction `skip_database_table_data`

The instruction `skip_database_table_data` can be used to skip the data of a table. The system automagically saves that table schema (and table triggers, etc.) in a separate file.

The instruction can be used multiple times to specify more tables to be ignored.

The instruction must be specified before any `backup_every_database` and before `backup_database` (on that specific database at least).

```
skip_database_table_data DBNAME1 TABLENAME1
skip_database_table_data DBNAME1 TABLENAME2
skip_database_table_data DBNAME2 TABLENAMEANOTHER
```

### Instruction `backup_service_and_database`

The instruction `backup_service_and_database` can be used to stop a service, backup its database, and restart the service.

The instruction `backup_service_and_database` does not try to stop a service that is not running, and does not try to start it again afterwards. If the service is not running, it just backups the given database.
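The stop/dump/restart idea can be sketched like this. This is a hedged illustration, not the script's real code: `run` only prints each command (in the spirit of `PORCELAIN=1`), so nothing is actually stopped or dumped, and the function name is made up:

```shell
#!/bin/bash
# Illustrative sketch only: "run" prints commands instead of executing them
# (like PORCELAIN=1), so no systemd or MySQL is needed to follow the flow.

run() { echo "+ $*"; }

# pretend we detected that the service is currently running
service_was_active=1

sketch_backup_service_and_database() {
    local service="$1" database="$2"

    # stop the service only if it was running
    [ "$service_was_active" -eq 1 ] && run systemctl stop "$service"

    # dump the database while the service is down
    run "mysqldump $database | gzip > /home/backups/$database.sql.gz"

    # start the service again only if we stopped it
    [ "$service_was_active" -eq 1 ] && run systemctl start "$service"
}

sketch_backup_service_and_database tomcat9.service WEBAPPDB
# prints the stop, dump, and start commands in order
```

If `service_was_active` were `0`, only the dump line would be printed, mirroring the behavior described above.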
Example:

```
backup_service_and_database tomcat9.service WEBAPPDB
```

### Instruction `backup_phabricator`

The instruction `backup_phabricator` puts a Phabricator installation in maintenance mode, then it dumps all its databases, and then it removes the maintenance mode. Example:

```
backup_phabricator /var/www/phabricator DATABASEPREFIX_
```

### Instruction `push_path_host`

The instruction `push_path_host` sends local files (e.g. `/home/foo`) to a remote host (e.g. `example.com`). Example:

```
push_path_host /home/foo backupuser@example.com:/var/backups/remote-destination
```

The instruction `push_path_host` works by running an `rsync` command that connects to the remote host over SSH. So your local Unix user will run `ssh backupuser@example.com`.

So, if it does not work, and if you have no idea how SSH works, just run these and press enter 10 times from your local Unix user:

```
ssh-keygen
ssh-copy-id backupuser@example.com
```

If it still does not work and you don't know how to configure SSH or how to use rsync, trust me, RTFM about SSH and rsync.

### Instruction `push_daily_directory`

The instruction `push_daily_directory` sends your local daily backup to a remote host (e.g. `example.com`). Example:

```
push_daily_directory backupuser@example.com:/var/backups/remote-destination
```

The instruction `push_daily_directory` internally uses `push_path_host`, passing the pathname `$BASE/$BOX/daily` as first argument.

## Utility `rotate.sh`

The utility `rotate.sh` rotates your backups, de-duplicating unchanged files thanks to hard-links.

INFO: This is totally compatible with the default behavior of rsync. So if your backup is created with rsync, that's perfectly fine.

IMPORTANT: The basic assumption is that you do NOT overwrite source files in place, but you just delete and replace them. DO NOT OVERWRITE source files, so you don't compromise your backups. So DO NOT use `rsync --inplace` LOL!

The utility `rotate.sh` is like logrotate applied to a directory.
It's designed to configure backup data retention.

The utility `rotate.sh` does not suffer from timezone changes. It has a mechanism to avoid being launched twice by mistake.

The utility `rotate.sh` has this help menu:

```
./rotate.sh PATH DAYS MAX_ROTATIONS
```

* `PATH`: the directory to be rotated
* `DAYS`: the minimum amount of days between each rotation
* `MAX_ROTATIONS`: the maximum allowed number of rotations (the next one will be dropped)

The utility `rotate.sh` creates directories named like `PATH` but with a suffix like `.1` and `.2` etc. up to `MAX_ROTATIONS`.

The utility `rotate.sh` can be used to rotate a directory (e.g. `/var/backups`) every 1 day up to 30 days and automatically drop older rotations. Example:

```
$ sudo crontab -e

# every 1 day at 2:00 rotate my latest backups (/var/backups) for max 30 times
# NOTE: this creates /var/backups.{1..30} where .1 is the most recent and .30 the oldest
#m h dom mon dow command
0 2 * * * /opt/micro-backup-script/rotate.sh /var/backups 1 30
```

-The utility `rotate.sh` allows to have longer times between rotations:
-
-```
-$ sudo crontab -e
-
-00 1 * * * /opt/my-rotate.sh
-```
+The utility `rotate.sh` allows you to have longer intervals between rotations.
+For example you can have 30 daily rotations, and then 10 additional weekly rotations, with:

```
name=/opt/my-rotate.sh
#!/bin/sh

# rotate my backups every day for 30 days to have /var/backups.{1..30}
# NOTE: this creates /var/backups.{1..30}
/opt/micro-backup-script/rotate.sh /var/backups 1 30

# then rotate the oldest backup (/var/backups.30) every week for 10 times
# NOTE: this creates /var/backups.30.{1..10}
/opt/micro-backup-script/rotate.sh /var/backups.30 7 10
```

+```
+$ sudo crontab -e
+
+00 1 * * * /opt/my-rotate.sh
+```
+
## Before Adoption

- if you don't have `/bin/bash` (really?) in production, this tool is __not__ for you. lol

  That said, if you are able to define arrays and run some regexes in `/bin/sh`, patches are welcome.
- if you need something with an active community, this tool is __not__ for you. lol

  I don't expect this thing to become mainstream. Anyway, I have some free time to assist newcomers. Just email me, or something. You can find my email in the git log. I don't bite.

## License

2020-2024 Valerio Bozzolan, ER Informatica, contributors

MIT License

https://mit-license.org/

## Contact

For EVERY question, feel free to contact Valerio Bozzolan:

https://boz.reyboz.it
diff --git a/rotate.sh b/rotate.sh
index 8a1b4eb..fae3520 100755
--- a/rotate.sh
+++ b/rotate.sh
@@ -1,163 +1,226 @@
#!/bin/bash

###
# Stupid script to rotate backups,
# creating incremental backups thanks to
# hard-links.
#
# Author: Valerio B.
# Date: Wed 4 Aug 2020
# License: CC 0 - public domain
##

# do not proceed in case of errors
set -e

# current directory
MYDIR="$(dirname "$(realpath "$0")")"

# as default don't be quiet while rotating
QUIET=0

# as default don't write in the log while rotating
WRITELOG=0

#
# Maximum time that your rotation could last
#
# Right now this should be a good default since it doesn't make much sense
# for a rotation to take more than this number of hours.
#
# If the script takes longer than this, the next rotation may not run.
#
# Note: at the moment this must be shorter than a single day.
#
# Current default: 6 hours (6 * 60 * 60 = 21600 seconds)
MAX_ROTATE_SECONDS=21600

# include all the stuff and useful functions
. "$MYDIR"/bootstrap.sh

# arguments
place="$1"
days="$2"
max="$3"

# expected file containing last timestamp
last_timestamp_file="$place.timestamp"

# show usage
function show_help_rotate() {
    echo "USAGE"
    echo " $0 PATH DAYS MAX_ROTATIONS"
    echo "EXAMPLE"
    echo " $0 /home/backups 1 30"
}

function harden() {
    local harden_path="$1"

    # no path no party
    if [ -z "$harden_path" ]; then
        echo "Wrong usage of harden"
        exit 2
    fi

    # Harden rotations
    #
    # Note that non-privileged users should be able to push their last copy,
    # but MUST not in any way be able to touch older copies
    chown root:root "$harden_path"
    chmod 600 "$harden_path"
}

# all the arguments must exist (just check the last one)
if [ -z "$max" ]; then
    echo "Bad usage"
    show_help_rotate
    exit 1
fi

# the place to be rotated must exist
if [ ! -e "$place" ]; then
    error "nonexistent directory '$place'"
    exit 2
fi

# validate max parameter
if [ "$max" -lt 2 ]; then
    echo "The MAX parameter must be greater than 1"
    show_help_rotate
    exit 3
fi

# expected seconds from the last rotation before continuing
# NOTE: leave the star escaped to avoid syntax error in expr
expected_seconds=$(expr "$days" "*" 86400)

# check if the duration in seconds is a day or more
if [ "$expected_seconds" -ge 86400 ]; then

    # the expected time since the last execution is never exactly the number of days in seconds
    # Solution: remove a few hours from the expected (6 hours, see MAX_ROTATE_SECONDS)
    expected_seconds=$(expr "$expected_seconds" - "$MAX_ROTATE_SECONDS")
fi

# do not proceed if not enough time passed since last execution on that directory
# this avoids daylight saving time change problems
# this also avoids race conditions when starting parallel executions by mistake
if ! are_enough_seconds_passed "$last_timestamp_file" "$expected_seconds"; then
    warn "doing nothing: last rotation was executed too recently on $place: now-last $(date +%s)-$(< "$last_timestamp_file") - expected at least $expected_seconds seconds"
    exit 0
fi

# save the last timestamp before rotating everything
# this will avoid even parallel rotations
write_timestamp "$last_timestamp_file"

# eventually drop the last backup step
# if it does not exist, don't care
max_path="$place.$max"
drop "$max_path"

# shift all the backups
after="$max"
while [[ "$after" -gt 1 ]]; do

    before=$(expr "$after" - 1)

    # do not process the root directory for no reason in the world if you type that by mistake
    # the --preserve-root is already implicit but... let's be sure! asd
    before_path="$place.$before"
    after_path="$place.$after"

    # the source must exist. asd
    if [ -e "$before_path" ]; then

        # the trailing slash means: copy files and not just the directory
        move "$before_path/" "$after_path"
        harden "$after_path"
    fi

    # next
    after="$before"
done

# at the end, move the base forward
# the trailing slash means: copy files and not just the directory
+# this should be able to create a copy that is more lightweight than the original if few
+# things changed. So later "rdfind" has less work to do.
copy_using_hard_links "$place/" "$place.1"

# Make sure that other users cannot see this path.
# This is because you may want to preserve the original context in YOUR files,
# but have a strict parent context on OUR generated files.
harden "$place.1"

+#
+# De-duplicate the first rotations.
+# This saves ~1TB of data for 20 rotations of 100 computers in my use-case :)
+#
+# - Why not de-duplicate "$place"?
+#   We avoid touching the source, since the source can be written to, and that's an additional risk.
+#   I'm 9999.99% sure that we could also write the original source "$place",
+#   but again, it's an unnecessary risk.
+#   So this is a security mitigation.
+# - Why de-duplicate only "$place.1" and "$place.2"?
+#   It's not useful to de-duplicate ALL rotations, since rotations 1 and 2 will be rotated anyway.
+#   It's rare that you generate a duplicate file between rotation 1 and rotation 30,
+#   so this is a micro-optimization.
+#
+# Do these paths exist?
+if [ -d "$place.1" ] && [ -d "$place.2" ]; then
+
+    # Check that rdfind is installed in your system.
+    if which rdfind > /dev/null; then
+
+        # rdfind arguments:
+        # -makehardlinks: this does the de-duplication trick while keeping data consistency.
+        #   Yuppie!
+        # -removeidentinode: this sounds scary for backups, and was disabled.
+        #   I don't even know why it is not auto-disabled with the next one.
+        # -makeresultsfile: for some reason rdfind creates a report. We don't want it.
+        #   We are already happy with its output.
+        # -checksum: the default is sha1, but it's not a secure method nowadays,
+        #   so adopting sha256 is better: slower but safer.
+        #   The risk with sha1 is that an attacker can push exact copies
+        #   with the same size, same hash, and same initial and final bytes,
+        #   to try to compromise future rotations.
+        #   -> please avoid sha1
+        #   -> please adopt sha256 or bigger (but think about performance)
+        # -minsize: under this size in bytes, files are skipped.
+        #   The default is just 1 byte, to skip empty files.
+        #   It may be really inconvenient to process billions of tiny files,
+        #   so I suggest keeping this at least at some megabytes,
+        #   to receive some visible benefits in terms of saved disk space,
+        #   and not cause intensive read operations by running sha256 (or whatever)
+        #   over millions of files.
+        # -sleep: this is just a sleep between each file, to not over-heat your drives.
+        #   This is just to extend the life of your backup drives.
+        #   This is probably reasonable since this backup is supposed to be executed
+        #   on a daily basis, so at least 1 millisecond between each file is probably
+        #   a good default.
+        log "de-duplication started on $place.1 and $place.2"
+
+        rdfind \
+            -makehardlinks true \
+            -removeidentinode false \
+            -makeresultsfile false \
+            -minsize 1000000 \
+            -checksum sha256 \
+            -sleep 1ms \
+            "$place.1" \
+            "$place.2"
+
+        log "de-duplication concluded"
+    else
+        warn "rdfind is not installed in this system so we cannot de-duplicate your backup rotations"
+    fi
+fi
+
# yeah!
log "rotation concluded"