diff --git a/docs/how-xtrabackup-works.md b/docs/how-xtrabackup-works.md
index aa2250fd..705a0227 100644
--- a/docs/how-xtrabackup-works.md
+++ b/docs/how-xtrabackup-works.md
@@ -1,114 +1,127 @@
 # How Percona XtraBackup works
-Percona XtraBackup is based on InnoDB’s crash-recovery functionality.
-It copies your InnoDB data files, which results in data that is internally
-inconsistent; but then it performs crash recovery on the files to make them a
-consistent, usable database again.
-
-This works because InnoDB maintains a redo log, also called the transaction
-log. This contains a record of every change to InnoDB data. When InnoDB
-starts, it inspects the data files and the transaction log, and performs two
-steps. It applies committed transaction log entries to the data files, and it
-performs an undo operation on any transactions that modified data but did not
-commit.
-
-The `--register-redo-log-consumer` parameter is disabled by default. When enabled, this parameter lets Percona XtraBackup register as a redo log consumer at the start of the backup. The server does not remove a redo log that Percona XtraBackup (the consumer) has not yet copied. The consumer reads the redo log and manually advances the log sequence number (LSN). The server blocks the writes during the process. Based on the redo log consumption, the server determines when it can purge the log.
-
-Percona XtraBackup remembers the LSN when it starts, and then copies the data files. The operation takes time, and the files may change, then LSN reflects the state of the database at different points in time. Percona XtraBackup also runs a background process that watches the transaction log files, and copies any changes. Percona XtraBackup does this continually. The transaction logs are written in a round-robin fashion, and can be reused.
-
-Percona XtraBackup uses [Backup locks :octicons-link-external-16:](https://docs.percona.com/percona-server/innovation-release/backup-locks.html) where available as a lightweight alternative to `FLUSH TABLES WITH READ LOCK`. MySQL {{vers}} allows acquiring an instance level backup lock via the `LOCK INSTANCE FOR BACKUP` statement.
-
-Locking is only done for MyISAM and other non-InnoDB tables
-after Percona XtraBackup finishes backing up all InnoDB/XtraDB data and
-logs. Percona XtraBackup uses this automatically to copy non-InnoDB data to
-avoid blocking DML queries that modify InnoDB tables.
-
-!!! important
-
-    The `BACKUP_ADMIN` privilege is required to query the
-    `performance_schema_log_status` for either `LOCK
-    INSTANCE FOR BACKUP` or `LOCK TABLES FOR BACKUP`.
-
-xtrabackup tries to avoid backup locks and `FLUSH TABLES WITH READ LOCK`
-when the instance contains only InnoDB tables. In this case, xtrabackup
-obtains binary log coordinates from `performance_schema.log_status`. `FLUSH
-TABLES WITH READ LOCK` is still required in MySQL {{vers}} when xtrabackup is
-started with the `--slave-info`. The `log_status` table in Percona
-Server for MySQL {{vers}} is extended to include the relay log coordinates, so no locks are
-needed even with the `--slave-info` option.
+!!! warning "Prerequisites"
+    `BACKUP_ADMIN` is required to query `performance_schema.log_status` and to use `LOCK INSTANCE FOR BACKUP`, `LOCK TABLES FOR BACKUP`, or `LOCK BINLOG FOR BACKUP`. Additional privileges may be required depending on options: `RELOAD`, `LOCK TABLES`, and `REPLICATION CLIENT` are still needed for scenarios that use `FLUSH TABLES WITH READ LOCK` or `--slave-info`. Grant the minimum set for your use case; see [Connection and privileges needed](privileges.md) for the full list and examples.
-!!! admonition "See also"
+
+## High-level process: three phases
+
+Percona XtraBackup (PXB) follows a sequential lifecycle.
Understanding the three phases helps you reason about backup consistency and where blocking can occur.
+
+| Step | Phase | What happens |
+|------|--------|----------------|
+| 1 | Hot copy (Backup) | PXB copies data files while the server is running and tracks changes via the redo log. |
+| 2 | Make the data consistent (Prepare) | PXB applies the captured redo log to the copied data, then rolls back uncommitted transactions (roll forward + roll back). |
+| 3 | Deployment (Restore) | The prepared, consistent data is copied or moved back to the server data directory. |
+
+Under the hood, PXB relies on InnoDB’s crash-recovery model: PXB copies data files (which are momentarily inconsistent), then replays the redo log and applies undo to produce a consistent snapshot. The sections below expand on each phase.
+
+!!! note "Documentation scope"
+    This guide covers Percona XtraBackup 8.4. Server behavior (for example, `performance_schema.log_status`) can vary by MySQL or Percona Server version. Verify your build with `xtrabackup --version` and your server with the server version command (for example, `SELECT VERSION();`). Check release notes or vendor documentation when in doubt.
+
+!!! admonition "See also"
     [MySQL Documentation: LOCK INSTANCE FOR BACKUP :octicons-link-external-16:](https://dev.mysql.com/doc/refman/{{vers}}/en/lock-instance-for-backup.html)
-When backup locks are supported by the server, xtrabackup first copies
-InnoDB data, runs the `LOCK TABLES FOR BACKUP` and then copies the MyISAM
-tables. Once this is done, the backup of the files will
-begin. It will backup .frm, .MRG, .MYD, .MYI, .CSM,
-.CSV, `.sdi` and `.par` files.
+## Technical deep-dive: Backup phase
+
+This section explains how PXB avoids blocking your database during the hot copy.
+
+### Redo log thread: capturing changes in real time
+
+Percona XtraBackup records the log sequence number (LSN) when the backup starts, then copies InnoDB data files.
Because the copy takes time, the on-disk files change while copying. A background thread runs for the duration of the backup: the thread watches the redo (transaction) log files and continuously copies new log data. The redo log is written in a round-robin fashion and can be reused; the background thread ensures PXB has captured all changes up to a consistent point.
+
+The backup and locking lifecycle in text form: (1) Phase 1 — Non-blocking: record LSN, copy InnoDB data and redo while a background thread follows new redo. (2) Phase 2 — Lightweight lock: under LOCK INSTANCE FOR BACKUP or LOCK TABLES FOR BACKUP, copy non-InnoDB files. (3) Phase 3 — Final sync: under LOCK BINLOG FOR BACKUP, finish redo copy, fetch binlog/replica coordinates, then unlock and exit. DML continues during Phase 1; Phase 2 blocks only DDL; Phase 3 is a brief hold to capture a consistent binlog position.
+
+The following diagram shows the same flow. If the diagram does not render (for example, in a viewer that does not support Mermaid), use the text description above.
+
+``` mermaid
+---
+title: MySQL Backup Phases
+---
+flowchart LR
+    subgraph Phase1["Phase 1: Non-blocking"]
+        A[Record LSN] --> B[Copy InnoDB data + redo]
+        B --> C[Background thread follows redo]
+    end
+    subgraph Phase2["Phase 2: Lightweight lock"]
+        D[LOCK INSTANCE / TABLES FOR BACKUP] --> E[Copy non-InnoDB files]
+    end
+    subgraph Phase3["Phase 3: Final sync"]
+        F[LOCK BINLOG FOR BACKUP] --> G[Finish redo copy]
+        G --> H[Fetch binlog/replica coords]
+        H --> I[Unlock & exit]
+    end
+    Phase1 --> Phase2 --> Phase3
+
+```
+
+Redo log consumer (8.4): In MySQL 8.x, redo logs are highly volatile. The optional parameter `--register-redo-log-consumer` (disabled by default) lets PXB register as a redo log consumer at backup start. The server will not remove a redo log file until PXB (the consumer) has copied that file.
The consumer reads the log and advances the LSN; the server may block writes briefly during that process and uses consumption to decide when the server can purge the log.
+
+!!! important "High-write servers"
+    On busy servers, the server can reuse or purge redo log files before PXB has copied them. Enable `--register-redo-log-consumer` for high-write workloads to reduce the risk of backup failure.
+
+!!! warning "Redo log consumer: disk and server impact"
+    Enabling `--register-redo-log-consumer` prevents the server from purging redo until PXB has copied the redo. On high-write systems, enabling the option can retain more redo on disk and increase disk usage ("redo bloat"). Monitor disk space and server I/O when using this option; the trade-off is between backup reliability and server-side redo retention.
+
+    Emergency — if disk becomes critically full: The consumer is released when the backup process exits. Stop the backup (Ctrl+C or send SIGTERM to the xtrabackup process) so the server can purge redo log files again. Resolve disk usage before retrying; consider enabling the consumer only when sufficient disk headroom exists.
+
+### Locking hierarchy: minimal blocking
+
+Backup locks are a [lightweight alternative to `FLUSH TABLES WITH READ LOCK`](https://docs.percona.com/percona-server/innovation-release/backup-locks.html). MySQL {{vers}} supports an instance-level backup lock via `LOCK INSTANCE FOR BACKUP`.
+
+* Phase 1 — Non-blocking: InnoDB data files and redo log are copied while DML continues. No global lock is held during this phase.
+
+* Phase 2 — Lightweight lock: When backup locks are supported, PXB uses them so that non-InnoDB data can be copied without blocking DML on InnoDB. This lock blocks DDL (for example, `CREATE`, `ALTER`, `DROP`) but allows DML (INSERT, UPDATE, DELETE). With the default `--lock-ddl=ON`, the backup lock is taken at the start of the backup; with `--lock-ddl=REDUCED`, the lock is taken only after copying InnoDB data.
Under the lock, PXB copies non-InnoDB files: .frm, .MRG, .MYD, .MYI, .CSM, .CSV, `.sdi`, and `.par`.
+
+* Phase 3 — Final sync: PXB uses `LOCK BINLOG FOR BACKUP` to briefly block operations that would change binary log position or replica coordinates (`Exec_Source_Log_Pos`, `Exec_Gtid_Set`). PXB then finishes copying the redo log and fetches binary log coordinates from `performance_schema.log_status`, after which PXB releases the backup and binlog locks. The binary log position is printed to STDERR (redirect to a file if needed, for example `xtrabackup OPTIONS 2> backupout.log`), and PXB exits with 0 on success.
+
+Locking is only needed for MyISAM and other non-InnoDB tables after InnoDB data and logs are backed up, so DML on InnoDB is not blocked during the main copy.
+
+### When locks are avoided
-After that xtrabackup will use `LOCK BINLOG FOR BACKUP` to block all
-operations that might change either binary log position or
-`Exec_Source_Log_Pos` or `Exec_Gtid_Set` (i.e. source binary log coordinates
-corresponding to the current SQL thread state on a replication replica) as
-reported by `SHOW BINARY LOG STATUS` or `SHOW REPLICA STATUS`. xtrabackup will then finish copying
-the REDO log files and fetch the binary log coordinates. After this is completed
-xtrabackup will unlock the binary log and tables.
+When all tables in every schema are InnoDB, PXB can avoid backup locks and obtain binary log coordinates from `performance_schema.log_status` only. In practice, the `mysql` system schema often contains non-InnoDB tables (for example, MyISAM or CSV, such as `general_log`), so backup locks are usually still taken. Treat "lockless" as applying only when you have confirmed that no schema uses MyISAM or other non-InnoDB engines. On Percona Server for MySQL {{vers}}, `log_status` is extended to include relay log coordinates, so no extra locks are needed even with `--slave-info`.
On standard MySQL {{vers}}, `FLUSH TABLES WITH READ LOCK` is still required when using `--slave-info` if relay log position is needed.
-Finally, the binary log position will be printed to `STDERR` and xtrabackup
-will exit returning 0 if all went OK.
+See [Index of files created by Percona XtraBackup](xtrabackup-files.md) for the files created in the backup directory.
-Note that the `STDERR` of xtrabackup is not written in any file. You will
-have to redirect it to a file, for example, `xtrabackup OPTIONS 2> backupout.log`.
+### Cloud and streaming backups (xbcloud, S3, Azure)
-It will also create the following files in the
-directory of the backup.
+When backing up directly to cloud storage (for example, via `xbcloud` to S3 or Azure Blob), the backup lifecycle is the same, but the Final Sync phase is subject to network latency: the binlog lock may be held longer while redo and metadata are flushed to the remote endpoint. Plan for a longer binlog lock hold when using streaming or cloud backups in production. See [Take a streaming backup](take-streaming-backup.md) and [xbcloud binary overview](xbcloud-binary-overview.md) for cloud-specific behavior.
-## Prepare phase
+## Technical deep-dive: Prepare phase (recovery)
-This phase involves two primary operations: applying the redo log and the undo log.
+The `--prepare` step turns the raw backup into a consistent snapshot by applying redo and then undoing uncommitted work. The prepare step aligns InnoDB with the backup’s sync point (or, when `FLUSH TABLES WITH READ LOCK` is used, the time that lock was taken), so that InnoDB and MyISAM data are consistent with each other.
-### Redo Log Application (Physical Operation)
+### Part A: Redo application (physical)
-XtraBackup directly applies changes recorded in the redo log to specific page offsets within the tablespace (IBD file). This is a physical operation, meaning it works at the page level, without regard for rows or transactions.
+XtraBackup applies changes from the redo log directly to page offsets in the tablespace (IBD files). Redo application is a physical operation: the operation works at the page level, not at the row or transaction level. The redo log can contain uncommitted transactions (the server may flush them to disk), so redo application alone does not guarantee transactional consistency.
-It's important to understand that the redo log might contain uncommitted transactions, as the server can flush or write these to the log. Therefore, the redo log application doesn't inherently guarantee transactional consistency.
+### Part B: Undo application (logical, with SDI)
-### Undo Log Application (Logical Operation)
+After redo, XtraBackup uses the undo log to logically roll back any uncommitted transactions whose changes appear in the redo log. Undo records are typed as `INSERT` or `UPDATE` and carry a `table_id`. To perform rollback, XtraBackup initializes the InnoDB engine and data dictionary, then uses Serialized Dictionary Information (SDI) from the tablespace—a JSON representation of the table—to parse index pages and apply undo operations.
-Following the redo log application, XtraBackup uses the undo log to logically roll back changes from any uncommitted transactions present in the redo log.
-Undo log records are of two types: `INSERT` and `UPDATE`. Each record contains a `table_id`, which XtraBackup uses to locate the table definition.
+!!! note "Table metadata (SDI)"
+    In MySQL 8.0+, table definitions live in the tablespace as SDI (Serialized Dictionary Information), not in separate `.frm` files. During prepare, PXB uses SDI to map `table_id` to table structure for undo rollback.
-To perform the rollback, XtraBackup initializes the InnoDB engine and data dictionary, then uses Serialized Dictionary Information (SDI) from the tablespace (a JSON representation of the table) to parse index pages and apply undo operations.
+Tables are loaded as evictable; PXB maps `table_id` to tablespace via the data dictionary and loads user tables only when needed for rollback. This design reduces memory and I/O and speeds up prepare and Percona XtraDB Cluster SST.
-Tables are loaded as evictable, and XtraBackup scans data dictionary indexes to relate `table_id` to tablespace, which is used during rollback. User tables are loaded only when needed for rollback. This design significantly reduces memory and I/O usage, speeds up the `--prepare` phase, and improves Percona XtraDB Cluster SST performance.
+After `--prepare`, InnoDB tables are rolled forward to the backup completion point, not rolled back to the start. Both InnoDB and MyISAM tables are consistent with each other at that point.
-### Achieving Consistency: Redo, Undo, and MyISAM
+### Prepare is often the bottleneck
-The `--prepare` phase ensures that InnoDB tables are rolled forward to the point where the backup completed, not rolled back to where it began. This point aligns with the time a `FLUSH TABLES WITH READ LOCK` was taken, which is crucial for maintaining consistency with MyISAM tables.
+The prepare phase is frequently the longest part of recovery, especially for large or incremental backups. You can shorten the prepare phase by:
-Therefore, after the `--prepare` phase, both InnoDB and MyISAM tables are eventually consistent with each other.
+* `--use-memory` — Increases memory used during prepare (similar to a buffer pool). Default is 100MB; recommended 1GB–2GB when RAM allows. Only applies to the prepare phase. If you run prepare on the same host as production MySQL, do not allocate memory that the OS or MySQL need—doing so can trigger OOM (Out of Memory) kills; prefer running prepare on a separate host or leave sufficient headroom. See [`--use-memory`](xtrabackup-option-reference.md#use-memory).
+* `--parallel` — From Percona XtraBackup 8.4.0-3 onward, prepare can use multiple threads to apply `.delta` files (incremental backups). This does not parallelize the initial redo log application on a full backup; setting for example `--parallel=64` on a full backup will not make redo application multi-threaded. Use a numeric value; minimum recommended is 4 (for example, `--parallel=4`). See [`--parallel`](xtrabackup-option-reference.md#parallel).
-## Restore a backup
+Example: `xtrabackup --prepare --use-memory=2G --parallel=4 --target-dir=/data/backups/`
-To restore a backup with xtrabackup you can use the `--copy-back` or
-`--move-back` options.
+## Restore phase (deployment)
-xtrabackup will read from the `my.cnf` the variables datadir,
-innodb_data_home_dir, innodb_data_file_path,
-innodb_log_group_home_dir and check that the directories exist.
+To restore a backup, use `--copy-back` or `--move-back`. XtraBackup reads the target paths from your configuration (for example, `datadir`, `innodb_data_home_dir`, `innodb_data_file_path`, `innodb_log_group_home_dir` in `my.cnf`) and ensures the directories exist. XtraBackup copies (or moves) files in a defined order: MyISAM-related files (.MRG, .MYD, .MYI, .CSM, .CSV, `.sdi`, `.par`) first, then InnoDB tables and indexes, then log files. File attributes are preserved.
-It will copy the MyISAM tables, indexes, etc. (.MRG, .MYD,
-.MYI, .CSM, .CSV, `.sdi`,
-and `par` files) first, InnoDB tables and indexes next and the log files at
-last. It will preserve file’s attributes when copying them, you may have to
-change the files’ ownership to `mysql` before starting the database server, as
-they will be owned by the user who created the backup.
+!!! warning "Datadir ownership and permissions"
+    You must set correct ownership and permissions on the datadir before starting the server (for example, `chown -R mysql:mysql /var/lib/mysql`).
Failing to do so is one of the most common causes of "database won't start" after a restore. Restored files are owned by the user who ran the backup; the server typically expects them to be owned by the `mysql` system user.
-Alternatively, the `--move-back` option may be used to
-restore a backup. This option is similar to `--copy-back`
-with the only difference that instead of copying files it moves them to their
-target locations. As this option removes backup files, it must be used with
-caution. It is useful in cases when there is not enough free disk space to hold
-both data files and their backup copies.
+`--move-back` moves files instead of copying and removes them from the backup directory. Use `--move-back` when disk space is limited; the backup is consumed and cannot be reused.
+For full restore procedures, see [Restore full, incremental, and compressed backups](restore-a-backup.md).
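
The prepare and restore sequence described in the last hunks (prepare, then `--copy-back`, then fix ownership) can be sketched as a short shell script. This is a minimal sketch and not part of the patch: the `run` helper and the `DRY_RUN`, `BACKUP_DIR`, and `DATADIR` variables are hypothetical names introduced for illustration, and the paths are the example paths that appear in the document. By default the script only prints the commands; it assumes the server is stopped and the datadir is empty before `--copy-back` runs.

```shell
#!/bin/sh
# Hedged sketch of the prepare + restore flow; prints commands unless DRY_RUN=0.
set -eu

# Hypothetical variables; defaults reuse the example paths from the document.
BACKUP_DIR="${BACKUP_DIR:-/data/backups/}"
DATADIR="${DATADIR:-/var/lib/mysql}"
DRY_RUN="${DRY_RUN:-1}"

# run: hypothetical helper that echoes the command in dry-run mode,
# and executes it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# 1. Prepare: apply redo, roll back uncommitted transactions.
run xtrabackup --prepare --use-memory=2G --target-dir="$BACKUP_DIR"

# 2. Restore: copy the prepared files into the (empty) datadir.
#    Assumption: the MySQL server is stopped at this point.
run xtrabackup --copy-back --target-dir="$BACKUP_DIR"

# 3. Fix ownership so the server can start.
run chown -R mysql:mysql "$DATADIR"
```

Running the script as-is prints the three commands in order, which makes it easy to review them before setting `DRY_RUN=0` on a host where the restore should actually happen. Swapping `--copy-back` for `--move-back` follows the same shape, with the caveat from the document that the backup directory is consumed.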