mk-parallel-dump.1p
Language: en
Version: 2009-05-04 (debian - 07/07/09)
Section: 1 (User commands)
NAME
mk-parallel-dump - Dump sets of MySQL tables in parallel.
SYNOPSIS
mk-parallel-dump
mk-parallel-dump --tab --basedir /path/to/backups/
mk-parallel-dump --sets order,profile,session --settable meta.backupset
Do not rely on mk-parallel-dump for your backups unless you have tested it. You have been warned.
DESCRIPTION
mk-parallel-dump connects to a MySQL server, finds database and table names, and dumps them in parallel for speed. It can be used in several pre-packaged ways, or as a generic wrapper to call some program in parallel, passing it parameters for each table. It supports backup sets and dumping only tables that have changed since the last dump.
To dump all tables to gzipped files in the current directory, each database with its own directory, with a global read lock, flushing and recording binary log positions, each table in a single file:
mk-parallel-dump
To dump tables elsewhere:
mk-parallel-dump --basedir /path/to/elsewhere
To dump to tab-separated files with "SELECT INTO OUTFILE", each table with separate data and SQL files:
mk-parallel-dump --tab
To dump one or more backup sets (see ``BACKUP SETS''):
mk-parallel-dump --sets set1,set2,set3 --settable meta.backupset
To ``write your own command line,'' use "--" to indicate where the arguments for mk-parallel-dump stop and where the arguments for "mysqldump" (or any other program) begin. The following example shows "mysqldump", and aside from simpler options to "mysqldump", is basically what happens when you specify no arguments at all:
mk-parallel-dump -- mysqldump --skip-lock-tables '%D' '%N' \
   \| gzip --fast -c - \> '%D.%N.gz'
The "%" modifiers are macros (see ``MACROS''). The "--skip-lock-tables" argument is very important in that last example, because otherwise both mk-parallel-dump and "mysqldump" will lock tables, so "mysqldump" will hang, waiting for the locks. Notice the shell metacharacters "|" and ">" are escaped so the shell won't interpret them, and they'll get passed through to the generated command-line.
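As a rough illustration of how the macro substitution behaves, it can be sketched in shell. The function expand_macros is hypothetical, not part of the tool; only %D and %N are shown, and the naive sed approach assumes names with no sed metacharacters:

```shell
# Hypothetical sketch of %D/%N substitution applied to a command
# template; the real tool also handles %S, %C, and %W.
expand_macros() {
    template=$1 db=$2 tbl=$3
    # naive substitution; assumes db/table names contain no sed metacharacters
    printf '%s\n' "$template" | sed -e "s/%D/$db/g" -e "s/%N/$tbl/g"
}

expand_macros 'mysqldump --skip-lock-tables %D %N'  sakila actor
# prints: mysqldump --skip-lock-tables sakila actor
```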
There's no reason you can't use mk-parallel-dump to do other tasks in parallel, such as "OPTIMIZE TABLE":
mk-parallel-dump --noflushlock -- mysqlcheck --optimize '%D' '%N'
When you use built-in defaults, mk-parallel-dump will relay these arguments on to every forked copy of "mysqldump": ``--defaults-file'', ``--host'', ``--port'', ``--socket'', ``--user'', ``--password''. If you write your own command-line, you will need to specify them manually.
If you specify the ``--tab'' option, mk-parallel-dump creates separate files that hold views and triggers, so they can be restored correctly (this is not currently possible with the "mysqldump" from MySQL AB, which will restore triggers before restoring data). Otherwise it does not back up your entire database; it dumps tables and data only. It does not dump view definitions or stored routines. However, if you dump the "mysql" database, you'll be dumping the stored routines anyway.
Exit status is 0 if everything went well, 1 if any chunks failed, and any other value indicates an internal error.
mk-parallel-dump doesn't clean out any destination directories before dumping into them. You can move away the old destination, then remove it after a successful dump, with a shell script like the following:
#!/bin/sh
CNT=`ls | grep -c old`
if [ -d default ]; then
   mv default default.old.$CNT
fi
mk-parallel-dump
if [ $? != 0 ]; then
   echo "There were errors, not purging old sets."
else
   echo "No errors during dump, purging old sets."
   rm -rf default.old.*
fi
mk-parallel-dump checks whether files have been created before dumping. If the file has been created, it skips the table or chunk that would have created the file. This makes dumps resumable. If you don't want this behavior, and instead you want a full dump, then move away the old files.
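The resume behavior described above amounts to a simple existence check before each table or chunk is dumped. A minimal sketch, with a hypothetical dump_table helper and an illustrative filename (real names depend on the options used):

```shell
# Sketch of the resume check: skip a table (or chunk) whose
# destination file already exists; otherwise dump and record it.
dump_table() {
    outfile=$1
    if [ -e "$outfile" ]; then
        echo "skip $outfile"
        return 0
    fi
    # ... the real dump would run here ...
    touch "$outfile"
    echo "dumped $outfile"
}
```

A second run over the same destination therefore skips everything already dumped; move the old files away if you want a full dump instead.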
BACKUP SETS
Backup sets are groups of logically related tables you want to back up together. You specify a set by inserting the table names into a table on the MySQL server from which you're dumping, and then naming it in the ``--sets'' option. mk-parallel-dump always works a set at a time; if you don't specify a set, it auto-discovers tables, filters them with the various command-line options (``--databases'', etc.) and considers them the default set.
The table that stores backup sets should have at least these columns: setname, priority, db, tbl. The following is a suggested table structure:
CREATE TABLE backupset (
   setname  CHAR(10) NOT NULL,
   priority INT NOT NULL DEFAULT 0,
   db       CHAR(64) NOT NULL,
   tbl      CHAR(64) NOT NULL,
   ts       TIMESTAMP NOT NULL,
   PRIMARY KEY(setname, db, tbl),
   KEY(setname, priority, db, tbl)
);
Entries are ordered by priority, db, and tbl. Priority 0 tables are dumped first, not last. If it looks like tables are dumped in the wrong order, it's probably because they're being dumped asynchronously. The output is printed when the dump finishes, not when it starts.
If you specify ``--age'', mk-parallel-dump expects the "ts" column to exist, and will update the column to the current date and time when it successfully dumps a table.
Don't use "default" as a set name. It is used when you don't specify any sets and when you want all tables not explicitly assigned to a set to be dumped (see ``--[no]defaultset'').
Set names may contain only lowercase letters, numbers, and underscores.
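The set-name rule above (lowercase letters, digits, and underscores only) can be sketched as a shell check; is_valid_setname is a hypothetical helper, not part of the tool:

```shell
# Sketch of validating a backup-set name: reject the empty string and
# any name containing a character outside [a-z0-9_].
is_valid_setname() {
    case "$1" in
        (""|*[!a-z0-9_]*) return 1 ;;
        (*)               return 0 ;;
    esac
}
```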
CHUNKS
mk-parallel-dump can break your tables into chunks when dumping, and put approximately the amount of data you specify into each chunk. This is useful to avoid enormous files for restoration, which can not only take a long time but may be a lot of extra work for transactional storage engines like InnoDB. A huge file can create a huge rollback segment in your tablespace.
To dump in chunks, specify the ``--chunksize'' option. This option is an integer with an optional suffix. Without the suffix, it's the number of rows you want in each chunk. With the suffix, it's the approximate size of the data.
mk-parallel-dump tries to use index statistics to calculate where the boundaries between chunks should be. If the values are not evenly distributed, some chunks can have a lot of rows, and others may have very few or even none. Some chunks can exceed the size you want.
When you specify the size with a suffix, the allowed suffixes are k, M and G, for kibibytes, mebibytes, and gibibytes, respectively. mk-parallel-dump doesn't know anything about data size. It asks MySQL (via "SHOW TABLE STATUS") how long an average row is in the table, and converts your option to a number of rows.
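The suffix-to-rows conversion described above can be sketched as follows; chunk_rows is a hypothetical helper, and the average row length would come from the Avg_row_length column of "SHOW TABLE STATUS":

```shell
# Sketch of converting a --chunksize value into an approximate row
# count. Suffixes are powers of 1024; a plain integer is already rows.
chunk_rows() {
    size=$1 avg_row_len=$2
    case "$size" in
        *k) bytes=$(( ${size%k} * 1024 )) ;;
        *M) bytes=$(( ${size%M} * 1024 * 1024 )) ;;
        *G) bytes=$(( ${size%G} * 1024 * 1024 * 1024 )) ;;
        *)  echo "$size"; return 0 ;;  # plain integer: already a row count
    esac
    echo $(( bytes / avg_row_len ))
}

chunk_rows 16M 512    # prints 32768: a 16MiB chunk at 512 bytes/row
```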
Not all tables can be broken into chunks. mk-parallel-dump looks for an index whose leading column is numeric (integers, real numbers, and date and time types). It prefers the primary key if its first column is chunk-able. Otherwise it chooses the first chunk-able column in the table.
Generating a series of "WHERE" clauses to divide a table into evenly-sized chunks is difficult. If you have any ideas on how to improve the algorithm, please write to the author (see ``BUGS'').
MACROS
mk-parallel-dump can insert "%" variables into arguments. The available macros are as follows:
MACRO  MEANING
=====  =================
%S     The backup set
%D     The database name
%N     The table name
%C     The chunk number
%W     The WHERE clause
You can place a number between the "%" and the letter. The macro replacement then assumes it's a digit and pads it with leading zeroes (in practice, this is only useful for %C).
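The zero-padding behaves like a fixed-width printf conversion; for instance, %3C applied to chunk 7 yields 007. A minimal sketch (pad_chunk is a hypothetical helper):

```shell
# Sketch of the zero-padding a %3C-style macro applies to the chunk
# number: a printf conversion with the requested width.
pad_chunk() {
    width=$1 num=$2
    printf "%0${width}d" "$num"
}

pad_chunk 3 7    # prints 007
```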
OUTPUT
Output depends on verbosity. When ``--test'' is given, output includes commands that would be executed.
When ``--verbose'' is 0, there is normally no output unless there's an error.
When ``--verbose'' is 1, there is one line of output for each backup set, showing the set, how many tables and chunks were dumped with what status, how much time elapsed, and how much time the parallel dump jobs added up to. A final line shows sums for all sets, unless there is only one set.
When ``--verbose'' is 2, there is also one line of output for each table. Each line is printed when a forked ``child'' process ends and is removed from the list of children. The output shows the backup set, database, table, seconds spent dumping, the exit status of the forked dump process, and number of current processes (including the one just reaped; so this typically shows ``how many are running in parallel''). A status of 0 indicates success:
SET      DATABASE  TABLE          TIME  STATUS  THREADS
default  mysql     db                0       0        4
default  mysql     columns_priv      0       0        4
default  mysql     help_category     0       0        3
SPEED OF PARALLEL DUMPS
How much faster is it to dump in parallel? That depends on your hardware and data. You may be able to dump files twice as fast, or more if you have lots of disks and CPUs. Here are some user-contributed figures.
The following table is for a 3.6GHz Xeon machine with 4 processors and a RAID-10 array of 15k disks, directly attached to the server with fibre channel. Most of the space is in one huge table that wasn't dumped in parallel:
COMMAND                     SIZE   TIME
--------------------------  -----  ----
mk-parallel-dump            1.4GB   269
mysqldump                   1.4GB   345
On the same machine, in a database with lots of roughly equal-sized tables:
COMMAND                     SIZE   TIME
--------------------------  -----  ----
mk-parallel-dump            117MB     7
mysqldump                   117MB    37
It doesn't always work that well. A dual 2.80GHz Xeon server with a RAID-5 array of three 7200RPM SATA disk drives running MySQL 5.0.38 on GNU/Linux achieved the following dump times:
COMMAND                     SIZE   TIME
--------------------------  -----  ----
mk-parallel-dump            3.0GB  2596
mysqldump | gzip --fast     3.0GB  3195
While dumping with two threads in parallel, this machine averaged 74% CPU utilization and 12% I/O wait. It doesn't have enough disks and CPUs to do that many things at once, so it's not going to speed up much.
Dumping lots of tiny tables by forking off lots of "mysqldump" processes isn't usually much faster, because of the overhead of starting "mysqldump", connecting, inspecting the table, and dumping it. Note that tab-separated dumps are typically much faster and don't suffer as much from the effects of many tiny tables, because they're not done via "mysqldump".
See also <http://www.paragon-cs.com/wordpress/?p=52> for a test of parallel dumping and restoring.
DOWNLOADING
You can download Maatkit from Google Code at <http://code.google.com/p/maatkit/>, or you can get any of the tools easily with a command like the following:
wget http://www.maatkit.org/get/toolname
or
wget http://www.maatkit.org/trunk/toolname
Where "toolname" can be replaced with the name (or fragment of a name) of any of the Maatkit tools. Once downloaded, they're ready to run; no installation is needed. The first URL gets the latest released version of the tool, and the second gets the latest trunk code from Subversion.
OPTIONS
``--locktables'' and ``--[no]flushlock'' are mutually exclusive.
``--sets'' and ``--setperdb'' are mutually exclusive.
``--losslessfp'' requires ``--tab''.
- --age
- type: time
Dump only tables modified since this long ago, or not dumped since this long ago.
Specifies how 'old' a table must be before mk-parallel-dump will consider it.
When ``--sets'' is not specified, mk-parallel-dump uses "SHOW TABLE STATUS" instead of "SHOW TABLES" to get a list of tables in each database, and compares the time to the "Update_time" column in the output. If the "Update_time" column is not "NULL" and is older than the specified interval ago, it will not be dumped. Thus, it means ``dump tables that have changed since X amount of time'' (presumably the last regular backup). This means the table will always be dumped if it uses InnoDB or another storage engine that doesn't report the "Update_time".
When ``--sets'' is specified, the ``--settable'' table determines when a table was last dumped, and the meaning of "--age" reverses; it becomes ``dump tables not dumped in X amount of time.''
- --askpass
- Prompt for a password when connecting to MySQL.
- --basedir
- type: string
Base directory for creating files.
The directory in which files will be stored. If you use pre-canned options, such as ``--tab'', mk-parallel-dump knows what the eventual filenames will be, and can place all the files in this directory. It will also create any parent directories that don't exist, if needed (see also ``--umask'').
The default is the current working directory.
If you write your own command line, mk-parallel-dump cannot know which arguments in the command line are filenames, and thus doesn't know the eventual destination of the dump files. It does not try to create parent directories in this case.
- --[no]biggestfirst
- default: yes
Process tables largest first.
This strategy gives better parallelization. Suppose there are 8 threads and the last table is huge. We will finish everything else and then be running single-threaded while that one finishes. If that one runs first, then we will have the max number of threads running at a time for as long as possible.
Ignored when ``--settable'' is given.
- --[no]binlogpos
- short form: -b; default: yes
Dump the master/slave position.
Dump binary log positions from "SHOW MASTER STATUS" and "SHOW SLAVE STATUS", whichever can be retrieved from the server. The data is dumped to a file named 00_master_data.sql. This is done for each backup set.
The file also contains details of each table dumped, including the WHERE clauses used to dump it in chunks.
- --charset
- short form: -A; type: string
Default character set.
Enables character set settings in Perl and MySQL. If the value is "utf8", sets Perl's binmode on STDOUT to utf8, passes the "mysql_enable_utf8" option to DBD::mysql, and runs "SET NAMES UTF8" after connecting to MySQL. Any other value sets binmode on STDOUT without the utf8 layer, and runs "SET NAMES" after connecting to MySQL.
- --chunksize
- short form: -C; type: string
Number of rows or data size to dump per file.
Specifies that the table should be dumped in segments of approximately the size given. The syntax is either a plain integer, which is interpreted as a number of rows per chunk, or an integer with a suffix of G, M, or k, which is interpreted as the size of the data to be dumped in each chunk. See ``CHUNKS'' for more details.
- --csv
- Do ``--tab'' dump in CSV format (implies --tab).
Changes ``--tab'' options so the dump file is in comma-separated values (CSV) format. The SELECT INTO OUTFILE statement looks like the following, and can be re-loaded with the same options:
SELECT * INTO OUTFILE %D.%N.%6C.txt FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' FROM %D.%N;
- --databases
- short form: -d; type: hash
Dump only this comma-separated list of databases.
- --dbregex
- type: string
Dump only databases whose names match this Perl regex.
- --defaults-file
- short form: -F; type: string
Only read mysql options from the given file. You must give an absolute pathname.
- --[no]defaultset
- When ``--sets'' is given, dump tables not in any set.
The result will be a "default" set consisting of tables not explicitly included in any set.
- --[no]flushlock
- short form: -k
Use "FLUSH TABLES WITH READ LOCK".
This is enabled by default, unless you're dumping sets (see ``--sets''). This lock is taken once, at the beginning of the whole process, and is never released. If you want to lock only the tables you're dumping, use ``--locktables''.
- --flushlog
- Execute "FLUSH LOGS" when getting binlog positions.
This is done for each backup set. This option is NOT enabled by default because it causes the MySQL server to rotate its error log, potentially overwriting error messages.
- --[no]gzip
- Compress files with gzip.
This is enabled by default unless your platform is Win32. By default, this causes the standard SQL dumps to be piped to gzip's "STDIN" and the result is redirected to the destination file. If this option isn't enabled, by default "mysqldump"'s "--result-file" parameter is used to direct the dump to the destination file. When using ``--tab'', this option causes gzip to be called separately on each resulting file after it is dumped (because "SELECT INTO OUTFILE" cannot be directed to a pipe).
- --host
- short form: -h; type: string
Connect to host.
- --ignoredb
- short form: -g; type: hash
Ignore this comma-separated list of databases.
- --ignoreengine
- short form: -E; type: hash; default: FEDERATED,MRG_MyISAM
Dump no data for this comma-separated list of storage engines.
The schema file will be dumped as usual. This prevents dumping data for Federated tables and Merge tables.
- --ignoretbl
- short form: -n; type: hash
Ignore this comma-separated list of table names.
Table names may be qualified with the database name.
- --locktables
- Use "LOCK TABLES" (disables ``--[no]flushlock'').
Disables ``--[no]flushlock'' (unless it was explicitly set) and locks tables with "LOCK TABLES READ". Enabled by default when ``--sets'' is specified. The lock is taken and released with every set of tables dumped.
- --losslessfp
- short form: -L
Dump float types with extra precision for lossless restore (requires ``--tab'').
Wraps these types with a call to "FORMAT()" with 17 digits of precision. According to the comments in Google's patches, this will give lossless dumping and reloading in most cases. (I shamelessly stole this technique from them. I don't know enough about floating-point math to have an opinion).
This works only with ``--tab''.
- --numthread
- short form: -m; type: int
Number of threads to dump concurrently.
Specifies the number of parallel processes to run. The default is 2 (this is mk-parallel-dump, after all --- 1 is not parallel). On GNU/Linux machines, the default is the number of times 'processor' appears in /proc/cpuinfo. On Windows, the default is read from the environment. In any case, the default is at least 2, even when there's only a single processor.
- --password
- short form: -p; type: string
Password to use when connecting.
- --port
- short form: -P; type: int
Port number to use for connection.
- --progress
- Display progress messages.
Progress is displayed each time a table or chunk of a table finishes dumping. Progress is calculated by measuring the data size of each table, and assuming all bytes are created equal. The output is the completed and total bytes, the percent completed, estimated time remaining, and estimated completion time.
- --quiet
- short form: -q
Quiet output; disables ``--verbose''.
- --setperdb
- Dump each database as a separate set.
Each set is named the same as the database. Implies ``--locktables''.
- --sets
- type: array
Dump this comma-separated list of backup sets.
Requires ``--settable''. See ``BACKUP SETS''. The special "default" set is reserved; don't use it as a set name.
- --settable
- type: string
The database.table in which backup sets are kept.
As mentioned earlier, you can manually specify the ordering of tables in a backup set. Therefore ``--[no]biggestfirst'' has no effect when this option is given.
- --setvars
- type: string; default: wait_timeout=10000
Set these MySQL variables.
Specify any variables you want to be set immediately after connecting to MySQL. These will be included in a "SET" command.
- --socket
- short form: -S; type: string
Socket file to use for connection.
- --stopslave
- Issue "STOP SLAVE" on server before dumping data.
This ensures that the data is not changing during the dump. Issues "START SLAVE" after the dump is complete.
If the slave is not running, throws an error and exits. This is to prevent possibly bad things from happening if the slave is not running because of a problem, or because someone intentionally stopped the slave for maintenance or some other purpose.
- --tab
- short form: -T
Dump tab-separated (sets ``--umask'' 0).
Dump via "SELECT INTO OUTFILE", which is similar to what "mysqldump" does with the "--tab" option, but you're not constrained to a single database at a time.
Before you use this option, make sure you know what "SELECT INTO OUTFILE" does! I recommend using it only if you're running mk-parallel-dump on the same machine as the MySQL server, but there is no protection if you don't.
The files will be gzipped after dumping if ``--[no]gzip'' is enabled. This option sets ``--umask'' to zero so auto-created directories are writable by the MySQL server.
Triggers are dumped into ".trg" files, and views are postponed until the end of the dump, then dumped all together into the "00_views.sql" file. This allows restoring data before triggers, which is important for restoring data accurately. Views must be postponed until the end and dumped together because interdependencies between views and tables may otherwise prevent correct restoration.
- --tables
- short form: -t; type: hash
Dump only this comma-separated list of table names.
Table names may be qualified with the database name.
- --tblregex
- type: string
Dump only tables whose names match this Perl regex.
- --test
- Print commands instead of executing them.
- --umask
- type: string
Set the program's "umask" to this octal value.
This is useful when you want created files and directories to be readable or writable by other users (for example, the MySQL server itself).
- --user
- short form: -u; type: string
User for login if not current user.
- --verbose
- short form: -v; cumulative: yes; default: 1
Be verbose; can specify multiple times.
See ``OUTPUT''.
- --wait
- short form: -w; type: time; default: 5m
Wait limit when the server is down.
If the MySQL server crashes during dumping, waits until the server comes back and then continues with the rest of the tables. "mk-parallel-dump" will check the server every second until this time is exhausted, at which point it will give up and exit.
This implements Peter Zaitsev's ``safe dump'' request: sometimes a dump on a server that has corrupt data will kill the server. mk-parallel-dump will wait for the server to restart, then keep going. It's hard to say which table killed the server, so tables that were being dumped concurrently when the crash happened will not be retried. No additional locks will be taken after the server restarts; it's assumed this behavior is useful only on a server you're not trying to dump while it's in production.
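The polling described under ``--wait'' can be sketched roughly as follows. Here server_is_up is a stand-in for the real once-per-second connection check (e.g. something like mysqladmin ping), and the limit is given in plain seconds rather than the option's time syntax:

```shell
# Sketch of the --wait loop: poll every second until the server answers
# or the time limit is exhausted. server_is_up is a hypothetical
# stand-in for a real MySQL connection attempt.
wait_for_server() {
    limit=$1
    elapsed=0
    until server_is_up; do
        [ "$elapsed" -ge "$limit" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```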
ENVIRONMENT
The environment variable "MKDEBUG" enables verbose debugging output in all of the Maatkit tools:
MKDEBUG=1 mk-....
SYSTEM REQUIREMENTS
You need Perl, DBI, DBD::mysql, and some core packages that ought to be installed in any reasonably new version of Perl.
This program works best on GNU/Linux. Filename quoting might not work well on Microsoft Windows if you have spaces or funny characters in your database or table names.
BUGS
Please use Google Code Issues and Groups to report bugs or request support: <http://code.google.com/p/maatkit/>.
Please include the complete command-line used to reproduce the problem you are seeing, the version of all MySQL servers involved, the complete output of the tool when run with ``--version'', and if possible, debugging output produced by running with the "MKDEBUG=1" environment variable.
COPYRIGHT, LICENSE AND WARRANTY
This program is copyright 2007-2009 Baron Schwartz. Feedback and improvements are welcome.
THIS PROGRAM IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2; OR the Perl Artistic License. On UNIX and similar systems, you can issue `man perlgpl' or `man perlartistic' to read these licenses.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
AUTHOR
Baron Schwartz.
SEE ALSO
See also mk-parallel-restore.
VERSION
This manual page documents Ver 1.0.15 Distrib 3519 $Revision: 3502 $.