mk-parallel-restore

Langue: en

Version: 2008-12-29 (fedora - 04/07/09)

Section: 1 (Commandes utilisateur)

NAME

mk-parallel-restore - Load files into MySQL in parallel.

SYNOPSIS

   mk-parallel-restore /path/to/files
   mk-parallel-restore --tab /path/to/files
 
 

Do not rely on mk-parallel-restore for your backups unless you have tested it. You have been warned.

DESCRIPTION

mk-parallel-restore is a way to load SQL or delimited-file dumps into MySQL in parallel at high speed. It is especially designed for restoring files dumped by mk-parallel-dump. It automatically detects whether a file contains SQL or delimited data from the filename extension, and either shells out to "mysql" or executes "LOAD DATA INFILE" with the file. On UNIX-like systems, it will even make a FIFO to decompress gzipped files for "LOAD DATA INFILE".

By default it discovers all files in the directory you specify on the command line. It uses the file's parent directory as the database name and the file's name (up to the first dot) as the table name. It can deal with files named like the following:

   dir/tbl.sql
   dir/tbl.txt
   dir/tbl.csv
   dir/tbl.trg
 
 

It is also happy with files that look like this, where "EXT" is one of the extensions just listed.

   dir/tbl.EXT.000
   dir/tbl.EXT.000.gz
 
 

By default, it loads "SQL" files first, if they exist, then loads "CSV" or "TXT" files next, in order of the numbers in the filename extension as just shown. This makes it easy for you to reload a table's definition followed by its data, in case you dumped them into separate files (as happens with "mysqldump"'s "--tab" option). It loads "TRG" files, which create triggers, last. Creating the triggers before loading data might keep the data from being restored correctly. Files that are named 00_views.sql are loaded even later, after all the parallel restores are finished; dependencies among views and tables make them hard to restore one at a time. See mk-parallel-dump for details on how data is dumped.

Exit status is 0 if everything went well, 1 if any files failed, and any other value indicates an internal error.

OUTPUT

Output depends on verbosity. When ``--test'' is given, output includes commands that would be executed.

When ``--verbose'' is 0, there is normally no output unless there's an error.

When ``--verbose'' is 1, there is one line of output for the entire job, showing how many tables were processed, how many files were loaded with what status, how much time elapsed, and how much time the parallel load jobs added up to. If any files were skipped, the filenames are printed to the output.

When ``--verbose'' is 2, there's one line of output per table, showing extra data such as how many threads were running when each table finished loading:

   DATABASE TABLE            FILES  TIME STATUS THREADS
   sakila   language             2  0.07      0       2
   sakila   film_actor           2  0.07      0       2
   sakila   actor                2  0.06      0       2
   sakila   payment              2  0.07      0       2
   sakila   transport_backup     2  0.05      0       2
   sakila   country              2  0.08      0       2
   sakila   film                 2  0.05      0       2
   sakila   rental               2  0.07      0       2
 
 

SPEED OF PARALLEL LOADING

User-contributed benchmarks are welcome. See <http://www.paragon-cs.com/wordpress/?p=52> for one user's experiences.

DOWNLOADING

You can download Maatkit from Google Code at <http://code.google.com/p/maatkit/>, or you can get any of the tools easily with a command like the following:
    wget http://www.maatkit.org/get/toolname
    or
    wget http://www.maatkit.org/trunk/toolname
 
 

Where "toolname" can be replaced with the name (or fragment of a name) of any of the Maatkit tools. Once downloaded, they're ready to run; no installation is needed. The first URL gets the latest released version of the tool, and the second gets the latest trunk code from Subversion.

OPTIONS

--askpass
Prompt for a password when connecting to MySQL.
--atomicresume
negatable: yes; default: yes

Treat chunks as atomic when resuming restore.

By default "mk-parallel-restore" resumes restoration from the first chunk that is missing all its rows. For dumps of transactionally-safe tables (InnoDB), it cannot happen that a chunk is only partially restored. Therefore, restoring from the first missing chunk is safe.

However, for dumps of non-transactionally safe tables, it is possible that a chunk can be only partially restored. In such cases, the chunk will wrongly appear to be fully restored. Therefore, you must specify "--noatomicresume" so that the partially restored chunk is fully restored.

--basedir
type: string

Directory where FIFO files will be created.

--biggestfirst
negatable: yes; default: yes

Restore the biggest tables first for highest concurrency.

--bulkinsbufsize
type: int

Set bulk_insert_buffer_size before each "LOAD DATA INFILE".

Has no effect without ``--tab''.

--charset
short form: -A; type: string; default: BINARY

Sets the connection, database, and "LOAD DATA INFILE" character set.

The default is "BINARY", which is the safest value to use for "LOAD DATA INFILE". Has no effect without ``--tab''.

--commit
Commit after each load via "LOAD DATA INFILE".
--createdb
Create databases if they don't exist.
--csv
Files are in CSV format (implies ``--tab'').

Changes ``--tab'' options so the following "LOAD DATA INFILE" statement is used:

    LOAD DATA INFILE <filename> INTO TABLE <table>
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
    LINES TERMINATED BY '\n';
 
 
--database
short form: -D; type: string

Load all files into this database.

Overrides the database which is normally specified by the directory in which the files live. Does not specify a default database for the connection.

--databases
short form: -d; type: hash

Restore only this comma-separated list of databases.

--dbregex
type: string

Restore only databases whose names match this regex.

--defaults-file
short form: -F; type: string

Only read mysql options from the given file. You must give an absolute pathname.

--disablekeys
negatable: yes

Execute "ALTER TABLE DISABLE KEYS" before each table.

--fifo
negatable: yes; default: yes

Stream files into a FIFO for --tab.

Load compressed tab-separated files by piping them into a FIFO and using the FIFO with "LOAD DATA INFILE", instead of by decompressing the files on disk. Sets ``--umask'' to 0.

--host
short form: -h; type: string

Connect to host.

--ignore
short form: -i

Adds the "IGNORE" modifier to "LOAD DATA INFILE".

--ignoredb
short form: -g; type: Hash

Ignore this comma-separated list of databases.

--ignoretbl
short form: -n; type: Hash

Ignore this comma-separated list of table names.

Table names may be qualified with the database name.

--local
short form: -L

Uses the "LOCAL" option to "LOAD DATA INFILE".

If you enable this option, the files are read locally by the client library, not by the server.

--locktables
negatable: yes

Lock tables before "LOAD DATA INFILE".

--noautovalon0
negatable: yes

Set SQL "NO_AUTO_VALUE_ON_ZERO" before "LOAD DATA INFILE".

--nobinlog
negatable: yes

Set "SQL_LOG_BIN=0" before "LOAD DATA INFILE".

This prevents large loads from being logged to the server's binary log.

--noforeignkeys
negatable: yes

Set "FOREIGN_KEY_CHECKS=0" before "LOAD DATA INFILE".

--noresume
Do not resume restore.

By default, "mk-parallel-restore" checks each tables' chunks for existing rows and restores only from the point where a previous restore stopped. This option disables restore resumption and fully restores every table.

Restore resumption does not work with tab-separated files or dumps that were not chunked.

--nouniquechecks
negatable: yes

Set "UNIQUE_CHECKS=0" before "LOAD DATA INFILE".

--numthread
short form: -m; type: int

Specifies the number of parallel processes to run.

The default is 2 (this is mk-parallel-restore after all --- 1 is not parallel). On GNU/Linux machines, the default is the number of times 'processor' appears in /proc/cpuinfo. On Windows, the default is read from the environment. In any case, the default is at least 2, even when there's only a single processor.

--password
short form: -p; type: string

Password to use when connecting.

--port
short form: -P; type: int

Port number to use for connection.

--progress
Display progress messages.

Progress is displayed each time a table finishes loading. Progress is calculated by measuring the size of each file to be loaded, and assuming all bytes are created equal. The output is the completed and total size, the percent completed, estimated time remaining, and estimated completion time.

--quiet
short form: -q

Sets ``--verbose'' to 0.

--replace
short form: -r

Adds the "REPLACE" modifier to "LOAD DATA INFILE".

--setvars
type: string; default: wait_timeout=10000

Set these MySQL variables.

Specify any variables you want to be set immediately after connecting to MySQL. These will be included in a "SET" command.

--socket
short form: -S; type: string

Socket file to use for connection.

--tab
short form: -T

Load tab-separated files with "LOAD DATA INFILE".

This is similar to what "mysqlimport" does, but more flexible. Enables the following options, unless they are specifically disabled: ``--commit'', ``--disablekeys'', ``--noautovalon0'', ``--nobinlog'', ``--nouniquechecks'', ``--noforeignkeys''.

--tables
short form: -t; type: hash

Restore only this comma-separated list of table names.

Table names may be qualified with the database name.

--tblregex
type: string

Restore only tables whose names match this regex.

--test
Print commands instead of executing them.
--truncate
Run "TRUNCATE TABLE" before "LOAD DATA INFILE".

This will delete all rows from a table before loading the first tab-delimited file into it.

--umask
type: string

Set the program's "umask" to this octal value.

This is useful when you want created files (such as FIFO files) to be readable or writable by other users (for example, the MySQL server itself).

--user
short form: -u; type: string

User for login if not current user.

--verbose
short form: -v; cumulative: yes; default: 1

Verbosity; can specify multiple times.

Repeatedly specifying it increments the verbosity. Default is 1 if not specified. See ``OUTPUT''.

--wait
short form: -w; type: time; default: 5m

Wait limit when server is down.

If the MySQL server crashes during loading, waits until the server comes back and then continues with the rest of the files. "mk-parallel-restore" will check the server every second until this time is exhausted, at which point it will give up and exit.

ENVIRONMENT

The environment variable "MKDEBUG" enables verbose debugging output in all of the Maatkit tools:
    MKDEBUG=1 mk-....
 
 

SYSTEM REQUIREMENTS

You need Perl, DBI, DBD::mysql, and some core packages that ought to be installed in any reasonably new version of Perl.

BUGS

Please use Google Code Issues and Groups to report bugs or request support: <http://code.google.com/p/maatkit/>.

Please include the complete command-line used to reproduce the problem you are seeing, the version of all MySQL servers involved, the complete output of the tool when run with ``--version'', and if possible, debugging output produced by running with the "MKDEBUG=1" environment variable.

COPYRIGHT, LICENSE AND WARRANTY

This program is copyright 2007-2008 Baron Schwartz. Feedback and improvements are welcome.

THIS PROGRAM IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2; OR the Perl Artistic License. On UNIX and similar systems, you can issue `man perlgpl' or `man perlartistic' to read these licenses.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.

AUTHOR

Baron Schwartz.

SEE ALSO

See also mk-parallel-dump.

VERSION

This manual page documents Ver 1.0.10 Distrib 2725 $Revision: 2311 $.