mk-parallel-restore.1p - Man Page
(DEPRECATED) Load files into MySQL in parallel.
Synopsis
This tool is deprecated because after several complete redesigns, we concluded that Perl is the wrong technology for this task. Read "Risks" before you use it, please. It remains useful for some people who we know aren't depending on it in production, and therefore we are not removing it from the distribution.
Usage: mk-parallel-restore [OPTION...] [DSN] PATH [PATH...]
mk-parallel-restore description loads files into MySQL in parallel to make some type of data loads faster. IT IS NOT A BACKUP TOOL!
mk-parallel-restore /path/to/files mk-parallel-restore --tab /path/to/files
Risks
The following section is included to inform users about the potential risks, whether known or unknown, of using this tool. The two main categories of risks are those created by the nature of the tool (e.g. read-only tools vs. read-write tools) and those created by bugs.
mk-parallel-restore is not a backup program! It is only for fast data imports, for purposes such as loading test data into a system quickly. Do not use mk-parallel-restore for backups. mk-parallel-restore inserts data unless you use the "--dry-run" option.
At the time of this release, there is a bug that prevents huge statements from being printed when an error is encountered, a bug applying "--[no]foreign-key-checks" when truncating tables, and a bug with LOAD DATA LOCAL.
The authoritative source for updated information is always the online issue tracking system. Issues that affect this tool will be marked as such. You can see a list of such issues at the following URL: <http://www.maatkit.org/bugs/mk-parallel-restore>.
See also "Bugs" for more information on filing bugs and getting help.
Description
mk-parallel-restore is a way to load SQL or delimited-file dumps into MySQL in parallel at high speed. It is especially designed for restoring files dumped by mk-parallel-dump. It automatically detects whether a file contains SQL or delimited data from the filename extension, and either shells out to mysql
or executes LOAD DATA INFILE
with the file. On UNIX-like systems, it will even make a FIFO to decompress gzipped files for LOAD DATA INFILE
.
By default it discovers all files in the directory you specify on the command line. It uses the file's parent directory as the database name and the file's name (up to the first dot) as the table name. It can deal with files named like the following:
dir/tbl.sql dir/tbl.txt dir/tbl.csv
It is also happy with files that look like this, where EXT
is one of the extensions just listed.
dir/tbl.EXT.000 dir/tbl.EXT.000.gz
By default, it loads SQL
files first, if they exist, then loads CSV
or TXT
files next, in order of the numbers in the filename extension as just shown. This makes it easy for you to reload a table's definition followed by its data, in case you dumped them into separate files (as happens with mysqldump
's --tab
option). See mk-parallel-dump for details on how data is dumped.
Exit status is 0 if everything went well, 1 if any files failed, and any other value indicates an internal error.
Output
Output depends on verbosity. When "--dry-run" is given, output includes commands that would be executed.
When "--verbose" is 0, there is normally no output unless there's an error.
When "--verbose" is 1, there is one line of output for the entire job, showing how many tables were processed, how many files were loaded with what status, how much time elapsed, and how much time the parallel load jobs added up to. If any files were skipped, the filenames are printed to the output.
When "--verbose" is 2, there's one line of output per table, showing extra data such as how many threads were running when each table finished loading:
DATABASE TABLE FILES TIME STATUS THREADS sakila language 2 0.07 0 2 sakila film_actor 2 0.07 0 2 sakila actor 2 0.06 0 2 sakila payment 2 0.07 0 2 sakila transport_backup 2 0.05 0 2 sakila country 2 0.08 0 2 sakila film 2 0.05 0 2 sakila rental 2 0.07 0 2
Speed of Parallel Loading
User-contributed benchmarks are welcome. See <http://www.paragon-cs.com/wordpress/?p=52> for one user's experiences.
Options
This tool accepts additional command-line arguments. Refer to the "Synopsis" and usage information for details.
- --ask-pass
Prompt for a password when connecting to MySQL.
- --[no]atomic-resume
default: yes
Treat chunks as atomic when resuming restore.
By default
mk-parallel-restore
resumes restoration from the first chunk that is missing all its rows. For dumps of transactionally-safe tables (InnoDB), it cannot happen that a chunk is only partially restored. Therefore, restoring from the first missing chunk is safe.However, for dumps of non-transactionally safe tables, it is possible that a chunk can be only partially restored. In such cases, the chunk will wrongly appear to be fully restored. Therefore, you must specify
--no-atomic-resume
so that the partially restored chunk is fully restored.- --base-dir
type: string
Directory where FIFO files will be created.
- --[no]biggest-first
default: yes
Restore the biggest tables first for highest concurrency.
- --[no]bin-log
default: yes
Enable binary logging (
SET SQL_LOG_BIN=1
).Restore operations are replicated by default (SQL_LOG_BIN=1) except for "--tab" restores which are not replicated by default (SQL_LOG_BIN=0). This prevents large loads from being logged to the server's binary log.
The value given on the command line overrides the defaults. Therefore, specifying
--bin-log
with "--tab" will allow the "--tab" restore to replicate.- --bulk-insert-buffer-size
type: int
Set bulk_insert_buffer_size before each
LOAD DATA INFILE
.Has no effect without "--tab".
- --charset
short form: -A; type: string; default: BINARY
Sets the connection, database, and
LOAD DATA INFILE
character set.The default is
BINARY
, which is the safest value to use forLOAD DATA INFILE
. Has no effect without "--tab".- --[no]commit
default: yes
Commit after each file.
- --config
type: Array
Read this comma-separated list of config files; if specified, this must be the first option on the command line.
- --create-databases
Create databases if they don't exist.
- --[no]create-tables
default: yes
Create tables.
See also "--[no]drop-tables".
- --csv
Files are in CSV format (implies "--tab").
Changes "--tab" options so the following
LOAD DATA INFILE
statement is used:LOAD DATA INFILE <filename> INTO TABLE <table> FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\n';
- --database
short form: -D; type: string
Load all files into this database.
Overrides the database which is normally specified by the directory in which the files live. Does not specify a default database for the connection.
- --databases
short form: -d; type: hash
Restore only this comma-separated list of databases.
- --databases-regex
type: string
Restore only databases whose names match this regex.
- --decompress
type: string; default: gzip -d -c
Command used to decompress and print .gz files to STDOUT (like zcat).
- --defaults-file
short form: -F; type: string
Only read mysql options from the given file. You must give an absolute pathname.
- --[no]disable-keys
default: yes
Execute
ALTER TABLE DISABLE KEYS
before each MyISAM table.This option only works with MyISAM tables.
- --[no]drop-tables
default: yes
Execute
DROP TABLE IF EXISTS
before creating each table.- --dry-run
Print commands instead of executing them.
- --fast-index
Do InnoDB plugin fast index creation by restoring secondary indexes after data.
This option only works with InnoDB tables and the InnoDB plugin.
- --[no]fifo
default: yes
Stream files into a FIFO for "--tab".
Load compressed tab-separated files by piping them into a FIFO and using the FIFO with
LOAD DATA INFILE
, instead of by decompressing the files on disk. Sets "--umask" to 0.- --[no]foreign-key-checks
default: yes
Set
FOREIGN_KEY_CHECKS=1
beforeLOAD DATA INFILE
.- --help
Show help and exit.
- --host
short form: -h; type: string
Connect to host.
- --ignore
Adds the
IGNORE
modifier toLOAD DATA INFILE
.- --ignore-databases
type: Hash
Ignore this comma-separated list of databases.
- --ignore-tables
type: Hash
Ignore this comma-separated list of table names.
Table names may be qualified with the database name.
- --local
Uses the
LOCAL
option toLOAD DATA INFILE
.If you enable this option, the files are read locally by the client library, not by the server.
- --[no]lock-tables
Lock tables before
LOAD DATA INFILE
.- --[no]no-auto-value-on-0
default: yes
Set SQL
NO_AUTO_VALUE_ON_ZERO
.- --only-empty-databases
Restore only to empty databases.
By default mk-parallel-restore will restore tables into a database so long as it exists (or is created by "--create-databases"). This option is a safety feature that prevents any tables from being restored into a database that already has tables even if those tables are the same ones being restored. If you specify this option, every database must have zero tables.
This implicitly disables "--[no]resume". "--create-databases" will work if the database doesn't already exist and it creates it.
The databases are checked after all filters ("--databases", etc.)
- --password
short form: -p; type: string
Password to use when connecting.
- --pid
type: string
Create the given PID file. The file contains the process ID of the script. The PID file is removed when the script exits. Before starting, the script checks if the PID file already exists. If it does not, then the script creates and writes its own PID to it. If it does, then the script checks the following: if the file contains a PID and a process is running with that PID, then the script dies; or, if there is no process running with that PID, then the script overwrites the file with its own PID and starts; else, if the file contains no PID, then the script dies.
- --port
short form: -P; type: int
Port number to use for connection.
- --progress
Display progress messages.
Progress is displayed each time a table finishes loading. Progress is calculated by measuring the size of each file to be loaded, and assuming all bytes are created equal. The output is the completed and total size, the percent completed, estimated time remaining, and estimated completion time.
- --quiet
short form: -q
Sets "--verbose" to 0.
- --replace
Adds the
REPLACE
modifier toLOAD DATA INFILE
.- --[no]resume
default: yes
Resume the restore from a previously incomplete restore.
By default,
mk-parallel-restore
checks each table's chunks for existing rows and restores only from the point where a previous restore stopped. Specify --no-resume to disable restore resumption and fully restores every table.Restore resumption does not work with tab-separated files or dumps that were not chunked.
- --set-vars
type: string; default: wait_timeout=10000
Set these MySQL variables. Immediately after connecting to MySQL, this string will be appended to SET and executed.
- --socket
short form: -S; type: string
Socket file to use for connection.
- --tab
Load tab-separated files with
LOAD DATA INFILE
.This is similar to what
mysqlimport
does, but more flexible.The following options are enabled unless they are specifically disabled on the command line:
L<"--commit"> L<"--[no]disable-keys"> L<"--[no]no-auto-value-on-0">
And the following options are disabled (
--no-bin-log
, etc.) unless they are specifically enabled on the command line:L<"--[no]bin-log"> L<"--[no]unique-checks"> L<"--[no]foreign-key-checks">
- --tables
short form: -t; type: hash
Restore only this comma-separated list of table names.
Table names may be qualified with the database name.
- --tables-regex
type: string
Restore only tables whose names match this regex.
- --threads
type: int; default: 2
Specifies the number of parallel processes to run.
The default is 2 (this is mk-parallel-restore after all -- 1 is not parallel). On GNU/Linux machines, the default is the number of times 'processor' appears in /proc/cpuinfo. On Windows, the default is read from the environment. In any case, the default is at least 2, even when there's only a single processor.
- --truncate
Run
TRUNCATE TABLE
beforeLOAD DATA INFILE
.This will delete all rows from a table before loading the first tab-delimited file into it.
- --umask
type: string
Set the program's
umask
to this octal value.This is useful when you want created files (such as FIFO files) to be readable or writable by other users (for example, the MySQL server itself).
- --[no]unique-checks
default: yes
Set
UNIQUE_CHECKS=1
beforeLOAD DATA INFILE
.- --user
short form: -u; type: string
User for login if not current user.
- --verbose
short form: -v; cumulative: yes; default: 1
Verbosity; can specify multiple times.
Repeatedly specifying it increments the verbosity. Default is 1 if not specified. See "Output".
- --version
Show version and exit.
- --wait
short form: -w; type: time; default: 5m
Wait limit when server is down.
If the MySQL server crashes during loading, waits until the server comes back and then continues with the rest of the files.
mk-parallel-restore
will check the server every second until this time is exhausted, at which point it will give up and exit.
DSN Options
These DSN options are used to create a DSN. Each option is given like option=value
. The options are case-sensitive, so P and p are not the same option. There cannot be whitespace before or after the =
and if the value contains whitespace it must be quoted. DSN options are comma-separated. See the maatkit manpage for full details.
A
dsn: charset; copy: yes
Default character set.
D
dsn: database; copy: yes
Default database.
F
dsn: mysql_read_default_file; copy: yes
Only read default options from the given file
h
dsn: host; copy: yes
Connect to host.
p
dsn: password; copy: yes
Password to use when connecting.
P
dsn: port; copy: yes
Port number to use for connection.
S
dsn: mysql_socket; copy: yes
Socket file to use for connection.
u
dsn: user; copy: yes
User for login if not current user.
Downloading
You can download Maatkit from Google Code at <http://code.google.com/p/maatkit/>, or you can get any of the tools easily with a command like the following:
wget http://www.maatkit.org/get/toolname or wget http://www.maatkit.org/trunk/toolname
Where toolname
can be replaced with the name (or fragment of a name) of any of the Maatkit tools. Once downloaded, they're ready to run; no installation is needed. The first URL gets the latest released version of the tool, and the second gets the latest trunk code from Subversion.
Environment
The environment variable MKDEBUG
enables verbose debugging output in all of the Maatkit tools:
MKDEBUG=1 mk-....
System Requirements
You need Perl, DBI, DBD::mysql, and some core packages that ought to be installed in any reasonably new version of Perl.
Bugs
For a list of known bugs see <http://www.maatkit.org/bugs/mk-parallel-restore>.
Please use Google Code Issues and Groups to report bugs or request support: <http://code.google.com/p/maatkit/>. You can also join #maatkit on Freenode to discuss Maatkit.
Please include the complete command-line used to reproduce the problem you are seeing, the version of all MySQL servers involved, the complete output of the tool when run with "--version", and if possible, debugging output produced by running with the MKDEBUG=1
environment variable.
Copyright, License and Warranty
This program is copyright 2007-2011 Baron Schwartz. Feedback and improvements are welcome.
THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2; OR the Perl Artistic License. On UNIX and similar systems, you can issue `man perlgpl' or `man perlartistic' to read these licenses.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
See Also
See also mk-parallel-dump.
Author
Baron Schwartz
About Maatkit
This tool is part of Maatkit, a toolkit for power users of MySQL. Maatkit was created by Baron Schwartz; Baron and Daniel Nichter are the primary code contributors. Both are employed by Percona. Financial support for Maatkit development is primarily provided by Percona and its clients.
Version
This manual page documents Ver 1.0.24 Distrib 7540 $Revision:
7477 $.