mk-table-checksum.1p - Man Page

Perform an online replication consistency check, or checksum MySQL tables efficiently on one or many servers.

Synopsis

Usage: mk-table-checksum [OPTION...] DSN [DSN...]

mk-table-checksum checksums MySQL tables efficiently on one or more hosts. Each host is specified as a DSN and missing values are inherited from the first host. If you specify multiple hosts, the first is assumed to be the master.

STOP! Are you checksumming a slave(s) against its master? Then be sure to learn what "--replicate" does. It is probably the option you want to use.

   mk-table-checksum --replicate=mydb.checksum master-host
   ... time passses, replication catches up ...
   mk-table-checksum --replicate=mydb.checksum --replicate-check 2 \
      master-host

Or,

   mk-table-checksum h=host1,u=user,p=password h=host2 ...

Or,

   mk-table-checksum host1 host2 ... hostN | mk-checksum-filter

See "Specifying Hosts" for more on the syntax of the host arguments.

Risks

The following section is included to inform users about the potential risks, whether known or unknown, of using this tool. The two main categories of risks are those created by the nature of the tool (e.g. read-only tools vs. read-write tools) and those created by bugs.

mk-table-checksum executes queries that cause the MySQL server to checksum its data. This can cause significant server load. It is read-only unless you use the "--replicate" option, in which case it inserts a small amount of data into the specified table.

At the time of this release, we know of no bugs that could cause serious harm to users. There are miscellaneous bugs that might be annoying.

The authoritative source for updated information is always the online issue tracking system. Issues that affect this tool will be marked as such. You can see a list of such issues at the following URL: <http://www.maatkit.org/bugs/mk-table-checksum>.

See also "Bugs" for more information on filing bugs and getting help.

Description

mk-table-checksum generates table checksums for MySQL tables, typically useful for verifying your slaves are in sync with the master. The checksums are generated by a query on the server, and there is very little network traffic as a result.

Checksums typically take about twice as long as COUNT(*) on very large InnoDB tables in my tests. For smaller tables, COUNT(*) is a good bit faster than the checksums. See "--algorithm" for more details on performance.

If you specify more than one server, mk-table-checksum assumes the first server is the master and others are slaves. Checksums are parallelized for speed, forking off a child process for each table. Duplicate server names are ignored, but if you want to checksum a server against itself you can use two different forms of the hostname (for example, "localhost 127.0.0.1", or "h=localhost,P=3306 h=localhost,P=3307").

If you want to compare the tables in one database to those in another database on the same server, just checksum both databases:

   mk-table-checksum --databases db1,db2

You can then use mk-checksum-filter to compare the results in both databases easily.

mk-table-checksum examines table structure only on the first host specified, so if anything differs on the others, it won't notice. It ignores views.

The checksums work on MySQL version 3.23.58 through 6.0-alpha. They will not necessarily produce the same values on all versions. Differences in formatting and/or space-padding between 4.1 and 5.0, for example, will cause the checksums to be different.

Specifying Hosts

mk-table-checksum connects to a theoretically unlimited number of MySQL servers. You specify a list of one or more host definitions on the command line, such as "host1 host2". Each host definition can be just a hostname, or it can be a complex string that specifies connection options as well. You can specify connection options two ways:

Format a host definition in a key=value,key=value form. If an argument on the command line contains the letter '=', mk-table-checksum will parse it into its component parts. Examine the "--help" output for details on the allowed keys.
Specifying a list of simple host definitions "host1 host2" is equivalent to the more complicated "h=host1 h=host2" format.
With the command-line options such as "--user" and "--password". These options, if given, apply globally to all host definitions.

In addition to specifying connection options this way, mk-table-checksum allows shortcuts. Any options specified for the first host definition on the command line fill in missing values in subsequent ones. Any options that are still missing after this are filled in from the command-line options if possible.

In other words, the places you specify connection options have precedence: highest precedence is the option specified directly in the host definition, next is the option specified in the first host definition, and lowest is the command-line option.

You can mix simple and complex host definitions and/or command-line arguments. For example, if all your servers except one of your slaves uses a non-standard port number:

   mk-table-checksum --port 4500 master h=slave1,P=3306 slave2 slave3

If you are confused about how mk-table-checksum will connect to your servers, give the "--explain-hosts" option and it will tell you.

How Fast is It?

Speed and efficiency are important, because the typical use case is checksumming large amounts of data.

mk-table-checksum is designed to do very little work itself, and generates very little network traffic aside from inspecting table structures with SHOW CREATE TABLE. The results of checksum queries are typically 40-character or shorter strings.

The MySQL server does the bulk of the work, in the form of the checksum queries. The following benchmarks show the checksum query times for various checksum algorithms. The first two results are simply running COUNT(col8) and CHECKSUM TABLE on the table. CHECKSUM TABLE is just CRC32 under the hood, but it's implemented inside the storage engine layer instead of at the MySQL layer.

 ALGORITHM       HASH FUNCTION  EXTRA           TIME
 ==============  =============  ==============  =====
 COUNT(col8)                                    2.3
 CHECKSUM TABLE                                 5.3
 BIT_XOR         FNV_64                         12.7
 ACCUM           FNV_64                         42.4
 BIT_XOR         MD5            --optimize-xor  80.0
 ACCUM           MD5                            87.4
 BIT_XOR         SHA1           --optimize-xor  90.1
 ACCUM           SHA1                           101.3
 BIT_XOR         MD5                            172.0
 BIT_XOR         SHA1                           197.3

The tests are entirely CPU-bound. The sample data is an InnoDB table with the following structure:

 CREATE TABLE test (
   col1 int NOT NULL,
   col2 date NOT NULL,
   col3 int NOT NULL,
   col4 int NOT NULL,
   col5 int,
   col6 decimal(3,1),
   col7 smallint unsigned NOT NULL,
   col8 timestamp NOT NULL,
   PRIMARY KEY  (col2, col1),
   KEY (col7),
   KEY (col1)
 ) ENGINE=InnoDB

The table has 4303585 rows, 365969408 bytes of data and 173457408 bytes of indexes. The server is a Dell PowerEdge 1800 with dual 32-bit Xeon 2.8GHz processors and 2GB of RAM. The tests are fully CPU-bound, and the server is otherwise idle. The results are generally consistent to within a tenth of a second on repeated runs.

CRC32 is the default checksum function to use, and should be enough for most cases. If you need stronger guarantees that your data is identical, you should use one of the other functions.

Algorithm Selection

The "--algorithm" option allows you to specify which algorithm you would like to use, but it does not guarantee that mk-table-checksum will use this algorithm. mk-table-checksum will ultimately select the best algorithm possible given various factors such as the MySQL version and other command line options.

The three basic algorithms in descending order of preference are CHECKSUM, BIT_XOR and ACCUM. CHECKSUM cannot be used if any one of these criteria is true:

  * L<"--where"> is used.
  * L<"--since"> is used.
  * L<"--chunk-size"> is used.
  * L<"--replicate"> is used.
  * L<"--count"> is used.
  * MySQL version less than 4.1.1.

The BIT_XOR algorithm also requires MySQL version 4.1.1 or later.

After checking these criteria, if the requested "--algorithm" remains then it is used, otherwise the first remaining algorithm with the highest preference is used.

Consistent Checksums

If you are using this tool to verify your slaves still have the same data as the master, which is why I wrote it, you should read this section.

The best way to do this with replication is to use the "--replicate" option. When the queries are finished running on the master and its slaves, you can go to the slaves and issue SQL queries to see if any tables are different from the master. Try the following:

  SELECT db, tbl, chunk, this_cnt-master_cnt AS cnt_diff,
     this_crc <> master_crc OR ISNULL(master_crc) <> ISNULL(this_crc)
        AS crc_diff
  FROM checksum
  WHERE master_cnt <> this_cnt OR master_crc <> this_crc
     OR ISNULL(master_crc) <> ISNULL(this_crc);

The "--replicate-check" option can do this query for you. If you can't use this method, try the following:

If your servers are not being written to, you can just run the tool with no further ado:
```
  mk-table-checksum server1 server2 ... serverN
```
If the servers are being written to, you need some way to make sure they are consistent at the moment you run the checksums. For situations other than master-slave replication, you will have to figure this out yourself. You may be able to use the "--where" option with a date or time column to only checksum data that's not recent.
If you are checksumming a master and slaves, you can do a fast parallel checksum and assume the slaves are caught up to the master. In practice, this tends to work well except for tables which are constantly updated. You can use the "--slave-lag" option to see how far behind each slave was when it checksummed a given table. This can help you decide whether to investigate further.
The next most disruptive technique is to lock the table on the master, then take checksums. This should prevent changes from propagating to the slaves. You can just lock on the master (with "--lock"), or you can both lock on the master and wait on the slaves till they reach that point in the master's binlog ("--wait"). Which is better depends on your workload; only you know that.

If you decide to make the checksums on the slaves wait until they're guaranteed to be caught up to the master, the algorithm looks like this:

 For each table,
   Master: lock table
   Master: get pos
   In parallel,
     Master: checksum
     Slave(s): wait for pos, then checksum
   End
   Master: unlock table
 End

What I typically do when I'm not using the "--replicate" option is simply run the tool on all servers with no further options. This runs fast, parallel, non-blocking checksums simultaneously. If there are tables that look different, I re-run with "--wait"=600 on the tables in question. This makes the tool lock on the master as explained above.

Output

Output is to STDOUT, one line per server and table, with header lines for each database. I tried to make the output easy to process with awk. For this reason columns are always present. If there's no value, mk-table-checksum prints 'NULL'.

The default is column-aligned output for human readability, but you can change it to tab-separated if you want. Use the "--tab" option for this.

Output is unsorted, though all lines for one table should be output together. For speed, all checksums are done in parallel (as much as possible) and may complete out of the order in which they were started. You might want to run them through another script or command-line utility to make sure they are in the order you want. If you pipe the output through mk-checksum-filter, you can sort the output and/or avoid seeing output about tables that have no differences.

The columns in the output are as follows. The database, table, and chunk come first so you can sort by them easily (they are the "primary key").

Output from "--replicate-check" and "--checksum" are different.

DATABASE: The database the table is in.
TABLE: The table name.
CHUNK: The chunk (see "--chunk-size"). Zero if you are not doing chunked checksums.
HOST: The server's hostname.
ENGINE: The table's storage engine.
COUNT: The table's row count, unless you specified to skip it. If OVERSIZE is printed, the chunk was skipped because the actual number of rows was greater than "--chunk-size" times "--chunk-size-limit".
CHECKSUM: The table's checksum, unless you specified to skip it or the table has no rows. some types of checksums will be 0 if there are no rows; others will print NULL.
TIME: How long it took to checksum the CHUNK, not including WAIT time. Total checksum time is WAIT + TIME.
WAIT: How long the slave waited to catch up to its master before beginning to checksum. WAIT is always 0 for the master. See "--wait".
STAT: The return value of MASTER_POS_WAIT(). STAT is always NULL for the master.
LAG: How far the slave lags the master, as reported by SHOW SLAVE STATUS. LAG is always NULL for the master.

Replicate Table Maintenance

If you use "--replicate" to store and replicate checksums, you may need to perform maintenance on the replicate table from time to time to remove old checksums. This section describes when checksums in the replicate table are deleted automatically by mk-table-checksum and when you must manually delete them.

Before starting, mk-table-checksum calculates chunks for each table, even if "--chunk-size" is not specified (in that case there is one chunk: "1=1"). Then, before checksumming each table, the tool deletes checksum chunks in the replicate table greater than the current number of chunks. For example, if a table is chunked into 100 chunks, 0-99, then mk-table-checksum does:

  DELETE FROM replicate table WHERE db=? AND tbl=? AND chunk > 99

That removes any high-end chunks from previous runs which no longer exist. Currently, this operation cannot be disabled.

If you use "--resume", "--resume-replicate", or "--modulo", then you need to be careful that the number of rows in a table does not decrease so much that the number of chunks decreases too, else some checksum chunks may be deleted. The one exception is if only rows at the high end of the range are deleted. In that case, the high-end chunks are deleted and lower chunks remain unchanged. An increasing number of rows or chunks should not cause any adverse affects.

Changing the "--chunk-size" between runs with "--resume", "--resume-replicate", or "--modulo" can cause odd or invalid checksums. You should not do this. It won't work with the resume options. With "--modulo", the safest thing to do is manually delete all the rows in the replicate table for the table in question and start over.

If the replicate table becomes cluttered with old or invalid checksums and the auto-delete operation is not deleting them, then you will need to manually clean up the replicate table. Alternatively, if you specify "--empty-replicate-table", then the tool deletes every row in the replicate table.

Exit Status

An exit status of 0 (sometimes also called a return value or return code) indicates success. If there is an error checksumming any table, the exit status is 1.

When running "--replicate-check", if any slave has chunks that differ from the master, the exit status is 1.

Queries

If you are using innotop (see <http://code.google.com/p/innotop>), mytop, or another tool to watch currently running MySQL queries, you may see the checksum queries. They look similar to this:

  REPLACE /*test.test_tbl:'2'/'5'*/ INTO test.checksum(db, ...

Since mk-table-checksum's queries run for a long time and tend to be textually very long, and thus won't fit on one screen of these monitoring tools, I've been careful to place a comment at the beginning of the query so you can see what it is and what it's doing. The comment contains the name of the table that's being checksummed, the chunk it is currently checksumming, and how many chunks will be checksummed. In the case above, it is checksumming chunk 2 of 5 in table test.test_tbl.

Options

"--schema" is restricted to option groups Connection, Filter, Output, Help, Config, Safety.

"--empty-replicate-table", "--resume" and "--resume-replicate" are mutually exclusive.

This tool accepts additional command-line arguments. Refer to the "Synopsis" and usage information for details.

--algorithm

type: string

Checksum algorithm (ACCUM|CHECKSUM|BIT_XOR).

Specifies which checksum algorithm to use. Valid arguments are CHECKSUM, BIT_XOR and ACCUM. The latter two do cryptographic hash checksums. See also "Algorithm Selection".

CHECKSUM is built into MySQL, but has some disadvantages. BIT_XOR and ACCUM are implemented by SQL queries. They use a cryptographic hash of all columns concatenated together with a separator, followed by a bitmap of each nullable column that is NULL (necessary because CONCAT_WS() skips NULL columns).

CHECKSUM is the default. This method uses MySQL's built-in CHECKSUM TABLE command, which is a CRC32 behind the scenes. It cannot be used before MySQL 4.1.1, and various options disable it as well. It does not simultaneously count rows; that requires an extra COUNT(*) query. This is a good option when you are using MyISAM tables with live checksums enabled; in this case both the COUNT(*) and CHECKSUM queries will run very quickly.

The BIT_XOR algorithm is available for MySQL 4.1.1 and newer. It uses BIT_XOR(), which is order-independent, to reduce all the rows to a single checksum.

ACCUM uses a user variable as an accumulator. It reduces each row to a single checksum, which is concatenated with the accumulator and re-checksummed. This technique is order-dependent. If the table has a primary key, it will be used to order the results for consistency; otherwise it's up to chance.

The pathological worst case is where identical rows will cancel each other out in the BIT_XOR. In this case you will not be able to distinguish a table full of one value from a table full of another value. The ACCUM algorithm will distinguish them.

However, the ACCUM algorithm is order-dependent, so if you have two tables with identical data but the rows are out of order, you'll get different checksums with ACCUM.

If a given algorithm won't work for some reason, mk-table-checksum falls back to another. The least common denominator is ACCUM, which works on MySQL 3.23.2 and newer.

--arg-table

type: string

The database.table with arguments for each table to checksum.

This table may be named anything you wish. It must contain at least the following columns:

  CREATE TABLE checksum_args (
     db         char(64)     NOT NULL,
     tbl        char(64)     NOT NULL,
     -- other columns as desired
     PRIMARY KEY (db, tbl)
  );

In addition to the columns shown, it may contain any of the other columns listed here (Note: this list is used by the code, MAGIC_overridable_args):

  algorithm chunk-column chunk-index chunk-size columns count crc function lock
  modulo use-index offset optimize-xor chunk-size-limit probability separator
  save-since single-chunk since since-column sleep sleep-coef trim wait where

Each of these columns corresponds to the long form of a command-line option. Each column should be NULL-able. Column names with hyphens should be enclosed in backticks (e.g. `chunk-size`) when the table is created. The data type does not matter, but it's suggested you use a sensible data type to prevent garbage data.

When mk-table-checksum checksums a table, it will look for a matching entry in this table. Any column that has a defined value will override the corresponding command-line argument for the table being currently processed. In this way it is possible to specify custom command-line arguments for any table.

If you add columns to the table that aren't in the above list of allowable columns, it's an error. The exceptions are db, tbl, and ts. The ts column can be used as a timestamp for easy visibility into the last time the since column was updated with "--save-since".

This table is assumed to be located on the first server given on the command-line.

--ask-pass

group: Connection

Prompt for a password when connecting to MySQL.

--check-interval

type: time; group: Throttle; default: 1s

How often to check for slave lag if "--check-slave-lag" is given.

--[no]check-replication-filters

default: yes; group: Safety

Do not "--replicate" if any replication filters are set. When --replicate is specified, mk-table-checksum tries to detect slaves and look for options that filter replication, such as binlog_ignore_db and replicate_do_db. If it finds any such filters, it aborts with an error. Replication filtering makes it impossible to be sure that the checksum queries won't break replication or simply fail to replicate. If you are sure that it's OK to run the checksum queries, you can negate this option to disable the checks. See also "--replicate-database".

--check-slave-lag

type: DSN; group: Throttle

Pause checksumming until the specified slave's lag is less than "--max-lag".

If this option is specified and "--throttle-method" is set to slavelag then "--throttle-method" only checks this slave.

--checksum

group: Output

Print checksums and table names in the style of md5sum (disables "--[no]count").

Makes the output behave more like the output of md5sum. The checksum is first on the line, followed by the host, database, table, and chunk number, concatenated with dots.

--chunk-column

type: string

Prefer this column for dividing tables into chunks. By default, mk-table-checksum chooses the first suitable column for each table, preferring to use the primary key. This option lets you specify a preferred column, which mk-table-checksum uses if it exists in the table and is chunkable. If not, then mk-table-checksum will revert to its default behavior. Be careful when using this option; a poor choice could cause bad performance. This is probably best to use when you are checksumming only a single table, not an entire server. See also "--chunk-index".

--chunk-index

type: string

Prefer this index for chunking tables. By default, mk-table-checksum chooses an appropriate index for the "--chunk-column" (even if it chooses the chunk column automatically). This option lets you specify the index you prefer. If the index doesn't exist, then mk-table-checksum will fall back to its default behavior. mk-table-checksum adds the index to the checksum SQL statements in a FORCE INDEX clause. Be careful when using this option; a poor choice of index could cause bad performance. This is probably best to use when you are checksumming only a single table, not an entire server.

--chunk-range

type: string; default: open

Set which ends of the chunk range are open or closed. Possible values are one of MAGIC_chunk_range:

   VALUE       OPENS/CLOSES
   ==========  ======================
   open        Both ends are open
   openclosed  Low end open, high end closed

By default mk-table-checksum uses an open range of chunks like:

  `id` <  '10'
  `id` >= '10' AND < '20'
  `id` >= '20'

That range is open because the last chunk selects any row with id greater than (or equal to) 20. An open range can be a problem in cases where a lot of new rows are inserted with IDs greater than 20 while mk-table-checksumming is running because the final open-ended chunk will select all the newly inserted rows. (The less common case of inserting rows with IDs less than 10 would require a closedopen range but that is not currently implemented.) Specifying openclosed will cause the final chunk to be closed like:

  `id` >= '20' AND `id` <= N

N is the MAX(`id`) that mk-table-checksum used when it first chunked the rows. Therefore, it will only chunk the range of rows that existed when the tool started and not any newly inserted rows (unless those rows happen to be inserted with IDs less than N).

See also "--modulo".

--progress

type: array; default: time,30

Print progress reports to STDERR. Currently, this feature is only for when "--throttle-method" waits for slaves to catch up.

The value is a comma-separated list with two parts. The first part can be percentage, time, or iterations; the second part specifies how often an update should be printed, in percentage, seconds, or number of iterations.

--quiet

short form: -q; group: Output

Do not print checksum results.

--recheck

Re-checksum chunks that "--replicate-check" found to be different.

--recurse

type: int; group: Throttle

Number of levels to recurse in the hierarchy when discovering slaves. Default is infinite.

See "--recursion-method".

--recursion-method

type: string

Preferred recursion method for discovering slaves.

Possible methods are:

  METHOD       USES
  ===========  ================
  processlist  SHOW PROCESSLIST
  hosts        SHOW SLAVE HOSTS

The processlist method is preferred because SHOW SLAVE HOSTS is not reliable. However, the hosts method is required if the server uses a non-standard port (not 3306). Usually mk-table-checksum does the right thing and finds the slaves, but you may give a preferred method and it will be used first. If it doesn't find any slaves, the other methods will be tried.

--replicate

type: string

Replicate checksums to slaves (disallows --algorithm CHECKSUM).

This option enables a completely different checksum strategy for a consistent, lock-free checksum across a master and its slaves. Instead of running the checksum queries on each server, you run them only on the master. You specify a table, fully qualified in db.table format, to insert the results into. The checksum queries will insert directly into the table, so they will be replicated through the binlog to the slaves.

When the queries are finished replicating, you can run a simple query on each slave to see which tables have differences from the master. With the "--replicate-check" option, mk-table-checksum can run the query for you to make it even easier. See "Consistent Checksums" for details.

If you find tables that have differences, you can use the chunk boundaries in a WHERE clause with mk-table-sync to help repair them more efficiently. See mk-table-sync for details.

The table must have at least these columns: db, tbl, chunk, boundaries, this_crc, master_crc, this_cnt, master_cnt. The table may be named anything you wish. Here is a suggested table structure, which is automatically used for "--create-replicate-table" (MAGIC_create_replicate):

  CREATE TABLE checksum (
     db         char(64)     NOT NULL,
     tbl        char(64)     NOT NULL,
     chunk      int          NOT NULL,
     boundaries char(100)    NOT NULL,
     this_crc   char(40)     NOT NULL,
     this_cnt   int          NOT NULL,
     master_crc char(40)         NULL,
     master_cnt int              NULL,
     ts         timestamp    NOT NULL,
     PRIMARY KEY (db, tbl, chunk)
  );

Be sure to choose an appropriate storage engine for the checksum table. If you are checksumming InnoDB tables, for instance, a deadlock will break replication if the checksum table is non-transactional, because the transaction will still be written to the binlog. It will then replay without a deadlock on the slave and break replication with "different error on master and slave." This is not a problem with mk-table-checksum, it's a problem with MySQL replication, and you can read more about it in the MySQL manual.

This works only with statement-based replication (mk-table-checksum will switch the binlog format to STATEMENT for the duration of the session if your server uses row-based replication).

In contrast to running the tool against multiple servers at once, using this option eliminates the complexities of synchronizing checksum queries across multiple servers, which normally requires locking and unlocking, waiting for master binlog positions, and so on. Thus, it disables "--lock", "--wait", and "--slave-lag" (but not "--check-slave-lag", which is a way to throttle the execution speed).

The checksum queries actually do a REPLACE into this table, so existing rows need not be removed before running. However, you may wish to do this anyway to remove rows related to tables that don't exist anymore. The "--empty-replicate-table" option does this for you.

Since the table must be qualified with a database (e.g. db.checksums), mk-table-checksum will only USE this database. This may be important if any replication options are set because it could affect whether or not changes to the table are replicated.

If the slaves have any --replicate-do-X or --replicate-ignore-X options, you should be careful not to checksum any databases or tables that exist on the master and not the slaves. Changes to such tables may not normally be executed on the slaves because of the --replicate options, but the checksum queries modify the contents of the table that stores the checksums, not the tables whose data you are checksumming. Therefore, these queries will be executed on the slave, and if the table or database you're checksumming does not exist, the queries will cause replication to fail. For more information on replication rules, see <http://dev.mysql.com/doc/en/replication-rules.html>.

The table specified by "--replicate" will never be checksummed itself.

--replicate-check

type: int

Check results in "--replicate" table, to the specified depth. You must use this after you run the tool normally; it skips the checksum step and only checks results.

It recursively finds differences recorded in the table given by "--replicate". It recurses to the depth you specify: 0 is no recursion (check only the server you specify), 1 is check the server and its slaves, 2 is check the slaves of its slaves, and so on.

It finds differences by running the query shown in "Consistent Checksums", and prints results, then exits after printing. This is just a convenient way of running the query so you don't have to do it manually.

The output is one informational line per slave host, followed by the results of the query, if any. If "--quiet" is specified, there is no output. If there are no differences between the master and any slave, there is no output. If any slave has chunks that differ from the master, mk-table-checksum's exit status is 1; otherwise it is 0.

This option makes mk-table-checksum look for slaves by running SHOW PROCESSLIST. If it finds connections that appear to be from slaves, it derives connection information for each slave with the same default-and-override method described in "Specifying Hosts".

If SHOW PROCESSLIST doesn't return any rows, mk-table-checksum looks at SHOW SLAVE HOSTS instead. The host and port, and user and password if available, from SHOW SLAVE HOSTS are combined into a DSN and used as the argument. This requires slaves to be configured with report-host, report-port and so on.

This requires the @@SERVER_ID system variable, so it works only on MySQL 3.23.26 or newer.

--replicate-database

type: string

USE only this database with "--replicate". By default, mk-table-checksum executes USE to set its default database to the database that contains the table it's currently working on. It changes its default database as it works on different tables. This is is a best effort to avoid problems with replication filters such as binlog_ignore_db and replicate_ignore_db. However, replication filters can create a situation where there simply is no one right way to do things. Some statements might not be replicated, and others might cause replication to fail on the slaves. In such cases, it is up to the user to specify a safe default database. This option specifies a default database that mk-table-checksum selects with USE, and never changes afterwards. See also <L"--[no]check-replication-filters">.

--resume

type: string

Resume checksum using given output file from a previously interrupted run.

The given output file should be the literal output from a previous run of mk-table-checksum. For example:

   mk-table-checksum host1 host2 -C 100 > checksum_results.txt
   mk-table-checksum host1 host2 -C 100 --resume checksum_results.txt

The command line options given to the first run and the resumed run must be identical (except, of course, for --resume). If they are not, the result will be unpredictable and probably wrong.

"--resume" does not work with "--replicate"; for that, use "--resume-replicate".

--resume-replicate

Resume "--replicate".

This option resumes a previous checksum operation using "--replicate". It is like "--resume" but does not require an output file. Instead, it uses the checksum table given to "--replicate" to determine where to resume the checksum operation.

--save-since

When "--arg-table" and "--since" are given, save the current "--since" value into that table's since column after checksumming. In this way you can incrementally checksum tables by starting where the last one finished.

The value to be saved could be the current timestamp, or it could be the maximum existing value of the column given by "--since-column". It depends on what options are in effect. See the description of "--since" to see how timestamps are different from ordinary values.

--schema

Checksum SHOW CREATE TABLE instead of table data.

--separator

type: string; default: #

The separator character used for CONCAT_WS().

This character is used to join the values of columns when checksumming with "--algorithm" of BIT_XOR or ACCUM.

--set-vars

type: string; default: wait_timeout=10000; group: Connection

Set these MySQL variables. Immediately after connecting to MySQL, this string will be appended to SET and executed.

--since

type: string

Checksum only data newer than this value.

If the table is chunk-able or nibble-able, this value will apply to the first column of the chunked or nibbled index.

This is not too different to "--where", but instead of universally applying a WHERE clause to every table, it selectively finds the right column to use and applies it only if such a column is found. See also "--since-column".

The argument may be an expression, which is evaluated by MySQL. For example, you can specify CURRENT_DATE - INTERVAL 7 DAY to get the date of one week ago.

A special bit of extra magic: if the value is temporal (looks like a date or datetime), then the table is checksummed only if the create time (or last modified time, for tables that report the last modified time, such as MyISAM tables) is newer than the value. In this sense it's not applied as a WHERE clause at all.

--since-column

type: string

The column name to be used for "--since".

The default is for the tool to choose the best one automatically. If you specify a value, that will be used if possible; otherwise the best auto-determined one; otherwise none. If the column doesn't exist in the table, it is just ignored.

--single-chunk

Permit skipping with "--probability" if there is only one chunk.

Normally, if a table isn't split into many chunks, it will always be checksummed regardless of "--probability". This setting lets the probabilistic behavior apply to tables that aren't divided into chunks.

--slave-lag

group: Output

Report replication delay on the slaves.

If this option is enabled, the output will show how many seconds behind the master each slave is. This can be useful when you want a fast, parallel, non-blocking checksum, and you know your slaves might be delayed relative to the master. You can inspect the results and make an educated guess whether any discrepancies on the slave are due to replication delay instead of corrupt data.

If you're using "--replicate", a slave that is delayed relative to the master does not invalidate the correctness of the results, so this option is disabled.

--sleep

type: int; group: Throttle

Sleep time between checksums.

If this option is specified, mk-table-checksum will sleep the specified number of seconds between checksums. That is, it will sleep between every table, and if you specify "--chunk-size", it will also sleep between chunks.

This is a very crude way to throttle checksumming; see "--sleep-coef" and "--check-slave-lag" for techniques that permit greater control.

--sleep-coef

type: float; group: Throttle

Calculate "--sleep" as a multiple of the last checksum time.

If this option is specified, mk-table-checksum will sleep the amount of time elapsed during the previous checksum, multiplied by the specified coefficient. This option is ignored if "--sleep" is specified.

This is a slightly more sophisticated way to throttle checksum speed: sleep a varying amount of time between chunks, depending on how long the chunks are taking. Even better is to use "--check-slave-lag" if you're checksumming master/slave replication.

--socket

short form: -S; type: string; group: Connection

Socket file to use for connection.

--tab

group: Output

Print tab-separated output, not column-aligned output.

--tables

short form: -t; type: hash; group: Filter

Do only this comma-separated list of tables.

Table names may be qualified with the database name.

--tables-regex

type: string

Only checksum tables whose names match this Perl regex.

--throttle-method

type: string; default: none; group: Throttle

Throttle checksumming when doing "--replicate".

At present there is only one method: slavelag. When "--replicate" is used, mk-table-checksum automatically sets "--throttle-method" to slavelag and discovers every slave and throttles checksumming if any slave lags more than "--max-lag". Specify -throttle-method none to disable this behavior completely, or specify "--check-slave-lag" and mk-table-checksum will only check that slave.

See also "--recurse" and "--recursion-method".

--trim

Trim VARCHAR columns (helps when comparing 4.1 to >= 5.0).

This option adds a TRIM() to VARCHAR columns in BIT_XOR and ACCUM modes.

This is useful when you don't care about the trailing space differences between MySQL versions which vary in their handling of trailing spaces. MySQL 5.0 and later all retain trailing spaces in VARCHAR, while previous versions would remove them.

--unchunkable-tables

group: Safety

Checksum tables that cannot be chunked when "--chunk-size" is specified.

By default mk-table-checksum will not checksum a table that cannot be chunked when "--chunk-size" is specified because this might result in a huge, non-chunkable table being checksummed in one huge, memory-intensive chunk.

Specifying this option allows checksumming tables that cannot be chunked. Be careful when using this option! Make sure any non-chunkable tables are not so large that they will cause the tool to consume too much memory or CPU.

DSN Options

These DSN options are used to create a DSN. Each option is given like option=value. The options are case-sensitive, so P and p are not the same option. There cannot be whitespace before or after the = and if the value contains whitespace it must be quoted. DSN options are comma-separated. See the maatkit manpage for full details.

A
dsn: charset; copy: yes
Default character set.
D
dsn: database; copy: yes
Default database.
F
dsn: mysql_read_default_file; copy: yes
Only read default options from the given file
h
dsn: host; copy: yes
Connect to host.
p
dsn: password; copy: yes
Password to use when connecting.
P
dsn: port; copy: yes
Port number to use for connection.
S
dsn: mysql_socket; copy: yes
Socket file to use for connection.
u
dsn: user; copy: yes
User for login if not current user.

Downloading

You can download Maatkit from Google Code at <http://code.google.com/p/maatkit/>, or you can get any of the tools easily with a command like the following:

   wget http://www.maatkit.org/get/toolname
   or
   wget http://www.maatkit.org/trunk/toolname

Where toolname can be replaced with the name (or fragment of a name) of any of the Maatkit tools. Once downloaded, they're ready to run; no installation is needed. The first URL gets the latest released version of the tool, and the second gets the latest trunk code from Subversion.

Environment

The environment variable MKDEBUG enables verbose debugging output in all of the Maatkit tools:

   MKDEBUG=1 mk-....

System Requirements

You need Perl, DBI, DBD::mysql, and some core packages that ought to be installed in any reasonably new version of Perl.

Bugs

For a list of known bugs see <http://www.maatkit.org/bugs/mk-table-checksum>.

Please use Google Code Issues and Groups to report bugs or request support: <http://code.google.com/p/maatkit/>. You can also join #maatkit on Freenode to discuss Maatkit.

Please include the complete command-line used to reproduce the problem you are seeing, the version of all MySQL servers involved, the complete output of the tool when run with "--version", and if possible, debugging output produced by running with the MKDEBUG=1 environment variable.

Copyright, License and Warranty

THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2; OR the Perl Artistic License. On UNIX and similar systems, you can issue `man perlgpl' or `man perlartistic' to read these licenses.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.

Author

Baron "Xaprb" Schwartz

About Maatkit

This tool is part of Maatkit, a toolkit for power users of MySQL. Maatkit was created by Baron Schwartz; Baron and Daniel Nichter are the primary code contributors. Both are employed by Percona. Financial support for Maatkit development is primarily provided by Percona and its clients.

mk-table-checksum.1p - Man Page

Synopsis

Risks

Description

Specifying Hosts

How Fast is It?

Algorithm Selection

Consistent Checksums

Output

Replicate Table Maintenance

Exit Status

Queries

Options

DSN Options

Downloading

Environment

System Requirements

Bugs

Copyright, License and Warranty

See Also

Author

About Maatkit

Acknowledgments

Version

Info