csvclean - Man Page
csvclean Documentation
Examples (TL;DR)
- Clean a CSV file:
csvclean bad.csv
- List locations of syntax errors in a CSV file:
csvclean -n bad.csv
Description
Cleans a CSV file of common syntax errors:
- reports rows that have a different number of columns than the header row
- attempts to correct the CSV by joining short rows into a single row
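The column-count check described above can be sketched in a few lines of Python. This is an illustrative approximation, not csvkit's actual implementation: the function name `split_rows` and the sample data are hypothetical, the row-joining repair step is omitted, and line numbers count data rows starting at 1 after the header, which is an assumption based on the example output shown later in this page.

```python
import csv
import io


def split_rows(text):
    """Separate rows whose length matches the header from rows that do not.

    Hypothetical helper for illustration only; it does not attempt the
    short-row joining that csvclean performs.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    valid, errors = [header], []
    # Number data rows from 1, not counting the header row (assumption).
    for line_number, row in enumerate(reader, start=1):
        if len(row) == len(header):
            valid.append(row)
        else:
            message = f"Expected {len(header)} columns, found {len(row)} columns"
            errors.append((line_number, message, row))
    return valid, errors


# Made-up sample: one row too long, one row too short, one valid row.
sample = "a,b,c\n1,2,3,4\n5,6\n7,8,9\n"
valid, errors = split_rows(sample)
for line_number, message, _row in errors:
    print(f"Line {line_number}: {message}")
```

Here `valid` plays the role of the rows that would go to the output file, and `errors` the rows that would be reported with a line number and description.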
Note that every csvkit tool does the following:
- removes optional quote characters, unless the --quoting (-u) option is set to change this behavior
- changes the field delimiter to a comma, if the input delimiter is set with the --delimiter (-d) or --tabs (-t) options
- changes the record delimiter to a line feed (LF or \n)
- changes the quote character to a double-quotation mark, if the character is set with the --quotechar (-q) option
- changes the character encoding to UTF-8, if the input encoding is set with the --encoding (-e) option
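The normalization listed above can be approximated with Python's standard `csv` module. This is a rough sketch, not how csvkit is implemented: the function `normalize`, its parameters, and the sample bytes are assumptions chosen for illustration.

```python
import csv
import io


def normalize(raw_bytes, encoding="utf-8", delimiter=",", quotechar='"'):
    """Re-encode CSV bytes as UTF-8 with comma, double-quote, and LF.

    Hypothetical helper: the parameters loosely mirror the -e, -d and -q
    options, but this is not csvkit code.
    """
    text = raw_bytes.decode(encoding)
    # newline="" leaves line endings untouched so the csv module handles them.
    reader = csv.reader(
        io.StringIO(text, newline=""), delimiter=delimiter, quotechar=quotechar
    )
    out = io.StringIO(newline="")
    writer = csv.writer(
        out,
        delimiter=",",
        quotechar='"',
        quoting=csv.QUOTE_MINIMAL,  # quote only fields that need it
        lineterminator="\n",
    )
    writer.writerows(reader)
    return out.getvalue().encode("utf-8")


# Made-up input: Latin-1 bytes, semicolon-delimited, single-quoted, CRLF endings.
raw = "'name';'note'\r\n'caf\u00e9';'a,b'\r\n".encode("latin-1")
result = normalize(raw, encoding="latin-1", delimiter=";", quotechar="'")
print(result.decode("utf-8"))
```

In the result, quotes that were optional are dropped, while the field containing a comma is still quoted, now with double-quotation marks.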
Outputs [basename]_out.csv and [basename]_err.csv, the former containing all valid rows and the latter containing all error rows along with line numbers and descriptions:
usage: csvclean [-h] [-d DELIMITER] [-t] [-q QUOTECHAR] [-u {0,1,2,3}] [-b]
                [-p ESCAPECHAR] [-z FIELD_SIZE_LIMIT] [-e ENCODING] [-S] [-H]
                [-K SKIP_LINES] [-v] [-l] [--zero] [-V] [-n]
                [FILE]

Fix common errors in a CSV file.

positional arguments:
  FILE           The CSV file to operate on. If omitted, will accept input as
                 piped data via STDIN.

optional arguments:
  -h, --help     show this help message and exit
  -n, --dry-run  Do not create output files. Information about what would have
                 been done will be printed to STDERR.
See also: Arguments common to all tools.
Examples
Test a file with known bad rows:
csvclean -n examples/bad.csv
Line 1: Expected 3 columns, found 4 columns
Line 2: Expected 3 columns, found 2 columns
To change the line ending from line feed (LF or \n) to carriage return and line feed (CRLF or \r\n) use:
csvformat -M $'\r\n' examples/dummy.csv
Author
Christopher Groskopf
Copyright
2023, Christopher Groskopf
Info
Jul 21, 2023 1.1.1 csvkit