wget2 - Man Page

a recursive metalink/file/website downloader.

Examples (TL;DR)

Download the contents of a URL to a file using multiple threads (default behavior differs from wget): wget2 https://example.com/resource
Limit the number of threads used for downloading (default is 5 threads): wget2 --max-threads 10 https://example.com/resource
Download a single web page and all its resources (scripts, stylesheets, images, etc.): wget2 [-p|--page-requisites] [-k|--convert-links] https://example.com/somepage.html
Mirror a website, but do not ascend to the parent directory (does not download embedded page elements): wget2 [-m|--mirror] [-np|--no-parent] https://example.com/somepath/
Limit the download speed and the number of connection retries: wget2 --limit-rate 300k [-t|--tries] 100 https://example.com/somepath/
Continue an incomplete download (behavior is consistent with wget): wget2 [-c|--continue] https://example.com
Download all URLs stored in a text file to a specific directory: wget2 [-P|--directory-prefix] path/to/directory [-i|--input-file] URLs.txt
Download a file from an HTTP server using Basic Auth (also works for HTTPS): wget2 --user username --password password https://example.com

Description

GNU Wget2 is a free utility for non-interactive download of files from the Web. It supports HTTP and HTTPS protocols, as well as retrieval through HTTP(S) proxies.

Wget2 is non-interactive, meaning that it can work in the background while the user is not logged on. This allows you to start a retrieval and disconnect from the system, letting Wget2 finish the work. By contrast, most Web browsers require the user’s constant presence, which can be a great hindrance when transferring a lot of data.

Wget2 can follow links in HTML, XHTML, CSS, RSS, Atom and sitemap files to create local versions of remote web sites, fully recreating the directory structure of the original site. This is sometimes referred to as recursive downloading. While doing that, Wget2 respects the Robot Exclusion Standard (/robots.txt). Wget2 can be instructed to convert the links in downloaded files to point at the local files, for offline viewing.

Wget2 has been designed for robustness over slow or unstable network connections; if a download fails due to a network problem, it will keep retrying until the whole file has been retrieved. If the server supports partial downloads, it may continue the download from where it left off.

Options

Option Syntax

Every option has a long form and sometimes also a short one. Long options are more convenient to remember, but take time to type. You may freely mix different option styles. Thus you may write:

  wget2 -r --tries=10 https://example.com/ -o log

The space between the option accepting an argument and the argument may be omitted. Instead of -o log you can write -olog.

You may put several options that do not require arguments together, like:

  wget2 -drc <URL>

This is equivalent to:

  wget2 -d -r -c <URL>

Since the options can be specified after the arguments, you may terminate them with --. So the following will try to download URL -x, reporting failure to log:

  wget2 -o log -- -x

The options that accept comma-separated lists all respect the convention that prepending --no- clears their value. This can be useful to clear the .wget2rc settings. For instance, if your .wget2rc sets exclude-directories to /cgi-bin, the following example will first reset it, and then set it to exclude /priv and /trash. You can also clear the lists in .wget2rc.

  wget2 --no-exclude-directories -X /priv,/trash

Most options that do not accept arguments are boolean options, so named because their state can be captured with a yes-or-no (“boolean”) variable. A boolean option is either affirmative or negative (beginning with --no-). All such options share several properties.

Affirmative options can be negated by prepending --no- to the option name; negative options can be negated by omitting the --no- prefix. This might seem superfluous: if the default for an affirmative option is to not do something, then why provide a way to explicitly turn it off? But the startup file may in fact change the default. For instance, using timestamping = on in .wget2rc makes Wget2 download updated files only. Using --no-timestamping is the only way to restore the factory default from the command line.
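As a minimal sketch of this interplay, the snippet below writes a scratch startup file that turns timestamping on and then prints a command line that overrides it. The temporary file, the run helper and the example URL are assumptions made for illustration. The run helper only prints the command; replace its body with "$@" to execute wget2 for real.

```shell
# Dry-run helper (illustrative): prints the command instead of executing it.
run() { printf '%s\n' "$*"; }

# Scratch startup file standing in for ~/.wget2rc (hypothetical path).
rc="$(mktemp)"
printf 'timestamping = on\n' > "$rc"

# The startup file turns timestamping on; --no-timestamping on the
# command line restores the factory default for this one invocation.
run wget2 --config="$rc" --no-timestamping https://example.com/
```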

Basic Startup Options

-V, --version

Display the version of Wget2.

-h, --help

Print a help message describing all of Wget2’s command-line options.

-b, --background

Go to background immediately after startup. If no output file is specified via -o, output is redirected to wget-log.

-e, --execute=command

Execute command as if it were a part of .wget2rc. A command thus invoked will be executed after the commands in .wget2rc, thus taking precedence over them. If you need to specify more than one wget2rc command, use multiple instances of -e.
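A short sketch of passing more than one startup-file command on the command line. It assumes that the two settings mentioned in the option syntax section (timestamping and exclude-directories) can be given to -e in the same name=value form; the run helper and the URL are illustrative, and run only prints the command instead of executing it.

```shell
# Dry-run helper (illustrative): prints the command instead of executing it.
run() { printf '%s\n' "$*"; }

# Each -e carries exactly one .wget2rc-style command. They are applied
# after the startup file, so they take precedence over settings in it.
run wget2 -e timestamping=on -e exclude-directories=/cgi-bin https://example.com/
```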

--hyperlink

Hyperlink names of downloaded files so that they can be opened from the terminal by clicking on them. Only a few terminal emulators currently support hyperlinks. Enable this option if you know your terminal supports hyperlinks.

Logging and Input File Options

-o, --output-file=logfile

Log all messages to logfile. The messages are normally reported to standard error.

-a, --append-output=logfile

Append to logfile. This is the same as -o, only it appends to logfile instead of overwriting the old log file. If logfile does not exist, a new file is created.
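To make the difference between -o and -a concrete, here is a hedged sketch of two consecutive runs that share one log. The file name fetch.log, the URLs and the run helper are assumptions for illustration; run prints each command rather than executing it.

```shell
# Dry-run helper (illustrative): prints the command instead of executing it.
run() { printf '%s\n' "$*"; }

# First run: -o overwrites any existing fetch.log with a fresh log.
run wget2 -o fetch.log https://example.com/first

# Second run: -a keeps the earlier messages and appends the new ones.
run wget2 -a fetch.log https://example.com/second
```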

-d, --debug

Turn on debug output, meaning various information important to the developers of Wget2 if it does not work properly. Your system administrator may have chosen to compile Wget2 without debug support, in which case -d will not work. Please note that compiling with debug support is always safe: Wget2 compiled with debug support will not print any debug info unless requested with -d.

-q, --quiet

Turn off Wget2’s output.

-v, --verbose

Turn on verbose output, with all the available data. The default output is verbose.

-nv, --no-verbose

Turn off verbose output without being completely quiet (use -q for that), which means that error messages and basic information still get printed.

--report-speed=type

Output bandwidth as type. The only accepted values are bytes (which is set by default) and bits. This option only works if --progress=bar is also set.

-i, --input-file=file

Read URLs from a local or external file. If - is specified as file, URLs are read from the standard input. Use ./- to read from a file literally named -.

If this option is used, no URLs need be present on the command line. If there are URLs both on the command line and in an input file, those on the command line will be the first ones to be retrieved. file is expected to contain one URL per line, unless one of the --force- options specifies a different format.

If you specify --force-html, the document will be regarded as HTML. In that case you may have problems with relative links, which you can solve either by adding <base href="url"> to the documents or by specifying --base=url on the command line.

If you specify --force-css, the document will be regarded as CSS.

If you specify --force-sitemap, the document will be regarded as an XML sitemap.

If you specify --force-atom, the document will be regarded as an Atom feed.

If you specify --force-rss, the document will be regarded as an RSS feed.

If you specify --force-metalink, the document will be regarded as a Metalink description.

If you have problems with relative links, you should use --base=url on the command line.
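The default input format can be sketched as follows: a plain text file with one URL per line, handed to -i. The temporary file, the two example URLs and the run helper are assumptions for illustration; run prints the command instead of executing it.

```shell
# Dry-run helper (illustrative): prints the command instead of executing it.
run() { printf '%s\n' "$*"; }

# Build a URL list in the default format: one URL per line.
list="$(mktemp)"
printf '%s\n' \
  'https://example.com/a.html' \
  'https://example.com/b.html' > "$list"

# Read the URLs from the file.
run wget2 -i "$list"

# Reading from standard input uses - as the file name, for example:
#   cat "$list" | wget2 -i -
```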

-F, --force-html

When input is read from a file, force it to be treated as an HTML file. This enables you to retrieve relative links from existing HTML files on your local disk, by adding <base href="url"> to HTML, or using the --base command-line option.

--force-css

Read and parse the input file as CSS. This enables you to retrieve links from existing CSS files on your local disk. You will need --base to handle relative links correctly.

--force-sitemap

Read and parse the input file as sitemap XML. This enables you to retrieve links from existing sitemap files on your local disk. You will need --base to handle relative links correctly.

--force-atom

Read and parse the input file as Atom Feed XML. This enables you to retrieve links from existing Atom feed files on your local disk. You will need --base to handle relative links correctly.

--force-rss

Read and parse the input file as RSS Feed XML. This enables you to retrieve links from existing RSS feed files on your local disk. You will need --base to handle relative links correctly.

--force-metalink

Read and parse the input file as Metalink. This enables you to retrieve links from existing Metalink files on your local disk. You will need --base to handle relative links correctly.

-B, --base=URL

Resolves relative links using URL as the point of reference, when reading links from an HTML file specified via the -i/--input-file option (together with a --force... option, or when the input file was fetched remotely from a server describing it as HTML, CSS, Atom or RSS). This is equivalent to the presence of a “BASE” tag in the HTML input file, with URL as the value for the “href” attribute.

For instance, if you specify https://example.com/bar/a.html for URL, and Wget2 reads ../baz/b.html from the input file, it would be resolved to https://example.com/baz/b.html.
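The resolution described above can be set up as in the following sketch, which stores a relative link in a local HTML file and combines --force-html with --base. The temporary file and the run helper are assumptions for illustration; run prints the command rather than executing it, so the resolved URL is the one stated in the text, not output from this snippet.

```shell
# Dry-run helper (illustrative): prints the command instead of executing it.
run() { printf '%s\n' "$*"; }

# Local HTML file containing only a relative link (hypothetical content).
page="$(mktemp)"
printf '<a href="../baz/b.html">b</a>\n' > "$page"

# --force-html makes wget2 parse the input file as HTML; --base supplies
# the reference URL against which ../baz/b.html is resolved.
run wget2 --force-html --base=https://example.com/bar/a.html -i "$page"
```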

--config=FILE

Specify the location of configuration files you wish to use. If you specify more than one file, either by using a comma-separated list or several --config options, these files are read in left-to-right order. The files given in $SYSTEM_WGET2RC and ($WGET2RC or ~/.wget2rc) are read in that order and then the user-provided config file(s). If set, $WGET2RC replaces ~/.wget2rc.

--no-config empties the internal list of config files. So if you want to prevent reading any config files, give --no-config on the command line.

--no-config followed by --config=file just reads file and skips reading the default config files.

Wget2 will attempt to tilde-expand filenames written in the configuration file on supported platforms. To use a file whose name starts with a literal ~ character, use “./~” or an absolute path.
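As a hedged sketch of reading one specific config file and nothing else, the snippet below creates a scratch config and prints a command that first clears the default list and then adds that file. The temporary file, its single setting, the URL and the run helper are assumptions for illustration; run prints the command instead of executing it.

```shell
# Dry-run helper (illustrative): prints the command instead of executing it.
run() { printf '%s\n' "$*"; }

# Scratch config file (hypothetical path and content).
only="$(mktemp)"
printf 'timestamping = on\n' > "$only"

# Order matters: --no-config empties the internal list of config files,
# then --config adds the single file that should be read.
run wget2 --no-config --config="$only" https://example.com/
```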

--rejected-log=logfile [Not implemented yet]

Logs all URL rejections to logfile as comma-separated values. The values include the reason for rejection, the URL and the parent URL it was found in.

--local-db

Enables reading/writing to local database files (default: on).

These are the files for --hsts, --hpkp, --ocsp, etc.

With --no-local-db you can switch reading/writing off, which is useful for testing, for example.

This option does not influence the reading of config files.