trurl - Man Page
transpose URLs
Synopsis
trurl [options]
Description
trurl parses, manipulates and outputs URLs and parts of URLs.
It follows the RFC 3986 definition of URLs and parses them with libcurl's URL parser, which includes a few "extensions". The URL support is limited to "hierarchical" URLs, the ones that use "://" separators after the scheme.
Typically you pass in one or more URLs and decide which parts of them you want output, possibly modifying the URL as well.
trurl knows URLs and every URL consists of up to ten separate and independent "components". These components can be extracted, removed and updated with trurl and they are referred to by their respective names: scheme, user, password, options, host, port, path, query, fragment and zoneid.
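As a rough illustration of what "components" means, the following Python sketch splits a URL with the standard library. It is not how trurl works internally (trurl uses libcurl's parser), and Python's urlsplit does not expose the options or zoneid components separately, so those two are left out here. The sample URL is made up for the example.

```python
from urllib.parse import urlsplit

def split_components(url: str) -> dict:
    """Return the URL components that Python's standard library can see."""
    parts = urlsplit(url)
    found = {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }
    # Drop components that are absent from this URL.
    return {name: value for name, value in found.items() if value not in (None, "")}

# Hypothetical URL used only for illustration.
components = split_components("https://user:secret@example.com:8080/a/b?x=1#top")
for name, value in components.items():
    print(f"{name}: {value}")
```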
Options
- -a, --append [component]=[data]
Append data to a component. This can only append data to the path and the query components.
For path, this URL encodes and appends the new segment to the path, separated with a slash.
For query, this URL encodes and appends the new segment to the query, separated with an ampersand (&). If the appended segment contains an equal sign ('=') that one will be kept verbatim and both sides of the first occurrence will be URL encoded separately.
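The append rules above can be approximated in Python as follows. This is a sketch under assumptions, not trurl's code: the helper names append_path and append_query are hypothetical, the handling of a path that already ends in a slash is a guess, and Python's quote may escape a slightly different character set (and uses uppercase hex digits) compared to trurl.

```python
from urllib.parse import quote

def append_path(path: str, segment: str) -> str:
    """Encode a new segment and add it to the path after a slash."""
    # Assumption: a trailing slash on the existing path is not doubled.
    return path.rstrip("/") + "/" + quote(segment, safe="")

def append_query(query: str, segment: str, separator: str = "&") -> str:
    """Encode a new segment and add it to the query after the separator."""
    # Only the first '=' is kept verbatim; both sides are encoded separately.
    name, equals, value = segment.partition("=")
    encoded = quote(name, safe="") + equals + quote(value, safe="")
    return f"{query}{separator}{encoded}" if query else encoded

print(append_path("/hello", "you"))
print(append_query("name=hello", "search=string"))
print(append_query("name=hello", "q=a=b c"))
```

In the last call, the second equal sign is part of the value and therefore gets encoded, while the first one is preserved.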
- --accept-space
When set, trurl will try to accept spaces as part of the URL and instead URL encode such occurrences accordingly.
According to RFC 3986, a space cannot legally be part of a URL. This option provides a best-effort to convert the provided string into a valid URL.
- -f, --url-file [file name]
Read URLs to work on from the given file. Use the file name "-" (a single minus) to tell trurl to read the URLs from stdin.
Each line needs to be a single valid URL - but trurl will trim off any trailing spaces and tabs from the end of the line. The maximum URL length supported in a file like this is 4095 bytes.
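A minimal Python sketch of the line handling described above, assuming a text stream as input. The function name read_url_lines is hypothetical, and two behaviors are assumptions that the text does not specify: blank lines are skipped, and lines longer than the limit are skipped rather than reported.

```python
import io

MAX_URL_LENGTH = 4095  # maximum URL length in bytes, as stated above

def read_url_lines(stream) -> list:
    """Collect one URL per line, trimming trailing spaces and tabs."""
    urls = []
    for raw in stream:
        line = raw.rstrip("\r\n").rstrip(" \t")
        if not line:
            continue  # assumption: blank lines are ignored
        if len(line.encode("utf-8")) > MAX_URL_LENGTH:
            continue  # assumption: over-long lines are skipped
        urls.append(line)
    return urls

sample = io.StringIO("https://curl.se/ \t\nhttps://example.com/\n")
print(read_url_lines(sample))
```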
- -g, --get [format]
Output text and URL data according to the provided format string. Components from the URL can be output when specified as {component} or [component], with the name of the part shown within curly braces or brackets. You cannot mix braces and brackets for this purpose in the same command line.
The following component names are available (case sensitive): url, scheme, user, password, options, host, port, path, query, fragment and zoneid.
Components are shown URL decoded by default. If you instead write the component prefixed with a colon like "{:path}", it gets output URL encoded.
Hosts provided as IPv6 numerical addresses will be provided within square brackets, like "[fe80::20c:29ff:fe9c:409b]".
Hosts provided as IPv4 numerical addresses will be "normalized" and provided as four dot-separated decimal numbers when output.
You can access specific keys in the query string using the format {query:key}. Then the value of the first matching key will be output using a case sensitive match. When extracting a URL decoded query key that contains %00, such octet will be replaced with a single period '.' in the output.
You can access specific keys in the query string and output all values using the format {query-all:key}. This looks for 'key' case sensitively and will output all values for that key space-separated.
You can access the url and host components in their "punycoded" version, which is how International Domain Names are converted into plain ASCII, by using the form {puny:url} and {puny:host}. If the host name is not using IDN, this option provides the regular ASCII name.
The "format" string supports the following backslash sequences:
\\ - backslash
\t - tab
\n - newline
\r - carriage return
\{ - an open curly brace that does not start a variable
\[ - an open bracket that does not start a variable
All other text in the format string will be shown as-is.
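To make the format string rules concrete, here is a small Python sketch of an expander that handles the curly-brace form, the ":" prefix for URL encoded output, and the backslash sequences listed above. It is an approximation only: the function name render is hypothetical, brackets and the query: and puny: forms are not handled, unknown names expand to an empty string by assumption, and the set of characters Python's quote escapes may differ from trurl's.

```python
from urllib.parse import quote

# Backslash sequences listed for the format string.
ESCAPES = {"\\": "\\", "t": "\t", "n": "\n", "r": "\r", "{": "{", "[": "["}

def render(fmt: str, components: dict) -> str:
    """Expand {name} and {:name} using already-decoded component values."""
    out = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "\\" and i + 1 < len(fmt) and fmt[i + 1] in ESCAPES:
            out.append(ESCAPES[fmt[i + 1]])
            i += 2
            continue
        end = fmt.find("}", i) if ch == "{" else -1
        if end == -1:
            out.append(ch)  # ordinary text is shown as-is
            i += 1
            continue
        name = fmt[i + 1:end]
        encode = name.startswith(":")
        if encode:
            name = name[1:]
        value = components.get(name, "")  # assumption: unknown names are empty
        out.append(quote(value, safe="/") if encode else value)
        i = end + 1
    return "".join(out)

decoded = {"scheme": "https", "host": "curl.se", "path": "/this has space/index.html"}
print(render(r"{scheme} {host}\t{:path}\n", decoded), end="")
```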
- -h, --help
Show the help output.
- --iterate [component]=[item1 item2 ...]
Set the component to multiple values and output the result once for each iteration. Several combined iterations are allowed to generate combinations, but only one --iterate option per component.
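Combining several iterations amounts to a Cartesian product of the listed values. The Python sketch below shows the idea with a dictionary of components. The function name iterate and the sample values are made up for illustration, and the order in which trurl emits the combinations is not stated above, so the order here is an assumption.

```python
from itertools import product

def iterate(base: dict, iterations: dict):
    """Yield one component set per combination of the iterated values."""
    names = list(iterations)  # one entry per component, as the option requires
    for combo in product(*(iterations[name] for name in names)):
        result = dict(base)
        result.update(zip(names, combo))
        yield result

base = {"host": "example.com", "path": "/"}
wanted = {"scheme": ["http", "https"], "port": ["80", "8080"]}
for c in iterate(base, wanted):
    print(f"{c['scheme']}://{c['host']}:{c['port']}{c['path']}")
```

Two components with two values each produce four results.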
- --json
Outputs all set components of the URLs as JSON objects. All components of the URL that have data will get populated in the object using their component names.
- --query-separator [what]
Specify the single character used for separating query pairs. The default is "&" but at least in the past sometimes semicolons ";" or even colons ":" have been used for this purpose. If your URL uses something other than the default character, setting the right one makes sure trurl can do its query operations properly.
- --redirect [URL]
Redirect the URL to this new location. It requires that you set the base URL with --url.
- -s, --set [component][:]=[data]
Set this URL component. Setting a blank string ("") will clear the component from the URL.
The following components can be set: url, scheme, user, password, options, host, port, path, query, fragment and zoneid.
If a simple "="-assignment is used, the data is URL encoded when applied. If ":=" is used, the data is assumed to already be URL encoded and will be stored as-is.
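The difference between "=" and ":=" can be sketched in Python like this. The helper parse_set is hypothetical and only models the argument handling. Keeping "/" unescaped is an assumption made so that paths stay readable, and Python's quote emits uppercase hex digits, so the exact output may differ from trurl's.

```python
from urllib.parse import quote

def parse_set(argument: str) -> tuple:
    """Split a --set style argument into (component, stored value)."""
    name, _, data = argument.partition("=")
    if name.endswith(":"):
        # ":=" form: the data is assumed to be URL encoded already.
        return name[:-1], data
    # "=" form: the data is URL encoded when applied.
    return name, quote(data, safe="/")

print(parse_set("user=::moo::"))
print(parse_set("user:=%3a%3amoo%3a%3a"))
print(parse_set("path=/hello world"))
```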
- --sort-query
The "variable=content" tuples in the query component are sorted in a case insensitive alphabetical order. This helps make URLs identical that otherwise only had their query pairs in different orders.
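A case insensitive sort of the query pairs can be sketched in Python as below. The function name sort_query is hypothetical, and how trurl orders pairs that differ only in case is not stated above, so the tie-breaking here (original order is kept) is an assumption.

```python
def sort_query(query: str, separator: str = "&") -> str:
    """Sort the pairs of a query string without regard to letter case."""
    pairs = query.split(separator)
    # sorted() is stable, so pairs that compare equal keep their order.
    return separator.join(sorted(pairs, key=str.lower))

print(sort_query("b=a&c=b&a=c"))
print(sort_query("B=1&a=2"))
```

Two URLs whose queries contain the same pairs in different orders give the same string after sorting, which is what makes them comparable.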
- --url [URL]
Set the input URL to work with. The URL may be provided without a scheme, which then typically is not actually a legal URL but trurl will try to figure out what is meant and guess what scheme to use.
Providing multiple URLs will make trurl act on all URLs in a serial fashion.
If the URL cannot be parsed for whatever reason, trurl will simply move on to the next provided URL - unless --verify is used.
- --trim [component]=[what]
Trims data off a component. Currently this can only trim a query component.
"what" is specified as a full word or as a word prefix (using a single trailing asterisk ('*')) which makes trurl remove the tuples from the query string that match the instruction.
- -v, --version
Show version information and exit.
- --verify
When a URL is provided, return error immediately if it does not parse as a valid URL. In normal cases, trurl can forgive a bad URL input.
Examples
- Replace the host name of a URL
$ trurl --url https://curl.se --set host=example.com
https://example.com/
- Create a URL by setting components
$ trurl --set host=example.com --set scheme=ftp
ftp://example.com/
- Redirect a URL
$ trurl --url https://curl.se/we/are.html --redirect here.html
https://curl.se/we/here.html
- Change port number
This also shows how trurl will remove dot-dot sequences
$ trurl --url https://curl.se/we/../are.html --set port=8080
https://curl.se:8080/are.html
- Extract the path from a URL
$ trurl --url https://curl.se/we/are.html --get '{path}'
/we/are.html
- Extract the port from a URL
This gets the default port based on the scheme if the port is not set in the URL.
$ trurl --url https://curl.se/we/are.html --get '{port}'
443
- Append a path segment to a URL
$ trurl --url https://curl.se/hello --append path=you
https://curl.se/hello/you
- Append a query segment to a URL
$ trurl --url "https://curl.se?name=hello" --append query=search=string
https://curl.se/?name=hello&search=string
- Read URLs from stdin
$ cat urllist.txt | trurl --url-file -
...
- Output JSON
$ trurl "https://fake.host/hello#frag" --set user=::moo:: --json
[
  {
    "url": "https://%3a%3amoo%3a%3a@fake.host/hello#frag",
    "scheme": "https",
    "user": "::moo::",
    "host": "fake.host",
    "port": "443",
    "path": "/hello",
    "fragment": "frag"
  }
]
- Remove tracking tuples from query
$ trurl "https://curl.se?search=hey&utm_source=tracker" --trim query="utm_*"
https://curl.se/?search=hey
- Show a specific query key value
$ trurl "https://example.com?a=home&here=now&thisthen" -g '{query:a}'
home
- Sort the key/value pairs in the query component
$ trurl "https://example.com?b=a&c=b&a=c" --sort-query
https://example.com/?a=c&b=a&c=b
- Work with a query that uses a semicolon separator
$ trurl "https://curl.se?search=fool;page=5" --trim query="search" --query-separator ";"
https://curl.se/?page=5
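The matching done by --trim, together with a custom query separator, can be approximated with the Python sketch below. The function name trim_query is hypothetical. It compares keys case sensitively and matches on the part before the first "=", both of which are assumptions; trurl's actual matching rules may differ.

```python
def trim_query(query: str, what: str, separator: str = "&") -> str:
    """Remove the query pairs whose key matches 'what'."""
    # A single trailing asterisk turns 'what' into a prefix match.
    prefix_match = what.endswith("*")
    needle = what[:-1] if prefix_match else what
    kept = []
    for pair in query.split(separator):
        key = pair.partition("=")[0]
        matched = key.startswith(needle) if prefix_match else key == needle
        if not matched:
            kept.append(pair)
    return separator.join(kept)

print(trim_query("search=hey&utm_source=tracker", "utm_*"))
print(trim_query("search=fool;page=5", "search", separator=";"))
```

With the default "&" separator, the second query would be treated as a single pair whose key is "search", which is why setting the right separator matters for query operations.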
- Accept spaces in the URL path
$ trurl "https://curl.se/this has space/index.html" --accept-space
https://curl.se/this%20has%20space/index.html