urifind man page

Name

urifind - find URIs in a document and dump them to STDOUT.

Synopsis

    $ urifind file


Description

urifind is a simple script that finds URIs in one or more files (using "URI::Find") and outputs them to STDOUT.  That's it.

To find all the URIs in file1, use:

    $ urifind file1

To find the URIs in multiple files, simply list them as arguments:

    $ urifind file1 file2 file3

urifind will read from "STDIN" if no files are given or if a filename of "-" is specified:

    $ wget http://www.boston.com/ -O - | urifind

When multiple files are listed, urifind prefixes each found URI with the file from which it came:

    $ urifind file1 file2
    file1: http://www.boston.com/index.html
    file2: http://use.perl.org/

This can be turned on for single files with the "-p" ("prefix") switch:

    $ urifind -p file3
    file3: http://fsck.com/rt/

It can also be turned off for multiple files with the "-n" ("no prefix") switch:

    $ urifind -n file1 file2
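
Because each prefixed line has the form "file: URI", the output is easy to post-process with standard tools. The following is a minimal sketch, not part of urifind itself, that counts URIs per file. The sample variable stands in for the output of "urifind file1 file2" and reuses URIs from the examples above; the approach assumes filenames do not themselves contain a colon followed by a space.

```shell
# Sample prefixed output, standing in for: urifind file1 file2
sample='file1: http://www.boston.com/index.html
file2: http://use.perl.org/
file1: http://fsck.com/rt/'

# Split each line on the first ": " and tally lines per filename.
counts=$(printf '%s\n' "$sample" \
    | awk -F': ' '{ n[$1]++ } END { for (f in n) print f, n[f] }' \
    | sort)

printf '%s\n' "$counts"
```

In real use, the printf would be replaced by the urifind invocation itself.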

By default, URIs will be displayed in the order found; to sort them ascii-betically, use the "-s" ("sort") option.  To reverse sort them, use the "-r" ("reverse") flag ("-r" implies "-s").

    $ urifind -s file1 file2

    $ urifind -r file1 file2
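
When an ordering other than plain forward or reverse sorting is wanted, unsorted output can be piped through sort(1). The sketch below is one possible approach, not a urifind feature: it orders unprefixed URIs by hostname, assuming every URI has the form "scheme://host/...". The sample variable stands in for the output of "urifind -n file1 file2".

```shell
# Sample unprefixed output, standing in for: urifind -n file1 file2
sample='http://www.boston.com/index.html
http://use.perl.org/
http://fsck.com/rt/'

# With "/" as the delimiter, field 3 of "scheme://host/path" is the host.
by_host=$(printf '%s\n' "$sample" | LC_ALL=C sort -t/ -k3,3)

printf '%s\n' "$by_host"
```

URIs without a "//" authority part, such as mailto addresses, have no third field and would need separate handling.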

Finally, urifind supports limiting the returned URIs by scheme or by arbitrary pattern, using the "-S" option (for schemes) and the "-P" option (for patterns).  Both "-S" and "-P" can be specified multiple times:

    $ urifind -S mailto file1

    $ urifind -S mailto -S http file1

"-P" takes an arbitrary Perl regex.  It might need to be protected from the shell:

    $ urifind -P 's?html?' file1

    $ urifind -P '\.org\b' -S http file4

Add a "-d" to have urifind dump the regexen generated from "-S" and "-P" to "STDERR".  "-D" does the same but exits immediately:

    $ urifind -P '\.org\b' -S http -D 
    $scheme = '^(\bhttp\b):'
    @pats = ('^(\bhttp\b):', '\.org\b')

To remove duplicates from the results, use the "-u" ("unique") switch.

Option Summary

-s

Sort results.

-r

Reverse sort results (implies -s).

-u

Return unique results only.

-n

Don't include filename in output.

-p

Include filename in output (off by default, but on if multiple files are given on the command line).

-P $re

Print only URIs matching the regex '$re' (may be specified multiple times).

-S $scheme

Print only URIs with this scheme (may be specified multiple times).


-h

Help summary.

-v

Display version and exit.


-d

Dump compiled regexes for "-S" and "-P" to "STDERR".

-D

Same as "-d", but exit after dumping.


Author

darren chamberlain <darren@cpan.org>

See Also

URI::Find

2017-02-11 perl v5.24.1 User Contributed Perl Documentation