libextractor-extract - Man Page

determine meta-information about a file

Synopsis

extract [ -bgihLmnvV ] [ -l library ] [ -p type ] [ -x type ] file ...

Description

This manual page documents version 1.0.0 of the extract command.

extract tests each file specified in the argument list in an attempt to infer meta-information from it. Each file is subjected to the meta-data extraction libraries from libextractor.

libextractor classifies meta-information (also referred to as keywords) into types. A list of all types can be obtained with the -L option.
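As a rough illustration of selecting keywords by type, the sketch below wraps extract -p in a small shell function. The function name mimetype_of and the EXTRACT override variable are hypothetical helpers, not part of extract. The sketch also assumes that mimetype is a valid type name and that output lines have the "type - value" form shown in the Examples section.

```shell
# Hypothetical helper, not part of extract.
# EXTRACT may name an alternative command; it defaults to "extract" on PATH.
mimetype_of() {
    # -p restricts output to keywords of the given type (assumed: "mimetype").
    # sed strips the assumed "mimetype - " prefix, leaving only the value.
    "${EXTRACT:-extract}" -p mimetype "$1" | sed -n 's/^mimetype - //p'
}
```

Calling mimetype_of test/test.jpg should then print only the value of the mimetype keyword, if one is found. Other type names reported by extract -L could be substituted in the same way.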

Options

-b: Display the output in BibTeX format.
-g: Use grep-friendly output (all keywords on a single line for each file). Use the verbose option (-V) to print the filename first, followed by the keywords. Use the verbose option twice to also display the keyword types. Without the doubled verbose option, keyword types are not printed; non-textual metadata is not printed in this mode.
-h: Print a brief summary of the options.
-i: Run plugins in-process (for debugging). By default, each plugin is run in its own process.
-l libraries: Use the specified libraries to extract keywords. The general format of libraries is [[-]LIBRARYNAME[:[-]LIBRARYNAME]*], where LIBRARYNAME is a libextractor-compatible library, typically of the form jpeg. A minus before the library name indicates that the library should be removed from the existing list. To run only a few selected plugins, use -l in combination with -n.
-L: Print a list of all known keyword types.
-m: Load the file into memory and perform extraction from memory (for debugging).
-n: Do not use the default set of extractors (typically all standard extractors, currently mp3, ogg, jpg, gif, png, tiff, real, html, pdf and mime-types); use only the extractors specified with the -l option.
-p type: Print only the keywords matching the specified type. By default, all keywords that are found and not removed as duplicates are printed.
-v: Print the version number and exit.
-V: Be verbose. This option can be specified multiple times to increase verbosity further.
-x type: Exclude keywords of the specified type from the output. By default, all keywords that are found and not removed as duplicates are printed.
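
The colon-separated format accepted by -l can be assembled in the shell. The following is a minimal sketch: lib_spec is a hypothetical helper, not part of extract, and the plugin names jpeg and png are assumptions based on the form described for -l. The sketch only prints the resulting command line rather than running it.

```shell
# Hypothetical helper, not part of extract: join plugin names with ":".
# The body runs in a subshell so the IFS change does not leak out.
lib_spec() (
    IFS=:
    printf '%s\n' "$*"
)

spec=$(lib_spec jpeg png)
# Print the command line that would run only these two plugins (-n drops the defaults).
echo "extract -n -l $spec test/test.jpg"
# prints: extract -n -l jpeg:png test/test.jpg
```

A name prefixed with a minus, such as -png, passes through unchanged, which matches the syntax for removing a library from the existing list.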

Examples

$ extract test/test.jpg
comment - (C) 2001 by Christian Grothoff, using gimp 1.2 1
mimetype - image/jpeg

$ extract -V -x comment test/test.jpg
Keywords for file test/test.jpg:
mimetype - image/jpeg

$ extract -p comment test/test.jpg
comment - (C) 2001 by Christian Grothoff, using gimp 1.2 1

$ extract -nV -l png.so -p comment test/test.jpg test/test.png
Keywords for file test/test.jpg:
Keywords for file test/test.png:
comment - Testing keyword extraction
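
The grep-friendly mode is intended for pipelines. The sketch below is one possible way to list files whose keywords match a pattern. The function name find_by_keyword and the EXTRACT override variable are hypothetical and not part of extract. It relies only on the documented behavior that -g prints one line per file and that -V places the filename first on that line.

```shell
# Hypothetical helper, not part of extract.
# Usage: find_by_keyword PATTERN FILE...
find_by_keyword() {
    pattern=$1
    shift
    # -g: one line of keywords per file; -V: prefix each line with the filename.
    # grep -i keeps only the lines that match PATTERN, ignoring case.
    "${EXTRACT:-extract}" -g -V "$@" | grep -i -e "$pattern"
}
```

For example, find_by_keyword gimp test/test.jpg test/test.png would print the lines for files whose keywords mention gimp. The exact layout of each line depends on the installed version of extract.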

Legal Notice

libextractor and the extract tool are released under the GPL. libextractor is a GNU package.

Bugs

A couple of file-formats (on the order of 10^3) are not recognized...

Authors

extract was originally written by Christian Grothoff <christian@grothoff.org> and Vidyut Samanta <vids@cs.ucla.edu>. Use <libextractor@gnu.org> to contact the current maintainer(s).

Availability

You can obtain the original author's latest version from http://www.gnu.org/software/libextractor/

Info

Aug 7, 2012 libextractor 1.0.0