Sympa::Tools::Text.3Sympa - Man Page

Text-related functions


This package provides some text-related functions.


addrencode ( $addr, [ $phrase, [ $charset, [ $comment ] ] ] )

Returns a formatted (and encoded) name-addr as defined in RFC 5322, section 3.4.

canonic_email ( $email )

Function. Returns the canonical form of an e-mail address.

Leading and trailing whitespace is removed, and unaccented Latin letters are lower-cased.

Returns undef for malformed input.
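The rules above can be sketched in plain Perl as follows. This is only an illustration, not the module's code: the name canonic_email_sketch is hypothetical, and treating anything other than a single local@domain pair as malformed is an assumption (Sympa's own check is likely stricter).

```perl
use strict;
use warnings;

# Hypothetical sketch: trim, lower-case unaccented Latin letters (A-Z),
# and return undef when the input does not look like local@domain.
sub canonic_email_sketch {
    my $email = shift;
    return undef unless defined $email;

    $email =~ s/\A\s+//;      # remove leading whitespace
    $email =~ s/\s+\z//;      # remove trailing whitespace
    $email =~ tr/A-Z/a-z/;    # lower-case ASCII letters only

    # Assumed malformedness test: one '@', non-empty parts, no spaces.
    return undef unless $email =~ /\A[^\s\@]+\@[^\s\@]+\z/;
    return $email;
}
```

For example, "  John.Doe@Example.COM " becomes "john.doe@example.com".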

canonic_message_id ( $message_id )

Returns the canonical form of a message ID, without leading or trailing whitespace and without the enclosing '<' and '>'.
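A minimal sketch of this normalization, assuming that only one enclosing pair of angle brackets is removed after trimming. The name canonic_message_id_sketch is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch: trim surrounding whitespace, then strip one
# enclosing pair of angle brackets if present.
sub canonic_message_id_sketch {
    my $msg_id = shift;
    return undef unless defined $msg_id;

    $msg_id =~ s/\A\s+//;
    $msg_id =~ s/\s+\z//;
    $msg_id =~ s/\A<(.*)>\z/$1/s;    # assumed: only the outer pair is removed
    return $msg_id;
}
```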

canonic_text ( $text )

Canonicalizes text. $text should be either a binary string encoded in UTF-8 or a Unicode string. Ill-formed sequences in a binary string are replaced with U+FFFD REPLACEMENT CHARACTER, and Unicode Normalization Form C (NFC) is applied.
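The same two steps can be sketched with the core Encode and Unicode::Normalize modules. Encode's default (non-fatal) check substitutes U+FFFD for ill-formed UTF-8. The name canonic_text_sketch is hypothetical, and returning UTF-8 bytes rather than a Unicode string is an assumption.

```perl
use strict;
use warnings;
use Encode ();
use Unicode::Normalize ();

# Hypothetical sketch: decode bytes as UTF-8 (bad bytes -> U+FFFD),
# apply NFC, and return the result as UTF-8 bytes (assumption).
sub canonic_text_sketch {
    my $text = shift;
    return undef unless defined $text;

    my $unicode = Encode::is_utf8($text)
        ? $text                               # already a character string
        : Encode::decode('UTF-8', $text);     # default check substitutes U+FFFD
    my $nfc = Unicode::Normalize::NFC($unicode);
    return Encode::encode('UTF-8', $nfc);
}
```

For example, "e" followed by U+0301 COMBINING ACUTE ACCENT is composed into the single character U+00E9.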

clip ( $string, $length )

Function. Clips $string to $length bytes without splitting grapheme clusters. $string is assumed to be a UTF-8-encoded byte string.
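One way to clip by bytes while respecting grapheme clusters is to walk Perl's \X matches (extended grapheme clusters) and stop before the byte budget is exceeded. This is a sketch with a hypothetical name, not necessarily how clip() itself works; negative lengths simply yield an empty string here.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical sketch: keep whole grapheme clusters while their total
# UTF-8 size stays within $length bytes.
sub clip_sketch {
    my ($string, $length) = @_;
    return $string unless defined $string and defined $length;

    my $chars = Encode::decode('UTF-8', $string);
    my ($result, $bytes) = ('', 0);
    while ($chars =~ /(\X)/g) {
        my $cluster = $1;
        my $size    = length Encode::encode('UTF-8', $cluster);
        last if $bytes + $size > $length;    # next cluster would not fit
        $result .= $cluster;
        $bytes  += $size;
    }
    return Encode::encode('UTF-8', $result);
}
```

With "a" followed by a decomposed "é" (3 bytes), a limit of 3 bytes keeps only "a" instead of cutting the accent off its base letter.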

decode_filesystem_safe ( $str )

Function. Decodes a string encoded by encode_filesystem_safe().

$str

String to be decoded.

Returns the decoded string, with the utf8 flag stripped if any.

decode_html ( $str )

Function. Decodes HTML entities in a UTF-8-encoded string or a Unicode string.

$str

String to be decoded.

Returns the decoded string, with the utf8 flag stripped if any.

encode_filesystem_safe ( $str )

Function. Encodes the string $str so that it is safe to use in file names.

$str

String to be encoded.

Returns the encoded string, with the utf8 flag stripped if any. Every byte except '-', '+', '.', '@' and alphanumeric characters is encoded as '_' followed by two hexadecimal digits.

Note that '/' will also be encoded.
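The byte-level scheme described for encode_filesystem_safe() and its inverse decode_filesystem_safe() can be sketched as follows. Both subroutine names are hypothetical, and the use of lower-case hex digits is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch: every byte other than '-', '+', '.', '@' and ASCII
# alphanumerics becomes '_' plus two hex digits (lower case assumed).
sub encode_filesystem_safe_sketch {
    my $str = shift;
    return undef unless defined $str;
    utf8::encode($str) if utf8::is_utf8($str);    # operate on bytes
    $str =~ s/([^-+.\@0-9A-Za-z])/sprintf '_%02x', ord $1/eg;
    return $str;
}

# Inverse: turn each '_XX' sequence back into the byte it stands for.
sub decode_filesystem_safe_sketch {
    my $str = shift;
    return undef unless defined $str;
    utf8::encode($str) if utf8::is_utf8($str);
    $str =~ s/_([0-9A-Fa-f]{2})/chr hex $1/eg;
    return $str;
}
```

Because '_' itself lies outside the safe set, it is always encoded too (as "_5f" here), which keeps decoding unambiguous.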

encode_html ( $str, [ $additional_unsafe ] )

Function. Encodes characters in the string $str as HTML entities. By default, '<', '>', '&' and '"' are encoded.

$str

String to be encoded.

$additional_unsafe

Character or range of characters to be additionally encoded as entity references.

This optional parameter was introduced in Sympa 6.2.37b.3.

Returns the encoded string. The utf8 flag, if any, is not stripped.
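A minimal sketch of the default behaviour plus the optional extra characters. The name encode_html_sketch is hypothetical; treating $additional_unsafe as a fragment of a regex character class (so "a-z" is a range) and emitting the extra characters as numeric references are assumptions.

```perl
use strict;
use warnings;

# Named references for the four characters encoded by default.
my %ENTITY = ('<' => '&lt;', '>' => '&gt;', '&' => '&amp;', '"' => '&quot;');

# Hypothetical sketch: other unsafe characters become numeric references.
sub encode_html_sketch {
    my ($str, $additional_unsafe) = @_;
    return undef unless defined $str;

    my $unsafe = '<>&"';
    $unsafe .= $additional_unsafe if defined $additional_unsafe;    # class fragment
    $str =~ s/([$unsafe])/exists $ENTITY{$1} ? $ENTITY{$1} : sprintf('&#%d;', ord $1)/eg;
    return $str;
}
```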

encode_uri ( $str, [ omit => $chars ] )

Function. Encodes potentially unsafe characters in the string using "percent" encoding suitable for URIs.

$str

String to be encoded.

omit => $chars

By default, every character outside the "unreserved" set of RFC 3986 is encoded, that is, every character matching [^-A-Za-z0-9._~]. If this parameter is given, the characters in $chars are also left unencoded.

Returns the encoded string, with the utf8 flag stripped if any.
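The percent-encoding rule can be sketched as below. The name encode_uri_sketch is hypothetical, and $chars is assumed to be a plain string of characters (it is quoted with quotemeta, so no ranges). Upper-case hex digits follow the recommendation of RFC 3986.

```perl
use strict;
use warnings;

# Hypothetical sketch: percent-encode every byte outside the RFC 3986
# unreserved set, except characters listed in the 'omit' option.
sub encode_uri_sketch {
    my ($str, %options) = @_;
    return undef unless defined $str;
    my $omit = defined $options{omit} ? quotemeta $options{omit} : '';

    utf8::encode($str) if utf8::is_utf8($str);    # encode UTF-8 bytes
    $str =~ s/([^-A-Za-z0-9._~$omit])/sprintf '%%%02X', ord $1/eg;
    return $str;
}
```

For example, with omit => '/', "a b/c" becomes "a%20b/c" instead of "a%20b%2Fc".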

escape_chars ( $str )

Deprecated. Use "encode_filesystem_safe".

Escapes special characters.

escape_url ( $str )

Deprecated. Use "encode_uri" or "mailtourl" instead.

foldcase ( $str )

Function. Returns a case-folded string suitable for case-insensitive matching. For example, the code below looks for a needle in a haystack regardless of case, even if they are non-ASCII UTF-8 strings.

  $haystack = Sympa::Tools::Text::foldcase($HayStack);
  $needle   = Sympa::Tools::Text::foldcase($NeedLe);
  if (index($haystack, $needle) >= 0) {    # parentheses needed around index()'s arguments
      print "found\n";
  }

$str

A string.
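A comparable result can be sketched with Perl's built-in fc (available since Perl 5.16), which applies full Unicode case folding, so that "ß" folds to "ss". Whether foldcase() does exactly this is an assumption; the name foldcase_sketch is hypothetical, and byte strings are assumed to be UTF-8.

```perl
use strict;
use warnings;
use feature 'fc';    # Perl 5.16 or later
use Encode ();

# Hypothetical sketch: decode UTF-8 bytes, case-fold with fc, and
# return UTF-8 bytes so results can be compared with eq or index().
sub foldcase_sketch {
    my $str = shift;
    return '' unless defined $str;
    my $chars = Encode::is_utf8($str) ? $str : Encode::decode('UTF-8', $str);
    return Encode::encode('UTF-8', fc $chars);
}
```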

guessed_to_utf8 ( $text, [ lang, ... ] )

Function. Guesses the charset of the text, taking the language context into account, and returns the text re-encoded in UTF-8.

$text

Text to be re-encoded.

lang, ...

Language tag(s), such as those given by "implicated_langs" in Sympa::Language.

Returns the re-encoded text. If no charset can be guessed, ISO-8859-1 is used as a last resort, simply because it covers the full 8-bit range.

mailtourl ( $email, [ decode_html => 1 ], [ query => {key => val, ...} ] )

Function. Constructs a mailto: URL for the given e-mail address.

$email

E-mail address.

decode_html => 1

If set, the arguments are assumed to include HTML entities.

query => {key => val, ...}

Optional query.

Returns the constructed URL.

pad ( $str, $width )

Pads a string with spaces so that the result is not narrower than the given width.

$str

A string.

$width

If $width is a false value, or the width of $str is not less than the absolute value of $width, the string is returned unchanged. If $width is negative, spaces are added on the right; otherwise, they are added on the left.

Returns the padded string.
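The padding rule can be sketched as follows. The name pad_sketch is hypothetical, and it measures width in characters, whereas the real function presumably counts display columns (East Asian wide characters occupy two columns).

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical sketch: width counted in characters (assumption).
# Negative $width pads on the right, positive $width on the left.
sub pad_sketch {
    my ($str, $width) = @_;
    return $str unless defined $str and $width;

    my $chars = Encode::is_utf8($str) ? $str : Encode::decode('UTF-8', $str);
    my $cols  = length $chars;
    return $str if $cols >= abs $width;    # already wide enough

    my $spaces = ' ' x (abs($width) - $cols);
    return $width < 0 ? $str . $spaces : $spaces . $str;
}
```

This mirrors printf-style alignment: a negative width left-aligns the text, a positive one right-aligns it.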

permalink_id ( $message_id )

Calculates a permalink ID from a message ID.

qdecode_filename ( $filename )

Q-decodes a web file name.

ToDo: This should be made obsolete in a future release. It would be better to use "decode_filesystem_safe".

qencode_filename ( $filename )

Q-encodes a web file name.

ToDo: This should be made obsolete in a future release. It would be better to use "encode_filesystem_safe".

slurp ( $file )

Gets the entire content of the file $file, which is the path to a text file. The content is normalized by canonic_text().

unescape_chars ( $str )

Deprecated. Use "decode_filesystem_safe".

Unescapes special characters.

valid_email ( $string )

Performs a basic check of an e-mail address.

weburl ( $base, \@paths, [ decode_html => 1 ], [ fragment => $fragment ], [ query => \%query ] )

Constructs an http: or https: URL under the given base URI.

$base

Base URI.

\@paths

Additional path components.

decode_html => 1

If set, the arguments are assumed to include HTML entities. The exception is $base, which is assumed not to include entities.

fragment => $fragment

Optional fragment.

query => \%query

Optional query.

Returns the constructed URL.
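A rough sketch of how such a URL might be assembled: percent-encoded path components appended to the base, then an optional query and fragment. Everything here is an assumption, including the hypothetical names weburl_sketch and _pct, joining query pairs with '&' in sorted key order, and leaving out the decode_html option.

```perl
use strict;
use warnings;

# Hypothetical helper: RFC 3986 percent-encoding of UTF-8 bytes.
sub _pct {
    my $s = shift;
    utf8::encode($s) if utf8::is_utf8($s);
    $s =~ s/([^-A-Za-z0-9._~])/sprintf '%%%02X', ord $1/eg;
    return $s;
}

# Hypothetical sketch: decode_html is not handled.
sub weburl_sketch {
    my ($base, $paths, %options) = @_;

    my $url = $base;
    $url =~ s{/+\z}{};    # avoid a doubled slash before the first path
    $url .= join '', map { '/' . _pct($_) } @{$paths || []};

    if (my $query = $options{query}) {
        $url .= '?' . join('&',
            map { _pct($_) . '=' . _pct($query->{$_}) } sort keys %$query);
    }
    $url .= '#' . _pct($options{fragment}) if defined $options{fragment};
    return $url;
}
```

Encoding each path component separately means a '/' inside a component cannot be mistaken for a path separator.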



wrap_text ( $text, [ $init_tab, [ $subsequent_tab, [ $cols ] ] ] )

Function. Returns line-wrapped text.

$text

The text to be folded.

$init_tab

Indentation prepended to the first line of each paragraph. Default is '' (no indentation).

$subsequent_tab

Indentation prepended to each subsequent line of a folded paragraph. Default is '' (no indentation).

$cols

Maximum number of columns of the folded text. Default is 78.
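A similar effect can be approximated with the core Text::Wrap module, swapped in here as a stand-in; Sympa's own implementation may use a different line-breaking algorithm. The name wrap_text_sketch is hypothetical, and paragraphs are assumed to be separated by blank lines.

```perl
use strict;
use warnings;
use Text::Wrap ();

# Hypothetical stand-in using Text::Wrap; each blank-line-separated
# paragraph is wrapped on its own so $init_tab starts every paragraph.
sub wrap_text_sketch {
    my ($text, $init_tab, $subsequent_tab, $cols) = @_;
    $init_tab       = '' unless defined $init_tab;
    $subsequent_tab = '' unless defined $subsequent_tab;
    $cols           = 78 unless $cols;

    # Text::Wrap keeps lines shorter than $columns, hence the "+ 1".
    local $Text::Wrap::columns  = $cols + 1;
    local $Text::Wrap::unexpand = 0;    # do not turn leading spaces into tabs
    return join "\n\n",
        map { Text::Wrap::wrap($init_tab, $subsequent_tab, $_) }
        split /\n{2,}/, $text;
}
```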


Sympa::Tools::Text appeared in Sympa 6.2a.41.

decode_filesystem_safe() and encode_filesystem_safe() were added in Sympa 6.2.10.

decode_html(), encode_html(), encode_uri() and mailtourl() were added in Sympa 6.2.14, and escape_url() was deprecated.

guessed_to_utf8() and pad() were added in Sympa 6.2.17.

canonic_text() and slurp() were added in Sympa 6.2.53b.

clip() was added in Sympa 6.2.61b.


2023-07-22 sympa 6.2.72