NAME
    HTML::Entities::ImodePictogram - encode / decode i-mode pictogram

SYNOPSIS
      use HTML::Entities::ImodePictogram;

      $html      = encode_pictogram($rawtext);
      $rawtext   = decode_pictogram($html);
      $cleantext = remove_pictogram($rawtext);

      use HTML::Entities::ImodePictogram qw(find_pictogram);

      $num_found = find_pictogram($rawtext, \&callback);

DESCRIPTION
    HTML::Entities::ImodePictogram handles HTML entities for i-mode
    pictograms (emoji), which are assigned in the Shift_JIS private area.

    See http://www.nttdocomo.co.jp/i/tag/emoji/index.html for details about
    i-mode pictograms.

FUNCTIONS
    All functions in this module assume that input and output strings are
    encoded in Shift_JIS. See the Jcode manpage for conversion between
    Shift_JIS and other encodings such as EUC-JP or UTF-8.

    This module exports the following functions by default.

    encode_pictogram
          $html = encode_pictogram($rawtext);
          $html = encode_pictogram($rawtext, unicode => 1);

        Encodes pictogram characters in raw text into HTML entities. If
        $rawtext contains extended pictograms, they are encoded in Unicode
        format. If you pass the "unicode" option explicitly, all pictogram
        characters are encoded in Unicode format ("&#xXXXX;"). Otherwise,
        encoding is done in decimal format ("&#NNNNN;").

    decode_pictogram
          $rawtext = decode_pictogram($html);

        Decodes HTML entities for pictograms (both "&#xXXXX;" and
        "&#NNNNN;") into raw text in Shift_JIS.

    remove_pictogram
          $cleantext = remove_pictogram($rawtext);

        Removes pictogram characters from raw text.

    This module also exports the following function on demand.

    find_pictogram
          $num_found = find_pictogram($rawtext, \&callback);

        Finds pictogram characters in raw text and executes the callback
        for each one found. It returns the total number of characters
        found in the text.

        The callback is given three arguments. The first is the found
        pictogram character itself, the second is a decimal number that
        represents the Shift_JIS codepoint of the character, and the third
        is its Unicode codepoint. Whatever the callback returns replaces
        the pictogram in the original text.
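The callback contract described above can be tried out without the module by calling a callback directly. The following is only a minimal sketch: the subroutine name pictogram_to_placeholder and the placeholder format are hypothetical, and the sample values (Shift_JIS 0xF89F, Unicode U+E63E, the "fine weather" pictogram) are supplied here for illustration rather than taken from this manual.

```perl
use strict;
use warnings;

# Hypothetical callback for find_pictogram(); the name is illustrative only.
# Arguments, in the order the callback receives them:
#   $char   - the pictogram character itself (raw Shift_JIS bytes)
#   $number - the Shift_JIS codepoint as a decimal number
#   $cp     - the Unicode codepoint
sub pictogram_to_placeholder {
    my ($char, $number, $cp) = @_;
    # The return value is what replaces the pictogram in the text.
    return sprintf('[pictogram sjis=%d unicode=U+%04X]', $number, $cp);
}

# Call the callback by hand with sample values to see what it produces.
my $replacement = pictogram_to_placeholder("\xF8\x9F", 0xF89F, 0xE63E);
print "$replacement\n";    # [pictogram sjis=63647 unicode=U+E63E]

# With the module installed, the same callback would be passed as:
#   my $num_found = find_pictogram($rawtext, \&pictogram_to_placeholder);
```

Because the second argument is already a decimal number, a decimal entity can be built by plain string concatenation, while the third argument needs hexadecimal formatting (for example with sprintf) to produce the Unicode form.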
        Here is a stub implementation of encode_pictogram(), which is a
        good example of how to use find_pictogram(). Note that this
        version does not support extended pictograms.

          sub encode_pictogram {
              my $text = shift;
              find_pictogram($text, sub {
                  my($char, $number, $cp) = @_;
                  return '&#' . $number . ';';
              });
              return $text;
          }

CAVEAT
    *   This module is slow, because the regex it uses matches "ANY"
        character in the text. This is due to the difficulty of detecting
        character boundaries in the Shift_JIS encoding.

    *   Extended pictogram support in this module is not complete. If you
        handle pictogram characters in Unicode, try the Encode module with
        perl 5.8.0, or Unicode::Japanese.

AUTHOR
    Tatsuhiko Miyagawa

    This library is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.

SEE ALSO
    the HTML::Entities manpage, the Unicode::Japanese manpage,
    http://www.nttdocomo.co.jp/p_s/imode/tag/emoji/