root/lang/perl/Encode-JP-Mobile/trunk/lib/Encode/JP/Mobile.pm @ 4705

Revision 4705, 8.0 kB (checked in by miyagawa, 5 years ago)

add cp932.ucm to the repository for convenience

package Encode::JP::Mobile;
our $VERSION = "0.15";

use Encode;
use XSLoader;
XSLoader::load(__PACKAGE__, $VERSION);

use base qw( Exporter );
our @EXPORT_OK = qw( InDoCoMoPictograms InKDDIPictograms InSoftBankPictograms InAirEdgePictograms InMobileJPPictograms );
our %EXPORT_TAGS = ( props => [@EXPORT_OK] );

use Encode::Alias;
# sjis
define_alias('x-sjis-docomo' => 'x-sjis-imode');
define_alias('x-sjis-ezweb' => 'x-sjis-kddi');
define_alias('x-sjis-ezweb-auto' => 'x-sjis-kddi-auto');
define_alias('x-sjis-airedge' => 'cp932');
define_alias('x-sjis-airh' => 'cp932');
define_alias('x-sjis-vodafone-auto' => 'x-sjis-softbank-auto');

# backward compatibility
define_alias('shift_jis-imode' => 'x-sjis-imode');
define_alias('shift_jis-kddi' => 'x-sjis-kddi');
define_alias('shift_jis-kddi-auto' => 'x-sjis-kddi-auto');
define_alias('shift_jis-airedge' => 'cp932');
define_alias('shift_jis-docomo' => 'x-sjis-imode');
define_alias('shift_jis-ezweb' => 'x-sjis-kddi');
define_alias('shift_jis-ezweb-auto' => 'x-sjis-kddi-auto');
define_alias('shift_jis-airh' => 'cp932');

# utf8
define_alias( 'x-utf8-imode'    => 'x-utf8-docomo' );
define_alias( 'x-utf8-ezweb'    => 'x-utf8-kddi' );
define_alias( 'x-utf8-vodafone' => 'x-utf8-softbank' );

use Encode::JP::Mobile::Vodafone;
use Encode::JP::Mobile::KDDIJIS;

sub InDoCoMoPictograms {
    return <<END;
E63E\tE6A5
E6AC\tE6AE
E6B1\tE6B3
E6B7\tE6BA
E6CE\tE757
END
}

sub InKDDIPictograms {
    return <<END;
E468\tE5DF
EA80\tEB88
EC40\tEC7E
EC80\tECFC
ED40\tED8D
EF40\tEF7E
EF80\tEFFC
F040\tF07E
F080\tF0FC
END
}

sub InSoftBankPictograms {
    return <<END;
E001\tE05A
E101\tE15A
E201\tE253
E255\tE257
E301\tE34D
E401\tE44C
E501\tE537
END
}

sub InAirEdgePictograms {
    return <<END;
E000\tE096
E098
E09A
E09F
E0A2
E0A6
E0A8
E0AF
E0BB
E0C4
E0C9
END
}

sub InMobileJPPictograms {
    # +utf8::InDoCoMoPictograms etc. don't work here
    return join "\n", InDoCoMoPictograms, InKDDIPictograms, InSoftBankPictograms, InAirEdgePictograms;
}

1;
__END__

=head1 NAME

Encode::JP::Mobile - Shift_JIS (CP932) variants of Japanese cellphone pictograms

=head1 SYNOPSIS

  use Encode;
  use Encode::JP::Mobile;

  my $bytes = "\x82\xb1\xf9\x5d\xf8\xa0\x82\xb1"; # Shift_JIS bytes containing NTT DoCoMo pictograms
  my $chars = decode("x-sjis-imode", $bytes);     # \x{3053}\x{e6b9}\x{e63f}\x{3053}

  use Encode::JP::Mobile ':props';
  if ($chars =~ /\p{InDoCoMoPictograms}/) {
      warn "It has DoCoMo pictogram characters!";
  }

=head1 DESCRIPTION

Encode::JP::Mobile is an Encode module that supports the Shift_JIS (CP932)
extended characters (cellphone pictograms) mapped to the Unicode Private
Use Area.

This module is B<EXPERIMENTAL>. That means the API and implementation
may sometimes change in backward-incompatible ways.

=head1 ENCODINGS

This module currently supports the following encodings.

=over 4

=item x-sjis-imode

Mapping for NTT DoCoMo i-mode handsets. Pictograms are mapped to the
Shift_JIS private area and the Unicode Private Use Area. The conversion
rule is equivalent to that of cp932.

For example, C<U+E63E> is the I<Fine> character (or I<The Sun>) and is
encoded as C<\xF8\x9F> in this encoding.

This encoding is a subset of the cp932 encoding, but it also has a reverse
mapping from KDDI/AU Unicode private area characters to DoCoMo pictogram
encodings. For example,

  my $kddi  = "\xf6\x59"; # [!] in KDDI/AU
  my $char  = decode("x-sjis-kddi", $kddi);  # \x{E481}
  my $imode = encode("x-sjis-imode", $char); # \xf9\xdc -- [!] in DoCoMo

I<x-sjis-docomo> is an alias.

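The cp932 rule for this private area is linear: lead bytes 0xF0 through 0xF9 each hold 188 cells (trail bytes 0x40-0x7E and 0x80-0xFC), laid out in order onto U+E000 through U+E757. Below is a minimal pure-Perl sketch of that arithmetic, assuming this linear layout; the helpers C<sjis_eudc_to_pua> and C<pua_to_sjis_eudc> are hypothetical and not part of this module. It reproduces the SYNOPSIS decoding of C<\xF9\x5D> and C<\xF8\xA0>.

```perl
use strict;
use warnings;

# Hypothetical helper: map a cp932 user-defined-area byte pair
# (lead 0xF0..0xF9) to its Unicode Private Use Area code point.
sub sjis_eudc_to_pua {
    my ($lead, $trail) = @_;
    die "lead byte out of range\n" unless $lead >= 0xF0 && $lead <= 0xF9;
    die "trail byte out of range\n"
        unless ($trail >= 0x40 && $trail <= 0x7E)
            || ($trail >= 0x80 && $trail <= 0xFC);
    my $cell = $trail - 0x40;
    $cell-- if $trail >= 0x80;            # 0x7F is not a valid trail byte
    return 0xE000 + ($lead - 0xF0) * 188 + $cell;
}

# Hypothetical inverse: Private Use Area code point back to the byte pair.
sub pua_to_sjis_eudc {
    my ($cp) = @_;
    die "code point out of range\n" unless $cp >= 0xE000 && $cp <= 0xE757;
    my $offset = $cp - 0xE000;
    my $trail  = 0x40 + $offset % 188;
    $trail++ if $trail >= 0x7F;           # skip over 0x7F
    return (0xF0 + int($offset / 188), $trail);
}

for my $pair ([0xF9, 0x5D], [0xF8, 0xA0]) {
    printf "\\x%02X\\x%02X => U+%04X\n", @$pair, sjis_eudc_to_pua(@$pair);
}
# \xF9\x5D => U+E6B9
# \xF8\xA0 => U+E63F
```

The last cell of this area, C<\xF9\xFC>, lands on U+E757, which is also the upper bound of the last InDoCoMoPictograms range.
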
=item x-sjis-softbank

Escape-sequence-based Shift_JIS encoding for SoftBank
pictograms. Decoding is implemented in Perl code rather than driven
by a .ucm mapping file.

I<x-sjis-vodafone> is an alias.

For example, C<U+E001> is the I<A Boy> character and is encoded
as C<\x1b$G!\x0f> in this encoding (C<\x1b$G> starts the
escape sequence and C<\x0f> ends it).

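The escape format described above can be sketched in a few lines of Perl. This is a minimal sketch, not the module's implementation: the helper C<softbank_escape> is hypothetical, and the page-letter table (G, E, F, O, P, Q for the blocks U+E0xx through U+E5xx) is an assumption taken from SoftBank's web-code tables rather than from this document, which only confirms the C<G> page for C<U+E001>.

```perl
use strict;
use warnings;

# ASSUMPTION: SoftBank page letters for each 0x100 block of the
# Private Use Area (from SoftBank web-code tables, not this module).
my %page_for = (
    0xE0 => 'G', 0xE1 => 'E', 0xE2 => 'F',
    0xE3 => 'O', 0xE4 => 'P', 0xE5 => 'Q',
);

# Hypothetical helper: wrap one pictogram as ESC '$' <page> <char> SI.
sub softbank_escape {
    my ($cp) = @_;
    my $page = $page_for{ $cp >> 8 }
        or die sprintf("U+%04X is not on a SoftBank pictogram page\n", $cp);
    my $low = $cp & 0xFF;
    die sprintf("U+%04X has no web code\n", $cp)
        unless $low >= 0x01 && $low <= 0x5A;
    # The character byte is the low byte shifted into printable ASCII,
    # so U+E001 becomes '!' (0x21).
    return "\x1b\$" . $page . chr(0x20 + $low) . "\x0f";
}

my $bytes = softbank_escape(0xE001);
print join(' ', map { sprintf '%02X', ord } split //, $bytes), "\n";
# 1B 24 47 21 0F
```

This sketch emits one escape sequence per character; the format also lets several characters from the same page share a single sequence, which the sketch does not attempt.
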
=item x-sjis-softbank-auto

Maps Unicode private area characters to Shift_JIS private area (Gaiji)
characters. This encoding is used by 3GC phones when you enter
pictogram characters in a web form on a Shift_JIS page and submit it.
Handsets can also decode this encoding and display the pictogram characters.

I<x-sjis-vodafone-auto> is an alias.

The private area mapping looks similar to CP932, but with an
offset.

For example, C<U+E001> is the I<A Boy> character (same as in
I<x-sjis-softbank>) and is encoded as C<\xF9\x41>.

=item x-sjis-kddi

Mapping for KDDI/AU pictograms. It appears to be based on cp932, but
there are more private characters that are not included in CP932.TXT.

For example, C<U+E481> is the I<!> (exclamation) character and is
encoded as C<\xF6\x59> (same as cp932). C<U+EB88> is the I<Angry>
character and is encoded as C<\xF4\x8D>, while cp932 has no mapping
for it.

I<x-sjis-ezweb> is an alias.

=item x-sjis-kddi-auto

Mapping for KDDI/AU pictograms, based on the handset's internal Shift_JIS
to UTF-8 translation and vice versa. When you enter pictogram
characters in a web form on a UTF-8 page and submit them, this mapping
is used (instead of the CP932-based I<x-sjis-kddi>) to represent the
pictogram characters.

I<x-sjis-kddi-auto> and I<x-sjis-kddi> share their Unicode-to-bytes
mappings with each other and are hence round-trip safe, which means:

  my $bytes = "\xf6\x59";                 # [!] in KDDI/AU
  decode("x-sjis-kddi", $bytes);          # \x{E481}
  decode("x-sjis-kddi-auto", $bytes);     # \x{EF59}
  encode("x-sjis-kddi", "\x{EF59}");      # same as $bytes
  encode("x-sjis-kddi-auto", "\x{E481}"); # same as $bytes

C<x-sjis-ezweb-auto> is an alias.

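The I<auto> side of that round trip can be sketched arithmetically, assuming (an inference, not something this module documents) that the auto code point is simply the 16-bit Shift_JIS value minus 0x0700 for lead bytes 0xF3, 0xF4, 0xF6 and 0xF7. That assumption matches the C<\xf6\x59> to C<U+EF59> example and the ranges listed in InKDDIPictograms (for example C<ED40>-C<ED8D> and C<EF40>-C<EFFC>), but the real mapping tables may contain exceptions. The helpers C<kddi_auto_from_sjis> and C<kddi_auto_to_sjis> are hypothetical.

```perl
use strict;
use warnings;

# ASSUMPTION (inferred, not taken from the module's tables): the
# x-sjis-kddi-auto code point is the Shift_JIS value minus 0x0700.
my %kddi_lead = map { $_ => 1 } 0xF3, 0xF4, 0xF6, 0xF7;

# Hypothetical helper: two Shift_JIS bytes -> "auto" code point.
sub kddi_auto_from_sjis {
    my ($bytes) = @_;
    die "expected exactly two bytes\n" unless length($bytes) == 2;
    my $sjis = unpack 'n', $bytes;           # big-endian 16-bit value
    die sprintf("0x%04X is not in a KDDI pictogram row\n", $sjis)
        unless $kddi_lead{ $sjis >> 8 };
    return $sjis - 0x0700;
}

# Hypothetical inverse: "auto" code point -> two Shift_JIS bytes.
sub kddi_auto_to_sjis {
    my ($cp) = @_;
    my $sjis = $cp + 0x0700;
    die sprintf("U+%04X is not in a KDDI pictogram row\n", $cp)
        unless $kddi_lead{ $sjis >> 8 };
    return pack 'n', $sjis;
}

for my $bytes ("\xf6\x59", "\xf4\x8d") {     # [!] and [Angry] in KDDI/AU
    my $cp = kddi_auto_from_sjis($bytes);
    my $ok = kddi_auto_to_sjis($cp) eq $bytes ? 'round-trips' : 'MISMATCH';
    printf "%s => U+%04X (%s)\n", unpack('H*', $bytes), $cp, $ok;
}
# f659 => U+EF59 (round-trips)
# f48d => U+ED8D (round-trips)
```
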
=item x-iso-2022-jp-kddi

Encoding used for KDDI/AU pictogram characters in email. It's
based on I<iso-2022-jp>, which is still the de facto standard encoding
for sending Japanese email.

In practice, most KDDI/AU cellphones can receive emails encoded in
Shift_JIS, so you can just use I<x-sjis-kddi> to encode the pictogram
characters. This encoding may still be needed to decode incoming
emails sent from KDDI/AU phones that contain pictogram characters.

C<x-iso-2022-jp-ezweb> is an alias.

=item x-sjis-airedge

Mapping for AirEDGE pictograms. It's a complete subset of cp932.

C<x-sjis-airh> is an alias.

=back

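Several names above are only aliases registered through L<Encode::Alias>; for instance, the module's source maps both I<x-sjis-airh> and I<shift_jis-airh> straight to cp932. The sketch below repeats those two C<define_alias> calls in isolation, using only the core Encode distribution (Encode::JP::Mobile need not be installed), to show how such an alias resolves.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);
use Encode::Alias;    # exports define_alias

# The same two calls appear in Encode::JP::Mobile itself.
define_alias('x-sjis-airh'    => 'cp932');
define_alias('shift_jis-airh' => 'cp932');

for my $name ('x-sjis-airh', 'shift_jis-airh') {
    # find_encoding follows the alias to the underlying cp932 object.
    printf "%s => %s\n", $name, find_encoding($name)->name;
}

# Decoding through the alias is plain cp932 decoding:
# 0x82 0xB1 is HIRAGANA LETTER KO.
my $chars = decode('x-sjis-airh', "\x82\xb1");
printf "U+%04X\n", ord $chars;    # U+3053
```
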
=head1 UNICODE PROPERTIES

By importing this module with the C<:props> tag, you get the following Unicode properties.

=over 4

=item InDoCoMoPictograms

=item InKDDIPictograms

=item InSoftBankPictograms

=item InAirEdgePictograms

=item InMobileJPPictograms

The union of the four properties above.

=back

Note that these properties match characters, not bytes. If the input is
in one of the x-sjis-* variants, you first need to know which encoding
the bytes are in and decode them to Unicode before you can tell whether
the string contains any of these pictogram characters. In practice, they
may only be handy when the input is UTF-8.

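A Perl user-defined property is just a sub whose name starts with C<In> or C<Is> and returns hexadecimal ranges, one per line; the In*Pictograms subs in this module follow that convention. The sketch below copies the DoCoMo ranges into a sub under a hypothetical name (C<InDoCoMoPictogramsSketch>) so it runs with core Perl only, decodes UTF-8 input before matching, and shows that matching the raw bytes finds nothing.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Same ranges as InDoCoMoPictograms, under a hypothetical name so the
# sketch runs without Encode::JP::Mobile installed.
sub InDoCoMoPictogramsSketch {
    return join("\n",
        "E63E\tE6A5", "E6AC\tE6AE", "E6B1\tE6B3", "E6B7\tE6BA", "E6CE\tE757",
    ) . "\n";
}

# UTF-8 bytes for U+E63E (Fine) followed by plain ASCII "ok".
my $bytes = "\xEE\x98\xBEok";

# Properties apply to characters, so decode the bytes first.
my $chars = decode('UTF-8', $bytes);
my @pictograms = $chars =~ /(\p{InDoCoMoPictogramsSketch})/g;
printf "found %d pictogram(s): %s\n",
    scalar(@pictograms), join(' ', map { sprintf 'U+%04X', ord } @pictograms);
# found 1 pictogram(s): U+E63E

# On the undecoded string every byte is a separate character in the
# range 0x00-0xFF, so nothing falls inside the Private Use Area.
my $raw_hits = () = $bytes =~ /\p{InDoCoMoPictogramsSketch}/g;
print "raw bytes: $raw_hits\n";    # raw bytes: 0
```
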

=head1 BACKWARD COMPATIBILITY

As of 0.07, this module uses I<x-sjis-*> as its encoding names. It
still supports the old I<shift_jis-*> aliases, though I'm planning to
deprecate them in a future release.

=head1 NOTES

=over 4

=item *

Pictogram characters are defined to be round-trip safe. However, they
are mapped into the Unicode Private Use Area, which means you'll have
interoperability issues that this module does not yet try to solve
completely. There is partial support for round-tripping (automatic
conversion) between I<x-sjis-imode> and I<x-sjis-kddi>.

=item *

As of version 0.04, this module tries to auto-convert between KDDI/AU
and NTT DoCoMo pictogram characters. Support for SoftBank characters
is still TODO.

=back

=head1 TODO

=over 4

=item *

Implement a C<x-sjis-mobile-jp> encoding that merges all of the above.

=back

=head1 AUTHORS

Tatsuhiko Miyagawa E<lt>miyagawa@bulknews.netE<gt> with contributions from:

Tokuhiro Matsuno

Naoki Tomita

Masahiro Chiba

=head1 LICENSE

This library is free software, licensed under the same terms as Perl itself.

=head1 SEE ALSO

L<Encode>, L<HTML::Entities::ImodePictogram>, L<Unicode::Japanese>

http://www.nttdocomo.co.jp/service/imode/make/content/pictograph/basic/
http://www.nttdocomo.co.jp/service/imode/make/content/pictograph/extention/
http://www.au.kddi.com/ezfactory/tec/spec/3.html
http://developers.softbankmobile.co.jp/dp/tool_dl/web/picword_top.php
http://www.willcom-inc.com/ja/service/contents_service/club_air_edge/for_phone/homepage/index.html
http://www.nttdocomo.co.jp/service/mail/imode_mail/emoji_convert/

=cut