Revision 28505, 6.4 kB (checked in by dankogai, 4 years ago)

VERSION 0.04

NAME
    URI::Escape::XS - Drop-In replacement for URI::Escape

VERSION
    $Id: XS.pm,v 0.3 2009/01/16 06:38:52 dankogai Exp dankogai $

SYNOPSIS
        # use it instead of URI::Escape
        use URI::Escape::XS qw/uri_escape uri_unescape/;
        $safe = uri_escape("10% is enough\n");
        $verysafe = uri_escape("foo", "\0-\377");
        $str  = uri_unescape($safe);

        # or use encodeURIComponent and decodeURIComponent
        use URI::Escape::XS;
        $safe = encodeURIComponent("10% is enough\n");
        $str  = decodeURIComponent("10%25%20is%20enough%0A");

        # if you have Net::IDN::Encode installed
        $safe = encodeURIComponentIDN("http://弾.jp/dan/");
        $str  = decodeURIComponentIDN("http:%2F%2Fxn--81t.jp%2Fdan%2F");

EXPORT
  by default
    "encodeURIComponent" and "decodeURIComponent"

    "encodeURIComponentIDN" and "decodeURIComponentIDN" if Net::IDN::Encode
    is available

  on demand
    "uri_escape" and "uri_unescape"

FUNCTIONS
  encodeURIComponent
    Does what JavaScript's encodeURIComponent does.

      $uri = encodeURIComponent("http://www.example.com/");
      # http%3A%2F%2Fwww.example.com%2F

    Note that you cannot customize the set of characters to escape. If you
    need to, use "uri_escape".
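
    As a rough illustration of the escaping rule, the following pure-Perl
    sketch uses a hypothetical helper, encode_component_pp(), which is not
    part of this module. It assumes the input is already a byte string and
    leaves alone exactly the characters JavaScript's encodeURIComponent
    leaves alone.

```perl
use strict;
use warnings;

# Hypothetical pure-Perl model of the escaping rule (not the module's XS code).
# Like JavaScript's encodeURIComponent, it leaves only these characters alone:
#   A-Z a-z 0-9 - _ . ! ~ * ' ( )
# Assumption: the input is already a byte string (e.g. UTF-8 octets).
sub encode_component_pp {
    my ($bytes) = @_;
    die "wide character: pass bytes, not characters\n" if $bytes =~ /[^\x00-\xFF]/;
    # Replace every other byte with %HH (uppercase hex).
    $bytes =~ s/([^A-Za-z0-9\-_.!~*'()])/sprintf('%%%02X', ord $1)/ge;
    return $bytes;
}

print encode_component_pp("http://www.example.com/"), "\n";  # http%3A%2F%2Fwww.example.com%2F
print encode_component_pp("10% is enough\n"), "\n";          # 10%25%20is%20enough%0A
```

    Both outputs match the examples in this README.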

  decodeURIComponent
    Does what JavaScript's decodeURIComponent does.

      $str = decodeURIComponent("http%3A%2F%2Fwww.example.com%2F");
      # http://www.example.com/

    It decodes not only %HH sequences but also %uHHHH sequences, and it
    decodes surrogate pairs correctly.

      $str = decodeURIComponent("%uD869%uDEB2%u5F3E%u0061");
      # \x{2A6B2}\x{5F3E}a (as UTF-8 bytes; see below)
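
    The surrogate-pair decoding mentioned above is standard UTF-16
    arithmetic. A small sketch with a hypothetical combine_surrogates()
    helper (not part of this module) reproduces the first character of the
    example.

```perl
use strict;
use warnings;

# Hypothetical helper: combine a UTF-16 surrogate pair into one code point.
# High surrogates are D800-DBFF, low surrogates are DC00-DFFF; each carries
# 10 bits of (code point - 0x10000).
sub combine_surrogates {
    my ($hi, $lo) = @_;
    die "not a surrogate pair\n"
        unless $hi >= 0xD800 && $hi <= 0xDBFF && $lo >= 0xDC00 && $lo <= 0xDFFF;
    return 0x10000 + (($hi - 0xD800) << 10) + ($lo - 0xDC00);
}

printf "U+%X\n", combine_surrogates(0xD869, 0xDEB2);   # U+2A6B2
```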

    This function UNCONDITIONALLY returns the decoded string with the utf8
    flag off. To get a utf8-decoded string, use Encode and call

      decode_utf8(decodeURIComponent($uri));

    This is the correct behavior because you cannot tell whether the decoded
    bytes are actually UTF-8; they may just as well be in another encoding
    such as ISO-8859-1 or Shift_JIS.
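
    The difference between the byte string this function returns and a
    decoded character string can be seen with the core Encode module alone.
    In the sketch below, unescape_bytes() is a hypothetical stand-in for
    decodeURIComponent() that handles only %HH sequences.

```perl
use strict;
use warnings;
use Encode qw(decode decode_utf8);

# Hypothetical stand-in for decodeURIComponent on %HH-only input: bytes out.
sub unescape_bytes {
    (my $s = shift) =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $s;
}

my $bytes  = unescape_bytes("%E5%BC%BE");     # 3 bytes, utf8 flag off
my $chars  = decode_utf8($bytes);             # 1 character if the bytes are UTF-8
my $latin1 = decode('iso-8859-1', $bytes);    # 3 characters if they are Latin-1
printf "%d bytes, %d char as UTF-8, %d chars as Latin-1\n",
    length $bytes, length $chars, length $latin1;
# 3 bytes, 1 char as UTF-8, 3 chars as Latin-1
```

    Only the caller knows which interpretation is right, which is why the
    decoding step is left to Encode.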

  encodeURIComponentIDN
    Same as "encodeURIComponent" except that the host part is encoded in
    punycode. Net::IDN::Encode is required to use this function.

    URIs with Internationalized Domain Names require two encodings:
    Punycode for the host part and URI escaping for the rest.

    Currently only FULL URIs with "http:" or "https:" are supported.
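
    The two-step encoding can be sketched as below. encode_idn_sketch() is a
    hypothetical helper, not the module's implementation. The punycode step
    is passed in as a code reference so that the sketch runs without
    Net::IDN::Encode (in real use that reference could wrap it, which is
    assumed rather than shown here). Unlike the SYNOPSIS output, it keeps "/"
    literal so that the result is a usable URL.

```perl
use strict;
use warnings;

# Hypothetical sketch of "punycode for the host, URI escape for the rest";
# not the module's implementation. $to_ascii is a caller-supplied code ref
# that punycodes a host name.
sub encode_idn_sketch {
    my ($uri, $to_ascii) = @_;
    # Only full http: or https: URIs, as in this README.
    my ($scheme, $host, $rest) = $uri =~ m{\A(https?)://([^/?#]*)(.*)\z}s
        or die "only full http: or https: URIs are handled\n";
    my $ascii_host = $to_ascii->($host);    # step 1: punycode the host
    utf8::encode($rest);                     # step 2: escape the rest as UTF-8 bytes
    $rest =~ s{([^A-Za-z0-9\-_.!~*'()/])}{sprintf '%%%02X', ord $1}ge;
    return "$scheme://$ascii_host$rest";
}

# Stub converter mapping the SYNOPSIS host to the punycode shown there.
my $stub = sub { $_[0] eq "\x{5F3E}.jp" ? 'xn--81t.jp' : $_[0] };
print encode_idn_sketch("http://\x{5F3E}.jp/dan/", $stub), "\n";  # http://xn--81t.jp/dan/
```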

  decodeURIComponentIDN
    Same as "decodeURIComponent" except that the host part is expected in
    punycode and is decoded from it. Net::IDN::Encode is required to use
    this function.

  uri_escape
    Does exactly the same as URI::Escape::uri_escape() except when a
    utf8-flagged string is fed.

    URI::Escape::uri_escape() croaks and urges you to use
    "uri_escape_utf8()", but that is pointless because URIs themselves have
    no such thing as a utf8 flag. The function in this module ALWAYS TREATS
    the string as a byte sequence, so you can safely use it without
    worrying about utf8 flags.

    Note that this function is NOT EXPORTED by default. That way you can use
    URI::Escape and URI::Escape::XS simultaneously.
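
    One reading of "ALWAYS TREATS the string as a byte sequence" is sketched
    below with a hypothetical uri_escape_bytes_pp(). The assumption, which
    this README does not spell out, is that a utf8-flagged string is escaped
    as its UTF-8 bytes. The second argument has the same form as
    uri_escape()'s: the contents of a regexp character class.

```perl
use strict;
use warnings;

# Hypothetical model, not the module's XS code.
# Assumption: for a utf8-flagged string, "the bytes" means its UTF-8 encoding.
sub uri_escape_bytes_pp {
    my ($str, $unsafe) = @_;
    utf8::encode($str) if utf8::is_utf8($str);   # work on bytes, never croak
    $str =~ s/([$unsafe])/sprintf '%%%02X', ord $1/ge;
    return $str;
}

print uri_escape_bytes_pp("foo", "\0-\377"), "\n";          # %66%6F%6F
print uri_escape_bytes_pp("\x{5F3E}", '^A-Za-z0-9'), "\n";   # %E5%BC%BE
```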

  uri_unescape
    Does exactly the same as URI::Escape::uri_unescape() except when %uHHHH
    is fed.

    URI::Escape::uri_unescape() simply ignores %uHHHH sequences, while the
    function in this module decodes them into the corresponding UTF-8 byte
    sequences.

    Like uri_escape, this function is NOT EXPORTED by default.
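
    The %uHHHH handling described above can be modeled in pure Perl.
    unescape_with_u_pp() is a hypothetical helper, not the module's XS code.
    It turns %HH into one byte and %uHHHH, including surrogate pairs, into
    the UTF-8 bytes of the code point, so the result keeps the utf8 flag off.

```perl
use strict;
use warnings;

# Hypothetical pure-Perl model, not the module's XS code.
sub unescape_with_u_pp {
    my ($str) = @_;
    $str =~ s{
        %u([Dd][89ABab][0-9A-Fa-f]{2})%u([Dd][C-Fc-f][0-9A-Fa-f]{2})  # surrogate pair
      | %u([0-9A-Fa-f]{4})                                            # one 16-bit unit
      | %([0-9A-Fa-f]{2})                                             # one byte
    }{
        defined $1 ? utf8_bytes(0x10000 + ((hex($1) - 0xD800) << 10) + (hex($2) - 0xDC00))
      : defined $3 ? utf8_bytes(hex $3)
      :              chr hex $4
    }gex;
    return $str;
}

# Encode one code point as UTF-8 octets (utf8 flag off).
sub utf8_bytes {
    my $s = chr shift;
    utf8::encode($s);
    return $s;
}
```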

  Note on the %uHHHH sequence
    With this module the resulting strings never have the utf8 flag on, so
    if you want a Perl character string you have to decode it explicitly
    via Encode. Remember: URIs have always been byte sequences, not UTF-8
    characters.

    Had the %uHHHH sequence become standard, you could safely tell whether a
    given URI is in Unicode. But, more fortunately than unfortunately, the
    RFC proposal was rejected, so you cannot tell which encoding is used
    just by looking at the URI.

    <http://en.wikipedia.org/wiki/Percent-encoding#Non-standard_implementations>

    I said fortunately because %uHHHH can be nasty for non-BMP characters.
    Since each %uHHHH holds only one 16-bit value, a character at U+10000 or
    above needs a *surrogate pair*, that is, two %uHHHH escapes.

    In spite of that, a significant number of URIs in the wild use %uHHHH
    escapes. Therefore this module supports decoding them, but not encoding.
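
    To see why non-BMP characters are awkward, the sketch below goes the
    other way and produces %uHHHH escapes. This module deliberately offers
    no such encoder. percent_u_escape() is a hypothetical helper shown only
    to illustrate the surrogate split.

```perl
use strict;
use warnings;

# Illustration only: this module decodes %uHHHH but never produces it.
sub percent_u_escape {
    my ($cp) = @_;
    return sprintf '%%u%04X', $cp if $cp < 0x10000;   # BMP: one 16-bit unit
    my $v = $cp - 0x10000;                             # 20 bits remain
    return sprintf '%%u%04X%%u%04X',
        0xD800 + ($v >> 10),                           # high surrogate: top 10 bits
        0xDC00 + ($v & 0x3FF);                         # low surrogate: bottom 10 bits
}

print join('', map { percent_u_escape(ord) } split //, "\x{2A6B2}\x{5F3E}a"), "\n";
# %uD869%uDEB2%u5F3E%u0061
```

    Its output is the input of the decodeURIComponent example above.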

SPEED
    Since this module uses XS, it is very fast, except for uri_escape() on
    strings that need no escaping, such as uri_escape("noop").

    The regexp used in URI::Escape is very fast when nothing matches but
    slows down significantly when it has to replace characters.
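
    The regexp effect can be observed with the core Benchmark module. The
    sketch below times a simplified regexp-based escaper (a model in the
    spirit of URI::Escape, not its actual code) on an input with nothing to
    replace and on one where every byte must be replaced. Absolute rates
    depend on the machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Simplified regexp-based escaper (a model, not URI::Escape's actual code):
# each unsafe byte is replaced through a substitution.
sub regex_escape {
    (my $s = shift) =~ s/([^A-Za-z0-9\-_.~])/sprintf '%%%02X', ord $1/ge;
    return $s;
}

my $noop   = 'a' x 32;      # nothing matches, so nothing is replaced
my $escape = "\xE5" x 32;   # every byte matches and must be replaced

# Run each case for at least 1 CPU second and print a rate table
# in the same format as the BENCHMARK tables.
cmpthese(-1, {
    noop   => sub { regex_escape($noop) },
    escape => sub { regex_escape($escape) },
});
```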

  BENCHMARK
    On a MacBook Pro (2GHz), Perl 5.8.8.

     http://www.google.co.jp/search?q=%E5%B0%8F%E9%A3%BC%E5%BC%BE
     ============================================================
     Unescape it
     -----------
                  Rate     U::E U::E::XS
     U::E      58526/s       --     -88%
     U::E::XS 486968/s     732%       --
     --------------
     Escape it back
     --------------
                  Rate     U::E U::E::XS
     U::E      30046/s       --     -78%
     U::E::XS 136992/s     356%       --

     www.example.com
     ===============
     Unescape it
     -----------
                  Rate     U::E U::E::XS
     U::E     821972/s       --      -4%
     U::E::XS 854732/s       4%       --
     --------------
     Escape it back
     --------------
                  Rate U::E::XS     U::E
     U::E::XS 522969/s       --      -7%
     U::E     565112/s       8%       --

AUTHOR
    Dan Kogai, "<dankogai at dan.co.jp>"

BUGS
    Please report any bugs or feature requests to "bug-uri-escape-xs at
    rt.cpan.org", or through the web interface at
    <http://rt.cpan.org/NoAuth/ReportBug.html?Queue=URI-Escape-XS>. I will
    be notified, and then you'll automatically be notified of progress on
    your bug as I make changes.

SUPPORT
    You can find documentation for this module with the perldoc command.

        perldoc URI::Escape::XS

    You can also look for information at:

    *   AnnoCPAN: Annotated CPAN documentation

        <http://annocpan.org/dist/URI-Escape-XS>

    *   CPAN Ratings

        <http://cpanratings.perl.org/d/URI-Escape-XS>

    *   RT: CPAN's request tracker

        <http://rt.cpan.org/NoAuth/Bugs.html?Dist=URI-Escape-XS>

    *   Search CPAN

        <http://search.cpan.org/dist/URI-Escape-XS>

ACKNOWLEDGEMENTS
    Gisle Aas for URI::Escape

    Koichi Taniguchi for URI::Escape::JavaScript

    Claus Färber for Net::IDN::Encode

COPYRIGHT & LICENSE
    Copyright 2007-2008 Dan Kogai, all rights reserved.

    This program is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.