| 1 | NAME |
|---|
| 2 | URI::Escape::XS - Drop-In replacement for URI::Escape |
|---|
| 3 | |
|---|
| 4 | VERSION |
|---|
| 5 | $Id: XS.pm,v 0.3 2009/01/16 06:38:52 dankogai Exp dankogai $ |
|---|
| 6 | |
|---|
| 7 | SYNOPSIS |
|---|
| 8 | # use it instead of URI::Escape |
|---|
| 9 | use URI::Escape::XS qw/uri_escape uri_unescape/; |
|---|
| 10 | $safe = uri_escape("10% is enough\n"); |
|---|
| 11 | $verysafe = uri_escape("foo", "\0-\377"); |
|---|
| 12 | $str = uri_unescape($safe); |
|---|
| 13 | |
|---|
| 14 | # or use encodeURIComponent and decodeURIComponent |
|---|
| 15 | use URI::Escape::XS; |
|---|
| 16 | $safe = encodeURIComponent("10% is enough\n"); |
|---|
| 17 | $str = decodeURIComponent("10%25%20is%20enough%0A"); |
|---|
| 18 | |
|---|
| 19 | # if you have Net::IDN::Encode installed |
|---|
| 20 | $safe = encodeURIComponentIDN("http://弾.jp/dan/"); |
|---|
| 21 | $str = decodeURIComponentIDN("http:%2F%2Fxn--81t.jp%2Fdan%2F"); |
|---|
| 22 | |
|---|
| 23 | EXPORT |
|---|
| 24 | by default |
|---|
| 25 | "encodeURIComponent" and "decodeURIComponent" |
|---|
| 26 | |
|---|
| 27 | "encodeURIComponentIDN" and "decodeURIComponentIDN" if Net::IDN::Encode |
|---|
| 28 | is available |
|---|
| 29 | |
|---|
| 30 | on demand |
|---|
| 31 | "uri_escape" and "uri_unescape" |
|---|
| 32 | |
|---|
| 33 | FUNCTIONS |
|---|
| 34 | encodeURIComponent |
|---|
| 35 | Does what JavaScript's encodeURIComponent does. |
|---|
| 36 | |
|---|
| 37 | $uri = encodeURIComponent("http://www.example.com/"); |
|---|
| 38 | # http%3A%2F%2Fwww.example.com%2F |
|---|
| 39 | |
|---|
| 40 | Note you cannot customize characters to escape. If you need to do so, |
|---|
| 41 | use "uri_escape". |
|---|
| 42 | |
|---|
| 43 | decodeURIComponent |
|---|
| 44 | Does what JavaScript's decodeURIComponent does. |
|---|
| 45 | |
|---|
| 46 | $str = decodeURIComponent("http%3A%2F%2Fwww.example.com%2F"); |
|---|
| 47 | # http://www.example.com/ |
|---|
| 48 | |
|---|
| 49 | It decodes not only %HH sequences but also %uHHHH sequences, with |
|---|
| 50 | surrogate pairs correctly decoded. |
|---|
| 51 | |
|---|
| 52 | $str = decodeURIComponent("%uD869%uDEB2%u5F3E%u0061"); |
|---|
| 53 | # \x{2A6B2}\x{5F3E}a |
|---|
| 54 | |
|---|
| 55 | This function UNCONDITIONALLY returns the decoded string with the utf8 flag |
|---|
| 56 | off. To get a utf8-decoded string, use Encode and |
|---|
| 57 | |
|---|
| 58 | decode_utf8(decodeURIComponent($uri)); |
|---|
| 59 | |
|---|
| 60 | This is the correct behavior because you cannot tell whether the decoded |
|---|
| 61 | bytes are actually UTF-8 or some other encoding, such as ISO-8859-1 or |
|---|
| 62 | Shift_JIS. |
|---|
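The byte-versus-character distinction above can be seen with core Perl alone. The following is a minimal sketch, not the module's implementation: the regexp substitution is only a stand-in for decodeURIComponent so that the example runs without URI::Escape::XS installed, and the sample input is an assumption chosen for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Stand-in for decodeURIComponent (assumption: plain %HH input only).
my $uri = "%E5%BC%BE";
(my $bytes = $uri) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;

# $bytes now holds three raw bytes with the utf8 flag off.
# Decoding is an explicit, separate step, and only valid if the
# bytes really are UTF-8.
my $chars = decode_utf8($bytes);

printf "%d bytes, %d character(s), U+%04X\n",
    length($bytes), length($chars), ord($chars);
# prints "3 bytes, 1 character(s), U+5F3E"
```

If the same bytes had come from a Shift_JIS page, decode_utf8 would be the wrong call, which is why the decoding step is left to the caller.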
| 63 | |
|---|
| 64 | encodeURIComponentIDN |
|---|
| 65 | Same as "encodeURIComponent" except that the host part is encoded in |
|---|
| 66 | punycode. Net::IDN::Encode is required to use this function. |
|---|
| 67 | |
|---|
| 68 | URIs with Internationalized Domain Names require two encodings: |
|---|
| 69 | Punycode for host part and URI escape for the rest. |
|---|
| 70 | |
|---|
| 71 | Currently only FULL URIs with "http:" or "https:" are supported. |
|---|
| 72 | |
|---|
| 73 | decodeURIComponentIDN |
|---|
| 74 | Same as "decodeURIComponent" except that the host part is decoded from |
|---|
| 75 | punycode. Net::IDN::Encode is required to use this function. |
|---|
| 76 | |
|---|
| 77 | uri_escape |
|---|
| 78 | Does exactly the same as URI::Escape::uri_escape() except when a |
|---|
| 79 | utf8-flagged string is fed. |
|---|
| 80 | |
|---|
| 81 | URI::Escape::uri_escape() croaks and urges you to use "uri_escape_utf8()", but |
|---|
| 82 | that is pointless because a URI itself has no such thing as a utf8 flag. The |
|---|
| 83 | function in this module ALWAYS TREATS the string as a byte sequence. That |
|---|
| 84 | way you can safely use this function without worrying about utf8 flags. |
|---|
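One way to picture "treats the string as a byte sequence" is the pure-Perl sketch below. It is not the module's XS code: escape_as_bytes is a hypothetical helper, and the set of characters left unescaped (the RFC 3986 unreserved characters) is an assumption that may differ from the module's default.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of URI::Escape::XS: percent-escape a
# string by its bytes, whether or not the utf8 flag is set.
sub escape_as_bytes {
    my ($str) = @_;
    utf8::encode($str) if utf8::is_utf8($str);   # characters -> UTF-8 bytes
    # Assumed safe set: RFC 3986 unreserved characters.
    $str =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $str;
}

print escape_as_bytes("\x{5F3E}"), "\n";       # utf8-flagged character
print escape_as_bytes("\xE5\xBC\xBE"), "\n";   # the same three bytes, flag off
# both lines print "%E5%BC%BE"
```

Both calls give the same result, so the caller never has to check the flag first.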
| 85 | |
|---|
| 86 | Note this function is NOT EXPORTED by default. That way you can use |
|---|
| 87 | URI::Escape and URI::Escape::XS simultaneously. |
|---|
| 88 | |
|---|
| 89 | uri_unescape |
|---|
| 90 | Does exactly the same as URI::Escape::uri_unescape() except when %uHHHH is |
|---|
| 91 | fed. |
|---|
| 92 | |
|---|
| 93 | URI::Escape::uri_unescape() simply ignores %uHHHH sequences while the |
|---|
| 94 | function in this module decodes them into the corresponding UTF-8 byte |
|---|
| 95 | sequences. |
|---|
| 96 | |
|---|
| 97 | Like uri_escape, this function is NOT EXPORTED by default. |
|---|
| 98 | |
|---|
| 99 | Note on the %uHHHH sequence |
|---|
| 100 | With this module the resulting strings never have the utf8 flag on. So |
|---|
| 101 | if you want to decode them to perl utf8, you have to explicitly decode via |
|---|
| 102 | Encode. Remember: URIs have always been a byte sequence, not UTF-8 |
|---|
| 103 | characters. |
|---|
| 104 | |
|---|
| 105 | If the %uHHHH sequence had become standard, you could have safely told whether a |
|---|
| 106 | given URI is in Unicode. But more fortunately than unfortunately, the |
|---|
| 107 | RFC proposal was rejected so you cannot tell which encoding is used just |
|---|
| 108 | by looking at the URI. |
|---|
| 109 | |
|---|
| 110 | <http://en.wikipedia.org/wiki/Percent-encoding#Non-standard_implementations> |
|---|
| 112 | |
|---|
| 113 | I said fortunately because %uHHHH can be nasty for non-BMP characters. |
|---|
| 114 | Since each %uHHHH can hold only one 16-bit value, a character at U+10000 |
|---|
| 115 | or above needs a *surrogate pair* to represent it. |
|---|
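The surrogate-pair arithmetic can be sketched in a few lines of core Perl. This is an illustration of the standard UTF-16 rule, not code taken from the module; combine_surrogates is a hypothetical name. The input pair is the one from the decodeURIComponent example, %uD869%uDEB2.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Combine a UTF-16 surrogate pair into a single code point.
sub combine_surrogates {
    my ($hi, $lo) = @_;
    die "not a high surrogate" unless $hi >= 0xD800 && $hi <= 0xDBFF;
    die "not a low surrogate"  unless $lo >= 0xDC00 && $lo <= 0xDFFF;
    return 0x10000 + (($hi - 0xD800) << 10) + ($lo - 0xDC00);
}

my $cp    = combine_surrogates(0xD869, 0xDEB2);
my $bytes = encode_utf8(chr $cp);    # the UTF-8 byte sequence for $cp

printf "U+%X -> %s\n", $cp,
    join ' ', map { sprintf '%02X', ord } split //, $bytes;
# prints "U+2A6B2 -> F0 AA 9A B2"
```

A decoder therefore has to look ahead: a lone %uD869 is not a character until the following %uDEB2 has been read.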
| 116 | |
|---|
| 117 | In spite of that, there are a significant number of URIs with %uHHHH |
|---|
| 118 | escapes. Therefore this module supports decoding only. |
|---|
| 119 | |
|---|
| 120 | SPEED |
|---|
| 121 | Since this module uses XS, it is really fast except for |
|---|
| 122 | uri_escape("noop"). |
|---|
| 123 | |
|---|
| 124 | The regexp used in URI::Escape is really fast when nothing matches but |
|---|
| 125 | slows down significantly when it has to replace parts of the string. |
|---|
| 126 | |
|---|
| 127 | BENCHMARK |
|---|
| 128 | On a MacBook Pro 2GHz, Perl 5.8.8. |
|---|
| 129 | |
|---|
| 130 | http://www.google.co.jp/search?q=%E5%B0%8F%E9%A3%BC%E5%BC%BE |
|---|
| 131 | ============================================================ |
|---|
| 132 | Unescape it |
|---|
| 133 | ----------- |
|---|
| 134 | U::E 58526/s -- -88% |
|---|
| 135 | U::E::XS 486968/s 732% -- |
|---|
| 136 | -------------- |
|---|
| 137 | Escape it back |
|---|
| 138 | -------------- |
|---|
| 139 | U::E 30046/s -- -78% |
|---|
| 140 | U::E::XS 136992/s 356% -- |
|---|
| 141 | |
|---|
| 142 | www.example.com |
|---|
| 143 | =============== |
|---|
| 144 | Unescape it |
|---|
| 145 | ----------- |
|---|
| 146 | Rate U::E U::E::XS |
|---|
| 147 | U::E 821972/s -- -4% |
|---|
| 148 | U::E::XS 854732/s 4% -- |
|---|
| 149 | -------------- |
|---|
| 150 | Escape it back |
|---|
| 151 | -------------- |
|---|
| 152 | U::E::XS 522969/s -- -7% |
|---|
| 153 | U::E 565112/s 8% -- |
|---|
| 154 | |
|---|
| 155 | AUTHOR |
|---|
| 156 | Dan Kogai, "<dankogai at dan.co.jp>" |
|---|
| 157 | |
|---|
| 158 | BUGS |
|---|
| 159 | Please report any bugs or feature requests to "bug-uri-escape-xs at |
|---|
| 160 | rt.cpan.org", or through the web interface at |
|---|
| 161 | <http://rt.cpan.org/NoAuth/ReportBug.html?Queue=URI-Escape-XS>. I will |
|---|
| 162 | be notified, and then you'll automatically be notified of progress on |
|---|
| 163 | your bug as I make changes. |
|---|
| 164 | |
|---|
| 165 | SUPPORT |
|---|
| 166 | You can find documentation for this module with the perldoc command. |
|---|
| 167 | |
|---|
| 168 | perldoc URI::Escape::XS |
|---|
| 169 | |
|---|
| 170 | You can also look for information at: |
|---|
| 171 | |
|---|
| 172 | * AnnoCPAN: Annotated CPAN documentation |
|---|
| 173 | |
|---|
| 174 | <http://annocpan.org/dist/URI-Escape-XS> |
|---|
| 175 | |
|---|
| 176 | * CPAN Ratings |
|---|
| 177 | |
|---|
| 178 | <http://cpanratings.perl.org/d/URI-Escape-XS> |
|---|
| 179 | |
|---|
| 180 | * RT: CPAN's request tracker |
|---|
| 181 | |
|---|
| 182 | <http://rt.cpan.org/NoAuth/Bugs.html?Dist=URI-Escape-XS> |
|---|
| 183 | |
|---|
| 184 | * Search CPAN |
|---|
| 185 | |
|---|
| 186 | <http://search.cpan.org/dist/URI-Escape-XS> |
|---|
| 187 | |
|---|
| 188 | ACKNOWLEDGEMENTS |
|---|
| 189 | Gisle Aas for URI::Escape |
|---|
| 190 | |
|---|
| 191 | Koichi Taniguchi for URI::Escape::JavaScript |
|---|
| 192 | |
|---|
| 193 | Claus Färber for Net::IDN::Encode |
|---|
| 194 | |
|---|
| 195 | COPYRIGHT & LICENSE |
|---|
| 196 | Copyright 2007-2008 Dan Kogai, all rights reserved. |
|---|
| 197 | |
|---|
| 198 | This program is free software; you can redistribute it and/or modify it |
|---|
| 199 | under the same terms as Perl itself. |
|---|
| 200 | |
|---|