NAME
    URI::Escape::XS - Drop-In replacement for URI::Escape

VERSION
    $Id: README,v 0.2 2008/05/30 23:53:13 dankogai Exp $

SYNOPSIS
      # use it instead of URI::Escape
      use URI::Escape::XS qw/uri_escape uri_unescape/;
      $safe     = uri_escape("10% is enough\n");
      $verysafe = uri_escape("foo", "\0-\377");
      $str      = uri_unescape($safe);

      # or use encodeURIComponent and decodeURIComponent
      use URI::Escape::XS;
      $safe = encodeURIComponent("10% is enough\n");
      $str  = decodeURIComponent("10%25%20is%20enough%0A");
EXPORT
  by default
    "encodeURIComponent" and "decodeURIComponent"

  on demand
    "uri_escape" and "uri_unescape"
FUNCTIONS
  encodeURIComponent
    Does what JavaScript's encodeURIComponent does.

      $uri = encodeURIComponent("http://www.example.com/");
      # http%3A%2F%2Fwww.example.com%2F

    Note that you cannot customize which characters are escaped. If you
    need to do so, use "uri_escape".
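    For reference, JavaScript's encodeURIComponent leaves only the
    characters A-Z a-z 0-9 - _ . ! ~ * ' ( ) unescaped and percent-encodes
    every other byte. The pure-Perl sketch below mimics that rule. The name
    hyp_encode_uri_component is hypothetical, this is not how the XS code
    is written, and it assumes that a string without the utf8 flag is
    already a byte sequence.

```perl
use strict;
use warnings;

# Hypothetical pure-Perl sketch of the encodeURIComponent escaping rule;
# the real module does this in XS.
sub hyp_encode_uri_component {
    my ($str) = @_;
    # Assumption: a flagged string is escaped via its UTF-8 bytes and an
    # unflagged string is taken as bytes already.
    utf8::encode($str) if utf8::is_utf8($str);
    # Percent-encode every byte outside JavaScript's unreserved set.
    $str =~ s/([^A-Za-z0-9\-_.!~*'()])/sprintf('%%%02X', ord $1)/ge;
    return $str;
}

print hyp_encode_uri_component("http://www.example.com/"), "\n";
# http%3A%2F%2Fwww.example.com%2F
```

    The same rule also reproduces the SYNOPSIS result for
    "10% is enough\n", which becomes 10%25%20is%20enough%0A.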
  decodeURIComponent
    Does what JavaScript's decodeURIComponent does.

      $str = decodeURIComponent("http%3A%2F%2Fwww.example.com%2F");
      # http://www.example.com/

    It decodes not only %HH sequences but also %uHHHH sequences, and it
    decodes surrogate pairs correctly.

      $str = decodeURIComponent("%uD869%uDEB2%u5F3E%u0061");
      # UTF-8 bytes of \x{2A6B2}\x{5F3E}a (utf8 flag off)
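    The surrogate-pair handling can be sketched in pure Perl as follows.
    This is a hypothetical illustration, not the module's XS code. The
    names hyp_decode_uri_component and hyp_utf8_bytes are invented, and
    the handling of malformed escapes is an arbitrary choice. Each escape
    becomes bytes. A high/low surrogate pair is first combined into one
    code point and then written out as UTF-8.

```perl
use strict;
use warnings;

# Encode one code point as UTF-8 bytes; the result has the utf8 flag off.
sub hyp_utf8_bytes {
    my $s = chr shift;
    utf8::encode($s);
    return $s;
}

# Hypothetical sketch: decode %HH and %uHHHH escapes into a byte string,
# joining a high/low surrogate pair into a single code point first.
sub hyp_decode_uri_component {
    my ($str) = @_;
    my $out = '';
    while (length $str) {
        if ($str =~ s/^%u(D[89AB][0-9A-F]{2})%u(D[C-F][0-9A-F]{2})//i) {
            # 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00)
            my $cp = 0x10000 + ((hex($1) - 0xD800) << 10) + (hex($2) - 0xDC00);
            $out .= hyp_utf8_bytes($cp);
        }
        elsif ($str =~ s/^%u([0-9A-F]{4})//i) {
            $out .= hyp_utf8_bytes(hex $1);      # BMP code point
        }
        elsif ($str =~ s/^%([0-9A-F]{2})//i) {
            $out .= chr hex $1;                  # %HH is already one byte
        }
        else {
            $out .= substr($str, 0, 1, '');      # copy anything else as-is
        }
    }
    return $out;
}

my $bytes = hyp_decode_uri_component("%uD869%uDEB2%u5F3E%u0061");
print join(' ', map { sprintf('%02X', ord($_)) } split //, $bytes), "\n";
# F0 AA 9A B2 E5 BC BE 61
```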
    This function UNCONDITIONALLY returns the decoded string with the utf8
    flag off. To get a character string, decode the result with Encode:

      use Encode qw/decode_utf8/;
      $str = decode_utf8(decodeURIComponent($uri));

    This is the correct behavior because you cannot tell from the URI
    alone whether the decoded bytes are UTF-8 or some other encoding such
    as ISO-8859-1 or Shift_JIS.
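    Here is a minimal illustration of that last step, using only the core
    Encode module. The literal bytes below stand in for a
    decodeURIComponent result, which keeps the example independent of the
    module. They are what percent-decoding "%E5%B0%8F%E9%A3%BC%E5%BC%BE"
    yields. decode_utf8 turns the nine bytes into three characters only
    because the caller chooses to assume UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Bytes of "%E5%B0%8F%E9%A3%BC%E5%BC%BE" after percent-decoding (flag off).
my $bytes = "\xE5\xB0\x8F\xE9\xA3\xBC\xE5\xBC\xBE";

# The caller, not the URI, decides that these bytes are UTF-8.
my $chars = decode_utf8($bytes);

printf "%d bytes -> %d characters, first is U+%04X\n",
    length($bytes), length($chars), ord($chars);
# 9 bytes -> 3 characters, first is U+5C0F
```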
  uri_escape
    Does exactly the same as URI::Escape::uri_escape() except when a
    utf8-flagged string is fed.

    URI::Escape::uri_escape() croaks when given wide characters and urges
    you to use "uri_escape_utf8()" instead, but that is pointless because
    URIs themselves have no such thing as a utf8 flag. The function in
    this module ALWAYS TREATS the string as a byte sequence, so you can
    use it safely without worrying about utf8 flags.

    Note that this function is NOT EXPORTED by default. That way you can
    use URI::Escape and URI::Escape::XS simultaneously.
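    The byte-oriented behavior can be modeled in pure Perl as shown below.
    The name hyp_uri_escape_bytes is hypothetical. The default safe set
    (the RFC 3986 unreserved characters A-Z a-z 0-9 - . _ ~) is an
    assumption made for the sketch. The optional second argument is used
    as a regex character class, like the "\0-\377" argument in the
    SYNOPSIS.

```perl
use strict;
use warnings;

# Hypothetical sketch: escape a string as bytes, whatever its utf8 flag.
# The default unsafe class below (complement of RFC 3986 unreserved
# characters) is an assumption, not necessarily the module's default.
sub hyp_uri_escape_bytes {
    my ($str, $unsafe) = @_;
    $unsafe = '^A-Za-z0-9\-._~' unless defined $unsafe;
    # A flagged string is escaped via its UTF-8 bytes instead of croaking.
    utf8::encode($str) if utf8::is_utf8($str);
    $str =~ s/([$unsafe])/sprintf('%%%02X', ord $1)/ge;
    return $str;
}

print hyp_uri_escape_bytes("\x{5F3E}"), "\n";          # %E5%BC%BE
print hyp_uri_escape_bytes("foo", "\0-\377"), "\n";    # %66%6F%6F
```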
  uri_unescape
    Does exactly the same as URI::Escape::uri_unescape() except when
    %uHHHH is fed.

    URI::Escape::uri_unescape() simply ignores %uHHHH sequences, while the
    function in this module decodes them into the corresponding UTF-8
    byte sequence.

    Like uri_escape, this function is NOT EXPORTED by default.
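    The substitution below shows why a %HH-only decoder leaves %uHHHH
    untouched. It is written in the style of URI::Escape's uri_unescape,
    but treat the exact regex as an assumption, and the name
    hyp_unescape_hh_only is hypothetical. In "%u5F3E" the percent sign is
    followed by "u", which is not a hex digit, so the sequence survives
    verbatim. The function in this module would instead turn it into the
    bytes E5 BC BE.

```perl
use strict;
use warnings;

# A %HH-only decoder in the style of URI::Escape (regex assumed).
sub hyp_unescape_hh_only {
    my ($str) = @_;
    $str =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
    return $str;
}

print hyp_unescape_hh_only("%u5F3E%41"), "\n";   # %u5F3EA
```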
  Note on the %uHHHH sequence
    With this module the resulting strings never have the utf8 flag on.
    So if you want to decode them into Perl characters, you have to decode
    them explicitly via Encode. Remember: URIs have always been byte
    sequences, not UTF-8 characters.

    Had the %uHHHH sequence become standard, you could have safely told
    whether a given URI is in Unicode. But, more fortunately than
    unfortunately, the proposal was rejected, so you cannot tell which
    encoding is used just by looking at the URI.

    <http://en.wikipedia.org/wiki/Percent-encoding#Non-standard_implementations>

    I said "fortunately" because %uHHHH can be nasty for non-BMP
    characters. Since each %uHHHH holds one 16-bit value, a character at
    U+10000 or above needs a *surrogate pair*, that is, two %uHHHH escapes.
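    The arithmetic behind a surrogate pair is short. First subtract
    0x10000. Then put the top 10 of the remaining 20 bits into a high
    surrogate (0xD800 plus those bits) and the bottom 10 bits into a low
    surrogate (0xDC00 plus those bits). The helper below is hypothetical
    and for illustration only, since the module itself does not encode
    %uHHHH. It produces the two escapes used in the decodeURIComponent
    example.

```perl
use strict;
use warnings;

# Split a code point at U+10000 or above into a UTF-16 surrogate pair.
sub hyp_surrogate_pair {
    my ($cp) = @_;
    my $v = $cp - 0x10000;            # leaves a 20-bit value
    return (0xD800 + ($v >> 10),      # high surrogate: top 10 bits
            0xDC00 + ($v & 0x3FF));   # low surrogate: bottom 10 bits
}

printf "%%u%04X%%u%04X\n", hyp_surrogate_pair(0x2A6B2);   # %uD869%uDEB2
```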
    Even so, a significant number of URIs in the wild contain %uHHHH
    escapes. Therefore this module supports decoding them, but not
    encoding.

SPEED
    Since this module uses XS, it is really fast, except when there is
    nothing to escape, as in uri_escape("noop").

    The regexp used in URI::Escape is really fast when nothing matches,
    but it slows down significantly when it has to replace strings.
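    The regexp cost can be observed with Perl's core Benchmark module.
    The sketch below times a URI::Escape-style %HH substitution. The name
    hyp_unescape is a hypothetical stand-in, not URI::Escape itself. It is
    run on an input with nothing to replace and on one made entirely of
    escapes. Rates depend on the machine, so no output is shown. A direct
    comparison against URI::Escape::XS would need both modules installed,
    which this sketch avoids.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# URI::Escape-style %HH decoder (hypothetical stand-in for the real one).
sub hyp_unescape {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
    return $s;
}

my $noop = 'www.example.com';               # nothing to replace
my $busy = '%E5%B0%8F%E9%A3%BC%E5%BC%BE';   # nine replacements

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    noop => sub { hyp_unescape($noop) },
    busy => sub { hyp_unescape($busy) },
});
```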
BENCHMARK
    On a MacBook Pro (2GHz) with Perl 5.8.8:

      http://www.google.co.jp/search?q=%E5%B0%8F%E9%A3%BC%E5%BC%BE
      ============================================================
      Unescape it
      -----------
                   Rate     U::E U::E::XS
      U::E      58526/s       --     -88%
      U::E::XS 486968/s     732%       --
      --------------
      Escape it back
      --------------
                   Rate     U::E U::E::XS
      U::E      30046/s       --     -78%
      U::E::XS 136992/s     356%       --

      www.example.com
      ===============
      Unescape it
      -----------
                   Rate     U::E U::E::XS
      U::E     821972/s       --      -4%
      U::E::XS 854732/s       4%       --
      --------------
      Escape it back
      --------------
                   Rate U::E::XS     U::E
      U::E::XS 522969/s       --      -7%
      U::E     565112/s       8%       --
AUTHOR
    Dan Kogai, "<dankogai at dan.co.jp>"

BUGS
    Please report any bugs or feature requests to "bug-uri-escape-xs at
    rt.cpan.org", or through the web interface at
    <http://rt.cpan.org/NoAuth/ReportBug.html?Queue=URI-Escape-XS>. I will
    be notified, and then you'll automatically be notified of progress on
    your bug as I make changes.

SUPPORT
    You can find documentation for this module with the perldoc command.

      perldoc URI::Escape::XS

    You can also look for information at:

    * AnnoCPAN: Annotated CPAN documentation
        <http://annocpan.org/dist/URI-Escape-XS>

    * CPAN Ratings
        <http://cpanratings.perl.org/d/URI-Escape-XS>

    * RT: CPAN's request tracker
        <http://rt.cpan.org/NoAuth/Bugs.html?Dist=URI-Escape-XS>

    * Search CPAN
        <http://search.cpan.org/dist/URI-Escape-XS>
ACKNOWLEDGEMENTS
    Gisle Aas for URI::Escape

    Koichi Taniguchi for URI::Escape::JavaScript

COPYRIGHT & LICENSE
    Copyright 2007 Dan Kogai, all rights reserved.

    This program is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.