What Is URL Percent-Encoding? Core Rules of RFC 3986
URL percent-encoding first appeared in the URI and URL specifications that Tim Berners-Lee authored in 1994, and its current definition is RFC 3986 (Uniform Resource Identifier: Generic Syntax), published in 2005. It solves a very practical problem: only a limited set of ASCII characters may appear literally in a URL. Unreserved characters are always safe, and reserved characters are literal only when they act as delimiters. Everything else (Chinese characters, emoji, spaces, other symbols) must be "escaped".
A URL can be abstracted into the following components:
scheme:[//authority]path[?query][#fragment]
Each component has its own allowed character set, and percent-encoding is the "universal interface" between them: any character can be written as %XX, where XX is the hexadecimal representation of that character's UTF-8 byte (or multiple %XX groups for multi-byte characters).
Examples:
- A space " " is encoded as %20; for historical application/x-www-form-urlencoded forms, it may also appear as +.
- The characters "土豆" in UTF-8 are 0xE5 0x9C 0x9F 0xE8 0xB1 0x86, hence they encode as %E5%9C%9F%E8%B1%86.
- The reserved character & encodes as %26.
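The byte-by-byte rule behind these examples can be sketched in a few lines. This is a minimal illustration, not the tool's implementation: percentEncode is a hypothetical helper that escapes every character outside the RFC 3986 unreserved set, which is stricter than the built-in encodeURIComponent.

```javascript
// Hypothetical helper (not a built-in): escape everything except the
// RFC 3986 unreserved characters A-Z a-z 0-9 - . _ ~
function percentEncode(text) {
  const unreserved = /^[A-Za-z0-9\-._~]$/;
  let out = "";
  // for...of walks the string by code point, so emoji stay in one piece.
  for (const ch of text) {
    if (unreserved.test(ch)) {
      out += ch;
      continue;
    }
    // TextEncoder always produces UTF-8; each byte becomes one %XX group.
    for (const byte of new TextEncoder().encode(ch)) {
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncode(" "));    // %20
console.log(percentEncode("土豆")); // %E5%9C%9F%E8%B1%86
console.log(percentEncode("&"));    // %26
```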
With our tool, you can paste any text, choose "Encode" or "Decode", and instantly see its percent-encoded form or original characters.
Reserved vs Unreserved Characters: What Must Be Encoded
RFC 3986 divides characters into two categories. Understanding them helps you decide when to encode.
1. Unreserved Characters — Never require encoding
A-Z a-z 0-9 - . _ ~
These can appear literally anywhere in a URL. Encoding them is legal but produces a fully equivalent result — no semantic difference.
2. Reserved Characters — Must be encoded when used as data
There are two sub-groups:
- Generic Delimiters (separate URL layers): : / ? # [ ] @
- Sub-Delimiters (separate internal parameter structure): ! $ & ' ( ) * + , ; =
When they serve as delimiters, they must remain literal; when they appear inside data values, they must be encoded. For example:
- In ?a=1&b=2, & is a parameter separator and must stay literal.
- In ?q=Tom%26Jerry, the user's "&" is part of the value and must be encoded as %26; otherwise the server would read it as an extra parameter.
Rule of thumb: when building URL parameter values, always encode each value with encodeURIComponent, and let the receiving side's query parser perform the single matching decode.
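To see why the rule matters, here is a small sketch that parses both variants with the standard URLSearchParams class, standing in for whatever query parser the server actually uses (that substitution is an assumption).

```javascript
const userValue = "Tom&Jerry";

// Unencoded: the literal & is read as a parameter separator.
const broken = new URLSearchParams("q=" + userValue);
console.log(broken.get("q"));     // Tom
console.log(broken.has("Jerry")); // true (a spurious extra parameter)

// Encoded: the & travels as %26 and is restored when parsed.
const safe = new URLSearchParams("q=" + encodeURIComponent(userValue));
console.log(safe.get("q"));       // Tom&Jerry
```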
encodeURI vs encodeURIComponent — The Right Choice
Browsers expose two encoding functions, and they are not interchangeable. Here is the comparison:
1. encodeURI
Purpose: encode a complete URL. It leaves the characters that give a URL its structure untouched: ; / ? : @ & = + $ , # as well as the marks - _ . ! ~ * ' ( ). Square brackets are not in that set, so [ and ] come out as %5B and %5D.
Typical scenario: you have a URL containing non-ASCII characters or emoji, and you want to turn it into pure ASCII that browsers can open. E.g. turning https://示例.com/搜索 土豆 into https://%E7%A4%BA%E4%BE%8B.com/%E6%90%9C%E7%B4%A2%20%E5%9C%9F%E8%B1%86.
2. encodeURIComponent
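Purpose: encode a single parameter value or path segment. It escapes everything except letters, digits, and - _ . ! ~ * ' ( ). Note that ! ' ( ) * are reserved sub-delimiters in RFC 3986 but are still left literal, so contexts that demand strict RFC 3986 output need an extra escaping step for them.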
Typical scenario: when building ?keyword=user-input, redirect_uri=target-address, etc. — always encode the "value" with encodeURIComponent.
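A side-by-side run makes the difference concrete. The input string is an arbitrary sample, and encodeRFC3986Component is a hypothetical wrapper (not a built-in) for cases where ! ' ( ) * must be escaped as well.

```javascript
const sample = "a b/c?d=e&f#g[h]";

// encodeURI keeps the structural characters / ? = & # literal.
console.log(encodeURI(sample));          // a%20b/c?d=e&f#g%5Bh%5D

// encodeURIComponent escapes them, so the result is safe as one value.
console.log(encodeURIComponent(sample)); // a%20b%2Fc%3Fd%3De%26f%23g%5Bh%5D

// Hypothetical stricter variant: also escape ! ' ( ) *
function encodeRFC3986Component(value) {
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(encodeURIComponent("!'()*"));     // !'()*
console.log(encodeRFC3986Component("!'()*")); // %21%27%28%29%2A
```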
Counterparts on the server side:
- Node.js: decodeURIComponent(...) or querystring.unescape;
- Python: urllib.parse.unquote / unquote_plus;
- Java: URLDecoder.decode(value, StandardCharsets.UTF_8);
- PHP: urldecode / rawurldecode.
Note: most server-side frameworks automatically perform one URL-decode pass on query strings. Do NOT decode the value a second time in your own handler code; otherwise "double-decoding" attacks become possible: %2526 → %26 → &, which can let attackers slip a payload past WAF rules that inspect only the once-decoded value.
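The decoding chain in this note can be reproduced in a few lines. The first call stands in for the framework's automatic pass (an assumption about a typical setup); the second is the manual call that should not be there.

```javascript
// Hypothetical incoming value: an "&" that was encoded twice.
const rawParam = "%2526";

// First decode: roughly what a framework's query parser hands to your code.
const afterFramework = decodeURIComponent(rawParam);
console.log(afterFramework); // %26

// Second, manual decode: the hidden "&" now appears in the value.
const afterManual = decodeURIComponent(afterFramework);
console.log(afterManual);    // &
```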
Common Encoding Pitfalls & Debugging Techniques
The following mistakes are the most common in production. When you face "links that won't open, lost parameters, garbled characters", check this list first.
Pitfall 1: Running the whole URL through encodeURIComponent
The result is that :, /, ? all get encoded as %3A %2F %3F, turning the URL into a single opaque string that no longer parses as a URL. Correct approach: encode only the individual "parameter values" and "path segments", then assemble them with template strings.
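As a sketch of the correct approach next to the mistake, assuming a hypothetical endpoint api.example.com and sample values:

```javascript
const segment = "reports/2024 Q1"; // hypothetical path segment that contains a slash
const keyword = "Tom & Jerry";

// Correct: encode each piece on its own, then assemble.
const goodUrl =
  `https://api.example.com/files/${encodeURIComponent(segment)}` +
  `?q=${encodeURIComponent(keyword)}`;
console.log(goodUrl);
// https://api.example.com/files/reports%2F2024%20Q1?q=Tom%20%26%20Jerry

// Mistake: encoding the whole URL also escapes its structure.
const badUrl = encodeURIComponent(`https://api.example.com/files?q=${keyword}`);
console.log(badUrl);
// https%3A%2F%2Fapi.example.com%2Ffiles%3Fq%3DTom%20%26%20Jerry
```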
Pitfall 2: Double-Encoding
The classic symptom: "Tom & Jerry" → "Tom%20%26%20Jerry" → then someone encodes it again → "Tom%2520%2526%2520Jerry". After the server performs its single decode, it still sees percent-signs in the value. Solution: encode exactly once, in the layer closest to the user input.
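When hunting for this in logs, a rough check can help. looksDoubleEncoded below is a hypothetical helper and only a heuristic: a user who literally typed "%41" produces the same pattern after one correct encoding, so treat a match as a hint rather than proof.

```javascript
// Heuristic: "%25" followed by two hex digits usually means an
// already-encoded value was encoded a second time.
function looksDoubleEncoded(value) {
  return /%25[0-9A-Fa-f]{2}/.test(value);
}

const encodedOnce = encodeURIComponent("Tom & Jerry");
const encodedTwice = encodeURIComponent(encodedOnce);

console.log(encodedOnce);                      // Tom%20%26%20Jerry
console.log(encodedTwice);                     // Tom%2520%2526%2520Jerry
console.log(looksDoubleEncoded(encodedOnce));  // false
console.log(looksDoubleEncoded(encodedTwice)); // true
```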
Pitfall 3: Spaces encoded as + instead of %20
+ is a legacy convention of application/x-www-form-urlencoded. Strictly speaking, it is not part of RFC 3986. In paths, general query strings, or OAuth signatures, spaces must use %20. When debugging APIs, always check the raw query string in browser DevTools, not the human-friendly preview.
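The two conventions are easy to compare directly. The sketch below uses the standard URLSearchParams class, which follows the form-urlencoded rules, next to encodeURIComponent and decodeURIComponent, which do not.

```javascript
// Form-style serialization writes a space as +
const formStyle = new URLSearchParams({ q: "Tom Jerry" }).toString();
console.log(formStyle); // q=Tom+Jerry

// encodeURIComponent writes a space as %20
const percentStyle = "q=" + encodeURIComponent("Tom Jerry");
console.log(percentStyle); // q=Tom%20Jerry

// decodeURIComponent does NOT turn + back into a space...
console.log(decodeURIComponent("Tom+Jerry")); // Tom+Jerry

// ...while a form-style parser does.
console.log(new URLSearchParams("q=Tom+Jerry").get("q")); // Tom Jerry

// A literal plus sign therefore has to be sent as %2B.
const literalPlus = new URLSearchParams({ q: "1+1" }).toString();
console.log(literalPlus); // q=1%2B1
```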
Pitfall 4: Using decodeURI on encodeURIComponent output
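decodeURI is the inverse of encodeURI, so it deliberately leaves escapes of reserved characters such as %2F, %3F, or %26 untouched. It does not throw on them (a URIError is raised only for malformed sequences such as a lone %); it silently returns a value that is still partly encoded. Always use decodeURIComponent to decode values produced by encodeURIComponent.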
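A quick check of how the two decoders treat the same encoded value, using an arbitrary sample string:

```javascript
const encodedValue = encodeURIComponent("a/b?c");
console.log(encodedValue);                     // a%2Fb%3Fc

// decodeURI leaves escapes of reserved characters in place.
console.log(decodeURI(encodedValue));          // a%2Fb%3Fc

// decodeURIComponent restores the original value.
console.log(decodeURIComponent(encodedValue)); // a/b?c

// Both decoders throw URIError on malformed input such as a lone "%".
let malformedError = null;
try {
  decodeURIComponent("%");
} catch (err) {
  malformedError = err;
}
console.log(malformedError instanceof URIError); // true
```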
Debugging Checklist
- Copy the URL from the browser address bar into our online tool, pick "Decode", and check whether the raw values match expectations;
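- In the Chrome DevTools Network panel, open the request's query string parameters and switch between the decoded view and the URL-encoded view to spot double-encoding (a decoded value that still contains %XX sequences is the giveaway);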
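- OAuth 1.0a signatures and URL-safe Base64 follow their own rules. OAuth 1.0a signature base strings use strict RFC 3986 percent-encoding (everything except unreserved characters is escaped, with uppercase hex digits), while URL-safe Base64 swaps characters instead of escaping them: + → -, / → _. Neither behaves exactly like encodeURIComponent.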
Real-World Scenarios: APIs, OAuth and Deep Links
Here are several scenarios where correct URL encoding is critical in production systems.
Scenario 1: REST API Query Parameters
When a user searches for "Tom & Jerry", simple string concatenation would result in ?q=Tom&Jerry. The server would read q as only "Tom" and treat "Jerry" as a spurious extra parameter. Correct form:
const url = `/search?q=${encodeURIComponent(keyword)}`;
Scenario 2: OAuth 2.0 redirect_uri
Authorization servers require the redirect_uri to match the registered value byte-for-byte. The client must run the redirect_uri through encodeURIComponent itself, so it appears in the query string like https%3A%2F%2Fapp.example.com%2Fcallback. A common bug: forgetting this step, or encoding twice, makes the authorization server reject the request with a redirect URI mismatch error (the exact error code varies by provider).
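A minimal sketch of assembling such an authorization request follows. The host auth.example.com, the /authorize path, and the client_id value are placeholders, not taken from any particular provider.

```javascript
const redirectUri = "https://app.example.com/callback";

// Each value is encoded on its own; the structural ? & = stay literal.
const authorizeUrl =
  "https://auth.example.com/authorize" +
  "?response_type=code" +
  "&client_id=" + encodeURIComponent("my-client-id") +
  "&redirect_uri=" + encodeURIComponent(redirectUri);

console.log(authorizeUrl);
// https://auth.example.com/authorize?response_type=code&client_id=my-client-id&redirect_uri=https%3A%2F%2Fapp.example.com%2Fcallback

// After one decode on the receiving side, the value matches the original exactly.
console.log(new URL(authorizeUrl).searchParams.get("redirect_uri") === redirectUri); // true
```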
Scenario 3: Mobile Deep Links
When a WebView navigates to myapp://product?id=123&ref=search, any spaces, ampersands or non-ASCII characters inside "ref" must be individually encoded; otherwise iOS / Android parsers will truncate at the first literal &. Typical pattern: encodeURIComponent each field value, join with &, then append to the scheme.
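The pattern described above might look like this. buildDeepLink is a hypothetical helper, and the myapp://product link and its field names are sample data.

```javascript
// Hypothetical helper: encode every key and value, join with &,
// then append the result to the deep-link base.
function buildDeepLink(base, fields) {
  const query = Object.entries(fields)
    .map(([key, value]) =>
      `${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`)
    .join("&");
  return query ? `${base}?${query}` : base;
}

const deepLink = buildDeepLink("myapp://product", {
  id: 123,
  ref: "search & browse",
});
console.log(deepLink); // myapp://product?id=123&ref=search%20%26%20browse
```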
Scenario 4: Open-Redirect & Phishing
An often-overlooked risk: if a server accepts a redirect parameter without validation, an attacker can craft ?redirect=//evil.com and forward users to a phishing site. Encoding alone does not solve this, but correct encoding combined with a strict host whitelist (only allow redirects to trusted domains) is the industry recommendation.
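One possible shape for the host check is sketched below. ALLOWED_HOSTS, isSafeRedirect, and the app.example.com origin are all hypothetical; a real deployment would load its own list and may need additional rules.

```javascript
// Hypothetical allowlist of hosts that redirects may point to.
const ALLOWED_HOSTS = new Set(["app.example.com"]);

function isSafeRedirect(target, base = "https://app.example.com") {
  let url;
  try {
    // Resolve relative targets the same way a browser would.
    url = new URL(target, base);
  } catch {
    return false; // not parseable at all
  }
  return url.protocol === "https:" && ALLOWED_HOSTS.has(url.hostname);
}

console.log(isSafeRedirect("/account"));                         // true
console.log(isSafeRedirect("//evil.com"));                       // false
console.log(isSafeRedirect("https://app.example.com.evil.com")); // false
console.log(isSafeRedirect("javascript:alert(1)"));              // false
```

Comparing the parsed hostname, rather than searching the raw string for a trusted domain, is what rejects lookalike hosts such as app.example.com.evil.com.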
Multilingual Characters, Symbols and the UTF-8 Effect
In modern URL specifications, characters are always encoded in UTF-8. Most programming languages and frameworks use this by default, but you still need to watch the following details.
1. Non-ASCII characters are always encoded
All Chinese characters, Japanese kana, emoji, Greek letters, diacritics, etc. must be written as %XX%XX... groups. Examples:
- "你好" → %E4%BD%A0%E5%A5%BD
- emoji "🎉" → %F0%9F%8E%89 (4 bytes → 4 %XX groups)
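The relationship between UTF-8 byte count and the number of %XX groups can be checked with a short loop. The sample characters are arbitrary picks covering 1 to 4 bytes.

```javascript
const samples = ["A", "\u00E9", "你", "🎉"]; // \u00E9 is "é"

const rows = samples.map((ch) => ({
  char: ch,
  utf8Bytes: new TextEncoder().encode(ch).length,
  encoded: encodeURIComponent(ch),
}));

for (const row of rows) {
  console.log(row.char, row.utf8Bytes, row.encoded);
}
// A 1 A
// é 2 %C3%A9
// 你 3 %E4%BD%A0
// 🎉 4 %F0%9F%8E%89
```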
2. Length ≠ Character Count
A single Chinese character expands to 9 URL characters (3 bytes × 3 chars per byte: e.g. %E4%BD%A0). In constrained URL-length environments (such as the legacy 2083-char IE limit, or WAF header-length limits), long non-ASCII parameters can cause truncation. Recommendations:
- Transfer extremely long parameters in a POST body instead;
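- When a URL is the only option, consider Base64-encoding the UTF-8 bytes first: Base64 needs 4 characters per 3 bytes instead of 9. Use the URL-safe alphabet (- and _ in place of + and /, padding dropped), or percent-encode the standard Base64 output afterwards;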
- Always verify the server-side max-query-string configuration.
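To get a feel for the numbers, the sketch below compares the two representations for a sample string. utf8ToBase64Url is a hypothetical helper built on the standard TextEncoder and btoa functions, and the receiving side would need a matching decoder.

```javascript
// Hypothetical helper: UTF-8 bytes -> URL-safe Base64 without padding.
function utf8ToBase64Url(text) {
  const bytes = new TextEncoder().encode(text);
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary)
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

const longText = "你好".repeat(10); // 20 characters, 60 UTF-8 bytes

console.log(longText.length);                     // 20
console.log(encodeURIComponent(longText).length); // 180
console.log(utf8ToBase64Url(longText).length);    // 80
```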
3. Internationalized Domain Names (IDN / Punycode)
When a domain contains non-ASCII characters (e.g. 示例.com), browsers internally convert it to its Punycode form (xn--fsq092h.com for this example) before performing DNS lookups. This is a separate mechanism from "percent-encoding of path / query parameters", so do not mix them up:
- Domain name → Punycode (xn--...);
- Path / Query → Percent-Encoding (%XX).
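The standard URL class applies both mechanisms at once, which makes the split easy to observe. This sketch assumes a runtime whose URL implementation includes IDNA support (current browsers and Node.js do); the URL itself is sample data.

```javascript
const parsed = new URL("https://示例.com/搜索 土豆?q=你好");

// Host: converted to an ASCII Punycode label starting with "xn--".
console.log(parsed.hostname);

// Path and query: percent-encoded as UTF-8 bytes.
console.log(parsed.pathname); // /%E6%90%9C%E7%B4%A2%20%E5%9C%9F%E8%B1%86
console.log(parsed.search);   // ?q=%E4%BD%A0%E5%A5%BD
```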
When you paste a complete non-ASCII URL into our online tool, the path and query portions get correctly percent-encoded while structural characters (".", "/", "?") are preserved.
Data Security & Privacy: Why Locally-Run Online Tools Matter
URLs often carry sensitive information: internal API paths, user IDs, redirect targets, OAuth state, search keywords, and more. Sending them to a third-party server means they can be logged, analyzed, or used for ad targeting.
Our tool follows a strict "100% frontend-only" principle:
- All encode / decode logic calls the browser's native encodeURIComponent / decodeURIComponent or equivalent;
- No input, output, or intermediate result is sent to any backend;
- Nothing is persisted via localStorage, cookies, or similar mechanisms;
- You can disconnect from the network and keep using it.
For users who need to process URLs containing tokens, internal addresses, or user data, we additionally recommend:
- Work in an offline or otherwise controlled environment, or manually redact sensitive fields before pasting;
- Avoid pasting production addresses on public computers;
- Never share sensitive URLs over social media or IM — use separate, revocable short links or internal documents instead.
Final reminder: when evaluating any online URL processing tool, first confirm that it does not send your input to a server. A quick way to verify: open your browser's DevTools Network panel, then click "Execute" and see whether new requests are made — our tool makes none. Feel free to try it yourself at our URL encoder / decoder.