The Complete URL Encoding Guide: RFC 3986 / encodeURIComponent / Percent-Encoding

#01

What Is URL Percent-Encoding? Core Rules of RFC 3986

URL percent-encoding goes back to the URL specification that Tim Berners-Lee co-authored in 1994, and its current rules are standardized in RFC 3986 (Uniform Resource Identifier: Generic Syntax). It solves a very practical problem: only a small set of ASCII characters may appear literally in a URL. Unreserved characters are always safe, reserved characters may appear literally only in their role as delimiters, and everything else (Chinese characters, emoji, spaces, other special symbols) must be escaped.

A URL can be abstracted into the following components:

scheme:[//authority]path[?query][#fragment]

Each component has its own allowed character set, and percent-encoding is the "universal interface" between them: any byte can be written as %XX, where XX is its two-digit hexadecimal value. A character is first converted to its UTF-8 bytes, so a multi-byte character becomes several %XX groups.
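As a rough illustration of the component model above, the WHATWG URL class available in browsers and Node.js splits a URL into the same pieces. The address below is made up for the example.

```javascript
// Parse a sample URL (hypothetical address) into its components.
const parsed = new URL("https://user@example.com:8080/docs/a%20b?q=1&r=2#top");

const components = {
  scheme: parsed.protocol,   // includes the trailing ":"
  userinfo: parsed.username,
  host: parsed.host,         // hostname plus port
  path: parsed.pathname,     // stays percent-encoded
  query: parsed.search,      // includes the leading "?"
  fragment: parsed.hash,     // includes the leading "#"
};

console.log(components);
```

Note that the path is reported in its encoded form (a%20b); decoding is a separate, explicit step.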

Examples:

A space " " is encoded as %20; for historical application/x-www-form-urlencoded forms, it may also appear as +.
The characters "土豆" in UTF-8 are 0xE5 0x9C 0x9F 0xE8 0xB1 0x86, hence they encode as %E5%9C%9F%E8%B1%86.
The reserved character & encodes as %26.
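The examples above can be reproduced with a few lines of JavaScript. This is a minimal sketch: percentEncodeAllBytes is a hypothetical helper written for this guide, not a built-in, and it escapes every byte (including ones that would not need it) purely to make the byte-to-%XX mapping visible.

```javascript
// Illustrative helper (not a standard API): escape every UTF-8 byte of the input.
function percentEncodeAllBytes(text) {
  const bytes = new TextEncoder().encode(text); // string -> UTF-8 bytes
  return Array.from(
    bytes,
    (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

console.log(percentEncodeAllBytes("土豆")); // %E5%9C%9F%E8%B1%86
console.log(encodeURIComponent("土豆"));    // %E5%9C%9F%E8%B1%86
console.log(encodeURIComponent(" "));       // %20
console.log(encodeURIComponent("&"));       // %26
```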

With our tool, you can paste any text, choose "Encode" or "Decode", and instantly see its percent-encoded form or original characters.

#02

Reserved vs Unreserved Characters: What Must Be Encoded

RFC 3986 divides characters into two categories. Understanding them helps you decide when to encode.

1. Unreserved Characters — Never require encoding

A-Z a-z 0-9 - . _ ~

These can appear literally anywhere in a URL. Encoding them is legal but produces a fully equivalent result — no semantic difference.

2. Reserved Characters — Must be encoded when used as data

There are two sub-groups:

Generic Delimiters (separate URL layers): : / ? # [ ] @
Sub-Delimiters (separate internal parameter structure): ! $ & ' ( ) * + , ; =

When they serve as delimiters, they must remain literal; when they appear inside data values, they must be encoded. For example:

In ?a=1&b=2, & is a parameter separator and must stay literal.
In ?q=Tom%26Jerry, the user's "&" is part of the value and must be encoded as %26; otherwise the server would read it as an extra parameter.

Rule of thumb: when building URL parameter values, always use encodeURIComponent and let the browser / server decide what to decode.
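To see the Tom&Jerry case in action, the sketch below parses both variants with the standard URLSearchParams class, which stands in here for whatever query parser the server actually uses.

```javascript
const value = "Tom&Jerry";

// Naive concatenation: the "&" is read as a parameter separator.
const unsafeQuery = "q=" + value;
// Encoded value: the "&" travels as %26 and stays inside the value.
const safeQuery = "q=" + encodeURIComponent(value);

const unsafeParsed = new URLSearchParams(unsafeQuery);
const safeParsed = new URLSearchParams(safeQuery);

console.log(unsafeParsed.get("q"));     // Tom
console.log(unsafeParsed.has("Jerry")); // true (spurious extra parameter)
console.log(safeParsed.get("q"));       // Tom&Jerry
```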

#03

encodeURI vs encodeURIComponent — The Right Choice

Browsers expose two encoding functions, and they are not interchangeable. Here is the comparison:

1. encodeURI

Purpose: encode a complete URL. It leaves every character that can carry structural meaning untouched: ; / ? : @ & = + $ , # as well as letters, digits, and - _ . ! ~ * ' ( ). Note that [ and ] are not on this list, so encodeURI escapes them as %5B and %5D.

Typical scenario: you have a URL containing non-ASCII characters, emoji, or spaces, and you want to turn it into pure ASCII that browsers can open. E.g. turning https://示例.com/搜索 土豆 into https://%E7%A4%BA%E4%BE%8B.com/%E6%90%9C%E7%B4%A2%20%E5%9C%9F%E8%B1%86.

2. encodeURIComponent

Purpose: encode a single parameter value or path segment. It escapes everything except letters, digits, and - _ . ! ~ * ' ( ). That covers all generic delimiters and most sub-delimiters, but ! * ' ( ) are left as-is even though RFC 3986 lists them as reserved.

Typical scenario: when building ?keyword=user-input, redirect_uri=target-address, etc. — always encode the "value" with encodeURIComponent.
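The difference is easiest to see on one input run through both functions. The sketch below also includes a stricter variant for cases that need every RFC 3986 reserved character escaped; encodeRFC3986Component is a hypothetical helper name, built on top of encodeURIComponent.

```javascript
const sample = "a b&c=d/e?f#g[h]!*'()";

const viaEncodeURI = encodeURI(sample);
const viaComponent = encodeURIComponent(sample);

// Hypothetical helper: also escape the five characters encodeURIComponent leaves alone.
function encodeRFC3986Component(text) {
  return encodeURIComponent(text).replace(
    /[!'()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(viaEncodeURI);  // a%20b&c=d/e?f#g%5Bh%5D!*'()
console.log(viaComponent);  // a%20b%26c%3Dd%2Fe%3Ff%23g%5Bh%5D!*'()
console.log(encodeRFC3986Component("!*'()")); // %21%2A%27%28%29
```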

Counterparts on the server side:

Node.js: decodeURIComponent(...) or querystring.unescape;
Python: urllib.parse.unquote / unquote_plus;
Java: URLDecoder.decode(value, StandardCharsets.UTF_8);
PHP: urldecode / rawurldecode.

Note: most server-side frameworks perform one URL-decode pass automatically on query strings. Do NOT decode the value again in your own handler code; otherwise "double-decoding" attacks become possible: %2526 → %26 → &, which can let attackers slip a literal & (or other filtered characters) past WAF rules that only inspected the once-decoded form.
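The %2526 chain can be traced step by step. In this sketch the first decodeURIComponent call stands in for the framework's automatic pass, and the second for a redundant manual decode.

```javascript
const onTheWire = "%2526";

// Pass 1: what the framework hands to your code.
const afterFrameworkDecode = decodeURIComponent(onTheWire);
// Pass 2: a redundant manual decode turns harmless text into a live "&".
const afterManualDecode = decodeURIComponent(afterFrameworkDecode);

console.log(afterFrameworkDecode); // %26
console.log(afterManualDecode);    // &
```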

#04

Common Encoding Pitfalls & Debugging Techniques

The following mistakes are the most common in production. When you face "links that won't open, lost parameters, garbled characters", check this list first.

Pitfall 1: Running the whole URL through encodeURIComponent

The result is that :, /, ? all get encoded as %3A %2F %3F, turning the URL into a long hex string that the browser cannot resolve. Correct approach: only encode the "parameter values" and "path segments" individually, then assemble them with template strings.
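A short sketch of both the mistake and the fix, using a placeholder host (example.com) and made-up segment and keyword values:

```javascript
// Mistake: the whole URL is treated as one opaque value.
const whole = encodeURIComponent("https://example.com/search?q=a b");

// Fix: encode each dynamic piece, keep the structural characters literal.
const segment = "reports/2024"; // a single path segment that happens to contain "/"
const keyword = "a b";
const assembled =
  `https://example.com/files/${encodeURIComponent(segment)}` +
  `?q=${encodeURIComponent(keyword)}`;

console.log(whole);     // https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Da%20b
console.log(assembled); // https://example.com/files/reports%2F2024?q=a%20b
```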

Pitfall 2: Double-Encoding

The classic symptom: "Tom & Jerry" → "Tom%20%26%20Jerry" → then someone encodes it again → "Tom%2520%2526%2520Jerry". After the server performs its single decode, it still sees percent-signs in the value. Solution: encode exactly once, in the layer closest to the user input.
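The symptom is easy to reproduce, and a simple heuristic can flag it while debugging. looksDoubleEncoded is a hypothetical helper and only a heuristic: a value that legitimately contains text like "%20" will also trigger it.

```javascript
const raw = "Tom & Jerry";
const encodedOnce = encodeURIComponent(raw);
const encodedTwice = encodeURIComponent(encodedOnce);

// Heuristic (an assumption, not a standard check): if one decode pass
// still leaves %XX sequences behind, suspect double-encoding.
function looksDoubleEncoded(value) {
  try {
    return /%[0-9A-Fa-f]{2}/.test(decodeURIComponent(value));
  } catch {
    return false; // malformed input, not decodable at all
  }
}

console.log(encodedOnce);                      // Tom%20%26%20Jerry
console.log(encodedTwice);                     // Tom%2520%2526%2520Jerry
console.log(looksDoubleEncoded(encodedOnce));  // false
console.log(looksDoubleEncoded(encodedTwice)); // true
```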

Pitfall 3: Spaces encoded as + instead of %20

+ is a legacy convention of application/x-www-form-urlencoded. Strictly speaking, it is not part of RFC 3986. In paths, general query strings, or OAuth signatures, spaces must use %20. When debugging APIs, always check the raw query string in browser DevTools, not the human-friendly preview.
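In JavaScript the two conventions map onto two different APIs, which is a frequent source of mismatches. A small sketch: URLSearchParams follows the form convention, while encodeURIComponent and decodeURIComponent follow plain percent-encoding.

```javascript
// Form convention: space -> "+", literal "+" -> %2B.
const formStyle = new URLSearchParams({ q: "a b+c" }).toString();
// Plain percent-encoding: space -> %20, literal "+" -> %2B.
const rfcStyle = "q=" + encodeURIComponent("a b+c");

// Decoding differs too: decodeURIComponent never turns "+" into a space.
const viaComponent = decodeURIComponent("a+b");
const viaFormParser = new URLSearchParams("q=a+b").get("q");

console.log(formStyle);     // q=a+b%2Bc
console.log(rfcStyle);      // q=a%20b%2Bc
console.log(viaComponent);  // a+b
console.log(viaFormParser); // a b
```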

Pitfall 4: Using decodeURI on encodeURIComponent output

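decodeURI is the inverse of encodeURI, so it deliberately leaves escapes of reserved characters such as %2F, %3F, or %26 untouched. No error is thrown; the value simply stays partially encoded, which is easy to miss. (A URIError is raised only for malformed input, such as a lone % or an invalid UTF-8 byte sequence.) Always use decodeURIComponent to decode values produced by encodeURIComponent.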
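A quick check of how the two decoders treat the same encoded value, plus the one case where a URIError does occur (malformed input):

```javascript
const encoded = encodeURIComponent("a/b?c d"); // a%2Fb%3Fc%20d

const viaDecodeURI = decodeURI(encoded);            // reserved escapes survive
const viaDecodeComponent = decodeURIComponent(encoded);

// Malformed input (truncated UTF-8 sequence) is what actually throws.
let malformedError = null;
try {
  decodeURIComponent("%E4%BD");
} catch (err) {
  malformedError = err;
}

console.log(viaDecodeURI);       // a%2Fb%3Fc d
console.log(viaDecodeComponent); // a/b?c d
console.log(malformedError instanceof URIError); // true
```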

Debugging Checklist

Copy the URL from the browser address bar into our online tool, pick "Decode", and check whether the raw values match expectations;
In Chrome DevTools' Network panel, compare "Query String Parameters — view parsed" versus "view encoded" to spot double-encoding;
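Do not confuse percent-encoding with other "URL-safe" schemes. URL-Safe Base64 uses its own alphabet (+ → -, / → _), and OAuth 1.0a signatures require strict RFC 3986 encoding in which everything except unreserved characters is escaped. Both follow different rules from what encodeURIComponent produces.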

#05

Real-World Scenarios: APIs, OAuth and Deep Links

Here are several scenarios where correct URL encoding is critical in production systems.

Scenario 1: REST API Query Parameters

When a user searches for "Tom & Jerry", simple string concatenation would result in ?q=Tom & Jerry. The server would read q as only "Tom " and treat " Jerry" as a spurious extra parameter. Correct form:

const keyword = "Tom & Jerry"; // example user input
const url = `/search?q=${encodeURIComponent(keyword)}`; // "/search?q=Tom%20%26%20Jerry"

Scenario 2: OAuth 2.0 redirect_uri

Authorization servers typically compare the redirect_uri against the registered value using exact string matching. The client must therefore percent-encode the redirect_uri itself (for example with encodeURIComponent), so that it appears in the query string as https%3A%2F%2Fapp.example.com%2Fcallback. A common bug: forgetting this step, or encoding twice, makes the authorization server reject the request with a redirect URI mismatch error (the exact error code varies by provider).
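A minimal sketch of assembling such an authorization URL. The endpoint (auth.example.com), the client_id value, and the state value are placeholders; real providers document their own endpoints and required parameters.

```javascript
// All values below are hypothetical placeholders.
const authorizeEndpoint = "https://auth.example.com/authorize";
const params = {
  response_type: "code",
  client_id: "my-client-id",
  redirect_uri: "https://app.example.com/callback",
  state: "xyz 123",
};

// Encode every key and value exactly once, then join with "&".
const query = Object.entries(params)
  .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
  .join("&");

const authorizeUrl = `${authorizeEndpoint}?${query}`;
console.log(authorizeUrl);
```

On the receiving side, one decode pass should return exactly the registered redirect_uri; that round trip is worth asserting in a test.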

Scenario 3: Mobile Deep Links

When a WebView navigates to myapp://product?id=123&ref=search, any spaces, ampersands or non-ASCII characters inside "ref" must be individually encoded; otherwise iOS / Android parsers will truncate at the first literal &. Typical pattern: encodeURIComponent each field value, join with &, then append to the scheme.
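The pattern described above can be sketched as a small helper. buildDeepLink is a hypothetical function name, and myapp://product is the example scheme from the text.

```javascript
// Hypothetical helper: encode each field, join with "&", append to the base link.
function buildDeepLink(base, fields) {
  const query = Object.entries(fields)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`)
    .join("&");
  return query ? `${base}?${query}` : base;
}

const link = buildDeepLink("myapp://product", { id: 123, ref: "search & browse" });
console.log(link); // myapp://product?id=123&ref=search%20%26%20browse
```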

Scenario 4: Open-Redirect & Phishing

An often-overlooked risk: if a server accepts a redirect parameter without validation, an attacker can craft ?redirect=//evil.com and forward users to a phishing site. Encoding alone does not solve this, but correct encoding combined with a strict host whitelist (only allow redirects to trusted domains) is the industry recommendation.
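A host allowlist check might look like the sketch below. The allowed host, the base origin, and the function name isSafeRedirect are assumptions made for illustration; a production check should follow your framework's own guidance.

```javascript
// Hypothetical allowlist and origin for illustration.
const ALLOWED_HOSTS = new Set(["app.example.com"]);
const APP_ORIGIN = "https://app.example.com";

function isSafeRedirect(target) {
  let url;
  try {
    // Resolving against our own origin makes relative paths and "//host" forms explicit.
    url = new URL(target, APP_ORIGIN);
  } catch {
    return false; // unparseable target
  }
  return url.protocol === "https:" && ALLOWED_HOSTS.has(url.hostname);
}

console.log(isSafeRedirect("/dashboard"));                        // true
console.log(isSafeRedirect("//evil.com"));                        // false
console.log(isSafeRedirect("https://app.example.com.evil.com/")); // false
```

Comparing the parsed hostname, rather than searching the raw string for a trusted domain, is what defeats look-alike hosts such as app.example.com.evil.com.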

#06

Multilingual Characters, Symbols and the UTF-8 Effect

In modern URL specifications, characters are always encoded in UTF-8. Most programming languages and frameworks use this by default, but you still need to watch the following details.

1. Non-ASCII characters are always encoded

All Chinese characters, Japanese kana, emoji, Greek letters, diacritics, etc. must be written as %XX%XX... groups. Examples:

"你好" → %E4%BD%A0%E5%A5%BD
emoji "🎉" → %F0%9F%8E%89 (4 bytes → 4 %XX groups)

2. Length ≠ Character Count

A single Chinese character expands to 9 URL characters (3 bytes × 3 chars per byte: e.g. %E4%BD%A0). In constrained URL-length environments (such as the legacy 2083-char IE limit, or WAF header-length limits), long non-ASCII parameters can cause truncation. Recommendations:

Transfer extremely long parameters in a POST body instead;
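When a URL is the only option, Base64-encode the UTF-8 bytes first (about 4 characters per 3 bytes instead of 9), then percent-encode the result, or use the URL-safe Base64 alphabet so that no further escaping is needed;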
Always verify the server-side max-query-string configuration.
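The length arithmetic can be checked directly. This sketch compares percent-encoding with URL-safe Base64 for a short Chinese string; toBase64Url is a hypothetical helper, and it assumes an environment where btoa and TextEncoder are available as globals (modern browsers and recent Node.js versions).

```javascript
// Hypothetical helper: UTF-8 bytes -> Base64 -> URL-safe alphabet, padding removed.
function toBase64Url(text) {
  const bytes = new TextEncoder().encode(text);
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

const text = "你好世界"; // 4 characters
const utf8Bytes = new TextEncoder().encode(text).length;
const percentLength = encodeURIComponent(text).length;
const base64UrlLength = toBase64Url(text).length;

console.log(utf8Bytes);       // 12
console.log(percentLength);   // 36
console.log(base64UrlLength); // 16
```

The saving only applies to text dominated by non-ASCII characters; plain ASCII input is shorter when left as-is.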

3. Internationalized Domain Names (IDN / Punycode)

When a domain contains non-ASCII characters (e.g. 示例.com), browsers internally convert it to its Punycode form, xn--fsq092h.com, before performing DNS lookups. This is a separate mechanism from "percent-encoding of path / query parameters" and the two should not be mixed up:

Domain name → Punycode (xn--...);
Path / Query → Percent-Encoding (%XX).
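The split between the two mechanisms is visible in the WHATWG URL class, which applies each one to the right component. A small sketch, assuming a runtime with IDN support in its URL parser (current browsers and Node.js):

```javascript
const url = new URL("https://示例.com/搜索 土豆?q=你好");

// Host: converted to Punycode (an "xn--" label), never percent-encoded.
console.log(url.hostname);
// Path and query: percent-encoded UTF-8.
console.log(url.pathname); // /%E6%90%9C%E7%B4%A2%20%E5%9C%9F%E8%B1%86
console.log(url.search);   // ?q=%E4%BD%A0%E5%A5%BD
```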

When you paste a complete non-ASCII URL into our online tool, the path and query portions get correctly percent-encoded while structural characters (".", "/", "?") are preserved.

#07

Data Security & Privacy: Why Locally-Run Online Tools Matter

URLs often carry sensitive information: internal API paths, user IDs, redirect targets, OAuth state, search keywords, and more. Sending them to a third-party server means they can be logged, analyzed, or used for ad targeting.

Our tool follows a strict "100% frontend-only" principle:

All encode / decode logic calls the browser's native encodeURIComponent / decodeURIComponent or equivalent;
No input, output, or intermediate result is sent to any backend;
Nothing is persisted via localStorage, cookies, or similar mechanisms;
You can disconnect from the network and keep using it.

For users who need to process URLs containing tokens, internal addresses, or user data, we additionally recommend:

Work in an offline or otherwise controlled environment, or manually redact sensitive fields before pasting;
Avoid pasting production addresses on public computers;
Never share sensitive URLs over social media or IM — use separate, revocable short links or internal documents instead.

Final reminder: when evaluating any online URL processing tool, first confirm that it does not send your input to a server. A quick way to verify: open your browser's DevTools Network panel, then click "Execute" and see whether new requests are made — our tool makes none. Feel free to try it yourself at our URL encoder / decoder.
