URL Slug Generator

Generate ASCII URL slugs from any text. Korean Hangul and Japanese kana are romanized in place; accented Latin letters lose their diacritics, and other non-ASCII symbols are stripped or transliterated. Handy for blog post URLs and filename normalization.


All processing runs in your browser — no files or inputs are uploaded to a server.

How to use

Paste a title or filename and the tool produces an ASCII slug suitable for a URL path, a CSS class, or a sanitized filename. Pick the separator (`-`, `_`, or `.`), toggle lowercase, and set an optional max length to truncate at a clean word boundary. Korean Hangul converts to Revised Romanization (`서울` → `seoul`), Japanese kana to Hepburn (`ありがとう` → `arigatou`), and Latin diacritics decompose via NFKD then strip (`café` → `cafe`, `naïve` → `naive`).
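
The Latin half of that pipeline can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the tool's source: `slugify` is a hypothetical name, Hangul and kana romanization are left out (so those characters are simply dropped here), and max-length truncation is handled separately. It shows the NFKD-then-strip step and how the separator and lowercase options apply.

```python
import re
import unicodedata

def slugify(text: str, sep: str = "-", lowercase: bool = True) -> str:
    """Hypothetical helper: ASCII slug via NFKD + mark stripping, no romanization."""
    if sep not in ("-", "_", "."):
        raise ValueError("sep must be '-', '_' or '.'")
    # NFKD splits 'é' into 'e' + U+0301; dropping combining marks keeps the base letter.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Every run of characters that are not ASCII letters/digits becomes one separator.
    # Non-Latin scripts (Hangul, kana) vanish here because this sketch has no romanizer.
    words = re.findall(r"[A-Za-z0-9]+", stripped)
    slug = sep.join(words)
    return slug.lower() if lowercase else slug

print(slugify("Café naïve"))                              # cafe-naive
print(slugify("Crème Brûlée", sep="_", lowercase=False))  # Creme_Brulee
```

Collapsing runs of non-alphanumerics into one separator is what keeps `Hello,   World!` from turning into `hello----world`.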

Reach for this when authoring blog posts, CMS entries, or filenames where a CJK or accented title needs a stable ASCII handle. Google indexes Unicode URLs fine, but ASCII slugs travel better through copy-paste, terminal logs, monitoring dashboards, and team chat, any of which may garble multibyte text. Everything runs in the browser using built-in romanization tables for Hangul and kana, so a confidential post title never reaches a server.

Examples

Korean title → romanized ASCII slug

Input
서울에서 일본 IT 회사 다니는 이야기
Output
seoul-eseo-ilbon-it-hoesa-danineun-iyagi

Hangul converts through Revised Romanization, the standard Korean → Latin scheme issued by the National Institute of Korean Language in 2000. Each syllable maps through a fixed table: `서울` → `seoul`, `회사` → `hoesa`. Latin segments like `IT` pass through and are lowercased with the rest. The result is not strictly reversible, but a Korean speaker can usually recognize the title, and it stays ASCII-safe across every system that touches it.
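
Per-syllable mapping works because every precomposed Hangul syllable encodes its parts arithmetically: `code point = 0xAC00 + (lead × 21 + vowel) × 28 + tail`. The sketch below assumes simplified Revised Romanization tables and ignores the sound-change rules RR applies across syllable boundaries (e.g. `신라` → `Silla`); `romanize_hangul` is a hypothetical name, not the tool's API. Without a word splitter, `서울에서` comes out as the single word `seouleseo`.

```python
# Simplified Revised Romanization tables, indexed in Unicode jamo order.
# Assumption: sound-change rules between syllables (e.g. 신라 → Silla) are ignored.
INITIALS = ["g", "kk", "n", "d", "tt", "r", "m", "b", "pp", "s",
            "ss", "", "j", "jj", "ch", "k", "t", "p", "h"]
VOWELS = ["a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa", "wae",
          "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i"]
FINALS = ["", "k", "k", "k", "n", "n", "n", "t", "l", "k", "m", "l", "l", "l",
          "p", "l", "m", "p", "p", "t", "t", "ng", "t", "t", "k", "t", "p", "t"]

HANGUL_BASE = 0xAC00            # 가, the first precomposed syllable
SYLLABLE_COUNT = 19 * 21 * 28   # 11,172 syllables, ending at U+D7A3

def romanize_hangul(text: str) -> str:
    out = []
    for ch in text:
        index = ord(ch) - HANGUL_BASE
        if 0 <= index < SYLLABLE_COUNT:
            # Invert: index = (lead * 21 + vowel) * 28 + tail
            lead, rest = divmod(index, 21 * 28)
            vowel, tail = divmod(rest, 28)
            out.append(INITIALS[lead] + VOWELS[vowel] + FINALS[tail])
        else:
            out.append(ch)  # Latin letters, digits and spaces pass through
    return "".join(out)

print(romanize_hangul("서울 회사 이야기"))  # seoul hoesa iyagi
```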

Japanese mixed title → Hepburn slug

Input
日本企業のAWSコスト管理でよくある失敗
Output
ri-ben-qi-ye-noawskosutoguan-li-deyokuarushi-bai

Hiragana and katakana convert via Hepburn (`コスト` → `kosuto`, `よくある` → `yokuaru`). Kanji is harder: without a dictionary lookup the tool falls back to a Pinyin-style reading for each character, which is why `日本企業` becomes `ri-ben-qi-ye` rather than `nihon-kigyou`. Only Kanji readings are followed by a separator, so kana and Latin runs merge with whatever comes next (`noawskosutoguan`, `deyokuarushi`). For Japanese-heavy URLs, many sites keep the Japanese in the URL (`/日本企業の…`) and rely on browser percent-encoding, or hand-write the romanization. The romanization here is a usable default, not a translation.
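
Kana, unlike Kanji, map one-to-one onto syllables, so a lookup table is enough. The sketch below uses a deliberately partial, hypothetical table (enough for the examples on this page); a full one also needs the remaining rows, digraphs such as `きゃ` → `kya`, the small `っ`, and the long-vowel mark `ー`. It relies on one Unicode fact: katakana U+30A1–U+30F6 sit exactly 0x60 above the matching hiragana, so katakana can reuse the hiragana table.

```python
# Partial Hepburn table (hiragana only); assumption: enough for the examples here.
HEPBURN = {
    "あ": "a", "い": "i", "う": "u", "え": "e", "お": "o",
    "か": "ka", "き": "ki", "く": "ku", "け": "ke", "こ": "ko", "が": "ga",
    "さ": "sa", "し": "shi", "す": "su", "せ": "se", "そ": "so",
    "た": "ta", "ち": "chi", "つ": "tsu", "て": "te", "と": "to", "で": "de",
    "な": "na", "に": "ni", "ぬ": "nu", "ね": "ne", "の": "no",
    "や": "ya", "ゆ": "yu", "よ": "yo",
    "ら": "ra", "り": "ri", "る": "ru", "れ": "re", "ろ": "ro", "ん": "n",
}

def kana_to_romaji(text: str) -> str:
    out = []
    for ch in text:
        # Fold katakana onto hiragana: U+30A1..U+30F6 minus 0x60.
        if "\u30a1" <= ch <= "\u30f6":
            ch = chr(ord(ch) - 0x60)
        # Unknown characters (including Kanji) pass through unchanged.
        out.append(HEPBURN.get(ch, ch))
    return "".join(out)

print(kana_to_romaji("コスト"), kana_to_romaji("ありがとう"))  # kosuto arigatou
```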

Latin title with diacritics, length cap

Input
Pourquoi mon application Node.js consomme trop de mémoire en production ?
separator: -
lowercase: yes
max length: 60
Output
pourquoi-mon-application-node-js-consomme-trop-de-memoire

NFKD decomposes `é` into `e` plus a combining acute accent, and the accent is then dropped. The dot in `Node.js` becomes a separator (`node-js`), the question mark is stripped as punctuation, and the 60-character cap cuts off the `en production` tail at a word boundary, so the slug never ends mid-word. Lower the cap further if your CMS or framework imposes a stricter limit. Google publishes no hard limit; 60–75 characters is a common rule of thumb.
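
Word-boundary truncation can be sketched as below. `truncate_slug` is a hypothetical name and this is one reasonable policy, not the tool's confirmed logic. It keeps a word that ends exactly at the cap, so on the French example it returns the 60-character `pourquoi-mon-application-node-js-consomme-trop-de-memoire-en`; an implementation that always backs off to the previous separator would drop `en` and return the 57-character slug shown above.

```python
def truncate_slug(slug: str, max_len: int, sep: str = "-") -> str:
    """Cut a slug to at most max_len characters without splitting a word."""
    if max_len <= 0:
        return ""
    if len(slug) <= max_len:
        return slug
    cut = slug[:max_len]
    # If the next character is a separator, the cut already lands on a boundary.
    if slug[max_len] == sep:
        return cut
    # Otherwise back off to the last separator inside the cut.
    head, found, _ = cut.rpartition(sep)
    # A single word longer than max_len has no boundary, so fall back to a hard cut.
    return head if found else cut

full = "pourquoi-mon-application-node-js-consomme-trop-de-memoire-en-production"
print(truncate_slug(full, 59))  # pourquoi-mon-application-node-js-consomme-trop-de-memoire
```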

FAQ

Why romanize CJK instead of keeping the Unicode characters?

Browsers handle Unicode URLs fine, but everything downstream might not. Server logs, Slack snippets, copy-pasted links in email, analytics dashboards, monitoring tools, and many CLI utilities mangle non-ASCII paths or display them as percent-encoded gibberish (`%EC%84%9C%EC%9A%B8`). ASCII slugs survive every hop. The cost is a slight loss of "scannability" in the URL bar; the win is logs and dashboards stay readable. Some teams accept the gibberish trade-off and keep CJK URLs; others go the romanization route. Pick whichever pain you can absorb.
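
The percent-encoded form comes from writing each UTF-8 byte of a non-ASCII character as `%XX`. Python's standard `urllib.parse` shows the round trip; this illustrates URL encoding in general, not anything specific to the tool.

```python
from urllib.parse import quote, unquote

path = "/blog/서울"
# Each of the 6 UTF-8 bytes of 서울 becomes %XX; '/' is kept because quote()'s default safe='/'.
encoded = quote(path)
print(encoded)           # /blog/%EC%84%9C%EC%9A%B8
print(unquote(encoded))  # /blog/서울
```

Browsers decode this for display in the address bar; logs and many dashboards show the raw `%EC%84…` form.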

Does this strip the accent from `café` or keep it?

Stripped. The tool runs Unicode NFKD normalization, which decomposes `é` into the base letter `e` plus a combining acute accent (U+0301), then removes all combining marks. `naïve` → `naive`, `crème brûlée` → `creme-brulee`. This matches WordPress's default slug behavior; Hugo (via its `removePathAccents` setting) and Jekyll (via its `latin` slugify mode) can be configured to do the same. If you need accent-preserving slugs, the URL has to stay Unicode: pure-ASCII rules and accent preservation are mutually exclusive.
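
A short Python check of the decomposition step, using only the standard `unicodedata` module (`strip_marks` is a hypothetical helper name). It also shows two edges of NFKD: letters such as `ø` and `ß` have no decomposition and survive untouched, so they need a separate transliteration table or must be dropped; and the compatibility ("K") part of NFKD turns the ligature `ﬁ` into plain `fi`.

```python
import unicodedata

def strip_marks(text: str) -> str:
    # Decompose, then drop every character with a nonzero combining class.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# 'é' decomposes into U+0065 LATIN SMALL LETTER E + U+0301 COMBINING ACUTE ACCENT.
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFKD", "é")])  # ['U+0065', 'U+0301']
print(strip_marks("crème brûlée"))  # creme brulee
print(strip_marks("øß"))            # øß  (no decomposition, still non-ASCII)
print(strip_marks("\ufb01"))        # fi  (compatibility decomposition of the ligature)
```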

How long should a slug be?

Aim for 3–5 meaningful words, with a soft cap of 60–75 characters. Google's SEO documentation gives no hard limit but advises keeping URLs simple and descriptive. Search snippets truncate URLs visually at around 60 characters; logs and dashboards display the full slug fine, but a 200-character path looks spammy in social previews and is hard to share by voice. WordPress defaults to no cap, and Hugo, Jekyll, and most static-site frameworks also accept long slugs, so keeping them concise is up to you.

Why does Kanji come out as Pinyin instead of Japanese readings?

Mapping a Kanji to its Japanese reading requires a dictionary lookup: `日` could be `nichi`, `hi`, `jitsu`, or part of a compound like `nihon`, and the right answer depends on context. Doing that properly takes a morphological analyzer (kuromoji, MeCab) plus megabytes of dictionary data. Rather than ship that, the tool falls back to a per-character transliteration table for the Unicode CJK Unified Ideographs block. Chinese and Japanese share that block, and the table's entries use Mandarin readings, which yields the Pinyin-ish form (`日` → `ri`). For Japanese-heavy titles, hand-write the romaji in the slug field or use a CMS plugin backed by a dictionary.
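
The routing problem is visible at the code-point level. Because of Han unification, `日` has the same code point (U+65E5) in Chinese and Japanese text, so a table keyed by code point can store only one reading. The hypothetical classifier below, using the main Unicode block ranges (extension blocks omitted for brevity), shows that kana and Hangul identify their language while an ideograph does not.

```python
def script_of(ch: str) -> str:
    """Classify a character by its main Unicode block (extension blocks omitted)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"        # unambiguously Japanese
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"        # unambiguously Japanese
    if 0x4E00 <= cp <= 0x9FFF:
        return "cjk-ideograph"   # shared by Chinese Hanzi, Japanese Kanji, Korean Hanja
    if 0xAC00 <= cp <= 0xD7A3:
        return "hangul"          # precomposed Korean syllables
    return "other"

print([script_of(c) for c in "日本の"])  # ['cjk-ideograph', 'cjk-ideograph', 'hiragana']
```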

Apostrophes — why does `don't` become `dont` instead of `don-t`?

Apostrophes are dropped without inserting a separator because the surrounding letters belong to one word, not two. `don-t-think` reads awkwardly and breaks word recognition; `dont-think` matches what readers expect. Most slug libraries do the same. If your style guide requires the split, replace apostrophes with spaces in the title before generating the slug, since the output no longer records where they were.
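
The order of operations is what makes the difference: delete apostrophes first, then turn the remaining non-alphanumerics into separators. A minimal sketch, assuming both the ASCII `'` and the curly U+2019 that word processors insert should count as apostrophes; `slug_without_apostrophes` is a hypothetical name.

```python
import re

# ASCII apostrophe plus U+2019 RIGHT SINGLE QUOTATION MARK (the curly apostrophe).
APOSTROPHES = {ord("'"): None, ord("\u2019"): None}

def slug_without_apostrophes(text: str, sep: str = "-") -> str:
    # Delete apostrophes first so "don't" stays a single word ...
    text = text.translate(APOSTROPHES)
    # ... then collapse every other non-alphanumeric run into one separator.
    return sep.join(re.findall(r"[A-Za-z0-9]+", text)).lower()

print(slug_without_apostrophes("Don't think"))               # dont-think
# Without the deletion step, the apostrophe acts as a word break:
print("-".join(re.findall(r"[A-Za-z0-9]+", "don't think")))  # don-t-think
```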

Can I add stop-word removal (`the`, `a`, `is`, …)?

Not built in: this tool keeps the result close to the input. Stop-word removal is opinionated (which words count?) and language-specific, and shortening "10 ways to improve your SEO" to "10-ways-improve-your-seo" trades a few characters for slightly worse readability. Many SEO guides now advise leaving short stop words in the slug. If you really want to strip them, run the output through a quick sed or edit it by hand; preserving them does no harm.
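
If you do post-process, filtering on whole slug segments avoids clipping words that merely contain a stop word. The stop-word set below is a hypothetical English sample; deciding what belongs in it is exactly the judgment call described above.

```python
# Hypothetical English stop-word list; which words belong here is a judgment call.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "is", "in", "for"}

def drop_stop_words(slug: str, sep: str = "-") -> str:
    # Compare whole segments, so "theory" is kept even though it starts with "the".
    words = [w for w in slug.split(sep) if w not in STOP_WORDS]
    # Keep the original slug if filtering would leave nothing.
    return sep.join(words) or slug

print(drop_stop_words("10-ways-to-improve-your-seo"))  # 10-ways-improve-your-seo
```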

Related concepts

A URL slug is the human-readable, URL-safe portion of a path, such as `seoul-eseo-ilbon-it` in `/blog/seoul-eseo-ilbon-it/`. Two transformations turn arbitrary text into a slug. **Romanization** maps non-Latin scripts to Latin characters: Hangul through the Revised Romanization of Korean (2000), Japanese kana through Hepburn (1867, still the most widespread standard). **Unicode normalization + diacritic stripping** decomposes accented Latin letters into base + combining mark via NFKD (defined in Unicode Standard Annex #15, formerly TR15), then removes the combining marks. The combination is irreversible by design, since two different titles can produce the same slug, so pair the slug with a stable record ID in your storage.
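
One common way to pair a slug with a stable ID is to put both in the path and resolve only by the ID. The sketch below assumes a hypothetical `/blog/<id>-<slug>/` layout and an in-memory `posts` dict standing in for storage. Its `slugify` uses `encode("ascii", "ignore")` after NFKD, a compact variant that drops combining marks along with any other leftover non-ASCII. Two distinct titles collide on one slug, yet each post stays reachable, and an outdated slug still resolves.

```python
import re
import unicodedata
from typing import Optional

def slugify(text: str) -> str:
    # NFKD, then drop everything non-ASCII (including the combining marks).
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return "-".join(re.findall(r"[a-z0-9]+", ascii_text.lower()))

# Hypothetical storage: two different titles that collapse to the same slug.
posts = {101: "Résumé tips", 102: "Resume tips"}

def permalink(post_id: int) -> str:
    # The numeric ID is the lookup key; the slug is decoration for humans.
    return f"/blog/{post_id}-{slugify(posts[post_id])}/"

def resolve(path: str) -> Optional[int]:
    m = re.fullmatch(r"/blog/(\d+)(?:-[a-z0-9-]*)?/?", path)
    if m and int(m.group(1)) in posts:
        return int(m.group(1))
    return None

print(permalink(101), permalink(102))  # /blog/101-resume-tips/ /blog/102-resume-tips/
```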

The broader choice is **ASCII vs. Unicode URLs**. Modern browsers, search engines, and HTTP libraries handle IRIs (Internationalized Resource Identifiers, RFC 3987) fine. The decision is pragmatic: ASCII slugs travel cleanly through logs, terminals, copy-paste, monitoring tools, and analytics. Unicode URLs preserve readability for native speakers but appear as percent-encoded blobs (`%EC%84%9C%EC%9A%B8`) anywhere a UI does not decode them. Mature CMSes pick different defaults: WordPress strips Latin accents but leaves Cyrillic and CJK titles as Unicode unless a transliteration plugin is installed, while Hugo keeps Unicode paths and lets you override any page's slug in front matter. Whichever you choose, lock it in early and stick with it, because changing slug rules later orphans every existing URL that lacks a redirect.

Three adjacent concepts live nearby. **IDN / Punycode** (`xn--…`) handles the domain side — hostnames cannot use percent-encoding, so internationalized domains hide non-ASCII in an ASCII envelope, e.g., `日本.jp` → `xn--wgv71a.jp`. **Permalinks** are the policy layer that decides what a published URL looks like and promises it will not change — slugs are the building block, permalinks are the contract. **Filename normalization** runs almost the same algorithm but with different separator preferences (often `_` for filenames vs. `-` for URLs) and stricter length caps (255 bytes on most filesystems, much less in practice for cross-platform safety).
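
Python's standard library includes both pieces of the domain-side conversion: a `punycode` codec (RFC 3492) and an `idna` codec that adds the `xn--` prefix per label. Note that the built-in `idna` codec implements the older IDNA 2003 rules (RFC 3490); applications needing IDNA 2008 typically use the third-party `idna` package instead. This is a standard-library illustration, not part of the slug tool.

```python
# Punycode alone encodes the non-ASCII label; the "idna" codec adds "xn--" per label.
print("日本".encode("punycode"))         # b'wgv71a'
print("日本.jp".encode("idna"))          # b'xn--wgv71a.jp'
print(b"xn--wgv71a.jp".decode("idna"))  # 日本.jp
```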
