Why does AI-generated text contain em dashes and curly quotes?

AI language models are trained on professionally edited text — books, articles, academic papers — which use typographic symbols like em dashes and curly quotes. The model reproduces these patterns without understanding they are unusual in casual human typing.

Is typographic normalization enough to bypass AI detectors?

Typographic normalization is a critical first step but not a complete solution on its own. AI detectors also analyze perplexity, burstiness, and semantic patterns.

What typographic symbols does HumanType replace?

HumanType applies 19 replacement rules across 7 categories: dashes, ellipsis, quotes, special spaces, fractions and math symbols, legal symbols, and directional arrows.

How HumanType Works: The Complete Guide

Last updated: May 11, 2026 · Reading time: 9 min

What are typographic symbols?

Typographic symbols are special Unicode characters that go beyond basic ASCII. They include em dashes (—), curly quotes (“ ”), the ellipsis character (…), non-breaking spaces, and dozens more. These symbols were designed to make text look polished in print and professional typesetting — but in the digital world, their presence can become an unintended signal when content is run through AI detection tools.
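
To make "beyond basic ASCII" concrete, here is a minimal sketch that lists every character in a string whose code point is above U+007F. The function name `listNonAscii` is made up for this illustration and is not part of HumanType.

```javascript
// Illustrative helper (hypothetical name): report each non-ASCII character
// in a string together with its Unicode code point.
function listNonAscii(text) {
  const found = [];
  // for...of walks the string by code point, so characters outside the
  // Basic Multilingual Plane are not split into surrogate halves.
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp > 0x7f) {
      found.push({
        char: ch,
        codePoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
      });
    }
  }
  return found;
}

// The sample contains an ellipsis glyph, curly double quotes, and an em dash.
console.log(listNonAscii("Wait\u2026 \u201Creally\u201D \u2014 yes"));
```

Running it on your own draft gives a quick inventory of the characters a normalizer would need to handle.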

The distinction matters because AI detection algorithms do not just read text for meaning — they scan it character by character, looking for statistical patterns that distinguish machine-generated output from human typing. A professionally typeset book and an average person’s email sit at opposite ends of the typographic spectrum, and large language models tend to produce text that leans heavily toward the former.

Figure: Side-by-side comparison of AI typographic symbols versus plain ASCII equivalents after normalization with HumanType.

Why does AI-generated text contain them?

Large language models (LLMs) like GPT-4, Claude, Gemini, and others are trained on vast corpora of professionally edited text — books, articles, and academic papers. These sources are rich in typographic symbols because they were originally formatted with professional typesetting tools. When an AI generates text, it replicates these formatting patterns, including the use of em dashes instead of hyphens, curly quotes instead of straight quotes, and the single-glyph ellipsis (…) instead of three dots (...).

For the AI, these are just tokens. The model does not know or care that — and - are visually different; it simply reproduces what it has seen during training. This is not a flaw in the model — it is a natural consequence of learning from high-quality editorial sources. But it creates a measurable typographic fingerprint that detectors can exploit.

Research into LLM output characteristics consistently shows that models trained on curated, professionally edited corpora tend to reproduce the typographic conventions of those corpora. This finding appears, among other places, in work on text provenance and stylometric analysis — fields that study how the origin of text can be inferred from its surface-level features. See, for example, overviews of stylometric detection methods discussed in the AI safety and NLP literature (e.g., surveys indexed in arXiv cs.CL from 2023 onward).

How AI detectors use typographic symbols

AI detection tools, including GPTZero, Originality.ai, and Turnitin’s AI detection, look at multiple signals: perplexity, burstiness, semantic patterns, and typographic fingerprints. The presence of certain Unicode characters can act as one indicator that a text was machine-generated rather than human-typed.

Most people do not type an em dash by hand. They use a hyphen (-) or a double hyphen (--). They do not insert a non-breaking space; they just hit the spacebar. They do not use the Unicode ellipsis; they type three periods. Taken together, these small differences form a pattern that a detector can pick up.
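
As a rough illustration of what a "typographic fingerprint" could look like in numbers, the sketch below counts the marker characters named in this section per 1,000 characters of text. This is an assumption-laden toy metric, not how GPTZero, Originality.ai, or Turnitin actually work; the names `TYPOGRAPHIC_MARKERS` and `markerDensity` are invented for the example.

```javascript
// Marker characters mentioned in this article (illustrative, not exhaustive):
// em dash, ellipsis glyph, curly single and double quotes, and the
// non-breaking, thin, and hair spaces.
const TYPOGRAPHIC_MARKERS = /[\u2014\u2026\u2018\u2019\u201C\u201D\u00A0\u2009\u200A]/g;

// Returns the number of marker characters per 1,000 UTF-16 code units.
function markerDensity(text) {
  if (text.length === 0) return 0;
  const matches = text.match(TYPOGRAPHIC_MARKERS);
  const count = matches ? matches.length : 0;
  return (count / text.length) * 1000;
}

console.log(markerDensity("It works \u2014 mostly\u2026"));
console.log(markerDensity("It works - mostly..."));
```

Text typed with plain keyboard characters scores zero on this metric, which is the effect normalization aims for.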

Diagram showing the five signal layers AI detectors analyse, with typographic fingerprints highlighted as the easiest to fix

Key insight: Typographic symbols are not the only detection signal, but they are one of the easiest to remove. Normalizing them to plain keyboard characters (ASCII, plus the en dash HumanType uses for inline em dashes) directly reduces this class of marker, and it takes under a second.

What does HumanType replace?

HumanType applies 19 replacement rules across 7 categories:

Dashes: Em dashes (—) at line starts become hyphens; inline em dashes become en dashes (–). Bullet points (•) become hyphens.
Ellipsis: The single Unicode glyph (…) becomes three separate dots (...).
Quotes: Curly single quotes (‘ ’), curly double quotes (“ ”), and guillemets (« ») become straight ASCII equivalents (' and ").
Special spaces: Non-breaking space (U+00A0), thin space (U+2009), and hair space (U+200A) become regular spaces.
Fractions & math: ½ → 1/2, ¼ → 1/4, ¾ → 3/4, × → *.
Legal symbols: ® → (R), ™ → (TM), © → (c), † and ‡ → *.
Arrows: → and ➔ become ->; ⇒ becomes =>; ← becomes <-; ⇐ becomes <=.
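
The categories above can be sketched as an ordered table of regular-expression rules. This is a reconstruction from the list, not HumanType's actual source: the names `RULES` and `normalize`, the category keys, the rule order, the choice to allow leading whitespace before a line-start em dash, and the mapping of guillemets to `"` are all assumptions. Grouped this way the table has 19 entries, though whether that matches how the tool counts its own rules is also an assumption.

```javascript
// Hypothetical rule table reconstructed from the category list above.
// Each entry is [category, pattern, replacement]. Order matters: the
// line-start em dash rule must run before the inline em dash rule.
const RULES = [
  ["dashes", /^([ \t]*)\u2014/gm, "$1-"],          // em dash at line start -> hyphen
  ["dashes", /\u2014/g, "\u2013"],                 // remaining em dashes -> en dash
  ["dashes", /\u2022/g, "-"],                      // bullet -> hyphen
  ["ellipsis", /\u2026/g, "..."],                  // ellipsis glyph -> three dots
  ["quotes", /[\u2018\u2019]/g, "'"],              // curly single quotes
  ["quotes", /[\u201C\u201D\u00AB\u00BB]/g, '"'],  // curly doubles and guillemets (assumed)
  ["spaces", /[\u00A0\u2009\u200A]/g, " "],        // non-breaking, thin, hair space
  ["math", /\u00BD/g, "1/2"],
  ["math", /\u00BC/g, "1/4"],
  ["math", /\u00BE/g, "3/4"],
  ["math", /\u00D7/g, "*"],                        // multiplication sign
  ["legal", /\u00AE/g, "(R)"],
  ["legal", /\u2122/g, "(TM)"],
  ["legal", /\u00A9/g, "(c)"],
  ["legal", /[\u2020\u2021]/g, "*"],               // dagger and double dagger
  ["arrows", /[\u2192\u2794]/g, "->"],
  ["arrows", /\u21D2/g, "=>"],
  ["arrows", /\u2190/g, "<-"],
  ["arrows", /\u21D0/g, "<="],
];

// Applies every rule whose category is enabled. Passing null (the default)
// enables all categories; otherwise pass a Set of category names.
function normalize(text, enabled = null) {
  let out = text;
  for (const [category, pattern, replacement] of RULES) {
    if (enabled === null || enabled.has(category)) {
      out = out.replace(pattern, replacement);
    }
  }
  return out;
}

console.log(normalize("\u2014 First\nA \u201Cquote\u201D \u2014 then\u2026"));
console.log(normalize("\u2026 \u201Cx\u201D", new Set(["ellipsis"])));
```

Keeping the rules in a flat, ordered list makes it easy to toggle whole categories and to see at a glance which replacement wins when two patterns could match the same character.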

Symbol comparison table

The table below shows the most common AI typographic symbols, their Unicode code points, and the replacements HumanType substitutes: plain ASCII in every case except the en dash (–) used for inline em dashes. All substitutions preserve meaning for human readers while removing the statistical marker.

Symbol	Name	Unicode	Frequency in AI text	Replaced with	Detector signal strength
—	Em dash	U+2014	Very common	`-` or `–`	Strong
“ ”	Curly double quotes	U+201C, U+201D	Common	`"`	Strong
…	Ellipsis glyph	U+2026	Common	`...`	Moderate
(invisible)	Non-breaking space	U+00A0	Occasional	Space (U+0020)	Moderate
‘ ’	Curly single quotes	U+2018, U+2019	Common	`'`	Moderate
•	Bullet point	U+2022	Occasional	`-`	Weak
® ™	Legal symbols	U+00AE, U+2122	Rare	`(R)`, `(TM)`	Weak
½ ¼	Fraction glyphs	U+00BD, U+00BC	Rare	`1/2`, `1/4`	Weak

Case study: an illustrative scenario

A graduate student submits a 600-word literature review draft to Turnitin. The text, generated with the help of an LLM, contains around a dozen em dashes, several curly-quote pairs, and multiple single-glyph ellipses — a typical density for AI-assisted academic prose.

After running the text through HumanType with all categories active, every typographic character covered by the rules is replaced with its plain equivalent (ASCII, apart from the en dash used for inline em dashes) in under a second. The structural content, the argument, and every sentence remain unchanged. The student then reviews and edits the text for voice and accuracy before submitting.

Note: results from AI detection tools vary by document, version, and configuration. Typographic normalization reduces one class of signals; it does not guarantee any specific outcome from any specific detector.

Does this affect readability?

No. The replacements are designed to preserve meaning while changing only the underlying character codes. An en dash (–) looks almost identical to an em dash (—) to a human reader but carries a different statistical signature. Three dots (...) read the same as the ellipsis glyph (…). The text remains fully readable and professional.

In fact, plain ASCII punctuation is often preferred in digital contexts: it renders consistently across all devices, email clients, and content management systems without any risk of encoding issues. Many style guides for web content explicitly recommend straight quotes and hyphen-based dashes precisely for this reason.
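
One way to see why ASCII is the safer choice for encoding is to compare byte lengths. In UTF-8, every ASCII character is a single byte, while the typographic symbols discussed here take two or three bytes, and multi-byte sequences are what get garbled when a system decodes text with the wrong character set. The sketch below uses the standard `TextEncoder` API; the helper name `utf8ByteLength` is made up for the example.

```javascript
// Number of bytes a string occupies when encoded as UTF-8.
// TextEncoder is available in modern browsers and in Node.js.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

console.log(utf8ByteLength("-"));      // hyphen: 1 byte
console.log(utf8ByteLength("\u2014")); // em dash: 3 bytes
console.log(utf8ByteLength("\u00A0")); // non-breaking space: 2 bytes
```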

Is this enough to bypass AI detection?

Typographic normalization alone is not a silver bullet. Sophisticated AI detectors also analyze sentence structure, vocabulary distribution, and semantic coherence. However, removing typographic “tells” is a critical first step that eliminates one class of detection signal. Combined with other humanization techniques, such as varying sentence length, adding minor imperfections, and injecting personal voice, it can improve the odds of passing AI detection.

Think of it in layers: typographic fingerprints sit at the surface of AI-generated text and are the fastest to address. Deeper signals like perplexity and burstiness require more involved editing. Starting with typographic normalization clears the most visible markers before you invest time in deeper revision. Read our full breakdown of AI detection signals.

Privacy and security

HumanType runs entirely in your browser. No text is ever sent to a server. The JavaScript code processes everything locally, so your content remains completely private and secure. There are no analytics, no tracking, and no data collection of any kind.

This architecture also means HumanType works offline once the page has loaded, is not subject to server outages, and introduces no latency from network round-trips. For anyone handling sensitive drafts — legal documents, internal reports, personal writing — the local-only model is the most private option available.

Step-by-step: using HumanType

Figure: Four-step visual guide to using HumanType: paste text, select rules, click replace, copy result.

Step 1: Paste your text into the input field on the main page.
Step 2: Select replacement rules. Choose the categories to replace, or leave all active for maximum normalization.
Step 3: Click Replace. Processing happens instantly in your browser.
Step 4: Copy the result and use it wherever you need it.
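
For readers curious how steps 1 to 3 might be wired together on a page, here is a minimal sketch. It is not HumanType's actual code: `wireReplaceButton` and its parameter names are hypothetical, and `normalize` stands for any function that takes the text plus a set of enabled category names and returns the cleaned string. Note that nothing in the handler makes a network request, which is what "runs entirely in your browser" means in practice.

```javascript
// Hypothetical wiring for the paste / select / replace flow.
// input and output are textarea-like objects with a `value` property,
// checkboxes is a list of checkbox-like objects with `checked` and `value`,
// and button is anything that supports addEventListener("click", ...).
function wireReplaceButton({ input, output, checkboxes, button, normalize }) {
  button.addEventListener("click", () => {
    // Step 2: collect the categories whose checkbox is ticked.
    const enabled = new Set(
      Array.from(checkboxes)
        .filter((box) => box.checked)
        .map((box) => box.value)
    );
    // Step 3: transform the pasted text locally and show the result.
    output.value = normalize(input.value, enabled);
  });
}
```

In a real page the arguments would come from DOM lookups such as `document.querySelector`, with selectors that depend on the page's own markup.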

Ready to remove AI typographic fingerprints? Try the tool now — free, instant, no sign-up.
