Foundations #4: The Accidental Standard

How a weekend project from 2004 became the language of the AI era.

Jun 08, 2026

This article is part of the “Foundations” series, exploring the roots of well-known technologies, patterns, and solutions that quietly revolutionized our industry.

Every post I write for Old Mug is written in Markdown. Nearly every response you’ve read from ChatGPT, Claude, or Gemini is formatted in Markdown. The README files your team maintains, the pull request descriptions your engineers write, the documentation your platform serves: all Markdown.

And yet, if you asked John Gruber what he was building in 2004, he’d have given you a modest answer: a Perl script to make blogging less annoying.

This article covers:

The problem Gruber was solving and who helped him solve it;
Why the lack of a formal spec turned into both a blessing and a crisis;
The naming war that almost fractured the community;
Why Markdown survived when so many competitors didn’t;
How a blogging tool became the native language of AI.

The Problem: Writing for the Web Was Painful

To understand why Markdown matters, you need to remember what writing for the web looked like in the early 2000s.

If you ran a blog — and in 2004, many engineers and writers did — your only real option for formatted content was HTML. You wanted emphasis? <em>. A link? <a href="...">. A block of code? <pre><code>. It worked, but it was exhausting. Proofreading raw HTML meant parsing angle brackets instead of reading sentences.

John Gruber was living this frustration daily through his blog, Daring Fireball. A writer and UI designer with a computer science degree from Drexel University, Gruber had no technical trouble writing HTML, but as he put it:

“Eventually it grew tiresome, and it just felt like I was making work for myself, and I really thought that HTML made it hard to proofread my work.”

The solution he had in mind was disarmingly simple: define a set of plain-text conventions, already in use in emails and Usenet posts, and write a tool to convert them to valid HTML automatically. Asterisks for emphasis. Hashes for headings. Dashes for lists. Conventions that writers were already using intuitively, just without a formal name.

The Collaboration: Gruber, Swartz, and atx

Gruber didn’t build Markdown alone.

Aaron Swartz, then 17 years old, already co-author of the RSS 1.0 specification and a contributor to Creative Commons, had been wrestling with the same problem independently. In 2002, he had created atx, his own structured text format. The name was a nod to the markup language TeX, and the syntax was remarkably prescient: # for H1, ## for H2, and so on. Swartz’s frustration mirrored Gruber’s:

“I’m sick of bringing my writing down to the level of the computer. Why should I have to cover everything in annoying pointy brackets just so it knows what I mean?”

Swartz became Markdown’s “sole beta-tester,” as Gruber later described. He provided continuous feedback on syntax decisions, and ultimately wrote html2text, a reverse converter that could take HTML and turn it back into Markdown. Both tools were released as free software.

Gruber was candid about Swartz’s role: “Aaron was my sounding board, my muse.” Wikipedia credits Swartz as co-creator. The truth is somewhere in between: Gruber designed the syntax and wrote the reference implementation; Swartz’s atx headers — the # convention — were directly absorbed into Markdown. The influence was substantial enough that calling him a co-author isn’t an overstatement.

On March 15, 2004, Gruber announced Markdown on Daring Fireball with a simple definition: “a text-to-HTML conversion tool for web writers.” The reference implementation was Markdown.pl, approximately 1,400 lines of Perl that processed input through a series of regular expression substitutions. It was released as a plugin for Movable Type and Blosxom, two popular blogging tools of the era, and as a standalone text filter for BBEdit.
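To make the “series of regular expression substitutions” concrete, here is a minimal Python sketch of the same idea. It is not Markdown.pl and covers only a toy subset (atx headings, bold, emphasis, inline links); the function name toy_markdown_to_html and the specific patterns are illustrative assumptions, not Gruber’s code.

```python
import re


def toy_markdown_to_html(text: str) -> str:
    """Convert a tiny Markdown subset to HTML with ordered regex passes.

    Illustrative only: real converters handle paragraphs, lists, escaping,
    nesting, and many edge cases that this sketch ignores.
    """

    # Pass 1: atx headings, one to six leading hashes, optional closing hashes.
    def heading(match: re.Match) -> str:
        level = len(match.group(1))
        return f"<h{level}>{match.group(2)}</h{level}>"

    text = re.sub(
        r"^(#{1,6})[ \t]+(.+?)[ \t]*#*$", heading, text, flags=re.MULTILINE
    )

    # Pass 2: bold must run before emphasis, otherwise the single-asterisk
    # pattern would consume part of each double-asterisk pair.
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)

    # Pass 3: emphasis with single asterisks.
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)

    # Pass 4: inline links of the form [label](url).
    text = re.sub(r"\[([^\]]+)\]\(([^)\s]+)\)", r'<a href="\2">\1</a>', text)

    return text


if __name__ == "__main__":
    print(toy_markdown_to_html("# Hello *world*"))
```

Note that the order of the passes matters: swapping the bold and emphasis passes changes the output. Behavior that depends on pass order is easy to implement and hard to describe precisely in prose.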

The response was immediate. Within days, Michel Fortin had ported it to PHP. Within months, ports existed in Python, Ruby, and Java. On December 17, 2004, Gruber released what would become the last update to Markdown.pl — version 1.0.1 — and effectively stopped maintaining it.

He would not touch the core specification again for twenty years.

The Fragmentation: When No Spec Becomes a Problem

Gruber’s decision to leave Markdown underspecified was deliberate. He believed different communities had different needs and that a rigid standard would constrain more than it would enable. The philosophy was fundamentally anti-bureaucratic: “Different sites and people have different needs. No one syntax would make all happy.”

For a few years, this worked. Markdown spread organically across the web because it was easy to implement and easy to extend. But as adoption scaled from blogs to developer platforms, the lack of a formal specification became a serious problem.

Stack Overflow launched in September 2008 with Markdown support. GitHub adopted it around 2009, and in 2011 shipped GitHub Flavored Markdown (GFM), introducing fenced code blocks, tables, and task lists — features Markdown.pl never supported. Reddit built its own parser. Pandoc, the document conversion tool developed by philosopher-turned-programmer John MacFarlane starting in 2006, implemented yet another interpretation of ambiguous edge cases.

The result was a fragmented ecosystem. A document that rendered correctly on GitHub might display incorrectly when converted by Pandoc, or break entirely on Stack Overflow. Babelmark, a tool built to compare how different Markdown parsers handle the same input, showed that a single ambiguous list item could render in up to 15 different ways depending on the parser.

For teams building infrastructure on top of Markdown, such as documentation systems, content pipelines, and publishing tools, it was a genuine engineering tax.

The War Over “Standard Markdown”

Tension had been building for years. In 2009, Jeff Atwood — co-founder of Stack Overflow and one of the most prominent voices in developer culture — published a post criticizing Gruber’s stewardship of Markdown, calling it “negligent open source code parenting.” It was harsh, but it reflected a genuine frustration in the community.

Then, on October 25, 2012, Atwood published “The Future of Markdown,” a public call to action addressed directly to Gruber:

“So I’m asking you, John Gruber: as the original creator of Markdown, will you bless this endeavor?”

Atwood proposed that Stack Exchange, GitHub, Meteor, Reddit, and other major platforms work together to create an official Markdown specification and standard test suite. Gruber never responded.

The working group — which included Atwood, MacFarlane, and representatives from GitHub, Reddit, and Stack Exchange — spent nearly two years drafting a specification and reference implementation. On September 3, 2014, they announced the result publicly under the name “Standard Markdown.”

Gruber’s reaction was immediate and unambiguous. In a private email to Atwood, he called the name “infuriating” and demanded the group rename the project, shut down the standardmarkdown.com domain, and issue a public apology.

Atwood, caught off guard, published the exchange. He apologized. Within 48 hours, the project was renamed “Common Markdown.” But Gruber pushed further: as Atwood reported it, no form of the word “Markdown” was acceptable to him in the name. The project was ultimately renamed CommonMark, a single word with no direct reference to the original.

The fallout was a rare public rupture in the otherwise polite world of open source. But CommonMark survived. It provided what had been missing for a decade: a rigorous, unambiguous specification illustrated with hundreds of examples, a reference implementation in C (which the project describes as roughly 10,000 times faster than Markdown.pl), and a comprehensive test suite. Today, GitHub, GitLab, Reddit, Stack Exchange, Discourse, and Swift all use CommonMark as their Markdown baseline.

Interestingly, the 1.0 finalized specification has still not been released. Even the standardization effort ended up living with ambiguity.

Why Markdown Won When Others Didn’t

Markdown wasn’t the only lightweight markup language competing for dominance in the early 2000s. Textile (2002) was adopted by major Ruby on Rails applications. reStructuredText (2001) became standard in Python documentation and remains the format of choice for Sphinx and Read the Docs. AsciiDoc (2002) offered more powerful features and ended up as the default for O’Reilly technical books and the Asciidoctor toolchain.

Each of these formats is, in certain technical respects, more sophisticated than Markdown. reStructuredText is more consistent. AsciiDoc supports complex documents with cross-references and semantic markup that Markdown cannot express natively.

And yet Markdown became the default.

The reason is readability in raw form. Gruber’s design principle was uncompromising on this point: “A Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been tagged or formatted with special instructions.”

This is a harder constraint to satisfy than it sounds. Compare these representations of the same content:

reStructuredText: **bold**, .. note::, .. code-block:: python. Syntactically consistent but noisy to the eye.
AsciiDoc: *bold*, ==== for section delimiters, [source,python] for code blocks. Powerful but not intuitive at a glance.
Markdown: **bold**, ## for headings, triple backticks for code blocks (added later by GitHub Flavored Markdown). Mostly conventions already present in email and Usenet.

Markdown’s syntax wasn’t invented. It was observed from what writers were already doing informally, and then codified. That’s why it reads naturally even without rendering. The asterisks around *this word* communicate emphasis even as raw text. The # at the start of a line reads as a heading even before any parser touches it.

This property — legibility without tooling — turned out to be the decisive competitive advantage. It meant Markdown could be adopted incrementally, stored safely in version control, read in email clients, diffed in terminals, and understood by humans who had never heard of the format.

RTF, DocBook, ODF, and a dozen other formats that were technically superior never became everyday writing formats, in part because opening them without the right tool revealed nothing useful. Markdown’s plain-text legibility was not a limitation. It was the entire product.

The Unexpected Second Act: Markdown as AI Infrastructure

Here’s the part Gruber couldn’t have anticipated.

When researchers at Google, OpenAI, Anthropic, and Meta began assembling the massive text corpora used to train large language models, they drew heavily from the internet’s most structured, high-quality written content: GitHub repositories, Stack Overflow answers, Reddit posts, developer documentation, technical blogs.

These sources share one property: they are almost entirely formatted in Markdown.

GitHub alone — home to over 100 million developers — stores virtually all project documentation, README files, issue discussions, and pull request descriptions in Markdown. Stack Overflow renders answers in Markdown. Reddit historically used Markdown for post composition. The result is that Markdown-formatted text represents a disproportionately large share of the highest-quality training data that shaped modern language models.

LLMs didn’t learn to produce Markdown because someone instructed them to. They learned it because it was the dominant structure of the text they were trained on. A 2025 paper posted to arXiv (Dayeh et al.) studying LLM writing behavior found that models trained on these corpora have “internalized” Markdown as a structural orientation: they produce headers, bullet points, and bold emphasis even when not explicitly asked, because that is the shape of structured thought in their training data.

This has practical consequences that reach far beyond formatting preferences.

Markdown is now the standard interface format between humans and AI systems. When you feed documentation into a Retrieval-Augmented Generation (RAG) pipeline, Markdown is a commonly recommended input format, because its headers create natural semantic chunk boundaries that can improve embedding quality and retrieval accuracy. When you scrape web content to feed to an LLM, tools often convert HTML to Markdown first, because Markdown strips the noise (CSS, JavaScript, navigation menus, tracking pixels) and preserves semantic structure at a fraction of the token cost of raw HTML.
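As a rough illustration of header-based chunking, the Python sketch below splits a Markdown document into one chunk per heading. The Chunk type and the split_by_headings function are hypothetical names invented for this example, and real pipelines usually layer size limits, overlap, and heading hierarchy on top of something like this.

```python
import re
from dataclasses import dataclass

# An atx heading: one to six hashes, at least one space, then the title.
HEADING = re.compile(r"^(#{1,6})[ \t]+(.*?)[ \t]*$")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


@dataclass
class Chunk:
    heading: str  # heading text, or "" for content before the first heading
    level: int    # 1 to 6 for headings, 0 for the preamble
    body: str     # text under the heading, with surrounding blank lines removed


def _close(chunk: Chunk, lines: list[str], chunks: list[Chunk]) -> None:
    """Finalize a chunk and keep it unless it is completely empty."""
    chunk.body = "\n".join(lines).strip()
    if chunk.heading or chunk.body:
        chunks.append(chunk)


def split_by_headings(markdown: str) -> list[Chunk]:
    """Split Markdown into chunks, starting a new chunk at every heading."""
    chunks: list[Chunk] = []
    current = Chunk(heading="", level=0, body="")
    lines: list[str] = []
    in_fence = False

    for line in markdown.splitlines():
        # Lines inside fenced code blocks are never treated as headings,
        # so a "# comment" in a code sample does not split the chunk.
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            _close(current, lines, chunks)
            current = Chunk(match.group(2), len(match.group(1)), "")
            lines = []
        else:
            lines.append(line)

    _close(current, lines, chunks)
    return chunks


if __name__ == "__main__":
    doc = "intro\n# Setup\nInstall it.\n## Linux\nUse the package manager.\n"
    for chunk in split_by_headings(doc):
        print(chunk.level, repr(chunk.heading), repr(chunk.body))
```

Each chunk carries its heading along with its body, so the heading can be embedded together with the text or stored as retrieval metadata.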

There is also a token efficiency argument that matters at scale. Markdown typically conveys structure with fewer characters than JSON, XML, or HTML. For organizations running millions of LLM queries, the difference between sending raw HTML and sending clean Markdown is not aesthetic; it directly affects context window utilization and cost per query.
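A quick way to get a feel for the difference is to measure the same content in both formats. The Python sketch below uses character counts as a crude stand-in for tokens (actual token counts depend on the model’s tokenizer), and both the sample strings and the size_report helper are made up for illustration.

```python
# The same small document expressed in HTML and in Markdown (invented sample).
SAMPLES = {
    "html": (
        "<h2>Install</h2>\n"
        "<p>Run the <strong>setup</strong> script, then read the "
        '<a href="https://example.com/docs">docs</a>.</p>\n'
        "<ul>\n<li>Linux</li>\n<li>macOS</li>\n</ul>"
    ),
    "markdown": (
        "## Install\n\n"
        "Run the **setup** script, then read the "
        "[docs](https://example.com/docs).\n\n"
        "- Linux\n- macOS"
    ),
}


def size_report(variants: dict[str, str]) -> dict[str, int]:
    """Return the character count of each variant, keyed by its name."""
    return {name: len(text) for name, text in variants.items()}


if __name__ == "__main__":
    report = size_report(SAMPLES)
    for name, size in report.items():
        print(f"{name}: {size} characters")
    ratio = report["html"] / report["markdown"]
    print(f"html/markdown ratio: {ratio:.2f}")
```

This sample is already clean HTML. Scraped pages also carry scripts, styles, and navigation markup, so the gap in practice tends to be larger than a hand-written snippet suggests.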

The emerging llms.txt standard — a structured Markdown file that websites publish specifically to help AI agents understand their content — is perhaps the clearest sign of this shift. It’s a robots.txt for the AI era, and it’s written in Markdown.

Lasting Lessons

The history of Markdown teaches a few things worth holding onto.

Observing behavior is more powerful than inventing syntax. Gruber didn’t design conventions — he codified the ones writers were already using. The asterisks, the dashes, the hashes were already there in email inboxes. The insight was recognizing them as a latent standard and making them explicit. Technologies that absorb natural human behavior tend to outlast those that try to replace it.

Readability is a first-class feature, not a UX concern. The decision to make Markdown legible without rendering felt like a minor usability choice in 2004. Twenty years later, it’s the property that made Markdown structurally compatible with version control, diff tools, email clients — and eventually, language model training corpora. Constraints that serve humans often end up serving machines too.

A missing spec can be both a competitive advantage and a technical debt. Markdown’s lack of formalization allowed it to spread organically and adapt to dozens of contexts it was never designed for. It also created a decade of fragmentation that required a community conflict and a renaming controversy to partially resolve. CommonMark exists because the informal spec eventually became a liability. The lesson isn’t that specifications are always good; it’s that the timing and nature of formalization matter as much as the spec itself.

Adoption by the right ecosystems is irreversible. Once GitHub made Markdown the format for READMEs and pull requests, every developer on the platform learned it. Once Stack Overflow rendered answers in Markdown, millions of questions and answers were effectively encoded in it. That corpus became training data. The format became model behavior. The accidental decisions of 2004 are now embedded in the weights of billion-parameter neural networks.

Some foundations are invisible precisely because they became load-bearing before anyone noticed.

Conclusion

Markdown was never meant to be a standard. It was a Perl script, released by a blogger who was tired of writing angle brackets, collaborating with a teenage prodigy who felt the same way. It had no formal specification, an author who refused to maintain it, and a naming war that nearly fractured its community.

And yet here we are. Markdown is how engineers document code, how writers publish online, how AI systems structure their outputs, and how humans and machines signal semantic intent to each other across billions of daily interactions.

Like the relational model that gave us SQL, and like REST that gave us the web’s communication pattern, Markdown succeeded not because it was the most technically sophisticated option — but because it was the most legible one. It met people where they already were.

The accidental standard became the essential one.

References

Gruber, J. (March 15, 2004). Introducing Markdown. Daring Fireball.
daringfireball.net/projects/markdown
Swartz, A. (2002). atx, the true structured text format.
aaronsw.com/2002/atx
Swartz, A. (March 19, 2004). Markdown.
aaronsw.com/weblog/001189
Atwood, J. (October 25, 2012). The Future of Markdown. Coding Horror.
blog.codinghorror.com/the-future-of-markdown
Atwood, J. (September 5, 2014). Standard Markdown is now Common Markdown. Coding Horror.
blog.codinghorror.com/standard-markdown-is-now-common-markdown
MacFarlane, J. et al. (2014). CommonMark Specification.
commonmark.org
Dayeh, T. et al. (2025). The Last Fingerprint: How Markdown Training Shapes LLM Prose. arXiv.
arxiv.org/html/2603.27006
RFC 7763 (March 2016). The text/markdown Media Type. IETF.
datatracker.ietf.org/doc/html/rfc7763
