Every day, millions of new web pages are added to the internet. Most of them are unstructured, uncategorized, and nearly impossible for software to understand. It irks me.

Look no further than Sir Tim Berners-Lee's Wikipedia page:

The markup for Tim Berners-Lee's Wikipedia page; it's complex and inconsistent
What Wikipedia editors write (source).
The browser output for Tim Berners-Lee's Wikipedia page
What visitors of Wikipedia see.

At first glance, there is no rhyme or reason to Wikipedia's markup. (Wikipedia also has custom markup for hieroglyphs, which admittedly is pretty cool.)

The problem? Wikipedia is the world's largest source of knowledge. It's a top 10 website in the world. Yet, Wikipedia's markup language is nearly impossible to parse, Tim Berners-Lee's Wikipedia page has almost 100 HTML validation errors, and the page's generated HTML output is not very semantic. It's hard to use or re-use with other software.

I bet it irks Sir Tim Berners-Lee too.

The markup for Tim Berners-Lee's Wikipedia page; it's complex and inconsistent
What Wikipedia editors write (source).
The generated HTML code for Tim Berners-Lee's Wikipedia page; it could be more semantic
What the browser sees; the HTML code Wikipedia (MediaWiki) generates.

It's not just Wikipedia. Every site is still messing around with custom

s for a table of contents, footnotes, logos, and more. I could think of a dozen new HTML tags that would make web pages, including Wikipedia, easier to write and reuse: , , , and many more.

A good approach would be to take the most successful Schema.org schemas, Microformats and Web Components, and incorporate their functionality into the official HTML specification.

Adding new semantic markup options to the HTML specification is the surest way improve the semantic web, improve content reuse, and advance content authoring tools.

Unfortunately, I don't see new tags being introduced. I don't see experiments with Web Components being promoted to official standards. I hope I'm wrong! (Cunningham's Law states that the best way to get the right answer on the internet is not to ask a question; it's to post the wrong answer. If I'm wrong, I'll update this poost.)

If you want to help make the web better, you could literally start with Sir Tim Berners-Lee's Wikipedia page, and use it as the basis to spend a decade pushing for HTML markup improvements. It could be the start of a long and successful career.