Semantics of the Web

6/1/2011 Thomas Shields

Update: just read a fantastic article at the Windows Team Blog about Building Mobile-Ready Content. It's got a great piece on semantics. Check it out!

Why?

I recently had a discussion about semantics, the role of markup and scripting, and where their paths meet and where they don't. Apparently it's rather a touchy matter. Obviously some web developers couldn't care less about what their markup, styling, and scripts look like as long as the finished product looks good. (See Sites, Commercial.)
But smart coding is important for a few reasons:

  • A forward-looking coding mindset
        - code in such a way that your site will work in the next several iterations of any given browser.
        - code in such a way that your site can easily be updated by you or (presumably) a different developer five years from now.
  • Site speed: the cleaner, simpler, and more semantic your code is, the less there is to download and parse, so the faster it tends to load and work. Elementary.

What?

First off, what is semantics?  Glad you asked.

se·man·tics [ sə mántiks ]   

  1. study of meaning in language: the study of how meaning in language is created by the use and interrelationships of words, phrases, and sentences
  2. study of symbols: the study of the relationship between symbols and what they represent
  3. study of logic: the study of ways of interpreting and analyzing theories of logic

se·man·tic [ sə mántik ]  

  1. relating to word meanings: relating to meaning or the differences between meanings of words or symbols
  2. of semantics: relating to semantics
  3. relating to truth: relating to the conditions in which a system or theory can be said to be true

When we're talking about semantic markup, we mean that the tags that form our HTML should be wrapping content that you would expect them to wrap, even if it could technically be otherwise. Tags tell you what they're supposed to contain; don't put anything else in them. Just because you can doesn't mean you should. A <body> tag should contain the body of the page. Scripts and style sheets don't go there. A <p> is a paragraph. Don't use it to lay out the page; it's for a paragraph of text. Again, elementary.

More What/How?

Okay. Definition time.

  • HTML === Hypertext Markup Language
  • JavaScript === scripting language that is independent of the DOM but primarily used to interact with it. (Oh, and JavaScript !== Java)
  • CSS === Cascading Style Sheets
  • Oh, and DOM === Document Object Model

(notice how there's no type coercion for those equalities...)
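
If that joke went over your head, here's a quick sketch of the difference in JavaScript (the variable names are just for illustration): == converts types before comparing, === doesn't.

```javascript
// Loose equality converts the operands to a common type before comparing.
const loose = (1 == "1");   // true: the string "1" is coerced to the number 1

// Strict equality compares type and value, with no coercion.
const strict = (1 === "1"); // false: a number is never strictly equal to a string

console.log(loose, strict); // prints: true false
```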

Okay, PDS (Pretty Dang Simple), eh? Right? But lots of people don't get it. One thing I've certainly learned (esp. thanks to all my pals at SO) is to keep a separation of concerns. Here are three freakishly obvious, why-the-crap-don't-more-people-get-this??!?, fundamental-to-web-design principles:

  1. HTML is for organizing your content as semantically as possible. Period.
  2. CSS is for effectively displaying your content in a stylish fashion. Period.
  3. JavaScript is for scripting content to improve user interaction. Comma. Just kidding. Period.

"But wait!" you say, in all your annoying impertinence, "doesn't HTML help with styling too? What about the infamous clearfix? Classnames, ids? That's all styling, right?"

Sort of. I for one say clearfixing is a stupid hack that should go away as the languages improve. For one thing, you can just apply the clear to an element that's already down there instead of adding an extra one. Explicitly setting the overflow on the container is supposed to work as well. Ids and class names are there so the styling knows what it's working with, but they also help with semantics by providing a description for the more obscure elements.

Basically the point I'm trying to make here is you shouldn't have to write your HTML for your CSS. You write your HTML for your content. You write your CSS for your HTML. Writing semantic HTML helps you write HTML that snaps to the content, and vice-versa. Matt Mcdonald said:

There's a certain beauty to writing purposeful markup (HTML). You know you've done it right when the page looks fine without CSS.

Example: semantic HTML puts the center column of a three-column layout first in your markup because it is more important, even though it makes no technical difference. If you're doing it right, the CSS shouldn't need that column to sit "in-between" the other two in the markup anyways. As a corollary (read: insert cool word for related-point), you would never use a three-column layout just because it looks good; you use it to support the overall message and content of the site. Write HTML for your content.

A three-column design with a header and footer should be simple:

<header>
  <!-- "logo.png" and the alt text are placeholders -->
  <img id="logo" src="logo.png" alt="Site logo">
  <h1>Title</h1>
  <nav>
    <ol>
      <li>Home</li>
      <li>Products</li>
      <li>Contact</li>
    </ol>
  </nav>
</header>
<section id="main">
  <div id="content"></div>
  <div id="leftCol"></div>
  <div id="rightCol"></div>
</section>
<footer>Copyright</footer>

I'm not saying the above is perfect by any standard. But it's closer to semantic than a heck of a lot of markup out there, mostly because it couldn't care less what the CSS and JavaScript do. That's the point. You should be able to keep generally the same markup but radically change the design with CSS. In the above example, the <header> tag contains items that form a header. The <h1> tag contains the primary piece ("1") of the header ("h"). The image's id is marked in such a way as to communicate the specific purpose of this generic tag. The <nav> tag is used to contain navigation. The <ol> tag contains an ordered list of <li> list items. Depending on the navigation, the <ul> tag could be used instead, but I think in most cases navigation is ordered - you want the viewer to read your home page first, then view your products, then contact you. The <section> tag contains a section that is "main" - as delimited by the id. Etc., etc. You guys get it.

This brings up an avenue into the philosophy of web design. Like other kinds of design - CD album art, software products, etc. - the design is just a means to an end; a pretty way of displaying content. This is one reason I don't really like templates - they feel forced, often simply because a template doesn't know what you want to use it for. You have to know what your content is, then design for it. (Blah blah blah...)

Anywho. What the guys and I were actually talking about was scripting and how it relates to markup. I had placed an <input> outside of a <form> tag because a script was interacting with it and I didn't want it to get posted back with the rest of the form's content. Remember how I said markup is for arranging your content semantically, period? I lied. It also, occasionally, has to serve as a means of taking input - it's the portal to the actual interaction with the user that comes via scripting or server-side code. And right around here is where "semantics" starts jumping around.

My <input> was indeed something that needed to get posted back to the server - the reason I didn't put it in the form was that my script was putting it in (and handling the formatting). See, I got one part right: scripts are for improving user interaction, and user interaction shouldn't depend on them. But I wasn't "improving" enough. My markup should have pretended scripts didn't exist, that only the server existed to take in the input, and the script should have improved things from there. (Side note: this is why I can't wait for CSS3 Animations to finish up. Scripting shouldn't have to do it.)

Granted, right now lots of devices have scripting enabled. But to repeat: just because you can doesn't mean you should. Just because scripting is enabled doesn't mean you have to do an animation with it. That's not the script's job. It is the script's job to, for example, take a form (that already works without scripts) and improve it to use nice AJAX instead.
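
To make the form point concrete, here's a minimal sketch of that kind of enhancement. It is not the code from the discussion. It assumes a modern browser with fetch and FormData, a form that posts to its own action URL, and a hypothetical form id of "signup"; enhanceForm and sendWithFetch are names made up for this example. The form still submits the normal way if the script never loads or the request fails.

```javascript
// Hypothetical helper: upgrades a form that already works without scripts.
// `send` is any function that takes the form and returns a promise.
function enhanceForm(form, send) {
  form.addEventListener("submit", function (event) {
    // Stop the normal full-page post; the script takes over from here.
    event.preventDefault();
    send(form).catch(function () {
      // If the AJAX request fails, fall back to the plain old submit.
      // form.submit() does not fire another "submit" event, so no loop.
      form.submit();
    });
  });
}

// One possible `send`, assuming the form is meant to be POSTed to its action URL.
function sendWithFetch(form) {
  return fetch(form.action, { method: "POST", body: new FormData(form) })
    .then(function (response) {
      if (!response.ok) {
        throw new Error("HTTP " + response.status);
      }
      return response;
    });
}

// Only runs in a browser; "signup" is a hypothetical id.
if (typeof document !== "undefined") {
  const signup = document.getElementById("signup");
  if (signup) {
    enhanceForm(signup, sendWithFetch);
  }
}
```

The send function is passed in so the AJAX part can be swapped out (or tested) without touching the part that hooks into the form. The markup itself never has to know any of this exists.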

On the other hand, some web applications can set (or already know) a strict user base, and JavaScript is the essential core of the application. I'm not talking about these. I'm talking about public websites, not intranet websites or internet/intranet applications.

Whether you actually support browsers that don't have scripting enabled or can't handle HTML5 and CSS3 is up to you. The point is to keep a separation of concerns on these things. Even if you're not supporting script-disabled browsers, your display of content shouldn't depend on the user's interaction with it. Why? Separation of concerns:

  • Future-proofing: If a new scripting technique comes on the scene and JavaScript becomes deprecated, your HTML/CSS should still work.
  • Scalability: You should be able to radically change the layout or design of a page without scripts being affected, and vice versa.
  • Maintainability: Someone with zero knowledge of JavaScript should be able to edit the HTML/CSS files without fear (well, to a point. They shouldn't be scrapping ids or class names).

Also, I realize there's some major overlap here. HTML, CSS, and JavaScript aren't automagically and completely independent of each other. Sometimes you do have to add an element solely for styling purposes. The world isn't perfect. On a better note, a good overlap occurs naturally between HTML and CSS (class names, ids, tag names, the <em> tag, etc). Additionally, another designed 'good' overlap is the DOM in JavaScript. Some say it's crap. Some say it rocks. Whatever. The point is, if you're working with JavaScript in the browser, in a website, you're primarily going to be modifying, adding, and removing elements from the document. But when you start using JavaScript to generate HTML that is vital to the page working correctly, there's a problem.
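
As a rough sketch of staying on the right side of that line: the example below assumes the heading and panel already exist in the HTML (the ids "faqHeading" and "faqPanel" and the "collapsed" class are hypothetical), and the script only toggles a class on them. makeCollapsible is a made-up helper name. If the script never runs, the content is still sitting there in the page.

```javascript
// Hypothetical helper: makes an existing section collapsible.
// Nothing vital is generated here; the script only adds behavior
// to elements that the markup already provides.
function makeCollapsible(heading, panel) {
  heading.addEventListener("click", function () {
    // The CSS decides what "collapsed" looks like,
    // e.g. .collapsed { display: none; }
    panel.classList.toggle("collapsed");
  });
}

// Only runs in a browser; both ids are hypothetical.
if (typeof document !== "undefined") {
  const faqHeading = document.getElementById("faqHeading");
  const faqPanel = document.getElementById("faqPanel");
  if (faqHeading && faqPanel) {
    makeCollapsible(faqHeading, faqPanel);
  }
}
```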

 

There you go. The insane ramblings of one of millions of web developers. Please comment with your thoughts - I'm sure I've left something (obvious probably) out. Thanks!
    -Thomas