Got Alt?

I’ve been going back and forth a little bit with Ian Hickson (Hixie), who is the man behind the alt text spec, among many other things.

1. He is not dumb.
2. He will play a significant role in how CSS and the web in general develop over the next few years.
3. 1 + 2 = a good thing for all of us

So you may want to at least bookmark his blog.

He brought up an obvious point that I had completely missed (which is not unusual for me). It sparked the following rant.

Alt text serves a key purpose. It’s not just some words that pop up as a tooltip over an image if you leave your cursor there too long. Nor is it just a quick means of identifying the contents of an image for text-based browsers or screen readers.

Alt text is one of the many underappreciated keys in HTML that allows an HTML document to be self-contained and self-sustaining long after all the external objects the HTML document references are gone.

What do I mean by that?

Ian’s example was Google’s cache. My example would be the Internet Archive. Long after you stop developing content, the HTML documents you create will still be around... somewhere. And not everything you have tied into those documents will be with them. Images, stylesheets, JavaScript… all these resources external to the HTML document will be out of reach.

When all that is left is your HTML, will your page retain all the information and meaning it once presented when all those resources were still available?

It should!

Here’s the scenario: hundreds of years from now a digital archeologist is combing through a bunch of ancient hard drives recently recovered from the ruins of a long lost data center. A reconstruction team is able to recover data off these terribly large, terribly small capacity disks. That data are your HTML documents. Sadly, the images folder is too corrupted for repair, and only the HTML files themselves are left behind. The documents talk of great accomplishments and amazing and unexpected results that will set the company in a new and exciting direction. But the exact nature of what is being discussed is within those images. The context of the words is lost because the images containing the key information are gone, and there is nothing that describes what was in those images. Sadly, the archeologists put the data into storage, in the basement of some digital museum, and the world will never learn about how your dazzling new CSS-based layout helped increase ease-of-use and, in turn, profits, earning you employee of the year.

Very sad.

If only you had included descriptions of each image within the alt text… the archeologists would go on the news about their amazing find, about how you were the digital god of the early computing age. Several statues would be erected in your honor, and 3 elementary schools (and 1 pre-school) would be named after you.

But that is not to be.

Now, this story is fun, but it assumes that you care about how your information will be understood hundreds of years from now.

Alright. Let’s talk about right now.

You’ve just developed your amazing web site design. It blows the world away. It’s so hot even Slashdot links to your company. But... uh-oh... your company has just been hit by the Slashdot Effect. Your image server is dead. Your document server is slowly spitting out HTML. People are somehow able to get to your site, but those images won’t be loading anytime soon. So what are your potential fans (and customers) going to be able to see? No graphs. No screenshots. If there’s no alt text, there’s nothing to ground the surrounding text. People will dismiss your content as useless, make up their own minds, and fill in very large blanks with their own misunderstanding of what you’re trying to say.

There goes that employee of the year award. The statues. The schools… well, you get the idea.

The point is this: clients will not always have access to the external resources your document references.

Images, JavaScript and the like are all superfluous to an HTML document (web page). They are there to help the client interpret the data (the client usually being a human with decent eyesight using a modern web browser), but they should not be the only means of accessing that information.

So what about stuff that’s image-centric, such as color-blindness tests, “magic-eye” images, or examples of optical illusions? Well, you can use text to describe each one of those. That’s your alt text. Will the client be able to get the same use out of the page if it can’t “see” the images? No, but there’s a difference between use and information.

I may not be able to see the color-blind test image, but reading a description along the lines of “an area filled with different colored dots organized in a pattern such that people who suffer from type X color-blindness will see the number 7 and others the number 8” is enough information for non-visual users to act upon.
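In markup, the difference looks something like this. The filename plate-07.png is hypothetical, and the alt wording is simply the description from the paragraph above:

```html
<!-- Tells a reader nothing once the image is gone -->
<img src="plate-07.png" alt="plate 7">

<!-- Carries the information the image was there to deliver -->
<img src="plate-07.png"
     alt="Color-blindness test plate: an area filled with different colored dots arranged so that people with type X color-blindness see the number 7 and others see the number 8.">
```

If the images folder disappears, only the second version still tells the archeologist (or the Slashdot crowd) what the page was about.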

Bah.

Running out of brain juice again. Please keep in mind all of these random posts of mine are pure stream-of-consciousness (thus the poor organization, spelling, grammar, etc…)

It’s about information. Screw presentation. Hah!

Web of Information: Well Structured HTML

In my last post I started to get into the meat of the message I’m trying to get across. I’ll recap it again in the hope that one of these posts will finally get it down in a brief, concise, and easy-to-understand form.

The Idea

The web, at its core, is all about the organization of information. It is organization that determines accessibility of information. Accessibility determines a client’s ability to consume information. The client’s ability to consume information determines the usefulness of the web (our work). Our ultimate goal, as web developers, is to make our work useful.

Organization begins with the structure of the individual document and flows out to how a group of documents relate and link to each other, then to how those groups relate and link to one another, then to sub-sites or sub-domains, to whole websites, and finally to the web itself.

Everything begins with document structure. We use HTML as the primary means to structure individual documents. How we structure HTML documents is the key to how well the web works. That is why, in this post, I will be focusing on well structured HTML.

A Bit Ambiguous

I was purposely being ambiguous in the last section. We need to recognize that a client is not necessarily a user accessing information via a graphical web browser. There will certainly be users who operate using a text-only browser such as Lynx. Others may be using a screen reader such as JAWS. And others still might not be human at all, but a computer application designed to pick out the important parts of documents on the web. One such example would be a search engine’s web crawler or bot. If the bot can understand the structure of your document, it will be better equipped to index your information and make it easier for others to locate and access it.

Likewise, documents could be more than just HTML documents. We might also find text documents, spreadsheets, presentation slides, images, and any other format that stores information. For the purposes of this post, I’m going to focus on HTML, but we must keep in mind that HTML is not the only way information is presented on the web.

What Is Well Structured HTML?

Well structured HTML is markup used to provide meaning and organization to the information contained within an HTML document. Well structured HTML is free of superfluous markup such as embedded presentation logic which I discussed in my previous post. If a piece of markup exists within an HTML document, it has reason and purpose and provides added meaning or insight to the information it is acting upon.

So what the hell does that really mean? I’ll get into that in just a second, but first I want to point out that when I talk about well structured HTML I am not focusing on correct HTML, where each opening tag has its corresponding closing tag, where nested tags open and close in the correct order, and so on and so forth. Syntax is not what I’m talking about here. I’m talking about purpose and meaning. These are slightly abstract concepts, but they matter more, and they are what got me going on this Web of Information series in the first place.

Let’s look at some HTML:

This is a line of text.<br>
<br>
And here we are on a second line of text.<br>Or is it the
second item in a list?<br>
Or is it a new paragraph?

So which is it? What is the relationship between the first and second blocks of text? There’s an empty line separating the two text blocks. Why? What is the intent or purpose of that separation? To a graphical browser, it doesn’t matter much, does it? Whether I’m using line breaks, paragraph tags, or a list with no style, anyone viewing this through a browser will probably read the information as two paragraphs of text.

And this is what’s plagued the web: too much focus on the graphical representation of the information, and not enough on the underlying structure of the information.

Detour In Philosophy

In the HTML sample above, the intent of the author is ambiguous. We humans using a graphical browser can make assumptions easily enough, but other applications or methods used to consume this information may not be able to. Not important? Perhaps. This is more academic than practical, isn’t it? If 99% of your information consumers are humans on a graphical interface, who cares? So this is where I tie into the unfinished thoughts in the philosophy section of the Philosophy In A Broken Web article. We don’t know what the future may hold, but we can be fairly certain a lot of the HTML documents out there will be around for some time. Down the road, the need to understand the purpose and intent of a block of text could become very important. Advances in search engine technology might mean that the position of text within the structure of an HTML document adds to or detracts from the ‘score’ given for a certain search term. A term appearing in a list might carry more weight than one found in a paragraph, because list items might be more closely related to the true subject matter of the document. This is one example of what might be. There are millions more I can’t begin to imagine. It is because of those millions that I want to write well structured HTML. It’s not only an investment in the now (which I’ll get into later) but also an investment in the future.

Back To The Show

Given the HTML example above, well structured HTML means adding meaning, purpose, intent, etc. to the information. We do this by being explicit in the document’s structure. The well structured HTML version of the previous example would look something like this:

<p>
This is a line of text.
</p>
<p>
And here we are on a second line of text.
Or is it the second item in a list?
Or is it a new paragraph?
</p>
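That is only one reading of the ambiguous original. If the author had instead meant the second and third lines as items in a list, being explicit would look more like this (a sketch of that alternative interpretation, not a claim about which one is right):

```html
<p>
This is a line of text.
</p>
<ul>
<li>And here we are on a second line of text.</li>
<li>Or is it the second item in a list?</li>
<li>Or is it a new paragraph?</li>
</ul>
```

Either way, the markup now states the intent instead of leaving every client to guess from blank lines.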

So now we know that each block of text is a paragraph. Well structured HTML is markup that keeps to the spirit or original intended purpose of a given HTML tag. Table blocks contain tabular data, and only tabular data. H2 and other headings define headings for different sections within the page. They form a type of hierarchy that a client can use to get a better handle on the structure and organization of the document. A quick scan of just the headings will tell the client what each section is about without having to parse individual paragraphs. Strong and em wrap information that needs to be elevated in importance above surrounding information (such as text in a paragraph). Lists contain lists of information, blockquotes contain quoted text from some other source… and so on.
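Here is a short sketch of a page body built only from those structural elements. The content is made up; what matters is that every tag says what its contents are:

```html
<h1>Site Redesign</h1>

<h2>Summary</h2>
<p>The new layout uses one external stylesheet.
<strong>No page contains font tags anymore.</strong></p>

<h2>Next Steps</h2>
<ul>
<li>Move the remaining pages to the external stylesheet.</li>
<li>Replace layout tables with structured markup.</li>
</ul>

<h2>Feedback</h2>
<blockquote>
<p>The pages load faster and are easier to read.</p>
</blockquote>
```

Reading only the h1 and h2 elements gives a client an outline of the page (Site Redesign: Summary, Next Steps, Feedback) without parsing a single paragraph.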

You soon find yourself using br tags a lot less, to the point that most documents will probably not have a single one. In fact, the XHTML 2.0 drafts replace br with a less ambiguous l (line) element used to wrap individual lines. It explicitly defines a string of text as a single line. It adds meaning, whereas br has virtually no meaning and is really a cleverly hidden bit of embedded presentation logic.
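For comparison, here is a two-line address written both ways. The address is made up, and the second form follows the l element roughly as the XHTML 2.0 drafts described it. XHTML 2.0 never became a W3C Recommendation, so today’s browsers treat l as an unknown element rather than acting on its meaning; take it as an illustration of the idea only.

```html
<!-- HTML: br marks a break but says nothing about the lines themselves -->
<p>
123 Example Street<br>
Springfield
</p>

<!-- XHTML 2.0 draft style: each line is its own element -->
<p>
<l>123 Example Street</l>
<l>Springfield</l>
</p>
```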

Hr tags might be in the same boat, although hr does have meaning and purpose: it makes the separation of information more pronounced. Headings ought to provide enough functionality for that, but perhaps there will be times when it isn’t logical to use headings yet an explicit separation of information is needed. Maybe it’s a gray area. And this is where you really get to exercise your own philosophy. You make the choice. Is hr only a form of embedded presentation logic, or is there a purpose to it? That’s your call.

I’m not going to go over the purpose of each and every HTML tag here. You can consult the HTML spec yourself to determine the intent, purpose, and meaning each tag provides.

What’s important here is that you break down your content into logical blocks that are defined by whatever markup you feel is appropriate. Whenever possible you are explicit in the intent of the information. Paragraphs are wrapped in p tags, tabular data is placed inside tables, and so on.

At the end of the day, your pages will be primarily headings, paragraphs and lists. Tables, certainly, but you don’t see tabular data as often, especially inside blogs or personal websites. And certainly you’ll have more than just what I’ve listed, but you should start to feel a great simplification in how your information is structured, and it should feel correct, clean, and good.

But Your Approach Gives Me Plain Pages. ICK!

As I said before, 99% of your clients will be humans operating a graphical browser. Color and other presentational elements will certainly come in handy, and actually aid in the visual breakdown of the information within the HTML document. And you most certainly can (and should) use them, even with the approach I’ve outlined here.

How? CSS! Throw an “underline” class into a stylesheet, link it from your HTML document, and apply it to your headings. Now you can get your hr effect without hr tags. Change colors for different heading levels to help guide a user’s eyes through your information at whatever depth they want to scan. Alternating background colors for table rows certainly helps the eye when trying to follow a row across the screen. All done by applying a simple class or id attribute in your HTML document. That’s perfectly acceptable, although I do recommend using class and id values that have some meaning to them. Class names such as “blue” and “dots” work fine, but “worksheetTable” and “importantHeading” carry much more meaning in their names. Plus, it saves you from having to apply red colors to a class named “blue” when you decide you’re no longer happy with blue headings.
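A minimal sketch of that idea follows. The class names importantHeading and worksheetTable come from the paragraph above; the alt row class, the colors, and the table contents are made-up assumptions. The rules sit in a style element only so the example fits in one block; on a real site they would live in the linked external stylesheet.

```html
<head>
<title>Worksheet</title>
<!-- Inlined only to keep the sketch in one block;
     these rules belong in the external stylesheet. -->
<style type="text/css">
/* A bottom border gives the "hr effect" without an hr tag */
h2.importantHeading {
  color: #036;
  border-bottom: 1px solid #036;
}

/* A different color for each heading level guides the eye */
h3 { color: #369; }

/* Shade every other row so the eye can follow it across */
table.worksheetTable tr.alt td { background-color: #eee; }
</style>
</head>
<body>
<h2 class="importantHeading">Worksheet</h2>
<h3>March</h3>
<table class="worksheetTable">
<tr><th>Item</th><th>Count</th></tr>
<tr><td>Widgets</td><td>12</td></tr>
<tr class="alt"><td>Gadgets</td><td>7</td></tr>
<tr><td>Sprockets</td><td>3</td></tr>
</table>
</body>
```

The markup itself only says “this heading is important” and “this row is an alternate row”; what those look like is decided entirely in the stylesheet.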

If you’re reading this, you probably have some understanding of the power of CSS. I won’t go into it here. But suffice it to say that external stylesheets should provide you with all you need to add presentation logic to your HTML documents.

Practical Benefits

Screw those academic and philosophical approaches. I need something tangible; a real reason to actually care about well structured HTML.

Well, here it is. Well structured HTML documents, as I’ve discussed here, will almost always be smaller in file size than ones with lots of embedded presentation logic and br tags. Furthermore, they are much easier to understand when looking at the raw HTML. How often have you gone to edit a page originally created by Word, FrontPage, or Dreamweaver, only to cringe at the sight of the cluttered and confused structure? With this approach, even documents produced by FrontPage and Dreamweaver will be much easier on the eyes and brain to edit. No more seemingly random font tags or empty strong or em tags. Things become much cleaner and easier to manage. When was the last time you created an HTML document in FrontPage that you could say had clean HTML?

Because well structured documents are easier to follow and edit, they are also easier for others to work with. No longer will you have to rely on a single person who understands the arcane purpose of the font tags nested 4-deep. Beginners to HTML will find them much easier to follow and understand as well. A less confusing document means less chance for mistakes while making changes.

But the biggest benefit of all comes when you use external stylesheets to handle your presentation logic. With embedded presentation logic you have to edit every single page when you make a change to your site’s color scheme or layout. With external stylesheets, all your documents point to a single source that defines colors and other presentation logic. A change to your site’s color scheme is but a few quick edits to a single CSS file, and your entire site is updated. I covered this in my previous post but just wanted to double up on the message.

Well structured HTML documents + CSS = a website that is small, efficient, and easy to manage, with an inherent organization to individual pages that makes it much easier to carry that organization up through the entire site. I’ve even heard of several instances where sites that use well structured HTML ranked better in search engines, and where users of those search engines were more likely to find exactly what they were looking for.

Do As I Say…

So if you’ve viewed the source of this page at all, you’ve seen that its HTML structure isn’t so great. Yep. Headings within paragraph tags?! C’mon… I know better than that… don’t I? Wait… there’s even some embedded presentation logic in this article! With all my crap about embedded presentation logic, why am I still using it?!

A couple of reasons. First, the headings inside paragraphs on this blog are directly related to my laziness. I could insert my own paragraph tags and tell Movable Type not to convert newlines into HTML. At some point I might go back and fix that, but for now… I’m lazy.

What about that embedded presentation logic? Sometimes it’s the means to an end. Sometimes it’s just easier that way. And sometimes… see the reason given in the previous paragraph.

This is a good philosophy. It is a good guide. Do I expect everyone to follow these ideas to the letter? No. I’m a realist. But keeping these ideas in mind will certainly help you out, and help you to develop your own approach. You may not find yourself making every HTML document “perfect”, but it will be better if you can apply even just a little of what’s here.

And sometimes you just forget yourself completely and go off and do things you know you shouldn’t. It’s exploration. We can’t get stuck in a single paradigm and expect it to be right forever. Kick the tires, make sure this is still a strong idea. Maybe embedded presentation logic has its place. Maybe it’s not so black and white. I’m not going to tell you how it is. I’m going to give you some ideas. You make of them what you will.

Second Star To The Right…

So where to from here? The original message I wanted to get out when I started this has, for the most part, been put down in these articles. But there are certainly other areas of web development to talk about, and that’s what I’m going to do. Each post may not tie in to its predecessors, but I hope to have a few more ideas to share as we go along.

As always, I’d love to hear from you. Any thoughts at all, even “wow, this sucked”, I’ll take that. Either post a comment or e-mail me directly at: ruthsarian@gmail.com.

Cheers

Web Of Information: Fixing The Web

A belated continuation of my Web of Information series.

A Brief Recap

The web was created to make access to information easier by placing it in a structured/organized format (HTML). When the web became popularized, primarily through the invention of a graphical interface for the web, the focus changed to user experience. HTML documents became cluttered with presentation logic, and the actual information within HTML documents became secondary. This is how things went wrong.

Purpose

There are no HTML police. Nobody to say what is right and what is wrong. We are allowed to do with HTML as we see fit. If I want 100 nested font tags that are never closed, so be it. What I want to do here is show that you don’t need font tags, or any other tags used only for presentation. They don’t belong in an HTML document, and I want to cover why from both a practical and an academic point of view. Furthermore, purposely not using presentation tags actually helps you build a better-structured HTML document. And the ultimate prize: with the wide support of CSS, you can have a single point that defines the presentation of the information in all your HTML documents. Your HTML documents become smaller and easier to manage and, when done right, much more compatible with the large number of World Wide Web clients out there. Everyone benefits.

Just a note that my next post will focus on the how, which will discuss what I refer to as well-structured HTML documents. First we need to figure out why we want to create well-structured HTML documents.

Academic Perspective

HTML documents are structured information. That structure makes the meaning of the information more explicit and easier to interpret by a client (whether that’s a computer application or a user). So when we look at HTML documents and why presentation tags don’t belong, we have to think about meaning.

What meaning does a font tag provide to its contents? For a graphical interface it means how the text is to be rendered. This could be done in a manner to boost or take away importance of the text within the font tags. To any non-graphical client, however, the font tag is meaningless. How does a screen reader convey the difference between “Times New Roman” and “Helvetica”? It doesn’t. The font tag is ignored.

Another, more in-depth example is the difference between strong and b(old) tags. What meaning can I derive from a b tag? Not much, if any. I know that b will make the text bold, but that’s about it. And while you might infer that bolded text should be given more importance than any surrounding text, that’s only an inference. The inference only works for visual-based HTML document processors, such as a graphical web browser. A screen reader would get no meaning from a b tag, because there is no visual manner to bold the text. B is a strictly visual, presentational tag. Any meaning added to the content wrapped inside b tags can only be inferred.

Strong, on the other hand, carries more than just a visual instruction. Strong explicitly tells the client that the content within the tag has a higher importance. How that higher importance is interpreted or rendered is up to the client and not a concern of the HTML document’s author. Visually, a strong tag is typically handled the same way as a b tag; that is, the textual content of the tag is bolded. But now a screen reader has a non-visual tag that it can do something with, such as increasing the volume of the voice reading back the text.

The difference between b and strong may seem small, but it’s a very important difference. B is a cue to the client on how to visually render the content wrapped within. There is no added meaning to that block of content, just a change in how it is rendered. A strong tag, by contrast, explicitly tells the client that the content within is of higher importance. Strong adds meaning to the content it wraps. It is not a visually-oriented tag; it is a meaning-oriented tag.

The same discussion could be had on the differences between i and em. Subtle (to us users of graphical interfaces) but very important differences. And this is why I titled this section “academic perspective”: at the end of the day, the target audience of most HTML documents is a human operating a graphical interface. To this audience, the difference between strong and b is pointless. But to HTML document authors, it should mean quite a lot.
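Side by side, the difference is easy to miss in the markup and invisible in a graphical browser. The sentences are made up:

```html
<!-- Presentational: only says "draw this word in bold" -->
<p>Back up your files <b>before</b> upgrading.</p>

<!-- Meaningful: says "this word is more important than the text around it" -->
<p>Back up your files <strong>before</strong> upgrading.</p>

<!-- Same idea for i versus em: emphasis, not just slanted type -->
<p>You <i>must</i> restart after installing.</p>
<p>You <em>must</em> restart after installing.</p>
```

A graphical browser typically draws each pair identically; only the strong and em versions tell any other client that those words matter.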

There is a simple rule of thumb to go by that carries all of this thinking with it: There should be no information within an HTML document that is medium-dependent. That would have to include inline stylesheets and possibly embedded stylesheets as well. It’s a good rule but it can certainly be broken, so long as you have a specific purpose in mind for doing so. The practical perspective will give better reasons to support it.

Practical Perspective

This section will probably carry more weight than the previous one. After all, if a b tag is going to get the job done, who cares about the “academic” issues? If it works, it works.

Imagine you have a nearly infinite number of HTML documents on your website. Each page contains various font and b tags. Then one day you decide (or the decision is made for you) that the font type needs to be changed. You must now edit thousands of HTML documents to do so. A mass search-and-replace application might do the trick, but will it catch everything? What if the required changes were more complex, outside the scope of any search-and-replace application? You’ll be handling each document by hand: a very tedious process.

Enter stylesheets. Create documents without any font tags. Allow for a single link tag in the head of the document which points to a stylesheet. That stylesheet then defines the fonts and colors and other presentation logic used to display all of your thousands of documents. When asked to change the size of a particular heading, or font type, a simple edit to that single CSS file instantly propagates out to all HTML documents.
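A minimal sketch of that single link tag. The path /css/site.css and the rules mentioned in the comment are hypothetical:

```html
<head>
<title>Any page on the site</title>
<!-- Every document links to the same external stylesheet.
     /css/site.css is a hypothetical path; it might contain rules such as
       body { font-family: Georgia, serif; }
       h2 { font-size: 1.4em; }
-->
<link rel="stylesheet" type="text/css" href="/css/site.css">
</head>
```

Changing the font type for the whole site then means editing the body rule in that one file rather than thousands of font tags.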

The only caveat to this approach is that any documents with inline or embedded stylesheets, as well as documents with any presentation tags, may still need to be edited by hand to either remove or work around those particular instances. But if you remove all presentation tags and embedded/inline stylesheets, you won’t have to worry about embedded presentation logic mucking up your global stylesheet. Hence the rule given in the last paragraph of the previous section.

I’m over-simplifying things a bit here because I want to move on. Chances are if you’re reading this you already have a good idea of the power of global/external stylesheets. But the caveat with embedded presentation logic, and thus why you shouldn’t have any, is something I want to cover a bit more first.

Embedded Presentation Logic

Let’s say you have a particular group of pages, among your thousands of HTML documents, which have blue-colored headings. To achieve this you’re using some manner of embedded presentation logic. By “embedded presentation logic”, I mean anything local to the document that dictates how a page is to be rendered, whether it be inline stylesheets via the style attribute, embedded stylesheets via the style tag inside the head block, or font, b, i or other such presentation tags.

So you’ve got your blue headings, while the rest of the site uses the standard black-colored headings. Then you have to convert all your pages to red headings. This isn’t so bad, a quick addition to your global stylesheet to make all headings red and you’re good to go.

Not so fast.

Your embedded presentation logic will usually take precedence over anything you’ve placed inside your global stylesheet. Now you may start thinking about using the !important CSS directive, but you want to stay away from !important as much as possible. It overrides every normal declaration for that property, including those in user-defined stylesheets; under CSS 2, only a user’s own !important rules can beat it. User-defined stylesheets are a bit new, so they aren’t given much thought, but a user-defined stylesheet (UDS) can give a color-blind user or a user with poor sight a chance to view pages in a color scheme or font size that is easier for them to read. If you override that, you’re degrading their experience and may even be blocking their ability to read the site at all. So let’s stay away from !important.

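Even if you were to use !important, it still submits to the hierarchy of the document. If the heading color is set with !important but there’s a font tag inside that heading tag, the font tag sets its own color on the text it wraps (so that text no longer inherits the heading’s color), and your !important color becomes meaningless.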
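A minimal sketch of the problem. The heading text is made up, and the style element stands in for the global stylesheet only to keep the example in one block:

```html
<style type="text/css">
/* Global rule: every h2 should be red, even with !important */
h2 { color: red !important; }
</style>

<!-- Leftover embedded presentation logic from the "blue headings" pages -->
<h2><font color="blue">Product News</font></h2>
```

A graphical browser will still show “Product News” in blue. The !important rule sets the color of the h2 element, but the text sits inside the font element, which carries its own color and therefore never inherits the heading’s.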

Your only recourse is to edit each HTML document by hand to remove the conflicting embedded presentation logic. Had you refrained from embedded presentation logic, your life would have been a bit easier. So what else could you have done? In this example you could have simply created a .blue CSS class that sets the font color to blue. Apply that to your headings and you’ve got blue headings. When you make your move to red headings, you could negate the .blue class for all headings.

Aside: Perhaps not the best approach from a CSS design standpoint. Maybe you call the class “specialHeadingOne” or something else that describes why there is a different color for the heading, rather than just the name of the color. That way your CSS doesn’t have a .blue class which has a red color definition in it. But that’s a whole separate topic entirely.
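A sketch of that approach with a descriptive class name. The name specialHeadingOne comes from the aside above; the heading text and the exact rules are illustrative assumptions:

```html
<!-- Rule in the external stylesheet (hypothetical):
       h2.specialHeadingOne { color: blue; }
     When the design moves to red, only that one rule changes:
       h2.specialHeadingOne { color: red; }
-->
<h2 class="specialHeadingOne">Product News</h2>
```

The HTML never changes; the class says why the heading is different, and the stylesheet alone decides what that looks like.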

Motivation

Hopefully you can start to see the advantages of keeping embedded presentation logic out of your HTML documents. I’m a bit strapped for time on writing this post up and I want to get it up ASAP, so I apologize for any grammar/spelling errors.

In my next post I’ll talk about how you structure HTML documents without embedded presentation logic and utilize stylesheets to easily manage how your information is rendered to the user.

Web Of Information: Philosophy In A Broken Web

Before I continue, let me provide a much belated disclaimer. I am not an expert. I’m just some guy who works with this web stuff for a living. These are just my opinions and thoughts. I welcome discussion on this. In fact, I crave it. This isn’t stuff that I can just turn to a co-worker and say “Hey Bob” and start in on a conversation about information theory and the web. The few of you reading this are probably the only ones I can get any sort of dialog with. Every opinion counts, so please share. Even if it’s to say I’m so off base that I might as well quit my job and become a lumberjack.

And so I move on, with my flannel shirts packed and my saw sharpened.

Philosophy?

Yes, philosophy. The web has become a game of interpretation and philosophy. It is still a “wild west” of sorts. Nobody can tell me that I must use an html or body tag within my HTML documents. In fact, there are quite a number of sites out there that use neither, but still deliver HTML content. Web browsers (most of them anyways) will render these pages exactly as if those tags existed. The HTML spec seems to be more suggestion than law.

Here’s the worst kept secret about the web: You don’t have to follow the rules.

So why should you?

This is where philosophy comes into play. You have to want to follow the rules. There needs to be some kind of motivation to follow the rules. User experience is, usually, the goal web developers use as their motivation. I’m going to use those HTML and BODY tags because if I don’t, users that have browser X installed on their system won’t be able to browse my website. There’s the motivation, and I think that’s the most common form of motivation found when constructing HTML documents.

We need to change this philosophy – this reasoning behind the motivation. The web isn’t about the user experience. The web is about information management. If you read through the history of the web, you would see that this is exactly what the web’s founders had planned. It was the advent of the graphical web browser that caused a shift in web development philosophy. It became focused on user experience.

Going Off Track

Picking up where the history lesson left off…

During the mid-1990s the Internet became an entertainment playground. Web browsers were being created to operate outside of the HTML spec. They were using proprietary tags to create a more interactive user experience. The idea of information organization and structure was ignored as webmasters focused on user experience. The largest and longest-running symbol of this abuse is the use of tables to create multi-column layouts.

The table tag was created for the purpose of structuring tabular data within HTML documents. A person or application that is processing an HTML document would expect table elements to contain only tabular data. That is no longer the case. It is similar to an author writing a novel entirely within Microsoft Excel. Can it be done? Absolutely. Should it be done? Absolutely not!
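To make that expectation concrete, here is a minimal Python sketch of a program that trusts table elements to hold tabular data and reads them straight into rows, using only the standard library’s html.parser. The names TableRowParser and table_rows and the sample markup are hypothetical, and the sketch assumes a single table with no nested tables. Pointed at a layout table instead, the “rows” it returns would be navigation links, banners, and body copy rather than data.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of each <td>/<th> cell, grouped by <tr> row.

    Assumes a single table with no nested tables. Cell and row end tags may
    be omitted, as HTML allows; a new cell or row closes the previous one.
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def _close_cell(self):
        if self._cell is not None:
            # Collapse runs of whitespace so source indentation is ignored.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_rows(html):
    """Return the table in `html` as a list of rows of cell text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    sample = """
    <table>
      <tr><th>Quarter</th><th>Visitors</th></tr>
      <tr><td>Q1</td><td>1,200</td></tr>
      <tr><td>Q2</td><td>1,850</td></tr>
    </table>
    """
    print(table_rows(sample))
    # [['Quarter', 'Visitors'], ['Q1', '1,200'], ['Q2', '1,850']]
```

The parser never looks at how the table would be drawn; it only works because it assumes the markup means what the spec says it means.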

Another example is the heading tags. Headings break large amounts of information into small sections. Each heading tells the reader what the following section is about, which makes it easy to scan a large document for specific topics. An application might even generate an abstract of an HTML document based on the order and organization of its headings. That could be quite useful, for example to index the pages of a large website and bring some organization to a mass of information. But many webmasters found alternative uses: headings provided an easy way to increase and decrease font sizes, and whole HTML documents have been written inside heading tags just to change the page’s font size.
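As a rough idea of what such an application could do, the Python sketch below (standard library html.parser only) walks a document and turns its h1 through h6 elements into an indented outline. The names HeadingOutliner and heading_outline and the sample page are hypothetical, and the approach assumes headings are used for structure. On a page written entirely inside heading tags for the sake of font size, the “outline” would simply be the whole page.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingOutliner(HTMLParser):
    """Records (level, text) for every <h1>-<h6> element, in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None  # level of the heading currently open, if any
        self._text = []     # text fragments inside that heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._text = []

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            text = " ".join("".join(self._text).split())
            self.headings.append((self._level, text))
            self._level = None

    def handle_data(self, data):
        # Text inside nested inline tags (e.g. <h2><a>Sales</a></h2>) still counts.
        if self._level is not None:
            self._text.append(data)


def heading_outline(html):
    """Build an indented plain-text outline from heading tags alone."""
    parser = HeadingOutliner()
    parser.feed(html)
    parser.close()
    return "\n".join("  " * (level - 1) + text for level, text in parser.headings)


if __name__ == "__main__":
    page = """
    <h1>Annual Report</h1>
    <p>Welcome to this year's report.</p>
    <h2>Sales</h2>
    <p>Sales details.</p>
    <h3>Europe</h3>
    <h2>Outlook</h2>
    """
    print(heading_outline(page))
    # Annual Report
    #   Sales
    #     Europe
    #   Outlook
```

Nothing here depends on how big or bold the headings are rendered; the outline comes entirely from which heading element the author chose.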

The result is that applications which process information stored on the web, outside of popular graphical web browsers, can no longer rely on document structure (HTML tags and their content) to discern the meaning of an HTML document.

In the late 1990s the web became a collection of PowerPoint presentations. Graphical web browsers made visual display the only real way to access information on the web, and user experience came to mean visual experience. Providing structure and organization to information was ignored. The point of the web’s creation was to manage information, but now we can no longer manage that information by any means other than visual ones.

The web has been broken.

Web Of Information: History

Academic History

Published in the July 1945 issue of The Atlantic Monthly was an article by Vannevar Bush titled “As We May Think”. In it, Bush describes a system, called the Memex, that would help organize large amounts of information and make it accessible. It was the start of an idea where information could cross-reference other material that could be pulled up at a moment’s notice, an idea that would eventually breed what we now know as the World Wide Web.

In the 1960s a Harvard graduate student named Ted Nelson began to develop his own ideas about the organization and flow of information. His work focused on information that was not stored in sequential order, and on letting users take their own path through information rather than reading it linearly. His two books Computer Lib and Dream Machines, which reference each other and are not written in any particular order, readily demonstrate this concept. (You can get a feel for the books and what they looked like here.) By the end of the 1960s he had named his information system Project Xanadu. During his early work on Xanadu, Nelson coined the terms hypertext and hyperlink.

Hypertext refers to information (text) that contains references (called hyperlinks) to other pieces of information that may be brought up quickly by accessing the hyperlink.

In 1980, while working as an independent contractor at CERN, Tim Berners-Lee built ENQUIRE, an information system based on the concept of hypertext. Building upon his ENQUIRE experience, in 1989 Berners-Lee proposed the creation of a new hypertext system, called the WorldWideWeb, which would operate over the global Internet. The development of the web included the creation of the Hypertext Transfer Protocol (HTTP), the HyperText Markup Language (HTML), and the Uniform Resource Locator (URL). HTTP handled communication between server and client, HTML defined the structure of the web’s native document format, and the URL was the mechanism used to reference documents within the web. Berners-Lee’s work on the web was made free for all to use, and in 1994 he founded the World Wide Web Consortium (W3C), which would handle the development and management of the standards used on the web.

In April 1993, Marc Andreessen and Eric Bina at the National Center for Supercomputing Applications (NCSA) released a web browser named Mosaic. It had a clean, graphical interface that set it apart from the few existing browsers, and it was free for non-commercial use. These factors helped make it the most popular browser of its day. Andreessen would go on to co-found Netscape Communications Corporation, which produced the Netscape Navigator web browser, building on Andreessen’s work on Mosaic.

The Growing Web

Up until this point the Internet had been used mostly in academic and scientific arenas. Mosaic offered a new, visual way to access information on the Internet, and it was free for non-commercial use. Access to the web’s vast resources, even at its young age, was now easy and cheap for anyone who wanted it.

Within a couple of years the web had grown to tens of thousands of websites, and Mosaic (and then Netscape) had made the web experience an easy and enjoyable one. People began to see the commercial possibilities, and public access to the Internet became widespread.

Microsoft finally caught up with its Internet Explorer web browser, making it part of its Windows operating system and, more importantly, free for Windows users. By 1995 the web was becoming more easily accessible. The public was catching on and the web began to explode. Everything was bright and shiny and covered in that new-car smell.

Web Of Information: Prologue

I’m rewriting this post.

That probably violates some unwritten blog rule… but for the purposes of what I want to accomplish here, I need some continuity between these posts. Besides, nobody has commented on anything I wrote in the previous version of this, so maybe it doesn’t matter.

This started with a conversation among co-workers about some users we support in the development of their web pages. We see a lot of people who want to use lots of color, several different font types, animated images, background music, and a whole host of other, seemingly needless “enhancements”.

Why are people doing this? We aren’t telling them to do this. We simply provide basic training with a WYSIWYG web page editor. Perhaps the problem is that we aren’t telling them what they should not do.

I made a comment along the lines that people ought to stick to very basic web pages; they need to learn that less is more. Just headings, lists, tables, and paragraphs, and that’s it. I pointed out that we need to teach people why simple designs are better designs, and we need to give reasons why this is true. So I started to think of some reasons. From there, the idea began to snowball into something a bit bigger.

There are a few very good reasons why simple is better. The problem is that the why behind each reason is a bit complex, and nobody has to listen to them. I can tell you that strong should be used rather than b because strong carries meaning beyond how the text it contains should be displayed. But who cares? Why should you care? The b tag gets the job done; it makes text bold. I have to make the argument that you need to think about the information in HTML documents beyond its visual representation. But information on the web (within an HTML document) is probably consumed visually 99% of the time it is accessed. Why should you care beyond the visual representation of your HTML documents? I hear it a lot in my CSS work: “Why create CSS-based layouts when tables get the job done in a far easier, more compatible way?” It’s obvious, practical logic.
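The difference between strong and b only matters to software that reads meaning rather than appearance. As a hypothetical example, here is a small Python sketch, built on the standard library’s html.parser, of a tool that collects the phrases an author marked as important. It honors strong and em and deliberately ignores b and i, since those only describe how text looks; the names EmphasisCollector and emphasized_phrases and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser

# Elements that say "this matters", not just "draw this in bold/italic".
SEMANTIC_EMPHASIS = {"strong", "em"}


class EmphasisCollector(HTMLParser):
    """Collects text marked with <strong>/<em>; <b>/<i> are ignored on purpose."""

    def __init__(self):
        super().__init__()
        self.phrases = []
        self._depth = 0     # how many emphasis elements are currently open
        self._buffer = []   # text of the outermost open emphasis element

    def handle_starttag(self, tag, attrs):
        if tag in SEMANTIC_EMPHASIS:
            if self._depth == 0:
                self._buffer = []
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in SEMANTIC_EMPHASIS and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Nested emphasis is folded into one phrase.
                self.phrases.append(" ".join("".join(self._buffer).split()))

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def emphasized_phrases(html):
    """Return the text of every outermost <strong>/<em> element in `html`."""
    parser = EmphasisCollector()
    parser.feed(html)
    parser.close()
    return parser.phrases


if __name__ == "__main__":
    snippet = (
        "<p>Backups run nightly. <strong>Never unplug the server.</strong> "
        "<b>This is only bold.</b></p>"
    )
    print(emphasized_phrases(snippet))
    # ['Never unplug the server.']
```

Both sentences may look identical in a browser, but only the one in strong survives, because only that markup says anything about meaning.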

It’s a tough argument to make. I think the “obvious, practical” approach is a self-sustaining argument: by focusing on the visual presentation of web pages, people have been trained to think of the web only in visual terms. The idea that HTML documents (should) contain structured information, which can be processed, reshaped, and turned into new forms for whatever purpose a user may have, and the benefits that follow, is lost on a lot of people. So there is little demand for HTML documents with good informational structure rather than visual structure. And without a base of such documents to work on, there is little demand for applications that process information on the web in a non-visual way. Search engines are one example of an application that processes HTML documents by non-visual means, and they’re pretty powerful little tools. But they have to work around poorly structured HTML documents, because that’s the nature of what’s out there on the web. There really isn’t any reason or motivation for web developers to abandon the “obvious, practical” approach, and so it will continue to be the accepted method for developing web pages.

What I want to do over my next few posts is convince readers that “obvious, practical logic” needs to change. I want to explain why you should care about the informational structure of your documents, and why visual information, henceforth referred to as “presentation logic”, absolutely does not belong in HTML documents; that’s what CSS is for. And I want to open your mind to the possibilities and power of “simple” documents.

By now you should see that when I say “simple” web pages, I mean HTML documents free of any embedded presentation logic. “Simple” is not quite the right word, but users new to web development see documents without color or snappy layout as “simple” or “plain”. From a visual standpoint, they are correct. And since I’m targeting a visually oriented audience, I might as well use a word that connects the new approach I want to talk about with the old one.

I’m not entirely sure this post is any more clear than my original.

At some point I need to just move along.