Web Of Information: Philosophy In A Broken Web

Before I continue, let me provide a much belated disclaimer. I am not an expert. I’m just some guy who works with this web stuff for a living. These are just my opinions and thoughts. I welcome discussion on this. In fact, I crave it. This isn’t stuff that I can just turn to a co-worker and say “Hey Bob” and start in on a conversation about information theory and the web. The few of you reading this are probably the only ones I can get any sort of dialog with. Every opinion counts, so please share. Even if it’s to say I’m so off base that I might as well quit my job and become a lumberjack.

And so I move on, with my flannel shirts packed and my saw sharpened.

Philosophy?

Yes, philosophy. The web has become a game of interpretation and philosophy. It is still a “wild west” of sorts. Nobody can tell me that I must use an html or body tag within my HTML documents. In fact, there are quite a number of sites out there that use neither, but still deliver HTML content. Web browsers (most of them anyways) will render these pages exactly as if those tags existed. The HTML spec seems to be more suggestion than law.
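To make that concrete, here is a small sketch using the tolerant parser in Python's standard library as a stand-in for a browser. That stand-in is an assumption on my part: real browsers do much more error recovery than this. The `collect` helper and the two sample documents are made up for illustration. The point is only that the same content comes out whether or not the html and body tags are present.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records every start tag and every non-empty piece of text it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text.append(stripped)


def collect(markup):
    """Hypothetical helper: return (start tags, text chunks) for a document."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.tags, parser.text


# Same content, once without and once with the html/body wrappers.
bare = "<h1>Hello</h1><p>No html or body tag here.</p>"
full = "<html><body><h1>Hello</h1><p>No html or body tag here.</p></body></html>"

if __name__ == "__main__":
    print(collect(bare))
    print(collect(full))
```

The parser never complains about the missing tags, which is roughly how most browsers behave too.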

Here’s the worst kept secret about the web: You don’t have to follow the rules.

So why should you?

This is where philosophy comes into play. You have to want to follow the rules. There needs to be some kind of motivation to follow the rules. User experience is, usually, the goal web developers use as their motivation. I’m going to use those HTML and BODY tags because if I don’t, users that have browser X installed on their system won’t be able to browse my website. There’s the motivation, and I think that’s the most common form of motivation found when constructing HTML documents.

We need to change this philosophy – this reasoning behind the motivation. The web isn’t about the user experience. The web is about information management. If you read through the history of the web, you will see that this is exactly what the web’s founders had planned. It was the advent of the graphical web browser that caused a shift in web development philosophy. It became focused on user experience.

Going Off Track

Picking up where the history lesson left off…

During the mid 1990s the Internet became an entertainment playground. Web browsers were being created to operate outside of the HTML spec. They were using proprietary tags to create a more interactive user experience. The idea of information organization and structure was ignored as webmasters focused on user experience. The largest and longest-running symbol of this abuse is the use of tables to create multi-column layouts.

The table tag was created for the purpose of structuring tabular data within HTML documents. A person or application that is processing an HTML document would expect table elements to contain only tabular data. That is no longer the case. It is similar to an author writing a novel entirely within Microsoft Excel. Can it be done? Absolutely. Should it be done? Absolutely not!

Another example would be the heading tags. Headings help break down large amounts of information into small sections. Each heading provides insight into the subject of the section that follows and makes it very easy to scan a large document for specific topics. An application might be able to generate an abstract of an HTML document based on the order and organization of its headings. This could be quite useful, for example when enumerating the pages of a large web site to create organization out of the mass of information. But many webmasters found alternative uses. Headings provided an easy way to increase and decrease font sizes. Whole HTML documents have been written inside of heading tags just to change the page’s font size.
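Here is a rough sketch of the kind of application I mean: something that builds an outline of a page from nothing but its headings. It uses Python's standard `html.parser`; the names `OutlineBuilder`, `build_outline` and `format_outline` are my own inventions for illustration, not part of any real tool.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineBuilder(HTMLParser):
    """Collects (level, text) pairs for every heading in a document."""

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None   # level of the heading we are inside, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._level = int(tag[1])
            self._buffer = []

    def handle_data(self, data):
        if self._level is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in HEADINGS and self._level is not None:
            # Collapse runs of whitespace inside the heading text.
            text = " ".join("".join(self._buffer).split())
            self.outline.append((self._level, text))
            self._level = None


def build_outline(markup):
    builder = OutlineBuilder()
    builder.feed(markup)
    builder.close()
    return builder.outline


def format_outline(outline):
    """Indent each heading by two spaces per level below h1."""
    return "\n".join("  " * (level - 1) + text for level, text in outline)


if __name__ == "__main__":
    page = (
        "<h1>Web Of Information</h1><p>Intro.</p>"
        "<h2>History</h2><p>Some history.</p>"
        "<h2>Philosophy</h2><p>Some opinions.</p>"
    )
    print(format_outline(build_outline(page)))
```

Feed it a page where the whole body sits inside one heading tag to bump up the font size and the "outline" is just the entire page text in a single entry, which is useless as a summary.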

The result is that applications which process information stored on the web, outside of popular graphical interface web browsers, can no longer rely on document structure (HTML tags and their content) to reliably discern meaning of the HTML document.

In the late 1990s the web became a collection of PowerPoint presentations. Access to information on the web was limited to visual means by graphical web browsers. User experience was based on visual experience. Providing structure and organization to information was ignored. The point of the web’s creation was to manage information. But now we can no longer manage information in any method other than in a visual manner.

The web has been broken.

Web Of Information: History

Academic History

Published in the July 1945 issue of The Atlantic Monthly was an article by Vannevar Bush titled “As We May Think”. In this article, Bush describes a system that would help organize and make accessible large amounts of information. This system was called Memex. It was the start of an idea where information could cross-reference other material which could be pulled up at a moment’s notice. It was the start of an idea that would breed what we now know as the World Wide Web.

In the 1960s a Harvard graduate student named Ted Nelson began to develop his own ideas about the organization and flow of information. His work had a lot to do with engaging information that was not stored in sequential order and the ability of users to take their own path through information rather than in a linear fashion. This concept was readily demonstrated by his two books Computer Lib and Dream Machines which reference each other and are not written in any particular order. (You can get a feel for the books and what they looked like here.) By the end of the 1960s he had named his information system Project Xanadu. During his early work on Xanadu, Nelson coined the terms hypertext and hyperlink.

Hypertext refers to information (text) that contains references (called hyperlinks) to other pieces of information that may be brought up quickly by accessing the hyperlink.

In 1980 Tim Berners-Lee, then an independent contractor at CERN, built an information system based on the concept of hypertext, named ENQUIRE. In 1989, building upon his ENQUIRE experience, Berners-Lee proposed the creation of a new hypertext system that would operate over the global “Internet” network; it came to be called the WorldWideWeb. The development of the web included the creation of the Hypertext Transfer Protocol (HTTP), the HyperText Markup Language (HTML), and the Uniform Resource Locator (URL). HTTP handled communications between server and client, HTML detailed the structure of the native document format for the web, and the URL was the mechanism used to reference documents contained within the web. Berners-Lee’s work for the web was made free for all to use, and in 1994 he founded the World Wide Web Consortium. The W3C would handle the development and management of various standards used in the web.

In April of 1993 Marc Andreessen created a web browser named Mosaic. It had a clean, graphical interface that made it unique among the few existing browsers. It was also released under a unique license that allowed the program to be used for free in non-commercial use. These factors helped make it the most popular browser of its day. Andreessen would go on to co-found Netscape Communications Corporation. NCC produced the Netscape Navigator web browser which was the product of Andreessen’s work on Mosaic.

The Growing Web

Up until this point the Internet had been used mostly in academic and scientific arenas. Mosaic offered a new and visual way to access information on the Internet. It was also free for non-commercial use. Access to the web’s vast resources, even at its young age, was now made easy and cheap for anyone who wanted it.

Within a couple of years the web had grown to tens of thousands of websites and Mosaic (now Netscape) had made the web experience an easy and enjoyable one. People began to see the commercial possibilities and public access to the Internet became widespread.

Microsoft finally caught up with its Internet Explorer web browser, making it part of its Windows operating system and, more importantly, free for Windows users. By 1995 the web was becoming more easily accessible. The public was catching on and the web began to explode. Everything was bright and shiny and covered in that new-car smell.

Web Of Information: Prologue

I’m rewriting this post.

That probably violates some unwritten blog rule… but for the purposes of what I want to accomplish here, I need some continuity between these posts. Besides, nobody has commented on anything I wrote in the previous version of this, so maybe it doesn’t matter.

This started with a conversation among co-workers about some users we support in the development of their web pages. We see a lot of people who want to use lots of color, several different font types, animated images, background music, and a whole host of other, seemingly needless “enhancements”.

Why are people doing this? We aren’t telling them to do this. We simply provide basic training with a WYSIWYG web page editor. Perhaps the problem is that we aren’t telling them what they should not do.

I made a comment along the lines that people ought to stick to very basic web pages; they need to learn less is more. Just headings, lists, tables and paragraphs, and that’s it. I pointed out that we need to teach people why simple designs are better designs and we need to provide reasons why this is true. So I started to think of some reasons. From there, the idea began to snowball into something a bit bigger.

There are a few very good reasons why simple is better. The problem is that the why for each reason is a bit complex, and nobody has to listen to them. I can tell you that strong should be used rather than b because strong provides meaning beyond how the information it contains should be visually represented. But who cares? Why should you care? B gets the job done; it makes text turn bold. I have to make the argument that you need to think about information contained in HTML documents beyond their visual representation. But information on the web (within an HTML document) is probably processed by users in a visual manner 99% of the time it is accessed. Why should you care beyond the visual representation of your HTML documents? I hear it a lot with my work in CSS. “Why create CSS-based layouts when tables will get the job done and in a far easier, more compatible way?” It’s obvious, practical logic.
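For what it's worth, here is a minimal sketch of one place where the strong versus b difference shows up outside of a browser. It imagines a small program that pulls the "important" phrases out of a page. The `important_phrases` function and the sample markup are hypothetical, and treating strong as the only signal of importance is my simplifying assumption.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collects the text found inside each occurrence of one target tag."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.depth = 0      # how many target tags we are currently inside
        self.found = []
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == self.target:
            if self.depth == 0:
                self._buffer = []
            self.depth += 1

    def handle_data(self, data):
        if self.depth > 0:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.target and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.found.append("".join(self._buffer).strip())


def important_phrases(markup):
    """Hypothetical helper: text the author marked as important with strong."""
    collector = TagTextCollector("strong")
    collector.feed(markup)
    collector.close()
    return collector.found


if __name__ == "__main__":
    page = (
        "<p><strong>Warning:</strong> back up your files first.</p>"
        "<p>Check out our <b>NEW</b> logo!</p>"
    )
    print(important_phrases(page))
```

Both phrases look bold on screen, but only the one marked with strong tells a program that the author meant it to matter.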

It’s a tough argument to make. I think the “obvious, practical” approach is a self-sustaining argument. I think that by focusing on the visual presentation of web pages people have become trained to think of the web only in visual terms. The idea that HTML documents (should) contain structured information, which can be processed, shaped, and changed into new forms for whatever purpose a user may have, and the benefits of that, is lost on a lot of people. So the need to create HTML documents that have good informational structure, rather than visual structure, isn’t very big. Without such a base of documents to work upon, the need for applications that process information on the web in a non-visual manner is not very big. Search engines are an example of an application that processes HTML documents outside of a visual means. They’re pretty powerful little tools too. But they work around poorly structured HTML documents – because that’s the nature of what’s out there on the web. There really isn’t any reason or motivation for web developers to stop the “obvious, practical” approach; and so it will continue to be the accepted method for developing web pages.

What I want to do over my next few posts is convince readers that “obvious, practical logic” needs to change. Why you should care about the informational structure of your document. Why visual information, henceforth referred to as “presentation logic”, absolutely does not belong in HTML documents; that’s what CSS is for. And to open your mind to the possibilities and power of “simple” documents.
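As a rough illustration of what I mean by presentation logic living inside an HTML document, here is a sketch of a checker that flags it. The lists of tags and attributes are my own short, incomplete picks, and `find_presentation` is a made-up name; treat this as a sketch under those assumptions and not as a real validator.

```python
from html.parser import HTMLParser

# Assumed, incomplete lists of markup that only describes appearance.
PRESENTATIONAL_TAGS = {"font", "center", "blink", "marquee"}
PRESENTATIONAL_ATTRS = {"bgcolor", "background", "align", "color", "face"}


class PresentationFinder(HTMLParser):
    """Records presentational tags and attributes in document order."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag in PRESENTATIONAL_TAGS:
            self.findings.append(f"<{tag}>")
        for name, _value in attrs:
            if name in PRESENTATIONAL_ATTRS:
                self.findings.append(f"{tag}@{name}")


def find_presentation(markup):
    finder = PresentationFinder()
    finder.feed(markup)
    finder.close()
    return finder.findings


if __name__ == "__main__":
    noisy = (
        '<body bgcolor="yellow"><center>'
        '<font face="Comic Sans MS" color="red">Welcome!</font>'
        "</center></body>"
    )
    clean = '<h1>Welcome!</h1><p class="intro">Plain structure only.</p>'
    print(find_presentation(noisy))
    print(find_presentation(clean))
```

The clean version carries the same information, and everything about how it looks can be handled in a stylesheet that hooks onto the h1 and the intro class.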

By now you should see that when I say “simple” web pages, I’m talking about HTML documents void of any embedded presentation logic. “Simple” is not the right word. When training users new to web development, they see documents without any color or snappy layout as being “simple” or “plain”. From a visual sense, they are correct. And since I’m targeting a visually-oriented user base, I might as well provide some means to connect the new approach I want to talk about with the old one.

I’m not entirely sure this post is any more clear than my original.

At some point I need to just move along.

Back in Beige

The comments submission bit should be fixed. The server underwent some software updates on its OS and things got out of whack for a little bit.

This is what happens when you install OS updates labeled as unstable.

But I’ve kicked things around and downgraded stuff to the stable versions and everything seems happy again.

I’m working on a big post that I hope to get on here today. So check back soon.