Nvu

This shows how “in touch” I am. This gem of free software totally slipped under my radar.

Nvu is a WYSIWYG(ish) web site editor, very similar to Dreamweaver. Nvu is released under “the MPL/LGPL/GPL tri-license” which means you can download, muck with it, distribute it, all for free.

What’s really nice is the underlying rendering engine is based on Mozilla Composer. So the rendering of the page is rock solid.

I’ve received a ton of e-mails in the past about how the Skidoo layouts break in Dreamweaver. Well they don’t break in Nvu.

The only thing I’ve found in my testing thus far is my half-finished drop-down menus don’t render in Nvu the way they do in FireFox. But when compared to any other editor out there (DW/FP) Nvu blows them all away.

So check this out. Use it. Love it. Force all your friends and enemies to use it.

But seriously, go download Nvu and give it a try.

Got Alt?

I’ve been going back and forth a little bit with Ian Hixie, who is the man behind the alt text spec among many other things.

1. He is not dumb.
2. He will play a significant role in how CSS and the web in general develop over the next few years.
3. 1 + 2 = a good thing for all of us

So you may want to at least bookmark his blog.

He brought up an obvious point that I’ve completely missed (which is not unusual for me). That has sparked the following rant.

Alt text serves a key purpose. It’s not just some words that pop up as a tooltip over an image if you leave your cursor there too long. Nor is it just some quick means to identify the contents of an image for text-based browsers or screen readers.

Alt text is one of the many underappreciated keys in HTML that allows an HTML document to be self-contained and self-sustaining long after all the external objects the HTML document references are gone.

What do I mean by that?

Ian’s example was Google cache. My example would be the Internet Archive. Long after you stop developing content, the HTML documents you create will still be around.. somewhere. And not everything that you have tied into that document will be with it. Images, stylesheets, javascript… all these resources external to the HTML document will be out of reach.

When all that is left is your HTML, will your page retain all the information and meaning it once presented when all those resources were still available?

It should!

Here’s the scenario: hundreds of years from now a digital archeologist is combing through a bunch of ancient hard drives recently recovered from the ruins of a long lost data center. A reconstruction team is able to recover data off these terribly large, terribly small capacity disks. That data are your HTML documents. Sadly, the images folder is too corrupted for repair, and only the HTML files themselves are left behind. The documents talk of great accomplishments and amazing and unexpected results that will set the company in a new and exciting direction. But the exact nature of what is being discussed is within those images. The context of the words is lost because the images containing the key information are gone, and there is nothing that describes what was in those images. Sadly, the archeologists put the data into storage, in the basement of some digital museum, and the world will never learn about how your dazzling new CSS-based layout helped increase ease-of-use and, in turn, profits, earning you employee of the year.

Very sad.

If only you had included descriptions of each image within the alt text… the archeologists would go on the news about their amazing find, about how you were the digital god of the early computing age. Several statues would be erected in your honor, and 3 elementary schools (and 1 pre-school) would be named after you.

But that is not to be.

Now this bit of story is fun, but it makes the assumption that you care about how your information is understood hundreds of years from now.

Alright. Let’s talk about right now.

You’ve just developed your amazing web site design. It blows the world away. It’s so hot even Slashdot links to your company. But.. uhoh.. your company has just been hit by the Slashdot Effect. Your image server is dead. Your document server is slowly spitting out HTML. People are able to somehow get to your site, but those images won’t be loading anytime soon. So what are your potential fans (and customers) going to be able to see? No graphs. No screenshots. If there’s no alt text, there’s nothing to ground the surrounding text. People will dismiss your content as useless, make up their own minds and fill in very large blanks with their own misunderstanding of what you’re trying to say.

There goes that employee of the year award. The statues. The schools… well, you get the idea.

The point is this: clients will not always have access to the external resources your document references.

Images, javascript and the like are all superfluous to an HTML document (web page). They are there to better aid the client in interpreting the data (usually the humans with decent eyesight using a modern web browser type of client) but should not be the only means to access that information.

So what about stuff that’s image-centric, such as color blindness tests, or “magic-eye” images or examples of optical illusions? Well you can use text to describe each one of those. That’s your alt text. Will the client be able to get the same use out of the page if it can’t “see” the images? No.. but there’s a difference between use and information.

I may not be able to see the color-blind test image, but reading a description along the lines of “an area filled with different colored dots organized in a pattern such that people who suffer from type X color-blindness will see the number 7 and others the number 8” is sufficient enough information for non-visual users to act upon.

Bah.

Running out of brain juice again. Please keep in mind all of these random posts of mine are pure stream-of-consciousness (thus the poor organization, spelling, grammar, etc…)

It’s about information. Screw presentation. Hah!

Alt Text

This past week has been an interesting adventure on the always exciting topic of alternate text within the scope of HTML.

Last week a co-worker of mine, Steve, put together an extension for FireFox that would display an icon in place of any broken or missing images. He created this extension because FireFox wasn’t doing anything to show that a missing or broken image existed in a given document. When you oversee several hundred (if not thousand) HTML documents, it’s nice to have some kind of visual cue that something is wrong in a page.

But that’s a bit odd. No broken image icon?

So I put together a quick test case and sure enough, no broken image icon once the page was loaded.

On a hunch, I deleted the DTD and tried again. This time the broken image icon is there. Why? Because the browser is in quirks mode.

So in standards compliance mode, Mozilla was not displaying a broken image icon.

So I headed over to bugzilla, did a quick search, and didn’t find anything related to this bug. So, thinking I was hot stuff, submitted a bug report. As has been the case with all previous bug reports of mine, someone posts a response in a day or two proving my report to be a duplicate of something else.

This case was not entirely different.

The bug I eventually marked my own report as a dup of was bug #41924. This bug dates back quite a ways, to the year 2000. It starts out trying to tie down the spec for how Mozilla should handle alt text and broken/missing images. The current spec may be viewed at http://www.hixie.ch/specs/alttext. There is an accompanying advocacy document that is also worth reading.

The gist of the arguments presented across the bug report comments and the alt text spec are as follows.

Quirks mode is there to deliver what users expect from old, 1990s browsers. To be in line with those expectations, a broken image icon will be displayed when an image is missing or broken.

Standards mode is about how to apply various W3C docs such as the UAAG, WCAG and the HTML 4.01 spec to rendering a web page. The key idea here is that the alternate text attribute for images should provide enough information to be an adequate replacement of the image itself. If the image is broken, the alt text is displayed in its place. This substitution should not alter the information delivered by the document, only the representation of that information. If an alt attribute is empty, it is assumed there is no information in the image, and so broken images with empty or missing alt text can be simply replaced with an empty inline element (essentially removed from the document). Since there is no information loss, it is reasonable for a user agent to do this.

But I’ve still got a problem. I’m right there with Steve, having to help manage thousands of HTML docs with over a hundred separate content providers. It is not a simple task to try and educate these users on what feels a lot like theoretical HTML design. It’s not easy trying to reeducate users to think of the web and HTML pages in terms of information and how that information is interpreted. It requires a certain perspective and motivation that is really very difficult to have unless doing web development is your primary job. So my problem is that the authors of these thousands of documents don’t care enough to insert ALT text. Since that’s the case, if they develop a page with a broken image, I’m not going to see it nor be able to fix it if I’m using FireFox.
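One way to get a handle on this without relying on the browser at all would be to scan the documents for images with no alt text. What follows is only a rough sketch in Python, not anything Steve or I actually run. The names AltAudit and audit are made up for the example, and it treats a missing alt attribute and an empty one as two separate lists, since the spec treats an empty alt as "no information here" on purpose.

```python
from html.parser import HTMLParser


class AltAudit(HTMLParser):
    """Collects the src of every <img> whose alt is missing or empty."""

    def __init__(self):
        super().__init__()
        self.missing = []  # <img> tags with no alt attribute at all
        self.empty = []    # <img> tags with alt="" (or only whitespace)

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        src = attrs.get("src", "(no src)")
        if "alt" not in attrs:
            self.missing.append(src)
        elif not (attrs["alt"] or "").strip():
            # A valueless attribute comes through as None, hence the `or ""`.
            self.empty.append(src)


def audit(html_text):
    """Return (missing, empty) lists of image sources for one document."""
    parser = AltAudit()
    parser.feed(html_text)
    parser.close()
    return parser.missing, parser.empty


if __name__ == "__main__":
    sample = (
        '<p><img src="a.png">'
        '<img src="b.png" alt="">'
        '<img src="c.png" alt="Sales chart for 2004"></p>'
    )
    missing, empty = audit(sample)
    print("no alt attribute:", missing)
    print("empty alt:", empty)
```

This only tells you where alt text is absent. It can't tell you whether the image itself is broken or whether the alt text that is there is any good.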

And here’s the sweet irony of my predicament. The template system into which user content is published was developed by me. This template system creates valid HTML 4.01 documents, so most of the pages I’m viewing are being seen through standards compliance mode in FireFox. The fix for my situation, however, may be quite easy. Since I created the template system, I can change it. I can make it so that on our development box, that template isn’t 4.01 valid and triggers quirks mode. The only gotcha is to make sure users still develop in standards compliance mode, otherwise the switch from the development to production box will also mean a switch in browser rendering mode, meaning the user might see an entirely different page in production than on the development box.

Why does crap like this have to be so complicated?

Which is why I start to look back at the extension and think that’s probably the best way to go for now.

It is worth noting that FireFox does not yet meet its own alt text spec. Bug 180622, reported by the man behind the alt text spec, notes that FireFox does not precede alt text with an icon to indicate that it is in fact alt text, as the spec states.

Bah. Running out of direction for this post, so that’s where I’ll leave it.

CF: cflock

CFLOCK is used whenever there’s a chance that more than one process will try to manipulate a given resource at the same exact time. You might use cflock with a database so that if a read operation occurs during a write, the read operation will have to wait for the write to finish. That way the read operation will have the most up-to-date information possible. Normally you don’t need to worry about stuff like this. I use this in a room selection application (where students can pick the room they live in for the following year). I cflock an entire block of logic that first checks to see if the room being selected is available and, if it is, records the room selection. This way I don’t risk someone else selecting the room between the read to check for availability and the write to record the selection. If I didn’t do that, there’s a chance rooms could become double-booked.

When working with a file that is used in an application, you almost always use cflock. Now I don’t mean you cflock whenever someone is uploading a file or you’re going to read from some random htm file (like a template system) because your application isn’t going to be altering the file’s contents (more than once, in the case of writing an uploaded file). But if you’ve got a file your application is going to read and write to throughout the application’s life, you need to protect against the chance two writes will occur at the same time. For example, in a user account claim application I have to write usernames to two separate files. One gets picked up by a process that creates e-mail accounts, another that creates LMS accounts. There’s a chance two people will try to claim an account at the same time. So I’ll cflock the file during the write, so no other CF process tries to write to the file at the same time. If that happened, you can wind up with a corrupt or empty file. Guestbooks, blog comments, etc are examples of applications where you’d cflock file access (assuming you’re using files and not a database).
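To make the file case a bit more concrete, here is a minimal sketch of the same idea in Python rather than CF. It assumes a lock looked up by name (loosely like the name attribute on cflock) and a made-up append_username helper; it is not the code from my account claim application. Note that, like cflock, this only protects against other requests inside the same server process, not against some outside program touching the file.

```python
import threading
from collections import defaultdict

# One lock per name, created on first use. The guard keeps two threads
# from creating the same named lock at the same moment.
_named_locks = defaultdict(threading.Lock)
_registry_guard = threading.Lock()


def named_lock(name):
    """Return the lock registered under `name` (hypothetical helper)."""
    with _registry_guard:
        return _named_locks[name]


def append_username(path, username):
    """Append one username to the file, one writer at a time."""
    with named_lock(path):
        with open(path, "a", encoding="utf-8") as f:
            f.write(username + "\n")
```

Two requests that call append_username on the same path at the same moment will simply take turns, so each username lands on its own line.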

Now why CFLOCK session variables?

Because you can’t assume a user will only make one connection at a time. When a user requests a page, the web browser begins to make several simultaneous connections to download the images, CSS files, javascript, etc.. on top of the HTML for the page. If you’ve got one or more of those files set up as a CFM (a dynamic image, dynamic stylesheet, who knows what) your application.cfm will run as well. If you’ve got logic in your application.cfm that manipulates session variables, you run the risk of having your session variables being changed in mid-process, creating either corrupted data (less likely) or incorrect data being acted upon (more likely).

For example, let’s say you’ve got a voting system. The voting system uses a session variable to set whether or not you’ve voted. A person logs into this system, makes their vote selection, and double-clicks the “vote” button. Your CF server now has 2 separate vote processes from the same person that it will process. The application logic is:

1. Check if allowed to vote
2. Record vote
3. Flag user as having voted

Step 2 is a database operation. In computer time, that step is going to take forever to process. While process 1 is working on step 2, process 2 comes along and passes the check in step 1, and gets in line for step 2. Process 1 moves on to step three and records the user has voted, but only after process 2 has started recording the vote for a second time.

Your user has now voted twice because of a race condition with session variables.

To fix this process you can do a couple things. You could put all three steps inside a single cflock block. Or you could swap steps 2 and 3, put the check and the flag (the new steps 1 & 2) inside the lock, then do your database operation outside it. The latter option frees the lock sooner, which helps keep resource use to a minimum and is a potential speed increase, but you might lose votes if the database operation fails after the flag is set. You could wrap the database operation in a cftry/cfcatch block and reset the flag if needed, but now you’re getting overly complicated in a system where just wrapping the 3 steps in a cflock works fine.
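If you want to watch the double-vote happen, here is a small simulation in Python (threads standing in for CF requests, a dict standing in for the session scope, a list standing in for the votes table, and a short sleep standing in for the slow database write). All of the names are invented for the example. The lock plays the part of wrapping all three steps in one cflock.

```python
import contextlib
import threading
import time


def run_votes(use_lock, clicks=2):
    """Simulate `clicks` simultaneous vote requests from one user.

    Returns how many votes ended up recorded.
    """
    session = {"has_voted": False}  # stands in for the session scope
    votes = []                      # stands in for the votes table
    lock = threading.Lock()         # stands in for the cflock

    def handle_request():
        guard = lock if use_lock else contextlib.nullcontext()
        with guard:
            if not session["has_voted"]:     # step 1: allowed to vote?
                time.sleep(0.05)             # step 2: slow database write...
                votes.append("vote")         # ...records the vote
                session["has_voted"] = True  # step 3: flag user as voted

    threads = [threading.Thread(target=handle_request) for _ in range(clicks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(votes)


if __name__ == "__main__":
    print("without lock:", run_votes(use_lock=False))
    print("with lock:   ", run_votes(use_lock=True))
```

With the lock the count is always 1. Without it you will usually see 2, because the second request passes the check while the first is still stuck in the "database" step. It is a timing problem, so the unlocked result is not guaranteed on any given run.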

So why not wrap every page in a cflock?

Because you will have pages that take a few seconds to process. If a user double-clicks a button, like in the example above, they will have to wait twice as long for the results to be displayed. If they get bored/angry at the wait, they might press that button 10 or 20 more times thinking it’ll go quicker, when in reality it’s only slowing things down. At that point, you’ve got 20+ processes waiting for that lock to open up. CFMX recommends your number of simultaneous processes allowed in CF be 3 or 4 times the number of CPUs in the machine. I’ll tell you that we have ours set to 12. (3 * 4 (2 P4s, each of which act like 2 separate processors)). So at 20+ processes, each waiting for that lock, each on the stack of running processes, your entire site (or your CF applications at least) grinds to a halt. Now every user (not just the one) starts clicking on that button to speed things up. You eventually wind up with a really nasty situation where you’ve got hundreds (even thousands) of processes in the queue waiting to be processed by CF. Your site becomes unusable for minutes, maybe even hours.

That’s why you need to be very very efficient in your use of cflock. They can be a source of severe bottlenecking. I have a cf_sleep custom tag that gets CF to hang for a few seconds. I don’t use it much (if ever) but the way it works is by nesting cflocks on the same resource. Create a page with a 20 second sleep, reload it 20 times, and you’ll shut down the site for 20+ seconds. Very nasty.

If you can, set yourself up with a performance monitor (this is a Windows thing. start->run->perfmon) set on your CF server and you can see this in action yourself. (Assuming you’re set up, monitoring all the CF related monitors.) Turn on highlighting in your performance monitor (CTRL+H or click the lightbulb icon in the top toolbar). Then select the “running requests” monitor from the list in the bottom section of the performance monitor window. The highlighted (white) line you see shows you how many current requests are being processed.

Create a script with a 20 second sleep in it. Load the page then check your performance monitor and you’ll see that there is 1 process running. Now hold down CTRL+R in your browser for a few seconds. You’ll get maybe a couple hundred of these processes going. Now check out the performance monitor. The running requests will max out at 12 (which is what I set it to as mentioned earlier). Now check out the “queued requests” monitor and see how that spikes up.

The site is essentially useless while you wait for those processes to finish. All this because of 1 user holding down CTRL+R in a browser for a few seconds. That’s the downside of cflock (and any slow ColdFusion page). Try changing the page so the sleep is only a second and do it again. The server takes a little longer to go unresponsive, and it recovers more quickly. But you start to see how CF can be exploited.
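Some back-of-the-envelope numbers help here. The sketch below is a deliberately crude model in Python: it assumes all the requests arrive at once, every request takes exactly the sleep time, and the server works through them in full batches of 12 (my simultaneous request setting). The figure of 200 requests is just a stand-in for "a couple hundred", and drain_time_seconds is a made-up name.

```python
import math


def drain_time_seconds(requests, page_seconds, max_running=12):
    """Rough time to clear a burst of identical requests.

    Assumes the server runs `max_running` requests at a time and each
    one takes `page_seconds`, so the burst clears in whole batches.
    """
    batches = math.ceil(requests / max_running)
    return batches * page_seconds


if __name__ == "__main__":
    print(drain_time_seconds(200, 20))  # 340 seconds, 17 batches of 20s
    print(drain_time_seconds(200, 1))   # 17 seconds with a 1s sleep
```

So under these assumptions a few seconds of holding CTRL+R ties up the server for well over five minutes with the 20 second page, and only about 17 seconds with the 1 second page, which matches the "recovers more quickly" behavior. If the requests were all waiting on the same lock they would run one at a time instead, and the numbers get much worse.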

There’s a configuration option to kill any process that runs over X seconds long in the administrator interface. That offers some protection from prolonged denial-of-service attacks, but not much. (I typically set it to 30 seconds, but 5-10 seconds might be better for most people.)

Increase the number of simultaneous requests? That only prolongs the inevitable and when you hit that max, it takes much longer to recover because your server is doing a lot more than it can handle.

So be careful of bottlenecks in your CF code, CFLOCK being potentially one of the biggest in your application.

CF: Session Hijacking

ColdFusion uses two unique values to keep track of user session information. These values are CFID and CFTOKEN. They are stored as cookies but can also be passed along the URL and inside POST data.

Session variables are a place to store information specific to the user and to the current session (such as whether or not a user is logged in).

It is possible to hijack a user’s session by supplying the correct CFID and CFTOKEN values to the server, either on the URL, or wherever else you want.

The two numbers combined represent a space of 10^15 numbers. Average brute force will take half that amount, so 10^15/2. Figure 100 attempts per second, and the average time it would take to brute force is in the neighborhood of 150,000 years.
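For anyone who wants to check the math, here it is spelled out in Python. The only inputs are the ones above (a space of 10^15 combinations, half of it searched on average, 100 attempts per second); the year length of 365.25 days is my own choice.

```python
SPACE = 10**15                      # possible CFID/CFTOKEN combinations
ATTEMPTS_PER_SECOND = 100           # assumed attack rate
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def average_brute_force_years(space=SPACE, rate=ATTEMPTS_PER_SECOND):
    """Average time to hit a valid pair, searching half the space."""
    average_attempts = space / 2
    seconds = average_attempts / rate
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(round(average_brute_force_years()), "years")
```

That comes out a bit over 158,000 years, so "in the neighborhood of 150,000" holds up. An attacker managing 10,000 attempts per second would cut that by a factor of 100, which is still a very long time.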

There’s the 1 in a jillion chance someone might guess a correct CFID and CFTOKEN, but that doesn’t really worry me much.

You’re more likely to see someone hack your application by doing a little packet sniffing (or looking over someone’s shoulder) and capturing the CFID and CFTOKEN that way.

Packet sniffing you can curb by going over SSL with your application. Over-the-shoulder attacks can be stopped by not passing the CFID and CFTOKEN values on the URL (which CF does with cflocation tag by default… go figure).

If the user has a virus on their machine passing out their cookies, well that user has a bigger problem than having their session hijacked.

So how do you protect against session hijacking? You store the IP address as a session variable. Compare the IP in the session variable to the user’s IP (stored in cgi.remote_addr) and if they don’t match, you’ve got a hijacking attempt.

… But there’s a catch.

AOL, for example, uses a proxy server for their packaged browser. This means everyone comes from the same IP address. Not cool. Now AOL users can simply go into their browser settings and kill the proxy config and they’ll surf using their own IP, but can you really ask users to do that for every little application we have using session variables?

Also AOL users won’t be the only ones behind a proxy server.

And if you have session timeouts set to days, dial-up users and any other user on a network with shared IP addresses will eventually get an IP address of a former user. And they might be able to get into the application that way.

So what can you do? Not much. You won’t ever be 100% certain in your security. It’s all about managing risk. In this case, you’re at a fairly low risk with hijacking if you’re comparing IP addresses.

But here’s what I do to take it 1 step further.

Combine the user’s IP address and browser string (cgi.http_user_agent) into a single string. Then MD5 hash the thing. Store that hash as a session variable. Recalculate and compare hashes as the first step in any user request (in other words: put this logic in your application.cfm file, and put it up at the top). And that should protect you well enough. The browser string provides a little extra security in the event of proxy users hitting the site.
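Here is roughly what that check looks like, sketched in Python instead of CF so it can be run anywhere. The session is just a dict, the function names and the "|" separator between the two values are made up for the example, and MD5 is used only because that is what I described above.

```python
import hashlib


def fingerprint(remote_addr, user_agent):
    """MD5 hash of the IP address and browser string combined."""
    raw = f"{remote_addr}|{user_agent}"  # separator is an arbitrary choice
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def check_request(session, remote_addr, user_agent):
    """Return True if this request matches the session's fingerprint.

    The first request on a session stores the fingerprint; every later
    request recalculates it and compares.
    """
    current = fingerprint(remote_addr, user_agent)
    stored = session.setdefault("fingerprint", current)
    return stored == current


if __name__ == "__main__":
    session = {}
    print(check_request(session, "10.0.0.5", "Mozilla/5.0 (Windows)"))  # first hit
    print(check_request(session, "10.0.0.5", "Mozilla/5.0 (Windows)"))  # same user
    print(check_request(session, "10.0.0.9", "Mozilla/5.0 (Windows)"))  # different IP
```

The first two calls print True and the third prints False. In CF the two inputs would be cgi.remote_addr and cgi.http_user_agent, and the check would sit at the top of application.cfm. The hash isn't adding secrecy here; it just gives you one short value to store and compare.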

Also keep your session timeouts to a low value (30 minutes.. 2 hours MAX, unless security isn’t a big issue for your application).

When you detect a hijack attempt, you might not want to kill the session because the legit user also gets locked out. You can reset CFID and CFTOKEN on the user with the CFCOOKIE tag then redirect the user to the entrance page.

ColdFusion

I’ve written some long-winded e-mails about CF development recently and, rather than see them go to waste, I figured I would put them here and have my blog serve not only CSS matters, but CF as well. I’ll put these entries into their own category for those who care to follow.