Still Alive : Google Mini

Yes, I have played Portal, and I loved it.

Sorry for the absence. There are a few reasons for it. Partly it’s because work is very busy at the moment, but also because there have been plans on the table to switch to a new blog server here on campus. The switch should be happening soon. We’ll be using WordPress MU. We had some problems getting LDAP authentication working on it using this plugin. The problem was that the OpenLDAP libraries PHP was compiled with had trouble talking to Active Directory over SSL. Eventually we discovered that adding TLS_REQCERT never to the ldap.conf file resolved the issue. (The issue had to do with OpenLDAP not knowing the CA of the SSL cert; this setting gets around it by telling the client not to check the server’s certificate at all.) So now that’s been fixed, the switch should be soon. I’ve already tested importing from this blog into WPMU and it seems to work pretty well.

In other news, I got my hands on a Google Mini. This is basically a miniature Google search engine of your own. The benefit is that you have greater control over how the search operates. It’s not a bad little machine for the price (a couple grand), but there are a few drawbacks. For example, you can’t actually open this thing up without a bit of effort. There doesn’t appear to be any FTP or SSH access to the box either. Short of ripping the thing open with the jaws of life, you can’t see or touch the contents of the hard drive.

This is especially annoying if you’re looking to style the search engine’s interface to match your website. Files such as logos, external stylesheets, and supplemental images for the layout must be stored on a separate server, which you then link to.

However, you can customize the interface any way you like. You’re essentially given an XSLT stylesheet to play with. By default you never see this XSLT; instead you get a list of about 10 to 20 properties (font colors and sizes, and a URL to your logo). These values are then put into the XSLT and away you go.

However, there is an advanced mode, and I like advanced modes. Here you get the raw XSLT to play with any way you want.

But be careful! The XSLT is filled with comments warning that you shouldn’t really be touching any of the code, and that failure to heed this warning will result in Google not providing any assistance once you’ve royally screwed up your interface. (Luckily there’s a button that resets the XSLT to its default, so you’re never completely screwed.)

Having virtually no XSLT experience, I dived in without care or worry. Within a few minutes of exploring the default XSLT I felt at home. The syntax looks a bit bloated but it’s quite easy to follow. The only real trouble I had was all the extra hoops you have to jump through to add custom HTML to the thing.

Because an XSLT stylesheet must be well-formed XML, every opening tag must have a closing tag, and that closing tag must sit inside the same parent element as its opening tag. Doesn’t sound like a problem, right?

Well, the way our template works is that you have a top half and a bottom half, and the content of the page itself goes in the middle. In the interests of time, and not wanting to rewrite the entire XSLT from scratch, I decided to create two XSL templates, one for the top half and one for the bottom half of the page. I would then call each template in place of the equivalent Google template.

However, this is not well-formed at all! You see, I have one template called “page_header” inside which I have an opening BODY tag but no closing BODY tag. The closing BODY tag is in another template called “page_footer”. This makes the XSLT parser very angry.
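To make the problem concrete, here is a minimal sketch in Python (nothing the Mini itself runs) that feeds a cut-down stylesheet to an ordinary XML parser. The template bodies are made-up stand-ins for the real header markup, and is_well_formed is a hypothetical helper name. Any XML parser should reject the split version for the same reason the XSLT parser does.

```python
import xml.etree.ElementTree as ET

XSL_NS = 'xmlns:xsl="http://www.w3.org/1999/XSL/Transform"'

# Hypothetical header template: <body> is opened here but never closed,
# because the closing tag lives in a separate "page_footer" template.
split_header = f"""
<xsl:stylesheet version="1.0" {XSL_NS}>
  <xsl:template name="page_header">
    <body>
      <div id="banner">My site</div>
  </xsl:template>
</xsl:stylesheet>
"""

# Same template with the element closed inside the template that opened it.
balanced_header = f"""
<xsl:stylesheet version="1.0" {XSL_NS}>
  <xsl:template name="page_header">
    <body>
      <div id="banner">My site</div>
    </body>
  </xsl:template>
</xsl:stylesheet>
"""


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(split_header))     # False
print(is_well_formed(balanced_header))  # True
```

The parser gives up as soon as it meets the closing xsl:template tag while BODY is still open, long before any transformation would run.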

The quick solution is to simply stick each half of the template in its own XSL:TEXT element. All content within this element is treated as text rather than markup, so the XSLT parser doesn’t care if this content is well-formed or not. However, all of this content needs to be escaped as well. This means every < becomes &lt; and every > becomes &gt; and so on. Quite annoying, especially if you expect to make lots of changes to the HTML along the way.
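Escaping by hand gets old quickly, so one option is to let a small script do it. The sketch below is only an illustration: wrap_fragment is a hypothetical helper, the HTML fragments are placeholders, and the disable-output-escaping="yes" attribute is an assumption on my part (it is standard XSLT 1.0, and without it a conforming processor would normally write the entities back out as literal &lt; and &gt; instead of real tags).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def wrap_fragment(name: str, html_fragment: str) -> str:
    """Wrap an unbalanced HTML fragment in a named template as escaped text.

    escape() turns &, < and > into entities, so the stylesheet stays
    well-formed even though the fragment on its own is not.
    """
    return (
        f'<xsl:template name="{name}">'
        f'<xsl:text disable-output-escaping="yes">'
        f"{escape(html_fragment)}"
        f"</xsl:text>"
        f"</xsl:template>"
    )


# Placeholder halves of a page layout: BODY opens in one, closes in the other.
header = wrap_fragment("page_header", '<body>\n<div id="banner">My site</div>')
footer = wrap_fragment("page_footer", "</body>")

stylesheet = (
    '<xsl:stylesheet version="1.0" '
    'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
    + header
    + footer
    + "</xsl:stylesheet>"
)

# Raises xml.etree.ElementTree.ParseError if the result is not well-formed.
ET.fromstring(stylesheet)
print(header)
```

Keeping the unescaped HTML in ordinary files and regenerating the two templates whenever it changes means the layout can still be edited as plain HTML.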

But by far the biggest pain in the ass of the whole thing is Google’s antiquated HTML. Strewn throughout the 3000 or so lines of XSL are things like FONT tags and TABLE tags with BGCOLOR, COLOR, and ALIGN attributes. Google explains the reasoning for this in their documentation: it mostly has to do with compatibility with older devices. They want everyone to see basically the same interface regardless of their browser.

However, in 2008 I think it’s time to ditch some of this. Focus on using stylesheets all the way, and if a browser can’t use the stylesheets, it gets a plain page. That’s part of the point of CSS: by separating presentation from data, browsers that don’t understand the presentation can still use the data. Sure, pages will look plain and have a certain 1994 web quality to them, but anyone still using Netscape 3 or IE 4 is certainly used to it by now.

And the page remains usable to all browsers, which is the most important part here, especially for a search engine.

I do plan on eventually replacing all the FONT, B, and I tags with SPAN, STRONG, and EM tags, but the task is quite large, and right now my Google Mini has an HTML 4.01 Transitional DOCTYPE, so the FONT tags aren’t really hurting anyone.
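For the mechanical part of that job, a rough sketch like the one below could take care of the B and I tags. It is an assumption-laden illustration, not a tested migration tool: modernize_tags and TAG_MAP are hypothetical names, it works on the stylesheet as plain text, and it leaves FONT alone because turning FONT into SPAN also means rewriting its attributes into CSS.

```python
import re

# Presentational tag -> semantic replacement. FONT is deliberately left out.
TAG_MAP = {"b": "strong", "i": "em"}

# Matches "<b", "</b", "<i" or "</i" only when the tag name is followed by
# whitespace, "/" or ">", so tags such as <body> or <img> are not touched.
_TAG_RE = re.compile(r"<(/?)(b|i)(?=[\s/>])", re.IGNORECASE)


def modernize_tags(markup: str) -> str:
    """Swap B/I tags for STRONG/EM, keeping any attributes as they are."""
    return _TAG_RE.sub(
        lambda m: "<" + m.group(1) + TAG_MAP[m.group(2).lower()],
        markup,
    )


print(modernize_tags("<body><b>Results</b> for <i>query</i></body>"))
# <body><strong>Results</strong> for <em>query</em></body>
```

A plain-text pass like this would also rewrite matching tags inside comments or escaped text blocks, so the result deserves a diff review before it goes back into the Mini.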

Still, despite all these weak areas, I don’t hate the Mini. Having that Google engine under the hood is nice, and the installation and setup were fairly painless. I’m sure that for virtually nothing I could have put together an open source search engine on second-hand hardware running Linux or BSD, which would afford me much greater access and control, but I didn’t have the time or resources to spend on researching and installing such a system. I needed something quick, easy, and relatively cheap, and that’s the Mini.


2 thoughts on “Still Alive : Google Mini”

  1. Hi – nice to see you really are back. I have visited your site frequently over the past few years and have learned a lot about new standards from you. Thanks!

  2. Hahahaha! This was a delightful article, and I have experienced all these same obstacles. Diving into the XSLT without knowledge, care, or concern and cringing over all the bgcolor, color, and align attributes. Phew! The one thing you mentioned that I am currently trying to conquer is the DOCTYPE malfunction. Our Mini is causing IE7 to render in Quirks IE5, which ultimately makes our website’s default IE7 stylesheet useless.

    Do you have any brief advice on how I could change the DOCTYPE so that IE7 doesn’t have an identity crisis?
