Tuesday, March 25, 2008

Getting to Page 1 on Google

"Content is King", they say. So, in the spirit of content, I decided to add our entire help-page system (which is already made up of *.htm pages) to our website.

We would increase our relevant content quite significantly and allow customers to browse our help pages online. A win-win proposition.

In reality, though, this proved to be more work than I initially estimated. Our help pages are built using HTML Help Workshop, and my first task was to clean each individual page of all header and body tags. Each page would simply have to start with an h1 tag.
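The cleanup step could be scripted rather than done by hand. Here is a minimal sketch in Python, assuming each page contains an h1 tag and that everything before it (doctype, head, opening body tag) can be thrown away. The function name `strip_to_content` is made up for illustration; it is not part of HTML Help Workshop or of what I actually ran.

```python
import re


def strip_to_content(html: str) -> str:
    """Return the page content from the first <h1> up to the closing </body> tag.

    Hypothetical helper: assumes the useful content starts at the first <h1>.
    """
    # Locate the first <h1 ...> tag, ignoring case.
    start = re.search(r"<h1\b", html, flags=re.IGNORECASE)
    if start is None:
        raise ValueError("page has no <h1> tag")

    # Cut at </body> if it appears after the <h1>; otherwise keep the rest.
    end = re.search(r"</body\s*>", html, flags=re.IGNORECASE)
    if end is not None and end.start() > start.start():
        stop = end.start()
    else:
        stop = len(html)

    return html[start.start():stop].strip() + "\n"
```

A page that has already been cleaned passes through unchanged apart from whitespace trimming, so the script can be re-run safely on the whole help folder.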

  • This format still works in the HELP file, which is great!
  • It lets us embed the help files directly into ASPX pages with a simple wrapper and an include statement.
Now, any change to a help file can be published on our website just by copying the file into our web directory.
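To give an idea of what such a wrapper might look like, here is a small Python sketch that generates one wrapper page per help file. It assumes the wrapper uses the ASP.NET server-side include directive (`<!-- #include file="..." -->`); the page layout, the function name `make_wrapper` and the file names are all illustrative assumptions, not our actual pages.

```python
import html


def make_wrapper(htm_name: str, title: str) -> str:
    """Build the text of a hypothetical ASPX wrapper that includes one help page."""
    safe_title = html.escape(title)  # keep the title safe inside the markup
    return (
        '<%@ Page Language="C#" %>\n'
        "<html>\n"
        f"<head><title>{safe_title}</title></head>\n"
        "<body>\n"
        # Server-side include: the cleaned help page is pulled in at this point.
        f'<!-- #include file="{htm_name}" -->\n'
        "</body>\n"
        "</html>\n"
    )


# Example: wrapper text for a help page called intro.htm (made-up name).
print(make_wrapper("intro.htm", "Introduction"))
```

Because the wrapper only references the help file, replacing the *.htm file in the web directory is enough to update the page.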

The next problem was that IIS was configured to serve *.htm files as web pages. One quick reconfiguration later, the include works (and we have no other *.htm pages anyway).

My last problem was that each *.htm file includes links to other *.htm files. But each *.htm file is now embedded in an ASPX page, so those links don't work.

Using a URL rewriter, I simply set up a rule to redirect any request for a *.htm file to the corresponding *.aspx page. Nice!! I love URL rewriting!!
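The rule itself is just a pattern match on the requested path. Here is a rough Python sketch of the mapping, assuming the wrapper page has the same name as the help file with the extension swapped. The real rule lives in the URL rewriter's own configuration, and the names `HTM_RULE` and `rewrite` are invented for this example.

```python
import re

# Match any path that ends in .htm (but not .html), ignoring case.
HTM_RULE = re.compile(r"^(?P<path>/.*)\.htm$", re.IGNORECASE)


def rewrite(path: str) -> str:
    """Map a request for a *.htm file to the corresponding *.aspx page."""
    match = HTM_RULE.match(path)
    if match is None:
        return path  # not a help page request, leave it alone
    return match.group("path") + ".aspx"


print(rewrite("/help/intro.htm"))
print(rewrite("/default.aspx"))
```

This only covers the path mapping. Whether the request is then answered with an HTTP redirect or rewritten internally depends on how the rule is configured in the rewriter.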

Google even managed to index the pages, which is great, BUT Google fails miserably when reading the title tag. For some reason each title becomes an "unknown page", most likely due to the redirect.

While it finds the ultimate destination correctly, Google seems to 'use' the 'missing page' it gets from the URL redirect header (or something to that effect), so now I have to track down and resolve that issue.
