Hpricot is a funky Ruby library for parsing and manipulating website HTML. Because it is so easy to use, it makes quick prototyping and testing of websites a breeze.
We use a number of tools for managing our customers' SEO in partnership with Essential Marketer.
Here I’m going to show you how you can make use of Hpricot to check your site for a range of common SEO-related points.
First Off: Get the Document
Before you can check the HTML you need to get the document. Here’s how in Hpricot:
    require 'rubygems'
    require 'hpricot'
    require 'open-uri' # Makes opening your web pages nice and easy

    # On Ruby 3.0 and later, use URI.open here instead of open
    doc = Hpricot(open('http://www.example.com'))

    # We're going to define a function here that will come in useful throughout this example.
    # It simply splits a string into its component words and returns the count.
    def word_count(str)
      str.split.length # split with no argument breaks on any run of whitespace
    end
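How the string is split matters once real-world text with doubled spaces or tabs turns up. The sketch below compares two strategies; the names `count_by_single_space` and `count_by_whitespace` are made up for this comparison and are not part of Hpricot.

```ruby
# Splits on every single space character, so a doubled space
# produces an empty string that gets counted as a word.
def count_by_single_space(str)
  str.split(/ /).length
end

# Splits on any run of whitespace (spaces, tabs, newlines).
def count_by_whitespace(str)
  str.split.length
end

sample = "Cheap  widgets online" # note the doubled space
puts count_by_single_space(sample) # prints 4
puts count_by_whitespace(sample)   # prints 3
```

Splitting on runs of whitespace gives the count a reader would expect, which is why it is the safer choice for a helper like this.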
Checking Meta Tags
    # First you need to check there are keywords
    if doc.at('meta[@name="keywords"]')
      keywords = doc.at('meta[@name="keywords"]')['content']
      keyword_count = word_count(keywords)
      if keyword_count < 4 || keyword_count > 10
        puts '[PROBLEM] Keyword Count'
      end
    else
      puts '[PROBLEM] No Keywords'
    end
To check your meta description you’d simply change the above code to:
    # First you need to check there is a description
    if doc.at('meta[@name="description"]')
      description = doc.at('meta[@name="description"]')['content']
      description_count = word_count(description)
      if description_count < 4 || description_count > 10
        puts '[PROBLEM] Description Count'
      end
    else
      puts '[PROBLEM] No Description'
    end
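One thing to keep in mind with the keywords check: keywords are usually written as a comma-separated list, so counting words and counting keyword phrases can give different answers. As a hedged sketch, a hypothetical `keyword_phrases` helper (not part of Hpricot, and not the code used above) could split on commas instead:

```ruby
# Hypothetical helper: splits a meta keywords string on commas,
# trims surrounding whitespace and drops empty entries.
def keyword_phrases(content)
  content.to_s.split(',').map { |phrase| phrase.strip }.reject { |phrase| phrase.empty? }
end

content = 'ruby, html parsing, seo tools, '
puts keyword_phrases(content).length # number of keyword phrases
puts content.split.length            # number of words
```

With Hpricot, the string would come from the `content` attribute of the keywords meta tag, exactly as in the keywords check. Which of the two counts you test against is a judgment call.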
Checking Image Alt Attributes
    images = doc.search('img') # Get all the page images
    unless images.empty?
      image_alts = doc.search('img[@alt]') # Get only those images that have an alt attribute
      image_count = images.length
      image_alt_count = image_alts.length
      # After getting the counts you get the difference.
      # Then you'll know whether you're missing alt attributes
      if image_count - image_alt_count > 0
        puts '[PROBLEM] Missing Some Image Alt Attributes'
      end
    end
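Knowing that some alt attributes are missing is more useful when you can also see which images are affected. The sketch below assumes each image responds to `[]` with an attribute name and returns nil when the attribute is absent; plain hashes stand in for parsed elements so the example runs on its own. The helper name `images_missing_alt` is hypothetical.

```ruby
# Returns the src of every image whose alt attribute is absent.
# An empty alt (alt="") is treated as present here.
def images_missing_alt(images)
  images.select { |img| img['alt'].nil? }.map { |img| img['src'] }
end

# Plain hashes stand in for parsed <img> elements.
images = [
  { 'src' => 'logo.png', 'alt' => 'Company logo' },
  { 'src' => 'spacer.gif', 'alt' => '' },
  { 'src' => 'banner.jpg' }
]

images_missing_alt(images).each do |src|
  puts "[PROBLEM] Missing alt attribute: #{src}"
end
```

With Hpricot you would presumably pass in the result of `doc.search('img')`; that is an assumption and worth verifying against a real page before relying on it.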
Checking Title
    title_tag = doc.at('title')
    if title_tag
      title = title_tag.inner_html
      title_count = word_count(title)
      if title_count < 3 || title_count > 6
        puts '[PROBLEM] Title Length'
      end
    else
      puts '[PROBLEM] No Title'
    end
In Conclusion
As you can see, all the tests follow a similar pattern:
- Get the doc.
- Get the tag and its contents.
- Get the count.
- Check the count against your limits.
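Since every test follows the same steps, the limits can live in a table and a single loop can do the checking. The following is a minimal sketch under that assumption, not the tool mentioned in this post; `Rule`, `RULES` and `run_checks` are hypothetical names, and the texts are passed in as a plain hash so the example runs without fetching a page. The limits are the ones used in the checks above.

```ruby
# One rule per check: a label plus the allowed word-count range.
Rule = Struct.new(:label, :min, :max)

RULES = [
  Rule.new('Keywords', 4, 10),
  Rule.new('Description', 4, 10),
  Rule.new('Title', 3, 6)
]

# texts maps a label to the extracted text, or nil when the tag is missing.
# Returns a list of problem messages; an empty list means every check passed.
def run_checks(texts, rules = RULES)
  problems = rules.map do |rule|
    text = texts[rule.label]
    if text.nil?
      "[PROBLEM] No #{rule.label}"
    else
      count = text.split.length
      "[PROBLEM] #{rule.label} Count" if count < rule.min || count > rule.max
    end
  end
  problems.compact # drop the nils left by checks that passed
end

texts = {
  'Keywords' => 'ruby html parsing seo',
  'Description' => 'Too short',
  'Title' => nil
}
puts run_checks(texts)
```

Adding a new check then means adding one row to the table and one entry to the hash, rather than copying another if/else block.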
There’s much more that can be done than what’s shown here, and we plan on releasing a tool for our partner site soon so you can get a thorough grading of your own site. Possible areas you could check, in case you want to experiment, are:
- Keyword checking of attributes and text.
- Inbound links - Scraping Google, Yahoo, etc.
- Server responses - Checking for nice 404, 500 responses and ensuring you have correct 302 redirects on your subdomains.
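
For the server-response idea, Ruby's standard `net/http` library reports the status code without following redirects, which suits this kind of check. A rough sketch, where `fetch_status` and `describe_status` are hypothetical helper names and the grouping of codes is only one possible choice:

```ruby
require 'net/http'
require 'uri'

# Fetches a URL and returns its HTTP status code as a string.
# Net::HTTP.get_response does not follow redirects.
def fetch_status(url)
  Net::HTTP.get_response(URI.parse(url)).code
end

# Maps a status code (string or integer) to a rough category.
def describe_status(code)
  case code.to_i
  when 200..299 then :ok
  when 301, 308 then :permanent_redirect
  when 302, 303, 307 then :temporary_redirect
  when 404 then :not_found
  when 500..599 then :server_error
  else :other
  end
end

puts describe_status('302')
puts describe_status(404)
```

Passing the result of `fetch_status` for a page that should not exist into `describe_status` would then tell you whether the server answers with a proper 404.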













