ControlFreak
LANGUAGES:
VB.NET | C#
ASP.NET
VERSIONS: 2.x
Prepare to Be Searched
Get Your Site Noticed by the People Who Matter Most
By Steve C. Orr
If your Web site provides useful content, services, or
products, there are people out there who want to know about it. But how do you
get the word out? You could send out copious amounts of spam to get noticed,
but that's not likely to earn the kind of reputation that most organizations
crave. Other forms of marketing and advertising are likely to bring more
positive results, but just because you don't have an advertising budget doesn't
mean you're out of luck. Read on to find out free ways to maximize your Web
site's status and get found by the people you're trying to reach.
Robots and Crawling Spiders
Sounds like an introduction to a sci-fi movie, doesn't it? Actually,
robots, crawlers, and spiders are all names for custom software from search
engines like Google, Yahoo, and MSN Search that investigates what's currently
out on the Internet. If you have a public Web site, chances are it has already
been visited, scanned, and thoroughly indexed by one of these ominous-sounding
pieces of software. As intimidating as they sound, spiders can be your best
friend if you take the time to understand them, because they hold the key to every Web
site's search ranking. If your site sells discount toothpicks, then your site
needs to appear near the top of the list when users search for "discount
toothpicks," and the spiders hold the power to make that happen.
Functionally speaking, spiders do little more than record
key pieces of your Web page's HTML and follow the hyperlinks to see where they
lead. Conceptually, it's not very difficult to design a basic spider yourself. The
.NET WebRequest class is all you really need to retrieve the HTML of a page so
you can parse it and extract its hyperlinks, which you can then follow to
recursively parse related Web pages. Along the way, you can store important pieces of text in a
database for querying. Sites like Google and Yahoo have become masters of this
technique, and by understanding some details about how they do it, you can use
their global dominance to advance your own agenda.
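To make the idea concrete, here is a minimal C# sketch of such a spider. It assumes a regular expression is good enough to find href attributes (a production crawler would use a real HTML parser), and it uses HttpClient, the newer .NET download class, rather than WebRequest. The names ToySpider, Crawl, ExtractLinks, and HttpFetch are hypothetical, and the fetch function is passed in as a parameter so the crawl logic can be exercised without a network connection.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.RegularExpressions;

// A toy spider: fetch a page, pull out its hyperlinks, and recurse into
// pages on the same host. Regex-based parsing is a deliberate simplification.
public static class ToySpider
{
    static readonly Regex HrefPattern = new Regex(
        @"href\s*=\s*[""']([^""']+)[""']", RegexOptions.IgnoreCase);

    static readonly HttpClient Client = new HttpClient();

    // Returns absolute URLs for every href found in the HTML.
    public static List<Uri> ExtractLinks(Uri pageUri, string html)
    {
        var links = new List<Uri>();
        foreach (Match m in HrefPattern.Matches(html))
        {
            // Resolve relative links (e.g. "about.aspx") against the page's URL.
            if (Uri.TryCreate(pageUri, m.Groups[1].Value, out Uri link))
                links.Add(link);
        }
        return links;
    }

    // Crawls from startUri, visiting each same-host page at most once,
    // following links no deeper than maxDepth. Returns URL -> HTML.
    public static Dictionary<Uri, string> Crawl(Uri startUri, Func<Uri, string> fetch, int maxDepth)
    {
        var visited = new Dictionary<Uri, string>();
        Visit(startUri, fetch, maxDepth, visited);
        return visited;
    }

    static void Visit(Uri uri, Func<Uri, string> fetch, int depth, Dictionary<Uri, string> visited)
    {
        if (depth < 0 || visited.ContainsKey(uri)) return;
        string html = fetch(uri);
        visited[uri] = html; // a real spider would store key text in a database here
        foreach (Uri link in ExtractLinks(uri, html))
        {
            // Stay on the starting site; off-site links are ignored.
            if (link.Host == uri.Host)
                Visit(link, fetch, depth - 1, visited);
        }
    }

    // Live fetcher for real crawls (blocks until the download completes).
    public static string HttpFetch(Uri uri) =>
        Client.GetStringAsync(uri).GetAwaiter().GetResult();
}
```

For a live crawl you would call something like `ToySpider.Crawl(new Uri("http://www.example.com/"), ToySpider.HttpFetch, 2)`; the depth limit and the same-host check keep the spider from wandering across the entire Internet.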
A primary technique that spiders employ is to examine the
words used most often in your Web pages. Therefore, the text content of your Web
site is important for determining the ranking of your site in relation to
specific words and phrases. It's not very feasible (or advisable) to make major
changes to the content of a Web site just to increase search rankings. Instead,
there are other techniques that are likely to give better results. For example,
another extremely important item search engines examine is the title of a page.
In a basic HTML page, the title would be defined like this:
<html>
  <head>
    <title>THIS IS THE TITLE</title>
  </head>
  <body>
    Hello World
  </body>
</html>
When the page is viewed by a user, its title shows up in
the title bar of the browser, as shown in Figure 1. As far as search engines
are concerned, it is best to have the title consist of a good sentence or two
filled with highly descriptive words about the page and/or Web site. This will
help search engines understand the primary focus of the Web page, thereby
increasing the site's ranking when people search for related topics.
Figure 1: A Web page's title shows
up in the title bar of the user's browser. It's a key element that is examined
by most major search engines to determine the subject matter of a Web page.
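A spider's first look at a page might resemble the following sketch, which pulls out the title and counts the most frequently used words. This is only an assumption about the general approach; real search engines weigh far more signals and do not publish their algorithms. PageAnalyzer, GetTitle, and TopWords are hypothetical names, and the tag stripping is deliberately naive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PageAnalyzer
{
    // Returns the contents of the <title> element with whitespace collapsed, or "" if none.
    public static string GetTitle(string html)
    {
        Match m = Regex.Match(html, @"<title[^>]*>(.*?)</title>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? Regex.Replace(m.Groups[1].Value, @"\s+", " ").Trim() : "";
    }

    // Strips tags and returns the 'count' most frequent words of 3+ letters.
    // Naive: script/style contents and HTML entities are not handled.
    public static List<string> TopWords(string html, int count)
    {
        string text = Regex.Replace(html, "<[^>]+>", " ").ToLowerInvariant();
        return Regex.Matches(text, "[a-z]{3,}")
            .Cast<Match>()
            .GroupBy(m => m.Value)
            .OrderByDescending(g => g.Count())
            .ThenBy(g => g.Key, StringComparer.Ordinal) // stable tie-break
            .Take(count)
            .Select(g => g.Key)
            .ToList();
    }
}
```

Run against a toothpick page, this would report the title "Discount Toothpicks" and rank "toothpicks" as the dominant word, which is exactly the kind of signal a descriptive title reinforces.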
In ASP.NET 2.0, you're likely to have a master page, so
the simplest way to specify the title for each page will be more like this:
<%@ Page TITLE="MY PAGE TITLE"
Language="VB"
MasterPageFile="~/MyMasterPage.master" %>
<asp:Content ID="Content1"
ContentPlaceHolderID="CPH1" Runat="Server">
Hello World
</asp:Content>
This technique is fine for a small Web site, but for
larger sites you're in for a major maintenance chore if you ever decide to
change the titles of all the pages in your Web site. Luckily, ASP.NET 2.0 makes
it easy to change a page's title programmatically from the page's (or master page's)
code-behind file:
Page.Title = "Discount Toothpicks" 'VB 2005
Page.Title = "Discount Toothpicks"; //C# 2.0
Now all that's needed is a way to programmatically set the
page title from some kind of data source. Fortunately, ASP.NET 2.0's site map
feature (the same infrastructure behind the SiteMapDataSource control) is
perfect for this kind of thing. For more information about site maps, I suggest
you read "Automate Navigation Chores." Once a site map is set up, it takes only a tidbit of
code in the master page's code-behind to set the page title to the associated
title specified in the site map:
'VB 2005
If SiteMap.CurrentNode IsNot Nothing Then
    Page.Title = SiteMap.CurrentNode.Title
End If
//C# 2.0
if (SiteMap.CurrentNode != null)
Page.Title =
SiteMap.CurrentNode.Title;
Descriptions, Keywords, and Meta Tags
Virtually all search engines make use of the page title,
so it pays to ensure that each page is thoroughly titled. However,
some search engines also value other specific HTML elements
highly in their rankings. For example, Yahoo and MSN Search use the Description
meta tag when present; Yahoo uses the Keywords meta tag as well. Here's a
syntactically correct example of these meta tags in action:
<html>
  <head runat="server">
    <title>THIS IS THE PAGE TITLE</title>
    <meta name="description" runat="server"
      content="Discount Toothpicks" id="description" />
    <meta name="keywords" runat="server"
      content="toothpicks, discount, teeth, cheap" id="keywords" />
  </head>
  <body>
    Get yer cheap toothpicks here
  </body>
</html>
Technically, from an HTML perspective, the runat and id
attributes are not required, but including them lets you adjust the tags'
values from server-side code. For example, you can use a SiteMap to fill in the
Description and Keywords meta tags in much the same way the page title was set in the
previous example:
'VB 2005
If SiteMap.CurrentNode IsNot Nothing Then
    Me.description.Content = SiteMap.CurrentNode.Description
    ' Custom attributes come back as a String (Nothing if absent)
    Me.keywords.Content = SiteMap.CurrentNode("keyword")
End If
//C# 2.0
if (SiteMap.CurrentNode != null)
{
    this.description.Content = SiteMap.CurrentNode.Description;
    // Custom attributes come back as a string (null if absent)
    this.keywords.Content = SiteMap.CurrentNode["keyword"];
}
While SiteMaps don't officially support the keyword attribute,
you can add it anyway, because extraneous attributes are permitted and can be
accessed programmatically using the indexer syntax shown above.
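For reference, a Web.sitemap carrying the extra attribute might look like the sample embedded in the sketch below; the URLs and values are made up for illustration. In a running site, ASP.NET's XmlSiteMapProvider reads this file and exposes the extra attribute through SiteMap.CurrentNode["keyword"]. The sketch parses the same XML with LINQ to XML only so it can run outside ASP.NET; SiteMapKeywords and KeywordsFor are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SiteMapKeywords
{
    // A hypothetical Web.sitemap with a nonstandard "keyword" attribute on each node.
    public const string SampleSiteMap = @"<?xml version=""1.0"" encoding=""utf-8"" ?>
<siteMap xmlns=""http://schemas.microsoft.com/AspNet/SiteMap-File-1.0"">
  <siteMapNode url=""~/Default.aspx"" title=""Discount Toothpicks""
               description=""Cheap toothpicks shipped fast""
               keyword=""toothpicks, discount, cheap"">
    <siteMapNode url=""~/Bulk.aspx"" title=""Bulk Toothpicks""
                 description=""Toothpicks by the crate""
                 keyword=""toothpicks, bulk, wholesale"" />
  </siteMapNode>
</siteMap>";

    // Returns the keyword attribute for the node with the given URL,
    // or null when the node or the attribute is missing.
    public static string KeywordsFor(string siteMapXml, string url)
    {
        XNamespace ns = "http://schemas.microsoft.com/AspNet/SiteMap-File-1.0";
        XElement node = XDocument.Parse(siteMapXml)
            .Descendants(ns + "siteMapNode")
            .FirstOrDefault(n => (string)n.Attribute("url") == url);
        return node == null ? null : (string)node.Attribute("keyword");
    }
}
```

Because the title and description live in the same file, one Web.sitemap can drive the page title, the Description meta tag, and the Keywords meta tag for every page.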
Get a Buzz
Another extremely important factor that search engines
consider when ranking a site is how many other Web pages on the Internet link
to that site. For a Web site to be considered an authority on a particular
topic, it needs a lot of related Web sites pointing to it, and the
effect is greatest when those sites rank highly (see Figure 2). Of course, the
obvious question here is how to get other sites to link to yours. There is
no single great answer to this, although it sure helps if you've got a lot of
advertising dollars to spend. Otherwise, you're stuck with gradually building a
reputation and getting other sites to link to yours via trading, begging,
bartering, and hard work. Sometimes sharing content with other Web sites is a
good way to get them to notice you and (more importantly) provide valuable
hyperlinks back to your site.
Figure 2: The Google toolbar plug-in
(available for Internet Explorer and Firefox) gives a good indication of a
particular Web site's ranking. This ranking is based primarily on how many
other Web sites link to the site.
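The idea that links from highly ranked sites count for more can be illustrated with the classic published PageRank iteration. This is a simplified textbook sketch with an assumed damping factor of 0.85, not the actual ranking formula of Google or any other engine; LinkRank and Compute are hypothetical names. Each page starts with an equal score, then repeatedly passes most of its score to the pages it links to, so a page linked from an important page becomes important itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinkRank
{
    // links[page] = pages that 'page' links to. Every link target must also be a key.
    public static Dictionary<string, double> Compute(
        Dictionary<string, string[]> links, int iterations = 50, double damping = 0.85)
    {
        int n = links.Count;
        var rank = links.Keys.ToDictionary(p => p, p => 1.0 / n);
        for (int i = 0; i < iterations; i++)
        {
            // Every page gets a small baseline share...
            var next = links.Keys.ToDictionary(p => p, p => (1 - damping) / n);
            foreach (var page in links)
            {
                if (page.Value.Length == 0)
                {
                    // A page with no outbound links spreads its vote evenly over all pages.
                    foreach (var p in links.Keys)
                        next[p] += damping * rank[page.Key] / n;
                }
                else
                {
                    // ...plus an equal slice of the score of every page that links to it.
                    foreach (var target in page.Value)
                        next[target] += damping * rank[page.Key] / page.Value.Length;
                }
            }
            rank = next;
        }
        return rank; // scores sum to 1.0
    }
}
```

With links a -> hub, b -> hub, c -> hub, and hub -> a, the hub ends up with the highest score, and page a outranks b and c purely because the well-linked hub points to it.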
Creating a buzz is a great way to launch a public site on
the right foot. Get the word out. Make sure all the sites that should know
about your pages are aware your site is online. Post in public forums
frequently, and always include a hyperlink to the site in your signature or
elsewhere in the posting. Get friends and coworkers to join in, too. If you're
proud of your site, make a big deal about it and see who notices.
Through some investigation, you might find some link
networks related to your industry. Basically, when you join such a network you
agree to provide links to other related Web sites, and they agree to link to
yours as well. Varying degrees of automation are generally involved to ensure
participation among members. If you go with this approach, be sure to stay with
link networks within your industry; straying into more general link farm
networks will often have the opposite effect, watering down the focus
of your Web site in the eyes of search engines and potentially making it more
difficult to find.
When you feel your site is ready, you can submit it for
indexing; most major search engines provide a way to do this, which effectively
queues a visit from a spider. To submit a site to a search engine, go to its
main search page and follow the help link to find its submission
page. It's generally not necessary to submit a site to a search engine, because
spiders will eventually find it on their own, although submitting can sometimes
speed up the process. In fact, Google's spiders are so effective that Google
doesn't even provide a way to manually submit a site. Don't worry if your site
has already been indexed; spiders will visit again soon to investigate content
revisions.
What Not To Do
While all the previous tips describe things you can do
to improve a site's search ranking, there are also some things that
simply should not be done. For example, most spiders are unable to analyze
images, so you shouldn't hide critical search phrases inside an image unless
they are duplicated in the image's ALT attribute.
It's also advisable not to try to trick search engines
into increasing a site's ranking. People have come up with all kinds of devious
ways to hide extra keywords in HTML documents in an effort to boost profiles. Some
people mistakenly think injecting a wide variety of irrelevant words into a Web
site will help it be found by a wider audience. My advice is to not get cute
like this. The major search engines have seen it all before. At best, these
extra words will be ignored; at worst, your entire site could end up being
ignored.
Generally speaking, the more Web sites that link to your
site, the better. However, there is an important exception. Web sites infamous for
undesirable content such as spam, warez, and other illegal activities might
give your Web site a bad reputation in the eyes of search engines if they
consistently link to your site. In other words, keep your nose clean so
questionable sites will have little interest in linking to your content.
Complex QueryStrings can also confuse spiders. For example,
do these two URLs output the same content?
http://www.SomeSite.com/ShowContent.aspx?ID=1
http://www.SomeSite.com/ShowContent.aspx?ID=2
The answer is: it depends. As a Web developer, you likely
know that the ID QueryString parameter tacked onto the end of the URL could be mostly
irrelevant, or it could completely change the page that is displayed. Spiders
understandably get confused by this kind of thing and don't know
whether to index such URLs as separate pages. As a result, some spiders ignore
such pages completely. Because complex QueryStrings confuse spiders, they are best
avoided, especially for pages that are meant to be highly searchable. The
Context.RewritePath method can be quite useful for providing spider-friendly
URLs without having to heavily modify a preexisting architecture that relies on
QueryStrings.
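As a minimal sketch of that approach, assume a hypothetical scheme in which /Content/42.aspx is the public, spider-friendly address for ShowContent.aspx?ID=42. The mapping logic below is plain C# so it can be tested on its own; the commented Global.asax handler shows where Context.RewritePath would be called. FriendlyUrls and ToInternalUrl are hypothetical names, and the sketch assumes the application runs at the site root.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FriendlyUrls
{
    // Hypothetical scheme: /Content/42.aspx  ->  /ShowContent.aspx?ID=42
    static readonly Regex ContentPath = new Regex(
        @"^/Content/(\d+)\.aspx$", RegexOptions.IgnoreCase);

    // Returns the internal QueryString URL, or null if the path isn't a friendly URL.
    public static string ToInternalUrl(string path)
    {
        Match m = ContentPath.Match(path);
        return m.Success ? "/ShowContent.aspx?ID=" + m.Groups[1].Value : null;
    }
}

// In Global.asax, the mapping would be applied early in each request:
// void Application_BeginRequest(object sender, EventArgs e)
// {
//     string target = FriendlyUrls.ToInternalUrl(Request.Path);
//     if (target != null) Context.RewritePath(target);
// }
```

Keeping the .aspx extension on the friendly URL is deliberate: under IIS 6's default configuration, requests for other extensions never reach ASP.NET, so the rewrite code would not run.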
Private Parts
Perhaps there are parts of a Web site that should not be
searched. Maybe they contain personal information or sensitive copyrighted
content. The best solution is to use some kind of authentication, such as Forms
Authentication or Windows Authentication. Because spiders don't have user
accounts, they won't be able to access (or index) the information contained
within. However, if a full-blown authentication system is overkill for your
needs, there are some simple alternatives to keep specific pages away from
prying spider eyes.
One solution is the ROBOTS meta tag. To prevent a page's
content from being indexed, add the following meta tag to its HTML:
<meta name="ROBOTS" content="NOINDEX" />
To prevent spiders from following hyperlinks contained
within the page, add this meta tag to the page's HTML:
<meta name="ROBOTS" content="NOFOLLOW" />
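On the spider side, honoring the tag amounts to checking the page before indexing it or following its links. The two directives can also be combined in a single tag, as in content="NOINDEX, NOFOLLOW". The sketch below is a simplified, hypothetical check (RobotsMeta.Read) that assumes the name attribute appears before the content attribute.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RobotsMeta
{
    // Finds <meta name="ROBOTS" content="..."> (name must precede content).
    static readonly Regex Tag = new Regex(
        @"<meta\s+name\s*=\s*[""']robots[""']\s+content\s*=\s*[""']([^""']*)[""']",
        RegexOptions.IgnoreCase);

    // Reports which directives a compliant spider should obey for this page.
    public static (bool NoIndex, bool NoFollow) Read(string html)
    {
        Match m = Tag.Match(html);
        if (!m.Success) return (false, false); // no tag: index and follow normally
        string content = m.Groups[1].Value.ToUpperInvariant();
        return (content.Contains("NOINDEX"), content.Contains("NOFOLLOW"));
    }
}
```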
While this solution can be useful for protecting a page or
two, it can start to become less manageable for larger numbers of pages. If
entire directory trees need to be protected, then creating a robots.txt file in
the web root may be a better solution because it centralizes the management of
such details. To prevent the entire Web site from being indexed, the robots.txt
file should contain the following text:
User-agent: *
Disallow: /
This tells all (*) spiders to ignore pages starting at the
root (/) of the Web site. It's easy to be more selective about which files to
exclude, as in the following example, which denies only Google permission
to index content in the web root's subdirectory named "secure," as well as the /data/logs
subdirectory:
User-agent: Googlebot
Disallow: /secure
Disallow: /data/logs
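A well-behaved spider applies these rules before requesting a URL. The following simplified C# sketch shows the idea: a group naming the spider takes precedence over the catch-all * group, and each Disallow value is matched as a path prefix, so Disallow: /secure also blocks /secure-old.html. It ignores extensions such as Allow lines and wildcards, matches the user-agent name exactly (case-insensitively) rather than loosely, and RobotsTxt.IsAllowed is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class RobotsTxt
{
    // Returns true if 'path' may be fetched by 'userAgent' under the given robots.txt.
    // Simplified: prefix matching only; no Allow lines, wildcards, or Crawl-delay.
    public static bool IsAllowed(string robotsTxt, string userAgent, string path)
    {
        var groups = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
        List<string> current = null;
        bool lastWasAgent = false;
        foreach (string rawLine in robotsTxt.Split('\n'))
        {
            string line = rawLine.Split('#')[0].Trim(); // drop comments and stray '\r'
            int colon = line.IndexOf(':');
            if (colon < 0) continue;
            string field = line.Substring(0, colon).Trim();
            string value = line.Substring(colon + 1).Trim();
            if (field.Equals("User-agent", StringComparison.OrdinalIgnoreCase))
            {
                // Consecutive User-agent lines share one rule group.
                if (!lastWasAgent) current = new List<string>();
                groups[value] = current;
                lastWasAgent = true;
            }
            else
            {
                lastWasAgent = false;
                // An empty Disallow value means "allow everything", so it adds no rule.
                if (current != null && value.Length > 0 &&
                    field.Equals("Disallow", StringComparison.OrdinalIgnoreCase))
                    current.Add(value);
            }
        }
        // A group naming this spider wins over the catch-all "*" group.
        if (!groups.TryGetValue(userAgent, out List<string> rules) &&
            !groups.TryGetValue("*", out rules))
            return true;
        foreach (string prefix in rules)
            if (path.StartsWith(prefix, StringComparison.Ordinal)) return false;
        return true;
    }
}
```

Against the Googlebot example above, Googlebot would be refused /secure/page.aspx but allowed /default.aspx, while any other spider would be allowed everywhere, since no * group exists.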
It's also possible to grant different levels of access to
spiders from different search engines, and to use other advanced tricks that are
beyond the scope of this article. For more information, see http://www.robotstxt.org/wc/faq.html.
Although there is currently no ratified standard that is
guaranteed to ward off all search engines, most voluntarily comply with the
techniques mentioned here.
Search Is King
Being easily found on the Internet is an important
accomplishment for any public organization. Being able to find information can
be just as important. For more details on how to retrieve and use search
results programmatically, see "Search Box."
Obviously, the topic of searching and indexing the Web is
far more complex than anyone could hope to cover in an article or two;
otherwise, companies like Google and Yahoo wouldn't be able to rake in such
enormous amounts of money from their expertise. Armed with the right knowledge,
and building on the information you now have, maybe you too can scoot up to the
table and grab yourself a piece of the pie.
Steve C. Orr is an
MCSD and a Microsoft MVP in ASP.NET. He's been developing software solutions
for leading companies in the Seattle area for more than a decade. When he's not busy designing software systems or
writing about them, he can often be found loitering at local user groups and
habitually lurking in the ASP.NET newsgroup. Find out more about him at http://SteveOrr.net
or e-mail him at Steve@Orr.net.