asp:Feature
LANGUAGES: C# | VB.NET
ASP.NET VERSIONS: ALL
Smart SEO
URL Rewriting Using ASP.NET and IIS 6.0
By Jeffrey Hasan
Search Engine Optimization (SEO) has become the major
driver for Uniform Resource Locator (URL) rewriting in Web sites. A URL is the
online discoverable address for a Web resource, including Web pages, and
Web-hosted documents and images. For simplicity, this article focuses on Web
pages as the Web resource of primary concern.
In the world of online search engines, page rank and link
equity are a measure of how relevant and popular a specific Web page is for a
certain search request. High page rank is what enables your Web page to appear
on the first page of Google search results, instead of the thirtieth. In the
ultra-competitive world of online commerce, this difference is significant, as Web
site visitors are most likely to select their Web site destination from the
first or second pages of choices.
With SEO, the Web site becomes less important than a
specific Web page, because search engines deconstruct your Web site into
relevant pages. High page rank and link equity cannot be achieved overnight;
instead, they must be earned over a period of time. And once earned, they must
be preserved, which means your Web page URLs cannot change. Otherwise, you must
start rebuilding the page rank and link equity from the ground up.
URL rewriting enables your Web application to support
SEO-friendly URLs and logical site navigation, while using a different physical
navigation structure in the backend. URL rewriting also allows you to make
necessary adjustments to your Web site navigation structure without
compromising your search engine rankings.
Note: This article
focuses on URL rewriting using IIS 6.0 and all versions of ASP.NET. The newer IIS
7.0 and ASP.NET 3.5 integrate differently and provide different options for URL
rewriting. This will be the topic of a companion article to appear in a future
issue of asp.netPRO.
What Is URL Rewriting?
URL rewriting involves intercepting an incoming HTTP
request, then remapping the request to an alternate URL. URL rewriting can be
performed at the Web server level or at the Web application level. At the Web
server level, the URL rewriting is handled directly by Internet Information
Services (IIS), in the case of Windows server platforms. At the Web application
level, the URL rewriting may be handled at various points within the
application execution workflow. Your choice of intercept point depends on the
nature of the URL rewriting and the type of Web resource being requested.
What Is an SEO-friendly URL?
SEO-friendly URLs avoid using querystring parameters and
use qualified path formats instead. Here's an example of an SEO-friendly
(static) URL for an HP printer on an e-commerce site:
http://www.jeffshop.com/printers/hp/P1006.aspx
Here's an alternate (dynamic) URL for the same product,
but constructed in an unfriendly way for SEO:
http://www.jeffshop.com/printers.aspx?mfr=hp&model=p1006
The contrast between the two URLs is clear: the friendly
(static) URL has a simple path structure, whereas the unfriendly (dynamic) URL
uses multiple querystring parameters. Search engines interpret unique
combinations of query parameters as different Web pages, which dilutes the
relevance of the printers destination node within the e-commerce site.
Even better from an SEO perspective is to load up the
product URL with rich keywords, as in:
http://www.jeffshop.com/forsale/hp-laserjet-p1006-printer-monochrome-17-ppm/876543.html
This URL format preserves the SEO-friendly simple
qualified path structure, and adds a keyword-rich sub-path section to the URL.
In this example, 876543.html could correspond to a product identifier, such as
a stock keeping unit (SKU).
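As a rough illustration of how such a keyword-rich URL might be generated, here is a minimal C# sketch. The SeoUrl class and its Slugify and ProductPath methods are hypothetical names invented for this example, and the rule for what counts as a keyword character (letters a-z and digits only) is an assumption, not a search engine requirement.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class SeoUrl
{
    // Turns a product name into a lowercase, hyphen-separated path segment.
    public static string Slugify(string text)
    {
        string lower = text.Trim().ToLowerInvariant();
        // Collapse every run of characters other than a-z and 0-9 into one hyphen.
        string hyphenated = Regex.Replace(lower, "[^a-z0-9]+", "-");
        // Remove any hyphen left at the start or end.
        return hyphenated.Trim('-');
    }

    // Builds a logical path of the form /forsale/{keywords}/{sku}.html
    public static string ProductPath(int sku, string productName)
    {
        return "/forsale/" + Slugify(productName) + "/" + sku + ".html";
    }
}
```

Calling SeoUrl.ProductPath(876543, "HP LaserJet P1006 Printer, Monochrome (17 ppm)") yields the path portion of the URL shown above. Stripping punctuation this aggressively keeps the URL readable, at the cost of dropping non-English characters, which a real site may want to handle differently.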
Finally, you could eliminate the .html Web page file extension
and publish a simplified SEO-friendly URL, as in:
http://www.jeffshop.com/forsale/876543/hp-laserjet-p1006-printer-monochrome-17-ppm/
In this example the product SKU is part of the URL, but is
not called out as such. The Web server is instructed how to interpret this sub
portion of the URL, and could in fact ignore everything that comes after the
product SKU portion of the URL. In effect, the Web server considers the
relevant URL to be:
http://www.jeffshop.com/forsale/876543/
In reality, this URL doesn't have to physically exist on
the Web server. In fact, SEO-friendly URLs typically do not, being instead
logical constructs that are tailored for the benefit of the search engine and
the site visitor rather than the application developer.
Now you can see the need for a URL bridge mechanism that
rewrites logical URLs into their associated physical URLs. This URL bridge will
interpret incoming SEO-friendly logical URLs and map them on the Web server to
the actual physical resource address. In the above example, the Web server
could process the incoming SEO-friendly product URL, then redirect the user to
the actual product page URL, as in:
http://www.jeffshop.com/catalog.aspx?sku=876543
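The mapping step of such a URL bridge can be sketched in a few lines of C#. This is only an illustration under stated assumptions: the UrlBridge class and ToPhysicalPath method are hypothetical names, and the pattern assumes the SKU is always the numeric segment immediately after /forsale/, with any trailing keyword text ignored.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class UrlBridge
{
    // Matches /forsale/{digits}, optionally followed by a slash and any
    // trailing text (the keyword portion, which is ignored).
    private static readonly Regex ProductPattern =
        new Regex(@"^/forsale/([0-9]+)(/.*)?$", RegexOptions.IgnoreCase);

    // Returns the physical path for a logical product path,
    // or null when the path is not a product URL.
    public static string ToPhysicalPath(string logicalPath)
    {
        Match match = ProductPattern.Match(logicalPath);
        if (!match.Success)
        {
            return null;
        }
        return "/catalog.aspx?sku=" + match.Groups[1].Value;
    }
}
```

Because the keyword portion is ignored, the marketing text in the URL can change without breaking the mapping, as long as the SKU segment stays the same.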
How IIS 6.0 and ASP.NET Process HTTP Requests
Remember when you were first learning ASP.NET, and you
skipped over the section that described the processing workflow for incoming
HTTP requests? And how the ASP.NET engine processes the Web page and returns
the HTTP response output via the Web server? Right, that section. Well, it's
time to go back to the textbook, because you need to know how HTTP requests are
processed by the Web server and the ASP.NET engine. Only then can you
understand where HTTP requests can be intercepted, should you need to rewrite
the requested URL and redirect the site visitor to an alternate location.
Grab a textbook if you have one handy; otherwise, refer to
Figure 1, which summarizes the important steps in processing incoming HTTP
requests and returning HTTP responses.
Figure 1: The ASP.NET HTTP pipeline.
Briefly, here's how it works. The client browses a Web
page by typing a URL into their browser. The browser relays the HTTP request to
the Web server over the Internet. The Web server, in turn, parses the URL to
determine the type of resource being requested, whether it is a Web page, an
image, or some other type of file.
Let's assume the client requests an ASP.NET page, with a
*.aspx extension. The Web server, in this case IIS, will direct the HTTP
request to the ASP.NET engine, alternatively known as the ASP.NET HTTP pipeline.
The HTTP request passes through a variable number of HTTP modules, which
include standard modules (e.g., for session state management and Forms
Authentication), plus any custom modules that the Web site developer has installed
in the pipeline. After passing through the sequence of HTTP modules, the HTTP
request reaches the dedicated HTTP handler for *.aspx Web pages (other Web
resource types will have their own dedicated handlers, such as *.asmx). The
HTTP handler executes the compiled code for the Web page that was requested by
the client. Finally, an HTTP response is returned to the client through the
ASP.NET engine and out of the Web server. The HTTP response includes an HTTP
status code, which could be one of the following:
- HTTP 200. The requested resource was found at
the URL location and a response has been successfully returned.
- HTTP 301. The requested resource has been
permanently redirected to an alternate URL location and a response has been
successfully returned.
- HTTP 404. The requested resource was not found.
Web users know this status code by the common Page Not Found default status
page.
From a search engine perspective, these are the most
significant of the various HTTP status codes, and the implications are noteworthy.
A search engine spider successfully indexes HTTP 200 and 301 responses, and for
the latter it will permanently record the new URL location for the Web resource
and migrate the previous Web resource's page rank and history. However, for
HTTP 404 responses the search engine spider will drop the Web resource from its
index, thereby losing the page rank and link equity that this Web resource
formerly had. This is why you should always redirect users from old pages to
new ones, rather than present them with a Page Not Found error. URL redirection
allows you to preserve hard-earned page rank and link equity, and so is
essential for any SEO-focused Web site.
URL Rewriting in ASP.NET and IIS 6.0
Before you can rewrite a URL you must intercept the
incoming HTTP request and inspect it. Let's assume that the Web site visitor
has requested an ASP.NET Web page on your Web site. There are five appropriate
locations to intercept HTTP requests within the Web server and ASP.NET HTTP pipeline.
These are shown in Figure 1 by numbered location; they are:
1) Within IIS using an ISAPI filter, which is a plug-in that extends the
functionality of IIS.
2) Using a custom HTTP handler to replace the default ASP.NET engine's *.aspx
handler.
3) Using a custom HTTP module to intercept and inspect the HTTP request before
it reaches the ASP.NET handler.
4) Within the hosting Web site's Global.asax application file (e.g., in the
Application_BeginRequest event handler).
5) Within the compiled Web page code (e.g., in the Page_Load event handler).
Of these five options, we'll exclude looking at ISAPI
filters and custom HTTP handlers, which are the first and second options,
respectively. ISAPI filters are beyond the scope of our focus. Custom HTTP
handlers are also out of scope, mainly because the URL rewriting modifications
that we could make there can mostly be made in other code constructs instead,
such as within HTTP modules.
Let's examine each of the remaining three options, and
expand on how URL rewrites are accomplished using ASP.NET and IIS.
URL Rewriting in the Web Application's Global.asax File
The Global.asax file is an optional file that resides
within the Web application, and which provides programmatic access to
application lifecycle events and the HTTP request and response messages. The
Global.asax file is like a simplified HTTP module, one that cannot be reused
across multiple applications, but which is easier to use because it requires no
special configuration within the Web application. With the Global.asax file you
can simply start coding into application and session-level event handlers. The
IIS Web server will bounce back any direct HTTP requests that are made on this
file. In short, the Global.asax file lets you stay focused on code and not be
concerned about deployment issues.
Figure 2 illustrates a redirect scenario, where the site
navigation for a product Web site has been redesigned. Originally, the Web site
served product pages from a subdirectory named forsale. However, the site
navigation has been flattened so that the product pages have all been moved to
the root of the application. Solution Explorer contains two pages named
876543.aspx, one under the Web root (the active product page), the other under
the forsale subdirectory (the obsolete product page). The product page under the
forsale subdirectory has built a high page ranking, so site visitors should
still be able to request it there, but these incoming requests must be
redirected on the server to the corresponding page in the Web root.
Figure 2: URL rewriting for an obsolete Web page.
Figure 3 provides the code listing for the Global.asax
Application_BeginRequest event handler, which runs at the start of every
incoming request. The code first looks for HTTP request URLs
that contain the forsale subdirectory. If none are found, the request is
allowed to execute normally. However, if a match is found, the code uses a
regular expression to extract the Web page name and rewrite the URL so this
page is served from the Web root instead of from the forsale subdirectory.
// Global.asax also needs: <%@ Import Namespace="System.Text.RegularExpressions" %>
void Application_BeginRequest(object sender, EventArgs e)
{
    // Code that runs at the start of every request
    string requestURL = Context.Request.Url.ToString().ToLower();
    if (requestURL.Contains("/forsale/"))
    {
        // Capture the numeric page name; the dot is escaped so it matches literally
        Regex regex = new Regex(@"([0-9]+)\.aspx", RegexOptions.IgnoreCase);
        MatchCollection matches = regex.Matches(requestURL);
        if (matches.Count > 0)
        {
            // Rewrite the URL
            string requestPage = matches[0].Groups[1].ToString();
            Context.RewritePath("../" + requestPage + ".aspx");
            // Set HTTP Status 301 - Permanent Redirection
            Context.Response.Status = "301 Moved Permanently";
            Context.Response.AddHeader("Location", Context.Request.Url.AbsoluteUri);
        }
    }
}
Figure 3: Using Context.RewritePath in Global.asax.
After the URL is rewritten, the code manually updates the
HTTP headers to record an updated HTTP status code 301 (permanent redirection)
and the new fully qualified location of the replacement page. Without this
update, the new page would have rendered, but the browser URL would have displayed
the old URL and the HTTP status code would have read 200. This is adequate
for people visiting the site, but not for search engine spiders, which need to
record permanent redirections. Recall that this is the only way for the search
engine spider to transfer to the replacement page the page ranking and link
equity of the previous page.
Note that you should not use Response.Redirect in place of
the RewritePath method, because Response.Redirect sends an HTTP status code 302
back to the client, which indicates a temporary redirect. This status code is
confusing to search engine spiders: they will not drop the original URL from
their index, but they may also record the destination URL. If the original and
destination URLs appear very different, or if they are in a different domain,
you risk having your site dropped, and with it all the link equity that you've
earned.
This is because less reputable sites make it a practice to guide you to one Web
site and then redirect you to a completely different one to which you did not
intend to go. So to be safe, never issue an HTTP status code 302, either
directly or indirectly. Always use HTTP status code 301 to indicate permanent
redirection.
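For comparison with Figure 3, here is a sketch of a plain permanent redirect that does not call RewritePath at all. It is an alternative under stated assumptions, not the approach used in this article's download: the ForsaleRedirect class and TryGetTarget method are hypothetical names, and the wiring into Global.asax is shown only as comments so that the matching logic can be compiled and tested on its own.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class ForsaleRedirect
{
    // Matches an obsolete page such as /forsale/876543.aspx
    private static readonly Regex ObsoletePage =
        new Regex(@"^/forsale/([0-9]+)\.aspx$", RegexOptions.IgnoreCase);

    // Returns true and the root-level replacement path when the request
    // targets an obsolete page; otherwise returns false and a null target.
    public static bool TryGetTarget(string requestPath, out string target)
    {
        Match match = ObsoletePage.Match(requestPath);
        target = match.Success ? "/" + match.Groups[1].Value + ".aspx" : null;
        return match.Success;
    }
}

// Possible wiring in Global.asax (assumes the application sits at the site root):
//
// void Application_BeginRequest(object sender, EventArgs e)
// {
//     string target;
//     if (ForsaleRedirect.TryGetTarget(Request.Url.AbsolutePath, out target))
//     {
//         Response.StatusCode = 301;              // permanent, not 302
//         Response.AddHeader("Location", target); // where the page now lives
//         Response.End();                         // send headers, skip the old page
//     }
// }
```

With this variation the old URL returns only the 301 headers, and the browser or spider then makes a second request for the new URL. That costs one extra round trip, but each URL then has a single unambiguous status code.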
URL Rewriting in an ASP.NET Web Page
From an SEO perspective, a URL rewrite within an ASP.NET
page occurs too late in the process lifecycle, because the HTTP request has been
allowed to get all the way to the destination page without interception. It is
unlikely that you would want to rewrite a URL at this
late stage. For one thing, the original destination Web page is already
rendering, which defeats the purpose of redirecting the visitor somewhere else:
It would be more efficient to redirect them earlier, if that's what you intend
to do.
An SEO-driven URL rewrite and redirect is not about
intelligence; it is about rules. It is possible that an ASP.NET Web page
code-behind file contains some decision logic that is needed for a URL
redirect. But this would typically be a different kind of URL redirect, not related
to an SEO-driven URL rewrite. In short, do not use the ASP.NET Web page
code-behind file for SEO-driven URL rewrites, although feel free to use this
location for other types of URL redirects.
URL Rewriting Using an HTTP Module
HTTP modules are assemblies that plug in to the ASP.NET
HTTP request pipeline. They intercept all incoming HTTP requests to a Web
application and contain various event handlers that can be used to access the
details of the HTTP request and modify it, if necessary. Custom HTTP modules
can be developed in .NET, deployed on the Web server, and configured for use by
registering them in a Web application's configuration file (Web.config).
HTTP modules are straightforward to write, but the URL
redirection logic can be tedious. If you don't want to write your own URL
rewriting infrastructure code, I recommend using an open source module called
UrlRewriter.NET, which allows you to configure your custom URL rewriting rules
directly within the Web application's Web.config file using regular expression
pattern-matching syntax. Not only can you encode rules, but you can add
conditional statements to separate out rule groups based on the incoming
request type.
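To give a feel for what rule-driven infrastructure code involves, here is a minimal sketch of an ordered rule table in C#. It is not UrlRewriter.NET's implementation; the UrlRule and RuleEngine types are hypothetical, and the "first matching rule wins" policy is an assumption made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical rule type; the names are illustrative only.
public sealed class UrlRule
{
    public readonly Regex Pattern;
    public readonly string Replacement;
    // true = answer with HTTP 301; false = rewrite internally on the server
    public readonly bool IsPermanentRedirect;

    public UrlRule(string pattern, string replacement, bool isPermanentRedirect)
    {
        Pattern = new Regex(pattern, RegexOptions.IgnoreCase);
        Replacement = replacement;
        IsPermanentRedirect = isPermanentRedirect;
    }
}

public static class RuleEngine
{
    // Returns the first rule whose pattern matches the path and sets result
    // to the substituted URL; returns null (and a null result) if none match.
    public static UrlRule Apply(IList<UrlRule> rules, string path, out string result)
    {
        foreach (UrlRule rule in rules)
        {
            if (rule.Pattern.IsMatch(path))
            {
                // $1, $2, ... in the replacement refer to captured groups.
                result = rule.Pattern.Replace(path, rule.Replacement);
                return rule;
            }
        }
        result = null;
        return null;
    }
}

// Inside a custom HTTP module's BeginRequest handler, the outcome could be
// applied with context.RewritePath(result) for a rewrite, or by setting
// status code 301 and a Location header for a permanent redirect.
```

A rule such as new UrlRule(@"^/forsale/([0-9]+)\.aspx$", "/catalog.aspx?sku=$1", false) would then express the product mapping as data rather than code, which is the main appeal of a configuration-driven module.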
Figure 4 shows an example of a Web.config file for an
application that uses UrlRewriter.NET with a regular expression pattern-matching
rule to map a keyword-rich product URL to a querystring formatted URL that is
more suitable for generic database lookups. In this case, the rewrite is from:
http://www.jeffshop.com/forsale/876543.aspx
to:
http://www.jeffshop.com/catalog.aspx?sku=876543
<rewriter>
  <rewrite url="^/forsale/([0-9]+)\.aspx$" to="/catalog.aspx?sku=$1" />
</rewriter>
Figure 4: Rewrite a URL using UrlRewriter.NET.
The rule in Figure 4 rewrites the HTTP request to
a different URL on the server, but the HTTP status code in the response is
returned as 200, which is not a redirect code. This is OK if your Web site
visitor is a person, but for search engine spiders you'll want to return HTTP
status code 301 to inform the search engine to record the permanent redirect,
and transfer page rank and link equity from the previous URL. In this case you
should use the <redirect> configuration element instead of the <rewrite>
element. The <redirect> element returns the permanent redirect HTTP
status code 301 by default.
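As a sketch, the redirect form of the same rule might look like the fragment below. It assumes that the <redirect> element accepts the same url and to attributes as <rewrite>; check the UrlRewriter.NET documentation for the exact attribute set before relying on it.

```xml
<rewriter>
  <!-- Assumed syntax: answers with a permanent redirect instead of rewriting internally -->
  <redirect url="^/forsale/([0-9]+)\.aspx$" to="/catalog.aspx?sku=$1" />
</rewriter>
```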
UrlRewriter.NET is not the only available module, but it
is currently one of the more popular ones, and has received good coverage in
books and blog postings. In addition, being an open source project, the source
code is freely available on http://SourceForge.net for you to browse and modify.
You can download UrlRewriter.NET from http://urlrewriter.net/.
Conclusion
SEO-driven URL rewrites are essential to preserving
hard-earned page rank and link equity for Web pages. The leading search engines
assign higher rankings to Web pages that support keyword-rich URLs; have many
incoming links from reputable sites; and have longevity. URL rewrites allow you
to preserve hard-earned search engine rankings even if your site navigation
needs to change. IIS and ASP.NET provide good technical support for URL
rewrites, especially in combination with popular third-party modules.
Source code accompanying
this article is available for download.
Jeffrey Hasan, MCSD,
is Senior VP of Strategic Consulting Services at Axis Technical Group, Inc. (http://www.axistechnical.com). He has
been a professional systems architect and developer for 12 years. His work
focuses on enterprise integration, business intelligence, data warehouses, and
workflow-driven portals using SharePoint. Jeff has authored several .NET books,
including Expert Service Oriented Architecture
in C# (Apress, 2006).