A Web page is an HTML document that can contain plain text as well as layout information and script code. Layout information and script code are set apart from text by tags, which are delimited by special characters: the angle brackets. If the text itself contains angle brackets or other special characters, the browser may treat them as markup and render the page incorrectly.
HTML encoding replaces these critical characters with escape sequences that the browser renders as the intended character. For example, when the opening angle bracket symbol (<) is used as plain text, HTML encoding transforms it into &lt;.
The need for HTML encoding came about with the advent of dynamic pages, which allow text to be read from databases and injected into the output. HTML encoding is also important from a security viewpoint: it protects against script exploits by neutralizing unwanted script tags that might be silently injected into your pages.
Since its first version, ASP.NET has provided tools for encoding (and decoding) HTML text. In particular, you'll find a pair of methods for encoding and decoding—HtmlEncode and HtmlDecode—on the ASP.NET Server object. In ASP.NET 4, using these methods is quicker, and to some extent smoother, due to a new syntax and a new subsystem.
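To make the encode and decode round trip concrete, here is a minimal console sketch. It assumes System.Net.WebUtility as a stand-in for the Server object's HtmlEncode and HtmlDecode methods, so that the code can run outside a page. The EncodingSketch class and its method names are hypothetical.

```csharp
using System;
using System.Net;

// Hypothetical helper class for illustration; not part of ASP.NET.
public static class EncodingSketch
{
    // Replaces HTML-critical characters such as < and > with entities.
    // Assumption: WebUtility stands in for Server.HtmlEncode here.
    public static string Encode(string text)
    {
        return WebUtility.HtmlEncode(text);
    }

    // Turns entities back into the characters they represent.
    public static string Decode(string text)
    {
        return WebUtility.HtmlDecode(text);
    }

    public static void Main()
    {
        string raw = "<b>1 < 2</b>";
        string encoded = Encode(raw);

        Console.WriteLine(encoded);          // &lt;b&gt;1 &lt; 2&lt;/b&gt;
        Console.WriteLine(Decode(encoded));  // <b>1 < 2</b>
    }
}
```

Within a page you would more likely call Server.HtmlEncode and Server.HtmlDecode directly; the idea is the same.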
A Quick Syntax in ASP.NET 4
ASP.NET 4's new subsystem for auto-encoding HTML text saves you from the burden of always wrapping any piece of text in a call to HtmlEncode. The new syntax is a special version of the classic code block. When you have a code block, you simply use the colon symbol (:) to instruct the runtime to HTML-encode any text being displayed. Here’s an example:
<%: "<script>alert('Hello ASP.NET 4');</script>" %>
If you try this expression in an ASP.NET 4 sample page, you’ll obtain what’s depicted in Figure 1. The script command is output as plain text and doesn’t execute. Replace the : symbol with an = sign and the script code just executes. So far so good; now look at the following code snippet:
<% var text = "<script>alert('Hello ASP.NET 4');</script>"; %>
<%: text %>
This snippet produces exactly the same output you might see in Figure 1. However, the structure of the code opens up new possibilities. What if you emit text in the code block from an existing utility that already provides sanitized HTML? Imagine that the variable text receives its value from a method you don’t control and that already returns encoded markup. As a result, HTML encoding will be performed twice. What happens in this case? You might end up in a situation like the one illustrated below.
<%: Server.HtmlEncode("<script>alert('Hello ASP.NET 4');</script>") %>
The result is shown in Figure 2. As you can see in the figure, the original text will be encoded twice—once because of the explicit call to HtmlEncode and once because of the : symbol in the code block. Clearly, this is not desirable.
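The effect of encoding twice can be reproduced outside a page with a short console sketch. It assumes System.Net.WebUtility.HtmlEncode as a stand-in for both the explicit HtmlEncode call and the encoding applied by the : code block. The DoubleEncodingSketch class and its EncodeTimes method are hypothetical.

```csharp
using System;
using System.Net;

// Hypothetical demo class for illustration; not part of ASP.NET.
public static class DoubleEncodingSketch
{
    // Applies HTML encoding to the text the given number of times.
    public static string EncodeTimes(string text, int times)
    {
        for (int i = 0; i < times; i++)
        {
            text = WebUtility.HtmlEncode(text);
        }
        return text;
    }

    public static void Main()
    {
        string markup = "<script>";

        // One pass: the browser displays <script> as plain text.
        Console.WriteLine(EncodeTimes(markup, 1)); // &lt;script&gt;

        // Two passes: the ampersands themselves are encoded.
        Console.WriteLine(EncodeTimes(markup, 2)); // &amp;lt;script&amp;gt;
    }
}
```

After the second pass the ampersands are escaped as well, so the browser shows the entity text &lt;script&gt; literally instead of the original characters.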