September 25, 2003 12:09 AM

Tame the BLOB

Storing files as Binary Large OBjects (BLOBs) in the database can improve consistency and reliability.
DevConnections
Rating: (0)

DataStream

LANGUAGES: C#

ASP.NET VERSIONS: 1.0 | 1.1

 

Tame the BLOB

Storing files as Binary Large OBjects (BLOBs) in thedatabase can improve consistency and reliability.

 

 

A common requirement in applications is loading and savingdata from files, and .NET makes performing file I/O a piece of cake. However,managing the storage of those files can get tricky sometimes, especially whendealing with volatile and automatically created files. You have to come up witha scheme for what folder structure you are going to use, ensure that you do nothave naming conflicts, make sure you have appropriate access to those files,remove files when they are no longer being used, and so on. Basically, thereare a lot of things to get straight and a lot of places for things to go wrongwith file I/O. It is also common to need to store some metadata about the filesyou are creating and accessing in a database, so that you can query for thatmetadata or provide summary reports of the files that exist in your system orthe data that they contain.

 

Separating the data contained in the files from themetadata stored in the database leads to all kinds of synchronization problemsas well. What happens if the file gets stored, but the metadata for the filedoes not get created properly? What if someone deletes the file from the filesystem, but fails to clean up the corresponding records in the database? Byseparating the files from their associated data, updates and additions involvemaking changes to both a transactional resource manager (the database) and anon-transactional resource manager (the file system). Trying to ensureconsistency with this separation of file data and metadata can be a real pain.

 

The answer to this dilemma lies in the ability for mostmodern databases to store binary data in fields within a table. Under thecovers, the way the data is stored and accessed is going to vary widely, basedon the particular database implementation. But at the logical level, the binarydata appears to be just another field in a table where you can store otherinformation as well.

 

For this article, I am going to focus on working with SQLServer, but the code and concepts apply to other databases as well, as long asthey have corresponding database types.

 

SQL Server Binary Types Pave the Way

SQL Server defines several types that translate to binarystorage under the covers. There are the binary and varbinary types, which storeup to 8KB of binary data. The binary type has a fixed allocation size that youspecify when creating the column, and varbinary fields will only requirestorage based on the actual size of the data put into each row. More commonly,what you will want to use are the image, text, and ntext types, each of whichallows variable length storage of data from 0 to 2GB. The image type is forstoring raw bytes, whether they correspond to a graphic image, some bits from afile, character data, or whatever you want. The text and ntext fields areoptimized for storage and retrieval of large blocks of text. The text type willstore the text in a format determined by the code page used by the database inquestion. The ntext type always stores characters in Unicode format.

 

ADO.NET eases the process of getting binary data into andout of the database. An ADO.NET-managed provider is responsible for defining amapping between the underlying database type system and the .NET type system.For SQL Server, the text and ntext types just map to a string. This is anatural fit since .NET strings are Unicode under the covers and are onlylimited in size by the amount of memory your process can allocate. The imagetype in SQL Server maps to a byte array, again making it easy to work withsince most binary operations in .NET deal with byte arrays as well.Specifically, whenever dealing with streams, it is easy to push a byte arrayinto a stream or read it back out. The code in Figure 1 shows how to read agraphic image file out of a database field and into a Bitmap. This code calls astored procedure, which takes the primary key for the table and returns theimage field of the table containing the raw bytes read in from the image filewhen the table was populated.

 

private Bitmap LoadBitmap(int id)

{

   // Create theconnection

   SqlConnection conn =new SqlConnection(m_connectionString);

   // Create a command forthe GetImage proc

   SqlCommand cmdGetImage= new SqlCommand("GetImage",conn);

   cmdGetImage.CommandType= CommandType.StoredProcedure;

   // Populate the idparam

   cmdGetImage.Parameters.Add("@ImageID",id);

   SqlDataReader reader =null;

   byte[] bits;

   try

   {

      conn.Open();

      // Execute a readerto get the data

      reader =cmdGetImage.ExecuteReader();

      reader.Read(); //Position on first and only record

      // Extract the bits

      bits = reader["ImageData"]as byte[];

   }

   finally

   {

      // Close the readerfor this image

      if (reader != null)

         reader.Close();

      // Close theconnection

      conn.Close();

   }

   // Load the bits into aBitmap using a MemoryStream

   MemoryStream ms = newMemoryStream(bits);

   Bitmap bmp = newBitmap(ms);

   return bmp;

 

}

Figure 1. Readingan image field in from a database returns a byte array. Depending on theexpected contents of the field, those bytes can then be used to re-hydrate anobject of the appropriate type. In this case, the bytes are placed in aMemoryStream, which is then used to create a Bitmap object. This is all donewithout any direct file I/O.

 

The code in Figure 1 takes the simple approach of justusing a reader with the default command behavior because it is being used witha query that is expected to only return a single row. If you are using aDataReader with a query that will return multiple rows containing image data,you will want to use the CommandBehavior.SequentialAccess flag to have thereader only pull the data one row at a time as a stream. You pass thisenumeration value to the ExecuteReader method, then access the data by callingthe GetBytes method on the reader. The code below shows this approach:

 

SqlDataReader reader =cmd.ExecuteReader(CommandBehavior.SequentialAccess);

while (reader.Read())

{

   // Get the byte countby passing null for the buffer

   int byteCount =(int)reader.GetBytes(0,0,null,0,0);

   byte[] imgData = newbyte[byteCount];

   // Read in the wholeblock for this row at once

   reader.GetBytes(0,0,imgData,0,byteCount);

}

 

If you expect the image field for a single row to be verylarge, you can read in the bytes in chunks using the GetBytes method byproviding different indexes for the various arguments of the method. See theonline docs for details of each parameter.

 

The download code for this article includes a sampleapplication that allows you to read and write both graphic image files (bmp,jpg, and gif) and XML documents to some tables in a database. The app lets youpopulate the database by pointing it to a folder that contains image files orXML documents. The app also lets you read those files in from disk usingequivalent code to the code that reads them in from the database, allowing youto experiment with the relative performance of loading files from disk orloading them from the DB. The graphic images in the ImageCollection table arepassed into a Bitmap object constructor, and the XML file contents are used toload an XmlDocument instance to represent the kinds of objects you might createwith the underlying binary data. The app code includes some simple profilingtiming to see the differences between reading BLOBs from the DB and reading thesame data from disk.

 

Why (and When to) Bother?

First, you need an application where it makes sense to usethis technique. One example would be if you have pictures that are associatedwith other data presented by your app, such as product photos or thumbnailspresented with product data in an online store Web site. Another would be ifyou are using XML as a structured storage mechanism for data that you don'twant to have to define a rigid relational schema for, but you need to store andretrieve that data along with other metadata items about that structure storage(i.e. date/time of creation, number of data items contained, etc.).

 

But don't get carried away with using this technique. Thefile system still has a place in many applications. If the files are productsof the application, as in end-user applications, and if the user might expectto be able to find those files outside the running application itself, thenstoring files as BLOBs in the database is not the right way to go.

 

What about performance? If you dealt with binary data andSQL Server prior to version 7.0, you may have found that the performance leftsomething to be desired. In version 7.0 and later, the way SQL Server storesand retrieves binary data has been vastly improved. From profiling done withthe download code sample app using ADO.NET and SQL Server 2000, and using acollection of images and documents on my machine, it is obvious that there islittle to no performance penalty for retrieving files stored in a databasecompared to loading them from disk. Storing and retrieving text files isactually faster from the database than from disk. Your mileage may varydepending on the connection between your database and your executing code,network traffic, and other factors, so as always with data access andperformance concerns - profile!

 

By storing your file content in a database you benefitfrom co-locating it with any associated metadata, and accessing that data andcontent within the context of a transactional resource manager. You can back upall the data atomically using replication or other database backups. You don'thave to worry about file access permissions, name collisions on files orfolders, or cleaning up obsolete files. If you are storing the content of textfiles, you can also include them in the Full-Text Search capabilities of SQLServer to perform deep searches of not only the metadata for files, but alsotheir content.

 

The sample code in this article is available for download.

 

Brian Noyes is a consultant, trainer, speaker, andwriter with IDesign, Inc. (http://www.idesign.net),a .NET focused architecture and design consulting firm. Brian specializes indesigning and building data-driven distributed applications. He has over 12years experience in programming, engineering, and project management, and is acontributing editor for C#PRO, asp.netPRO, and other publications. Contact himat mailto:brian.noyes@idesign.net.

 

 

 

 

Add a Comment

There are no comments to display. Be the first one!
You must log on before posting a comment.

Are you a new visitor? Register Here

advertisement




Comments from the DevConnections Community

Join our community of development pros.

Windows problem

I all, I have a problem on my Windows Vista that began afetr the purchase of an external Hard Disk Freecom. A few days afetr the purchase I discon...

Most Recent Posts

GOOGLE LINKS
SPONSORED LINKS
FEATURED LINKS