I have a table in my system that has some 70,000 records in it. I needed to add a field to the table that would flag some of the records as having a certain attribute (I won't go into the details here). The default would be 0, and perhaps only 100 of these records would actually have it set to 1. This got me thinking about what a good design would be for this kind of information storage...
Just adding a bit field to the table seemed like a huge waste of space. I know, disk is cheap, time is not, but still: actually storing 69,980 zeros and 20 ones seemed a bit silly. This is what you would call sparse data, meaning only a very small amount of the rows actually have any non null or non default data in them.
The way I solved it was by adding an extension table, that just contains the IDs of those rows that should have the flag set. This table now has 20 records, and the original was left unchanged. Of course, the drawback is inserting, updating and selecting becomes a bit more of a hassle: in essence, you set the flag by adding a record to the extension table, you clear it by deleting the record, and you check for it's existence by left outer joining and if there is a non null match, the flag is set. So you might see code like this:
CREATE PROCEDURE UpdateRecord( @ID int, @IsSet bit, /* other values */ ) AS /* Update base table - details omitted */ /* Update extension table */ IF @IsSet=1 INSERT ExtensionTable( ID ) VALUES @ID ELSE DELETE ExtensionTable WHERE ID=@ID
And
CREATE PROCEDURE GetRecord( @ID int ) AS SELECT /* Columns from base table */, CASE WHEN ext.ID IS NULL THEN 0 ELSE 1 END as IsSet FROM BaseTable base LEFT OUTER JOIN ExtensionTable ext ON ext.ID = base.ID WHERE base.ID = @ID
You get the picture: the storage is efficient, but it is hardly transparent.
Things deteriorate rapidly if you have more than one sparse column - the only way to efficiently store that would be multiple extension tables... what a nightmare of SQL code.
Wouldn't it be nice if the database engine could handle these details for you, and let you treat the column as if it actually were a bit field in the base table?
Well, enter SQL server 2008! Is boasts support for sparse data columns, described as follows in Microsoft's documentation:
"Sparse Columns
This feature provides a highly efficient way of managing empty data in a database by enabling NULL data to consume no physical space. For example, sparse columns allows object models that typically contain numerous null values to be stored in a SQL Server 2008 database without experiencing large space costs. Sparse Columns also enable administrators to create tables with more than 1,024 columns."
Sounds exciting! I haven't been able to find any detailed information from Microsoft about the syntax or how this is implemented, although I am sure I could if I downloaded the 2008 beta. But this blog shows some of the syntax:
CREATE TABLE products (product_num int, item_num int, price decimal(7,2), ..., color char(5) SPARSE, width float SPARSE...)
Looks pretty transparent to me!
There are lots of other goodies in SQL Server 2008, check out this document for an overview.
Remember Me
a@href@title, i, strike, u
Copyright © 2003-2008 Falafel Software Inc.
Subscribe to Falafel Blogs
The opinions expressed herein are Falafel's employees own personal opinions and do not represent Falafel Software's view in any way in case they go bananas!