navigation
 Wednesday, November 21, 2007

I have a table in my system that has some 70,000 records in it. I needed to add a field to the table that would flag some of the records as having a certain attribute (I won't go into the details here). The default would be 0, and perhaps only 100 of these records would actually have it set to 1. This got me thinking about what a good design would be for this kind of information storage...

Just adding a bit field to the table seemed like a huge waste of space. I know, disk is cheap, time is not, but still: actually storing 69,980 zeros and 20 ones seemed a bit silly. This is what you would call sparse data, meaning only a very small amount of the rows actually have any non null or non default data in them.

The way I solved it was by adding an extension table, that just contains the IDs of those rows that should have the flag set. This table now has 20 records, and the original was left unchanged. Of course, the drawback is inserting, updating and selecting becomes a bit more of a hassle: in essence, you set the flag by adding a record to the extension table, you clear it by deleting the record, and you check for it's existence by left outer joining and if there is a non null match, the flag is set. So you might see code like this:

CREATE PROCEDURE UpdateRecord( @ID int, @IsSet bit, /* other values */ )
AS

/* Update base table - details omitted  */

/* Update extension table */
IF @IsSet=1 INSERT ExtensionTable( ID ) VALUES @ID ELSE DELETE ExtensionTable WHERE ID=@ID

And

CREATE PROCEDURE GetRecord( @ID int )
AS

SELECT /* Columns from base table */, CASE WHEN ext.ID IS NULL THEN 0 ELSE 1 END as IsSet
FROM BaseTable base
LEFT OUTER JOIN ExtensionTable ext ON ext.ID = base.ID
WHERE base.ID = @ID

You get the picture: the storage is efficient, but it is hardly transparent.

Things deteriorate rapidly if you have more than one sparse column - the only way to efficiently store that would be multiple extension tables... what a nightmare of SQL code.

Wouldn't it be nice if the database engine could handle these details for you, and let you treat the column as if it actually were a bit field in the base table?

Well, enter SQL server 2008! Is boasts support for sparse data columns, described as follows in Microsoft's documentation:

"Sparse Columns

This feature provides a highly efficient way of managing empty data in a database by enabling NULL data to consume no physical space. For example, sparse columns allows object models that typically contain numerous null values to be stored in a SQL Server 2008 database without experiencing large space costs. Sparse Columns also enable administrators to create tables with more than 1,024 columns."

Sounds exciting! I haven't been able to find any detailed information from Microsoft about the syntax or how this is implemented, although I am sure I could if I downloaded the 2008 beta. But this blog shows some of the syntax:

CREATE TABLE products (product_num int, item_num int, price decimal(7,2), ..., 
color char(5) SPARSE, width float SPARSE...) 

Looks pretty transparent to me!

There are lots of other goodies in SQL Server 2008, check out this document for an overview.

Tuesday, November 27, 2007 3:58:54 PM UTC
I always looked at this as a performance issue: creating an extension table than putting a bit flag. An index on the bit column is useless, while creating the extension table and putting an index on that id performs much faster.
Does the SPARSE option changes anything for indexes set on these columns?
Liviu Costea
Thursday, December 13, 2007 6:42:05 AM UTC
And I went to try it.

We have a 192 GB table with lots of rows where some of the columns are null.

So I tried this :
CREATE TABLE products (product_num int, item_num int, price decimal(7,2),
color char(5) SPARSE, width float SPARSE)

select @@VERSION

And got this:
myserver(mylogin): Msg 102, Level 15, State 1, Line 2
Incorrect syntax near 'SPARSE'.

And
Microsoft SQL Server code name "Katmai" (CTP) - 10.0.1075.23 (X64) Nov 8 2007 14:13:37 Copyright (c) 1988-2007 Microsoft Corporation Developer Edition (64-bit) on Windows NT 5.2 <X64> (Build 3790: Service Pack 2)


So no luck, wait for SQL 2011 or SQL 2008 SP2?

Best regards,

Henrik Staun Poulsen
Henrik Staun Poulsen
Name
E-mail
Home page

Comment (Some html is allowed: a@href@title, i, strike, u) where the @ means "attribute." For example, you can use <a href="" title=""> or <blockquote cite="Scott">.  

Enter the code shown (prevents robots):

Live Comment Preview