Fluent Thinking – C# Plus SQL Equals CouchDB. Maybe. January 23, 2010
Posted by ActiveEngine Sensei in .Net, ActiveEngine, C#, New Techniques, Open Source, Problem Solving.Tags: ActiveEngine, C#, CouchDB, Data Model, Document Management, new software, new thinking, SQL Server
trackback
I’ve gone too far to turn around
It’s hard to reach for you
When I’m lying face down
I can’t relieve my soul
I’m lost in a moment
Lying face downSaving Abel
There are moments when life brings too much possiblilty to your attention and your mind catapults into overload. Take the great potential that CouchDB has, as an example. Elegant, simple. Damien Katz has ignited a real fire storm with his work and may have single handedly given rise to the not-a-RDBMS movement. Sometimes developing under the gun, Sensei has to admit that the 3rd normal form restrictions can be a little inhibiting. Really, a collection of value pairs will do quite nicely, and XML, when it doesn’t run on for more that 1 page can be quite descriptive, it still is not quite the solution for quick configuration solutions. JSON is just about right, easy to manipulate, and this is why Damien chose this as the structure for a document in CouchDB.
“But Sensei, I have SQL Server. We aren’t allowed to say the L-word!”
“You mean Linux, right? Just checking.”
There may be constraints imposed by the environment you have at your disposal.And if you have a current system in production you might have a hard justifying to the powers that be that you should download a LAMP server from JumpBox just so you can introduce doucment managment into your current production system. Or perhaps you have an extensive data model and are concerned that that the meta data would too complex to store with the documents.
Here’s a simple yet effective way to introduce a “CouchDB lite” version to a current database platform that is applicable to SQL Server, Oracle, MySql and provides a flexible way to store meta data for quick retrieval. It’s not badass like Sam Jackson, but it’s not ridiculous like the beast picture next to him either.
The goal stated succinctly:
- Provide the abilty to associate a document to a record in an existing DBMS
- Provide the abilty to store the meta data related to a specific record or a document type.
- Ensure that future meta data and document types will not a change in the cardinality of the database schema nor require the addition of additional database attributes.
- Provide a object oriented framework that adheres to the Open-Closed principal; that is, open to extension but closed to modification.
Previously Sensei has opined regarding fluency, and this solution adheres to those prinicipals. For existing RDBMS a single table can introduce the simplicity and clarity you seek. All document meta data can be stored in a table aptly named Document:
CREATE TABLE [dbo].[Document]( [DocumentId] [int] IDENTITY(1,1) NOT NULL, [UserId] [int] NOT NULL, [Name] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL, [PostedDate] [datetime] NOT NULL, [DocumentHash] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL, [Attributes] [varchar](8000) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL, CONSTRAINT [PK_Document] PRIMARY KEY CLUSTERED ( [DocumentId] ASC )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY] GO SET ANSI_PADDING OFF GO USE [REIMS_DEV] GO ALTER TABLE [dbo].[Document] WITH CHECK ADD CONSTRAINT [FK_Document_User] FOREIGN KEY([UserId]) REFERENCES [dbo].[User] ([UserId])
In brief, this table stores an identity key for each document, the name of the document, a foreign key that represents the user / person who saved the document, and the date when the document was posted. The attribute DocumentHash is a hashkey comprised of the DocumentId, the UserId, the Name of the document, and the PostedDate. The DocumentHash value is prepended to the document name to create a unique document. Consider it a way to associate the document with a specific record in the database. In other words, you should be able to re-create a haskey from the data attributes that matches the value pre-pended to the file name of the document. Another nice by-product is that you can version documents quite easily.
Focus next on the column named Attributes, as this is where the magic happens. The meta data for each document is stored here as JSON. This takes the following form:
[{"Key":"Owners","Value":"79,2,4,79"}, {"Key":"BonusId","Value":"95"} {"Key":"BonusTypeId","Value":"3"}, {"Key":"DealId","Value":"1035"}]
This JSON serilization of an array of Key/Value pairs is a format that allows you store whatever meta data you wish. Notice the first line named “Owners”. This allows you to store the user id’s of all users who have the ability to delete or edit the document. You can set security based on this attribute, and implement a document centric security framework that can exist in addition to security place on the share where the document is stored.
As stated earlier, any type of data can be stored in the Attributes column. If you notice line 2 that reads “BonusId” you see that keys to other tables in your database schema that can saved with the document record. This feature allows you to maintain the relationship of a document back to a particular record or set or records.
You may be asking yourself why the Key Value nomenclature in the JSON string. The attributes can be deserialized to an object of type List<KVPair> where KVPair has the following structure:
namespace BusObj { [Serializable] public class KVPair<TKey, TValue> { public TKey Key { get; set; } public TValue Value { get; set; } public KVPair(TKey k, TValue v) { Key = k; Value = v; } public KVPair() { } } }
The Key Value pair lends itself to simple querying. Since the attributes can vary from document to document, the meta data can include identity keys to records in your existing database schema, information regarding folder structure where the documents can be stored, and what ever other piece of important information of you need to retain. Once the JSON is deserialized to List<KVPair> you can use lambdas to query your documents. Consider the following code that is used to determine document ownership where the owners are represented as
{"Key":"Owners","Value":"79,2,4,79"}:
public bool IsDocumentOwner(int userId) { Enforce.ArgumentGreaterThanZero<int>(userId, "Document.IsDocumentOwner - parameter userId can not be null"); // No attributes? if(this.DocumentAttributes.Count == 0) { return false; } return DocumentAttributes.Find(x => x.Key == "Owners") .Value.Split(',') .Select(s => int.Parse(s)).ToList() .Contains(userId); }
As you can see, multiple users can be designated as an owner of a document. Any operation that is to be performed on a document can be quickly validated against the owner list, so actions such as deleting or displaying document can be quickly validated. Since the attributes are stored as JSON, you can easily augment existing documents.
The possiblities here are quite exciting. Perhaps you are looking for a quick an easy way to store configuration of objects but XML would prove to be too complicated to maintain. You could create an entry in the Document table with the attributes you wished to store and retrieve quite easily. Or, you can easily create a categorization system for documents that sit out on shares and you need to correlate these documents to records in a accounting package. There are numerous things you could do to make your life easier.
Full source code will be posted soon where an in depth discussion of the implementation will continue.
[…] a slim document management solution that can be quickly applied to an existing framework with minimal impact to database schema and code base. Sitting around the conference room table the comment arose from Annie, the project lead from the […]
Great post — Very interesting way of addressing this problem using existing database systems.
Thanks – we are a Documentum shop and part of our process that we make people go through prior to scanning is the create meta data for the files. We adopted a Abstract Object Model approach in the db – a base table for representing documents, then a document type table, document-attribute table, and finally a document-attribute-value table. This way we do no additional programming when there is a new document type, and we have the freedom / flexibility to create whatever number of attributes per document type that we wish.
It’s worked well, but its complex. Right now I’m completing a project where we maintain both attribute-value tables for certain items AND XML configuration files, and quite frankly it sucks. I’ve been bitching that I just want something like CouchDB / Mongo where I can define attributes easily and not go through XML hell when creating new workflows or define other meta data for the application. Last minute our sales group said they wanted document included, and I didn’t want have to able 4 more tables and monkey with the Data Access Layer. Plus, I just got pissed and sat down on a Saturday to hammer a solution out that wouldn’t be a heavy impact on our code base yet let me do stuff on the fly. Luckily it didn’t turn into a deep valley of frustration!
Thanks for reading!