Table of Contents
- Introduction
- Rule 1: What is the nature of the application (OLTP or OLAP)?
- Rule 2: Break your data into logical pieces, make life simpler
- Rule 3: Do not get overdosed with rule 2
- Rule 4: Treat duplicate non-uniform data as your biggest enemy
- Rule 5: Watch for data separated by separators
- Rule 6: Watch for partial dependencies
- Rule 7: Choose derived columns carefully
- Rule 8: Do not be hard on avoiding redundancy, if performance is the key
- Rule 9: Multidimensional data is a different beast altogether
- Rule 10: Centralize name value table design
- Rule 11: For unlimited hierarchical data, use a self-referencing PK and FK
Introduction
Before you start reading this article, let me be clear that I am not a database design guru. The 11 points below are what I have learnt from projects, my own experience, and my own reading, and they have helped me a lot when designing databases. Any criticism is welcome.
The reason I am writing a full-blown article is that, when developers design a database, they tend to follow the three normal forms like a silver bullet. They think normalization is the only way to design. Because of this mindset, they sometimes hit roadblocks as the project moves ahead.
If you are new to normalization, see “3 normal forms in action”, which explains all three normal forms step by step.
That said, normalization rules are important guidelines, but treating them as set in stone is asking for trouble. Below are my own 11 rules, which I keep at the top of my head while doing DB design.
Rule 1: What is the nature of the application (OLTP or OLAP)?
When you start your database design, the first thing to analyze is the nature of the application you are designing for: is it transactional or analytical? Many developers apply normalization rules by default without thinking about the nature of the application, and later run into performance and customization issues. As said, there are two kinds of applications, transaction based and analytical; let’s understand what these types are.
Transactional: In this kind of application, your end user is mostly interested in CRUD, i.e., creating, reading, updating, and deleting records. Such databases are called OLTP (online transaction processing) databases.
Analytical: In these kinds of applications, your end user is mostly interested in analysis, reporting, forecasting, etc. These databases see relatively few inserts and updates; the main intention is to fetch and analyze data as fast as possible. Such databases are called OLAP (online analytical processing) databases.
Below is a simple diagram showing names and addresses on the left-hand side stored as simple normalized tables, and the flat table structure we get by denormalizing them.
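To make the OLTP/OLAP contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module (SQLite is only a convenient stand-in; the rule does not depend on any engine). The customer, address, and customer_report tables, their columns, and the sample rows are all hypothetical. The normalized pair suits CRUD work; the flat table is built once so that read-heavy reports need no joins.

```python
import sqlite3

# Hypothetical tables and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- OLTP style: normalized, each fact stored once.
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE address (
    address_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    city        TEXT NOT NULL,
    country     TEXT NOT NULL
);
INSERT INTO customer VALUES (1, 'Shiv'), (2, 'Raju');
INSERT INTO address VALUES (10, 1, 'Mumbai', 'India'), (11, 2, 'Kathmandu', 'Nepal');

-- OLAP style: a flat, denormalized copy built from the normalized tables.
CREATE TABLE customer_report AS
SELECT c.customer_id, c.name, a.city, a.country
FROM customer AS c
JOIN address AS a ON a.customer_id = c.customer_id;
""")

# Reports read one table, no joins needed.
rows = conn.execute(
    "SELECT name, city, country FROM customer_report ORDER BY customer_id"
).fetchall()
print(rows)  # [('Shiv', 'Mumbai', 'India'), ('Raju', 'Kathmandu', 'Nepal')]
```

The price of the flat table is duplication: if an address changes, customer_report has to be rebuilt or refreshed. That is usually acceptable for an OLAP workload with few updates, and awkward for an OLTP one.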
Rule 2: Break your data into logical pieces, make life simpler
This rule is actually the first rule of 1st normal form. One sign that it is being violated is queries that use many string-parsing functions like substring, charindex, etc.; if you see that, this rule probably needs to be applied.
For instance, look at the table below, which holds student names. If you ever want to query for student names containing “Koirala” but not “Harisingh”, you can imagine what kind of query you will end up with.
So the better approach would be to break this field into further logical pieces so that we can write clean and optimal queries.
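A minimal sketch of the difference, again with Python's sqlite3 module. SQLite's substr and instr play the role of SUBSTRING and CHARINDEX here. The table names, column names, and the first names in the sample data are hypothetical.

```python
import sqlite3

# Hypothetical tables and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")

# Before: the whole name in one column, so finding a surname needs string parsing.
conn.execute("CREATE TABLE student_flat (student_name TEXT)")
conn.executemany("INSERT INTO student_flat VALUES (?)",
                 [("Shivprasad Koirala",), ("Raju Harisingh",)])
parsed = conn.execute("""
    SELECT student_name FROM student_flat
    WHERE substr(student_name, instr(student_name, ' ') + 1) = 'Koirala'
""").fetchall()

# After: the name broken into logical pieces, so the filter is a plain comparison.
conn.execute("CREATE TABLE student (first_name TEXT, last_name TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [("Shivprasad", "Koirala"), ("Raju", "Harisingh")])
clean = conn.execute(
    "SELECT first_name, last_name FROM student WHERE last_name = 'Koirala'"
).fetchall()

print(parsed)  # [('Shivprasad Koirala',)]
print(clean)   # [('Shivprasad', 'Koirala')]
```

The parsing query only works because every sample name has exactly one space; a middle name would silently break it, while the decomposed version also lets an index on last_name help.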
Rule 3: Do not get overdosed with rule 2
Developers are cute creatures: if you tell them this is the way, they keep doing it, and they overdo it, leading to unwanted consequences. This also applies to rule 2, which we just talked about. When you think about decomposing, pause and ask yourself whether it is needed. As said, the decomposition should be logical.
For instance, look at the phone number field. It is rare that you will operate on the ISD codes of phone numbers separately (unless your application demands it), so it would be wise to leave the field as it is, since decomposing it further can lead to more complications.
Rule 4: Treat duplicate non-uniform data as your biggest enemy
Focus on duplicate data and refactor it. My personal worry about duplicate data is not the hard disk space it takes, but the confusion it creates.
For instance, in the diagram below, “5th Standard” and “Fifth standard” mean the same thing. You could say this data came into your system through bad data entry or poor validation, but if you ever want to derive a report, it would show them as different entities, which is very confusing from the end user’s point of view.
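One common way to stop this kind of duplication at the source is a lookup (master) table that every row references by key. The sketch below uses Python's sqlite3 module; the student_free, standard, and student tables and their rows are hypothetical. It shows the free-text version splitting one standard into two report groups, the lookup version producing a single group, and the foreign key rejecting a standard that does not exist.

```python
import sqlite3

# Hypothetical tables and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

# Before: the standard typed as free text, so one standard is spelled two ways.
conn.execute("CREATE TABLE student_free (name TEXT, standard TEXT)")
conn.executemany("INSERT INTO student_free VALUES (?, ?)",
                 [("A", "5th Standard"), ("B", "Fifth standard")])
free_groups = conn.execute(
    "SELECT standard, COUNT(*) FROM student_free GROUP BY standard ORDER BY standard"
).fetchall()

# After: each standard stored once in a lookup table; students reference it by key.
conn.execute("CREATE TABLE standard (standard_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")
conn.execute("""
    CREATE TABLE student (
        name        TEXT,
        standard_id INTEGER NOT NULL REFERENCES standard(standard_id)
    )""")
conn.execute("INSERT INTO standard VALUES (5, '5th Standard')")
conn.executemany("INSERT INTO student VALUES (?, ?)", [("A", 5), ("B", 5)])
fk_groups = conn.execute("""
    SELECT s.name, COUNT(*) FROM student AS st
    JOIN standard AS s ON s.standard_id = st.standard_id
    GROUP BY s.standard_id
""").fetchall()

# A standard missing from the lookup table is rejected outright.
try:
    conn.execute("INSERT INTO student VALUES ('C', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(free_groups)  # [('5th Standard', 1), ('Fifth standard', 1)]
print(fk_groups)    # [('5th Standard', 2)]
print(rejected)     # True
```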
Rule 5: Watch for data separated by separators
The second rule of 1st normal form says to avoid repeating groups. One example of a repeating group is shown in the diagram below: if you look closely at the syllabus field, too much data is stuffed into one field. Such fields are termed “repeating groups”. If we have to manipulate this data, the query will be complex, and I also doubt the performance of such queries.
The fix is to move the syllabus items out into a separate table, one row per item. With this approach, the syllabus field in the main table no longer repeats and no longer holds data separated by separators.
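A minimal sketch of that move, assuming a hypothetical student table whose syllabus column holds comma-separated subjects. The packed values are split once, in Python, into a child table (the names student_syllabus and subject are illustrative), after which questions about a single subject become plain equality filters.

```python
import sqlite3

# Hypothetical tables and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")

# Before: a repeating group, several syllabus items packed into one field.
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, syllabus TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", [
    (1, "Shiv", "Maths, Science, English"),
    (2, "Raju", "Maths, English"),
])

# After: one row per syllabus item in a child table; no separators left to parse.
conn.execute("""
    CREATE TABLE student_syllabus (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        subject    TEXT NOT NULL,
        PRIMARY KEY (student_id, subject)
    )""")
rows = conn.execute("SELECT student_id, syllabus FROM student").fetchall()
for student_id, packed in rows:
    conn.executemany(
        "INSERT INTO student_syllabus VALUES (?, ?)",
        [(student_id, item.strip()) for item in packed.split(",")],
    )

# Questions about a single item become simple equality filters.
science_students = conn.execute(
    "SELECT student_id FROM student_syllabus WHERE subject = 'Science' ORDER BY student_id"
).fetchall()
print(science_students)  # [(1,)]
```

With the packed column, the same question needs a pattern match such as LIKE '%Science%', which would also match a subject like “Computer Science” and cannot use an ordinary index.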
Rule 6: Watch for partial dependencies
The syllabus is associated with the standard in which the student is studying, not directly with the student. So if tomorrow we want to update the syllabus, we have to update it for each student, which is tedious and illogical. It makes more sense to move these fields out and associate them with the Standards table.
You can see how we have moved the syllabus field and attached it to the Standards table.
This rule is nothing but the 2nd normal form: all non-key columns should depend on the full primary key, not on just part of it.
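A minimal sketch of the update problem, using Python's sqlite3 module. The table names, the composite key (student_id, standard_id), and the syllabus values are hypothetical; the composite key is chosen so that syllabus depends on only part of the primary key, which is exactly the partial dependency 2nd normal form forbids.

```python
import sqlite3

# Hypothetical tables and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Before: syllabus depends on standard_id alone, only part of the composite key.
CREATE TABLE student_standard (
    student_id  INTEGER,
    standard_id INTEGER,
    syllabus    TEXT,
    PRIMARY KEY (student_id, standard_id)
);
INSERT INTO student_standard VALUES
    (1, 5, 'Maths, Science'), (2, 5, 'Maths, Science'), (3, 5, 'Maths, Science');

-- After: syllabus lives once on the standard; the link table holds only keys.
CREATE TABLE standards (standard_id INTEGER PRIMARY KEY, syllabus TEXT);
CREATE TABLE student_standard_2nf (
    student_id  INTEGER,
    standard_id INTEGER REFERENCES standards(standard_id),
    PRIMARY KEY (student_id, standard_id)
);
INSERT INTO standards VALUES (5, 'Maths, Science');
INSERT INTO student_standard_2nf VALUES (1, 5), (2, 5), (3, 5);
""")

new_syllabus = "Maths, Science, English"
# Changing the syllabus touches one row per student in the first design...
rows_before = conn.execute(
    "UPDATE student_standard SET syllabus = ? WHERE standard_id = 5", (new_syllabus,)
).rowcount
# ...and exactly one row once it is attached to the standard.
rows_after = conn.execute(
    "UPDATE standards SET syllabus = ? WHERE standard_id = 5", (new_syllabus,)
).rowcount
print(rows_before, rows_after)  # 3 1
```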
Rule 7: Choose derived columns carefully
In the figure above, you can see that the average field depends on the marks and subject fields. This is also a form of redundancy. So for fields that are derived from other fields, ask yourself: are they really necessary?
This rule is also known as the 3rd normal form: “No column should depend on other non-primary key columns”. My personal view is not to apply this rule blindly; look at the situation, because redundant data is not always bad. If the redundant data is calculated data, weigh the situation and then decide whether you want to implement the 3rd normal form.
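The trade-off can be sketched with Python's sqlite3 module: a stored average versus one derived on demand through a view. The marks, student_average, and student_average_view names and the sample marks are hypothetical. After one mark is corrected, the stored copy is stale while the view stays correct.

```python
import sqlite3

# Hypothetical tables and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE marks (
    student_id INTEGER,
    subject    TEXT,
    marks      INTEGER,
    PRIMARY KEY (student_id, subject)
);
INSERT INTO marks VALUES (1, 'Maths', 80), (1, 'Science', 60);

-- Option A: store the derived average (redundant; must be kept in sync by hand).
CREATE TABLE student_average (student_id INTEGER PRIMARY KEY, average REAL);
INSERT INTO student_average SELECT student_id, AVG(marks) FROM marks GROUP BY student_id;

-- Option B: derive it on demand, so it can never disagree with the marks.
CREATE VIEW student_average_view AS
SELECT student_id, AVG(marks) AS average FROM marks GROUP BY student_id;
""")

# Correct one mark; only the view notices.
conn.execute("UPDATE marks SET marks = 100 WHERE student_id = 1 AND subject = 'Science'")

stored = conn.execute("SELECT average FROM student_average WHERE student_id = 1").fetchone()[0]
derived = conn.execute("SELECT average FROM student_average_view WHERE student_id = 1").fetchone()[0]
print(stored, derived)  # 70.0 90.0
```

The stored column is cheaper to read in large reports, which is why it can be worth keeping, but only if something (application code or, for example, a trigger) refreshes it whenever the marks change.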
Rule 8: Do not be hard on avoiding redundancy, if performance is the key
Rule 9: Multidimensional data is a different beast altogether
OLAP projects mostly deal with multidimensional data. For instance, in the figure below, you would like to get sales per country, customer, and date. In simple words, you are looking at sales figures at the intersection of three dimensions.
Rule 10: Centralize name value table design
Many times I have come across name value tables. A name value table holds a key and some data associated with that key. For instance, in the figure below, we have a currency table and a country table. If you look at the data closely, each of them actually has only a key and a value. Such tables can be centralized into a single name value table, with an extra field identifying which kind of entry (currency, country, and so on) each row is.
Rule 11: For unlimited hierarchical data, use a self-referencing PK and FK
Many times we come across data with an unlimited parent-child hierarchy. For instance, consider a multi-level marketing scenario where a salesperson can have multiple salespeople below them. For such scenarios, a self-referencing primary key and foreign key, where each row points to its parent row in the same table, can model a hierarchy of any depth.
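A minimal sketch of the multi-level marketing case with Python's sqlite3 module. The sales_person table, its parent_id column, and the sample people are hypothetical. Reading the whole downline uses a recursive common table expression (WITH RECURSIVE), which is my choice of query technique rather than something the rule prescribes; it walks any number of levels without knowing the depth in advance.

```python
import sqlite3

# Hypothetical table and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Each sales person points at the sales person above them; NULL marks the top.
CREATE TABLE sales_person (
    sales_person_id INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    parent_id       INTEGER REFERENCES sales_person(sales_person_id)
);
INSERT INTO sales_person VALUES
    (1, 'Top', NULL),
    (2, 'A', 1),
    (3, 'B', 1),
    (4, 'C', 2),
    (5, 'D', 4);
""")

# Recursive CTE: start at one person, then repeatedly add rows whose parent is already found.
downline = conn.execute("""
    WITH RECURSIVE below(sales_person_id, name, depth) AS (
        SELECT sales_person_id, name, 0 FROM sales_person WHERE sales_person_id = ?
        UNION ALL
        SELECT s.sales_person_id, s.name, b.depth + 1
        FROM sales_person AS s
        JOIN below AS b ON s.parent_id = b.sales_person_id
    )
    SELECT name, depth FROM below WHERE depth > 0 ORDER BY depth, name
""", (2,)).fetchall()
print(downline)  # [('C', 1), ('D', 2)]
```

Adding a new level is just another row with a parent_id; the schema never changes, which is the point of the self-reference.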