December 15, 2007
The Newspaper of the NYSSCPA
Vol. 10, No.22

Critical Database Concepts for Modern CPAs
Demystifying Data Normalization, Part II: Procedures

By Dr. Jianing Fang, CPA

I reviewed all the important database concepts necessary to take on the formidable task of data normalization in The Trusted Professional’s Nov. 15 issue. In this article, I will continue demystifying this process by explaining the procedures involved.

What Is Data Normalization?

What is data normalization, and why it is so important? In my previous article, I explained that the main advantage of the database approach over the application-based file approach is the elimination of data redundancy.

In order to achieve this goal, the database, or set of two-dimensional data tables, must be logically organized and well structured. The process of logically organizing the data tables is called normalization. There are many levels of normal forms, ranging from first normal form (1NF) to fifth normal form (5NF), plus Boyce-Codd normal form and domain key normal form. For accounting information systems, the optimal level is the third normal form, but for more mission-critical systems—like systems for NASA’s space program—a higher level is required to ensure greater reliability and accuracy.

First Normal Form

Table 1, below, is an example of unnormalized relation. The lowest level of normal form, 1NF, requires that all attribute values in the table’s columns be atomic, which really means that the data table cannot have any repeating groups. The simplest way to understand this concept is to determine whether each record (represented in the table’s rows) has the same number of attributes in each of the table’s columns.

We say Table 1 is unnormalized because it contains repeating attributes within some of the records, which can be seen in sales order numbers 101,102,103 and 107. In the table, each record represents one sales order. But let’s look at sales order number 101. The first item is fine, but the Item Number (Item_Num), Item_Name, Quanity Ordered (Qty_Ordered), Customer Number (Cust_Num) and Customer Name (Cust_Name) are repeated for the second item. You can easily see that the second row has two less columns than the first one, as the values for Sales Order Number (SO_Num) and Cust_Name are missing. In short, the table contains repeating groups; therefore, it fails to meet the definition of 1NF.

In order to normalize Table 1 to 1NF, all the repeating groups need to be eliminated to ensure that all attribute values in the table are atomic. The process to accomplish this goal is very simple, consisting of filling relevant data into empty cells (see Table 2).

After filling in those blanks, the data table does not contain any repeating groups, and each row has the same number of columns; the requirement for 1NF is met.

Data tables in 1NF usually cannot provide the required accuracy and reliability for accounting information systems because they contain partial functional dependencies and transitive functional dependencies. For example, in Table 2 we can see the following:

Item_Num functionally determines Item_Name; as a result, Item_Name, such as “24k gold earring,” is repeated many times in the data table. This is a partial dependency, because “Item_Num” is part of the composite primary key made up of SO_Num and Item_Num.

Cust_Num functionally determines Cust_Name, such as “World Best.” This is a non-key dependency, because both Cust_Num and Cust_Name are non-key attributes. This also causes data redundancy, as the customer names repeat many times in the table.

Anomalies

Data redundancy is the main cause for errors or inconsistencies, called anomalies, when the system updates the database. There are three types of anomalies: insertion, deletion, and modification.

Insertion Anomaly

Let’s assume you have just designed a new ruby ring, which you would like to offer for sale. But you cannot add the new item to the data table until you actually have received a sales order for that item. That is impractical, or impossible, for any business. In the real world, we always record inventory information in the database before taking any sales orders.

Deletion Anomaly

If the manager needs to delete an erroneous record—order number 104, for example—the information for customer number 2, “NY King,” is correspondingly eliminated from the database. Following the same logic, if the manager deletes order number 101, the company loses the product information for item number 10, “14k gold bracelet.”

Modification Anomaly

Now let’s assume the manager finds a typo. Instead of “14k gold necklace,” the operator typed in “10k gold necklace” for item number 30. In order to correct the problem, the manager has to go through all the orders that contain that item—in this example, sales order numbers 102, 103, 104, and 109. It’s even more troublesome if the manager misses one or more records when trying to correct the typo. This can easily happen, especially when the relation contains thousands of business transactions. The result is the most dreaded database issue: data inconsistency.

In addition to these anomalies, the data table is highly susceptible to inconsistency in another way. The operator, for example, could type in “item number 50, 24k gold ring” for sales order number 101 but accidentally type “item number 50, 24k gold earring” for sales order number 108.


Dr. Jianing Fang, CPA, is a controller for a private corporation and an associate professor of finance at Pace University. He also teaches Taxation, Accounting Information Systems and International Accounting at Long Island University. Dr. Fang is a member of the NYSSCPA’s Information Technology Committee.

Home | Print Story | E-mail Story


Home
| About Us | Continuing Education | Future CPAs | Government Affairs | Professional Resources | Publications | Sound Advice | Tax Resources

Chapters | Committees | Member Center | Events Calendar | Classifieds | Careers | E-zine Subscriptions | The Trusted Professional | The CPA Journal



Search | Site Map | Become a Member | Jobs | Press Room | Contact Us | Feedback

©1997 - 2009 New York State Society of Certified Public Accountants. Legal Notices