Normalization and Sample Schema Creation
The Database design process should comply with certain broad guidelines. A well designed database stores information in the smallest amount of space possible and responds to queries speedily. Our sample database stores data of books and customers of a bookstore. Consider a scenario where customers should be mapped to their book category preferences for new release mailing lists:
Fig 6a: Data Redundancy
The above table stores the word Fiction four times and the words Non-Fiction, Poetry, and Mythology three times. If the customer base grew to, say, 10000 people, the same words would appear many thousands of times. This table contains redundant, or unnecessarily repeated data. How would a good design eliminate this problem and streamline the database? Take a look at the following tables.
Fig 6b: First Normal Form
The CustomerCategories have now been split into two tables. Each customer now has a unique ID. This ID is called a Primary Key. Assigning such unique IDs or primary keys makes a database compliant to the 1st Normal form. The lookup becomes easier now; all we have to do is search for 'Poetry' in the second table to get a list of customers who have a preference for this category. However, it still looks untidy and numbers still get a repeated a lot. Also, if the last customer subscribed to a particular category is removed from the database, the category will no longer be listed in the database!
Fig 6c: Second Normal Form
Look at the above set of tables. A new Category table has been created with unique IDs for each category. Now, the new CustomerCategories table contains less repetitions and is much clearer. We have separated the old CustomerCategories table because it depended on two sets of keys - customerID and categoryID - not a good practice. By doing this, we have achieved the 2nd normal form. To get the design compliant to the 3rd normal form, all the data within a table should pertain to a single subject. For example, let us say that the bookstore expands to many cities. Rather than adding branch name and location columns to the customers table, we would create a Branch table with a unique primary key and use this primary key to indicate a customer's primary branch. This concept is called a foreign key. Using such keys makes the database comply with the 3rd Normal Form. Making a database comply with these normal forms will ensure that it is efficient.
Fig 6d: 3rd Normal Form
The SQL query language is the standard means of talking to an Oracle database. SQL is an opensource language - it is free - you do not need software or a license to use SQL, it is not owned by any particular corporation. The American National Standards Institute and the International Standards Organization published a specification known as SQL-89. An improvement and expansion of the standard gave the world SQL-92 and later the third generation standard, SQL-99. SQL is a declarative and not a procedural language. A procedural language uses a number of steps or a procedure to get a specified result. A declarative language, on the other hand, simply makes a statement - a single declaration that is sent to the DBMS. The DBMS then executes internal programs and returns an answer. For example, the SQL statement 'CREATE TABLE' with a list of parameters creates a table on the database.
An Oracle table may contain up to 254 columns of varying data types. The data type is the sort of value (number, string, date etc.) that the column contains. A table is made up of rows and columns. The columns define the subject of the data (or domain) each row contains. Table names should be unique within a schema. The structure or relational schema of a table consists of its columns and their data types. The order in which rows appear when a table is opened has nothing to do with the way they are stored or sorting. Tables cannot contain duplicate rows; at least one column has to differ from another.
We will learn about SQL and PL/SQL (used to create structured programs with a flow of control that consists of several SQL statements) over the next three chapters. We will use a sample bookstore database for our exercises. Let us first work on creating an appropriate schema and populating tables with sample data. There are many GUI based methods of achieving most of the exercises that follow; we will stick to the SQL command line as it is the most standard and flexible method of all.
Our user (and schema) will be called 'booksdb'. First, login to a SQL plus command window on the Oracle server with the password 'SYS' or 'SYSTEM' as 'SYSDBA' (issue the command 'connect SYS@<SID> as SYSDBA'). Run the following command to create the 'booksdb' user and a corresponding schema. Here, we assign the new user to the 'USERS' tablespace and unlock the account.
CREATE USER "BOOKSDB" PROFILE "DEFAULT" IDENTIFIED BY "booksdb" DEFAULT TABLESPACE "USERS" TEMPORARY TABLESPACE "TEMP" ACCOUNT UNLOCK;
You should see a 'User Created' message. Now assign a DBA role to this user so that the person logged into the account has a broad set of privileges.
GRANT DBA TO BOOKSDB
You will see a grant succeeded message. Disconnect from the server. The entire process is displayed in the following screenshot (note - substitute your oracle home directory for 'D:\oraclexe' and the name of your oracle database instance SID for 'XE'):
C:\>d: D:\>cd oraclexe D:\oraclexe>sqlplus SQL*Plus: Release 9.2.0.1.0 - Production on Sat Dec 9 23:50:34 2006 Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved. Enter user-name: SYS@XE as SYSDBA Enter password: Connected to: Oracle Database 10g Express Edition Release 10.2.0.1.0 - Production SQL> CREATE USER "BOOKSDB" PROFILE "DEFAULT" 2 IDENTIFIED BY "booksdb" DEFAULT TABLESPACE "USERS" 3 TEMPORARY TABLESPACE "TEMP" 4 ACCOUNT UNLOCK 5 ; User created. SQL> grant dba to booksdb 2 ; Grant succeeded. SQL> disconnect Disconnected from Oracle Database 10g Express Edition Release 10.2.0.1.0 - Production
Included in this tutorial are several SQL scripts that automatically creates the books database table and populates them. Download the scripts booksdbddl.sql , book.sql , author.sql , category.sql , customer.sql , custorder.sql , orderdetails.sql , and subject.sql onto an appropriate directory on your computer. Login to the database as 'booksdb' with the password 'booksdb':
SQL> connect booksdb@xe Enter password: Connected.
Now, run the booksdbddl script in the following way. This script creates all the sample tables and related data objects. Replace 'e:\oracle\info\' with the path to the file on your computer. Don't forget the '@' symbol. After you press enter, you should see several 'table created', 'sequence created', 'index created', trigger created' and 'trigger altered' messages scroll by.
SQL> @e:\oracle\info\booksdbddl
After the above command, run the rest of the files in the exact order specified below. Remember to change the directory to reflect the path to the respective script on your files. Each of these files populates a table created in the previous step. Although the commands are shown contiguously, you will actually see many '1 row created' messages flash past after running each script.
SQL> @e:\oracle\info\author SQL> @e:\oracle\info\category SQL> @e:\oracle\info\subject SQL> @e:\oracle\info\book SQL> @e:\oracle\info\customer SQL> @e:\oracle\info\custorder SQL> @e:\oracle\info\orderdetails
Now, go ahead and familiarize yourself with the tables by opening the booksdb schema on your Oracle GUI tool (OEM console or the XE database web page). To view this schema through the OEM console, you would open your default database, click on the plus sign next to 'schema', and the plus sign next to 'booksdb'. To open it from the XE database web page, simply login as booksdb and choose Object Browser -> Browse -> Table. Go through the data in the tables and the table structure.
Figure 6e: Books Database Tables
The above diagram displays all the relationships in the books database. Note that every table except orderdetails has a primary key (the first column) and most dependant tables have foreign keys. The author, category, and customer tables are top level tables that do not depend on other tables. The subject table contains a foreign key that refers to a category id. The book table contains foreign keys for both the author id and subject id fields. The custorder table contains the id of the customer who placed the order and the order date. The orderdetails table is organized by the id corresponding to each item within an order. We use three data types - varchar for string/character data; date; and number. The price column in the book table and the unitprice column in the orderdetails table alone are numbers with decimal digits; all other number fields contain integers.
Let us start by going over some of the scripts in the booksdbddl.sql files. Note that this file contains mostly Data Definition Language (DDL) commands. DDL deals with table and data format and creation. DML or Data Manipulation Language deals with data retrieval, querying and so on. First, let us look at the statements that created the author table and associated objects.
CREATE TABLE "AUTHOR"
(
"AUTHORID" NUMBER(5,0) NOT NULL ENABLE,
"LASTNAME" VARCHAR2(100) NOT NULL ENABLE,
"FIRSTNAME" VARCHAR2(100) NOT NULL ENABLE,
CONSTRAINT "AUTHOR_PK" PRIMARY KEY ("AUTHORID") ENABLE
)
The 'create table' statement creates a new table within a user's schema and current tablespace. The table's column definitions and other constraints should be within a pair of brackets after the table name. Note that the double quotes are optional and added to make the statement more readable. All words within quotes are defined by the user. The author table has three columns. LASTNAME and FIRSTNAME contain strings (VARCHAR2(100)) that may be up to a hundred characters long. The AUTHORID field contains a number that may be up to five digits long with no decimal digits (as defined by NUMBER(5,0)).
The final line creates a constraint that makes the 'AUTHORID' key a primary key. This constraint (AUTHOR_PK) basically restricts the AUTHORID field so that it contains no duplicates. Such constraints maintain the integrity of the data and are therefore called integrity constraints. A constraint does not have to be a primary key or foreign key. We could even define a constraint through a formula based on constants or values in other fields. The following command adds a constraint to the orderdetails table. This constraint restricts discounts: a discount cannot be applied to a single copy order of a book that costs below $25. The EXCEPTIONS table maintained by the Oracle Server stores information about failed inserts; we direct information about rejected rows that do not satisfy the constraint to the EXCEPTIONS table.
ALTER TABLE ORDERDETAILS ADD CONSTRAINT CHECK_DISCOUNT CHECK ((QUANTITY>1 OR UNITPRICE>25.00) OR DISCOUNT IS NULL) EXCEPTIONS INTO EXCEPTIONS
Inserting a new row in the field would involve finding the maximum value of authorid, incrementing this value by one to obtain the next appropriate value, and entering the new value into the table along with the corresponding last name and first name. This long drawn procedure is unnecessary if we get Oracle to maintain the authorid sequence for us. A sequence data object contains a number within a range. Invoking a command of the sort <sequence_name>.nextval returns the value of the number held in the sequence. It is then incremented; the following value may be obtained by the next call to <sequence_name>.nextval. We may either insert the next value of authorid manually by invoking sequence_name.nextval or get Oracle to do it for us. Let us look at sequence creation:
CREATE SEQUENCE "AUTHOR_SEQ" MINVALUE 1 MAXVALUE 999999999999999999999999999 INCREMENT BY 1 START WITH 1 CACHE 20 NOORDER NOCYCLE
The above statement creates a sequence called AUTHOR_SEQ with a minimum value of 1 and a maximum value of (10 raised to the power 28 - 1). The sequence starts with the value one and is incremented by 1 for each invocation of nextval. The NOCYCLE directive specifies that the sequence should not 'cycle' and start from 1 again once it reaches its maximum value and CACHE 20 relegates twenty values to memory for fast retrieval.
Let us take a look at the create command for the 'book' table to get a better idea about establishing foreign key constraints. The bookid field is the primary key of the book table. The table contains a title column, a price column, and a pubdate column. Note that the price column may range up to seven digits with two decimal digits (in effect, 5 digits before the decimal - NUMBER(7,2)). The book table also contains references to the book's author and subject through the AUTHORID and SUBJECTID foreign keys.
CREATE TABLE "BOOK"
(
"BOOKID" NUMBER(7,0) NOT NULL ENABLE,
"AUTHORID" NUMBER(5,0) NOT NULL ENABLE,
"SUBJECTID" NUMBER(3,0) NOT NULL ENABLE,
"TITLE" VARCHAR2(500) NOT NULL ENABLE,
"PRICE" NUMBER(7,2) NOT NULL ENABLE,
"PUBDATE" DATE NOT NULL ENABLE,
CONSTRAINT "BOOK_PK" PRIMARY KEY ("BOOKID") ENABLE,
CONSTRAINT "BOOK_FK1" FOREIGN KEY ("AUTHORID")
REFERENCES "AUTHOR" ("AUTHORID") ENABLE,
CONSTRAINT "BOOK_FK2" FOREIGN KEY ("SUBJECTID")
REFERENCES "SUBJECT" ("SUBJECTID") ENABLE
)
Note that the foreign key constraint is defined through the CONSTRAINT - FOREIGN KEY - REFERENCES clause of the create table command. The references section specifies the table that the foreign key refers to. The 'ENABLE' keyword is used to enable the constraint and make it effective. After enabling a constraint, errors occur when attempts are made to insert new rows with a subjectid value not represented in the subject table
An Index on a column improves efficiency when the specific values of the column are searched for within large tables. An index works by storing a sorted value of each indexed column(s) with the exact location (as an offset in bytes) of the row where the value occurs in the table. Thus, the value is looked up in the index (quite fast, as it is sorted) and the row is retrieved by 'jumping' to the specified location. All tables are automatically indexed on the primary key. The primary key index is created and maintained by Oracle. Indexes are useful in cases where there are many requests for single rows of data via the indexed column. We create two indexes - one based on the Author's name and another based on the book's title. The index command is quite basic, it creates the named index on the specified table on the columns within the parenthesis in the order specified.
CREATE INDEX "CUSTOMER_IDX1" ON "CUSTOMER" ("LASTNAME", "FIRSTNAME")
CREATE INDEX "BOOK_IDX1" ON "BOOK" ("TITLE")
The datatypes used are just a small subset of available oracle data types. Following are some other datatypes:
- char(n): A fixed length string of up to n characters. 'n' may range up to 255 bytes in size. Any char string that is shorter than the length indicated in the column definition is padded with spaces to the right to the defined length of 'n'.
- varchar2(n): A variable-length string of characters with upper bound 'n'. Only used bytes are stored.
- number(o, d): Numeric type for integers and real numbers. 'o' refers to the total number of digits while d is the number of digits after the decimal point. The number (10 raised to o-d) - 1) is the largest integer that may be stored in a real number field. For example, NUMBER(7,2)'s upper bound integer is (10 raised to 7-2) - 1 i.e. 100000 - 1 = 99999. The integer, decimal and smallint datatypes are derived from the number datatype.
- date: Stores date and time and has a default format of DD-MMM-YY.
- long: A long field can store character data up to 2Gigabytes.
The drop table command may be used to delete a table. A 'cascade' keyword may be used to delete all the dependant data objects. The 'force' keyword drops tables and objects with dependencies without cascading the delete. Drops and should be used with great care to prevent data invalidation.
DROP TABLE MYTABLE CASCADE
The alter command may be used to redefine a table. The alter command may be used in a few different ways. The following commands change the size of the price column and disable the foreign key constraint book_fk1:
ALTER TABLE book MODIFY price NUMBER(8,2) ALTER TABLE book DISABLE CONSTRAINT book_fk1