Introduction to Database
Basic concepts of what a database is
Data storage and management is an issue that confronts all organizations and businesses in today's information age. Businesses are faced with the task of not only storing customer, product, and order data but also with 'mining' existing data for relevant statistics and qualitative analysis. Computing in general and Database Management Systems (DBMS) in particular automate and manipulate data storage and provide businesses with the capacity to view their data in custom formats, display it over the Internet or on a workstation, connect it to other data repositories, perform qualitative and quantitative analysis, among others, in an efficient and secure manner.
Fig 1a: Examples of Databases
The DBMS began in the 1960s and early 1970s. Early DBMS were not the flexible data containers we encounter today, rather, they were tailored to fit the needs of a single organization. Adding new types of data or reorganizing current data was problematic. Relational Databases were conceived of in the early seventies and they have replaced the earlier legacy systems over the span of a few decades. Relational databases are normalized or standardized in a few ways that make them useful across the board. They also present data in a way that most people can easily work with and manipulate. This presentation or logical representation however does not mirror the actual way in which data is stored on the hard disk of the computer.
The relational data model was defined by E.F. Codd in 1970 through a landmark research paper called "A Relational Model of Data for Large Shared Data Banks".
A relational database is a group of related 2-dimensional tables. Each table describes the relation of the main subject of the table to important information. The columns of the table contain the types of information stored about the main subject. For example, the main subject may be EMPLOYEE and the columns may contain EMPLOYEE_ID, EMPLOYEE_DEPARTMENT, and EMPLOYEE_SALARY. Each column has a range of permissible values, called a domain. For example, EMPLOYEE_SALARY should be a number greater than zero. The rows in the database (referred to as tuples) would contain information on different employees.
Searching for an employee in a relational database is quite easy; a user may match the EMPLOYEE_ID to find an entry in the table. How the relational database is represented in the hard drive of the computer - physical storage - becomes unimportant compared to how users and database administrators view the table - the logical view.
The ER or Entity Relationship model was developed in the late seventies. In many ways, it is simply a refinement of the relational model. The EMPLOYEE, according to this model, would form an entity. Other entities, such as SKILLS, SHIFT, and HOURS_WORKED, would be connected to EMPLOYEE by a series of well defined relationships. For example, the relationship between EMPLOYEE and SKILLS would be many to many. That is, a given employee may have many skills, and a given skill may be possessed by many employees. However, the relationship between EMPLOYEE and SHIFT would be many to one; many employees may work in one shift, but a single employee cannot work many shifts (unless he is really strong or really poor). The ER model remains the standard today. Another standard that came to us from the 70’s is the Sequential Query Language, or SQL, a programming language that is based on the relational model and can be used to efficiently pull up and alter data from any relational database.
Fig 1b: The ER Model
In 1974, IBM funded a project called System/R. This project was started with a view towards practically applying and thus validating the theory of relational databases. The query language called SEQUEL (Structured English Query Language) was developed as part of System/R. This language was later renamed to Structured Query Language (SQL) and is now a standard way of accessing, viewing, manipulating, and modifying the contents of relational databases. The System/R prototype was successfully tested at a number of IBM customer sites in 1978 and the implementation of relational databases in commercial contexts became feasible.
Meanwhile, a few engineers from California had formed a company called Relational Software, Inc. whose objective was to devise products that applied Relational Database principles. Relational Software was the creator of the Oracle DBMS. The first commercial version was released in 1979 for the VAX/VMS Operating System. Oracle was also released two years before IBM's DB2 product. In the 1980s, Relational Software Inc. was renamed to the Oracle Corporation. New features have been added to the original DBMS (of solid design in itself) and performance was enhanced continuously with advances in computing resources like memory and chip speed.