The ER Model and Database Design
The Entity Relationship model is the current standard in describing the sort of information that a database contains. As we have read in the previous chapter, the way a database is stored on the computer is of no interest to us; we only work with the ER model-based Logical view. This logical view will always contain one or more tables. Contemporary Database Management Systems (DBMS) make the logical view of a database intuitive and easy to manage by means of a Graphical User Interface (GUI). The Graphical User Interface is what you use to get around in a windows based operating system - icons represent programs and files, windows open up when you click on something, and wizards walk you through difficult tasks.
Microsoft Access is a DBMS that is bundled with Microsoft Office. A sample MS access database called 'HealthyGym' has been designed and filled with data to help you understand this tutorial and complete the accompanying exercises. You may download this database onto your desktop and open it using MS Access. A database in MS Access is represented in the following way:
Screenshot 2a: A database in MS Access's GUI
Each table describes an e ntity , an item of importance to the users of the database. For example, Members, Services, and Equipment are entities in the Gym's database. The columns of a table in a relational database describe the entity; like in the gym database, columns in the Member table may contain the name, address, membership, enrollment expiry date, and gender of the member. Each column is restricted to contain just a certain sort of information. For example, the gender column may be restricted to contain either 'M' or 'F' and the expiry date column may be constrained to contain a date. The Rows in a table, also referred to as records, contain different values of the entity. That is, each row in the member table will describe one member. All the tables of a database together form the database schema.
Screenshot 2b: A table's columns and constraints
How does one identify a member? For instance, by name? Although a gym membership role may not contain too many duplicate names, it is possible that two members share a name. Also, Computers do not read information the way we do; using a computer to compare strings of alphabets is relatively costly and fraught with possibilities of error: misspelled names, extra spaces and so on will not be matched. A good way to avoid this sort of mistake is to assign unique numerical IDs to every member and use this ID to distinguish one from the another.
A Primary Key of a table functions as such - an identifier. A table is also usually indexed on the primary key. A table may contain many columns while each row may contain a large amount of data. Finding data by going through such a table takes too much time. An indexed column may be thought of as a sorted list; each value of the indexed column is associated with the actual physical location of the row containing that value in the table. The searching program can look for the value of interest and its physical location in the indexed list rather than in the large, unorganized actual table. Once found, the program can actually jump to the position of the exact row of interest using the information about the physical location of the row instead of moving sequentially through thousands, sometimes hundreds of thousands of records.
Databases may also be queried, or asked, for information. For example, the manager of the Gym may want to know how many members have not renewed their membership by the last week of the month. He has to put this query to the Gym database to get an answer. The language he should use to speak to the database is probably a variant of SQL, or Structured Query Language, a world standard in programming for databases. He would type a question, or query, in the SQL language on his DBMS and get the information he is looking for in return. See the example below - this query consolidates data from three tables and presents a list of members with the services they have paid for. The user may need to ask this same question every month; it may make sense for him to save this query for later use. In this case, the query's results become a View into the database. A view is conceptually similar to a table but it is not actually stored separately; it is just the result of a query that draws information from one or more tables.
Screenshot 2c: A Query - SQL Language code, Output
Normally, databases contain analytic or operational data; many contain a combination of both. Analytic data is the kind that is rarely modified and frequently used. In the Gym example, the services offered at the Gym and the fees per service represent Analytical data. Enrollment records and Payment records represent operational data - data that is added to and modified frequently. The Gym manager may use queries to get analytic data; but what if he wants to add a new member to the Member's table or change an existing member's address? Would he actually go into the table and add a new row? Filling out customized Forms is an easy way to make changes to operational data. Most of us have probably filled out paper forms or forms on the world wide web; database forms look similar to paper forms and act similar to www forms. They collect information about an entity and change or add rows to the concerned table upon submission. More often than not, forms contain mechanisms that help check the values entered by the user before submission.
Screenshot 2d: Viewing a Member or Adding a member through a Form
Let us say that the cashier at the gym needs a list of members and their addresses in order to send everyone a promotional flyer. The cashier needs not look through the database or query it and then type out an invoice; she may use a custom built report that generates such a report. Basically, a report is a program that runs several queries and obtains the result in a user friendly custom format.
Screenshot 2d: A report of Gym Members and Addresses
Occasionally, a user may need information that cannot be obtained from a simple query. He may need totals or averages that draw together information from several queries, or perform a set of complex operations. He may need this same information over and over again. To achieve this, he may use functions or stored procedures. These are simply sets of queries that are bound together by certain rules. They normally take in parameters or input values and compute and return results based on the input values.
Sometimes, a change in the data in one table may call for an automatic change in other tables. For example, every morning, the entry attendant may need a list of memberships that expire the same day. A trigger is a series of actions that takes place every time something happens; i.e., every time a new row is added or a current row is updated, at a set time every day, or when the DBMS is shutdown. The event triggers the specified set of operations. The triggering events and related operations are stored in the database much like a stored procedure.
Screenshot 2e: Relationships between tables in MS Access
Due to security reasons, users may have restricted access to a database. Some users may only have permission to view and print reports while others may have permission to add or delete information. The list of a database's users, their profiles, usernames and passwords, and details on the operations they are allowed to perform are also represented alongside the tables, queries, forms and so on in the Graphical User Interface of contemporary DBMS.
We now have a bird's eye view of what database is. Are you ready to go ahead and make your own?
How do you go about designing a database? Designing a database is not an easy task; professionals may spend many months studying the data to be stored and the organization's needs before coming up with a good design. However, every database should stick to certain broad guidelines. Databases should do three things very well: They should store information as compactly and as efficiently as possible; they should alter information in the best and fastest possible way; and they should bring up information in response to queries as speedily as possible.
A good way to ensure that all this happens is to make sure that tables describe a single entity. For example, using a single table to describe both employees and members of a gym is not good practice. Every column should deal with a separate subject, and the same data should not appear over and over again. Hence, setting up a primary key or unique identifier for every table is good practice. The Database Normalization rules formalize good practice specifications such as these; we will go over these later in the tutorial.
The needs of an organization may change; a new employee may want to add data that had previously been filed away somewhere. The database should be designed in such a way that it can accommodate changes. A database should also be able to store a lot more data if the organization and its data needs expand. A database should implement some sort of security so that information does not get into the wrong hands. Making sure two people do not make changes to a single piece of data is called concurrency control. This is very important; a database may get many requests for the same piece of information at the same instant through the World Wide Web or other remote channels. In general, exhaustive communication with database users and members of the organization about their requirements is the most important perquisite for a good design.