Database Monitoring and Troubleshooting
Database overseeing and some quick fix ups
Database servers deliver mission-critical services to businesses and over the web. Optimizing a database is not enough; errors and bottlenecks will almost certainly occur. A large part of managing a database is monitoring it to see what is going on - this makes sense from a security standpoint as well - and troubleshooting or fixing problems. Performance bottlenecks and database server failures can have far reaching effects and cause huge financial losses.
Effectively monitoring and managing a database requires expertise. The results of monitoring are an excellent input to further optimization and improvement. Both the DBMS console - where a user types commands at a prompt on a window or uses a GUI to see graphical analysis - and log files may be used to monitor a database. What sort of operations on a DBMS are monitored and analyzed?
- Is the database server always available to service requests? How long does servicing take for the different types of request?
- How many users access the database concurrently? How long is an average session and what is requested most often?
- Which SQL queries take up most of the server's CPU at any given time?
- Have there been any security related alerts? What are they? What caused these alerts?
- Do entire tables get scanned? How frequently are entire tables searched through?
- Are any databases getting close to exceeding the storage allocated for them?
- Are there any invalid, corrupt, or redundant objects (tables, queries) in the database?
- Which tables are modified most often? What are some recent modifications?
Preventive monitoring (against troubleshooting) involves monitoring and gauging some of the above to identify bottlenecks, failures and errors and correct them before users complain or the organization suffers a loss due to data failure. Performance indicators such as those given above may be measured using the DBMS's built in monitoring and analysis tools. The results should be compared with ideal values. If they fall short, the underlying cause - which is mostly not at the place where the failure occurs - should be traced and concerned objects and queries redesigned. The latter part constitutes troubleshooting. However, most often, it is the user complaints, rather than preventive monitoring, that result in the identification of problems and their resolution.
Capturing information about usage and execution times in a log file will help identify the cause of a problem. This sort of capture uses up less overhead on the DBMS than running a complex analysis tool and clearly indicates the amount of CPU used, the exit status of each query and so on. A query that uses up too much of the CPU represents a different sort of problem than a query or command that simply fails. The former should be tuned while the latter corrected.
Every piece of code and procedure should exit gracefully if an error is encountered and log the error. This ensures that the error is traced and resolved and does not hamper the efficiency of the database. For example, adding the following code to the end of the OnChange event procedure we wrote for the 'EnterMember' form in the eighth chapter will allow for a graceful exit and output a message that will point a troubleshooter to the submission of the form. Instead of the screen or an alert box, a message may be written to a log file such as the one mentioned above.