9-2 Final Project Submission: Action Plan Report
John C. Aghadiuno
Nasser Halwani
8/23/2020
Table of Contents
- Executive Overview.. 3
- Performance Tuning. 4
- Types of Performance Tuning. 4
- Defense. 4
- Query Design. 5
- Explanation. 5
- Efficiency. 5
- Design. 6
- Data Duplication and Redundancy. 7
- Explanation. 7
- Defense. 7
- Multiple-User Access Issues and Errors. 7
- Strategy. 7
- Methods. 8
- References. 10
Executive Overview
“The County of Everstone has a total of 34 incorporated cities within its boundaries and a population of 4 million residents. The county uses a database management system that is responsible for the county’s property tax assessment, parcel, and payment information. In a recent program, the county has been highly advocating the use of its online property tax payment and lookup information to reduce mailed payments and in-office visits.
The marketing campaign for online property tax payments and information lookup has been phenomenally popular. However, the county’s database systems have not sufficiently handled the increased traffic. As such, the web application open to residents often experiences excessive amounts of latency, often locks up, or in many instances is unavailable. The board of directors has asked executive IT management to probe the issue and provide a speedy solution to the matter.
A review of the hardware and software systems that the database management systems uses shows that these systems should be able to support the capacity and load. Therefore, the IT management has ruled out the hardware systems as being the issue.
Upon further examination, the issue appears to be caused by a need to provide performance tuning on the database. The database design is the culprit, since this database systems administrator noted copious quantities of duplicated data, poorly written database queries, and triggers.
Furthermore, the chief information officer (CIO) has also instructed the web programmers and web developers to examine at the web application for potential issues. The web development team has reported that while no design or programming issues were found with the web application, the stored procedures (queries) that it runs from the database server take long periods of time to run and frequently lock up the web servers. They suspect the queries were not designed for efficiency, since they were created before more individuals started using the online property tax payment system” (Southern New Hampshire University, 2020).
Performance Tuning
Types of Performance Tuning
Performance tuning refers to the improvement of system performance. “The need for performance tuning can either be real or expected because most systems will react to increased load with some level of decreasing performance, performance tuning can therefore be proactive or reactive” (Dickens, Heidelberger, & Nicol, 1994).
SQL Server Performance Tuning can be a challenging task, especially when working with a massive database where even minor changes can raise a significant impact on the existing query performance.
Performance tuning deals with the configuration and specifications of the Server, Client, Design & Architecture, Database Engine, Hardware & Network design. As the first step of SQL Server configuration, it is essential to calculate the required or optimal Buffer Memory (RAM), Storage, CPU’s, etc. Various kinds of hardware are available in the market with a variety of performance capabilities to choose from.
When executing queries, SQL Server will load data into the RAM, which is then processed before returning to the client. Therefore, we should know what the RAM value for the SQL Server should be? Buffer memory should be of a size that is enough to support concurrent active data of the database.
Defense
Multithreading by processors improves the performance of parallel SQL Server thread executions. SQL Server provides support for specifying the processor sockets with the cores and their Logical CPUs for the processing affinity and I/O affinity.
Physical Disk and Logical Disk can both be used in Infra standard. Logical Disk is the logical partitions of the physical disk with the drive letter. The logical disk is used to distribute the data into multiple logical units. Each disk has the IOPS life with the (n) number of total IOPS. When it crosses this digit, disk refreshment may be required to sustain and maintain the database performance.
Bytes per Second Disk IO is also one of the performance parameters for the disk. Different quality disks have various capacities for performance in a range of IO per second on disk. (SQL Server Hardware Performance Tuning, 2020)
Query Design
Explanation
SQL performance tuning consists of making the queries of a relation database run as fast as possible. SQL performance tuning is not a single tool or technique. Rather, it is a set of practices that makes uses of a wide array of techniques, tools, and processes.
When querying a database, optimization is crucial. An inefficient query will deplete the database’s resources and cause slow performance or loss of service for other users, especially if the query has errors. It is important that queries are optimized for minimum impact on database performance.
Efficiency
When running exploratory queries, many SQL developers use SELECT * (read as “select all”) as a shorthand to query all available data from a table. However, if a table has many fields and many rows, this taxes database resources by unnecessarily querying a lot of needless data.
Using the SELECT statement will make the database to query only the data needed to meet the business requirements. Here is an example SQL query where the business request names and addresses for customers against the database.
Inefficient Query:
SELECT * FROM Students
This query will fetch all data stored in the customer table, such as phone numbers, activity dates, and notes from sales and customer service.
Efficient:
SELECT FirstName, LastName, Address, City, State, Zip FROM Students
This query is much better and only fetches the required customer information for customer invoice table.
Some developers and administrators also prefer to make joins with WHERE clauses, such as the following:
SELECT Students.StudentID, Students.Name, Exams.LastTestDate FROM Students, Exams WHERE Students.StudentID = Exams.StudentID
The problem with this type of join is that it creates a Cartesian Join, also called a Cartesian Product or a CROSS JOIN.
In a Cartesian Join, all combinations of the variables are created. In this example, if we had 1,000 customers with 1,000 total sales, the query would first generate 1,000,000 results, then filter for the 1,000 records where StudentID is correctly joined. This is an inefficient use of database resources, as the database has done 100x more work than needed. Cartesian Joins are especially problematic in large-scale databases, because a Cartesian Join of two large tables could create billions or trillions of results.
Design
In order to avoid a Cartesian Join, an INNER JOIN command is used instead and a query such as this should be used:
SELECT Students.StudentID, Students.Name, Exams.LastTestDate FROM Students INNER JOIN Exams ON Students.StudentID = Exams.StudentID
The database would return only the 1,000 desired records where StudentID supplied is equal to StudentID in Exams schema.
Some DBMS systems can recognize WHERE joins and run them internally as INNER JOINs instead. In these DBMS systems, there is no difference in performance between a WHERE join and INNER JOIN. However, INNER JOIN is understood by all database systems.
Data Duplication and Redundancy
Explanation
Data redundancy occurs when the same piece of data exists in multiple places, whereas data inconsistency is when the same data exists in different formats in multiple tables. Unfortunately, data redundancy can cause data inconsistency, which can provide a company with unreliable and/or meaningless information.
Database normalization is the process of efficiently organizing data in a database so that redundant data is eliminated. This process can ensure that all of a company’s data looks and reads similarly across all records. By implementing data normalization, an organization standardizes data fields such as student names, addresses, and phone numbers.
Defense
Normalizing data involves organizing the columns and tables of a database to make sure their dependencies are enforced correctly. The “normal form” refers to the set of rules or normalizing data, and a database is known as “normalized” if it is free of delete, update, and insert anomalies.
When it comes to normalizing data, each organization is different and might have their own unique set of criteria. Therefore, what one organization believes to be “normal”, may not be “normal” for another organization. For instance, one company may want to normalize the state or province field with two digits, while another may prefer the full name. Regardless, database normalization can be the key to reducing data redundancy across any company. (Pearlman, 2019)
Multiple-User Access Issues and Errors
Strategy
Most database applications are multi-user applications. This means that, at any given point in time, there are multiple persons and/or processes reading from and writing to a database. Given that multiple persons/processes are updating data in a database, it is only a matter of time before two separate persons/processes will try to update the same piece of data.
A typical update cycle consists of:
- Read into memory
- Update in memory, and
- Write back to the database
Therefore, there will be times where two users will both read the same data into memory. User 1 will update the data and write those changes back to the database before user 2 does the same thing. Now you have a concurrency control conflict because user 1 read the data before user 2 wrote it back to the database. Why? Because if user 1 writes his data back into the database he will overwrite the changes made by user 2, causing them to be lost. Whoever saves their changes second will win, overwriting the changes made by whoever saves first.
This kind of database concurrency issue can occur both with humans or automated processes or a combination of the two. A concurrency issue is more likely to occur with human users as the read/update/write cycle is likely to take much longer. However, that said, the same concurrency issue can occur between automated processes and it is harder to solve because in the case of an update by a human you can ask what the user wants (do they want to overwrite changes made by another user?) and respond to that while a process needs to have all actions fully automated. (De Beijer, n/a)
Methods
The technique I recommend for avoiding concurrency issues is called timestamping, in which each record in the database has a “last modified date/time” field. This value is then updated whenever a new record is inserted and when an existing record is updated. Timestamping then works much like the read before write technique, except that instead of saving the entire record as a before image, only its last modified date/time field needs to be saved as user state information.
Similarly, just before changes to a record are saved to the database, only the timestamp field is retrieved and compared with the timestamp that was saved when the record was first retrieved. At this point, timestamping works just like read before write. If the timestamp values are different, the application knows that another user has updated the record, so it cancels the update process and an error message is displayed to the user.
Timestamping is recommended as a general solution because:
- It is DBMS-independent.
- It works even with non-DBMS data such as random-access files.
- It does not require the application to keep a connection with the database, so it works in stateless environments such as HTTP.
- It is simpler to code than read before write.
Bibliography
(2020). Retrieved from Southern New Hampshire University: https://learn.snhu.edu
Conceptual Models – What Are They and How Can You Use them? (2017 , March 23). Retrieved from Airbrake : https://airbrake.io/blog/sdlc/conceptual-model
Conceptual schema. (n.d.). Retrieved from Wikipedia: https://en.wikipedia.org/wiki/Conceptual_schema
Conger, S. (n.d.). Hands-On Database. Retrieved from VitalSource Bookshelf: https://bookshelf.vitalsource.com/#/books/9780133927078
Connolly, M. T., & Begg, C. (2015). Database Systems: A Practical Approach to Design, Implementation, and Management (6th ed.). n/a: Pearson Education, Inc. Retrieved July 2, 2020, from https://bookshelf.vitalsource.com/#/books/9780134410951/pageid/0
De Beijer, M. (n/a). Database Concurrency Conflicts in the Real World. Retrieved from Code Magazine: https://www.codemag.com/Article/0607081/Database-Concurrency-Conflicts-in-the-Real-World
Dickens, P. M., Heidelberger, P., & Nicol, D. M. (1994, July). A distributed memory LAPSE: parallel simulation of message-passing programs. Retrieved from The ACM Digital Library: https://dl.acm.org/doi/10.1145/182478.182488
SQL Server Hardware Performance Tuning. (2020, April 14). Retrieved from SQLShack: https://www.sqlshack.com/sql-server-hardware-performance-tuning/
The 5 Major Stages of the Database Development Lifecycle. (2019, November 20). Retrieved from Apex Technology: https://trustapex.com/the-5-major-stages-of-the-database-development-lifecycle/