Database and query for students’ grades

Imagine a simple database storing information for students’ grades. Design what this database might look like, and provide a SQL query to return a list of the honor roll students (top 10%), sorted by their grade point average.

My initial thoughts:
Students' Grades ER
The query is given by:

FROM Student JOIN StudentTakesCourse
GROUP BY Student.Name

In a simplistic database, we’ll have at least these three objects: Students, Courses, and courseEnrollment. Students will have at least the student name and ID, and will likely have other personal information. Courses will contain the course name and ID, and will likely contain the course description, professor, etc. CourseEnrollment will pair Students and Courses, and will also contain a field for CourseGrade. We will assume that CourseGrade is an integer.
Our SQL query to get the list of honor roll students might look like this:

SELECT StudentName, GPA
  SELECT top 10 percent Avg(CourseEnrollment.Grade) AS GPA,
  FROM CourseEnrollment
  GROUP BY CourseEnrollment.StudentID
  ORDER BY Avg(CourseEnrollment.Grade)) Honors
INNER JOIN Students ON Honors.StudentID = Students.StudentID

This database could get arbitrarily more complicated if we wanted to add in professor information, billing, etc.

ER diagram for a database of company

Draw an entity-relationship diagram for a database with companies, people, and professionals (people who work for companies).

My initial thoughts:

  • Entities:
    1. Company: companyID, companyName, companyLocation, ect
    2. People: peopleID, name, gender, age, etc
    3. Professional: professionalID
  • Relations:
    1. People ISA Professionl: 1:1
    2. Professional WORKS Company: N:1

People who work for companies are Professionals. So there is an ISA (is a) relationship between People and Professionals (or we could say that a Professional is derived from People).
Each Professional has additional information such as degree, work experiences, etc, in addition to the properties derived from People.
A Professional works for one company at a time, but Companies can hire many Professionals, so there is a Many to One relationship between Professionals and Companies. This “Works For” relationship can store attributes such as date of joining the company, salary, etc. These attributes are only defined when we relate a Professional with a Company.
A Person can have multiple phone numbers, which is why Phone is a multi-valued attribute.
ER diagram for a company

What is denormalization?

What is denormalization? Explain the pros and cons

Denormalization is the process of attempting to optimize the performance of a database by adding redundant data or by grouping data. In some cases, denormalization helps cover up the inefficiencies inherent in relational database software. A relational normalized database imposes a heavy access load over physical storage of data even if it is well tuned for high performance
A normalized design will often store different but related pieces of information in separate logical tables (called relations). If these relations are stored physically as separate disk files, completing a database query that draws information from several relations (a join operation) can be slow. If many relations are joined, it may be prohibitively slow. There are two strategies for dealing with this. The preferred method is to keep the logical design normalized, but allow the database management system (DBMS) to store additional redundant information on disk to optimize query response. In this case, it is the DBMS software’s responsibility to ensure that any redundant copies are kept consistent. This method is often implemented in SQL as indexed views (Microsoft SQL Server) or materialized views (Oracle). A view represents information in a format convenient for querying, and the index ensures that queries against the view are optimized.
The more usual approach is to denormalize the logical data design. With care, this can achieve a similar improvement in query response, but at a cost — it is now the database designer’s responsibility to ensure that the denormalized database does not become inconsistent. This is done by creating rules in the database called constraints, that specify how the redundant copies of information must be kept synchronized. It is the increase in logical complexity of the database design and the added complexity of the additional constraints that make this approach hazardous. Moreover, constraints introduce a trade-off, speeding up reads (SELECT in SQL) while slowing down writes (INSERT, UPDATE, and DELETE). This means a denormalized database under heavy write load may actually offer worse performance than its functionally equivalent normalized counterpart.
A denormalized data model is not the same as a data model that has not been normalized, and denormalization should only take place after a satisfactory level of normalization has taken place and that any required constraints and/or rules have been created to deal with the inherent anomalies in the design. For example, all the relations are in third normal form and any relations with join and multivalued dependencies are handled appropriately.

Different types of joins in SQL

What are the different types of joins? Please explain how they differ and why certain types are better in certain situations.

JOIN is used to combine the results of two tables. To perform a join, each of the tables must have at least one field which will be used to find matching records from the other table. The join type defines which records will go into the result set.
Let’s take for example two tables: one table lists “regular” beverages, and another lists the calorie-free beverages. Each table has two fields: the beverage name and its product code. The “code” field will be used to perform the record matching.

Regular Beverages:
Name Code
Calorie-Free Beverage:
Code Name
COCACOLA Diet Coca-Cola
PEPSI Diet Pepsi
PEPSI Pepsi Light
PEPSI Purified Water

Let’s join this table by the code field. Whereas the order of the joined tables makes sense in some cases, we will consider the following statement:

[Regular Beverage] JOIN [Calorie-Free Beverage]

i.e. [Regular Beverage] is from the left of the join operator, and [Calorie-Free Beverage] is from the right.

  1. INNER JOIN: Result set will contain only those data where the criteria match. In our example we will get 3 records: 1 with COCACOLA and 2 with PEPSI codes.
    Regular_Beverages.Name Code Calorie-Free_Beverages.Name
    Coca-Cola COCACOLA Diet Coca-Cola
    Pepsi PEPSI Diet Pepsi
    Pepsi PEPSI Pepsi Light
  2. OUTER JOIN: OUTER JOIN will always contain the results of INNER JOIN, however it can contain some records that have no matching record in other table. OUTER JOINs are divided to following subtypes:
    1. LEFT OUTER JOIN, or simply LEFT JOIN: The result will contain all records from the left table. If no matching records were found in the right table, then its fields will contain the NULL values. In our example, we would get 4 records. In addition to INNER JOIN results, BUDWEISER will be listed, because it was in the left table.
      Regular_Beverages.Name Code Calorie-Free_Beverages.Name
      Budweiser BUDWEISER NULL
      Coca-Cola COCACOLA Diet Coca-Cola
      Pepsi PEPSI Diet Pepsi
      Pepsi PEPSI Pepsi Light
    2. RIGHT OUTER JOIN, or simply RIGHT JOIN: This type of join is the opposite of LEFT JOIN; it will contain all records from the right table, and missing fields from the left table will contain NULL. If we have two tables A and B, then we can say that statement A LEFT JOIN B is equivalent to statement B RIGHT JOIN A. In our example, we will get 5 records. In addition to INNER JOIN results, FRESCA and WATER records will be listed.
      Regular_Beverages.Name Code Calorie-Free_Beverages.Name
      Coca-Cola COCACOLA Diet Coca-Cola
      NULL FRESCA Fresca
      Pepsi PEPSI Diet Pepsi
      Pepsi PEPSI Pepsi Light
      NULL Water Purified Water
    3. FULL OUTER JOIN. This type of join combines the results of LEFT and RIGHT joins. All records from both tables will be part of the result set, whether the matching record exists in the other table or not. If no matching record was found then the corresponding result fields will have a NULL value. In our example, we will get 6 records.
      Regular_Beverages.Name Code Calorie-Free_Beverages.Name
      Budweiser BUDWEISER NULL
      Coca-Cola COCACOLA Diet Coca-Cola
      NULL FRESCA Fresca
      Pepsi PEPSI Diet Pepsi
      Pepsi PEPSI Pepsi Light
      NULL Water Purified Water

Find the number of employees in each department using SQL

Write a method to find the number of employees in each department.

My initial thoughts(code):

SELECT Department.DepartmentName, COUNT(employee.lastName)
FROM employee, department 
WHERE employee.DepartmentID = department.DepartmentID
GROUP BY Department.DepartmentName


SELECT Dept_Name, Departments.Dept_ID, COUNT(*) AS 'num_employees'
FROM Departments
LEFT JOIN Employees
ON Employees.Dept_ID = Departments.Dept_ID
GROUP BY Departments.Dept_ID, Dept_Name