Lecture 10 - Databases and SQL

View notebook on Github Open In Collab

10.1 Introduction to SQL

SQL (Structured Query Language) is a programming language designed for managing data in Relational Data Base Management Systems (RDBMS), or for stream processing in Relational Data Stream Management Systems (RDSMS). A relational database is a database that stores related information across multiple tables, and allows to query information in more than one table at the same time. Within a table, the data is organized in a tabular format with rows and columns.

SQL was initially developed in 1970, and since then different companies and vendors implemented SQL in their products with some variations. To bring greater conformity in the variants of SQL, the American National Standards Institute (ANSI) published the first SQL standard in 1986. The standard has been updated every few years since then.

Today, there are several variants of SQL for database management systems available, some of which were developed by companies such as IBM and Oracle, as well as there are variants developed by communities, such as MySQL, PostgreSQL, MariaDB, etc. Although these variants of SQL have certain differences, they are based on the basic SQL syntax, and are quite similar.

The main advantages of SQL include standardized syntax (since all relational database systems have an SQL query interpreter built-in), and is easy-to-understand due to using English-like commands and functions.

Relational Databases

Relational databases store information in multiple tables, which allows to work with more complex data, and have flexibility in the way the data is organized. An example is shown in the next figure, where a database is shown that is used for managing the HR data of a small business.

This database has seven tables:

  • Jobs table stores data related to job title and salary range.

  • Employees table stores the data of employees.

  • Dependents table stores the employee’s dependents.

  • Departments table stores department data.

  • Regions table stores the data of regions such as Asia, Europe, America, Middle East, and Africa.

  • Countries table stores the data of countries where the company is doing business.

  • Locations table stores the location of the departments of the company.

Each table contains many records with rows and columns (similar to an Excel spreadsheet), and the records have relationships across the tables. Using multiple tables in a relational database allows us to avoid duplication of information, in comparison to using a single table to store all information. Also, it provides flexibility in how we work with the data. To establish relationships between the records in different tables we need to use an ID or identifier for each employee. The identifier for each employee, or in general for each record (row) in a relational database, is referred to as primary key. For instance, each employee can be assigned an ID value (such as employee 162), and each table would have an ID column (primary key column) to establish the relationship with the other tables in the database.

277ca94bae8e44d8b1abfd2e674ace40 Figure source: Reference [1].

SQL for Data Science

SQL is a very important tool for data scientists, data analysts, developers, and database administrators. In particular, as many companies become data-driven, SQL becomes an essential tool for handling data stored in databases and performing various data analytics operations, such as calculating data statistics, updating records, removing duplicate columns, calculating correlations between records, and similar.

SQL versus Pandas

SQL has similarities with the Pandas library, as it offers similar functionality to Pandas, which includes data manipulation over rows and columns, data merging, grouping, dealing with missing values, and similar. However, Pandas is not a relational database management library, but it is a data frame library.

Still, Pandas offers additional functions and flexibility for handling and manipulating tabular data, and many users download databases to their local machine, and afterward use Pandas for data processing, rather than using SQL to process the data on the server.

The benefits of using SQL over Pandas can depend on the task. Several considerations include:

  • In the case of a large database of information (e.g., GigaBytes of data), downloading the database to the local machine to be processed by Pandas may be slow or infeasible. Pandas is more suitable for processing small to medium size databases in Python.

  • Even if the user can download the data on the local machine for processing with Pandas, it may be required to apply some level of preprocessing or organizing the data on the server using SQL.

  • Some tasks can require that the data processing is done in the existing database. Also, when the tasks require fast data retrieval and processing, SQL can be more efficient than Pandas.

SQLite

In this lecture, we will use SQLite which implements a self-contained, serverless SQL database engine. SQLite is lightweight in terms of setup and required recourses. Unlike most other SQL variants, SQLite does not have a separate server process, and it reads and writes directly to disk files. That is, it does not use the client/server model. Because it has no server managing access to it, SQLite is not suitable in multiuser environments where multiple people can simultaneously edit files.

10.2 Using SQLite with Python

To demonstrate the use of SQLite with Python, in this lecture we will use magic commands in Jupyter Notebook. Magic commands are special commands which are not valid Python code, but perform certain actions in a Jupyter Notebook. They begin with the % symbol.

The library ipython-sql offers the magic functions %sql and %%sql, which allow to connect to a database and use standard SQL commands in Jupyter Notebooks. To run ipython-sql on your computer, it needs to be installed (e.g., by pip install ipython-sql). If we run the notebooks on Google Colab, ipython-sql comes preinstalled.

To load the ipython-sql library we will use %load_ext sql as in the next code. %load_ext is a magic command that loads an external package that can add new magic commands.

[1]:
%load_ext sql

The magic command %sql is used to execute an SQL query that is contained in a single statement in a Jupyter notebook cell, and %%sql allows to execute an SQL query that is contained in multiple SQL statements in a single cell.

10.3 Create a New Table

To create a new table we will use the SQL command CREATE TABLE, as shown in the next cell. If the table already exists in the database, an error message will show up.

In order to establish a connection to the newly created table, we used %%sql sqlite:// in the cell below. If we wanted to create a new table in an existing database to which we have already established a connection, we could have used only the magic command %%sql.

SQL has many commands or keywords that have special meaning, such as SELECT, INSERT, DELETE, and these keywords cannot be used as names of tables, columns, or other objects.

To make SQL language more readable, it is a convention to write the SQL commands with uppercase letters, and the other variables and identifiers with lowercase letters. However, this is not required, as the SQL commands are not case-sensitive.

Let’s create a table called cars which has 3 columns: id, name, and price. In the cell below, we specify that the values of id and price columns are integers, denoted by the INTEGER keyword. The names in the name column have a character string type (VARCHAR, i.e., variable-length character string) and should have at most 50 characters. We also specified with NOT NULL that the values should not be missing in the id and name columns, i.e., when we insert data into the table we have to specify the values for the NON NULL columns.

Each table can have only one PRIMARY KEY column that uniquely identifies each row in the table, and prevents from inserting duplicate rows in the table. For the table below we set PRIMARY KEY to the id column. It is not required to define a primary key, however, it is a good practice to do it for every table.

[2]:
%%sql sqlite://
CREATE TABLE cars(
    id INTEGER NOT NULL PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    price INTEGER);
Done.
[2]:
[]

The table that we just created is empty, and to add data to the table we will use the INSERT statement. In each row we provide values for id, name, and price.

When using multiple statements in SQL, the statements in each line need to be separated with a semicolon ;. The last statement in a cell does not have to be followed by a semicolon.

Inline comments can be inserted by using two consecutive hyphens -- that comment the rest of the line, as shown in the second line of the next cell.

And also, comments that span multiple lines can be inserted by using the multiline C-style notation /* comment */ as in the last lines in the cell.

Note also that we used just the magic command %%sql in this cell, and we didn’t need to write %%sql sqlite:// as in the above cell. The reason is that in the above cell we used %%sql sqlite:// to establish a connection to the newly created table. Once a connection is established, we can use only %sql or %%sql to work with the table.

[3]:
%%sql
INSERT INTO cars VALUES(1,'Audi',52642); --Two consecutive hyphens (--) comment the rest of the line
INSERT INTO cars VALUES(2,'Mercedes',57127);
INSERT INTO cars VALUES(3,'Skoda',9000);
INSERT INTO cars VALUES(4,'Volvo',29000);
INSERT INTO cars VALUES(5,'Bentley',350000);
INSERT INTO cars VALUES(6,'Citroen',21000);
INSERT INTO cars VALUES(7,'Hummer',41400);
INSERT INTO cars VALUES(8,'Volkswagen',21600);
/* A comment that spans
more than one line */
 * sqlite://
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
Done.
[3]:
[]

We can display the table with the following code. Notice again that we used a single % in the magic command %sql, since we have only one line of code.

[4]:
%sql SELECT * from cars
 * sqlite://
Done.
[4]:
id name price
1 Audi 52642
2 Mercedes 57127
3 Skoda 9000
4 Volvo 29000
5 Bentley 350000
6 Citroen 21000
7 Hummer 41400
8 Volkswagen 21600

Another Example of Creating a Table

In the next simple example, we will create another table called writer, with columns FirstName, LastName, and Year.

[5]:
%%sql sqlite://
CREATE TABLE writer(
    FirstName VARCHAR(50) NOT NULL,
    LastName VARCHAR(50) NOT NULL,
    Year INTEGER NOT NULL PRIMARY KEY);
Done.
[5]:
[]
[6]:
%%sql
INSERT INTO writer VALUES ('William', 'Shakespeare', 1616);
INSERT INTO writer VALUES ('Lin', 'Han', 1996);
INSERT INTO writer VALUES ('Peter', 'Brecht', 1978);
 * sqlite://
1 rows affected.
1 rows affected.
1 rows affected.
[6]:
[]
[7]:
%sql SELECT * from writer
 * sqlite://
Done.
[7]:
FirstName LastName Year
William Shakespeare 1616
Peter Brecht 1978
Lin Han 1996

10.4 Database Example

As an example of a database, let’s create a database that was shown in the above section, related to managing the HR data of a small business.

The cells below first create the tables (recall that the database has 7 tables), and afterward the information for each table is inserted.

[8]:
%%sql sqlite://

CREATE TABLE regions (
    region_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    region_name TEXT NOT NULL);

CREATE TABLE countries (
    country_id TEXT NOT NULL,
    country_name TEXT NOT NULL,
    region_id INTEGER NOT NULL,
    PRIMARY KEY (country_id ASC),
    FOREIGN KEY (region_id) REFERENCES regions (region_id) ON DELETE CASCADE ON UPDATE CASCADE);

CREATE TABLE locations (
    location_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    street_address TEXT,
    postal_code TEXT,
    city text NOT NULL,
    state_province TEXT,
    country_id INTEGER NOT NULL,
    FOREIGN KEY (country_id) REFERENCES countries (country_id) ON DELETE CASCADE ON UPDATE CASCADE);

CREATE TABLE departments (
    department_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    department_name TEXT NOT NULL,
    location_id INTEGER NOT NULL,
    FOREIGN KEY (location_id) REFERENCES locations (location_id) ON DELETE CASCADE ON UPDATE CASCADE);

CREATE TABLE jobs (
    job_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    job_title TEXT NOT NULL,
    min_salary DOUBLE NOT NULL,
    max_salary DOUBLE NOT NULL);

CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    first_name TEXT,
    last_name TEXT NOT NULL,
    email TEXT NOT NULL,
    phone_number TEXT,
    hire_date TEXT NOT NULL,
    job_id INTEGER NOT NULL,
    salary DOUBLE NOT NULL,
    manager_id INTEGER,
    department_id INTEGER NOT NULL,
    FOREIGN KEY (job_id) REFERENCES jobs (job_id) ON DELETE CASCADE ON UPDATE CASCADE,
    FOREIGN KEY (department_id) REFERENCES departments (department_id) ON DELETE CASCADE ON UPDATE CASCADE,
    FOREIGN KEY (manager_id) REFERENCES employees (employee_id) ON DELETE CASCADE ON UPDATE CASCADE);

CREATE TABLE dependents (
    dependent_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL,
    relationship TEXT NOT NULL,
    employee_id INTEGER NOT NULL,
    FOREIGN KEY (employee_id) REFERENCES employees (employee_id) ON DELETE CASCADE ON UPDATE CASCADE);
Done.
Done.
Done.
Done.
Done.
Done.
Done.
[8]:
[]
[ ]:
%%sql

/*Data for the table regions */
INSERT INTO regions(region_id,region_name) VALUES (1,'Europe');
INSERT INTO regions(region_id,region_name) VALUES (2,'Americas');
INSERT INTO regions(region_id,region_name) VALUES (3,'Asia');
INSERT INTO regions(region_id,region_name) VALUES (4,'Middle East and Africa');

/*Data for the table countries */
INSERT INTO countries(country_id,country_name,region_id) VALUES ('AR','Argentina',2);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('AU','Australia',3);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('BE','Belgium',1);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('BR','Brazil',2);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('CA','Canada',2);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('CH','Switzerland',1);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('CN','China',3);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('DE','Germany',1);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('DK','Denmark',1);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('EG','Egypt',4);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('FR','France',1);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('HK','HongKong',3);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('IL','Israel',4);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('IN','India',3);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('IT','Italy',1);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('JP','Japan',3);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('KW','Kuwait',4);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('MX','Mexico',2);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('NG','Nigeria',4);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('NL','Netherlands',1);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('SG','Singapore',3);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('UK','United Kingdom',1);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('US','United States of America',2);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('ZM','Zambia',4);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('ZW','Zimbabwe',4);

/*Data for the table locations */
INSERT INTO locations(location_id,street_address,postal_code,city,state_province,country_id) VALUES (1400,'2014 Jabberwocky Rd','26192','Southlake','Texas','US');
INSERT INTO locations(location_id,street_address,postal_code,city,state_province,country_id) VALUES (1500,'2011 Interiors Blvd','99236','South San Francisco','California','US');
INSERT INTO locations(location_id,street_address,postal_code,city,state_province,country_id) VALUES (1700,'2004 Charade Rd','98199','Seattle','Washington','US');
INSERT INTO locations(location_id,street_address,postal_code,city,state_province,country_id) VALUES (1800,'147 Spadina Ave','M5V 2L7','Toronto','Ontario','CA');
INSERT INTO locations(location_id,street_address,postal_code,city,state_province,country_id) VALUES (2400,'8204 Arthur St',NULL,'London',NULL,'UK');
INSERT INTO locations(location_id,street_address,postal_code,city,state_province,country_id) VALUES (2500,'Magdalen Centre, The Oxford Science Park','OX9 9ZB','Oxford','Oxford','UK');
INSERT INTO locations(location_id,street_address,postal_code,city,state_province,country_id) VALUES (2700,'Schwanthalerstr. 7031','80925','Munich','Bavaria','DE');

/*Data for the table jobs */
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (1,'Public Accountant',4200.00,9000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (2,'Accounting Manager',8200.00,16000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (3,'Administration Assistant',3000.00,6000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (4,'President',20000.00,40000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (5,'Administration Vice President',15000.00,30000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (6,'Accountant',4200.00,9000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (7,'Finance Manager',8200.00,16000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (8,'Human Resources Representative',4000.00,9000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (9,'Programmer',4000.00,10000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (10,'Marketing Manager',9000.00,15000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (11,'Marketing Representative',4000.00,9000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (12,'Public Relations Representative',4500.00,10500.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (13,'Purchasing Clerk',2500.00,5500.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (14,'Purchasing Manager',8000.00,15000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (15,'Sales Manager',10000.00,20000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (16,'Sales Representative',6000.00,12000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (17,'Shipping Clerk',2500.00,5500.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (18,'Stock Clerk',2000.00,5000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (19,'Stock Manager',5500.00,8500.00);

/*Data for the table departments */
INSERT INTO departments(department_id,department_name,location_id) VALUES (1,'Administration',1700);
INSERT INTO departments(department_id,department_name,location_id) VALUES (2,'Marketing',1800);
INSERT INTO departments(department_id,department_name,location_id) VALUES (3,'Purchasing',1700);
INSERT INTO departments(department_id,department_name,location_id) VALUES (4,'Human Resources',2400);
INSERT INTO departments(department_id,department_name,location_id) VALUES (5,'Shipping',1500);
INSERT INTO departments(department_id,department_name,location_id) VALUES (6,'IT',1400);
INSERT INTO departments(department_id,department_name,location_id) VALUES (7,'Public Relations',2700);
INSERT INTO departments(department_id,department_name,location_id) VALUES (8,'Sales',2500);
INSERT INTO departments(department_id,department_name,location_id) VALUES (9,'Executive',1700);
INSERT INTO departments(department_id,department_name,location_id) VALUES (10,'Finance',1700);
INSERT INTO departments(department_id,department_name,location_id) VALUES (11,'Accounting',1700);

/*Data for the table employees */
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (100,'Steven','King','steven.king@sqltutorial.org','515.123.4567','1987-06-17',4,24000.00,NULL,9);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (101,'Neena','Kochhar','neena.kochhar@sqltutorial.org','515.123.4568','1989-09-21',5,17000.00,100,9);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (102,'Lex','De Haan','lex.de haan@sqltutorial.org','515.123.4569','1993-01-13',5,17000.00,100,9);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (103,'Alexander','Hunold','alexander.hunold@sqltutorial.org','590.423.4567','1990-01-03',9,9000.00,102,6);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (104,'Bruce','Ernst','bruce.ernst@sqltutorial.org','590.423.4568','1991-05-21',9,6000.00,103,6);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (105,'David','Austin','david.austin@sqltutorial.org','590.423.4569','1997-06-25',9,4800.00,103,6);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (106,'Valli','Pataballa','valli.pataballa@sqltutorial.org','590.423.4560','1998-02-05',9,4800.00,103,6);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (107,'Diana','Lorentz','diana.lorentz@sqltutorial.org','590.423.5567','1999-02-07',9,4200.00,103,6);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (108,'Nancy','Greenberg','nancy.greenberg@sqltutorial.org','515.124.4569','1994-08-17',7,12000.00,101,10);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (109,'Daniel','Faviet','daniel.faviet@sqltutorial.org','515.124.4169','1994-08-16',6,9000.00,108,10);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (110,'John','Chen','john.chen@sqltutorial.org','515.124.4269','1997-09-28',6,8200.00,108,10);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (111,'Ismael','Sciarra','ismael.sciarra@sqltutorial.org','515.124.4369','1997-09-30',6,7700.00,108,10);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (112,'Jose Manuel','Urman','jose manuel.urman@sqltutorial.org','515.124.4469','1998-03-07',6,7800.00,108,10);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (113,'Luis','Popp','luis.popp@sqltutorial.org','515.124.4567','1999-12-07',6,6900.00,108,10);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (114,'Den','Raphaely','den.raphaely@sqltutorial.org','515.127.4561','1994-12-07',14,11000.00,100,3);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (115,'Alexander','Khoo','alexander.khoo@sqltutorial.org','515.127.4562','1995-05-18',13,3100.00,114,3);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (116,'Shelli','Baida','shelli.baida@sqltutorial.org','515.127.4563','1997-12-24',13,2900.00,114,3);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (117,'Sigal','Tobias','sigal.tobias@sqltutorial.org','515.127.4564','1997-07-24',13,2800.00,114,3);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (118,'Guy','Himuro','guy.himuro@sqltutorial.org','515.127.4565','1998-11-15',13,2600.00,114,3);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (119,'Karen','Colmenares','karen.colmenares@sqltutorial.org','515.127.4566','1999-08-10',13,2500.00,114,3);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (120,'Matthew','Weiss','matthew.weiss@sqltutorial.org','650.123.1234','1996-07-18',19,8000.00,100,5);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (121,'Adam','Fripp','adam.fripp@sqltutorial.org','650.123.2234','1997-04-10',19,8200.00,100,5);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (122,'Payam','Kaufling','payam.kaufling@sqltutorial.org','650.123.3234','1995-05-01',19,7900.00,100,5);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (123,'Shanta','Vollman','shanta.vollman@sqltutorial.org','650.123.4234','1997-10-10',19,6500.00,100,5);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (126,'Irene','Mikkilineni','irene.mikkilineni@sqltutorial.org','650.124.1224','1998-09-28',18,2700.00,120,5);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (145,'John','Russell','john.russell@sqltutorial.org',NULL,'1996-10-01',15,14000.00,100,8);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (146,'Karen','Partners','karen.partners@sqltutorial.org',NULL,'1997-01-05',15,13500.00,100,8);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (176,'Jonathon','Taylor','jonathon.taylor@sqltutorial.org',NULL,'1998-03-24',16,8600.00,100,8);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (177,'Jack','Livingston','jack.livingston@sqltutorial.org',NULL,'1998-04-23',16,8400.00,100,8);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (178,'Kimberely','Grant','kimberely.grant@sqltutorial.org',NULL,'1999-05-24',16,7000.00,100,8);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (179,'Charles','Johnson','charles.johnson@sqltutorial.org',NULL,'2000-01-04',16,6200.00,100,8);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (192,'Sarah','Bell','sarah.bell@sqltutorial.org','650.501.1876','1996-02-04',17,4000.00,123,5);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (193,'Britney','Everett','britney.everett@sqltutorial.org','650.501.2876','1997-03-03',17,3900.00,123,5);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (200,'Jennifer','Whalen','jennifer.whalen@sqltutorial.org','515.123.4444','1987-09-17',3,4400.00,101,1);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (201,'Michael','Hartstein','michael.hartstein@sqltutorial.org','515.123.5555','1996-02-17',10,13000.00,100,2);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (202,'Pat','Fay','pat.fay@sqltutorial.org','603.123.6666','1997-08-17',11,6000.00,201,2);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (203,'Susan','Mavris','susan.mavris@sqltutorial.org','515.123.7777','1994-06-07',8,6500.00,101,4);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (204,'Hermann','Baer','hermann.baer@sqltutorial.org','515.123.8888','1994-06-07',12,10000.00,101,7);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (205,'Shelley','Higgins','shelley.higgins@sqltutorial.org','515.123.8080','1994-06-07',2,12000.00,101,11);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (206,'William','Gietz','william.gietz@sqltutorial.org','515.123.8181','1994-06-07',1,8300.00,205,11);

/*Data for the table dependents */
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (1,'Penelope','Gietz','Child',206);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (2,'Nick','Higgins','Child',205);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (3,'Ed','Whalen','Child',200);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (4,'Jennifer','King','Child',100);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (5,'Johnny','Kochhar','Child',101);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (6,'Bette','De Haan','Child',102);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (7,'Grace','Faviet','Child',109);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (8,'Matthew','Chen','Child',110);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (9,'Joe','Sciarra','Child',111);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (10,'Christian','Urman','Child',112);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (11,'Zero','Popp','Child',113);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (12,'Karl','Greenberg','Child',108);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (13,'Uma','Mavris','Child',203);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (14,'Vivien','Hunold','Child',103);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (15,'Cuba','Ernst','Child',104);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (16,'Fred','Austin','Child',105);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (17,'Helen','Pataballa','Child',106);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (18,'Dan','Lorentz','Child',107);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (19,'Bob','Hartstein','Child',201);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (20,'Lucille','Fay','Child',202);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (21,'Kirsten','Baer','Child',204);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (22,'Elvis','Khoo','Child',115);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (23,'Sandra','Baida','Child',116);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (24,'Cameron','Tobias','Child',117);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (25,'Kevin','Himuro','Child',118);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (26,'Rip','Colmenares','Child',119);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (27,'Julia','Raphaely','Child',114);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (28,'Woody','Russell','Child',145);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (29,'Alec','Partners','Child',146);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (30,'Sandra','Taylor','Child',176);

10.5 Querying Databases with SELECT

The most common SQL task is to retrieve data from one or more tables. The data is returned in the form of a result table, called result set. This is accomplished with the SELECT statement, which has the following general syntax.

SELECT
    column1, column2, columnN
FROM
    table_name

To run the code in this Jupyter Notebook, we will just insert the magic commands %sql or %%sql in front of each SQL code.

For example, in the next cell we retrieved the columns employee_id, first_name, last_name, hire_date from the employees table. When the statement is evaluated, the database system first evaluates the FROM clause and the SELECT clause afterward. I.e., from the table named employees select the listed columns.

[10]:
%%sql
SELECT
    employee_id, first_name, last_name, hire_date
FROM
    employees
 * sqlite://
Done.
[10]:
employee_id first_name last_name hire_date
100 Steven King 1987-06-17
101 Neena Kochhar 1989-09-21
102 Lex De Haan 1993-01-13
103 Alexander Hunold 1990-01-03
104 Bruce Ernst 1991-05-21
105 David Austin 1997-06-25
106 Valli Pataballa 1998-02-05
107 Diana Lorentz 1999-02-07
108 Nancy Greenberg 1994-08-17
109 Daniel Faviet 1994-08-16
110 John Chen 1997-09-28
111 Ismael Sciarra 1997-09-30
112 Jose Manuel Urman 1998-03-07
113 Luis Popp 1999-12-07
114 Den Raphaely 1994-12-07
115 Alexander Khoo 1995-05-18
116 Shelli Baida 1997-12-24
117 Sigal Tobias 1997-07-24
118 Guy Himuro 1998-11-15
119 Karen Colmenares 1999-08-10
120 Matthew Weiss 1996-07-18
121 Adam Fripp 1997-04-10
122 Payam Kaufling 1995-05-01
123 Shanta Vollman 1997-10-10
126 Irene Mikkilineni 1998-09-28
145 John Russell 1996-10-01
146 Karen Partners 1997-01-05
176 Jonathon Taylor 1998-03-24
177 Jack Livingston 1998-04-23
178 Kimberely Grant 1999-05-24
179 Charles Johnson 2000-01-04
192 Sarah Bell 1996-02-04
193 Britney Everett 1997-03-03
200 Jennifer Whalen 1987-09-17
201 Michael Hartstein 1996-02-17
202 Pat Fay 1997-08-17
203 Susan Mavris 1994-06-07
204 Hermann Baer 1994-06-07
205 Shelley Higgins 1994-06-07
206 William Gietz 1994-06-07

If we want to query all columns in a table we can use the asterisk operator * instead of the columns names.

[11]:
%sql SELECT * FROM employees
 * sqlite://
Done.
[11]:
employee_id first_name last_name email phone_number hire_date job_id salary manager_id department_id
100 Steven King steven.king@sqltutorial.org 515.123.4567 1987-06-17 4 24000.0 None 9
101 Neena Kochhar neena.kochhar@sqltutorial.org 515.123.4568 1989-09-21 5 17000.0 100 9
102 Lex De Haan lex.de haan@sqltutorial.org 515.123.4569 1993-01-13 5 17000.0 100 9
103 Alexander Hunold alexander.hunold@sqltutorial.org 590.423.4567 1990-01-03 9 9000.0 102 6
104 Bruce Ernst bruce.ernst@sqltutorial.org 590.423.4568 1991-05-21 9 6000.0 103 6
105 David Austin david.austin@sqltutorial.org 590.423.4569 1997-06-25 9 4800.0 103 6
106 Valli Pataballa valli.pataballa@sqltutorial.org 590.423.4560 1998-02-05 9 4800.0 103 6
107 Diana Lorentz diana.lorentz@sqltutorial.org 590.423.5567 1999-02-07 9 4200.0 103 6
108 Nancy Greenberg nancy.greenberg@sqltutorial.org 515.124.4569 1994-08-17 7 12000.0 101 10
109 Daniel Faviet daniel.faviet@sqltutorial.org 515.124.4169 1994-08-16 6 9000.0 108 10
110 John Chen john.chen@sqltutorial.org 515.124.4269 1997-09-28 6 8200.0 108 10
111 Ismael Sciarra ismael.sciarra@sqltutorial.org 515.124.4369 1997-09-30 6 7700.0 108 10
112 Jose Manuel Urman jose manuel.urman@sqltutorial.org 515.124.4469 1998-03-07 6 7800.0 108 10
113 Luis Popp luis.popp@sqltutorial.org 515.124.4567 1999-12-07 6 6900.0 108 10
114 Den Raphaely den.raphaely@sqltutorial.org 515.127.4561 1994-12-07 14 11000.0 100 3
115 Alexander Khoo alexander.khoo@sqltutorial.org 515.127.4562 1995-05-18 13 3100.0 114 3
116 Shelli Baida shelli.baida@sqltutorial.org 515.127.4563 1997-12-24 13 2900.0 114 3
117 Sigal Tobias sigal.tobias@sqltutorial.org 515.127.4564 1997-07-24 13 2800.0 114 3
118 Guy Himuro guy.himuro@sqltutorial.org 515.127.4565 1998-11-15 13 2600.0 114 3
119 Karen Colmenares karen.colmenares@sqltutorial.org 515.127.4566 1999-08-10 13 2500.0 114 3
120 Matthew Weiss matthew.weiss@sqltutorial.org 650.123.1234 1996-07-18 19 8000.0 100 5
121 Adam Fripp adam.fripp@sqltutorial.org 650.123.2234 1997-04-10 19 8200.0 100 5
122 Payam Kaufling payam.kaufling@sqltutorial.org 650.123.3234 1995-05-01 19 7900.0 100 5
123 Shanta Vollman shanta.vollman@sqltutorial.org 650.123.4234 1997-10-10 19 6500.0 100 5
126 Irene Mikkilineni irene.mikkilineni@sqltutorial.org 650.124.1224 1998-09-28 18 2700.0 120 5
145 John Russell john.russell@sqltutorial.org None 1996-10-01 15 14000.0 100 8
146 Karen Partners karen.partners@sqltutorial.org None 1997-01-05 15 13500.0 100 8
176 Jonathon Taylor jonathon.taylor@sqltutorial.org None 1998-03-24 16 8600.0 100 8
177 Jack Livingston jack.livingston@sqltutorial.org None 1998-04-23 16 8400.0 100 8
178 Kimberely Grant kimberely.grant@sqltutorial.org None 1999-05-24 16 7000.0 100 8
179 Charles Johnson charles.johnson@sqltutorial.org None 2000-01-04 16 6200.0 100 8
192 Sarah Bell sarah.bell@sqltutorial.org 650.501.1876 1996-02-04 17 4000.0 123 5
193 Britney Everett britney.everett@sqltutorial.org 650.501.2876 1997-03-03 17 3900.0 123 5
200 Jennifer Whalen jennifer.whalen@sqltutorial.org 515.123.4444 1987-09-17 3 4400.0 101 1
201 Michael Hartstein michael.hartstein@sqltutorial.org 515.123.5555 1996-02-17 10 13000.0 100 2
202 Pat Fay pat.fay@sqltutorial.org 603.123.6666 1997-08-17 11 6000.0 201 2
203 Susan Mavris susan.mavris@sqltutorial.org 515.123.7777 1994-06-07 8 6500.0 101 4
204 Hermann Baer hermann.baer@sqltutorial.org 515.123.8888 1994-06-07 12 10000.0 101 7
205 Shelley Higgins shelley.higgins@sqltutorial.org 515.123.8080 1994-06-07 2 12000.0 101 11
206 William Gietz william.gietz@sqltutorial.org 515.123.8181 1994-06-07 1 8300.0 205 11

List Tables in a Database

We can also use the SELECT statement to display a list of all tables in the current database. Every SQLite database has an sqlite_master table that defines the schema for the database. The following statement uses sqlite_master with the type field set to ‘table’ to return the names of all tables in the current database.

[12]:
%sql SELECT name FROM sqlite_master WHERE type='table'
 * sqlite://
Done.
[12]:
name
cars
writer
regions
sqlite_sequence
countries
locations
departments
jobs
employees
dependents

Here is the entire sqlite_master table for the current database. It stores metadata about the tables and other objects (e.g., indexes and views) in an SQLite database. It contains the columns type, name, tbl_name, rootpage (the root page of the object withing the SQL database file), and sql (the SQL CREATE statement that was used to create the database).

The sqlite_master table is automatically created and maintained by SQLite, and it is typically queried to inspect the schema of a database or retrieve information about its tables and other objects.

[13]:
%sql SELECT * FROM sqlite_master
 * sqlite://
Done.
[13]:
type name tbl_name rootpage sql
table cars cars 2 CREATE TABLE cars(
id INTEGER NOT NULL PRIMARY KEY,
name VARCHAR(50) NOT NULL,
price INTEGER)
table writer writer 3 CREATE TABLE writer(
FirstName VARCHAR(50) NOT NULL,
LastName VARCHAR(50) NOT NULL,
Year INTEGER NOT NULL PRIMARY KEY)
table regions regions 4 CREATE TABLE regions (
region_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
region_name TEXT NOT NULL)
table sqlite_sequence sqlite_sequence 5 CREATE TABLE sqlite_sequence(name,seq)
table countries countries 6 CREATE TABLE countries (
country_id TEXT NOT NULL,
country_name TEXT NOT NULL,
region_id INTEGER NOT NULL,
PRIMARY KEY (country_id ASC),
FOREIGN KEY (region_id) REFERENCES regions (region_id) ON DELETE CASCADE ON UPDATE CASCADE)
index sqlite_autoindex_countries_1 countries 7 None
table locations locations 8 CREATE TABLE locations (
location_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
street_address TEXT,
postal_code TEXT,
city text NOT NULL,
state_province TEXT,
country_id INTEGER NOT NULL,
FOREIGN KEY (country_id) REFERENCES countries (country_id) ON DELETE CASCADE ON UPDATE CASCADE)
table departments departments 9 CREATE TABLE departments (
department_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
department_name TEXT NOT NULL,
location_id INTEGER NOT NULL,
FOREIGN KEY (location_id) REFERENCES locations (location_id) ON DELETE CASCADE ON UPDATE CASCADE)
table jobs jobs 10 CREATE TABLE jobs (
job_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
job_title TEXT NOT NULL,
min_salary DOUBLE NOT NULL,
max_salary DOUBLE NOT NULL)
table employees employees 11 CREATE TABLE employees (
employee_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
first_name TEXT,
last_name TEXT NOT NULL,
email TEXT NOT NULL,
phone_number TEXT,
hire_date TEXT NOT NULL,
job_id INTEGER NOT NULL,
salary DOUBLE NOT NULL,
manager_id INTEGER,
department_id INTEGER NOT NULL,
FOREIGN KEY (job_id) REFERENCES jobs (job_id) ON DELETE CASCADE ON UPDATE CASCADE,
FOREIGN KEY (department_id) REFERENCES departments (department_id) ON DELETE CASCADE ON UPDATE CASCADE,
FOREIGN KEY (manager_id) REFERENCES employees (employee_id) ON DELETE CASCADE ON UPDATE CASCADE)
table dependents dependents 12 CREATE TABLE dependents (
dependent_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
first_name TEXT NOT NULL,
last_name TEXT NOT NULL,
relationship TEXT NOT NULL,
employee_id INTEGER NOT NULL,
FOREIGN KEY (employee_id) REFERENCES employees (employee_id) ON DELETE CASCADE ON UPDATE CASCADE)

For instance, we can use the following code to retrieve the CREATE statement for the table cars.

[14]:
%sql SELECT sql FROM sqlite_master WHERE type='table' AND name='cars';
 * sqlite://
Done.
[14]:
sql
CREATE TABLE cars(
id INTEGER NOT NULL PRIMARY KEY,
name VARCHAR(50) NOT NULL,
price INTEGER)

Perform Simple Calculations in SELECT Statements

We can use standard math operators such as +, *, /, % in SELECT statements to perform simple mathematical calculations. The following expression creates a new column salary * 1.05 from the salary column and adds 5% to the salary of every employee.

[15]:
%%sql
SELECT
    employee_id, first_name, salary, salary*1.05
FROM
    employees;
 * sqlite://
Done.
[15]:
employee_id first_name salary salary*1.05
100 Steven 24000.0 25200.0
101 Neena 17000.0 17850.0
102 Lex 17000.0 17850.0
103 Alexander 9000.0 9450.0
104 Bruce 6000.0 6300.0
105 David 4800.0 5040.0
106 Valli 4800.0 5040.0
107 Diana 4200.0 4410.0
108 Nancy 12000.0 12600.0
109 Daniel 9000.0 9450.0
110 John 8200.0 8610.0
111 Ismael 7700.0 8085.0
112 Jose Manuel 7800.0 8190.0
113 Luis 6900.0 7245.0
114 Den 11000.0 11550.0
115 Alexander 3100.0 3255.0
116 Shelli 2900.0 3045.0
117 Sigal 2800.0 2940.0
118 Guy 2600.0 2730.0
119 Karen 2500.0 2625.0
120 Matthew 8000.0 8400.0
121 Adam 8200.0 8610.0
122 Payam 7900.0 8295.0
123 Shanta 6500.0 6825.0
126 Irene 2700.0 2835.0
145 John 14000.0 14700.0
146 Karen 13500.0 14175.0
176 Jonathon 8600.0 9030.0
177 Jack 8400.0 8820.0
178 Kimberely 7000.0 7350.0
179 Charles 6200.0 6510.0
192 Sarah 4000.0 4200.0
193 Britney 3900.0 4095.0
200 Jennifer 4400.0 4620.0
201 Michael 13000.0 13650.0
202 Pat 6000.0 6300.0
203 Susan 6500.0 6825.0
204 Hermann 10000.0 10500.0
205 Shelley 12000.0 12600.0
206 William 8300.0 8715.0

We can use AS new_salary to assign a different name for the newly created column.

[16]:
%%sql
SELECT
    employee_id, first_name, salary, salary*1.05 AS new_salary
FROM
    employees;
 * sqlite://
Done.
[16]:
employee_id first_name salary new_salary
100 Steven 24000.0 25200.0
101 Neena 17000.0 17850.0
102 Lex 17000.0 17850.0
103 Alexander 9000.0 9450.0
104 Bruce 6000.0 6300.0
105 David 4800.0 5040.0
106 Valli 4800.0 5040.0
107 Diana 4200.0 4410.0
108 Nancy 12000.0 12600.0
109 Daniel 9000.0 9450.0
110 John 8200.0 8610.0
111 Ismael 7700.0 8085.0
112 Jose Manuel 7800.0 8190.0
113 Luis 6900.0 7245.0
114 Den 11000.0 11550.0
115 Alexander 3100.0 3255.0
116 Shelli 2900.0 3045.0
117 Sigal 2800.0 2940.0
118 Guy 2600.0 2730.0
119 Karen 2500.0 2625.0
120 Matthew 8000.0 8400.0
121 Adam 8200.0 8610.0
122 Payam 7900.0 8295.0
123 Shanta 6500.0 6825.0
126 Irene 2700.0 2835.0
145 John 14000.0 14700.0
146 Karen 13500.0 14175.0
176 Jonathon 8600.0 9030.0
177 Jack 8400.0 8820.0
178 Kimberely 7000.0 7350.0
179 Charles 6200.0 6510.0
192 Sarah 4000.0 4200.0
193 Britney 3900.0 4095.0
200 Jennifer 4400.0 4620.0
201 Michael 13000.0 13650.0
202 Pat 6000.0 6300.0
203 Susan 6500.0 6825.0
204 Hermann 10000.0 10500.0
205 Shelley 12000.0 12600.0
206 William 8300.0 8715.0

10.6 Sorting Data with ORDER BY

The clause ORDER BY can be used within a SELECT statement to sort the returned rows.

The general syntax is:

SELECT
    column1, column2, columnN
FROM
    table_name
ORDER BY sort_expression [ASC | DESC];

The sort_expression specifies the sort criteria, whereas ASC or DESC indicates whether to sort the result set into ascending (default) or descending order.

The following example uses the clause ORDER BY to sort employees by first names in alphabetical order.

[17]:
%%sql
SELECT
    employee_id, first_name, last_name, hire_date, salary
FROM
    employees
ORDER BY first_name;
 * sqlite://
Done.
[17]:
employee_id first_name last_name hire_date salary
121 Adam Fripp 1997-04-10 8200.0
103 Alexander Hunold 1990-01-03 9000.0
115 Alexander Khoo 1995-05-18 3100.0
193 Britney Everett 1997-03-03 3900.0
104 Bruce Ernst 1991-05-21 6000.0
179 Charles Johnson 2000-01-04 6200.0
109 Daniel Faviet 1994-08-16 9000.0
105 David Austin 1997-06-25 4800.0
114 Den Raphaely 1994-12-07 11000.0
107 Diana Lorentz 1999-02-07 4200.0
118 Guy Himuro 1998-11-15 2600.0
204 Hermann Baer 1994-06-07 10000.0
126 Irene Mikkilineni 1998-09-28 2700.0
111 Ismael Sciarra 1997-09-30 7700.0
177 Jack Livingston 1998-04-23 8400.0
200 Jennifer Whalen 1987-09-17 4400.0
110 John Chen 1997-09-28 8200.0
145 John Russell 1996-10-01 14000.0
176 Jonathon Taylor 1998-03-24 8600.0
112 Jose Manuel Urman 1998-03-07 7800.0
119 Karen Colmenares 1999-08-10 2500.0
146 Karen Partners 1997-01-05 13500.0
178 Kimberely Grant 1999-05-24 7000.0
102 Lex De Haan 1993-01-13 17000.0
113 Luis Popp 1999-12-07 6900.0
120 Matthew Weiss 1996-07-18 8000.0
201 Michael Hartstein 1996-02-17 13000.0
108 Nancy Greenberg 1994-08-17 12000.0
101 Neena Kochhar 1989-09-21 17000.0
202 Pat Fay 1997-08-17 6000.0
122 Payam Kaufling 1995-05-01 7900.0
192 Sarah Bell 1996-02-04 4000.0
123 Shanta Vollman 1997-10-10 6500.0
205 Shelley Higgins 1994-06-07 12000.0
116 Shelli Baida 1997-12-24 2900.0
117 Sigal Tobias 1997-07-24 2800.0
100 Steven King 1987-06-17 24000.0
203 Susan Mavris 1994-06-07 6500.0
106 Valli Pataballa 1998-02-05 4800.0
206 William Gietz 1994-06-07 8300.0

The ORDER BY clause also allows using multiple expressions for sorting, separated by commas. In the following example ORDER BY is used to sort the employees by their first name in ascending order, and the employees who have the same first name are further sorted by their last name in descending order. E.g., check the sorting for the two employees with the name Alexander.

[18]:
%%sql
SELECT
    employee_id, first_name, last_name, hire_date, salary
FROM
    employees
ORDER BY first_name, last_name DESC;
 * sqlite://
Done.
[18]:
employee_id first_name last_name hire_date salary
121 Adam Fripp 1997-04-10 8200.0
115 Alexander Khoo 1995-05-18 3100.0
103 Alexander Hunold 1990-01-03 9000.0
193 Britney Everett 1997-03-03 3900.0
104 Bruce Ernst 1991-05-21 6000.0
179 Charles Johnson 2000-01-04 6200.0
109 Daniel Faviet 1994-08-16 9000.0
105 David Austin 1997-06-25 4800.0
114 Den Raphaely 1994-12-07 11000.0
107 Diana Lorentz 1999-02-07 4200.0
118 Guy Himuro 1998-11-15 2600.0
204 Hermann Baer 1994-06-07 10000.0
126 Irene Mikkilineni 1998-09-28 2700.0
111 Ismael Sciarra 1997-09-30 7700.0
177 Jack Livingston 1998-04-23 8400.0
200 Jennifer Whalen 1987-09-17 4400.0
145 John Russell 1996-10-01 14000.0
110 John Chen 1997-09-28 8200.0
176 Jonathon Taylor 1998-03-24 8600.0
112 Jose Manuel Urman 1998-03-07 7800.0
146 Karen Partners 1997-01-05 13500.0
119 Karen Colmenares 1999-08-10 2500.0
178 Kimberely Grant 1999-05-24 7000.0
102 Lex De Haan 1993-01-13 17000.0
113 Luis Popp 1999-12-07 6900.0
120 Matthew Weiss 1996-07-18 8000.0
201 Michael Hartstein 1996-02-17 13000.0
108 Nancy Greenberg 1994-08-17 12000.0
101 Neena Kochhar 1989-09-21 17000.0
202 Pat Fay 1997-08-17 6000.0
122 Payam Kaufling 1995-05-01 7900.0
192 Sarah Bell 1996-02-04 4000.0
123 Shanta Vollman 1997-10-10 6500.0
205 Shelley Higgins 1994-06-07 12000.0
116 Shelli Baida 1997-12-24 2900.0
117 Sigal Tobias 1997-07-24 2800.0
100 Steven King 1987-06-17 24000.0
203 Susan Mavris 1994-06-07 6500.0
106 Valli Pataballa 1998-02-05 4800.0
206 William Gietz 1994-06-07 8300.0

Similarly, we can use ORDER BY to sort columns with numerical data, or to sort by date as in the following cell.

[19]:
%%sql
SELECT
    employee_id, first_name, last_name, hire_date, salary
FROM
    employees
ORDER BY hire_date;
 * sqlite://
Done.
[19]:
employee_id first_name last_name hire_date salary
100 Steven King 1987-06-17 24000.0
200 Jennifer Whalen 1987-09-17 4400.0
101 Neena Kochhar 1989-09-21 17000.0
103 Alexander Hunold 1990-01-03 9000.0
104 Bruce Ernst 1991-05-21 6000.0
102 Lex De Haan 1993-01-13 17000.0
203 Susan Mavris 1994-06-07 6500.0
204 Hermann Baer 1994-06-07 10000.0
205 Shelley Higgins 1994-06-07 12000.0
206 William Gietz 1994-06-07 8300.0
109 Daniel Faviet 1994-08-16 9000.0
108 Nancy Greenberg 1994-08-17 12000.0
114 Den Raphaely 1994-12-07 11000.0
122 Payam Kaufling 1995-05-01 7900.0
115 Alexander Khoo 1995-05-18 3100.0
192 Sarah Bell 1996-02-04 4000.0
201 Michael Hartstein 1996-02-17 13000.0
120 Matthew Weiss 1996-07-18 8000.0
145 John Russell 1996-10-01 14000.0
146 Karen Partners 1997-01-05 13500.0
193 Britney Everett 1997-03-03 3900.0
121 Adam Fripp 1997-04-10 8200.0
105 David Austin 1997-06-25 4800.0
117 Sigal Tobias 1997-07-24 2800.0
202 Pat Fay 1997-08-17 6000.0
110 John Chen 1997-09-28 8200.0
111 Ismael Sciarra 1997-09-30 7700.0
123 Shanta Vollman 1997-10-10 6500.0
116 Shelli Baida 1997-12-24 2900.0
106 Valli Pataballa 1998-02-05 4800.0
112 Jose Manuel Urman 1998-03-07 7800.0
176 Jonathon Taylor 1998-03-24 8600.0
177 Jack Livingston 1998-04-23 8400.0
126 Irene Mikkilineni 1998-09-28 2700.0
118 Guy Himuro 1998-11-15 2600.0
107 Diana Lorentz 1999-02-07 4200.0
178 Kimberely Grant 1999-05-24 7000.0
119 Karen Colmenares 1999-08-10 2500.0
113 Luis Popp 1999-12-07 6900.0
179 Charles Johnson 2000-01-04 6200.0

10.7 Filtering Data

LIMIT

LIMIT is used to constrain the number of rows returned by a query, similar to the functions head() and tail() in Pandas.

[20]:
%%sql
SELECT
    employee_id, first_name, last_name, hire_date, salary
FROM
    employees
ORDER BY first_name
LIMIT 5;
 * sqlite://
Done.
[20]:
employee_id first_name last_name hire_date salary
121 Adam Fripp 1997-04-10 8200.0
103 Alexander Hunold 1990-01-03 9000.0
115 Alexander Khoo 1995-05-18 3100.0
193 Britney Everett 1997-03-03 3900.0
104 Bruce Ernst 1991-05-21 6000.0

It is also possible to include an OFFSET clause, which will skip rows before retrieving the data. E.g., in the next cell, the first 3 rows are skipped, and rows 4-8 are returned.

[21]:
%%sql
SELECT
    employee_id, first_name, last_name, hire_date, salary
FROM
    employees
ORDER BY first_name
LIMIT 5
OFFSET 3;
 * sqlite://
Done.
[21]:
employee_id first_name last_name hire_date salary
193 Britney Everett 1997-03-03 3900.0
104 Bruce Ernst 1991-05-21 6000.0
179 Charles Johnson 2000-01-04 6200.0
109 Daniel Faviet 1994-08-16 9000.0
105 David Austin 1997-06-25 4800.0

DISTINCT

The DISTINCT clause allows to remove duplicate rows from the result set.

E.g., the next cell shows the first 15 rows of the salary columns, where some rows have the same value. In the cell afterward, DISTINCT is used to remove the rows with the same values for salary.

[22]:
%%sql
SELECT
    salary
FROM
    employees
ORDER BY salary DESC
LIMIT 15;
 * sqlite://
Done.
[22]:
salary
24000.0
17000.0
17000.0
14000.0
13500.0
13000.0
12000.0
12000.0
11000.0
10000.0
9000.0
9000.0
8600.0
8400.0
8300.0
[23]:
%%sql
SELECT DISTINCT
    salary
FROM
    employees
ORDER BY salary DESC
LIMIT 15;
 * sqlite://
Done.
[23]:
salary
24000.0
17000.0
14000.0
13500.0
13000.0
12000.0
11000.0
10000.0
9000.0
8600.0
8400.0
8300.0
8200.0
8000.0
7900.0

WHERE

The WHERE clause filters data based on specified conditions. For instance, return only the employees that have a salary greater than a certain value.

[24]:
%%sql
SELECT DISTINCT
    employee_id, first_name, last_name,salary
FROM
    employees
WHERE salary > 9000
ORDER BY salary DESC;
 * sqlite://
Done.
[24]:
employee_id first_name last_name salary
100 Steven King 24000.0
101 Neena Kochhar 17000.0
102 Lex De Haan 17000.0
145 John Russell 14000.0
146 Karen Partners 13500.0
201 Michael Hartstein 13000.0
108 Nancy Greenberg 12000.0
205 Shelley Higgins 12000.0
114 Den Raphaely 11000.0
204 Hermann Baer 10000.0

Or, return the employees who work in the department 5.

[25]:
%%sql
SELECT DISTINCT
    employee_id, first_name, last_name, salary, department_id
FROM
    employees
WHERE department_id = 5
ORDER BY first_name;
 * sqlite://
Done.
[25]:
employee_id first_name last_name salary department_id
121 Adam Fripp 8200.0 5
193 Britney Everett 3900.0 5
126 Irene Mikkilineni 2700.0 5
120 Matthew Weiss 8000.0 5
122 Payam Kaufling 7900.0 5
192 Sarah Bell 4000.0 5
123 Shanta Vollman 6500.0 5

Comparison Operators

To specify a condition, we can use the standard comparison operators, such as >, <, >=, <=, =, and note that <> can be used for ‘not equal to’.

[26]:
%%sql
SELECT DISTINCT
    employee_id, first_name, last_name, salary, department_id
FROM
    employees
WHERE department_id <> 5
ORDER BY first_name;
 * sqlite://
Done.
[26]:
employee_id first_name last_name salary department_id
103 Alexander Hunold 9000.0 6
115 Alexander Khoo 3100.0 3
104 Bruce Ernst 6000.0 6
179 Charles Johnson 6200.0 8
109 Daniel Faviet 9000.0 10
105 David Austin 4800.0 6
114 Den Raphaely 11000.0 3
107 Diana Lorentz 4200.0 6
118 Guy Himuro 2600.0 3
204 Hermann Baer 10000.0 7
111 Ismael Sciarra 7700.0 10
177 Jack Livingston 8400.0 8
200 Jennifer Whalen 4400.0 1
110 John Chen 8200.0 10
145 John Russell 14000.0 8
176 Jonathon Taylor 8600.0 8
112 Jose Manuel Urman 7800.0 10
119 Karen Colmenares 2500.0 3
146 Karen Partners 13500.0 8
178 Kimberely Grant 7000.0 8
102 Lex De Haan 17000.0 9
113 Luis Popp 6900.0 10
201 Michael Hartstein 13000.0 2
108 Nancy Greenberg 12000.0 10
101 Neena Kochhar 17000.0 9
202 Pat Fay 6000.0 2
205 Shelley Higgins 12000.0 11
116 Shelli Baida 2900.0 3
117 Sigal Tobias 2800.0 3
100 Steven King 24000.0 9
203 Susan Mavris 6500.0 4
106 Valli Pataballa 4800.0 6
206 William Gietz 8300.0 11

Logical Operators

We can also use logical operators to combine multiple conditions in the WHERE clause of an SQL statement. The following table contains the SQL logical operators.

81db452083604498b54993b741d71002 Figure source: Reference [1].

[27]:
%%sql
SELECT
    first_name, last_name, salary
FROM
    employees
WHERE salary > 5000 AND salary < 7000
ORDER BY salary;
 * sqlite://
Done.
[27]:
first_name last_name salary
Bruce Ernst 6000.0
Pat Fay 6000.0
Charles Johnson 6200.0
Shanta Vollman 6500.0
Susan Mavris 6500.0
Luis Popp 6900.0
[28]:
%%sql
SELECT
    first_name, last_name, salary
FROM
    employees
WHERE salary BETWEEN 5000 AND 7000
ORDER BY salary;
 * sqlite://
Done.
[28]:
first_name last_name salary
Bruce Ernst 6000.0
Pat Fay 6000.0
Charles Johnson 6200.0
Shanta Vollman 6500.0
Susan Mavris 6500.0
Luis Popp 6900.0
Kimberely Grant 7000.0
[29]:
%%sql
SELECT
    first_name, last_name, salary
FROM
    employees
WHERE salary = 6000 OR salary = 7000
ORDER BY salary;
 * sqlite://
Done.
[29]:
first_name last_name salary
Bruce Ernst 6000.0
Pat Fay 6000.0
Kimberely Grant 7000.0

Select employees with first names that start with jo.

[30]:
%%sql
SELECT
    employee_id, first_name, last_name
FROM
    employees
WHERE first_name LIKE 'jo%'
ORDER BY first_name;
 * sqlite://
Done.
[30]:
employee_id first_name last_name
110 John Chen
145 John Russell
176 Jonathon Taylor
112 Jose Manuel Urman

Select employees with first names whose second character is h.

[31]:
%%sql
SELECT
    employee_id, first_name, last_name
FROM
    employees
WHERE
    first_name LIKE '_h%'
ORDER BY first_name;
 * sqlite://
Done.
[31]:
employee_id first_name last_name
179 Charles Johnson
123 Shanta Vollman
205 Shelley Higgins
116 Shelli Baida
[32]:
%%sql
SELECT
    first_name, last_name, department_id
FROM
    employees
WHERE department_id IN (8, 9)
ORDER BY department_id;
 * sqlite://
Done.
[32]:
first_name last_name department_id
John Russell 8
Karen Partners 8
Jonathon Taylor 8
Jack Livingston 8
Kimberely Grant 8
Charles Johnson 8
Steven King 9
Neena Kochhar 9
Lex De Haan 9

IS NULL Operator

To determine whether a row or column has missing or non-defined values, we can use the IS NULL operator.

For instance, to find all employees who do not have phone numbers, we use the IS NUL operator as follows.

[33]:
%%sql
SELECT
    first_name, last_name, phone_number
FROM
    employees
WHERE phone_number IS NULL
ORDER BY first_name , last_name;
 * sqlite://
Done.
[33]:
first_name last_name phone_number
Charles Johnson None
Jack Livingston None
John Russell None
Jonathon Taylor None
Karen Partners None
Kimberely Grant None

10.8 Conditional Expressions

The CASE expression is used to add if-then-else logic to SQL statements, which allows to evaluate a list of conditions and returns one of the possible results.

In the next cell, the CASE expression returns the results Low, Average, or High based on the conditions regarding the salary. The results are collected in the evaluation column.

[34]:
%%sql
SELECT
    first_name, last_name,
CASE WHEN salary < 3000 THEN 'Low'
     WHEN salary >= 3000 AND salary <= 5000 THEN 'Average'
     WHEN salary > 5000 THEN 'High'
END evaluation
FROM
    employees
LIMIT 15;
 * sqlite://
Done.
[34]:
first_name last_name evaluation
Steven King High
Neena Kochhar High
Lex De Haan High
Alexander Hunold High
Bruce Ernst High
David Austin Average
Valli Pataballa Average
Diana Lorentz Average
Nancy Greenberg High
Daniel Faviet High
John Chen High
Ismael Sciarra High
Jose Manuel Urman High
Luis Popp High
Den Raphaely High

10.9 Joining Multiple Tables

SQL provides several ways to join tables, such as INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN, CROSS JOIN, and others.

INNER JOIN

INNER JOIN combines rows from two or more tables based on a related column between them. It retrieves only the rows that have matching values in both tables.

The syntax is:

SELECT
    column1, column2, ...
FROM
    table1
INNER JOIN table2 ON table1.columnX = table2.columnX;

It specifies to join table1 and table2 by matching values in columnX from table1 and columnX from table2, adn returns column1, column2, ... from the joined tables.

Let’s show how we can use INNER JOIN, where for instance, we want to retrieve the list of employees who work in departments 1, 2, and 3, and we would like to list the names of the departments.

To do that, first we can notice that both the employees table and the departments table have a column department_id. Therefore, we can use the department_id column in the employees table as the foreign key column to link the employees to the departments table.

Let’s first display the employees who work in departments 1, 2, and 3.

[35]:
%%sql
SELECT
    first_name, last_name, department_id
FROM
    employees
WHERE department_id IN (1, 2, 3)
ORDER BY department_id;
 * sqlite://
Done.
[35]:
first_name last_name department_id
Jennifer Whalen 1
Michael Hartstein 2
Pat Fay 2
Den Raphaely 3
Alexander Khoo 3
Shelli Baida 3
Sigal Tobias 3
Guy Himuro 3
Karen Colmenares 3

We can also find the names of the departments that have a department_id of 1, 2, and 3.

[36]:
%%sql
SELECT
    department_id, department_name
FROM
    departments
WHERE department_id IN (1, 2, 3);
 * sqlite://
Done.
[36]:
department_id department_name
1 Administration
2 Marketing
3 Purchasing

Next, we join the employees and departments tables, and use INNER JOIN to match the department_id column in the two tables. For each row, if the condition departments.department_id = employees.department_id is satisfied, the combined row will include data from rows in both employees and departments tables in the result set.

[37]:
%%sql
SELECT
    first_name, last_name, employees.department_id, departments.department_id, department_name
FROM
    employees
INNER JOIN departments ON departments.department_id = employees.department_id
WHERE employees.department_id IN (1, 2, 3);
 * sqlite://
Done.
[37]:
first_name last_name department_id department_id_1 department_name
Jennifer Whalen 1 1 Administration
Michael Hartstein 2 2 Marketing
Pat Fay 2 2 Marketing
Den Raphaely 3 3 Purchasing
Alexander Khoo 3 3 Purchasing
Shelli Baida 3 3 Purchasing
Sigal Tobias 3 3 Purchasing
Guy Himuro 3 3 Purchasing
Karen Colmenares 3 3 Purchasing

Let’s look at one more example, where we used INNER JOIN to join three tables. Specifically, in the next cell we use one more INNER JOIN clause to join the tables employees and jobs using the jobs_id column, in order to retrieve the information about the job_title column from the jobs table.

[38]:
%%sql
SELECT
    first_name, last_name, job_title, department_name
FROM
    employees
INNER JOIN departments ON departments.department_id = employees.department_id
INNER JOIN jobs ON jobs.job_id = employees.job_id
WHERE employees.department_id IN (1, 2, 3);
 * sqlite://
Done.
[38]:
first_name last_name job_title department_name
Den Raphaely Purchasing Manager Purchasing
Alexander Khoo Purchasing Clerk Purchasing
Shelli Baida Purchasing Clerk Purchasing
Sigal Tobias Purchasing Clerk Purchasing
Guy Himuro Purchasing Clerk Purchasing
Karen Colmenares Purchasing Clerk Purchasing
Jennifer Whalen Administration Assistant Administration
Michael Hartstein Marketing Manager Marketing
Pat Fay Marketing Representative Marketing

LEFT JOIN

LEFT JOIN (also referred to as LEFT OUTER JOIN) is similar to INNER JOIN, only it retrieves all rows from the left table in the JOIN clause and the matched rows from the right table.

The “left table” is the table that appears on the left side of the JOIN clause. It is considered the primary table, and its rows are the ones that we want to retrieve or work with primarily.

The “right table” is the table that appears on the right side of the JOIN clause. It is the secondary table, and we are typically interested in its data to complement or match with the data from the left table.

For instance, the countries table is shown below, with country_id column being the primary key. The following query returns columns for the countries US, UK, and China.

[39]:
%%sql
SELECT
    country_id,
    country_name
FROM
    countries
WHERE country_id IN ('US', 'UK', 'CN');
 * sqlite://
Done.
[39]:
country_id country_name
CN China
UK United Kingdom
US United States of America

Similarly, the locations table shown below has a country_id column as the foreign key. The query returns the locations located in the US, UK, and China.

[40]:
%%sql
SELECT
    country_id,
    street_address,
    city
FROM
    locations
WHERE country_id IN ('US', 'UK', 'CN');
 * sqlite://
Done.
[40]:
country_id street_address city
US 2014 Jabberwocky Rd Southlake
US 2011 Interiors Blvd South San Francisco
US 2004 Charade Rd Seattle
UK 8204 Arthur St London
UK Magdalen Centre, The Oxford Science Park Oxford

In the next cell, because we use the LEFT JOIN clause, all rows that satisfy the condition in the WHERE clause of the countries table are included in the result set.

For each row in the countries table, the LEFT JOIN clause finds the matching rows in the locations table. If at least one matching row is found, the database engine combines the data from columns of the matching rows in both tables.

In this case, there is no matching row found for the country with the country_id of CN, and therefore, the row in the countries table is included in the result set and the row in the locations table is filled with None values.

[41]:
%%sql
SELECT
    countries.country_name,
    countries.country_id,
    locations.country_id,
    locations.street_address,
    locations.city
FROM
    countries
LEFT JOIN locations ON locations.country_id = countries.country_id
WHERE countries.country_id IN ('US', 'UK', 'CN')
 * sqlite://
Done.
[41]:
country_name country_id country_id_1 street_address city
China CN None None None
United Kingdom UK UK 8204 Arthur St London
United Kingdom UK UK Magdalen Centre, The Oxford Science Park Oxford
United States of America US US 2014 Jabberwocky Rd Southlake
United States of America US US 2011 Interiors Blvd South San Francisco
United States of America US US 2004 Charade Rd Seattle

LEFT JOIN is useful when we want to retrieve data from a primary table and include related information from a secondary table, but we don’t want to exclude records from the primary table if there are no matching rows in the secondary table.

And, because non-matching rows in the locations table are filled with the None values, we can apply the LEFT JOIN clause to find mismatched rows between the tables.

For example, to find the country that does not have any locations in the locations table, we use the following query.

[42]:
%%sql
SELECT
    country_name
FROM
    countries
LEFT JOIN locations ON locations.country_id = countries.country_id
WHERE locations.location_id IS NULL
ORDER BY country_name;
 * sqlite://
Done.
[42]:
country_name
Argentina
Australia
Belgium
Brazil
China
Denmark
Egypt
France
HongKong
India
Israel
Italy
Japan
Kuwait
Mexico
Netherlands
Nigeria
Singapore
Switzerland
Zambia
Zimbabwe

Compare the previous cell to the next cell.

[43]:
%%sql
SELECT
    countries.country_name,
    countries.country_id,
    locations.country_id,
    locations.street_address,
    locations.city
FROM
    countries
LEFT JOIN locations ON locations.country_id = countries.country_id
 * sqlite://
Done.
[43]:
country_name country_id country_id_1 street_address city
Argentina AR None None None
Australia AU None None None
Belgium BE None None None
Brazil BR None None None
Canada CA CA 147 Spadina Ave Toronto
Switzerland CH None None None
China CN None None None
Germany DE DE Schwanthalerstr. 7031 Munich
Denmark DK None None None
Egypt EG None None None
France FR None None None
HongKong HK None None None
Israel IL None None None
India IN None None None
Italy IT None None None
Japan JP None None None
Kuwait KW None None None
Mexico MX None None None
Nigeria NG None None None
Netherlands NL None None None
Singapore SG None None None
United Kingdom UK UK 8204 Arthur St London
United Kingdom UK UK Magdalen Centre, The Oxford Science Park Oxford
United States of America US US 2004 Charade Rd Seattle
United States of America US US 2011 Interiors Blvd South San Francisco
United States of America US US 2014 Jabberwocky Rd Southlake
Zambia ZM None None None
Zimbabwe ZW None None None

RIGHT JOIN

By analogy, RIGHT JOIN (or RIGHT OUTER JOIN) returns all rows from the right table and the matched rows from the left table. If there are no matches in the left table, the result will contain None values for those columns from the left table.

[44]:
%%sql
SELECT
    countries.country_name,
    countries.country_id,
    locations.country_id,
    locations.street_address,
    locations.city
FROM
    countries
RIGHT JOIN locations ON locations.country_id = countries.country_id
WHERE countries.country_id IN ('US', 'UK', 'CN', 'CA', 'DE', 'IT')
 * sqlite://
Done.
[44]:
country_name country_id country_id_1 street_address city
Canada CA CA 147 Spadina Ave Toronto
Germany DE DE Schwanthalerstr. 7031 Munich
United Kingdom UK UK 8204 Arthur St London
United Kingdom UK UK Magdalen Centre, The Oxford Science Park Oxford
United States of America US US 2014 Jabberwocky Rd Southlake
United States of America US US 2011 Interiors Blvd South San Francisco
United States of America US US 2004 Charade Rd Seattle

FULL JOIN (also known as a FULL OUTER JOIN) combines the result sets of both a LEFT JOIN and a RIGHT JOIN, it returns all rows from both tables and includes rows from the left table that have no match in the right table (with None values for right table columns) and rows from the right table that have no match in the left table (with None values for left table columns).

CROSS JOIN (also known as a CARTESIAN JOIN), combines all rows from the first table with all rows from the second table, and generates all possible combinations of data.

10.10 Return Data Statistics

Aggregate functions in SQL take a list of values and return a single value, such as the average AVG(), MIN(), MAX(), SUM(), or COUNT().

[45]:
%%sql
SELECT
    AVG(salary)
FROM
    employees;
 * sqlite://
Done.
[45]:
AVG(salary)
8060.0
[46]:
%%sql
SELECT
    MAX(salary)
FROM
    employees;
 * sqlite://
Done.
[46]:
MAX(salary)
24000.0
[47]:
%%sql
SELECT
    SUM(salary)
FROM
    employees
WHERE department_id = 5;
 * sqlite://
Done.
[47]:
SUM(salary)
41200.0

COUNT returns the number of rows in a table. It can be used by providing the name of a column, or it can also be used with an asterisk * as in the following cell.

[48]:
%%sql
SELECT
    COUNT(employee_id)
FROM
    employees;
 * sqlite://
Done.
[48]:
COUNT(employee_id)
40
[49]:
%%sql
SELECT
    COUNT(*)
FROM
    employees;
 * sqlite://
Done.
[49]:
COUNT(*)
40

10.11 Grouping Data

GROUP BY allows to group rows based on values from more than one column. It is typically combined with aggregate functions, like COUNT, SUM, AVG, MIN, and MAX.

For instance, let’s assume that we would like to retrieve information about the average salary in each department. In the next cell we will first display the salaries of all employees per department, and in the following cell we can see how GROUP BY is used to calculate the average salary for each department.

[50]:
%%sql
SELECT
    department_id, salary
FROM
    employees
 * sqlite://
Done.
[50]:
department_id salary
9 24000.0
9 17000.0
9 17000.0
6 9000.0
6 6000.0
6 4800.0
6 4800.0
6 4200.0
10 12000.0
10 9000.0
10 8200.0
10 7700.0
10 7800.0
10 6900.0
3 11000.0
3 3100.0
3 2900.0
3 2800.0
3 2600.0
3 2500.0
5 8000.0
5 8200.0
5 7900.0
5 6500.0
5 2700.0
8 14000.0
8 13500.0
8 8600.0
8 8400.0
8 7000.0
8 6200.0
5 4000.0
5 3900.0
1 4400.0
2 13000.0
2 6000.0
4 6500.0
7 10000.0
11 12000.0
11 8300.0
[51]:
%%sql
SELECT
    department_id, AVG(salary)
FROM
    employees
GROUP BY department_id;
 * sqlite://
Done.
[51]:
department_id AVG(salary)
1 4400.0
2 9500.0
3 4150.0
4 6500.0
5 5885.714285714285
6 5760.0
7 10000.0
8 9616.666666666666
9 19333.333333333332
10 8600.0
11 10150.0

Similarly, the following cell displays the minimum, maximum, and average salary for each department. And, instead of using the department_id column, it will be more convenient to display the department names. Since the employees table does not have a column with the department names, we will use INNER JOIN to retrieve the department_name column from the departments table.

[52]:
%%sql
SELECT
    department_name, MIN(salary) AS min_salary, MAX(salary) AS max_salary,
    ROUND(AVG(salary)) AS average_salary
FROM
    employees
INNER JOIN departments ON departments.department_id = employees.department_id
GROUP BY department_name;
 * sqlite://
Done.
[52]:
department_name min_salary max_salary average_salary
Accounting 8300.0 12000.0 10150.0
Administration 4400.0 4400.0 4400.0
Executive 17000.0 24000.0 19333.0
Finance 6900.0 12000.0 8600.0
Human Resources 6500.0 6500.0 6500.0
IT 4200.0 9000.0 5760.0
Marketing 6000.0 13000.0 9500.0
Public Relations 10000.0 10000.0 10000.0
Purchasing 2500.0 11000.0 4150.0
Sales 6200.0 14000.0 9617.0
Shipping 2700.0 8200.0 5886.0

10.12 Modifying Data

INSERT

INSERT is used to insert one or more rows into a table, and we already used it in subsection 10.3 when we created new tables. The general syntax is:

INSERT INTO table (column1, column2,...)
VALUES (value1, value2, ...);

It is important to ensure that the number of values matches the number of columns, and that the value type corresponds to the data type for that column.

It is also possible to insert value without specifying the columns, as in:

INSERT INTO table
VALUES (value1, value2, ...);

For instance, to insert a new row into the dependents table, we can use the following code. The new row will be added to the bottom of the table dependents.

[53]:
%%sql
INSERT INTO
    dependents (first_name, last_name, relationship, employee_id)
VALUES ('Dustin', 'Johnson', 'Child', 178);
 * sqlite://
1 rows affected.
[53]:
[]
[54]:
# verify the changes
%sql SELECT * from dependents
 * sqlite://
Done.
[54]:
dependent_id first_name last_name relationship employee_id
1 Penelope Gietz Child 206
2 Nick Higgins Child 205
3 Ed Whalen Child 200
4 Jennifer King Child 100
5 Johnny Kochhar Child 101
6 Bette De Haan Child 102
7 Grace Faviet Child 109
8 Matthew Chen Child 110
9 Joe Sciarra Child 111
10 Christian Urman Child 112
11 Zero Popp Child 113
12 Karl Greenberg Child 108
13 Uma Mavris Child 203
14 Vivien Hunold Child 103
15 Cuba Ernst Child 104
16 Fred Austin Child 105
17 Helen Pataballa Child 106
18 Dan Lorentz Child 107
19 Bob Hartstein Child 201
20 Lucille Fay Child 202
21 Kirsten Baer Child 204
22 Elvis Khoo Child 115
23 Sandra Baida Child 116
24 Cameron Tobias Child 117
25 Kevin Himuro Child 118
26 Rip Colmenares Child 119
27 Julia Raphaely Child 114
28 Woody Russell Child 145
29 Alec Partners Child 146
30 Sandra Taylor Child 176
31 Dustin Johnson Child 178

UPDATE

UPDATE is used to update existing data in a table.

For instance, to update the last name of the employee with id=192, we can write:

[55]:
%%sql
UPDATE employees
SET last_name = 'Lopez'
WHERE employee_id = 192;
 * sqlite://
1 rows affected.
[55]:
[]

And we can verify the changes.

[56]:
%%sql
SELECT
    employee_id, first_name, last_name
FROM
    employees
WHERE employee_id = 192;
 * sqlite://
Done.
[56]:
employee_id first_name last_name
192 Sarah Lopez

DELETE

DELETE is used to delete one or more rows from a table permanently.

[57]:
%%sql
DELETE FROM dependents
WHERE dependent_id = 16;
 * sqlite://
1 rows affected.
[57]:
[]
[58]:
# verify the changes

%sql SELECT * FROM dependents
 * sqlite://
Done.
[58]:
dependent_id first_name last_name relationship employee_id
1 Penelope Gietz Child 206
2 Nick Higgins Child 205
3 Ed Whalen Child 200
4 Jennifer King Child 100
5 Johnny Kochhar Child 101
6 Bette De Haan Child 102
7 Grace Faviet Child 109
8 Matthew Chen Child 110
9 Joe Sciarra Child 111
10 Christian Urman Child 112
11 Zero Popp Child 113
12 Karl Greenberg Child 108
13 Uma Mavris Child 203
14 Vivien Hunold Child 103
15 Cuba Ernst Child 104
17 Helen Pataballa Child 106
18 Dan Lorentz Child 107
19 Bob Hartstein Child 201
20 Lucille Fay Child 202
21 Kirsten Baer Child 204
22 Elvis Khoo Child 115
23 Sandra Baida Child 116
24 Cameron Tobias Child 117
25 Kevin Himuro Child 118
26 Rip Colmenares Child 119
27 Julia Raphaely Child 114
28 Woody Russell Child 145
29 Alec Partners Child 146
30 Sandra Taylor Child 176
31 Dustin Johnson Child 178

10.13 Working with Tables

In subsection 10.3 we learned how to use the CREATE TABLE statement. Several other related SQL statements for working with tables are described next.

ALTER TABLE

ALTER TABLE allows to add new columns in an existing table using ADD, remove columns in a table with DROP, rename columns with RENAME, or modify attributes of a column such as primary key, default value, etc. with ADD CONSTRAINT or DROP CONSTRAINT and related commands.

Add a Column

[59]:
%%sql
ALTER TABLE cars
ADD mileage INT;
 * sqlite://
Done.
[59]:
[]
[60]:
%sql SELECT * FROM cars
 * sqlite://
Done.
[60]:
id name price mileage
1 Audi 52642 None
2 Mercedes 57127 None
3 Skoda 9000 None
4 Volvo 29000 None
5 Bentley 350000 None
6 Citroen 21000 None
7 Hummer 41400 None
8 Volkswagen 21600 None

Remove a Column

[61]:
%%sql
ALTER TABLE cars
DROP price;
 * sqlite://
Done.
[61]:
[]
[62]:
# verfiy the change
%sql SELECT * FROM cars
 * sqlite://
Done.
[62]:
id name mileage
1 Audi None
2 Mercedes None
3 Skoda None
4 Volvo None
5 Bentley None
6 Citroen None
7 Hummer None
8 Volkswagen None

Rename a Column

[63]:
%%sql
ALTER TABLE cars
RENAME mileage TO miles;
 * sqlite://
Done.
[63]:
[]
[64]:
# verfiy the change
%sql SELECT * FROM cars
 * sqlite://
Done.
[64]:
id name miles
1 Audi None
2 Mercedes None
3 Skoda None
4 Volvo None
5 Bentley None
6 Citroen None
7 Hummer None
8 Volkswagen None

Rename a Table

Similarly, ALTER TABLE can be used to rename a table, as in the following example.

[65]:
%%sql
ALTER TABLE writer
RENAME TO authors;
 * sqlite://
Done.
[65]:
[]
[66]:
# verfiy the change
%sql SELECT name FROM sqlite_master WHERE type='table'
 * sqlite://
Done.
[66]:
name
cars
authors
regions
sqlite_sequence
countries
locations
departments
jobs
employees
dependents

DROP TABLE

DROP TABLE removes a table from a database.

[67]:
%sql DROP TABLE cars;
 * sqlite://
Done.
[67]:
[]
[68]:
# verfiy the change
%sql SELECT name FROM sqlite_master WHERE type='table'
 * sqlite://
Done.
[68]:
name
authors
regions
sqlite_sequence
countries
locations
departments
jobs
employees
dependents

10.14 Constraints

Constraints enforce restrictions on the data that a column can contain or impose other related restrictions. So far, we described some of the constraints used in SQL databases, such as:

  • PRIMARY KEY, defines a primary key for a table.

  • NOT NULL, ensures that values are inserted in a column.

  • UNIQUE, ensures that each row in a column has unique values.

FOREIGN KEY

Beside the primary key, some tables also define a foreign key. This is a column or a group of columns that enforces a link between the data in two tables. In a foreign key reference, the primary key column of the first table is referenced by the column of the second table, and the column of the second table becomes the foreign key.

For instance, when we created the countries table (see the inserted code below), we specified that it has country_id column as a primary key, and region_id column as a foreign key. Therefore, the region_id column in countries will point to the region_id column in the regions table. This is indicated in the last line below: FOREIGN KEY (region_id) REFERENCES regions (region_id).

CREATE TABLE countries (
    country_id TEXT NOT NULL,
    country_name TEXT NOT NULL,
    region_id INTEGER NOT NULL,
    PRIMARY KEY (country_id ASC),
    FOREIGN KEY (region_id) REFERENCES regions (region_id) ON DELETE CASCADE ON UPDATE CASCADE);

Note also that in the last line ON DELETE CASCADE specifies that if a row in the regions table is deleted, all rows in the countries table that have a matching region_id will also be deleted automatically. Analogously, ON UPDATE CASCADE specifies that if a region_id value in the regions table is updated, all corresponding rows in the countries table will be updated accordingly. This is useful for maintaining referential integrity between the rows in these two tables.

Let’s inspect the countries table.

[69]:
%sql SELECT * FROM countries
 * sqlite://
Done.
[69]:
country_id country_name region_id
AR Argentina 2
AU Australia 3
BE Belgium 1
BR Brazil 2
CA Canada 2
CH Switzerland 1
CN China 3
DE Germany 1
DK Denmark 1
EG Egypt 4
FR France 1
HK HongKong 3
IL Israel 4
IN India 3
IT Italy 1
JP Japan 3
KW Kuwait 4
MX Mexico 2
NG Nigeria 4
NL Netherlands 1
SG Singapore 3
UK United Kingdom 1
US United States of America 2
ZM Zambia 4
ZW Zimbabwe 4
[70]:
%sql SELECT * FROM regions
 * sqlite://
Done.
[70]:
region_id region_name
1 Europe
2 Americas
3 Asia
4 Middle East and Africa

As we mentioned in the section on joining tables, we can use the foreign key column in a table to link to other tables. In the above example, the region_id column establishes a relationship between the countries table and the regions table, wherecountries.region_id must match an existing value in regions.region_id.

10.15 Subqueries

A subquery is a query that is nested inside another query, such as SELECT.

For instance, the following cell retrieves employees in the departments that have a location_id=1700.

The query placed inside the parentheses is a subquery. It is also known as an inner query or inner select. The query that contains the subquery is called an outer query or an outer select.

To execute the query, the database system first has to execute the subquery and retrieve the departments with a location_id of 1700, and afterward it has to execute the outer query.

[71]:
%%sql
SELECT
    employee_id, first_name, last_name
FROM
    employees
WHERE department_id IN
    (SELECT department_id FROM departments
        WHERE location_id = 1700)
ORDER BY first_name , last_name;
 * sqlite://
Done.
[71]:
employee_id first_name last_name
115 Alexander Khoo
109 Daniel Faviet
114 Den Raphaely
118 Guy Himuro
111 Ismael Sciarra
200 Jennifer Whalen
110 John Chen
112 Jose Manuel Urman
119 Karen Colmenares
102 Lex De Haan
113 Luis Popp
108 Nancy Greenberg
101 Neena Kochhar
205 Shelley Higgins
116 Shelli Baida
117 Sigal Tobias
100 Steven King
206 William Gietz

The next code finds the employee who has the highest salary.

[72]:
%%sql
SELECT
    employee_id, first_name, last_name, salary
FROM
    employees
WHERE salary =
    (SELECT MAX(salary) FROM employees)
ORDER BY first_name , last_name;
 * sqlite://
Done.
[72]:
employee_id first_name last_name salary
100 Steven King 24000.0

E.g., find all employees who salaries are greater than the average salary of all employees.

[73]:
%%sql
SELECT
    employee_id, first_name, last_name, salary
FROM
    employees
WHERE
    salary >
    (SELECT AVG(salary) FROM employees);
 * sqlite://
Done.
[73]:
employee_id first_name last_name salary
100 Steven King 24000.0
101 Neena Kochhar 17000.0
102 Lex De Haan 17000.0
103 Alexander Hunold 9000.0
108 Nancy Greenberg 12000.0
109 Daniel Faviet 9000.0
110 John Chen 8200.0
114 Den Raphaely 11000.0
121 Adam Fripp 8200.0
145 John Russell 14000.0
146 Karen Partners 13500.0
176 Jonathon Taylor 8600.0
177 Jack Livingston 8400.0
201 Michael Hartstein 13000.0
204 Hermann Baer 10000.0
205 Shelley Higgins 12000.0
206 William Gietz 8300.0

10.16 Connect to an Existing Database

Recall again that when we created new tables in section 10.3 or a new database in section 10.4, we used sqlite:// to connect to the tables or to the database. This syntax is used to specify an in-memory table or database, that is, it indicates that we want to work with tables or databases that exist only in the memory of our local machine. This syntax uses two forward slashes // after sqlite:.

To connect to an existing database that is stored in a local directory of our computer, we will use the syntax sqlite:///relative_database_path, where relative_database_path specifies the relative path to the local directory where the SQL database is stored. This syntax uses three forward slashes /// after sqlite:.

Similarly, we can connect to an existing database that is stored in a directory of our computer by using the absolute path to the database with the syntax sqlite:////absolute_database_path. This syntax uses four forward slashes //// after sqlite:.

In the following cell, we used sqlite:///data/EssentialSQL.db to initialize the connection to the database EssentialSQL.db which is located in the data subdirectory in the current working directory.

[74]:
%sql sqlite:///data/EssentialSQL.db

Note again that there are three forward slashes after sqlite:, since we provided a relative path to the database.

We can inspect the tables in the database EssentialSQL.db in the following cell.

[75]:
%sql SELECT name FROM sqlite_master WHERE type='table'
   sqlite://
 * sqlite:///data/EssentialSQL.db
Done.
[75]:
name
Customers
Shippers
Employees
Orders
OrderDetails

And we can inspect the data in the table Customers.

[76]:
%sql SELECT * FROM Customers
   sqlite://
 * sqlite:///data/EssentialSQL.db
Done.
[76]:
CustomerID CompanyName ContactName ContactTitle Address City State
1 Deerfield Tile Dick Terrcotta Owner 450 Village Street Deerfield IL
2 Sagebrush Carpet Barbara Berber Director of Installations 10 Industrial Drive El Paso TX
3 Floor Co. Jim Wood Installer 34218 Private Lane Monclair NJ
4 Main Tile and Bath Toni Faucet Owner Suite 23, Henry Building Orlando FL
5 Slots Carpet Jack Diamond III Purchaser 3024 Jackpot Drive Las Vegas NV

If the database does not exist, SQLite will create a new database with the provided name in the home directory. The following cell will create a new database named test.db in the current working directory.

[77]:
%sql sqlite:///test.db

References

  1. SQL Tutorial, available at https://www.sqltutorial.org/.

  2. Practice SQL with SQLite and Jupyter Notebook, by Chonghua Yin, available at https://github.com/royalosyin/Practice-SQL-with-SQLite-and-Jupyter-Notebook.

BACK TO TOP