Lecture 10 - Databases and SQL¶
10.1 Introduction to SQL¶
SQL (Structured Query Language) is a programming language designed for managing data in Relational Data Base Management Systems (RDBMS), or for stream processing in Relational Data Stream Management Systems (RDSMS). A relational database is a database that stores related information across multiple tables, and allows to query information in more than one table at the same time. Within a table, the data is organized in a tabular format with rows and columns.
SQL was initially developed in 1970, and since then different companies and vendors implemented SQL in their products with some variations. To bring greater conformity in the variants of SQL, the American National Standards Institute (ANSI) published the first SQL standard in 1986. The standard has been updated every few years since then.
Today, there are several variants of SQL for database management systems available, some of which were developed by companies such as IBM and Oracle, as well as there are variants developed by communities, such as MySQL, PostgreSQL, MariaDB, etc. Although these variants of SQL have certain differences, they are based on the basic SQL syntax, and are quite similar.
The main advantages of SQL include standardized syntax (since all relational database systems have an SQL query interpreter built-in), and is easy-to-understand due to using English-like commands and functions.
Relational Databases¶
Relational databases store information in multiple tables, which allows to work with more complex data, and have flexibility in the way the data is organized. An example is shown in the next figure, where a database is shown that is used for managing the HR data of a small business.
This database has seven tables:
Jobs table stores data related to job title and salary range.
Employees table stores the data of employees.
Dependents table stores the employee’s dependents.
Departments table stores department data.
Regions table stores the data of regions such as Asia, Europe, America, Middle East, and Africa.
Countries table stores the data of countries where the company is doing business.
Locations table stores the location of the departments of the company.
Each table contains many records with rows and columns (similar to an Excel spreadsheet), and the records have relationships across the tables. Using multiple tables in a relational database allows us to avoid duplication of information, in comparison to using a single table to store all information. Also, it provides flexibility in how we work with the data. To establish relationships between the records in different tables we need to use an ID or identifier for each employee. The identifier for each employee, or in general for each record (row) in a relational database, is referred to as primary key. For instance, each employee can be assigned an ID value (such as employee 162), and each table would have an ID column (primary key column) to establish the relationship with the other tables in the database.
Figure source: Reference [1].
SQL for Data Science¶
SQL is a very important tool for data scientists, data analysts, developers, and database administrators. In particular, as many companies become data-driven, SQL becomes an essential tool for handling data stored in databases and performing various data analytics operations, such as calculating data statistics, updating records, removing duplicate columns, calculating correlations between records, and similar.
SQL versus Pandas¶
SQL has similarities with the Pandas library, as it offers similar functionality to Pandas, which includes data manipulation over rows and columns, data merging, grouping, dealing with missing values, and similar. However, Pandas is not a relational database management library, but it is a data frame library.
Still, Pandas offers additional functions and flexibility for handling and manipulating tabular data, and many users download databases to their local machine, and afterward use Pandas for data processing, rather than using SQL to process the data on the server.
The benefits of using SQL over Pandas can depend on the task. Several considerations include:
In the case of a large database of information (e.g., GigaBytes of data), downloading the database to the local machine to be processed by Pandas may be slow or infeasible. Pandas is more suitable for processing small to medium size databases in Python.
Even if the user can download the data on the local machine for processing with Pandas, it may be required to apply some level of preprocessing or organizing the data on the server using SQL.
Some tasks can require that the data processing is done in the existing database. Also, when the tasks require fast data retrieval and processing, SQL can be more efficient than Pandas.
SQLite¶
In this lecture, we will use SQLite which implements a self-contained, serverless SQL database engine. SQLite is lightweight in terms of setup and required recourses. Unlike most other SQL variants, SQLite does not have a separate server process, and it reads and writes directly to disk files. That is, it does not use the client/server model. Because it has no server managing access to it, SQLite is not suitable in multiuser environments where multiple people can simultaneously edit files.
10.2 Using SQLite with Python¶
To demonstrate the use of SQLite with Python, in this lecture we will use magic commands in Jupyter Notebook. Magic commands are special commands which are not valid Python code, but perform certain actions in a Jupyter Notebook. They begin with the %
symbol.
The library ipython-sql
offers the magic functions %sql
and %%sql
, which allow to connect to a database and use standard SQL commands in Jupyter Notebooks. To run ipython-sql
on your computer, it needs to be installed (e.g., by pip install ipython-sql
). If we run the notebooks on Google Colab, ipython-sql
comes preinstalled.
To load the ipython-sql
library we will use %load_ext sql
as in the next code. %load_ext
is a magic command that loads an external package that can add new magic commands.
[1]:
%load_ext sql
The magic command %sql
is used to execute an SQL query that is contained in a single statement in a Jupyter notebook cell, and %%sql
allows to execute an SQL query that is contained in multiple SQL statements in a single cell.
10.3 Create a New Table¶
To create a new table we will use the SQL command CREATE TABLE
, as shown in the next cell. If the table already exists in the database, an error message will show up.
In order to establish a connection to the newly created table, we used %%sql sqlite://
in the cell below. If we wanted to create a new table in an existing database to which we have already established a connection, we could have used only the magic command %%sql
.
SQL has many commands or keywords that have special meaning, such as SELECT, INSERT, DELETE, and these keywords cannot be used as names of tables, columns, or other objects.
To make SQL language more readable, it is a convention to write the SQL commands with uppercase letters, and the other variables and identifiers with lowercase letters. However, this is not required, as the SQL commands are not case-sensitive.
Let’s create a table called cars
which has 3 columns: id
, name
, and price
. In the cell below, we specify that the values of id
and price
columns are integers, denoted by the INTEGER
keyword. The names in the name
column have a character string type (VARCHAR
, i.e., variable-length character string) and should have at most 50 characters. We also specified with NOT NULL
that the values should not be missing in the id
and name
columns, i.e., when we
insert data into the table we have to specify the values for the NON NULL
columns.
Each table can have only one PRIMARY KEY column that uniquely identifies each row in the table, and prevents from inserting duplicate rows in the table. For the table below we set PRIMARY KEY to the id
column. It is not required to define a primary key, however, it is a good practice to do it for every table.
[2]:
%%sql sqlite://
CREATE TABLE cars(
id INTEGER NOT NULL PRIMARY KEY,
name VARCHAR(50) NOT NULL,
price INTEGER);
Done.
[2]:
[]
The table that we just created is empty, and to add data to the table we will use the INSERT
statement. In each row we provide values for id
, name
, and price
.
When using multiple statements in SQL, the statements in each line need to be separated with a semicolon ;
. The last statement in a cell does not have to be followed by a semicolon.
Inline comments can be inserted by using two consecutive hyphens --
that comment the rest of the line, as shown in the second line of the next cell.
And also, comments that span multiple lines can be inserted by using the multiline C-style notation /* comment */
as in the last lines in the cell.
Note also that we used just the magic command %%sql
in this cell, and we didn’t need to write %%sql sqlite://
as in the above cell. The reason is that in the above cell we used %%sql sqlite://
to establish a connection to the newly created table. Once a connection is established, we can use only %sql
or %%sql
to work with the table.
[3]:
%%sql
INSERT INTO cars VALUES(1,'Audi',52642); --Two consecutive hyphens (--) comment the rest of the line
INSERT INTO cars VALUES(2,'Mercedes',57127);
INSERT INTO cars VALUES(3,'Skoda',9000);
INSERT INTO cars VALUES(4,'Volvo',29000);
INSERT INTO cars VALUES(5,'Bentley',350000);
INSERT INTO cars VALUES(6,'Citroen',21000);
INSERT INTO cars VALUES(7,'Hummer',41400);
INSERT INTO cars VALUES(8,'Volkswagen',21600);
/* A comment that spans
more than one line */
* sqlite://
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
Done.
[3]:
[]
We can display the table with the following code. Notice again that we used a single %
in the magic command %sql
, since we have only one line of code.
[4]:
%sql SELECT * from cars
* sqlite://
Done.
[4]:
id | name | price |
---|---|---|
1 | Audi | 52642 |
2 | Mercedes | 57127 |
3 | Skoda | 9000 |
4 | Volvo | 29000 |
5 | Bentley | 350000 |
6 | Citroen | 21000 |
7 | Hummer | 41400 |
8 | Volkswagen | 21600 |
Another Example of Creating a Table¶
In the next simple example, we will create another table called writer
, with columns FirstName
, LastName
, and Year
.
[5]:
%%sql sqlite://
CREATE TABLE writer(
FirstName VARCHAR(50) NOT NULL,
LastName VARCHAR(50) NOT NULL,
Year INTEGER NOT NULL PRIMARY KEY);
Done.
[5]:
[]
[6]:
%%sql
INSERT INTO writer VALUES ('William', 'Shakespeare', 1616);
INSERT INTO writer VALUES ('Lin', 'Han', 1996);
INSERT INTO writer VALUES ('Peter', 'Brecht', 1978);
* sqlite://
1 rows affected.
1 rows affected.
1 rows affected.
[6]:
[]
[7]:
%sql SELECT * from writer
* sqlite://
Done.
[7]:
FirstName | LastName | Year |
---|---|---|
William | Shakespeare | 1616 |
Peter | Brecht | 1978 |
Lin | Han | 1996 |
10.4 Database Example¶
As an example of a database, let’s create a database that was shown in the above section, related to managing the HR data of a small business.
The cells below first create the tables (recall that the database has 7 tables), and afterward the information for each table is inserted.
[8]:
%%sql sqlite://
CREATE TABLE regions (
region_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
region_name TEXT NOT NULL);
CREATE TABLE countries (
country_id TEXT NOT NULL,
country_name TEXT NOT NULL,
region_id INTEGER NOT NULL,
PRIMARY KEY (country_id ASC),
FOREIGN KEY (region_id) REFERENCES regions (region_id) ON DELETE CASCADE ON UPDATE CASCADE);
CREATE TABLE locations (
location_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
street_address TEXT,
postal_code TEXT,
city text NOT NULL,
state_province TEXT,
country_id INTEGER NOT NULL,
FOREIGN KEY (country_id) REFERENCES countries (country_id) ON DELETE CASCADE ON UPDATE CASCADE);
CREATE TABLE departments (
department_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
department_name TEXT NOT NULL,
location_id INTEGER NOT NULL,
FOREIGN KEY (location_id) REFERENCES locations (location_id) ON DELETE CASCADE ON UPDATE CASCADE);
CREATE TABLE jobs (
job_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
job_title TEXT NOT NULL,
min_salary DOUBLE NOT NULL,
max_salary DOUBLE NOT NULL);
CREATE TABLE employees (
employee_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
first_name TEXT,
last_name TEXT NOT NULL,
email TEXT NOT NULL,
phone_number TEXT,
hire_date TEXT NOT NULL,
job_id INTEGER NOT NULL,
salary DOUBLE NOT NULL,
manager_id INTEGER,
department_id INTEGER NOT NULL,
FOREIGN KEY (job_id) REFERENCES jobs (job_id) ON DELETE CASCADE ON UPDATE CASCADE,
FOREIGN KEY (department_id) REFERENCES departments (department_id) ON DELETE CASCADE ON UPDATE CASCADE,
FOREIGN KEY (manager_id) REFERENCES employees (employee_id) ON DELETE CASCADE ON UPDATE CASCADE);
CREATE TABLE dependents (
dependent_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
first_name TEXT NOT NULL,
last_name TEXT NOT NULL,
relationship TEXT NOT NULL,
employee_id INTEGER NOT NULL,
FOREIGN KEY (employee_id) REFERENCES employees (employee_id) ON DELETE CASCADE ON UPDATE CASCADE);
Done.
Done.
Done.
Done.
Done.
Done.
Done.
[8]:
[]
[ ]:
%%sql
/*Data for the table regions */
INSERT INTO regions(region_id,region_name) VALUES (1,'Europe');
INSERT INTO regions(region_id,region_name) VALUES (2,'Americas');
INSERT INTO regions(region_id,region_name) VALUES (3,'Asia');
INSERT INTO regions(region_id,region_name) VALUES (4,'Middle East and Africa');
/*Data for the table countries */
INSERT INTO countries(country_id,country_name,region_id) VALUES ('AR','Argentina',2);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('AU','Australia',3);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('BE','Belgium',1);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('BR','Brazil',2);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('CA','Canada',2);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('CH','Switzerland',1);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('CN','China',3);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('DE','Germany',1);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('DK','Denmark',1);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('EG','Egypt',4);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('FR','France',1);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('HK','HongKong',3);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('IL','Israel',4);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('IN','India',3);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('IT','Italy',1);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('JP','Japan',3);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('KW','Kuwait',4);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('MX','Mexico',2);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('NG','Nigeria',4);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('NL','Netherlands',1);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('SG','Singapore',3);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('UK','United Kingdom',1);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('US','United States of America',2);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('ZM','Zambia',4);
INSERT INTO countries(country_id,country_name,region_id) VALUES ('ZW','Zimbabwe',4);
/*Data for the table locations */
INSERT INTO locations(location_id,street_address,postal_code,city,state_province,country_id) VALUES (1400,'2014 Jabberwocky Rd','26192','Southlake','Texas','US');
INSERT INTO locations(location_id,street_address,postal_code,city,state_province,country_id) VALUES (1500,'2011 Interiors Blvd','99236','South San Francisco','California','US');
INSERT INTO locations(location_id,street_address,postal_code,city,state_province,country_id) VALUES (1700,'2004 Charade Rd','98199','Seattle','Washington','US');
INSERT INTO locations(location_id,street_address,postal_code,city,state_province,country_id) VALUES (1800,'147 Spadina Ave','M5V 2L7','Toronto','Ontario','CA');
INSERT INTO locations(location_id,street_address,postal_code,city,state_province,country_id) VALUES (2400,'8204 Arthur St',NULL,'London',NULL,'UK');
INSERT INTO locations(location_id,street_address,postal_code,city,state_province,country_id) VALUES (2500,'Magdalen Centre, The Oxford Science Park','OX9 9ZB','Oxford','Oxford','UK');
INSERT INTO locations(location_id,street_address,postal_code,city,state_province,country_id) VALUES (2700,'Schwanthalerstr. 7031','80925','Munich','Bavaria','DE');
/*Data for the table jobs */
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (1,'Public Accountant',4200.00,9000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (2,'Accounting Manager',8200.00,16000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (3,'Administration Assistant',3000.00,6000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (4,'President',20000.00,40000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (5,'Administration Vice President',15000.00,30000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (6,'Accountant',4200.00,9000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (7,'Finance Manager',8200.00,16000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (8,'Human Resources Representative',4000.00,9000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (9,'Programmer',4000.00,10000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (10,'Marketing Manager',9000.00,15000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (11,'Marketing Representative',4000.00,9000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (12,'Public Relations Representative',4500.00,10500.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (13,'Purchasing Clerk',2500.00,5500.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (14,'Purchasing Manager',8000.00,15000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (15,'Sales Manager',10000.00,20000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (16,'Sales Representative',6000.00,12000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (17,'Shipping Clerk',2500.00,5500.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (18,'Stock Clerk',2000.00,5000.00);
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (19,'Stock Manager',5500.00,8500.00);
/*Data for the table departments */
INSERT INTO departments(department_id,department_name,location_id) VALUES (1,'Administration',1700);
INSERT INTO departments(department_id,department_name,location_id) VALUES (2,'Marketing',1800);
INSERT INTO departments(department_id,department_name,location_id) VALUES (3,'Purchasing',1700);
INSERT INTO departments(department_id,department_name,location_id) VALUES (4,'Human Resources',2400);
INSERT INTO departments(department_id,department_name,location_id) VALUES (5,'Shipping',1500);
INSERT INTO departments(department_id,department_name,location_id) VALUES (6,'IT',1400);
INSERT INTO departments(department_id,department_name,location_id) VALUES (7,'Public Relations',2700);
INSERT INTO departments(department_id,department_name,location_id) VALUES (8,'Sales',2500);
INSERT INTO departments(department_id,department_name,location_id) VALUES (9,'Executive',1700);
INSERT INTO departments(department_id,department_name,location_id) VALUES (10,'Finance',1700);
INSERT INTO departments(department_id,department_name,location_id) VALUES (11,'Accounting',1700);
/*Data for the table employees */
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (100,'Steven','King','steven.king@sqltutorial.org','515.123.4567','1987-06-17',4,24000.00,NULL,9);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (101,'Neena','Kochhar','neena.kochhar@sqltutorial.org','515.123.4568','1989-09-21',5,17000.00,100,9);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (102,'Lex','De Haan','lex.de haan@sqltutorial.org','515.123.4569','1993-01-13',5,17000.00,100,9);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (103,'Alexander','Hunold','alexander.hunold@sqltutorial.org','590.423.4567','1990-01-03',9,9000.00,102,6);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (104,'Bruce','Ernst','bruce.ernst@sqltutorial.org','590.423.4568','1991-05-21',9,6000.00,103,6);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (105,'David','Austin','david.austin@sqltutorial.org','590.423.4569','1997-06-25',9,4800.00,103,6);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (106,'Valli','Pataballa','valli.pataballa@sqltutorial.org','590.423.4560','1998-02-05',9,4800.00,103,6);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (107,'Diana','Lorentz','diana.lorentz@sqltutorial.org','590.423.5567','1999-02-07',9,4200.00,103,6);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (108,'Nancy','Greenberg','nancy.greenberg@sqltutorial.org','515.124.4569','1994-08-17',7,12000.00,101,10);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (109,'Daniel','Faviet','daniel.faviet@sqltutorial.org','515.124.4169','1994-08-16',6,9000.00,108,10);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (110,'John','Chen','john.chen@sqltutorial.org','515.124.4269','1997-09-28',6,8200.00,108,10);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (111,'Ismael','Sciarra','ismael.sciarra@sqltutorial.org','515.124.4369','1997-09-30',6,7700.00,108,10);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (112,'Jose Manuel','Urman','jose manuel.urman@sqltutorial.org','515.124.4469','1998-03-07',6,7800.00,108,10);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (113,'Luis','Popp','luis.popp@sqltutorial.org','515.124.4567','1999-12-07',6,6900.00,108,10);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (114,'Den','Raphaely','den.raphaely@sqltutorial.org','515.127.4561','1994-12-07',14,11000.00,100,3);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (115,'Alexander','Khoo','alexander.khoo@sqltutorial.org','515.127.4562','1995-05-18',13,3100.00,114,3);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (116,'Shelli','Baida','shelli.baida@sqltutorial.org','515.127.4563','1997-12-24',13,2900.00,114,3);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (117,'Sigal','Tobias','sigal.tobias@sqltutorial.org','515.127.4564','1997-07-24',13,2800.00,114,3);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (118,'Guy','Himuro','guy.himuro@sqltutorial.org','515.127.4565','1998-11-15',13,2600.00,114,3);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (119,'Karen','Colmenares','karen.colmenares@sqltutorial.org','515.127.4566','1999-08-10',13,2500.00,114,3);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (120,'Matthew','Weiss','matthew.weiss@sqltutorial.org','650.123.1234','1996-07-18',19,8000.00,100,5);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (121,'Adam','Fripp','adam.fripp@sqltutorial.org','650.123.2234','1997-04-10',19,8200.00,100,5);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (122,'Payam','Kaufling','payam.kaufling@sqltutorial.org','650.123.3234','1995-05-01',19,7900.00,100,5);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (123,'Shanta','Vollman','shanta.vollman@sqltutorial.org','650.123.4234','1997-10-10',19,6500.00,100,5);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (126,'Irene','Mikkilineni','irene.mikkilineni@sqltutorial.org','650.124.1224','1998-09-28',18,2700.00,120,5);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (145,'John','Russell','john.russell@sqltutorial.org',NULL,'1996-10-01',15,14000.00,100,8);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (146,'Karen','Partners','karen.partners@sqltutorial.org',NULL,'1997-01-05',15,13500.00,100,8);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (176,'Jonathon','Taylor','jonathon.taylor@sqltutorial.org',NULL,'1998-03-24',16,8600.00,100,8);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (177,'Jack','Livingston','jack.livingston@sqltutorial.org',NULL,'1998-04-23',16,8400.00,100,8);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (178,'Kimberely','Grant','kimberely.grant@sqltutorial.org',NULL,'1999-05-24',16,7000.00,100,8);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (179,'Charles','Johnson','charles.johnson@sqltutorial.org',NULL,'2000-01-04',16,6200.00,100,8);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (192,'Sarah','Bell','sarah.bell@sqltutorial.org','650.501.1876','1996-02-04',17,4000.00,123,5);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (193,'Britney','Everett','britney.everett@sqltutorial.org','650.501.2876','1997-03-03',17,3900.00,123,5);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (200,'Jennifer','Whalen','jennifer.whalen@sqltutorial.org','515.123.4444','1987-09-17',3,4400.00,101,1);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (201,'Michael','Hartstein','michael.hartstein@sqltutorial.org','515.123.5555','1996-02-17',10,13000.00,100,2);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (202,'Pat','Fay','pat.fay@sqltutorial.org','603.123.6666','1997-08-17',11,6000.00,201,2);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (203,'Susan','Mavris','susan.mavris@sqltutorial.org','515.123.7777','1994-06-07',8,6500.00,101,4);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (204,'Hermann','Baer','hermann.baer@sqltutorial.org','515.123.8888','1994-06-07',12,10000.00,101,7);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (205,'Shelley','Higgins','shelley.higgins@sqltutorial.org','515.123.8080','1994-06-07',2,12000.00,101,11);
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (206,'William','Gietz','william.gietz@sqltutorial.org','515.123.8181','1994-06-07',1,8300.00,205,11);
/*Data for the table dependents */
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (1,'Penelope','Gietz','Child',206);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (2,'Nick','Higgins','Child',205);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (3,'Ed','Whalen','Child',200);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (4,'Jennifer','King','Child',100);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (5,'Johnny','Kochhar','Child',101);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (6,'Bette','De Haan','Child',102);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (7,'Grace','Faviet','Child',109);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (8,'Matthew','Chen','Child',110);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (9,'Joe','Sciarra','Child',111);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (10,'Christian','Urman','Child',112);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (11,'Zero','Popp','Child',113);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (12,'Karl','Greenberg','Child',108);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (13,'Uma','Mavris','Child',203);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (14,'Vivien','Hunold','Child',103);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (15,'Cuba','Ernst','Child',104);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (16,'Fred','Austin','Child',105);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (17,'Helen','Pataballa','Child',106);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (18,'Dan','Lorentz','Child',107);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (19,'Bob','Hartstein','Child',201);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (20,'Lucille','Fay','Child',202);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (21,'Kirsten','Baer','Child',204);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (22,'Elvis','Khoo','Child',115);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (23,'Sandra','Baida','Child',116);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (24,'Cameron','Tobias','Child',117);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (25,'Kevin','Himuro','Child',118);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (26,'Rip','Colmenares','Child',119);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (27,'Julia','Raphaely','Child',114);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (28,'Woody','Russell','Child',145);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (29,'Alec','Partners','Child',146);
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (30,'Sandra','Taylor','Child',176);
10.5 Querying Databases with SELECT¶
The most common SQL task is to retrieve data from one or more tables. The data is returned in the form of a result table, called result set. This is accomplished with the SELECT
statement, which has the following general syntax.
SELECT
column1, column2, columnN
FROM
table_name
To run the code in this Jupyter Notebook, we will just insert the magic commands %sql
or %%sql
in front of each SQL code.
For example, in the next cell we retrieved the columns employee_id, first_name, last_name, hire_date
from the employees
table. When the statement is evaluated, the database system first evaluates the FROM
clause and the SELECT
clause afterward. I.e., from the table named employees
select the listed columns.
[10]:
%%sql
SELECT
employee_id, first_name, last_name, hire_date
FROM
employees
* sqlite://
Done.
[10]:
employee_id | first_name | last_name | hire_date |
---|---|---|---|
100 | Steven | King | 1987-06-17 |
101 | Neena | Kochhar | 1989-09-21 |
102 | Lex | De Haan | 1993-01-13 |
103 | Alexander | Hunold | 1990-01-03 |
104 | Bruce | Ernst | 1991-05-21 |
105 | David | Austin | 1997-06-25 |
106 | Valli | Pataballa | 1998-02-05 |
107 | Diana | Lorentz | 1999-02-07 |
108 | Nancy | Greenberg | 1994-08-17 |
109 | Daniel | Faviet | 1994-08-16 |
110 | John | Chen | 1997-09-28 |
111 | Ismael | Sciarra | 1997-09-30 |
112 | Jose Manuel | Urman | 1998-03-07 |
113 | Luis | Popp | 1999-12-07 |
114 | Den | Raphaely | 1994-12-07 |
115 | Alexander | Khoo | 1995-05-18 |
116 | Shelli | Baida | 1997-12-24 |
117 | Sigal | Tobias | 1997-07-24 |
118 | Guy | Himuro | 1998-11-15 |
119 | Karen | Colmenares | 1999-08-10 |
120 | Matthew | Weiss | 1996-07-18 |
121 | Adam | Fripp | 1997-04-10 |
122 | Payam | Kaufling | 1995-05-01 |
123 | Shanta | Vollman | 1997-10-10 |
126 | Irene | Mikkilineni | 1998-09-28 |
145 | John | Russell | 1996-10-01 |
146 | Karen | Partners | 1997-01-05 |
176 | Jonathon | Taylor | 1998-03-24 |
177 | Jack | Livingston | 1998-04-23 |
178 | Kimberely | Grant | 1999-05-24 |
179 | Charles | Johnson | 2000-01-04 |
192 | Sarah | Bell | 1996-02-04 |
193 | Britney | Everett | 1997-03-03 |
200 | Jennifer | Whalen | 1987-09-17 |
201 | Michael | Hartstein | 1996-02-17 |
202 | Pat | Fay | 1997-08-17 |
203 | Susan | Mavris | 1994-06-07 |
204 | Hermann | Baer | 1994-06-07 |
205 | Shelley | Higgins | 1994-06-07 |
206 | William | Gietz | 1994-06-07 |
If we want to query all columns in a table we can use the asterisk operator *
instead of the columns names.
[11]:
%sql SELECT * FROM employees
* sqlite://
Done.
[11]:
employee_id | first_name | last_name | phone_number | hire_date | job_id | salary | manager_id | department_id | |
---|---|---|---|---|---|---|---|---|---|
100 | Steven | King | steven.king@sqltutorial.org | 515.123.4567 | 1987-06-17 | 4 | 24000.0 | None | 9 |
101 | Neena | Kochhar | neena.kochhar@sqltutorial.org | 515.123.4568 | 1989-09-21 | 5 | 17000.0 | 100 | 9 |
102 | Lex | De Haan | lex.de haan@sqltutorial.org | 515.123.4569 | 1993-01-13 | 5 | 17000.0 | 100 | 9 |
103 | Alexander | Hunold | alexander.hunold@sqltutorial.org | 590.423.4567 | 1990-01-03 | 9 | 9000.0 | 102 | 6 |
104 | Bruce | Ernst | bruce.ernst@sqltutorial.org | 590.423.4568 | 1991-05-21 | 9 | 6000.0 | 103 | 6 |
105 | David | Austin | david.austin@sqltutorial.org | 590.423.4569 | 1997-06-25 | 9 | 4800.0 | 103 | 6 |
106 | Valli | Pataballa | valli.pataballa@sqltutorial.org | 590.423.4560 | 1998-02-05 | 9 | 4800.0 | 103 | 6 |
107 | Diana | Lorentz | diana.lorentz@sqltutorial.org | 590.423.5567 | 1999-02-07 | 9 | 4200.0 | 103 | 6 |
108 | Nancy | Greenberg | nancy.greenberg@sqltutorial.org | 515.124.4569 | 1994-08-17 | 7 | 12000.0 | 101 | 10 |
109 | Daniel | Faviet | daniel.faviet@sqltutorial.org | 515.124.4169 | 1994-08-16 | 6 | 9000.0 | 108 | 10 |
110 | John | Chen | john.chen@sqltutorial.org | 515.124.4269 | 1997-09-28 | 6 | 8200.0 | 108 | 10 |
111 | Ismael | Sciarra | ismael.sciarra@sqltutorial.org | 515.124.4369 | 1997-09-30 | 6 | 7700.0 | 108 | 10 |
112 | Jose Manuel | Urman | jose manuel.urman@sqltutorial.org | 515.124.4469 | 1998-03-07 | 6 | 7800.0 | 108 | 10 |
113 | Luis | Popp | luis.popp@sqltutorial.org | 515.124.4567 | 1999-12-07 | 6 | 6900.0 | 108 | 10 |
114 | Den | Raphaely | den.raphaely@sqltutorial.org | 515.127.4561 | 1994-12-07 | 14 | 11000.0 | 100 | 3 |
115 | Alexander | Khoo | alexander.khoo@sqltutorial.org | 515.127.4562 | 1995-05-18 | 13 | 3100.0 | 114 | 3 |
116 | Shelli | Baida | shelli.baida@sqltutorial.org | 515.127.4563 | 1997-12-24 | 13 | 2900.0 | 114 | 3 |
117 | Sigal | Tobias | sigal.tobias@sqltutorial.org | 515.127.4564 | 1997-07-24 | 13 | 2800.0 | 114 | 3 |
118 | Guy | Himuro | guy.himuro@sqltutorial.org | 515.127.4565 | 1998-11-15 | 13 | 2600.0 | 114 | 3 |
119 | Karen | Colmenares | karen.colmenares@sqltutorial.org | 515.127.4566 | 1999-08-10 | 13 | 2500.0 | 114 | 3 |
120 | Matthew | Weiss | matthew.weiss@sqltutorial.org | 650.123.1234 | 1996-07-18 | 19 | 8000.0 | 100 | 5 |
121 | Adam | Fripp | adam.fripp@sqltutorial.org | 650.123.2234 | 1997-04-10 | 19 | 8200.0 | 100 | 5 |
122 | Payam | Kaufling | payam.kaufling@sqltutorial.org | 650.123.3234 | 1995-05-01 | 19 | 7900.0 | 100 | 5 |
123 | Shanta | Vollman | shanta.vollman@sqltutorial.org | 650.123.4234 | 1997-10-10 | 19 | 6500.0 | 100 | 5 |
126 | Irene | Mikkilineni | irene.mikkilineni@sqltutorial.org | 650.124.1224 | 1998-09-28 | 18 | 2700.0 | 120 | 5 |
145 | John | Russell | john.russell@sqltutorial.org | None | 1996-10-01 | 15 | 14000.0 | 100 | 8 |
146 | Karen | Partners | karen.partners@sqltutorial.org | None | 1997-01-05 | 15 | 13500.0 | 100 | 8 |
176 | Jonathon | Taylor | jonathon.taylor@sqltutorial.org | None | 1998-03-24 | 16 | 8600.0 | 100 | 8 |
177 | Jack | Livingston | jack.livingston@sqltutorial.org | None | 1998-04-23 | 16 | 8400.0 | 100 | 8 |
178 | Kimberely | Grant | kimberely.grant@sqltutorial.org | None | 1999-05-24 | 16 | 7000.0 | 100 | 8 |
179 | Charles | Johnson | charles.johnson@sqltutorial.org | None | 2000-01-04 | 16 | 6200.0 | 100 | 8 |
192 | Sarah | Bell | sarah.bell@sqltutorial.org | 650.501.1876 | 1996-02-04 | 17 | 4000.0 | 123 | 5 |
193 | Britney | Everett | britney.everett@sqltutorial.org | 650.501.2876 | 1997-03-03 | 17 | 3900.0 | 123 | 5 |
200 | Jennifer | Whalen | jennifer.whalen@sqltutorial.org | 515.123.4444 | 1987-09-17 | 3 | 4400.0 | 101 | 1 |
201 | Michael | Hartstein | michael.hartstein@sqltutorial.org | 515.123.5555 | 1996-02-17 | 10 | 13000.0 | 100 | 2 |
202 | Pat | Fay | pat.fay@sqltutorial.org | 603.123.6666 | 1997-08-17 | 11 | 6000.0 | 201 | 2 |
203 | Susan | Mavris | susan.mavris@sqltutorial.org | 515.123.7777 | 1994-06-07 | 8 | 6500.0 | 101 | 4 |
204 | Hermann | Baer | hermann.baer@sqltutorial.org | 515.123.8888 | 1994-06-07 | 12 | 10000.0 | 101 | 7 |
205 | Shelley | Higgins | shelley.higgins@sqltutorial.org | 515.123.8080 | 1994-06-07 | 2 | 12000.0 | 101 | 11 |
206 | William | Gietz | william.gietz@sqltutorial.org | 515.123.8181 | 1994-06-07 | 1 | 8300.0 | 205 | 11 |
List Tables in a Database¶
We can also use the SELECT
statement to display a list of all tables in the current database. Every SQLite database has an sqlite_master
table that defines the schema for the database. The following statement uses sqlite_master
with the type
field set to ‘table’ to return the names of all tables in the current database.
[12]:
%sql SELECT name FROM sqlite_master WHERE type='table'
* sqlite://
Done.
[12]:
name |
---|
cars |
writer |
regions |
sqlite_sequence |
countries |
locations |
departments |
jobs |
employees |
dependents |
Here is the entire sqlite_master
table for the current database. It stores metadata about the tables and other objects (e.g., indexes and views) in an SQLite database. It contains the columns type
, name
, tbl_name
, rootpage
(the root page of the object withing the SQL database file), and sql
(the SQL CREATE statement that was used to create the database).
The sqlite_master
table is automatically created and maintained by SQLite, and it is typically queried to inspect the schema of a database or retrieve information about its tables and other objects.
[13]:
%sql SELECT * FROM sqlite_master
* sqlite://
Done.
[13]:
type | name | tbl_name | rootpage | sql |
---|---|---|---|---|
table | cars | cars | 2 | CREATE TABLE cars( id INTEGER NOT NULL PRIMARY KEY, name VARCHAR(50) NOT NULL, price INTEGER) |
table | writer | writer | 3 | CREATE TABLE writer( FirstName VARCHAR(50) NOT NULL, LastName VARCHAR(50) NOT NULL, Year INTEGER NOT NULL PRIMARY KEY) |
table | regions | regions | 4 | CREATE TABLE regions ( region_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, region_name TEXT NOT NULL) |
table | sqlite_sequence | sqlite_sequence | 5 | CREATE TABLE sqlite_sequence(name,seq) |
table | countries | countries | 6 | CREATE TABLE countries ( country_id TEXT NOT NULL, country_name TEXT NOT NULL, region_id INTEGER NOT NULL, PRIMARY KEY (country_id ASC), FOREIGN KEY (region_id) REFERENCES regions (region_id) ON DELETE CASCADE ON UPDATE CASCADE) |
index | sqlite_autoindex_countries_1 | countries | 7 | None |
table | locations | locations | 8 | CREATE TABLE locations ( location_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, street_address TEXT, postal_code TEXT, city text NOT NULL, state_province TEXT, country_id INTEGER NOT NULL, FOREIGN KEY (country_id) REFERENCES countries (country_id) ON DELETE CASCADE ON UPDATE CASCADE) |
table | departments | departments | 9 | CREATE TABLE departments ( department_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, department_name TEXT NOT NULL, location_id INTEGER NOT NULL, FOREIGN KEY (location_id) REFERENCES locations (location_id) ON DELETE CASCADE ON UPDATE CASCADE) |
table | jobs | jobs | 10 | CREATE TABLE jobs ( job_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, job_title TEXT NOT NULL, min_salary DOUBLE NOT NULL, max_salary DOUBLE NOT NULL) |
table | employees | employees | 11 | CREATE TABLE employees ( employee_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, first_name TEXT, last_name TEXT NOT NULL, email TEXT NOT NULL, phone_number TEXT, hire_date TEXT NOT NULL, job_id INTEGER NOT NULL, salary DOUBLE NOT NULL, manager_id INTEGER, department_id INTEGER NOT NULL, FOREIGN KEY (job_id) REFERENCES jobs (job_id) ON DELETE CASCADE ON UPDATE CASCADE, FOREIGN KEY (department_id) REFERENCES departments (department_id) ON DELETE CASCADE ON UPDATE CASCADE, FOREIGN KEY (manager_id) REFERENCES employees (employee_id) ON DELETE CASCADE ON UPDATE CASCADE) |
table | dependents | dependents | 12 | CREATE TABLE dependents ( dependent_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, first_name TEXT NOT NULL, last_name TEXT NOT NULL, relationship TEXT NOT NULL, employee_id INTEGER NOT NULL, FOREIGN KEY (employee_id) REFERENCES employees (employee_id) ON DELETE CASCADE ON UPDATE CASCADE) |
For instance, we can use the following code to retrieve the CREATE
statement for the table cars
.
[14]:
%sql SELECT sql FROM sqlite_master WHERE type='table' AND name='cars';
* sqlite://
Done.
[14]:
sql |
---|
CREATE TABLE cars( id INTEGER NOT NULL PRIMARY KEY, name VARCHAR(50) NOT NULL, price INTEGER) |
Perform Simple Calculations in SELECT Statements¶
We can use standard math operators such as +
, *
, /
, %
in SELECT
statements to perform simple mathematical calculations. The following expression creates a new column salary * 1.05
from the salary
column and adds 5% to the salary of every employee.
[15]:
%%sql
SELECT
employee_id, first_name, salary, salary*1.05
FROM
employees;
* sqlite://
Done.
[15]:
employee_id | first_name | salary | salary*1.05 |
---|---|---|---|
100 | Steven | 24000.0 | 25200.0 |
101 | Neena | 17000.0 | 17850.0 |
102 | Lex | 17000.0 | 17850.0 |
103 | Alexander | 9000.0 | 9450.0 |
104 | Bruce | 6000.0 | 6300.0 |
105 | David | 4800.0 | 5040.0 |
106 | Valli | 4800.0 | 5040.0 |
107 | Diana | 4200.0 | 4410.0 |
108 | Nancy | 12000.0 | 12600.0 |
109 | Daniel | 9000.0 | 9450.0 |
110 | John | 8200.0 | 8610.0 |
111 | Ismael | 7700.0 | 8085.0 |
112 | Jose Manuel | 7800.0 | 8190.0 |
113 | Luis | 6900.0 | 7245.0 |
114 | Den | 11000.0 | 11550.0 |
115 | Alexander | 3100.0 | 3255.0 |
116 | Shelli | 2900.0 | 3045.0 |
117 | Sigal | 2800.0 | 2940.0 |
118 | Guy | 2600.0 | 2730.0 |
119 | Karen | 2500.0 | 2625.0 |
120 | Matthew | 8000.0 | 8400.0 |
121 | Adam | 8200.0 | 8610.0 |
122 | Payam | 7900.0 | 8295.0 |
123 | Shanta | 6500.0 | 6825.0 |
126 | Irene | 2700.0 | 2835.0 |
145 | John | 14000.0 | 14700.0 |
146 | Karen | 13500.0 | 14175.0 |
176 | Jonathon | 8600.0 | 9030.0 |
177 | Jack | 8400.0 | 8820.0 |
178 | Kimberely | 7000.0 | 7350.0 |
179 | Charles | 6200.0 | 6510.0 |
192 | Sarah | 4000.0 | 4200.0 |
193 | Britney | 3900.0 | 4095.0 |
200 | Jennifer | 4400.0 | 4620.0 |
201 | Michael | 13000.0 | 13650.0 |
202 | Pat | 6000.0 | 6300.0 |
203 | Susan | 6500.0 | 6825.0 |
204 | Hermann | 10000.0 | 10500.0 |
205 | Shelley | 12000.0 | 12600.0 |
206 | William | 8300.0 | 8715.0 |
We can use AS new_salary
to assign a different name for the newly created column.
[16]:
%%sql
SELECT
employee_id, first_name, salary, salary*1.05 AS new_salary
FROM
employees;
* sqlite://
Done.
[16]:
employee_id | first_name | salary | new_salary |
---|---|---|---|
100 | Steven | 24000.0 | 25200.0 |
101 | Neena | 17000.0 | 17850.0 |
102 | Lex | 17000.0 | 17850.0 |
103 | Alexander | 9000.0 | 9450.0 |
104 | Bruce | 6000.0 | 6300.0 |
105 | David | 4800.0 | 5040.0 |
106 | Valli | 4800.0 | 5040.0 |
107 | Diana | 4200.0 | 4410.0 |
108 | Nancy | 12000.0 | 12600.0 |
109 | Daniel | 9000.0 | 9450.0 |
110 | John | 8200.0 | 8610.0 |
111 | Ismael | 7700.0 | 8085.0 |
112 | Jose Manuel | 7800.0 | 8190.0 |
113 | Luis | 6900.0 | 7245.0 |
114 | Den | 11000.0 | 11550.0 |
115 | Alexander | 3100.0 | 3255.0 |
116 | Shelli | 2900.0 | 3045.0 |
117 | Sigal | 2800.0 | 2940.0 |
118 | Guy | 2600.0 | 2730.0 |
119 | Karen | 2500.0 | 2625.0 |
120 | Matthew | 8000.0 | 8400.0 |
121 | Adam | 8200.0 | 8610.0 |
122 | Payam | 7900.0 | 8295.0 |
123 | Shanta | 6500.0 | 6825.0 |
126 | Irene | 2700.0 | 2835.0 |
145 | John | 14000.0 | 14700.0 |
146 | Karen | 13500.0 | 14175.0 |
176 | Jonathon | 8600.0 | 9030.0 |
177 | Jack | 8400.0 | 8820.0 |
178 | Kimberely | 7000.0 | 7350.0 |
179 | Charles | 6200.0 | 6510.0 |
192 | Sarah | 4000.0 | 4200.0 |
193 | Britney | 3900.0 | 4095.0 |
200 | Jennifer | 4400.0 | 4620.0 |
201 | Michael | 13000.0 | 13650.0 |
202 | Pat | 6000.0 | 6300.0 |
203 | Susan | 6500.0 | 6825.0 |
204 | Hermann | 10000.0 | 10500.0 |
205 | Shelley | 12000.0 | 12600.0 |
206 | William | 8300.0 | 8715.0 |
10.6 Sorting Data with ORDER BY¶
The clause ORDER BY
can be used within a SELECT
statement to sort the returned rows.
The general syntax is:
SELECT
column1, column2, columnN
FROM
table_name
ORDER BY sort_expression [ASC | DESC];
The sort_expression
specifies the sort criteria, whereas ASC
or DESC
indicates whether to sort the result set into ascending (default) or descending order.
The following example uses the clause ORDER BY
to sort employees by first names in alphabetical order.
[17]:
%%sql
SELECT
employee_id, first_name, last_name, hire_date, salary
FROM
employees
ORDER BY first_name;
* sqlite://
Done.
[17]:
employee_id | first_name | last_name | hire_date | salary |
---|---|---|---|---|
121 | Adam | Fripp | 1997-04-10 | 8200.0 |
103 | Alexander | Hunold | 1990-01-03 | 9000.0 |
115 | Alexander | Khoo | 1995-05-18 | 3100.0 |
193 | Britney | Everett | 1997-03-03 | 3900.0 |
104 | Bruce | Ernst | 1991-05-21 | 6000.0 |
179 | Charles | Johnson | 2000-01-04 | 6200.0 |
109 | Daniel | Faviet | 1994-08-16 | 9000.0 |
105 | David | Austin | 1997-06-25 | 4800.0 |
114 | Den | Raphaely | 1994-12-07 | 11000.0 |
107 | Diana | Lorentz | 1999-02-07 | 4200.0 |
118 | Guy | Himuro | 1998-11-15 | 2600.0 |
204 | Hermann | Baer | 1994-06-07 | 10000.0 |
126 | Irene | Mikkilineni | 1998-09-28 | 2700.0 |
111 | Ismael | Sciarra | 1997-09-30 | 7700.0 |
177 | Jack | Livingston | 1998-04-23 | 8400.0 |
200 | Jennifer | Whalen | 1987-09-17 | 4400.0 |
110 | John | Chen | 1997-09-28 | 8200.0 |
145 | John | Russell | 1996-10-01 | 14000.0 |
176 | Jonathon | Taylor | 1998-03-24 | 8600.0 |
112 | Jose Manuel | Urman | 1998-03-07 | 7800.0 |
119 | Karen | Colmenares | 1999-08-10 | 2500.0 |
146 | Karen | Partners | 1997-01-05 | 13500.0 |
178 | Kimberely | Grant | 1999-05-24 | 7000.0 |
102 | Lex | De Haan | 1993-01-13 | 17000.0 |
113 | Luis | Popp | 1999-12-07 | 6900.0 |
120 | Matthew | Weiss | 1996-07-18 | 8000.0 |
201 | Michael | Hartstein | 1996-02-17 | 13000.0 |
108 | Nancy | Greenberg | 1994-08-17 | 12000.0 |
101 | Neena | Kochhar | 1989-09-21 | 17000.0 |
202 | Pat | Fay | 1997-08-17 | 6000.0 |
122 | Payam | Kaufling | 1995-05-01 | 7900.0 |
192 | Sarah | Bell | 1996-02-04 | 4000.0 |
123 | Shanta | Vollman | 1997-10-10 | 6500.0 |
205 | Shelley | Higgins | 1994-06-07 | 12000.0 |
116 | Shelli | Baida | 1997-12-24 | 2900.0 |
117 | Sigal | Tobias | 1997-07-24 | 2800.0 |
100 | Steven | King | 1987-06-17 | 24000.0 |
203 | Susan | Mavris | 1994-06-07 | 6500.0 |
106 | Valli | Pataballa | 1998-02-05 | 4800.0 |
206 | William | Gietz | 1994-06-07 | 8300.0 |
The ORDER BY
clause also allows using multiple expressions for sorting, separated by commas. In the following example ORDER BY
is used to sort the employees by their first name in ascending order, and the employees who have the same first name are further sorted by their last name in descending order. E.g., check the sorting for the two employees with the name Alexander.
[18]:
%%sql
SELECT
employee_id, first_name, last_name, hire_date, salary
FROM
employees
ORDER BY first_name, last_name DESC;
* sqlite://
Done.
[18]:
employee_id | first_name | last_name | hire_date | salary |
---|---|---|---|---|
121 | Adam | Fripp | 1997-04-10 | 8200.0 |
115 | Alexander | Khoo | 1995-05-18 | 3100.0 |
103 | Alexander | Hunold | 1990-01-03 | 9000.0 |
193 | Britney | Everett | 1997-03-03 | 3900.0 |
104 | Bruce | Ernst | 1991-05-21 | 6000.0 |
179 | Charles | Johnson | 2000-01-04 | 6200.0 |
109 | Daniel | Faviet | 1994-08-16 | 9000.0 |
105 | David | Austin | 1997-06-25 | 4800.0 |
114 | Den | Raphaely | 1994-12-07 | 11000.0 |
107 | Diana | Lorentz | 1999-02-07 | 4200.0 |
118 | Guy | Himuro | 1998-11-15 | 2600.0 |
204 | Hermann | Baer | 1994-06-07 | 10000.0 |
126 | Irene | Mikkilineni | 1998-09-28 | 2700.0 |
111 | Ismael | Sciarra | 1997-09-30 | 7700.0 |
177 | Jack | Livingston | 1998-04-23 | 8400.0 |
200 | Jennifer | Whalen | 1987-09-17 | 4400.0 |
145 | John | Russell | 1996-10-01 | 14000.0 |
110 | John | Chen | 1997-09-28 | 8200.0 |
176 | Jonathon | Taylor | 1998-03-24 | 8600.0 |
112 | Jose Manuel | Urman | 1998-03-07 | 7800.0 |
146 | Karen | Partners | 1997-01-05 | 13500.0 |
119 | Karen | Colmenares | 1999-08-10 | 2500.0 |
178 | Kimberely | Grant | 1999-05-24 | 7000.0 |
102 | Lex | De Haan | 1993-01-13 | 17000.0 |
113 | Luis | Popp | 1999-12-07 | 6900.0 |
120 | Matthew | Weiss | 1996-07-18 | 8000.0 |
201 | Michael | Hartstein | 1996-02-17 | 13000.0 |
108 | Nancy | Greenberg | 1994-08-17 | 12000.0 |
101 | Neena | Kochhar | 1989-09-21 | 17000.0 |
202 | Pat | Fay | 1997-08-17 | 6000.0 |
122 | Payam | Kaufling | 1995-05-01 | 7900.0 |
192 | Sarah | Bell | 1996-02-04 | 4000.0 |
123 | Shanta | Vollman | 1997-10-10 | 6500.0 |
205 | Shelley | Higgins | 1994-06-07 | 12000.0 |
116 | Shelli | Baida | 1997-12-24 | 2900.0 |
117 | Sigal | Tobias | 1997-07-24 | 2800.0 |
100 | Steven | King | 1987-06-17 | 24000.0 |
203 | Susan | Mavris | 1994-06-07 | 6500.0 |
106 | Valli | Pataballa | 1998-02-05 | 4800.0 |
206 | William | Gietz | 1994-06-07 | 8300.0 |
Similarly, we can use ORDER BY
to sort columns with numerical data, or to sort by date as in the following cell.
[19]:
%%sql
SELECT
employee_id, first_name, last_name, hire_date, salary
FROM
employees
ORDER BY hire_date;
* sqlite://
Done.
[19]:
employee_id | first_name | last_name | hire_date | salary |
---|---|---|---|---|
100 | Steven | King | 1987-06-17 | 24000.0 |
200 | Jennifer | Whalen | 1987-09-17 | 4400.0 |
101 | Neena | Kochhar | 1989-09-21 | 17000.0 |
103 | Alexander | Hunold | 1990-01-03 | 9000.0 |
104 | Bruce | Ernst | 1991-05-21 | 6000.0 |
102 | Lex | De Haan | 1993-01-13 | 17000.0 |
203 | Susan | Mavris | 1994-06-07 | 6500.0 |
204 | Hermann | Baer | 1994-06-07 | 10000.0 |
205 | Shelley | Higgins | 1994-06-07 | 12000.0 |
206 | William | Gietz | 1994-06-07 | 8300.0 |
109 | Daniel | Faviet | 1994-08-16 | 9000.0 |
108 | Nancy | Greenberg | 1994-08-17 | 12000.0 |
114 | Den | Raphaely | 1994-12-07 | 11000.0 |
122 | Payam | Kaufling | 1995-05-01 | 7900.0 |
115 | Alexander | Khoo | 1995-05-18 | 3100.0 |
192 | Sarah | Bell | 1996-02-04 | 4000.0 |
201 | Michael | Hartstein | 1996-02-17 | 13000.0 |
120 | Matthew | Weiss | 1996-07-18 | 8000.0 |
145 | John | Russell | 1996-10-01 | 14000.0 |
146 | Karen | Partners | 1997-01-05 | 13500.0 |
193 | Britney | Everett | 1997-03-03 | 3900.0 |
121 | Adam | Fripp | 1997-04-10 | 8200.0 |
105 | David | Austin | 1997-06-25 | 4800.0 |
117 | Sigal | Tobias | 1997-07-24 | 2800.0 |
202 | Pat | Fay | 1997-08-17 | 6000.0 |
110 | John | Chen | 1997-09-28 | 8200.0 |
111 | Ismael | Sciarra | 1997-09-30 | 7700.0 |
123 | Shanta | Vollman | 1997-10-10 | 6500.0 |
116 | Shelli | Baida | 1997-12-24 | 2900.0 |
106 | Valli | Pataballa | 1998-02-05 | 4800.0 |
112 | Jose Manuel | Urman | 1998-03-07 | 7800.0 |
176 | Jonathon | Taylor | 1998-03-24 | 8600.0 |
177 | Jack | Livingston | 1998-04-23 | 8400.0 |
126 | Irene | Mikkilineni | 1998-09-28 | 2700.0 |
118 | Guy | Himuro | 1998-11-15 | 2600.0 |
107 | Diana | Lorentz | 1999-02-07 | 4200.0 |
178 | Kimberely | Grant | 1999-05-24 | 7000.0 |
119 | Karen | Colmenares | 1999-08-10 | 2500.0 |
113 | Luis | Popp | 1999-12-07 | 6900.0 |
179 | Charles | Johnson | 2000-01-04 | 6200.0 |
10.7 Filtering Data¶
LIMIT¶
LIMIT
is used to constrain the number of rows returned by a query, similar to the functions head()
and tail()
in Pandas.
[20]:
%%sql
SELECT
employee_id, first_name, last_name, hire_date, salary
FROM
employees
ORDER BY first_name
LIMIT 5;
* sqlite://
Done.
[20]:
employee_id | first_name | last_name | hire_date | salary |
---|---|---|---|---|
121 | Adam | Fripp | 1997-04-10 | 8200.0 |
103 | Alexander | Hunold | 1990-01-03 | 9000.0 |
115 | Alexander | Khoo | 1995-05-18 | 3100.0 |
193 | Britney | Everett | 1997-03-03 | 3900.0 |
104 | Bruce | Ernst | 1991-05-21 | 6000.0 |
It is also possible to include an OFFSET
clause, which will skip rows before retrieving the data. E.g., in the next cell, the first 3 rows are skipped, and rows 4-8 are returned.
[21]:
%%sql
SELECT
employee_id, first_name, last_name, hire_date, salary
FROM
employees
ORDER BY first_name
LIMIT 5
OFFSET 3;
* sqlite://
Done.
[21]:
employee_id | first_name | last_name | hire_date | salary |
---|---|---|---|---|
193 | Britney | Everett | 1997-03-03 | 3900.0 |
104 | Bruce | Ernst | 1991-05-21 | 6000.0 |
179 | Charles | Johnson | 2000-01-04 | 6200.0 |
109 | Daniel | Faviet | 1994-08-16 | 9000.0 |
105 | David | Austin | 1997-06-25 | 4800.0 |
DISTINCT¶
The DISTINCT
clause allows to remove duplicate rows from the result set.
E.g., the next cell shows the first 15 rows of the salary columns, where some rows have the same value. In the cell afterward, DISTINCT
is used to remove the rows with the same values for salary.
[22]:
%%sql
SELECT
salary
FROM
employees
ORDER BY salary DESC
LIMIT 15;
* sqlite://
Done.
[22]:
salary |
---|
24000.0 |
17000.0 |
17000.0 |
14000.0 |
13500.0 |
13000.0 |
12000.0 |
12000.0 |
11000.0 |
10000.0 |
9000.0 |
9000.0 |
8600.0 |
8400.0 |
8300.0 |
[23]:
%%sql
SELECT DISTINCT
salary
FROM
employees
ORDER BY salary DESC
LIMIT 15;
* sqlite://
Done.
[23]:
salary |
---|
24000.0 |
17000.0 |
14000.0 |
13500.0 |
13000.0 |
12000.0 |
11000.0 |
10000.0 |
9000.0 |
8600.0 |
8400.0 |
8300.0 |
8200.0 |
8000.0 |
7900.0 |
WHERE¶
The WHERE
clause filters data based on specified conditions. For instance, return only the employees that have a salary greater than a certain value.
[24]:
%%sql
SELECT DISTINCT
employee_id, first_name, last_name,salary
FROM
employees
WHERE salary > 9000
ORDER BY salary DESC;
* sqlite://
Done.
[24]:
employee_id | first_name | last_name | salary |
---|---|---|---|
100 | Steven | King | 24000.0 |
101 | Neena | Kochhar | 17000.0 |
102 | Lex | De Haan | 17000.0 |
145 | John | Russell | 14000.0 |
146 | Karen | Partners | 13500.0 |
201 | Michael | Hartstein | 13000.0 |
108 | Nancy | Greenberg | 12000.0 |
205 | Shelley | Higgins | 12000.0 |
114 | Den | Raphaely | 11000.0 |
204 | Hermann | Baer | 10000.0 |
Or, return the employees who work in the department 5.
[25]:
%%sql
SELECT DISTINCT
employee_id, first_name, last_name, salary, department_id
FROM
employees
WHERE department_id = 5
ORDER BY first_name;
* sqlite://
Done.
[25]:
employee_id | first_name | last_name | salary | department_id |
---|---|---|---|---|
121 | Adam | Fripp | 8200.0 | 5 |
193 | Britney | Everett | 3900.0 | 5 |
126 | Irene | Mikkilineni | 2700.0 | 5 |
120 | Matthew | Weiss | 8000.0 | 5 |
122 | Payam | Kaufling | 7900.0 | 5 |
192 | Sarah | Bell | 4000.0 | 5 |
123 | Shanta | Vollman | 6500.0 | 5 |
Comparison Operators¶
To specify a condition, we can use the standard comparison operators, such as >
, <
, >=
, <=
, =
, and note that <>
can be used for ‘not equal to’.
[26]:
%%sql
SELECT DISTINCT
employee_id, first_name, last_name, salary, department_id
FROM
employees
WHERE department_id <> 5
ORDER BY first_name;
* sqlite://
Done.
[26]:
employee_id | first_name | last_name | salary | department_id |
---|---|---|---|---|
103 | Alexander | Hunold | 9000.0 | 6 |
115 | Alexander | Khoo | 3100.0 | 3 |
104 | Bruce | Ernst | 6000.0 | 6 |
179 | Charles | Johnson | 6200.0 | 8 |
109 | Daniel | Faviet | 9000.0 | 10 |
105 | David | Austin | 4800.0 | 6 |
114 | Den | Raphaely | 11000.0 | 3 |
107 | Diana | Lorentz | 4200.0 | 6 |
118 | Guy | Himuro | 2600.0 | 3 |
204 | Hermann | Baer | 10000.0 | 7 |
111 | Ismael | Sciarra | 7700.0 | 10 |
177 | Jack | Livingston | 8400.0 | 8 |
200 | Jennifer | Whalen | 4400.0 | 1 |
110 | John | Chen | 8200.0 | 10 |
145 | John | Russell | 14000.0 | 8 |
176 | Jonathon | Taylor | 8600.0 | 8 |
112 | Jose Manuel | Urman | 7800.0 | 10 |
119 | Karen | Colmenares | 2500.0 | 3 |
146 | Karen | Partners | 13500.0 | 8 |
178 | Kimberely | Grant | 7000.0 | 8 |
102 | Lex | De Haan | 17000.0 | 9 |
113 | Luis | Popp | 6900.0 | 10 |
201 | Michael | Hartstein | 13000.0 | 2 |
108 | Nancy | Greenberg | 12000.0 | 10 |
101 | Neena | Kochhar | 17000.0 | 9 |
202 | Pat | Fay | 6000.0 | 2 |
205 | Shelley | Higgins | 12000.0 | 11 |
116 | Shelli | Baida | 2900.0 | 3 |
117 | Sigal | Tobias | 2800.0 | 3 |
100 | Steven | King | 24000.0 | 9 |
203 | Susan | Mavris | 6500.0 | 4 |
106 | Valli | Pataballa | 4800.0 | 6 |
206 | William | Gietz | 8300.0 | 11 |
Logical Operators¶
We can also use logical operators to combine multiple conditions in the WHERE
clause of an SQL statement. The following table contains the SQL logical operators.
Figure source: Reference [1].
[27]:
%%sql
SELECT
first_name, last_name, salary
FROM
employees
WHERE salary > 5000 AND salary < 7000
ORDER BY salary;
* sqlite://
Done.
[27]:
first_name | last_name | salary |
---|---|---|
Bruce | Ernst | 6000.0 |
Pat | Fay | 6000.0 |
Charles | Johnson | 6200.0 |
Shanta | Vollman | 6500.0 |
Susan | Mavris | 6500.0 |
Luis | Popp | 6900.0 |
[28]:
%%sql
SELECT
first_name, last_name, salary
FROM
employees
WHERE salary BETWEEN 5000 AND 7000
ORDER BY salary;
* sqlite://
Done.
[28]:
first_name | last_name | salary |
---|---|---|
Bruce | Ernst | 6000.0 |
Pat | Fay | 6000.0 |
Charles | Johnson | 6200.0 |
Shanta | Vollman | 6500.0 |
Susan | Mavris | 6500.0 |
Luis | Popp | 6900.0 |
Kimberely | Grant | 7000.0 |
[29]:
%%sql
SELECT
first_name, last_name, salary
FROM
employees
WHERE salary = 6000 OR salary = 7000
ORDER BY salary;
* sqlite://
Done.
[29]:
first_name | last_name | salary |
---|---|---|
Bruce | Ernst | 6000.0 |
Pat | Fay | 6000.0 |
Kimberely | Grant | 7000.0 |
Select employees with first names that start with jo
.
[30]:
%%sql
SELECT
employee_id, first_name, last_name
FROM
employees
WHERE first_name LIKE 'jo%'
ORDER BY first_name;
* sqlite://
Done.
[30]:
employee_id | first_name | last_name |
---|---|---|
110 | John | Chen |
145 | John | Russell |
176 | Jonathon | Taylor |
112 | Jose Manuel | Urman |
Select employees with first names whose second character is h
.
[31]:
%%sql
SELECT
employee_id, first_name, last_name
FROM
employees
WHERE
first_name LIKE '_h%'
ORDER BY first_name;
* sqlite://
Done.
[31]:
employee_id | first_name | last_name |
---|---|---|
179 | Charles | Johnson |
123 | Shanta | Vollman |
205 | Shelley | Higgins |
116 | Shelli | Baida |
[32]:
%%sql
SELECT
first_name, last_name, department_id
FROM
employees
WHERE department_id IN (8, 9)
ORDER BY department_id;
* sqlite://
Done.
[32]:
first_name | last_name | department_id |
---|---|---|
John | Russell | 8 |
Karen | Partners | 8 |
Jonathon | Taylor | 8 |
Jack | Livingston | 8 |
Kimberely | Grant | 8 |
Charles | Johnson | 8 |
Steven | King | 9 |
Neena | Kochhar | 9 |
Lex | De Haan | 9 |
IS NULL Operator¶
To determine whether a row or column has missing or non-defined values, we can use the IS NULL
operator.
For instance, to find all employees who do not have phone numbers, we use the IS NUL
operator as follows.
[33]:
%%sql
SELECT
first_name, last_name, phone_number
FROM
employees
WHERE phone_number IS NULL
ORDER BY first_name , last_name;
* sqlite://
Done.
[33]:
first_name | last_name | phone_number |
---|---|---|
Charles | Johnson | None |
Jack | Livingston | None |
John | Russell | None |
Jonathon | Taylor | None |
Karen | Partners | None |
Kimberely | Grant | None |
10.8 Conditional Expressions¶
The CASE
expression is used to add if-then-else logic to SQL statements, which allows to evaluate a list of conditions and returns one of the possible results.
In the next cell, the CASE
expression returns the results Low
, Average
, or High
based on the conditions regarding the salary. The results are collected in the evaluation
column.
[34]:
%%sql
SELECT
first_name, last_name,
CASE WHEN salary < 3000 THEN 'Low'
WHEN salary >= 3000 AND salary <= 5000 THEN 'Average'
WHEN salary > 5000 THEN 'High'
END evaluation
FROM
employees
LIMIT 15;
* sqlite://
Done.
[34]:
first_name | last_name | evaluation |
---|---|---|
Steven | King | High |
Neena | Kochhar | High |
Lex | De Haan | High |
Alexander | Hunold | High |
Bruce | Ernst | High |
David | Austin | Average |
Valli | Pataballa | Average |
Diana | Lorentz | Average |
Nancy | Greenberg | High |
Daniel | Faviet | High |
John | Chen | High |
Ismael | Sciarra | High |
Jose Manuel | Urman | High |
Luis | Popp | High |
Den | Raphaely | High |
10.9 Joining Multiple Tables¶
SQL provides several ways to join tables, such as INNER JOIN
, LEFT JOIN
, RIGHT JOIN
, FULL JOIN
, CROSS JOIN
, and others.
INNER JOIN¶
INNER JOIN
combines rows from two or more tables based on a related column between them. It retrieves only the rows that have matching values in both tables.
The syntax is:
SELECT
column1, column2, ...
FROM
table1
INNER JOIN table2 ON table1.columnX = table2.columnX;
It specifies to join table1
and table2
by matching values in columnX
from table1
and columnX
from table2
, adn returns column1, column2, ...
from the joined tables.
Let’s show how we can use INNER JOIN
, where for instance, we want to retrieve the list of employees who work in departments 1, 2, and 3, and we would like to list the names of the departments.
To do that, first we can notice that both the employees
table and the departments
table have a column department_id
. Therefore, we can use the department_id
column in the employees
table as the foreign key column to link the employees
to the departments
table.
Let’s first display the employees who work in departments 1, 2, and 3.
[35]:
%%sql
SELECT
first_name, last_name, department_id
FROM
employees
WHERE department_id IN (1, 2, 3)
ORDER BY department_id;
* sqlite://
Done.
[35]:
first_name | last_name | department_id |
---|---|---|
Jennifer | Whalen | 1 |
Michael | Hartstein | 2 |
Pat | Fay | 2 |
Den | Raphaely | 3 |
Alexander | Khoo | 3 |
Shelli | Baida | 3 |
Sigal | Tobias | 3 |
Guy | Himuro | 3 |
Karen | Colmenares | 3 |
We can also find the names of the departments that have a department_id
of 1, 2, and 3.
[36]:
%%sql
SELECT
department_id, department_name
FROM
departments
WHERE department_id IN (1, 2, 3);
* sqlite://
Done.
[36]:
department_id | department_name |
---|---|
1 | Administration |
2 | Marketing |
3 | Purchasing |
Next, we join the employees
and departments
tables, and use INNER JOIN
to match the department_id
column in the two tables. For each row, if the condition departments.department_id = employees.department_id
is satisfied, the combined row will include data from rows in both employees
and departments
tables in the result set.
[37]:
%%sql
SELECT
first_name, last_name, employees.department_id, departments.department_id, department_name
FROM
employees
INNER JOIN departments ON departments.department_id = employees.department_id
WHERE employees.department_id IN (1, 2, 3);
* sqlite://
Done.
[37]:
first_name | last_name | department_id | department_id_1 | department_name |
---|---|---|---|---|
Jennifer | Whalen | 1 | 1 | Administration |
Michael | Hartstein | 2 | 2 | Marketing |
Pat | Fay | 2 | 2 | Marketing |
Den | Raphaely | 3 | 3 | Purchasing |
Alexander | Khoo | 3 | 3 | Purchasing |
Shelli | Baida | 3 | 3 | Purchasing |
Sigal | Tobias | 3 | 3 | Purchasing |
Guy | Himuro | 3 | 3 | Purchasing |
Karen | Colmenares | 3 | 3 | Purchasing |
Let’s look at one more example, where we used INNER JOIN
to join three tables. Specifically, in the next cell we use one more INNER JOIN
clause to join the tables employees
and jobs
using the jobs_id
column, in order to retrieve the information about the job_title
column from the jobs
table.
[38]:
%%sql
SELECT
first_name, last_name, job_title, department_name
FROM
employees
INNER JOIN departments ON departments.department_id = employees.department_id
INNER JOIN jobs ON jobs.job_id = employees.job_id
WHERE employees.department_id IN (1, 2, 3);
* sqlite://
Done.
[38]:
first_name | last_name | job_title | department_name |
---|---|---|---|
Den | Raphaely | Purchasing Manager | Purchasing |
Alexander | Khoo | Purchasing Clerk | Purchasing |
Shelli | Baida | Purchasing Clerk | Purchasing |
Sigal | Tobias | Purchasing Clerk | Purchasing |
Guy | Himuro | Purchasing Clerk | Purchasing |
Karen | Colmenares | Purchasing Clerk | Purchasing |
Jennifer | Whalen | Administration Assistant | Administration |
Michael | Hartstein | Marketing Manager | Marketing |
Pat | Fay | Marketing Representative | Marketing |
LEFT JOIN¶
LEFT JOIN
(also referred to as LEFT OUTER JOIN
) is similar to INNER JOIN
, only it retrieves all rows from the left table in the JOIN
clause and the matched rows from the right table.
The “left table” is the table that appears on the left side of the JOIN
clause. It is considered the primary table, and its rows are the ones that we want to retrieve or work with primarily.
The “right table” is the table that appears on the right side of the JOIN
clause. It is the secondary table, and we are typically interested in its data to complement or match with the data from the left table.
For instance, the countries
table is shown below, with country_id
column being the primary key. The following query returns columns for the countries US, UK, and China.
[39]:
%%sql
SELECT
country_id,
country_name
FROM
countries
WHERE country_id IN ('US', 'UK', 'CN');
* sqlite://
Done.
[39]:
country_id | country_name |
---|---|
CN | China |
UK | United Kingdom |
US | United States of America |
Similarly, the locations
table shown below has a country_id
column as the foreign key. The query returns the locations located in the US, UK, and China.
[40]:
%%sql
SELECT
country_id,
street_address,
city
FROM
locations
WHERE country_id IN ('US', 'UK', 'CN');
* sqlite://
Done.
[40]:
country_id | street_address | city |
---|---|---|
US | 2014 Jabberwocky Rd | Southlake |
US | 2011 Interiors Blvd | South San Francisco |
US | 2004 Charade Rd | Seattle |
UK | 8204 Arthur St | London |
UK | Magdalen Centre, The Oxford Science Park | Oxford |
In the next cell, because we use the LEFT JOIN
clause, all rows that satisfy the condition in the WHERE
clause of the countries
table are included in the result set.
For each row in the countries
table, the LEFT JOIN
clause finds the matching rows in the locations
table. If at least one matching row is found, the database engine combines the data from columns of the matching rows in both tables.
In this case, there is no matching row found for the country with the country_id
of CN, and therefore, the row in the countries
table is included in the result set and the row in the locations
table is filled with None
values.
[41]:
%%sql
SELECT
countries.country_name,
countries.country_id,
locations.country_id,
locations.street_address,
locations.city
FROM
countries
LEFT JOIN locations ON locations.country_id = countries.country_id
WHERE countries.country_id IN ('US', 'UK', 'CN')
* sqlite://
Done.
[41]:
country_name | country_id | country_id_1 | street_address | city |
---|---|---|---|---|
China | CN | None | None | None |
United Kingdom | UK | UK | 8204 Arthur St | London |
United Kingdom | UK | UK | Magdalen Centre, The Oxford Science Park | Oxford |
United States of America | US | US | 2014 Jabberwocky Rd | Southlake |
United States of America | US | US | 2011 Interiors Blvd | South San Francisco |
United States of America | US | US | 2004 Charade Rd | Seattle |
LEFT JOIN
is useful when we want to retrieve data from a primary table and include related information from a secondary table, but we don’t want to exclude records from the primary table if there are no matching rows in the secondary table.
And, because non-matching rows in the locations
table are filled with the None
values, we can apply the LEFT JOIN
clause to find mismatched rows between the tables.
For example, to find the country that does not have any locations in the locations
table, we use the following query.
[42]:
%%sql
SELECT
country_name
FROM
countries
LEFT JOIN locations ON locations.country_id = countries.country_id
WHERE locations.location_id IS NULL
ORDER BY country_name;
* sqlite://
Done.
[42]:
country_name |
---|
Argentina |
Australia |
Belgium |
Brazil |
China |
Denmark |
Egypt |
France |
HongKong |
India |
Israel |
Italy |
Japan |
Kuwait |
Mexico |
Netherlands |
Nigeria |
Singapore |
Switzerland |
Zambia |
Zimbabwe |
Compare the previous cell to the next cell.
[43]:
%%sql
SELECT
countries.country_name,
countries.country_id,
locations.country_id,
locations.street_address,
locations.city
FROM
countries
LEFT JOIN locations ON locations.country_id = countries.country_id
* sqlite://
Done.
[43]:
country_name | country_id | country_id_1 | street_address | city |
---|---|---|---|---|
Argentina | AR | None | None | None |
Australia | AU | None | None | None |
Belgium | BE | None | None | None |
Brazil | BR | None | None | None |
Canada | CA | CA | 147 Spadina Ave | Toronto |
Switzerland | CH | None | None | None |
China | CN | None | None | None |
Germany | DE | DE | Schwanthalerstr. 7031 | Munich |
Denmark | DK | None | None | None |
Egypt | EG | None | None | None |
France | FR | None | None | None |
HongKong | HK | None | None | None |
Israel | IL | None | None | None |
India | IN | None | None | None |
Italy | IT | None | None | None |
Japan | JP | None | None | None |
Kuwait | KW | None | None | None |
Mexico | MX | None | None | None |
Nigeria | NG | None | None | None |
Netherlands | NL | None | None | None |
Singapore | SG | None | None | None |
United Kingdom | UK | UK | 8204 Arthur St | London |
United Kingdom | UK | UK | Magdalen Centre, The Oxford Science Park | Oxford |
United States of America | US | US | 2004 Charade Rd | Seattle |
United States of America | US | US | 2011 Interiors Blvd | South San Francisco |
United States of America | US | US | 2014 Jabberwocky Rd | Southlake |
Zambia | ZM | None | None | None |
Zimbabwe | ZW | None | None | None |
RIGHT JOIN¶
By analogy, RIGHT JOIN
(or RIGHT OUTER JOIN
) returns all rows from the right table and the matched rows from the left table. If there are no matches in the left table, the result will contain None values for those columns from the left table.
[44]:
%%sql
SELECT
countries.country_name,
countries.country_id,
locations.country_id,
locations.street_address,
locations.city
FROM
countries
RIGHT JOIN locations ON locations.country_id = countries.country_id
WHERE countries.country_id IN ('US', 'UK', 'CN', 'CA', 'DE', 'IT')
* sqlite://
Done.
[44]:
country_name | country_id | country_id_1 | street_address | city |
---|---|---|---|---|
Canada | CA | CA | 147 Spadina Ave | Toronto |
Germany | DE | DE | Schwanthalerstr. 7031 | Munich |
United Kingdom | UK | UK | 8204 Arthur St | London |
United Kingdom | UK | UK | Magdalen Centre, The Oxford Science Park | Oxford |
United States of America | US | US | 2014 Jabberwocky Rd | Southlake |
United States of America | US | US | 2011 Interiors Blvd | South San Francisco |
United States of America | US | US | 2004 Charade Rd | Seattle |
FULL JOIN
(also known as a FULL OUTER JOIN
) combines the result sets of both a LEFT JOIN
and a RIGHT JOIN
, it returns all rows from both tables and includes rows from the left table that have no match in the right table (with None values for right table columns) and rows from the right table that have no match in the left table (with None values for left table columns).
CROSS JOIN
(also known as a CARTESIAN JOIN
), combines all rows from the first table with all rows from the second table, and generates all possible combinations of data.
10.10 Return Data Statistics¶
Aggregate functions in SQL take a list of values and return a single value, such as the average AVG()
, MIN()
, MAX()
, SUM()
, or COUNT()
.
[45]:
%%sql
SELECT
AVG(salary)
FROM
employees;
* sqlite://
Done.
[45]:
AVG(salary) |
---|
8060.0 |
[46]:
%%sql
SELECT
MAX(salary)
FROM
employees;
* sqlite://
Done.
[46]:
MAX(salary) |
---|
24000.0 |
[47]:
%%sql
SELECT
SUM(salary)
FROM
employees
WHERE department_id = 5;
* sqlite://
Done.
[47]:
SUM(salary) |
---|
41200.0 |
COUNT
returns the number of rows in a table. It can be used by providing the name of a column, or it can also be used with an asterisk *
as in the following cell.
[48]:
%%sql
SELECT
COUNT(employee_id)
FROM
employees;
* sqlite://
Done.
[48]:
COUNT(employee_id) |
---|
40 |
[49]:
%%sql
SELECT
COUNT(*)
FROM
employees;
* sqlite://
Done.
[49]:
COUNT(*) |
---|
40 |
10.11 Grouping Data¶
GROUP BY
allows to group rows based on values from more than one column. It is typically combined with aggregate functions, like COUNT
, SUM
, AVG
, MIN
, and MAX
.
For instance, let’s assume that we would like to retrieve information about the average salary in each department. In the next cell we will first display the salaries of all employees per department, and in the following cell we can see how GROUP BY
is used to calculate the average salary for each department.
[50]:
%%sql
SELECT
department_id, salary
FROM
employees
* sqlite://
Done.
[50]:
department_id | salary |
---|---|
9 | 24000.0 |
9 | 17000.0 |
9 | 17000.0 |
6 | 9000.0 |
6 | 6000.0 |
6 | 4800.0 |
6 | 4800.0 |
6 | 4200.0 |
10 | 12000.0 |
10 | 9000.0 |
10 | 8200.0 |
10 | 7700.0 |
10 | 7800.0 |
10 | 6900.0 |
3 | 11000.0 |
3 | 3100.0 |
3 | 2900.0 |
3 | 2800.0 |
3 | 2600.0 |
3 | 2500.0 |
5 | 8000.0 |
5 | 8200.0 |
5 | 7900.0 |
5 | 6500.0 |
5 | 2700.0 |
8 | 14000.0 |
8 | 13500.0 |
8 | 8600.0 |
8 | 8400.0 |
8 | 7000.0 |
8 | 6200.0 |
5 | 4000.0 |
5 | 3900.0 |
1 | 4400.0 |
2 | 13000.0 |
2 | 6000.0 |
4 | 6500.0 |
7 | 10000.0 |
11 | 12000.0 |
11 | 8300.0 |
[51]:
%%sql
SELECT
department_id, AVG(salary)
FROM
employees
GROUP BY department_id;
* sqlite://
Done.
[51]:
department_id | AVG(salary) |
---|---|
1 | 4400.0 |
2 | 9500.0 |
3 | 4150.0 |
4 | 6500.0 |
5 | 5885.714285714285 |
6 | 5760.0 |
7 | 10000.0 |
8 | 9616.666666666666 |
9 | 19333.333333333332 |
10 | 8600.0 |
11 | 10150.0 |
Similarly, the following cell displays the minimum, maximum, and average salary for each department. And, instead of using the department_id
column, it will be more convenient to display the department names. Since the employees
table does not have a column with the department names, we will use INNER JOIN
to retrieve the department_name
column from the departments
table.
[52]:
%%sql
SELECT
department_name, MIN(salary) AS min_salary, MAX(salary) AS max_salary,
ROUND(AVG(salary)) AS average_salary
FROM
employees
INNER JOIN departments ON departments.department_id = employees.department_id
GROUP BY department_name;
* sqlite://
Done.
[52]:
department_name | min_salary | max_salary | average_salary |
---|---|---|---|
Accounting | 8300.0 | 12000.0 | 10150.0 |
Administration | 4400.0 | 4400.0 | 4400.0 |
Executive | 17000.0 | 24000.0 | 19333.0 |
Finance | 6900.0 | 12000.0 | 8600.0 |
Human Resources | 6500.0 | 6500.0 | 6500.0 |
IT | 4200.0 | 9000.0 | 5760.0 |
Marketing | 6000.0 | 13000.0 | 9500.0 |
Public Relations | 10000.0 | 10000.0 | 10000.0 |
Purchasing | 2500.0 | 11000.0 | 4150.0 |
Sales | 6200.0 | 14000.0 | 9617.0 |
Shipping | 2700.0 | 8200.0 | 5886.0 |
10.12 Modifying Data¶
INSERT¶
INSERT
is used to insert one or more rows into a table, and we already used it in subsection 10.3 when we created new tables. The general syntax is:
INSERT INTO table (column1, column2,...)
VALUES (value1, value2, ...);
It is important to ensure that the number of values matches the number of columns, and that the value type corresponds to the data type for that column.
It is also possible to insert value without specifying the columns, as in:
INSERT INTO table
VALUES (value1, value2, ...);
For instance, to insert a new row into the dependents
table, we can use the following code. The new row will be added to the bottom of the table dependents
.
[53]:
%%sql
INSERT INTO
dependents (first_name, last_name, relationship, employee_id)
VALUES ('Dustin', 'Johnson', 'Child', 178);
* sqlite://
1 rows affected.
[53]:
[]
[54]:
# verify the changes
%sql SELECT * from dependents
* sqlite://
Done.
[54]:
dependent_id | first_name | last_name | relationship | employee_id |
---|---|---|---|---|
1 | Penelope | Gietz | Child | 206 |
2 | Nick | Higgins | Child | 205 |
3 | Ed | Whalen | Child | 200 |
4 | Jennifer | King | Child | 100 |
5 | Johnny | Kochhar | Child | 101 |
6 | Bette | De Haan | Child | 102 |
7 | Grace | Faviet | Child | 109 |
8 | Matthew | Chen | Child | 110 |
9 | Joe | Sciarra | Child | 111 |
10 | Christian | Urman | Child | 112 |
11 | Zero | Popp | Child | 113 |
12 | Karl | Greenberg | Child | 108 |
13 | Uma | Mavris | Child | 203 |
14 | Vivien | Hunold | Child | 103 |
15 | Cuba | Ernst | Child | 104 |
16 | Fred | Austin | Child | 105 |
17 | Helen | Pataballa | Child | 106 |
18 | Dan | Lorentz | Child | 107 |
19 | Bob | Hartstein | Child | 201 |
20 | Lucille | Fay | Child | 202 |
21 | Kirsten | Baer | Child | 204 |
22 | Elvis | Khoo | Child | 115 |
23 | Sandra | Baida | Child | 116 |
24 | Cameron | Tobias | Child | 117 |
25 | Kevin | Himuro | Child | 118 |
26 | Rip | Colmenares | Child | 119 |
27 | Julia | Raphaely | Child | 114 |
28 | Woody | Russell | Child | 145 |
29 | Alec | Partners | Child | 146 |
30 | Sandra | Taylor | Child | 176 |
31 | Dustin | Johnson | Child | 178 |
UPDATE¶
UPDATE
is used to update existing data in a table.
For instance, to update the last name of the employee with id=192
, we can write:
[55]:
%%sql
UPDATE employees
SET last_name = 'Lopez'
WHERE employee_id = 192;
* sqlite://
1 rows affected.
[55]:
[]
And we can verify the changes.
[56]:
%%sql
SELECT
employee_id, first_name, last_name
FROM
employees
WHERE employee_id = 192;
* sqlite://
Done.
[56]:
employee_id | first_name | last_name |
---|---|---|
192 | Sarah | Lopez |
DELETE¶
DELETE
is used to delete one or more rows from a table permanently.
[57]:
%%sql
DELETE FROM dependents
WHERE dependent_id = 16;
* sqlite://
1 rows affected.
[57]:
[]
[58]:
# verify the changes
%sql SELECT * FROM dependents
* sqlite://
Done.
[58]:
dependent_id | first_name | last_name | relationship | employee_id |
---|---|---|---|---|
1 | Penelope | Gietz | Child | 206 |
2 | Nick | Higgins | Child | 205 |
3 | Ed | Whalen | Child | 200 |
4 | Jennifer | King | Child | 100 |
5 | Johnny | Kochhar | Child | 101 |
6 | Bette | De Haan | Child | 102 |
7 | Grace | Faviet | Child | 109 |
8 | Matthew | Chen | Child | 110 |
9 | Joe | Sciarra | Child | 111 |
10 | Christian | Urman | Child | 112 |
11 | Zero | Popp | Child | 113 |
12 | Karl | Greenberg | Child | 108 |
13 | Uma | Mavris | Child | 203 |
14 | Vivien | Hunold | Child | 103 |
15 | Cuba | Ernst | Child | 104 |
17 | Helen | Pataballa | Child | 106 |
18 | Dan | Lorentz | Child | 107 |
19 | Bob | Hartstein | Child | 201 |
20 | Lucille | Fay | Child | 202 |
21 | Kirsten | Baer | Child | 204 |
22 | Elvis | Khoo | Child | 115 |
23 | Sandra | Baida | Child | 116 |
24 | Cameron | Tobias | Child | 117 |
25 | Kevin | Himuro | Child | 118 |
26 | Rip | Colmenares | Child | 119 |
27 | Julia | Raphaely | Child | 114 |
28 | Woody | Russell | Child | 145 |
29 | Alec | Partners | Child | 146 |
30 | Sandra | Taylor | Child | 176 |
31 | Dustin | Johnson | Child | 178 |
10.13 Working with Tables¶
In subsection 10.3 we learned how to use the CREATE TABLE
statement. Several other related SQL statements for working with tables are described next.
ALTER TABLE¶
ALTER TABLE
allows to add new columns in an existing table using ADD
, remove columns in a table with DROP
, rename columns with RENAME
, or modify attributes of a column such as primary key, default value, etc. with ADD CONSTRAINT
or DROP CONSTRAINT
and related commands.
Add a Column
[59]:
%%sql
ALTER TABLE cars
ADD mileage INT;
* sqlite://
Done.
[59]:
[]
[60]:
%sql SELECT * FROM cars
* sqlite://
Done.
[60]:
id | name | price | mileage |
---|---|---|---|
1 | Audi | 52642 | None |
2 | Mercedes | 57127 | None |
3 | Skoda | 9000 | None |
4 | Volvo | 29000 | None |
5 | Bentley | 350000 | None |
6 | Citroen | 21000 | None |
7 | Hummer | 41400 | None |
8 | Volkswagen | 21600 | None |
Remove a Column
[61]:
%%sql
ALTER TABLE cars
DROP price;
* sqlite://
Done.
[61]:
[]
[62]:
# verfiy the change
%sql SELECT * FROM cars
* sqlite://
Done.
[62]:
id | name | mileage |
---|---|---|
1 | Audi | None |
2 | Mercedes | None |
3 | Skoda | None |
4 | Volvo | None |
5 | Bentley | None |
6 | Citroen | None |
7 | Hummer | None |
8 | Volkswagen | None |
Rename a Column
[63]:
%%sql
ALTER TABLE cars
RENAME mileage TO miles;
* sqlite://
Done.
[63]:
[]
[64]:
# verfiy the change
%sql SELECT * FROM cars
* sqlite://
Done.
[64]:
id | name | miles |
---|---|---|
1 | Audi | None |
2 | Mercedes | None |
3 | Skoda | None |
4 | Volvo | None |
5 | Bentley | None |
6 | Citroen | None |
7 | Hummer | None |
8 | Volkswagen | None |
Rename a Table
Similarly, ALTER TABLE
can be used to rename a table, as in the following example.
[65]:
%%sql
ALTER TABLE writer
RENAME TO authors;
* sqlite://
Done.
[65]:
[]
[66]:
# verfiy the change
%sql SELECT name FROM sqlite_master WHERE type='table'
* sqlite://
Done.
[66]:
name |
---|
cars |
authors |
regions |
sqlite_sequence |
countries |
locations |
departments |
jobs |
employees |
dependents |
DROP TABLE¶
DROP TABLE
removes a table from a database.
[67]:
%sql DROP TABLE cars;
* sqlite://
Done.
[67]:
[]
[68]:
# verfiy the change
%sql SELECT name FROM sqlite_master WHERE type='table'
* sqlite://
Done.
[68]:
name |
---|
authors |
regions |
sqlite_sequence |
countries |
locations |
departments |
jobs |
employees |
dependents |
10.14 Constraints¶
Constraints enforce restrictions on the data that a column can contain or impose other related restrictions. So far, we described some of the constraints used in SQL databases, such as:
PRIMARY KEY
, defines a primary key for a table.NOT NULL
, ensures that values are inserted in a column.UNIQUE
, ensures that each row in a column has unique values.
FOREIGN KEY¶
Beside the primary key, some tables also define a foreign key. This is a column or a group of columns that enforces a link between the data in two tables. In a foreign key reference, the primary key column of the first table is referenced by the column of the second table, and the column of the second table becomes the foreign key.
For instance, when we created the countries
table (see the inserted code below), we specified that it has country_id
column as a primary key, and region_id
column as a foreign key. Therefore, the region_id
column in countries
will point to the region_id
column in the regions
table. This is indicated in the last line below: FOREIGN KEY (region_id) REFERENCES regions (region_id)
.
CREATE TABLE countries (
country_id TEXT NOT NULL,
country_name TEXT NOT NULL,
region_id INTEGER NOT NULL,
PRIMARY KEY (country_id ASC),
FOREIGN KEY (region_id) REFERENCES regions (region_id) ON DELETE CASCADE ON UPDATE CASCADE);
Note also that in the last line ON DELETE CASCADE
specifies that if a row in the regions
table is deleted, all rows in the countries
table that have a matching region_id
will also be deleted automatically. Analogously, ON UPDATE CASCADE
specifies that if a region_id
value in the regions
table is updated, all corresponding rows in the countries
table will be updated accordingly. This is useful for maintaining referential integrity between the rows in these two
tables.
Let’s inspect the countries
table.
[69]:
%sql SELECT * FROM countries
* sqlite://
Done.
[69]:
country_id | country_name | region_id |
---|---|---|
AR | Argentina | 2 |
AU | Australia | 3 |
BE | Belgium | 1 |
BR | Brazil | 2 |
CA | Canada | 2 |
CH | Switzerland | 1 |
CN | China | 3 |
DE | Germany | 1 |
DK | Denmark | 1 |
EG | Egypt | 4 |
FR | France | 1 |
HK | HongKong | 3 |
IL | Israel | 4 |
IN | India | 3 |
IT | Italy | 1 |
JP | Japan | 3 |
KW | Kuwait | 4 |
MX | Mexico | 2 |
NG | Nigeria | 4 |
NL | Netherlands | 1 |
SG | Singapore | 3 |
UK | United Kingdom | 1 |
US | United States of America | 2 |
ZM | Zambia | 4 |
ZW | Zimbabwe | 4 |
[70]:
%sql SELECT * FROM regions
* sqlite://
Done.
[70]:
region_id | region_name |
---|---|
1 | Europe |
2 | Americas |
3 | Asia |
4 | Middle East and Africa |
As we mentioned in the section on joining tables, we can use the foreign key column in a table to link to other tables. In the above example, the region_id
column establishes a relationship between the countries
table and the regions
table, wherecountries.region_id
must match an existing value in regions.region_id
.
10.15 Subqueries¶
A subquery is a query that is nested inside another query, such as SELECT
.
For instance, the following cell retrieves employees in the departments that have a location_id=1700
.
The query placed inside the parentheses is a subquery. It is also known as an inner query or inner select. The query that contains the subquery is called an outer query or an outer select.
To execute the query, the database system first has to execute the subquery and retrieve the departments with a location_id
of 1700, and afterward it has to execute the outer query.
[71]:
%%sql
SELECT
employee_id, first_name, last_name
FROM
employees
WHERE department_id IN
(SELECT department_id FROM departments
WHERE location_id = 1700)
ORDER BY first_name , last_name;
* sqlite://
Done.
[71]:
employee_id | first_name | last_name |
---|---|---|
115 | Alexander | Khoo |
109 | Daniel | Faviet |
114 | Den | Raphaely |
118 | Guy | Himuro |
111 | Ismael | Sciarra |
200 | Jennifer | Whalen |
110 | John | Chen |
112 | Jose Manuel | Urman |
119 | Karen | Colmenares |
102 | Lex | De Haan |
113 | Luis | Popp |
108 | Nancy | Greenberg |
101 | Neena | Kochhar |
205 | Shelley | Higgins |
116 | Shelli | Baida |
117 | Sigal | Tobias |
100 | Steven | King |
206 | William | Gietz |
The next code finds the employee who has the highest salary.
[72]:
%%sql
SELECT
employee_id, first_name, last_name, salary
FROM
employees
WHERE salary =
(SELECT MAX(salary) FROM employees)
ORDER BY first_name , last_name;
* sqlite://
Done.
[72]:
employee_id | first_name | last_name | salary |
---|---|---|---|
100 | Steven | King | 24000.0 |
E.g., find all employees who salaries are greater than the average salary of all employees.
[73]:
%%sql
SELECT
employee_id, first_name, last_name, salary
FROM
employees
WHERE
salary >
(SELECT AVG(salary) FROM employees);
* sqlite://
Done.
[73]:
employee_id | first_name | last_name | salary |
---|---|---|---|
100 | Steven | King | 24000.0 |
101 | Neena | Kochhar | 17000.0 |
102 | Lex | De Haan | 17000.0 |
103 | Alexander | Hunold | 9000.0 |
108 | Nancy | Greenberg | 12000.0 |
109 | Daniel | Faviet | 9000.0 |
110 | John | Chen | 8200.0 |
114 | Den | Raphaely | 11000.0 |
121 | Adam | Fripp | 8200.0 |
145 | John | Russell | 14000.0 |
146 | Karen | Partners | 13500.0 |
176 | Jonathon | Taylor | 8600.0 |
177 | Jack | Livingston | 8400.0 |
201 | Michael | Hartstein | 13000.0 |
204 | Hermann | Baer | 10000.0 |
205 | Shelley | Higgins | 12000.0 |
206 | William | Gietz | 8300.0 |
10.16 Connect to an Existing Database¶
Recall again that when we created new tables in section 10.3 or a new database in section 10.4, we used sqlite://
to connect to the tables or to the database. This syntax is used to specify an in-memory table or database, that is, it indicates that we want to work with tables or databases that exist only in the memory of our local machine. This syntax uses two forward slashes //
after sqlite:
.
To connect to an existing database that is stored in a local directory of our computer, we will use the syntax sqlite:///relative_database_path
, where relative_database_path
specifies the relative path to the local directory where the SQL database is stored. This syntax uses three forward slashes ///
after sqlite:
.
Similarly, we can connect to an existing database that is stored in a directory of our computer by using the absolute path to the database with the syntax sqlite:////absolute_database_path
. This syntax uses four forward slashes ////
after sqlite:
.
In the following cell, we used sqlite:///data/EssentialSQL.db
to initialize the connection to the database EssentialSQL.db
which is located in the data
subdirectory in the current working directory.
[74]:
%sql sqlite:///data/EssentialSQL.db
Note again that there are three forward slashes after sqlite:
, since we provided a relative path to the database.
We can inspect the tables in the database EssentialSQL.db
in the following cell.
[75]:
%sql SELECT name FROM sqlite_master WHERE type='table'
sqlite://
* sqlite:///data/EssentialSQL.db
Done.
[75]:
name |
---|
Customers |
Shippers |
Employees |
Orders |
OrderDetails |
And we can inspect the data in the table Customers
.
[76]:
%sql SELECT * FROM Customers
sqlite://
* sqlite:///data/EssentialSQL.db
Done.
[76]:
CustomerID | CompanyName | ContactName | ContactTitle | Address | City | State |
---|---|---|---|---|---|---|
1 | Deerfield Tile | Dick Terrcotta | Owner | 450 Village Street | Deerfield | IL |
2 | Sagebrush Carpet | Barbara Berber | Director of Installations | 10 Industrial Drive | El Paso | TX |
3 | Floor Co. | Jim Wood | Installer | 34218 Private Lane | Monclair | NJ |
4 | Main Tile and Bath | Toni Faucet | Owner | Suite 23, Henry Building | Orlando | FL |
5 | Slots Carpet | Jack Diamond III | Purchaser | 3024 Jackpot Drive | Las Vegas | NV |
If the database does not exist, SQLite will create a new database with the provided name in the home directory. The following cell will create a new database named test.db
in the current working directory.
[77]:
%sql sqlite:///test.db
References¶
SQL Tutorial, available at https://www.sqltutorial.org/.
Practice SQL with SQLite and Jupyter Notebook, by Chonghua Yin, available at https://github.com/royalosyin/Practice-SQL-with-SQLite-and-Jupyter-Notebook.
BACK TO TOP