App Academy W5D2
Introduction to SQL
Learning Goals
- Understand why we need databases
- 存取数据,数据保存ACID, atomic/consistent/independent/durable
- Explain “Relational DataBase Management System”
- 类似mysql,Postgresql这种数据库管理软件
- Be able to access data in a database and perform basic operations on the data
- select from where a=b
- select from (select from)
- select name, count* from table group by 1
- select name, count* from table group by 1 order by count* desc having count* > 5
How do websites store information?
关系型数据库的话就是表,每个有primary key作为id,互相有联系,可以通过id来回join,得到关联。然后连接到一块得到数据,通过用户id,连接transaction表,找到用户的所有transactions,找到transaction中的所有refund,看是否有没ship回来的。
Answer: Databases!
- There are different models of databases
- They differ in how they store data and how that data can therefore be accessed
- We’re going to use a relational database model. ‘Relational’ refers to the use of tables to store and organize data
- Each table is referred to as a “relation”, as it is a collection of related data entries
- We can imagine a relational database as an excel spreadsheet with a collection of related tables
- Sample Database
There are two parts to storing and accessing data:
- DataBase Management System
- An application that stores data at scale and can be queried for data
2. The Querying Language
- The language we use to interact with the database management system to create our database, insert data, and query (i.e. ask) it for data
“Relational DataBase Management System” (RDBMS)
- An application that stores data at scale and organizes the data in tables We will be using:
- PostgreSQL, aka Postgres (open source)
- We’ll be using this for the next two days, and for most projects
- SQLite (open source)
- Thursday’s project uses this DB
SQL
The way we communicate with our database
- SQL stands for “Structured Query Language”
- It’s a domain-specific language (DSL) for relational databases (other DSLs you’ve encountered: HTML, RSpec)
- Uses declarative programming
- 声明式的意义就是要声明一个我想要的结果,让计算机自己去处理寻求答案,并不在意如何达成,而普通的编程语言大部分是命令式的(imperative),指导计算机进行每一步操作,具体参看https://zhuanlan.zhihu.com/p/34445114
- SQL is extremely popular: https://insights.stackoverflow.com/survey/2019#technology
PostgreSQL Shell Commands
$ psql
CREATE DATABASE lecture;
- (
$ cat setup_lecture_demo.sql | psql lecture
) $ psql lecture
\d
- list tables\d table
- show schema for table\?
to list meta commands- end queries with a
;
The basics of SQL queries
SELECT
: choose which columns to extract data fromFROM
: specifies the relation (table) you're getting data fromWHERE
/WHERE NOT
: filters the data according to a logical expression=
,>=
,<=
,>
,<
,<>
/!=
IN
,BETWEEN
,LIKE
, (%
)AND
,OR
- Use
IS NULL
andIS NOT NULL
to check for null values
Query: Get the names of all App Academy instructors
Query: Get the names of all App Academy instructors.
SELECT
app_academy.name
FROM
app_academy;
Query: Get the possessions where the cost is greater than a 1000 dollars?
SELECT
*
FROM
possessions
WHERE
cost > 1000;
Query: What if I wanted all instructors whose name start with “R”?
SELECT
name
FROM
app_academy
WHERE
name
LIKE
'R%';
Common SQL Filters and Commands
ORDER BY
: Sorts results based on a specific columnASC
orDESC
ASC
is defaultLIMIT
: how many rows you want in the resultOFFSET
: how many rows you want to skip from the topDISTINCT
: De-duplicates data in a result (likeArray#uniq
)SELECT DISTINCT name, type
SELECT COUNT(DISTINCT name)
Query: Get the distinct names of possessions
SELECT
DISTINCT possessions.name
FROM
possessions;
Query Pop Quiz: Get the name & cost of the second most expensive possession whose name has at least two words.
Query: Get the name & cost of the second most expensive possession whose name has at least two words.
SELECT
name, cost
FROM
possessions
WHERE
name LIKE '% %'
ORDER BY
cost DESC
LIMIT
1
OFFSET
1;
Query: Get all of the app academy instructors who do not have a pod leader
SELECT
name
FROM
app_academy
WHERE
pod_leader_id IS NULL;
- Aggregate functions combine data from a column into a single value
- You should always use an alias with aggregate functions to make for clearer results**********************************************
- 这一条我没怎么执行过。
COUNT
,SUM
,AVG
,MIN
/MAX
, and more
Query: Get the average cost of all possessions
SELECT
AVG(cost) AS avg_cost
FROM
possessions;
Query: Get the total cost of all possessions
SELECT
SUM(cost) AS total_cost
FROM
possessions;
Query: Count the total number of possessions
SELECT
COUNT(*) AS num_possessions
FROM
possessions;
Query: Count the distinct types of possessions
SELECT
COUNT(DISTINCT type) AS num_types
FROM
possessions;
Query: Get the cost of the least expensive item
SELECT
MIN(cost) AS least_expensive
FROM
possessions;
GROUP BY
GROUP BY
groups rows with matching values for given column- Collapses each group of rows into a single row
- Aggregate functions in
SELECT
will apply to the individual groups, not the whole rows **************************************************** - 这条很重要,也就是我有些疑惑的地方。
Query: Show each type of possession using GROUP BY
SELECT
type
FROM
possessions
GROUP BY
type;
Query: For each type of possession, show the type and number of items
SELECT
type, COUNT(*) AS num_items
FROM
possessions
GROUP BY
type;
Query: For each type of possession with more than 5 items, show the type and count of each
SELECT
type, COUNT(*) AS num_items
FROM
possessions
GROUP BY
type
HAVING
COUNT(*) > 5;
*********** WHERE vs HAVING *************
HAVING
works likeWHERE
, but...WHERE
gets evaluated beforeGROUP BY
.HAVING
gets evaluated afterGROUP BY
. - Since the
WHERE
clause gets executed before theGROUP BY
clause, grouped entries cannot be filtered byWHERE
HAVING
is same as theWHERE
clause but is applied on grouped entries- Aggregate functions are not allowed in
WHERE
. You must use aggregate functions inHAVING
WHY?
Order of operation execution in SQL
- FROM
- JOIN
- WHERE
- GROUP BY
- HAVING
- SELECT
- ORDER BY
- LIMIT / OFFSET
(Why can’t we use aliased names in HAVING?)
Can I use the result of a query in a different query?
- Ex: What is the most common possession — and which ones cost more than $1,000?
Answer: Yes — Use a subquery!
Sometimes it’s useful to use the result of a query inside another query. This is a Subquery.
- Can be used in
FROM
clause (needs to be aliased) - Can be used in
WHERE
clause: WHERE IN
SubqueryWHERE NOT IN
Subquery- When to use a subquery:
- You’ll want to use a subquery if your query follows the logic of “Get me this data A from dataset B”
Query: Of the most common item, list the ones that cost over $1000
SELECT
*
FROM
possessions
WHERE
name IN (
SELECT
name
FROM
possessions
GROUP BY
name
ORDER BY
COUNT(*) DESC
LIMIT
1
) AND cost > 1000;
Query: How many types of possessions have a total cost over $1000?
SELECT
COUNT(*) AS num_types
FROM (
SELECT
type
FROM
possessions
GROUP BY
type
HAVING
SUM(cost) > 1000
) AS pricey_possessions;
Query: Who is the owner of the most expensive item?
SELECT
name
FROM
app_academy
WHERE
id = (
SELECT
owner_id
FROM
possessions
ORDER BY
cost DESC
LIMIT
1);
JOINS
- Combine data from multiple tables into one relation using a JOIN
- Types of relationships between tables:
- One-to-many (hierarchical)
- Ex: Each TA has one pod leader, and each pod-leader has many TAs
- Many-to-many (horizontal)
- Ex: Each a/A TA has many hack reactor friends, and each hack reactor TA has many a/A friends
The 3 Most Common Types of JOINs
INNER JOIN
- returns only rows from
table1
andtable2
that match each other. This is the default. LEFT OUTER JOIN
- same asLEFT JOIN
- returns all rows in
table1
, whether or not they matchtable2
. FULL OUTER JOIN
- returns all rows in
table1
andtable2
, whether or not their data matches up
Query: Show the name of all of the possessions and their owner
SELECT
app_academy.name AS owner_name,
possessions.name AS possession_name
FROM
app_academy
JOIN possessions
ON app_academy.id = possessions.owner_id;
Query: Find the total number of possessions owned by each person
SELECT
app_academy.name AS owner_name,
COUNT(*) AS total_possessions
FROM
app_academy
JOIN possessions
ON app_academy.id = possessions.owner_id
GROUP BY
app_academy.name;
Query: Get all the friendships between app academy and hack reactor people
SELECT
*
FROM
app_academy
JOIN friendships
ON app_academy.id = friendships.aa_id
JOIN hack_reactor
ON hack_reactor.id = friendships.hr_id;
What’s the difference between a subquery and a join?
- When you have lots of data across many tables
- Subqueries use less memory than joins
- Subqueries use more CPU than joins (CPU is usually the bottleneck)
- In practice, this can differ between SQL engines
What is a self-join?
- Joins a table back against itself
- You must alias the table names
- Use descriptive aliases
Query: Get all of the app academy instructors who are also pod leaders
SELECT
DISTINCT(pod_leaders.name)
FROM
app_academy AS instructors
JOIN app_academy AS pod_leaders
ON instructors.pod_leader_id = pod_leaders.id;
Review
- How can we store information?
- How can we access the data?
- What are the basics of SQL queries?
- What if I want to do calculations on my data?
- What if I want to group my data?
- Can I use the result of a query in a different query?
- How can I query relationships across tables?
- What is a self-join? ***********self join绝对需要再研究一下。