App Academy W5D2

Yizhe Wang
6 min readOct 28, 2020

--

Introduction to SQL

Learning Goals

  • Understand why we need databases
  • 存取数据,数据保存ACID, atomic/consistent/independent/durable
  • Explain “Relational DataBase Management System”
  • 类似mysql,Postgresql这种数据库管理软件
  • Be able to access data in a database and perform basic operations on the data
  • select from where a=b
  • select from (select from)
  • select name, count* from table group by 1
  • select name, count* from table group by 1 order by count* desc having count* > 5

How do websites store information?

关系型数据库的话就是表,每个有primary key作为id,互相有联系,可以通过id来回join,得到关联。然后连接到一块得到数据,通过用户id,连接transaction表,找到用户的所有transactions​,找到transaction中的所有refund,看是否有没ship回来的。

Answer: Databases!

  • There are different models of databases
  • They differ in how they store data and how that data can therefore be accessed
  • We’re going to use a relational database model. ‘Relational’ refers to the use of tables to store and organize data
  • Each table is referred to as a “relation”, as it is a collection of related data entries
  • We can imagine a relational database as an excel spreadsheet with a collection of related tables
  • Sample Database

​ ​

There are two parts to storing and accessing data:

  1. DataBase Management System
  • An application that stores data at scale and can be queried for data ​

2. The Querying Language

  • The language we use to interact with the database management system to create our database, insert data, and query (i.e. ask) it for data ​ ​ ​ ​

“Relational DataBase Management System” (RDBMS)

  • An application that stores data at scale and organizes the data in tables ​ We will be using:
  • PostgreSQL, aka Postgres (open source)
  • We’ll be using this for the next two days, and for most projects
  • SQLite (open source)
  • Thursday’s project uses this DB ​ ​ ​

SQL

The way we communicate with our database

  • SQL stands for “Structured Query Language” ​
  • It’s a domain-specific language (DSL) for relational databases (other DSLs you’ve encountered: HTML, RSpec) ​
  • Uses declarative programming ​
  • 声明式的意义就是要声明一个我想要的结果,让计算机自己去处理寻求答案,并不在意如何达成,而普通的编程语言大部分是命令式的(imperative),指导计算机进行每一步操作,具体参看https://zhuanlan.zhihu.com/p/34445114
  • SQL is extremely popular: https://insights.stackoverflow.com/survey/2019#technology ​ ​ ​

PostgreSQL Shell Commands

  • $ psql
  • CREATE DATABASE lecture;
  • ($ cat setup_lecture_demo.sql | psql lecture)
  • $ psql lecture
  • \d - list tables
  • \d table - show schema for table
  • \? to list meta commands
  • end queries with a ; ​ ​ ​

The basics of SQL queries

  • SELECT : choose which columns to extract data from
  • FROM: specifies the relation (table) you're getting data from
  • WHERE/WHERE NOT: filters the data according to a logical expression
  • =, >=, <=, >, <, <>/!=
  • IN, BETWEEN, LIKE, (%)
  • AND, OR
  • Use IS NULL and IS NOT NULL to check for null values ​

Query: Get the names of all App Academy instructors

​ ​ ​

​ Query: Get the names of all App Academy instructors. ​

SELECT
app_academy.name
FROM
app_academy;​

Query: Get the possessions where the cost is greater than a 1000 dollars?

SELECT 
*
FROM
possessions
WHERE
cost > 1000;

Query: What if I wanted all instructors whose name start with “R”?

SELECT
name
FROM
app_academy
WHERE
name
LIKE
'R%';

Common SQL Filters and Commands

  • ORDER BY: Sorts results based on a specific column
  • ASC or DESC
  • ASC is default
  • LIMIT: how many rows you want in the result
  • OFFSET: how many rows you want to skip from the top
  • DISTINCT: De-duplicates data in a result (like Array#uniq)
  • SELECT DISTINCT name, type
  • SELECT COUNT(DISTINCT name) ​ ​ ​ ​ ​

​ Query: Get the distinct names of possessions

SELECT
DISTINCT possessions.name
FROM
possessions;

Query Pop Quiz: Get the name & cost of the second most expensive possession whose name has at least two words.

​ Query: Get the name & cost of the second most expensive possession whose name has at least two words. ​

SELECT
name, cost
FROM
possessions
WHERE
name LIKE '% %'
ORDER BY
cost DESC
LIMIT
1
OFFSET
1;​

Query: Get all of the app academy instructors who do not have a pod leader ​

SELECT
name
FROM
app_academy
WHERE
pod_leader_id IS NULL;

  • Aggregate functions combine data from a column into a single value
  • You should always use an alias with aggregate functions to make for clearer results**********************************************
  • 这一条我没怎么执行过。
  • COUNT, SUM, AVG, MIN/MAX, and more ​​

Query: Get the average cost of all possessions

SELECT
AVG(cost) AS avg_cost
FROM
possessions;​

Query: Get the total cost of all possessions

SELECT
SUM(cost) AS total_cost
FROM
possessions;​

Query: Count the total number of possessions

SELECT
COUNT(*) AS num_possessions
FROM
possessions;​

Query: Count the distinct types of possessions

SELECT
COUNT(DISTINCT type) AS num_types
FROM
possessions;

Query: Get the cost of the least expensive item

SELECT
MIN(cost) AS least_expensive
FROM
possessions;

GROUP BY​

  • GROUP BY groups rows with matching values for given column
  • Collapses each group of rows into a single row
  • Aggregate functions in SELECT will apply to the individual groups, not the whole rows ​ ​ ​****************************************************
  • 这条很重要,也就是我有些疑惑的地方。

Query: Show each type of possession using GROUP BY

SELECT
type
FROM
possessions
GROUP BY
type;

Query: For each type of possession, show the type and number of items ​

SELECT
type, COUNT(*) AS num_items
FROM
possessions
GROUP BY
type;

Query: For each type of possession with more than 5 items, show the type and count of each ​

SELECT
type, COUNT(*) AS num_items
FROM
possessions
GROUP BY
type
HAVING
COUNT(*) > 5;

***********​ ​ ​WHERE vs HAVING *************

  • HAVING works like WHERE, but...
  • WHERE gets evaluated before GROUP BY.
  • HAVING gets evaluated after GROUP BY. ​
  • Since the WHERE clause gets executed before the GROUP BY clause, grouped entries cannot be filtered by WHERE
  • HAVING is same as the WHERE clause but is applied on grouped entries
  • Aggregate functions are not allowed in WHERE. You must use aggregate functions in HAVING WHY?​ ​

Order of operation execution in SQL

  1. FROM
  2. JOIN
  3. WHERE
  4. GROUP BY
  5. HAVING
  6. SELECT
  7. ORDER BY
  8. LIMIT / OFFSET ​

(Why can’t we use aliased names in HAVING?)

Can I use the result of a query in a different query?

  • Ex: What is the most common possession — and which ones cost more than $1,000? ​ ​ ​

Answer: Yes — Use a subquery!

Sometimes it’s useful to use the result of a query inside another query. This is a Subquery. ​

  • Can be used in FROM clause (needs to be aliased)
  • Can be used in WHERE clause:
  • WHERE IN Subquery
  • WHERE NOT IN Subquery
  • When to use a subquery:
  • You’ll want to use a subquery if your query follows the logic of “Get me this data A from dataset B” ​ ​ ​

Query: Of the most common item, list the ones that cost over $1000​

SELECT
*
FROM
possessions
WHERE
name IN (
SELECT
name
FROM
possessions
GROUP BY
name
ORDER BY
COUNT(*) DESC
LIMIT
1
) AND cost > 1000;

​ Query: How many types of possessions have a total cost over $1000? ​

SELECT
COUNT(*) AS num_types
FROM (
SELECT
type
FROM
possessions
GROUP BY
type
HAVING
SUM(cost) > 1000
) AS pricey_possessions;

​ Query: Who is the owner of the most expensive item? ​

SELECT
name
FROM
app_academy
WHERE
id = (
SELECT
owner_id
FROM
possessions
ORDER BY
cost DESC
LIMIT
1);

JOINS

  • Combine data from multiple tables into one relation using a JOIN
  • Types of relationships between tables:
  • One-to-many (hierarchical)
  • Ex: Each TA has one pod leader, and each pod-leader has many TAs
  • Many-to-many (horizontal)
  • Ex: Each a/A TA has many hack reactor friends, and each hack reactor TA has many a/A friends

The 3 Most Common Types of JOINs

  • INNER JOIN
  • returns only rows from table1 and table2 that match each other. This is the default.
  • LEFT OUTER JOIN - same as LEFT JOIN
  • returns all rows in table1, whether or not they match table2.
  • FULL OUTER JOIN
  • returns all rows in table1 and table2, whether or not their data matches up ​

Query: Show the name of all of the possessions and their owner

SELECT
app_academy.name AS owner_name,
possessions.name AS possession_name
FROM
app_academy
JOIN possessions
ON app_academy.id = possessions.owner_id;

Query: Find the total number of possessions owned by each person

SELECT
app_academy.name AS owner_name,
COUNT(*) AS total_possessions
FROM
app_academy
JOIN possessions
ON app_academy.id = possessions.owner_id
GROUP BY
app_academy.name;

Query: Get all the friendships between app academy and hack reactor people

SELECT
*
FROM
app_academy
JOIN friendships
ON app_academy.id = friendships.aa_id
JOIN hack_reactor
ON hack_reactor.id = friendships.hr_id;​

What’s the difference between a subquery and a join?

  • When you have lots of data across many tables
  • Subqueries use less memory than joins
  • Subqueries use more CPU than joins (CPU is usually the bottleneck)
  • In practice, this can differ between SQL engines

What is a self-join?

  • Joins a table back against itself
  • You must alias the table names
  • Use descriptive aliases ​

Query: Get all of the app academy instructors who are also pod leaders

SELECT
DISTINCT(pod_leaders.name)
FROM
app_academy AS instructors
JOIN app_academy AS pod_leaders
ON instructors.pod_leader_id = pod_leaders.id;

Review

  1. How can we store information?
  2. How can we access the data?
  3. What are the basics of SQL queries?
  4. What if I want to do calculations on my data?
  5. What if I want to group my data?
  6. Can I use the result of a query in a different query?
  7. How can I query relationships across tables?
  8. What is a self-join? ​***********self join绝对需要再研究一下。

--

--