Lesson 14: Databases

Homepage Content Slides Video

Warning

This lesson is under construction. Learn from it at your own risk. If you have any feedback, please fill out our General Feedback Survey.

Overview

  • About Databases.
  • When to use Databases.
  • When not to use Databases.
  • Database Concepts.
  • SQL Syntax.
  • Ways to use a Database.

Databases

A program that can efficiently store and retrieve large amounts of data.

Relating Data

Imagine a kitchen cupboard program that stores food currently in stock, where it is, recipes using it, expiration dates, etc.

Databases and Structure

Structure
SQL databases are based on around Relational Algebra
<Table 1>
+---------------+-----------+-----------+
| <Primary key> | <Field 1> | <Field 2> |
+---------------+-----------+-----------+
| 1             | value     | value`    |
| ...           | ...       | ...       |
+---------------+-----------+-----------+

<Table 2>
+---------------+-----------+--------------------------+
| <Primary key> | <Field 1> | <Foreign key to Table 1> |
+---------------+-----------+--------------------------+
| 1             | val       | 7                        |
| ...           | ...       | ...                      |
+---------------+-----------+--------------------------+

Concept: Relational Algebra

<Table 1>
+------------------+------------------+
| <Name>           | <Major>          |
+------------------+------------------+
| Linus Torvalds   | Computer Science |
| Richard Stallman | Computer Science |
+------------------+------------------+
<Table 2>
+------------------+--------------+----------------+
| <Major>          | <School>     | <Advisor Name> |
+------------------+--------------+----------------+
| Computer Science | Engineering  | Dennis Ritchie |
+------------------+--------------+----------------+
<Table 1> JOIN <Table 2>
+------------------+------------------+-------------+----------------+
| <Name>           | <Major>          | <School>    | <Advisor Name> |
+------------------+------------------+-------------+----------------+
| Linus Torvalds   | Computer Science | Engineering | Dennis Ritchie |
| Richard Stallman | Computer Science | Engineering | Dennis Ritchie |
+------------------+------------------+-------------+----------------+

Concept: Relational Algebra

Relational Algebra Example

When to use a Database

When you have to work with a lot of well structured data.
Databases are useful for two situations:
  1. Lots of data.
  2. High throughput.

Lots of Data

Global Internet traffic by year

Note: 1 PB = 1,000,000 GB

Concurrent Read/Writes

Atomicity:
Either the entire transaction succeeds or it fails completely
Consistency:
Transactions always leave the database in a valid state
Isolation:
Concurrent operations look like they took place sequentially
Durability:
Transactions are permanent after they're committed

When not to use a Database

Databases might not be particularly useful for:
  • Storing content for a website that rarely updates
    • Alternative: Use a static site generator such as Pelican or Jekyll
  • Hosting large individual files
    • Alternative: Store the files on disk

Types of Databases

There are two broad types of databases.

SQL

Examples:
  • MySQL/MariaDB
  • PostgreSQL
  • SQLite

NoSQL

Examples:
  • MongoDB
  • Apache Casandra
  • Dynamo
  • Redis

Database Concepts

Schemas

Schemas are how you define what a table looks like, what data will populate it, and what each field will be called. The schema also defines relationships between tables; more or less the blueprint of your database.

CREATE TABLE nobel (
    id int(11)
        NOT NULL
        AUTO_INCREMENT,
    yr int(11),
    subject varchar(15),
    winner varchar(50)
)
ENGINE = InnoDB;

Migrations

Migrations are the process of updating tables and fields in your database. Since databases might need to change in the future (you never know!) you can create and run a migrations to modify your schema as needed.

from django.db import migrations, models

class Migration(migrations.Migration):
    dependencies = [
        ('app', '0001_initial')
    ]

    operations = [
        migrations.AddField("Nobel", "topic", models.CharField(80))
    ]

Raw SQL Syntax

There are many tools out there that allow you to avoid writing raw SQL, but it's always good to know the syntax. One day you may need to write raw SQL queries, and at the very least you'll need to read SQL for debugging purposes.

SELECT

Select statements get data from the database which matches the requirements you have.

SELECT
    yr, subject, winner
FROM
    nobel
WHERE
    yr = 1960 AND subject='medicine';
+------+------------+-------------------------------+
| yr   | subject    | winner                        |
+------+------------+-------------------------------+
| 1960 | "medicine" | "Sir Frank Macfarlane Burnet" |
| 1960 | "medicine" | "Sir Peter Brian Medawar"     |
+------+------------+-------------------------------+

INSERT

Insert statements create an entry into a table and populate the fields appropriately.

INSERT INTO
    nobel
VALUES
    ('2013','Literature','Herta Müller');
+-----+------+--------------+----------------+
| id  | yr   | subject      | winner         |
+-----+------+--------------+----------------+
| ... | ...  | ...          | ...            |
| 873 | 2013 | "Literature" | "Herta Müller" |
| ... | ...  | ...          | ...            |
+-----+------+--------------+----------------+

UPDATE

Update statements modify an existing entry in a table.

UPDATE
    nobel
SET
    winner='Andrew Ryan'
WHERE
    subject='Peace' AND yr='1951';
+-----+------+---------+----------------+
| id  | yr   | subject | winner         |
+-----+------+---------+----------------+
| ... | ...  | ...     | ...            |
| 120 | 1951 | "Peace" | "Andrew Ryan"  |
| ... | ...  | ...     | ...            |
+-----+------+---------+----------------+

DELETE

Delete statements... You can guess what a delete statement does I bet.

DELETE FROM
   nobel
WHERE
   yr = 1989 AND subject = 'peace';

TODO: Crafting Queries!

Craft a query to get the following data out of our Nobel table:

Don't worry about getting it exactly right! Craft pseudo-SQL!

TODO: Crafting Queries!

Answers

SELECT winner FROM nobel
WHERE yr=1952 AND subject='medicine'; #(Selman A. Wksman)

SELECT * FROM nobel
WHERE yr=1903 AND subject='physics'; #(3)

SELECT * FROM nobel
WHERE winner='Linus Pauling'; #(2)

SELECT COUNT(*) FROM nobel
AS n0 INNER JOIN nobel AS n1 on n0.winner=n1.winner
AND (n0.yr!=n1.yr or n0.subject!=n1.subject); #(16)

TODO: Using a Real Database

Now that we have belabored the theory of databases and SQL, lets actually start doing work with databases.

Throughout this exercise we will load it up with some data (nobel.sql.gz) and learn to interact with it via the command line interface.

Importing Data

# Create a table for Nobel prizes
$ mysqladmin -u root create nobel
# Get the database from the osl server
$ wget http://osl.io/nobel -O nobel.sql.gz
# Gunzip the file and import it into the nobel db
$ gunzip nobel.sql.gz
$ mysql nobel < nobel.sql
# OR do it in one step!
$ zcat nobel.sql.gz | mysql nobel
# Open up mysql shell to execute queries
$ mysql nobel
# List all the tables
SHOW TABLES;
# Print the layout of the database to the screen
DESCRIBE nobel;

Ways to Use a Database

Now that you have a working database you have a few options for how you want to use it.

Raw Queries

We've already done this in the previous exercise. You use your choice of program to interact with the database exclusively via SQL and run the queries you want. This is rarely the way to go and isn't very useful for most applications. The SQL language is only good for doing database stuff.

mysql> SELECT subject, yr, winner FROM nobel
       WHERE yr=1960;
+------+------------+-----------------------------+
| yr   | subject    | winner                      |
+------+------------+-----------------------------+
| 1960 | Chemistry  | Willard F. Libby            |
| 1960 | Literature | Saint-John Perse            |
| ...  | ...        | ...                         |
+------+------------+-----------------------------+

Native Queries

See nobel.py

#!/usr/bin/python
import MySQLdb
import os

db = MySQLdb.connect(
  os.environ['MYSQL_PORT_3306_TCP_ADDR'],
  'root',
  os.environ['MYSQL_ENV_MYSQL_ROOT_PASSWORD'],
  "nobel"
)

cursor = db.cursor()
cursor.execute("SELECT subject, yr, winner FROM nobel WHERE yr = 1960")
data = cursor.fetchall()

for winner in data:
    print "%s winner in %s: %s " % (winner[0], winner[1], winner[2])

db.close()

Object Relational Mappers

# SELECT * FROM nobel WHERE yr = 1960
for subject, yr, winner in session.query(Nobel).filter_by(yr=1960):
    print "%s winner in %s: %s " % (subject, yr, winner)

Further Reading

CS 340
The CS 340 course at OSU (titled "Databases") is a great introduction to this topic. If you have the option to take it you should!