Exploring a Pre-existing Database with SQL¶
To get familiar with a pre-existing database using raw SQL queries, you need to explore the database schema, tables, relationships, and the data.
Table of Contents¶
- Connect to the Database
- Explore the Schema
- View Data Samples
- Check how the Table is Indexed
- Analyze table sizes and usage
- Document Findings
- Misc Queries
Connect to the Database¶
Whatever database you're using has a CLI interface that can accept SQL queries.
psql -U username -d database_name # PostgreSQL
mariadb -h 192.168.4.66 -u username database_name # mariadb
Explore the Schema¶
List all Tables¶
-- PostgreSQL
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public';
-- MySQL
SHOW TABLES;
-- SQLITE
.tables
Check table columns and data types¶
-- PostgreSQL
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'the_table_name';
-- MySQL
DESCRIBE table_name;
-- SQLite
PRAGMA table_info(table_name);
Check Relationships and Foreign Keys¶
-- PostgreSQL
SELECT
tc.table_name,
kcu.column_name,
ccu.table_name AS foreign_table_name,
ccu.column_name AS foreign_column_name
FROM
information_schema.table_contracts AS tc
JOIN information_schema.key_column_usage AS kcu
ON tc.constraint_name = kcu.constraint_name
JOIN information_schema.constraint_column_usage AS ccu
ON ccu.constraint_name = tc.constraint_name
WHERE constraint_type = 'FOREIGN KEY';
It queries the
information_schema
tables, which are a builtin set of tables that
describe the structure of a database.The query:
- Gets the name of the table (
tc.table_name
) that has a foreign key. - Gets the column in that table that serves as the foreign key(
kcu.column_name
) - Gets the table that the foreign key points to (
ccu.table_name AS foreign_table_name
) - Identifies the specific column in the referenced table (
ccu.column_name AS foreign_column_name
)
The JOIN/AS/ON
statements here:
-
JOIN
: Combines rows from two tables based on a related column.-
table_constraints
joins withkey_column_usage
based ontc.constraint_name = kcu.constraint_name
-
This makes sure that for each foreign key constraint, we get both the table and column it originates from (
kcu
) and the table/column it points to (ccu
).
-
-
AS
: Creates an alias for columns or tables to mmake them easier to reference in the query.ccu.table_name AS foreign_table_name
: Lets you refere toccu.table_name
asforeign_table_name
.
If you're using MySQL, you might need to query the INFORMATION_SCHEMA
database in
a similar way.
View Data Samples¶
View the first few rows of a table:
Get the row count:
Check how the Table is Indexed¶
Get a list of the indexes:
-- PostgreSQL
SELECT
tablename,
indexname,
indexdef
FROM
pg_indexes
WHERE
schemaname = 'public';
-- MySQL
SHOW INDEX FROM table_name;
-- SQLite
PRAGMA index_list(table_name);
Check how Tables are Linked by Joining Them¶
With the foreign keys, check how things are tied to each other.
- Look for naming conventions.
- e.g., fields like
id
,user_id
,order_id
, etc.
- e.g., fields like
-
Use
JOIN
queries to confirm relationships.
-
JOIN
: Combines the rows from two tables based on the specified condition (ON
).orders
andusers
are combined whereo.user_id = u.id
.- It matches the row in the
orders
table where theuser_id
column corresponds to theid
column in theusers
table.
-
orders o
: Creates an alias to be referenced inside the query.users u
does the same.
So if the orders
table contains this data:
id |
user_id |
amount |
---|---|---|
1 |
101 |
50.00 |
2 |
102 |
75.00 |
And the users
table contains:
id |
name |
---|---|
101 |
Bob |
102 |
Alice |
Then that query will combine those into this:
id (orders ) |
user_id |
amount |
id (users ) |
name |
---|---|---|---|---|
1 |
101 |
50.00 |
101 |
Bob |
2 |
102 |
75.00 |
102 |
Alice |
Analyze table sizes and usage¶
-- PostgreSQL
SELECT
relname AS table_name,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM
pg_catalog.pg_statio_user_tables
ORDER BY
pg_total_relation_size(relid) DESC;
-
relname
: Builtin for PostgreSQL.- It's a column in the
pg_statio_user_tables
and refers to the name of the table (or relation, in PostgreSQL terminology).
- It's a column in the
-
pg_size_pretty()
: Converts raw bute sizes into human-readable format. -
pg_total_relation_size()
: Calculates the total size of a table, including indexes, TOAST data, etc.
-
pg_statio_user_tables
: A PostgreSQL system view that provides stats for user-defined tables.
The MySQL equivalent:
-- MySQL
SELECT
table_name AS Table
round(((data_length + index_length) / 1024 / 1024), 2) AS Size_MB
FROM information_schema.TABLES
WHERE table_schema = "database_name"
ORDER BY Size_MB DESC;
-
ROUND()
: Builtin function to round a number to a specific number of decimal places. -
information_schema.TABLES
: Contains metadata about all tables in the database. DESC
: Order results in descending order (largest first).
Document Findings¶
Keep notes of the database to reference.
Keep a list of tables, their purposes and relationships, and note improtant columns -
primary keys, foreign keys, and indexed columns.
Misc Queries¶
Use "exploratory queries" to learn how the data behaves.