IMDb provides a subset of their data in tab-separated format for personal and non-commercial use. You can find more
information, including legal, at IMDb Non-Commercial Datasets. These are some notes on the schema.
There are 7 files provided, at the time of writing this article:
# |
Name |
Compressed size (MB) |
Uncompressed size (MB) |
Number of rows |
1 |
name.basics.tsv.gz |
245 |
753 |
12,981,035 |
2 |
title.akas.tsv.gz |
305 |
1783 |
37,728,267 |
3 |
title.basics.tsv.gz |
172 |
841 |
10,285,368 |
4 |
title.crew.tsv.gz |
66 |
325 |
10,285,368 |
5 |
title.episode.tsv.gz |
41 |
196 |
7,844,603 |
6 |
title.principals.tsv.gz |
436 |
2475 |
58,914,239 |
7 |
title.ratings.tsv.gz |
7 |
23 |
1,366,240 |
There are 2 unique alphanumeric identifiers in those files:
tconst
is an ID for a title, and
nconst
is an ID for a name.
This diagram shows the relationships between the 7 data exports. This isn't exactly an entity relationship diagram, but
it's not too far either.
This diagram was created using DBML and can be imported into
dbdiagram.io. Here's the code:
Table name_basics {
nconst string [primary key]
primaryName string
birthYear number
deathYear number
primaryProfession string_array
knownForTitles nconst_array [ref: < title_basics.tconst]
}
Table title_basics {
tconst string [primary key]
titleType string
primaryTitle string
originalTitle string
isAdult boolean
startYear number
endYear number
runtimeMinutes number
genres string_array
}
Table title_akas {
titleId string [ref: > title_basics.tconst]
ordering integer
title string
region string
language string
types string_array
attributes string_array
isOriginalTitle boolean
}
Table title_crew {
tconst string [ref: - title_basics.tconst]
directors nconst_array [ref: > name_basics.nconst]
writers nconst_array [ref: > name_basics.nconst]
}
Table title_episode {
tconst string [primary key]
parentTconst string [ref: > title_basics.tconst]
seasonNumber number
episodeNumber number
}
Table title_principals {
tconst string [ref: - title_basics.tconst]
ordering number
nconst string [ref: - name_basics.nconst]
category string
job string
characters string
}
Table title_ratings {
tconst string [ref: - title_basics.tconst]
averageRating number
numVotes number
}
5 First Rows
Here's a sample of the data, these are the first 5 rows from each export.
name.basics.tsv
nconst |
primaryName |
birthYear |
deathYear |
primaryProfession |
knownForTitles |
nm0000001 |
Fred Astaire |
1899 |
1987 |
soundtrack,actor,miscellaneous |
tt0050419,tt0053137,tt0072308,tt0031983 |
nm0000002 |
Lauren Bacall |
1924 |
2014 |
actress,soundtrack |
tt0075213,tt0037382,tt0038355,tt0117057 |
nm0000003 |
Brigitte Bardot |
1934 |
\N |
actress,soundtrack,music_department |
tt0054452,tt0056404,tt0057345,tt0049189 |
nm0000004 |
John Belushi |
1949 |
1982 |
actor,soundtrack,writer |
tt0080455,tt0072562,tt0077975,tt0078723 |
nm0000005 |
Ingmar Bergman |
1918 |
2007 |
writer,director,actor |
tt0083922,tt0069467,tt0050986,tt0050976 |
title.akas.tsv
titleId |
ordering |
title |
region |
language |
types |
attributes |
isOriginalTitle |
tt0000001 |
1 |
Карменсіта |
UA |
\N |
imdbDisplay |
\N |
0 |
tt0000001 |
2 |
Carmencita |
DE |
\N |
\N |
literal title |
0 |
tt0000001 |
3 |
Carmencita - spanyol tánc |
HU |
\N |
imdbDisplay |
\N |
0 |
tt0000001 |
4 |
Καρμενσίτα |
GR |
\N |
imdbDisplay |
\N |
0 |
tt0000001 |
5 |
Карменсита |
RU |
\N |
imdbDisplay |
\N |
0 |
title.basics.tsv
tconst |
titleType |
primaryTitle |
originalTitle |
isAdult |
startYear |
endYear |
runtimeMinutes |
genres |
tt0000001 |
short |
Carmencita |
Carmencita |
0 |
1894 |
\N |
1 |
Documentary,Short |
tt0000002 |
short |
Le clown et ses chiens |
Le clown et ses chiens |
0 |
1892 |
\N |
5 |
Animation,Short |
tt0000003 |
short |
Pauvre Pierrot |
Pauvre Pierrot |
0 |
1892 |
\N |
4 |
Animation,Comedy,Romance |
tt0000004 |
short |
Un bon bock |
Un bon bock |
0 |
1892 |
\N |
12 |
Animation,Short |
tt0000005 |
short |
Blacksmith Scene |
Blacksmith Scene |
0 |
1893 |
\N |
1 |
Comedy,Short |
title.crew.tsv
tconst |
directors |
writers |
tt0000001 |
nm0005690 |
\N |
tt0000002 |
nm0721526 |
\N |
tt0000003 |
nm0721526 |
\N |
tt0000004 |
nm0721526 |
\N |
tt0000005 |
nm0005690 |
\N |
title.episode.tsv
tconst |
parentTconst |
seasonNumber |
episodeNumber |
tt0041951 |
tt0041038 |
1 |
9 |
tt0042816 |
tt0989125 |
1 |
17 |
tt0042889 |
tt0989125 |
\N |
\N |
tt0043426 |
tt0040051 |
3 |
42 |
tt0043631 |
tt0989125 |
2 |
16 |
title.principals.tsv
tconst |
ordering |
nconst |
category |
job |
characters |
tt0000001 |
1 |
nm1588970 |
self |
\N |
["Self"] |
tt0000001 |
2 |
nm0005690 |
director |
\N |
\N |
tt0000001 |
3 |
nm0374658 |
cinematographer |
director of photography |
\N |
tt0000002 |
1 |
nm0721526 |
director |
\N |
\N |
tt0000002 |
2 |
nm1335271 |
composer |
\N |
\N |
title.ratings.tsv
tconst |
averageRating |
numVotes |
tt0000001 |
5.7 |
2004 |
tt0000002 |
5.8 |
269 |
tt0000003 |
6.5 |
1903 |
tt0000004 |
5.5 |
178 |
tt0000005 |
6.2 |
2685 |