IMDb non-commercial datasets schema

IMDb provides a subset of its data in tab-separated format for personal and non-commercial use. You can find more information, including the legal terms, at IMDb Non-Commercial Datasets. These are some notes on the schema.

At the time of writing, 7 files are provided:

#  Name                     Compressed size (MB)  Uncompressed size (MB)  Number of rows
1  name.basics.tsv.gz       245                   753                     12,981,035
2  title.akas.tsv.gz        305                   1783                    37,728,267
3  title.basics.tsv.gz      172                   841                     10,285,368
4  title.crew.tsv.gz        66                    325                     10,285,368
5  title.episode.tsv.gz     41                    196                     7,844,603
6  title.principals.tsv.gz  436                   2475                    58,914,239
7  title.ratings.tsv.gz     7                     23                      1,366,240
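As a rough sketch of how one of these files could be read, the following uses only the Python standard library. The helper names `iter_rows` and `iter_gzip_rows` are made up for this example. It assumes the files are UTF-8 encoded, that `\N` marks a missing value (as in the sample rows further down), and that quote characters in the data should be treated as ordinary text rather than CSV quoting.

```python
import csv
import gzip
import io


def iter_rows(lines):
    """Yield one dict per data row, mapping the null marker \\N to None."""
    # QUOTE_NONE: titles can contain double quotes, so do not treat them as CSV quoting.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield {key: (None if value == r"\N" else value) for key, value in row.items()}


def iter_gzip_rows(path):
    """Stream rows straight from a .tsv.gz file without unpacking it first."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        yield from iter_rows(handle)


# Small in-memory stand-in for title.crew.tsv, taken from the sample rows.
sample = io.StringIO(
    "tconst\tdirectors\twriters\n"
    "tt0000001\tnm0005690\t\\N\n"
)
rows = list(iter_rows(sample))
print(rows[0])
```

Streaming row by row keeps memory use flat, which matters for the larger files such as title.principals.tsv.gz.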

There are 2 unique alphanumeric identifiers in those files:

  1. tconst is an ID for a title, and
  2. nconst is an ID for a name.
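A minimal sketch for telling the two identifiers apart. It assumes, based only on the sample rows in this article, that a tconst is `tt` followed by digits and an nconst is `nm` followed by digits. `parse_imdb_id` is a hypothetical helper, not part of any IMDb tooling.

```python
import re

# Assumed shape: a two-letter prefix ("tt" or "nm") followed by digits only.
ID_PATTERN = re.compile(r"^(tt|nm)(\d+)$")


def parse_imdb_id(value):
    """Return ("title" or "name", numeric part) for a tconst or nconst."""
    match = ID_PATTERN.match(value)
    if match is None:
        raise ValueError(f"not a tconst or nconst: {value!r}")
    prefix, digits = match.groups()
    kind = "title" if prefix == "tt" else "name"
    return kind, int(digits)


print(parse_imdb_id("tt0050419"))
print(parse_imdb_id("nm0000001"))
```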

This diagram shows the relationships between the 7 data exports. It isn't exactly an entity relationship diagram, but it's not far off.

This diagram was created using DBML and can be imported into any tool that reads DBML. Here's the code:

Table name_basics {
  nconst string [primary key]
  primaryName string
  birthYear number
  deathYear number
  primaryProfession string_array
  knownForTitles tconst_array [ref: < title_basics.tconst]
}

Table title_basics {
  tconst string [primary key]
  titleType string
  primaryTitle string
  originalTitle string
  isAdult boolean
  startYear number
  endYear number
  runtimeMinutes number
  genres string_array
}

Table title_akas {
  titleId string [ref: > title_basics.tconst]
  ordering integer
  title string
  region string
  language string
  types string_array
  attributes string_array
  isOriginalTitle boolean
}

Table title_crew {
  tconst string [ref: - title_basics.tconst]
  directors nconst_array [ref: > name_basics.nconst]
  writers nconst_array [ref: > name_basics.nconst]
}

Table title_episode {
  tconst string [primary key]
  parentTconst string [ref: > title_basics.tconst]
  seasonNumber number
  episodeNumber number
}

Table title_principals {
  tconst string [ref: > title_basics.tconst]
  ordering number
  nconst string [ref: > name_basics.nconst]
  category string
  job string
  characters string
}

Table title_ratings {
  tconst string [ref: - title_basics.tconst]
  averageRating number
  numVotes number
}
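To illustrate how the tconst relationship can be used, here is a small sketch that loads a few of the sample rows from this article into an in-memory SQLite database and joins title_basics with title_ratings. The SQL column types are my own choice for the example, and only a subset of the columns is included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE title_basics (
        tconst TEXT PRIMARY KEY,
        primaryTitle TEXT,
        startYear INTEGER
    );
    CREATE TABLE title_ratings (
        tconst TEXT PRIMARY KEY REFERENCES title_basics(tconst),
        averageRating REAL,
        numVotes INTEGER
    );
    """
)

# Values copied from the sample rows shown later in this article.
conn.executemany(
    "INSERT INTO title_basics VALUES (?, ?, ?)",
    [
        ("tt0000001", "Carmencita", 1894),
        ("tt0000002", "Le clown et ses chiens", 1892),
        ("tt0000003", "Pauvre Pierrot", 1892),
        ("tt0000004", "Un bon bock", 1892),
        ("tt0000005", "Blacksmith Scene", 1893),
    ],
)
conn.executemany(
    "INSERT INTO title_ratings VALUES (?, ?, ?)",
    [
        ("tt0000001", 5.7, 2004),
        ("tt0000002", 5.8, 269),
        ("tt0000003", 6.5, 1903),
        ("tt0000004", 5.5, 178),
        ("tt0000005", 6.2, 2685),
    ],
)

# Join the two exports on tconst and list the best-rated titles first.
top = conn.execute(
    """
    SELECT b.primaryTitle, r.averageRating, r.numVotes
    FROM title_basics AS b
    JOIN title_ratings AS r ON r.tconst = b.tconst
    ORDER BY r.averageRating DESC
    LIMIT 3
    """
).fetchall()

for title, rating, votes in top:
    print(title, rating, votes)
```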

5 First Rows

Here's a sample of the data: the first 5 rows from each export.


name.basics.tsv.gz:
nconst     primaryName      birthYear  deathYear  primaryProfession                    knownForTitles
nm0000001  Fred Astaire     1899       1987       soundtrack,actor,miscellaneous       tt0050419,tt0053137,tt0072308,tt0031983
nm0000002  Lauren Bacall    1924       2014       actress,soundtrack                   tt0075213,tt0037382,tt0038355,tt0117057
nm0000003  Brigitte Bardot  1934       \N         actress,soundtrack,music_department  tt0054452,tt0056404,tt0057345,tt0049189
nm0000004  John Belushi     1949       1982       actor,soundtrack,writer              tt0080455,tt0072562,tt0077975,tt0078723
nm0000005  Ingmar Bergman   1918       2007       writer,director,actor                tt0083922,tt0069467,tt0050986,tt0050976
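The primaryProfession and knownForTitles columns hold several values in one field. Judging from the rows above, they are comma-separated, so a small helper like the following could turn them into lists. `split_array` is a name invented for this sketch, and treating `\N` as an empty list is an assumption.

```python
def split_array(value):
    """Split a comma-separated array column; a missing value becomes an empty list."""
    if value is None or value == r"\N":
        return []
    return value.split(",")


# Values taken from the Fred Astaire row above.
professions = split_array("soundtrack,actor,miscellaneous")
known_for = split_array("tt0050419,tt0053137,tt0072308,tt0031983")
print(professions)
print(known_for)
```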


title.akas.tsv.gz:
titleId    ordering  title                      region  language  types        attributes     isOriginalTitle
tt0000001  1         Карменсіта                 UA      \N        imdbDisplay  \N             0
tt0000001  2         Carmencita                 DE      \N        \N           literal title  0
tt0000001  3         Carmencita - spanyol tánc  HU      \N        imdbDisplay  \N             0
tt0000001  4         Καρμενσίτα                 GR      \N        imdbDisplay  \N             0
tt0000001  5         Карменсита                 RU      \N        imdbDisplay  \N             0


title.basics.tsv.gz:
tconst     titleType  primaryTitle            originalTitle           isAdult  startYear  endYear  runtimeMinutes  genres
tt0000001  short      Carmencita              Carmencita              0        1894       \N       1               Documentary,Short
tt0000002  short      Le clown et ses chiens  Le clown et ses chiens  0        1892       \N       5               Animation,Short
tt0000003  short      Pauvre Pierrot          Pauvre Pierrot          0        1892       \N       4               Animation,Comedy,Romance
tt0000004  short      Un bon bock             Un bon bock             0        1892       \N       12              Animation,Short
tt0000005  short      Blacksmith Scene        Blacksmith Scene        0        1893       \N       1               Comedy,Short


title.crew.tsv.gz:
tconst     directors  writers
tt0000001  nm0005690  \N
tt0000002  nm0721526  \N
tt0000003  nm0721526  \N
tt0000004  nm0721526  \N
tt0000005  nm0005690  \N


title.episode.tsv.gz:
tconst     parentTconst  seasonNumber  episodeNumber
tt0041951  tt0041038     1             9
tt0042816  tt0989125     1             17
tt0042889  tt0989125     \N            \N
tt0043426  tt0040051     3             42
tt0043631  tt0989125     2             16


title.principals.tsv.gz:
tconst     ordering  nconst     category         job                      characters
tt0000001  1         nm1588970  self             \N                       ["Self"]
tt0000001  2         nm0005690  director         \N                       \N
tt0000001  3         nm0374658  cinematographer  director of photography  \N
tt0000002  1         nm0721526  director         \N                       \N
tt0000002  2         nm1335271  composer         \N                       \N
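The characters value in the first row, `["Self"]`, looks like a JSON array stored as text. Assuming that holds for the whole column (this is inferred from the sample only), it could be decoded as sketched below. `parse_characters` is a hypothetical helper.

```python
import json


def parse_characters(value):
    """Decode the characters column, assumed to be a JSON array or the null marker."""
    if value is None or value == r"\N":
        return []
    return json.loads(value)


print(parse_characters('["Self"]'))
print(parse_characters(r"\N"))
```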


title.ratings.tsv.gz:
tconst     averageRating  numVotes
tt0000001  5.7            2004
tt0000002  5.8            269
tt0000003  6.5            1903
tt0000004  5.5            178
tt0000005  6.2            2685
