In isolation, the data appeared harmless. To the question ‘Is there any customer information in the dataset that should be kept private?’, the answer was:
‘No, all customer identifying information has been removed; all that remains are ratings and dates. This follows our privacy policy …’
However, Narayanan (2008) from the University of Texas at Austin showed otherwise. The combination of an individual's ratings, the dates of those ratings and the movies rated forms a unique movie fingerprint. Think about your own Netflix behavior. How many people do you think watched the same set of movies? How many watched the same set of movies at the same time?
The main question is how to match this fingerprint to a person. It turned out to be rather simple. Based on information from the well-known movie-rating website IMDb (Internet Movie Database), a similar fingerprint could be formed. Where the two fingerprints matched, individuals could be re-identified.
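The matching step can be illustrated with a small sketch. It is a toy simplification, not the scoring used in Narayanan's paper: all user IDs, records, and the helper names `similarity` and `best_match` are hypothetical, and the tolerances (ratings within one point, dates within 14 days) are assumptions chosen for illustration. A public record counts as a hit when the anonymized data contains the same movie with a close rating and a close date, and a candidate is accepted only if it clearly beats the runner-up.

```python
from datetime import date

# Hypothetical "anonymized" release: pseudonymous ID -> (movie, rating, date) records.
anonymized = {
    "user_0417": [
        ("Bent", 5, date(2005, 3, 2)),
        ("Jesus of Nazareth", 2, date(2005, 3, 9)),
        ("The Gospel of John", 1, date(2005, 4, 1)),
    ],
    "user_1893": [
        ("Bent", 3, date(2004, 11, 20)),
        ("Queer as Folk", 4, date(2005, 1, 5)),
    ],
}

# Hypothetical public profile, e.g. ratings someone posted under their own name.
public_profile = [
    ("Bent", 5, date(2005, 3, 4)),
    ("The Gospel of John", 1, date(2005, 4, 3)),
]


def similarity(public, private, max_days=14, max_rating_gap=1):
    """Count public records that have an approximate match in the private records."""
    score = 0
    for movie, rating, day in public:
        for p_movie, p_rating, p_day in private:
            if (
                movie == p_movie
                and abs(rating - p_rating) <= max_rating_gap
                and abs((day - p_day).days) <= max_days
            ):
                score += 1
                break  # each public record counts at most once
    return score


def best_match(public, dataset, min_margin=1):
    """Return the pseudonymous ID that fits best, or None if the lead is not clear."""
    ranked = sorted(
        ((similarity(public, records), uid) for uid, records in dataset.items()),
        reverse=True,
    )
    top_score, top_id = ranked[0]
    runner_up = ranked[1][0] if len(ranked) > 1 else 0
    if top_score > 0 and top_score - runner_up >= min_margin:
        return top_id
    return None


print(best_match(public_profile, anonymized))
```

Even in this toy setting, two approximately matching records are enough to single out one pseudonymous user, which is the intuition behind the fingerprint argument: the more records an attacker knows, the fewer candidates remain.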
Movie-watching behavior might not seem like sensitive information, but think about your own behavior: would you mind if it went public? Examples that Narayanan provided in his paper are religious views (ratings on ‘Jesus of Nazareth’ and ‘The Gospel of John’) and sexual orientation (ratings on ‘Bent’ and ‘Queer as Folk’), both of which could easily be inferred.