In my current research with Alex Chaudhry, we study the relationship between trailer design and opening-weekend box-office sales. One of the things we’re interested in is how the amount of the movie revealed by the trailer relates to the type of movie being advertised. If you think of movies as an experience good (like restaurants, video games, resorts, etc.), there’s often a tradeoff between wanting to be sure you’re going to like the experience (risk aversion) and wanting to be (pleasantly) surprised by it. While risk aversion is a general property of consumers and their purchase decisions, surprise-seeking seems to matter most for experience goods, and more specifically for experience goods where the experience is path dependent: the more that going through a later part of the experience first changes how the consumer feels about the earlier part, the more path dependence matters. To get a measure of path dependence in movies, we wanted to characterize the basic emotional arcs of movie plots as belonging to a few archetypes:

This visualization was created from time-stamped movie scripts (subtitle files collected from YIFY Subtitles) for a dataset of ~1,000 movies that debuted in the last decade. We created these clusters using the following steps:
- Compute sentiment polarity using pre-trained sentiment classifiers.
- Normalize polarity by movie (compute the polarity z-score within each movie).
- Normalize (compress or stretch) every movie’s timeline to 100 “minutes”.
- Compute a 10-“minute” rolling average of the standardized polarity scores for every movie.
- Compute the full pairwise correlation matrix across all movies.
- Construct a network graph, treating the correlation matrix as the adjacency matrix and dropping all negative correlations.
- Apply the Louvain community-detection algorithm to the network. This assigns each node (movie) to a partition.

The graphs show the standardized polarity timelines for each movie, separated by partition. The orange line is the average polarity of each partition at each point in time.
The five most representative movies for each story arc are:
| Story arc ID | Most representative movies |
|---|---|
| 1 | Elizabeth: The Golden Age, Jarhead, Elysium, Star Trek, Final Destination 3 |
| 2 | Bridge to Terabithia, Hostel, War Horse, Cloudy with a Chance of Meatballs, Takers |
| 3 | Blended, Red Tails, Skyfall, The American, Logan |
| 4 | Meet Dave, Batman v Superman: Dawn of Justice, Pulse, Beverly Hills Chihuahua, Devil |
| 5 | Rough Night, Ice Age: Collision Course, A Christmas Carol, Snowden, Your Highness |
I just wanted to share this cool intermediate step in our research project. Any thoughts on these storylines?