Continuing with part 2 of the causal inference blog (read part 1 here), I want to examine 2 more examples of causal inference in the substantive sphere of online reviews. The first example is on research demonstrating managers’ manipulation of online reviews. The second example is on research demonstrating the externalities of public management response to online reviews.
Example 2: Promotional Reviews: An Empirical Investigation of Online Review Manipulation (Mayzlin, Dover & Chevalier 2014 AER)

We’ve always wondered whether that glowing 5-star review with very specific details about a company’s operations was actually written by a real customer. This paper investigates the extent to which online reviews are fake by taking advantage of a natural experiment. In a nutshell, the natural experiment is the fact that TripAdvisor allows anyone to write a review while Expedia only allows customers who booked through its website to write one. It is obviously much more expensive for hotels to plant fake reviews on Expedia than on TripAdvisor. The argument then follows that differences in the distribution of reviews between the 2 sites can be attributed to the presence of fake reviews.

The authors show that this difference is especially pronounced when hotels have greater incentives to “cheat.” For example, while a small independent operator might get some bad press if caught writing fake reviews, that is nothing compared to the negative publicity a large chain like Marriott would suffer for the same offense. The authors find that, indeed, small operators have inflated reviews on TripAdvisor relative to Expedia while chains do not. This finding simultaneously supports the incentive hypothesis and demonstrates the comparability of reviews across the 2 sites.

I think the most interesting finding from this paper is actually about fake negative reviews. The authors find that hotels that are poorly rated on Expedia (and thus objectively bad) also bring down their competitors with them on TripAdvisor: competitors located close to these objectively bad hotels are rated lower on TripAdvisor than on Expedia. I think these 2 results are about as clean an empirical finding as a researcher can hope for, and they serve as my personal standard for empirical causal inference papers.
Example 3: Online reputation management: estimating the impact of management responses on consumer reviews (Proserpio & Zervas 2015 working paper)
Proserpio and Zervas, similar to another working paper by Ye, Gu & Chen (2010), test the idea that, by responding to individual reviewers publicly, managers create an externality. The idea is that when subsequent reviewers observe this response, the recall or the presentation of their own opinions can be influenced by the presence of the manager’s response. In order to demonstrate this causal link, both papers use an empirical strategy termed difference-in-differences (DD). DD is a type of natural experiment design that compares the before-after change in an outcome for a treated group to the same change for a control group. In the case of management response to online reviews, both papers compare 2 sites: one where many managers respond to reviews and another that exhibits little management presence. Proserpio and Zervas use TripAdvisor (response site) and Expedia (no-response site). The authors align the review history of every hotel (a sample of TripAdvisor hotels in Texas) at the point when the first response is written. They then compare the average ratings of reviewers on TripAdvisor +/- N months around this first “intervention” incident to those in the same periods on Expedia. The authors find that management response causes a 0.09 rating increase on TripAdvisor.
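The mechanics of the DD comparison can be sketched with a short simulation. All numbers below are fabricated for illustration only; the effect size is set to mimic the paper’s 0.09 estimate, not taken from its data. The key point is that the shared quality drift cancels out when we difference the two within-site changes.

```python
# A minimal sketch of the difference-in-differences (DD) estimator for the
# management-response setting, using fabricated ratings data.
import numpy as np

rng = np.random.default_rng(0)
n = 20000  # simulated reviews per site-period cell

# Latent hotel quality drifts upward over time on BOTH sites (the confound).
pre_quality, post_quality = 3.8, 3.9

# TripAdvisor (treated: managers start responding at the "intervention").
ta_pre  = rng.normal(pre_quality, 0.5, n)
ta_post = rng.normal(post_quality + 0.09, 0.5, n)  # drift + response effect

# Expedia (control: no management responses, drift only).
ex_pre  = rng.normal(pre_quality, 0.5, n)
ex_post = rng.normal(post_quality, 0.5, n)

# DD: difference the within-site changes to remove the shared quality drift.
dd = (ta_post.mean() - ta_pre.mean()) - (ex_post.mean() - ex_pre.mean())
print(f"DD estimate of the response effect: {dd:.3f}")
```

Note that a simple before-after comparison on TripAdvisor alone would conflate the quality drift with the response effect; the Expedia difference is what nets it out.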
What type of confounding explanation / endogeneity issue does DD control for? In general, DD controls for some underlying (perhaps latent) factor that affects the outcome of interest and is correlated with the intervention of interest. In the case of management response to online reviews, this confounding factor is the hotel’s actual quality. Maybe managers’ responses reflect their attention to the actual problems that guests complain about. As a result, these actual quality improvements raise the opinions of subsequent guests. By controlling for actual quality using ratings on Expedia, the authors argue that the additional gains in ratings on TripAdvisor above the Expedia gains are the causal impact of responding to reviews. This suggests that it is not enough just to improve hotel quality: responding to reviewers can actually also increase subsequent ratings.
As persuasive as this argument is, there are some fundamental issues with DD. First, and most obvious, the design treats the first incidence of a response as a sign of a long-term change in policy. Do hotels continue to respond after this first instance? If not, how would a single management response buried in the archives of TripAdvisor reviews affect subsequent traveler opinion? Second, an endemic problem with DD is the endogenous timing of the treatment.
Imagine this hypothetical DD context. There are 2 villages, Alpha and Bravo. Identical twins are separated at birth and sent to live in these 2 villages. Both villages are susceptible to the common cold. A doctor claims to have created a cure for the cold, so researchers decide to administer the drug in Alpha every time someone has the cold. Now, logically, we would want to compare the duration of the cold for Twin A in Alpha with Twin B in Bravo. Instead, what DD does is the equivalent of comparing the change in health of twin A to twin B every time twin A is administered the drug. Kind of weird, right? The thing is that DD works best if the treatment is exogenous. If managers are forced to respond to reviewers on TripAdvisor and cannot do so on Expedia, then we have a fair comparison. The equivalent hypothetical example would be to replace the drug with a vaccine that gets administered only in Alpha. In this scenario, we should find that fewer colds occur in Alpha than in Bravo after the vaccine policy is put in place.
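The cold example can be made concrete with a toy simulation. In the sketch below (all numbers invented for illustration), the drug has zero true effect, yet the naive DD comparison around treatment times finds an apparent improvement. The reason is regression to the mean: treatment is triggered by a transient bad state, so health rebounds afterward regardless of the drug.

```python
# A toy simulation of the endogenous-timing problem in DD.
# The "drug" does nothing, but because it is administered exactly when a
# twin is sick (health below a threshold), health mechanically rebounds
# afterward, and the naive DD comparison attributes that rebound to the drug.
import numpy as np

rng = np.random.default_rng(1)
T, n_twins = 200, 500

# Each twin's daily health is i.i.d. noise around a common baseline, so a
# bad day does not predict the next day (pure regression to the mean).
alpha = rng.normal(0, 1, (n_twins, T))  # twins in village Alpha (treated)
bravo = rng.normal(0, 1, (n_twins, T))  # twins in village Bravo (control)

# "Treat" twin A whenever health drops below -1 (i.e., they catch a cold).
effects = []
for i in range(n_twins):
    sick_days = np.where(alpha[i, :-1] < -1)[0]
    for t in sick_days:
        change_a = alpha[i, t + 1] - alpha[i, t]  # change after "treatment"
        change_b = bravo[i, t + 1] - bravo[i, t]  # same-period change in Bravo
        effects.append(change_a - change_b)

print(f"naive DD 'effect' of a drug that does nothing: {np.mean(effects):.2f}")
```

The spurious positive estimate disappears if treatment timing is exogenous, e.g., if the drug were administered on randomly chosen days rather than on sick days; that is the vaccine scenario above.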
So how do we go about addressing this treatment selection issue? As we suggest in the hypothetical example, we would like to compare the case when the treatment is applied to twin A to the case when the treatment should have been (but wasn’t) applied to twin B. This method controls for the endogenous selection criteria for applying the treatment. Can we do this for the case of management response to online reviews?
Sure. One way is to use a 2-step estimation approach that uses a selection model in the first step to match cases when management response is applied on TripAdvisor to similar cases on Expedia where management response “would have been” applied had managers followed the same policy used on TripAdvisor. This is known as the Heckman correction method. While this method has broad appeal in econometrics, it also suffers from some endemic issues. The most important one is that we need to be able to predict the manager’s policy. This generally requires us to find variables that affect the selection policy (management responses) but do not affect our outcome (ratings). Basically, we need exclusionary (or instrumental) variables. [One technical note is that the selection model is parametrically identified given some joint covariance distribution assumptions on the error structures of the selection and outcome equations, but this type of identification is generally not seen as credible.] I would suggest using a variable like the average city-level response rate as a possible predictor that would affect management response but not hotel rating.
In the next blog entry, I will introduce a natural experiment identification strategy that I use in my working paper with Alex Chaudhry that we think is a cleaner strategy than the Heckman correction method or similar propensity score matching methods.