Data clustering is one of the most fundamental components of the machine learning toolkit. Many clustering algorithms are easily implemented in Python, making it the go-to language for data scientists when implementing canned routines. At the PyData conference last fall, I attended this talk by Leland McInnes & John Healy, which introduced the HDBSCAN … Continue reading Automatic Neighborhood Detection
In the spirit of disseminating our latest edition of the manager response to online reviews paper, Alex and I have posted the current manuscript to SSRN for those who are interested. Let us know your thoughts via email. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2831402 Abstract: This manuscript investigates the externalities of managers’ responses (MR) to online reviews on popular travel … Continue reading Revised and Better?
The previous series of blogs documented how causal inference arguments (difference in differences, regression discontinuity design, and natural experiments) are applied in big data settings within the online reviews domain. This domain also happens to be a great setting to quantitatively analyze textual data. We have a corpus of nearly 18 million reviews for hotels … Continue reading Basic text analysis tutorial – string manipulations, basic sentiment analysis
Continuing with the ongoing blog series on causal inference with big data (part 1 & part 2 here), we pick up where we left off. As a brief refresher, recall example 3. I explained how difference in differences (DD) methods are being applied to estimate the effect of management response to online reviews on … Continue reading Causal inference with big data – Part 3
Continuing with part 2 of the causal inference blog (read part 1 here), I want to examine 2 more examples of causal inference in the substantive sphere of online reviews. The first example is on research demonstrating managers’ manipulation of online reviews. The second example is on research demonstrating the externalities of public management response … Continue reading Causal inference with big data – Part 2
I pause the ongoing blog sequence on causal inference to bring you this. It is a global projection of our review data. And also this map for Houston Restaurant Weeks! Continue reading Halftime show (best viewed on desktop)
Big data is all the rage right now. Everything that wasn’t called big data before the catchphrase became a catchphrase is now big data. While the size of a database is certainly part of what makes data “big,” that in itself doesn’t make large data the phenomenon that the collective consciousness has coined “big data.” I think … Continue reading Causal inference with big data – Part 1