Basic text analysis tutorial – string manipulations, basic sentiment analysis

The previous series of blog posts documented how causal inference methods (difference in differences, regression discontinuity design, and natural experiments) are applied in big data settings within the online reviews domain. This domain also happens to be a great setting for quantitative analysis of textual data. We have a corpus of nearly 18 million reviews for hotels …

Natural language processing (NLP), grammar, computational times

A large portion of my current research on consumer reviews (of hotels) with coauthor Alex Chaudhry revolves around processing online review data that includes a lot of text (200k+ reviews, or 34m+ words). One way we wanted to explore the data is through its grammatical structure. For example, reviews with a lot of verb …