Big data is all the rage right now. Everything that wasn’t called big data before the catchphrase became a catchphrase is now big data. While the size of a database is certainly part of what makes data “big,” size in itself doesn’t make large data the phenomenon that the collective consciousness has coined “big data.” I think … Continue reading Causal inference with big data – Part 1
Last time, I stopped with a preview of TripAdvisor data on an animated GIF map of the US. Here it is again: I think the patterns in the timing of reviews (which I assume approximates when people travel) are quite subtle, yet obvious. But when the obvious is visualized, it is generally more … Continue reading Fun with TripAdvisor maps!
My paper with Alex Chaudhry on management response to online reviews draws heavily on crawled data from TripAdvisor. We began by collecting data on Las Vegas hotels, a good starting source for sampling travelers from around the world. Everyone’s been to Vegas, and Vegas guests have been literally everywhere: We created the above plot by plotting … Continue reading Some empirical tidbits from TripAdvisor
OK, so this post will be more of a tutorial for those who face similar issues using a storage volume on EC2. As a Linux/Ubuntu newbie, I found figuring out how to write files from Python to that 1TB EC2 SSD storage volume to be one of the more frustrating things. The solution is actually quite simple, so … Continue reading So iPython can’t use the EC2 storage I paid for … solution
A large portion of my current research on consumer reviews (of hotels) with coauthor Alex Chaudhry revolves around processing online review data that includes a lot of text (200k+ reviews, or 34m+ words). One of the ways we wanted to explore the data is through grammatical structure. For example, reviews with a lot of verb … Continue reading Natural language processing (NLP), grammar, computational times