This website is for educational and class presentation purposes only and not intended for professional use.
Summary
This website showcases our final project for DS105M. Our goal was to create a web-scraping plug-in program that seamlessly extracts Amazon customer reviews. To demonstrate the program, we extracted customer reviews for a computer mouse, the Logitech MX Master 3, and analyzed them.
Context
In the modern business-to-consumer industry, no company has had as strong an influence on reshaping the competitive landscape as Amazon, especially in expediting the transition from brick-and-mortar retail to ecommerce dependency. With an average of 200 million unique visitors per month and over 6 million sellers worldwide, Amazon’s sheer size allows it to maintain an ever-growing grip on the ecommerce industry.
One of the key factors in Amazon’s growth over time is the center-stage role that customer reviews play in the Amazon shopping experience. Customer reviews add significant value for both buyers and sellers on Amazon, as they function as an accurate and digestible information channel for customer satisfaction. Customer satisfaction benefits both sides: buyers want the products they purchase to fulfill their needs, while sellers want to increase customer retention and future customer acquisition by giving more buyers what they want. Amazon lets new buyers refer to reviews from customers with similar needs to judge whether a product meets those needs. At the same time, sellers can analyze customer reviews and draw conclusions that improve future iterations of their products.
Typically, scrolling through the customer reviews section of an Amazon-listed product gives most users enough information to reach a conclusion about the product. However, there are many less common cases that require extracting some or all of the reviews available on an Amazon page. The most obvious use case is simple record-keeping, for example when a seller wants to keep a copy of the reviews published on their product at a given point in time. In more advanced use cases, review data can be analyzed further through descriptive analytics such as data visualizations, or through NLP methods such as sentiment analysis, opinion modeling, and topic modeling. Such analysis should help sellers understand what customers think about their products in a more efficient and digestible manner.
Motivation
This large variety of use cases comes with an unaddressed issue: there is no easy or efficient method to extract customer reviews. With this in mind, our primary goal was to create a web-scraping plug-in program that can seamlessly extract customer reviews from any Amazon page. Such a program can then be used by businesses, researchers, shoppers, or any other third party interested in scraping Amazon reviews.
Additionally, we wanted to provide our own use case showing how such data can be analyzed. We attempted some analysis on a computer mouse popularized by the gaming and professional services industries, the Logitech MX Master 3. Our goal was to find key insights that Logitech could consider in future iterations of the mouse and to provide an example of how the data can be used in a business setting. We approached this with a variety of descriptive analytics and data visualizations, and we also worked on an NLP model for further analysis. In the end, we ran into a variety of challenges that made us realize we had overestimated the usability of Amazon review data. In particular, the data itself is biased towards positive reviews, which makes quality data analysis especially difficult.
Collaboration
This project was completed by David Veksler, Abhinav Vijayakumara, and Zixuan (Vanessa) Wang. The following lists each group member’s contributions:
David: Worked on project scope, project planning, code to extract Amazon reviews, code to clean the extracted data, finalized code for users who want to try the plug-in model, presentation template, presentation text and visuals for week 8 and week 11, website configuration and design, website text and visuals, repository management and commits.
Abhinav: Worked on code to extract Amazon reviews, code to clean the extracted data, presentation slides for week 8 and 11, data analysis including bar chart, boxplots, histogram and line graph, website text and visuals, commits.
Zixuan (Vanessa): Worked on natural language processing analysis, namely sentiment analysis, used a sentiment intensity analyzer to compare distributions, generated various types of word clouds, made data visualizations such as boxplots and bar charts, presentation slides for week 8 and 10, commits.
- To access our GitHub repository, click here.
- To learn about the course and project requirements, click here.