What does Everyone Buy on Amazon?
By: Avishek Ghosh (Project Lead), Sudhanshu Agrawal, Angelina Kim, Zoey Meng, Cyrus Ho
An investigation of the best-selling products on Amazon and a comparison with its competitor Walmart.
Introduction
With the COVID-19 pandemic forcing many stores to shut down, people realized the convenience and comfort of shopping with online retailers such as Amazon rather than at a physical store. Our team was interested in analyzing a Kaggle dataset containing information on over 2 million products that were classified as ‘Best Seller’ on Amazon. The team also wanted to compare Amazon with Walmart using a supplementary dataset.
Most Popular Products within Different Categories
Visualizing the most frequently appearing words in the product titles across various categories gives us a sense of consumer preferences and desires. For example, the word ‘waterproof’ appears frequently across multiple categories, emphasizing that this is a quality consumers want in many of their products. In the pantry section, it is interesting to note that terms describing dietary restrictions and preferences, such as ‘organic’, ‘gluten-free’ and ‘keto’, appear more often than any particular food item.
While words such as ‘shirt’ and ‘pants’ are very common in the fashion category, as one might expect, it is noteworthy that ‘pockets’ and ‘plus size’ appear frequently as well. We can hypothesize that this is because women’s jeans often lack pockets, leading many shoppers to search explicitly for jeans with pockets. Rather than any particular size such as XL showing up, the phrase ‘plus size’ is used much more frequently. This may indicate how much a given size varies from one clothing brand to another. Plus-size shoppers are known to have difficulty finding clothes that fit well, and perhaps the phrase ‘plus size’ makes it easier for them to find what they are looking for, rather than having to purchase an outfit and then try it on.
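The word counts behind this kind of visualization can be sketched in a few lines. The example below is a minimal illustration, not our actual pipeline: the `top_terms` helper, the tiny stopword list and the sample titles are all made up for this sketch. Each term is counted once per title, and stopwords are dropped before two-word phrases are formed.

```python
import re
from collections import Counter

# A tiny stopword list for illustration; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}

def top_terms(titles, n=5, ngram=1):
    """Return the n most common words (or ngram-word phrases) across titles."""
    counts = Counter()
    for title in titles:
        # Lowercase, keep alphabetic tokens (allowing an inner apostrophe).
        tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", title.lower())
        tokens = [t for t in tokens if t not in STOPWORDS]
        # Use a set so each term counts at most once per title.
        grams = {" ".join(tokens[i:i + ngram])
                 for i in range(len(tokens) - ngram + 1)}
        counts.update(grams)
    return counts.most_common(n)

# Hypothetical titles, invented for this example.
titles = [
    "Waterproof Hiking Boots for Men",
    "Women's Plus Size Jeans with Pockets",
    "Waterproof Phone Pouch",
    "Plus Size Waterproof Rain Jacket with Pockets",
]

print(top_terms(titles, n=1))           # most common single word
print(top_terms(titles, n=1, ngram=2))  # most common two-word phrase
```

Counting two-word phrases is what lets a term like ‘plus size’ surface as a unit instead of being split into ‘plus’ and ‘size’.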
Rating / Price
It is a common assumption that public ratings have a negative bias. People who like their products don’t feel a need to leave a rating. Rather it is the people who are dissatisfied with their products who are more inclined to leave a rating, therefore tilting the overall rating towards the lower end. This graph seems to contradict that assumption.
The graph above shows the distribution of ratings on Amazon products. As is immediately obvious, most of the ratings cluster towards the higher end of the scale. In fact, over 80% of ratings are 4 stars or more, and over 50% are 4.5 stars or above. A mean of 4.393 stars indicates that, contrary to what the assumption of negative bias suggests, ratings on Amazon actually tilt towards the higher end of the scale. Fewer than 10% of ratings are 3 stars or below, which is fewer than the number of 5/5 ratings.
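Summary figures like these can be computed directly from a list of ratings. The sketch below assumes the ratings are available as plain floats; the `rating_summary` helper and the sample values are invented for illustration and are not taken from the dataset.

```python
from statistics import mean

def share(ratings, predicate):
    """Fraction of ratings for which predicate(rating) is true."""
    return sum(1 for r in ratings if predicate(r)) / len(ratings)

def rating_summary(ratings):
    """Mean rating plus the share of ratings in a few bands."""
    return {
        "mean": mean(ratings),
        "share_4_or_more": share(ratings, lambda r: r >= 4.0),
        "share_4_5_or_more": share(ratings, lambda r: r >= 4.5),
        "share_3_or_less": share(ratings, lambda r: r <= 3.0),
        "share_5": share(ratings, lambda r: r == 5.0),
    }

# Hypothetical ratings, invented for this example.
sample_ratings = [5.0, 4.7, 4.5, 4.5, 4.3, 4.6, 4.0, 3.9, 4.8, 2.5]

summary = rating_summary(sample_ratings)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```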
Should you, as a consumer, spend more for higher quality or buy cheaper to save money? Price is one of the first things many shoppers consider. Because of the common sentiment that higher prices indicate better quality, more expensive products are also expected to create more satisfied customers and garner higher ratings. To investigate this belief, we plotted each product by its price and its rating, using the rating as an indicator of customer satisfaction.
Based on the scatterplot and regression line, we can conclude that a higher price does not necessarily come with a higher rating. In fact, there appears to be no linear trend relating prices to ratings. There were unexpected instances: products priced as low as 1 cent earned a rating of 4.7, while products priced as high as $867 were given a rating of 2. Several factors may explain this. For example, more expensive products carry higher expectations that might not be met, whereas inexpensive products offer more value for their cost. So when you shop in the future, don’t count out inexpensive products, because they may exceed expectations at a much lower cost.
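One way to put a number on "no linear trend" is the Pearson correlation coefficient together with the least-squares regression line. The sketch below is a minimal pure-Python version under stated assumptions: the `fit_line` helper and the price and rating values are made up for illustration and do not reproduce our results. A coefficient close to 0 indicates little linear relationship, while values near 1 or -1 indicate a strong one.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (pearson_r, slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept

# Hypothetical prices (USD) and ratings, invented for this example.
prices = [5.0, 12.0, 20.0, 35.0, 60.0, 150.0]
ratings = [4.6, 4.2, 4.7, 4.1, 4.5, 4.3]

r, slope, intercept = fit_line(prices, ratings)
print(f"r = {r:.3f}, rating = {slope:.4f} * price + {intercept:.3f}")
```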
Clustering Product Categories
Due to the very large number of unique product categories (over 6,000), we attempted to cluster them based on semantic similarity. We chose to use the BERT model to calculate the semantic similarity between the different category titles. Once this score was calculated, we performed K-means clustering using the semantic similarity as a distance metric. We then calculated the center of each cluster and chose the category closest to this center as the ‘label’ for the cluster. The cluster centers, as seen above, include ‘Computer Accessories’, ‘Bed and Furniture’, ‘Battery Accessories’, etc. We chose to use 20 clusters, as seen in the above image.
We then analyzed the products corresponding to each of these clusters, considering the price, number of reviews, and rating. The image above shows the clusters as bubbles, where the size of each bubble is proportional to the number of reviews available. Perhaps unsurprisingly, we found that the ‘Computer Accessories’ cluster had the highest number of reviews, followed by ‘Mattresses’, representing bedroom furnishings.
As mentioned above, we created clusters of the different product categories. The visualization above plots the product category feature vectors projected onto a 2D plane using t-SNE. The colour of each point corresponds to the cluster it belongs to. By examining the interactive visualization, we can see that the items that fall into each cluster are described well by the cluster label. For example, the ‘Computers and Accessories’ cluster includes ‘HDMI Cables’, ‘VGA Cables’, ‘PC Games’, etc. Thus, the clusters that are formed also make sense when inspected manually.
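The clustering and labelling steps can be sketched as follows. This is an illustration under several assumptions rather than our actual code: hand-made 2D vectors stand in for the BERT embeddings, the vectors are scaled to unit length so that Euclidean distance ranks neighbours the same way cosine similarity does, and the initial centers are picked deterministically so the example is reproducible. The helper names and the toy vectors are hypothetical.

```python
from math import sqrt

def normalize(v):
    """Scale a vector to unit length."""
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=20):
    """Plain Lloyd's algorithm; returns (assignments, centroids)."""
    # Farthest-point initialisation (deterministic, for reproducibility).
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    assign = []
    for _ in range(iters):
        new_assign = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                      for v in vectors]
        if new_assign == assign:  # converged
            break
        assign = new_assign
        for j in range(k):
            members = [v for v, a in zip(vectors, assign) if a == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids

def label_clusters(names, vectors, assign, centroids):
    """Label each cluster with the member category closest to its centroid."""
    labels = {}
    for j, c in enumerate(centroids):
        members = [(sq_dist(v, c), n)
                   for n, v, a in zip(names, vectors, assign) if a == j]
        if members:
            labels[j] = min(members)[1]
    return labels

# Toy 2D "embeddings", invented for this example (real ones would come from BERT).
names = ["HDMI Cables", "VGA Cables", "Computer Accessories",
         "Mattresses", "Bed Frames", "Bed and Furniture"]
raw = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15],
       [0.1, 0.9], [0.2, 0.8], [0.15, 0.85]]
vectors = [normalize(v) for v in raw]

assign, centroids = kmeans(vectors, k=2)
labels = label_clusters(names, vectors, assign, centroids)
print(assign)
print(labels)
```

Picking an actual category as the label, rather than the raw center, keeps every cluster name human-readable, since the center itself is just a point in embedding space.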
Amazon vs Walmart
In the retail industry, Amazon has assuredly taken over the world by providing quick service, an abundance of products, and relatively cheap prices, so that people can find most of what their heart desires. To maximize its profits while remaining affordable to most individuals, Amazon offers one of the largest ranges of products at lower prices than most services. We therefore wanted to investigate how its product prices compare with those of other retailers, and decided to focus on Walmart.
The figures above display the popularity of product categories at Walmart and Amazon. The two overlap, with ‘Health and Household’, ‘Home’, and ‘Home Improvement’ among the more popular categories. Amazon has more items in its most popular category (Home and Kitchen), at 13,000, than Walmart, which has around 6,000 in its most popular category (Clothes and Jewelry).
Next, we wanted to compare the range of prices for similar categories of products in the two datasets and explore any overlaps or differences. The figures below show bar plots of price by category for the Walmart and Amazon datasets, with most price ranges falling between $10 and $300.
Comparing the two plots, we can see a clear difference in price ranges between the two retailers. In categories such as ‘Home’, ‘Clothes and Jewelry’, and ‘Home Improvement’, Walmart shows higher prices on average than Amazon. One possible explanation is that Amazon lowers its prices or sells less expensive items in order to appeal more to its target audience. Indeed, as we researched this topic further, we found studies showing that Amazon and Walmart compete directly with each other in e-commerce sales, with Amazon leading the retail race most of the time.
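A category-level comparison of this kind can be sketched as below. It is a minimal illustration that assumes each dataset has been reduced to (category, price) pairs and that category names have already been aligned between the two retailers; the `mean_price_by_category` helper and all prices are made up for this example. A median could be swapped in for the mean if a few very expensive items dominate a category.

```python
from collections import defaultdict
from statistics import mean

def mean_price_by_category(rows):
    """Average price per category from (category, price) pairs."""
    prices = defaultdict(list)
    for category, price in rows:
        prices[category].append(price)
    return {category: mean(values) for category, values in prices.items()}

# Hypothetical (category, price in USD) rows, invented for this example.
amazon_rows = [("Home", 20.0), ("Home", 30.0), ("Home Improvement", 40.0)]
walmart_rows = [("Home", 35.0), ("Home", 45.0),
                ("Home Improvement", 50.0), ("Toys", 15.0)]

amazon_means = mean_price_by_category(amazon_rows)
walmart_means = mean_price_by_category(walmart_rows)

# Compare only the categories present in both datasets.
shared = sorted(amazon_means.keys() & walmart_means.keys())
gaps = {c: walmart_means[c] - amazon_means[c] for c in shared}

for category in shared:
    print(f"{category}: Amazon {amazon_means[category]:.2f}, "
          f"Walmart {walmart_means[category]:.2f}, gap {gaps[category]:+.2f}")
```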
Conclusion
When in need of something, many people today search for it only on Amazon. From a consumer standpoint, this is a very valid choice because Amazon consistently offers an enormous range of products at prices lower than those of its competitors. From a business standpoint, Amazon dominates the retail market. Local and small businesses have suffered greatly, especially during the pandemic, and continue to lose customers. As climate change becomes a more pressing issue, Amazon’s unsustainable operations may soon come into the spotlight and deter some consumers from buying from Amazon any more. However, it is hard to predict whether that will happen, and until then Amazon will continue to be the strongest player in this industry.
Our team hopes that this article sheds some light on consumer buying and e-commerce trends and helps us all become better-informed buyers.
The link to our GitHub repository can be found here.