Analysis of Yelp Ratings and Health Inspection Grades of Restaurants in Los Angeles
Authors: Paige Lee (Project Lead), Damien Ha, Kelsey Lin, Ayaan Raina, Tommy Tan, Caleb Tran
Introduction
Over the past decade, Los Angeles has consistently ranked within the top 10 best food cities in America, according to The Washington Post, U.S. News & World Report, Food & Wine, and more. We wanted to investigate success metrics of LA restaurants that focus on the customer experience rather than on business productivity. Specifically, we selected customer ratings of restaurants from Yelp (1 to 5 stars) and health inspection grades (A to C) that are displayed on restaurant storefronts.
Our dataset focuses on 307 restaurants in LA. These are the restaurants that appear in both the Yelp dataset and the health inspection dataset that we used. The Yelp dataset contains customer ratings through 2019, while the health inspection dataset covers 2015 to 2018. Our merged dataset contains a total of 2,738 observations because many restaurants were inspected for compliance with food safety regulations multiple times during this timeframe.
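A minimal sketch of this kind of merge is shown here, as an illustration rather than the actual pipeline behind this article. The field names, the choice of restaurant name as the join key, and all rows are invented for the example; the real datasets may use different columns and a different matching rule.

```python
# Toy rows with hypothetical field names; these are NOT taken from the real data.
yelp = [
    {"name": "Cafe A", "stars": 4.0, "review_count": 120},
    {"name": "Diner B", "stars": 3.5, "review_count": 45},
]
inspections = [
    {"name": "Cafe A", "date": "2016-03-01", "score": 95, "grade": "A"},
    {"name": "Cafe A", "date": "2017-04-12", "score": 88, "grade": "B"},
    {"name": "Taqueria C", "date": "2016-07-20", "score": 92, "grade": "A"},
]


def merge_on_name(yelp_rows, inspection_rows):
    """Inner join: one output row per inspection of a restaurant found in both inputs."""
    yelp_by_name = {row["name"]: row for row in yelp_rows}
    merged_rows = []
    for inspection in inspection_rows:
        match = yelp_by_name.get(inspection["name"])
        if match is not None:
            # Combine the Yelp fields with the fields of this single inspection.
            merged_rows.append({**match, **inspection})
    return merged_rows


merged = merge_on_name(yelp, inspections)
restaurants = {row["name"] for row in merged}
print(len(restaurants), "restaurant(s),", len(merged), "observation(s)")
```

Because the join keeps one row per inspection, a restaurant inspected several times contributes several observations, which is why the number of observations can be much larger than the number of restaurants.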
Our article is divided into two main sections. The first section contains various statistical analyses and data visualizations in the form of scatterplots, line graphs, and bar plots. The second section analyzes the data through a geographic lens using maps.
1. Statistical analyses and data visualizations
1.1 What is the relationship between the number of customer reviews from Yelp (popularity) and health inspection grade?
The scatter plot illustrates the relationship between the number of reviews and restaurant health grade. The line of best fit has a slight negative slope, which suggests a weak negative correlation between these variables. However, it is important to note that review scores above 90 have been excluded from the analysis because their overwhelming presence masked any discernible correlation. The negative slope may reflect consistency issues, since a higher review count could indicate a busier restaurant that finds it harder to maintain quality and adhere to health standards. Additionally, increased visibility could lead to more diverse reviews, potentially including negative experiences that contribute to a downward trend in the health grade.
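For readers unfamiliar with how a line of best fit is obtained, the sketch below computes an ordinary least-squares line by hand. It is an illustration only: the function name and the three data points are made up, and it does not reproduce the exclusion step described above or the tool actually used to draw the plot.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical points: (number of reviews, health inspection score).
review_counts = [100, 200, 300]
health_scores = [96, 94, 92]

slope, intercept = least_squares_line(review_counts, health_scores)
print(slope, intercept)
```

A negative slope, as in this toy example, means the fitted health score falls as the number of reviews rises. The slope alone says nothing about how tightly the points cluster around the line.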
1.2 What is the relationship between health inspection ratings and customer ratings from Yelp?
Next, we examined the correlation between average health inspection ratings and customer ratings in restaurants in Los Angeles County. The x-axis represents average health inspection ratings out of 100, while the y-axis represents average overall customer ratings. The trend line suggests a weak positive correlation between the two variables: the equation of the line is 0.0220275 * Avg Score + 2.04179, where Avg Score is the average health inspection rating. As health inspection ratings increase, there is a slight tendency for overall ratings to also increase; a 10-point increase in Avg Score corresponds to only about 0.22 more stars. This implies that higher health inspection ratings are associated with better overall ratings from customers or reviewers. While the correlation is weak, it emphasizes the importance of maintaining good health inspection ratings. Adherence to health regulations and cleanliness standards can positively impact the dining experience and customer satisfaction. We must keep in mind that other factors, such as food quality, service, and location, can also influence overall ratings. However, the observed positive trend suggests that health inspection ratings may contribute to shaping the overall perception and satisfaction of customers.
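To make the size of this effect concrete, the short sketch below plugs a few health scores into the trend line reported above. The slope and intercept come from the text; the function name and the sample scores of 80, 90, and 100 are chosen here purely for illustration.

```python
# Coefficients of the trend line as reported in the text.
SLOPE = 0.0220275
INTERCEPT = 2.04179


def predicted_rating(avg_score):
    """Customer rating that the trend line predicts for a given average health score."""
    return SLOPE * avg_score + INTERCEPT


# Illustrative health scores (not specific restaurants).
for score in (80, 90, 100):
    print(score, round(predicted_rating(score), 2))
```

Under this line, moving from a score of 90 to a score of 100 raises the predicted rating from roughly 4.02 to roughly 4.24 stars, which shows how shallow the slope is.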
1.3 How do the average health inspection score and the average Yelp rating vary over time?
These two line graphs explore whether the health inspection ratings and the overall customer ratings of LA restaurants have changed over time. In the graph of average health score against inspection date, there is an overall increasing trend in scores from June 2015 to June 2018. This may suggest either that health inspectors have become less strict in their grading or that restaurants in LA have adhered more closely to health inspection criteria in recent years. On the other hand, the graph of average overall customer rating against time shows no clear trend. We assume that the peaks throughout the timeframe are due to random fluctuations, and that average customer ratings of restaurants do not depend on time.
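A time series like the first graph can be produced by averaging inspection scores within each period. The sketch below groups by calendar month, which is an assumption made here; the article does not state the aggregation interval, and the records are invented for the example.

```python
from collections import defaultdict
from statistics import mean


def monthly_average(records):
    """Average score per calendar month, given (ISO date string, score) pairs."""
    by_month = defaultdict(list)
    for date, score in records:
        # The first 7 characters of 'YYYY-MM-DD' identify the month.
        by_month[date[:7]].append(score)
    return {month: mean(scores) for month, scores in sorted(by_month.items())}


# Hypothetical inspection records, not taken from the real dataset.
records = [
    ("2015-06-03", 90),
    ("2015-06-20", 94),
    ("2018-06-11", 96),
]
averages = monthly_average(records)
print(averages)
```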
1.4 How do average and maximum Yelp ratings vary across major cuisines?
The above bar chart explores the average and maximum customer ratings for the top 5 most popular cuisines in Los Angeles from our dataset. Looking first at the average customer ratings within each category, we can see that Asian cuisine received an average rating of 4.033, followed by Mediterranean (4.0769), American/Contemporary (4.103), Mexican (4.107), with Italian scoring the highest (4.175). While the variation in average ratings is subtle, we can obtain deeper insights into the range of customer experiences by also comparing the maximum rating for each cuisine.
American/Contemporary, Italian, Mexican, and Mediterranean cuisines all received 4.5 as their highest customer rating, but an interesting pattern emerged within Asian cuisine. While Asian cuisine received the lowest average customer rating, it also received the highest maximum customer rating of 5.00, indicating that customer experiences within Asian cuisine vary widely. Individual experiences and opinions play a crucial role in shaping ratings, and this suggests that Asian cuisine offers a very diverse range of dining options within the Los Angeles area. One possible explanation is that a greater number and variety of Asian restaurants are represented in our dataset compared to other cuisines. A larger sample is more likely to include ratings at both extremes, from quite exceptional to subpar.
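The comparison above relies on three numbers per cuisine: the average rating, the maximum rating, and the number of restaurants. A minimal sketch of that summary follows; the function name and the rows are hypothetical and are not the values behind the chart.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_cuisine(rows):
    """Return {cuisine: (average rating, maximum rating, number of restaurants)}."""
    ratings = defaultdict(list)
    for cuisine, stars in rows:
        ratings[cuisine].append(stars)
    return {c: (mean(v), max(v), len(v)) for c, v in ratings.items()}


# Made-up (cuisine, Yelp stars) pairs for illustration only.
rows = [
    ("Asian", 5.0),
    ("Asian", 3.0),
    ("Asian", 4.0),
    ("Italian", 4.5),
    ("Italian", 4.0),
]
summary = summarize_by_cuisine(rows)
for cuisine, (average, highest, count) in summary.items():
    print(cuisine, average, highest, count)
```

Reporting the count alongside the average and maximum makes it easier to judge whether a wide range of ratings simply reflects a cuisine with more restaurants in the data.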
2. Maps
2.1 What does the geographic distribution of average health inspection scores in LA look like?
For our first map, we explored the relationship between location and average health inspection scores for restaurants in Los Angeles County, aiming to answer whether a correlation exists between health ratings and the cleanliness of the neighborhood or city. In the above visualization, darker blue circles indicate higher average health scores. The map reveals a scattered distribution of health scores across the city, indicating no concrete relationship with location. However, one possible cluster appears near downtown Los Angeles, where darker shades suggest higher health scores among restaurants in that region. This concentration implies a potential clustering of reputable restaurants with excellent health standards. The trend may be influenced by factors such as established dining establishments and a focus on healthy dining practices.
2.2 What does the geographic distribution of average Yelp ratings in LA look like, by individual restaurant and by zip code?
The above two maps examine the distribution of 1–5 star ratings across LA County. The top map has each zip code colored by rating, while the bottom map has every individual restaurant plotted, again colored by rating. From the zip code map (top), it appears that average ratings of restaurants across LA County are fairly consistent overall, with few variations across the county. Most of the zip codes are a very similar color (a medium dark blue), suggesting that their average ratings are fairly close to each other. Specifically, the color seems to indicate moderately positive average ratings, around 3.5–4 stars. That said, there is a curiously dark zip code right next to a curiously light zip code in the southeast region of the zip code map (top). Upon inspection of the individual restaurants map (bottom), this may simply be due to a low number of reviewed restaurants in the data for this region, so that only a few restaurants or reviews determine the whole zip code's rating. Also, there is more variation in ratings between individual restaurants than between the average ratings of zip codes. There are lighter dots scattered amongst the medium and darker ones, which represent lower rated restaurants. Based on this observation, it would seem that higher rated restaurants must boost the average of each zip code, considering that most zip codes are a similar shade.
2.3 How does the geographic distribution of health inspection grades in LA relate to restaurant popularity (number of Yelp reviews)?
The above two maps explore the relationship between the location, health inspection grades, and number of Yelp reviews for restaurants in LA County. The maps show the distribution of restaurant locations, colored by their health inspection grades, and represented by dots that correspond to the number of Yelp reviews each restaurant received. The first map shows the distribution of all health grades (A, B, C), while the second map shows the distribution of only health grades B and C. We created two maps because most restaurants in the dataset were inspected multiple times between 2015 and 2018. In the first map, almost none of the B or C ratings are visible, meaning that almost all of the restaurants that received B or C ratings at some point also received at least one A rating. The second map shows only those restaurants that received at least one B or C rating.
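The filter behind the second map can be sketched as follows: count each restaurant's grades across all of its inspections, then keep only the restaurants with at least one B or C. The function names and the inspection rows are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter, defaultdict


def grade_counts(inspections):
    """Count how many times each restaurant received each grade."""
    counts = defaultdict(Counter)
    for name, grade in inspections:
        counts[name][grade] += 1
    return counts


def with_low_grade(counts):
    """Keep only restaurants that received at least one B or C."""
    return {name: c for name, c in counts.items() if c["B"] + c["C"] > 0}


# Made-up (restaurant, grade) pairs, one per inspection.
inspections = [
    ("Restaurant X", "A"),
    ("Restaurant X", "B"),
    ("Restaurant X", "A"),
    ("Restaurant Y", "A"),
    ("Restaurant Y", "A"),
]
counts = grade_counts(inspections)
flagged = with_low_grade(counts)
print(flagged)
```

In this toy example, Restaurant X is kept even though most of its grades are As, which mirrors why the same restaurant can appear on both maps.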
Based on the first map, we observed that most restaurants that received at least one A rating from 2015 to 2018 are located in West Hollywood and Downtown LA. Further, the restaurants that received the greatest numbers of Yelp reviews (the most popular) tended to be located in Downtown LA. Based on the second map, it seems that most restaurants that received B or C ratings are also located in West Hollywood and Downtown LA, with several in Santa Monica as well. Most of the restaurants that received B or C ratings had at least 100 Yelp reviews as of 2019, so these poorly rated restaurants still had decent popularity on Yelp.
Next, we investigated the restaurants that stood out most on the second map. We examined a few specific restaurants that received poor health inspection grades in addition to large numbers of Yelp reviews. One such restaurant was a fast food restaurant called The Oinkster in Eagle Rock, CA. According to our data, The Oinkster had 3,648 Yelp reviews and a 4 star rating as of 2019. From 2015 to 2018, the restaurant received 18 health inspections, two of which resulted in Bs and the rest in As. Another restaurant that stood out on the second map is a Korean BBQ restaurant called Road to Seoul in Harvard Heights. Road to Seoul had 3,740 Yelp reviews and a 4 star rating as of 2019. From 2015 to 2018, the restaurant received eight health inspections, one of which resulted in a B in 2015 and the rest in As. The third restaurant we examined was Tatsu Ramen in West Hollywood. As of 2019, this Tatsu Ramen location had 3,625 Yelp reviews and a 4 star rating. From 2015 to 2018, the restaurant received 18 health inspections, of which one resulted in a C, four in Bs, and the rest in As. For both The Oinkster and Tatsu Ramen, the poor health inspection grades did not seem to be related to time: the few poor ratings were flanked by A ratings. From this analysis, we can infer that even restaurants that received a poor health inspection grade at some point may still receive mostly As, high Yelp ratings, and high popularity as reflected by large numbers of Yelp reviews.