What’s in a Good Read: Why Books do Well on Goodreads and the NYT Bestsellers List
Authors: Nikhil Dewitt (Lead), Peter Wang, Angelina Yang, Shaina Grover
What’s in a good book? It’s a question readers ask again and again. For most people the answer is subjective: some prefer fantasy novels, while others prefer intense thrillers. Still, by looking at reviews on Goodreads alongside data on New York Times Bestsellers, we can get a sense of the factors that make books popular with both casual readers and hardcore analysts.
What are Good Reviews Made Up Of?
This gives a sense of what reviews on Goodreads look like. Words that appear often in highly rated reviews are shown in blue, while words from lower-rated reviews are shown in orange. You can see how positive reviewers often write as though their books are close to perfect, while less positive reviews tend to be less aggressively negative and more nuanced, using words like “actually” to suggest that while the book was fine it wasn’t “perfect”, or words like “first” to indicate dislike of a fresh idea. While this visualization gives a general gist of which words appear most often in good and bad reviews, we must dive deeper into more complex analysis to predict a potential NYT Bestseller.
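As a rough sketch of how such a comparison could be produced (the post doesn’t specify the exact tooling), one could count word frequencies separately for high- and low-rated reviews. The toy DataFrame and column names below are made up for illustration:

```python
import re
from collections import Counter

import pandas as pd

# Toy stand-in for the Goodreads review data; the column names are assumptions.
df = pd.DataFrame({
    "rating": [5, 5, 2, 1],
    "review_text": [
        "A perfect, beautiful book that I loved",
        "Perfect pacing and a wonderful ending",
        "It was actually fine, but the first half dragged",
        "Boring and predictable, not for me",
    ],
})

STOPWORDS = {"a", "the", "and", "it", "was", "i", "that", "for", "not", "but", "me", "of"}

def top_words(reviews, n=10):
    """Count the most common non-stopword tokens across a collection of reviews."""
    counts = Counter()
    for text in reviews:
        counts.update(w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS)
    return counts.most_common(n)

print(top_words(df.loc[df["rating"] >= 4, "review_text"]))  # high-rating vocabulary (blue)
print(top_words(df.loc[df["rating"] <= 2, "review_text"]))  # low-rating vocabulary (orange)
```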
Star Ratings and Written Reviews
Book reviews often have two components: a star rating and a written review. A star rating is a visual system where readers rate a book using a set number of stars (usually 1–5), providing a quick summary of their satisfaction with the reading experience. A written review is a more detailed text explanation of the reasons behind the rating, usually offering deeper insight into the reader’s experience with the book. The quick and easy nature of star ratings tends to reflect more automatic, intuitive thinking, while the longer and more detailed nature of written reviews generally involves a more deliberate and logical assessment of the experience. It is therefore common practice to pair a star rating with a written review, as it helps potential readers understand the “why” behind a rating before they commit to a book.
However, the more deliberate nature of written reviews means they can end up misaligned with their star ratings. When that happens, the review communicates a complex, conflicting evaluation of a book, and some readers will avoid a book surrounded by such uncertain sentiment. So, how often do star ratings and their written reviews really align in overall sentiment?
To explore this question, we construct a confusion matrix mapping the predicted sentiment of written reviews to the true sentiment of star ratings. As a method of comparing predicted values to actual values, the confusion matrix helps identify agreements and discrepancies between written reviews and their star ratings. Within the polarity range [-1, 1], we labeled reviews with polarity greater than 0 as positive and less than 0 as negative, while labeling star ratings of 4 or higher as positive, 2 or lower as negative, and everything else as neutral.
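As a sketch of that labeling scheme, assuming a TextBlob-style polarity score in [-1, 1] (the write-up doesn’t name the sentiment tool):

```python
from textblob import TextBlob  # assumed sentiment library; its polarity scores fall in [-1, 1]

def predicted_sentiment(review_text: str):
    """Label a written review by the sign of its polarity score."""
    polarity = TextBlob(review_text).sentiment.polarity
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return None  # a polarity of exactly 0 gets no label under this scheme

def true_sentiment(stars: int) -> str:
    """Label a star rating: 4-5 positive, 1-2 negative, 3 neutral."""
    if stars >= 4:
        return "positive"
    if stars <= 2:
        return "negative"
    return "neutral"

# Example: a gushing review paired with a 5-star rating lands in the
# (predicted positive, true positive) cell of the confusion matrix.
print(predicted_sentiment("An absolutely wonderful, gripping read"), true_sentiment(5))
```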
The confusion matrix shows that 91% of reviews attached to truly positive star ratings are also predicted as positive from their written text, so readers can be mostly confident in the trustworthiness of 4–5 star ratings. Readers may also find something favorable in 3-star ratings, which often come from readers who enjoyed the book but would not pick it up again. Finally, it is worth keeping in mind that a slight majority of negative star ratings come with written reviews that read as more constructive than disagreeable: the positive predicted sentiment of these reviews may come from critics who did not enjoy the book but still want to contribute positively to the author’s growth in their next work.
Predicting Whether Books are Bestsellers
The next step was to predict whether a book was an NYT Bestseller based on its associated Goodreads data. The number of pages, number of ratings, average rating, and number of reviews were all included as features, with the goal of predicting whether each book was a bestseller. A logistic regression model from scikit-learn was used to make the predictions. After initial results showed that the vast majority of bestsellers were being labeled as non-bestsellers, standard scaling was applied to the features so that, even though bestsellers make up a small portion of the dataset, the model could predict them more accurately.
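A minimal sketch of that setup with scikit-learn might look like the following; the synthetic data, the column names, and the use of class_weight="balanced" to handle the imbalance are my assumptions rather than details confirmed by the analysis:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the joined Goodreads/NYT data.
rng = np.random.default_rng(0)
n = 2000
books = pd.DataFrame({
    "num_pages": rng.integers(100, 900, n),
    "ratings_count": rng.integers(10, 2_000_000, n),
    "average_rating": rng.uniform(2.5, 4.8, n),
    "text_reviews_count": rng.integers(1, 100_000, n),
})
books["is_bestseller"] = (rng.random(n) < 0.03).astype(int)  # ~3% bestsellers, mirroring the imbalance

X = books.drop(columns="is_bestseller")
y = books["is_bestseller"]
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# StandardScaler puts features with very different ranges on a comparable scale;
# class_weight="balanced" (an assumption) counteracts the skew toward non-bestsellers.
model = make_pipeline(StandardScaler(), LogisticRegression(class_weight="balanced", max_iter=1000))
model.fit(X_train, y_train)
print(f"held-out accuracy: {model.score(X_test, y_test):.3f}")
```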
The following confusion matrix shows how the model performed on both classes: it correctly classified 88.77% of non-bestsellers as non-bestsellers and 63.74% of bestsellers as bestsellers. It’s important to note that the dataset contained 37,422 non-bestsellers versus only 1,284 bestsellers, and even with standard scaling, the bestsellers remained difficult to predict accurately.
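Percentages like these correspond to a row-normalized confusion matrix, where each row is divided by the number of true examples of that class. Continuing the sketch from the previous section:

```python
from sklearn.metrics import confusion_matrix

y_pred = model.predict(X_test)
# normalize="true" divides each row by the count of actual examples in that class,
# so the diagonal reads as per-class recall (about 0.89 and 0.64 in the post's model).
print(confusion_matrix(y_test, y_pred, normalize="true"))
```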
Important Features in Reviews
Further evaluation showed that some features mattered far more than others. Looking directly at the regression coefficients would be misleading, since features like the number of pages vary on a much larger scale than others, so another technique was needed. Permutation importance analysis involves randomly shuffling the values of one feature at a time and measuring how much that shuffling degrades the model’s performance. This test was used to determine the true importance of each feature to the predictive model.
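Continuing the sketch above, scikit-learn’s permutation_importance shuffles one feature at a time on held-out data and records how much the score drops:

```python
from sklearn.inspection import permutation_importance

# Shuffle each feature column several times and measure the average drop in test score;
# a large drop means the model relies heavily on that feature.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for name, drop in sorted(zip(X_test.columns, result.importances_mean), key=lambda pair: -pair[1]):
    print(f"{name}: {drop:.4f}")
```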
Per the permutation test, the number of reviews was the most significant variable in predicting whether a book was a bestseller. This makes sense: most books named bestsellers have already gained a significant following in the literary community, with many enthusiasts lining up to weigh in. The number of pages was insignificant, likely because page counts were fairly similar across the books in the dataset.
Breaking Down the NYT Bestsellers List by Category
Many people share the belief that almost every book seems to be a New York Times Bestseller. Let’s break that down: the reason we see so many NYT Bestsellers is that a new list comes out every week, with a different number of slots allocated to each category. That sheer volume is why we need to look at what makes a book stay on the bestseller list, and why it persists compared to other books.
From the visualization above, we see that series books and paperback advice books stay on the list for the largest number of weeks, averaging more than 140 weeks per book in both categories. This can be interpreted in several ways. For one, it is a clear indication of which categories are the most popular and in the highest demand among readers, showing what types of books people prefer and what sells best. And, while it may be unexpected, the format of the book does matter: paperbacks tend to stay on the NYT Bestsellers list significantly longer than hardcovers, which is often a result of their lower cost.
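The per-category averages behind a chart like this could be computed with a simple group-by; the toy data and the column names “list_name” and “weeks_on_list” are assumptions:

```python
import pandas as pd

# Toy stand-in for the NYT bestseller data, with one row per book.
nyt = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C", "Book D"],
    "list_name": ["Series Books", "Paperback Advice", "Hardcover Fiction", "Series Books"],
    "weeks_on_list": [210, 155, 12, 95],
})

# Average number of weeks a book stays on the list, by category, longest-lasting first.
avg_weeks = nyt.groupby("list_name")["weeks_on_list"].mean().sort_values(ascending=False)
print(avg_weeks)
```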
Summary
While classifiers can provide some insight, the question of what makes a book a good read remains complicated. Metrics like Goodreads ratings and a spot on the NYT Bestsellers list can highlight how good a book looks in the eyes of hardcore literary enthusiasts, but the reality is that a book’s value is subjective. Some people want literature they can relate to; others want intricate fantasy novels far removed from their actual lives. Still, understanding these metrics is valuable, especially for authors. Producing “successful” books is about more than appealing to the most hardcore of literary enthusiasts; it’s about earning recognition that can boost one’s ability to land publishing deals for future novels. So while casual readers shouldn’t judge a book by these metrics, classifiers and charts like the ones outlined here can guide authors who need that final push toward producing works that earn the recognition they deserve.