Stranger Than Fiction

Driven by recent insights into my personal reading habits, I was eager to look more broadly into national habits. I headed to the New York Times’ bestseller lists for answers, recognizing the power those lists hold in shaping American reading. What drives the country’s book purchases? Do certain publishers hold the keys to reaching wide audiences? Do the identities of the authors mirror us as a society?

The answers to those questions seem relevant to both those who revere the list and those who revile it. A deeper understanding of the publishers and authors behind those top-selling tomes may help us all become more discerning users of the lists, putting our buying power behind books and presses that reflect our values and perhaps borrowing, rather than buying, reads from those we would prefer not to have that sway.

The data set I found, one posted on Kaggle and pulled from the Times’ API, focused on the hardcover fiction bestsellers from 2008 to 2018, a full decade’s worth of titles. The first chart depicts the total number of weeks in which each publisher held bestselling spots. In addition, each publisher’s bar is segmented by the total weeks for each of its bestselling authors. The visualization looks hopeful: aside from a few big names and a few lead publishing houses, it appears that a wide variety of writers and companies get a shot.

Clicking on the PUBLISHERS link provides a deeper look, one where those publishing houses or imprints are grouped by the parent companies that own them. (I gathered this data through online searches, corroborating Google and Wikipedia information with company pages.) This bar graph tells a very different story: Penguin Random House owns 50 of the imprints, as many as its next two competitors combined. The list shrinks from a seeming expanse of over 150 companies to the mere 18 parent companies that own them. The bar chart below that then layers the number of weeks each of those owned companies contributes to its parent company’s total, with Penguin Random House dominating the field. It accounts for over half of the decade’s 10,400 slots (20 slots for each of the 52 weeks per year, over 10 years), outstripping not just its two closest competitors, but all of its competition combined.
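The roll-up described above, from imprints to parent companies, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the imprint names, parent names, and week counts below are hypothetical placeholders, not values from the data set, and the function names are invented for this example. Only the slot arithmetic (20 x 52 x 10) comes from the text.

```python
from collections import defaultdict

# Slot arithmetic from the text: 20 list positions x 52 weeks x 10 years.
SLOTS_PER_WEEK = 20
WEEKS_PER_YEAR = 52
YEARS = 10
TOTAL_SLOTS = SLOTS_PER_WEEK * WEEKS_PER_YEAR * YEARS  # 10,400


def weeks_by_parent(imprint_weeks, parent_of):
    """Sum each imprint's weeks on the list under its parent company.

    An imprint with no known parent is counted under its own name.
    """
    totals = defaultdict(int)
    for imprint, weeks in imprint_weeks.items():
        totals[parent_of.get(imprint, imprint)] += weeks
    return dict(totals)


def share_of_slots(weeks, total_slots=TOTAL_SLOTS):
    """Fraction of all available list slots that a week count represents."""
    return weeks / total_slots


# Hypothetical placeholder data, for illustration only.
imprint_weeks = {"Imprint A": 30, "Imprint B": 20, "Imprint C": 10}
parent_of = {
    "Imprint A": "Parent X",
    "Imprint B": "Parent X",
    "Imprint C": "Parent Y",
}

totals = weeks_by_parent(imprint_weeks, parent_of)
for parent, weeks in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{parent}: {weeks} weeks, {share_of_slots(weeks):.2%} of all slots")
```

Grouping before charting is what turns the long, reassuring list of imprints into the short list of owners; the same totals divided by 10,400 give each parent company’s share of the decade.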

Heading back to the “Times” page and choosing AUTHORS provides an exploration of the writers behind these books by name, gender, race, and nationality. This data set was, again, collected by hand, corroborating Wikipedia pages with author and publisher pages. The original data set was also “righted” by attributing all books written by well-known authors, such as Clive Cussler or Anne Rice, to the famed name, regardless of whether they had co-writers or wrote under pen names.

Again, the first look at the data seems hopeful, with an array of authors making the list. Yes, some big names such as James Patterson or John Grisham get more space than others, but so do some individual books such as Anthony Doerr’s All the Light We Cannot See and Kathryn Stockett’s The Help. A look by gender seems fairly equitable as well, as men account for just over half of the celebrated titles. A look by race is fairly shocking: nearly all of the books are written by white writers. Noting the publishers (included in the tool tips that appear when the cursor is on any one data point) reveals one upside to the conglomerations behind the list: the big houses are usually the ones supporting the writers of color. Finally, a view by nationality suggests that we read mostly American authors and, beyond that, authors from English-speaking countries such as England, Ireland, Canada, and Australia. Countries like Sweden and Chile are represented largely thanks to the popularity of a single author each, Stieg Larsson and Isabel Allende respectively.

Back on the “Times” page is a link to one more graph: the FINAL THOUGHT. Coming to this exploration, I was most eager to see if our bestselling authors mirrored our society. The data start in 2008, when Barack Obama was elected our first black president, so this final waffle grid compares the percentage of black male authors on the list over the decade with the percentage of black men in American society. The disparity is significant, with less than 1% on the list despite a 6% share of the citizenry.

I designed the Tableau pages to mimic the look of the bestseller lists themselves, using Times New Roman for the prominent title and gray subtitle, all-caps Arial text for the “genres” of the graphs, and thin gray and black lines as delimiters.

For the graphs, I let the data drive the design choices. The three stacked, horizontal bar charts clearly show Penguin Random House spatially surpassing the others in the race to prominence, unbeatable by the final, most accurate calculation. The tree charts used to investigate the writers’ identities both allow a view by proportion and promote a deeper dive into individual titles and authors. (With the granularity of titles, the tree charts also look a bit like compartmental bookshelves.) Those four looks (name, gender, race, and nationality) together on one page afford a bird’s-eye look at diversity as well, giving the viewer a chance to make predictions based on their expectations and then suss out the details. Finally, I chose a waffle grid for the FINAL THOUGHT not only to show comparative proportion, but to humanize the data. Every author is, of course, a person, both an individual and a set of identity markers, and a 15 x 15 waffle grid seems to depict the absolute isolation a black, male author must feel in this company. If the list better mirrored the population, each black, male author would have another 12 to join him, but he does not.
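As a rough check of the arithmetic behind the waffle grid, here is a minimal Python sketch. It rests on two assumptions that the text does not spell out: that a share is converted to cells by rounding down, and that the list’s share of less than 1% is drawn as a single cell (inferred from the phrase “another 12 to join him”). The function and variable names are invented for this example.

```python
GRID_ROWS = 15
GRID_COLS = 15
TOTAL_CELLS = GRID_ROWS * GRID_COLS  # 225 cells in a 15 x 15 waffle grid


def cells_for_percent(percent, total_cells=TOTAL_CELLS):
    """Whole cells representing a whole-number percentage, rounded down.

    Rounding down is an assumption; the essay does not state its rounding rule.
    """
    return percent * total_cells // 100


# About 6% of the population, per the text: 6% of 225 is 13.5, so 13 cells.
population_cells = cells_for_percent(6)

# Assumption: the "less than one percent" list share is shown as one cell.
list_cells = 1

missing = population_cells - list_cells
print(f"Grid cells: {TOTAL_CELLS}")
print(f"Cells if the list mirrored the population: {population_cells}")
print(f"Additional authors needed: {missing}")  # 12
```

With 225 cells, each square stands for a little under half a percent, which is why a share below 1% leaves a single square nearly alone on the grid.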

This glimpse calls for further investigation within the set itself, across other Times lists, and outside of the Times. Within the set, I’d like to consider time as a factor. Are more diverse authors an upward trend even within their small representation? I’d also like to look at paperback and audio sales over the same time period, wondering to what degree the form of a text factors into who buys what. Perhaps more suburban readers choose heavy hardbacks, as opposed to urban commuters (like me) who prefer paperbacks. The paperback list might better represent a wider variety of authors, such as those who were discovered years after their books first came out, perhaps by winning a literary prize or by being named an Oprah Winfrey book club selection. Of course, the Times lists are also just one marker of American readership, so I’d also like to look into other sources, including lists not based on sales (such as the Times’ or NPR’s best books of each year) and library records.