The rise and fall of biodiversity in literature: a comprehensive quantification of historical changes in the use of vernacular labels for biological taxa in Western creative literature.
Nature's non-material contributions to people are difficult to quantify, and one aspect in particular, nature's contributions to communication (NCC), has so far been neglected. Recent advances in automated language processing tools enable us to quantify diversity patterns underlying the distribution of plant and animal taxon labels in creative literature, which we term BiL (biodiversity in literature). We assume that BiL provides a proxy for people's openness to nature's non-material contributions, thereby enhancing our understanding of NCC. We assembled a comprehensive list of 240,000 English biological taxon labels. We pre-processed and searched a subcorpus of digitised literature on Project Gutenberg for these labels. We quantified changes in biodiversity indices commonly used in ecological studies for 16,000 books, encompassing 4,000 authors, as proxies for BiL between 1705 and 1969. We observed hump-shaped patterns for taxon label richness, abundance and Shannon diversity, indicating a peak of BiL in the middle of the 19th century. The same holds for the ratio of biological to general lexical richness. The variation in label use between different sections within books, quantified as β-diversity, declined until the 1830s and recovered little thereafter, indicating a less specialised use of taxon labels over time. The initial rise corroborates our hypothesis that BiL may have increased before the onset of industrialisation, reflecting several concomitant influences such as the general broadening of literary content, improved education and possibly an intensified awareness of the incipient loss of biodiversity during the Romantic period.
Given that these positive trends continued and that we find no support for alternative processes reducing BiL, such as language streamlining, we suggest that the pronounced trend reversal and subsequent decline of BiL over more than 100 years may be the consequence of humans' increasing alienation from nature owing to major societal changes in the wake of industrialisation. We conclude that our computational approach of analysing literary communication using biodiversity indices has high potential for understanding aspects of the non-material contributions of biodiversity to people. Our approach can be applied to other corpora and would benefit from additional metadata on taxa, works and authors.
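
To make the indices named above concrete, the following minimal Python sketch computes taxon label richness, abundance and Shannon diversity for one book, and a β-diversity across its sections. It is an illustration under stated assumptions, not the pipeline used in the study: the function names and the toy label set are hypothetical, labels are matched as single lower-case tokens (multi-word labels are ignored), and β-diversity is taken here as Whittaker's multiplicative form (book-level richness divided by mean section richness), which may differ from the measure applied to the corpus.

```python
import math
from collections import Counter


def taxon_counts(tokens, labels):
    """Count occurrences of known taxon labels among single-word tokens."""
    return Counter(t for t in tokens if t in labels)


def richness(counts):
    """Number of distinct taxon labels used."""
    return len(counts)


def abundance(counts):
    """Total number of taxon label occurrences."""
    return sum(counts.values())


def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over label frequencies."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def whittaker_beta(section_counts):
    """Assumed beta-diversity: whole-book richness / mean section richness."""
    if not section_counts:
        return 0.0
    gamma = len(set().union(*section_counts))
    mean_alpha = sum(len(c) for c in section_counts) / len(section_counts)
    return gamma / mean_alpha if mean_alpha > 0 else 0.0


# Toy data (hypothetical): a four-label list and a two-section "book".
labels = {"oak", "rose", "fox", "owl"}
sections = [
    ["the", "oak", "and", "the", "fox", "oak"],
    ["an", "owl", "near", "a", "rose"],
]

per_section = [taxon_counts(s, labels) for s in sections]
book = sum(per_section, Counter())  # pooled counts for the whole book

print(richness(book), abundance(book))  # 4 distinct labels, 5 occurrences
print(round(shannon(book), 3))
print(whittaker_beta(per_section))
```

In this toy case the two sections share no labels, so the β-diversity equals the number of sections; sections that reuse the same labels would push the value towards 1.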