Comparison of two machine learning frameworks for predicting aggregatory behaviour of sharks.
Abstract
Ecological monitoring is critical for conservation efforts, particularly in marine ecosystems. Recently, greater emphasis on analysing data from long-term ecological research has allowed broader questions about climate change, species range shifts and reproductive phenology to be assessed, which may improve ecosystem management. Data collected via ecological monitoring, however, often feature strong class imbalances, complicating the development of models to predict rare events. Here, we propose two modelling frameworks, a boosted regression tree (BRT) model and an artificial neural network (ANN), for predicting exceptionally rare aggregatory behaviour of bull (Carcharhinus leucas) and blacktip (C. limbatus) sharks along the Gulf coast of Texas. In tandem with aggressive techniques for handling zero-inflated data, both methods produced accurate predictions of aggregations of three or more individuals in one survey event, with the BRT outperforming the ANN in minimizing type I error. Additionally, both models maintained relatively high area under the receiver operating characteristic curve (ROC AUC) values when the threshold for defining aggregative behaviour was raised, though there was a measurable drop-off in the precision-recall (PR) AUC at each successive threshold increase. These results support both modelling approaches as highly viable for generating predictions from monitoring data, even in situations where negative cases outweigh positive cases by more than 10-fold. This is promising for the conservation and management of species that exhibit biologically and ecologically significant, but rare, behaviours such as aggregations, and of species that are rare in abundance and thus vulnerable to future declines. More accurately predicting aggregation events provides the information necessary to improve the protection of species that gather during key life-history events (e.g. mating, parturition, migration) and to assess the spatiotemporal consistency of such events, thereby improving the efficacy of adaptive management.
Synthesis and applications. Conservation and management programmes serve a critical role in maintaining the health of ecosystems, and regularly use ecological monitoring to collect data relevant to population studies. In marine systems in particular, monitoring is often expensive and time-consuming, which can result in incomplete or sparse datasets. Where an ecological event of interest is rare, subsequent analyses are particularly affected by the many limitations of zero-inflated datasets. When training predictive models to classify these events, special care must be taken to avoid simply predicting a negative event in every case. Here, we propose and compare two machine learning approaches, gradient boosting and the artificial neural network (ANN), to predict aggregatory behaviour of two shark species, bull (Carcharhinus leucas) and blacktip (C. limbatus) sharks, in the Gulf of Mexico. Applied to a large dataset considering many spatial and environmental variables, and in tandem with negative case downsampling, our results support the application of each method to improve analyses of sparse monitoring datasets, with the gradient boosting machine performing particularly well in classifying positive cases.
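As an illustration of the negative case downsampling and the two evaluation metrics mentioned above, the following is a minimal sketch, not the pipeline used in this study. It assumes scikit-learn, uses synthetic data from `make_classification` as a stand-in for survey records (roughly 5% positive cases), and substitutes `GradientBoostingClassifier` for the BRT. The helper `downsample_negatives` and its `ratio` parameter are hypothetical names introduced only for this example, and `average_precision_score` is used as a common summary of the precision-recall curve rather than an exact trapezoidal PR AUC.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split


def downsample_negatives(X, y, ratio=1.0, seed=0):
    """Keep every positive case and a random subset of the negatives.

    `ratio` (hypothetical parameter) is the number of negatives retained
    per positive case.
    """
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    n_keep = min(len(neg_idx), int(round(ratio * len(pos_idx))))
    kept_neg = rng.choice(neg_idx, size=n_keep, replace=False)
    idx = rng.permutation(np.concatenate([pos_idx, kept_neg]))
    return X[idx], y[idx]


# Synthetic stand-in for monitoring data: about 5% positive cases (assumption).
X, y = make_classification(
    n_samples=4000,
    n_features=10,
    n_informative=5,
    weights=[0.95, 0.05],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Downsample the training split only; the test split keeps its natural
# imbalance so the metrics reflect the original class proportions.
X_bal, y_bal = downsample_negatives(X_train, y_train, ratio=1.0, seed=0)

model = GradientBoostingClassifier(random_state=0)
model.fit(X_bal, y_bal)
scores = model.predict_proba(X_test)[:, 1]

roc_auc = roc_auc_score(y_test, scores)
pr_auc = average_precision_score(y_test, scores)
baseline = y_test.mean()  # PR score expected from a no-skill classifier

print(f"ROC AUC: {roc_auc:.3f}")
print(f"PR AUC (average precision): {pr_auc:.3f}, no-skill baseline: {baseline:.3f}")
```

Because the no-skill baseline for the PR metric equals the proportion of positive cases, it falls as positives become rarer, whereas the ROC AUC baseline stays at 0.5. This is one reason the two metrics can diverge as the threshold for defining an aggregation is raised.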