Predicting non-native seaweeds global distributions: The importance of tuning individual algorithms in ensembles to obtain biologically meaningful results
Sainz-Villegas, S.; de la Hoz, C.F.; Juanes, J.A.; Puente, A. (2022). Predicting non-native seaweeds global distributions: The importance of tuning individual algorithms in ensembles to obtain biologically meaningful results. Front. Mar. Sci. 9: 1009808. https://dx.doi.org/10.3389/fmars.2022.1009808 In: Frontiers in Marine Science. Frontiers Media: Lausanne. e-ISSN 2296-7745
Keywords | Author keywords: ensemble, invasive, macroalgae, non-native, seaweeds, species distribution models
Authors | - Sainz-Villegas, S.
- de la Hoz, C.F.
- Juanes, J.A.
- Puente, A.
Abstract | Modelling non-native marine species distributions is still a challenging activity. This study aims to predict the global distribution of five widespread introduced seaweed species by focusing on two main aspects of the ensemble modeling process: (1) Does the enforcement of less complex models (in terms of number of predictors) help in obtaining better predictions? (2) What are the implications of tuning the configuration of individual algorithms in terms of ecological realism? Regarding the first aspect, two datasets with different numbers of predictors were created. Regarding the second aspect, four algorithms and three configurations were tested. Models were evaluated using common evaluation metrics (AUC, TSS, Boyce index and TSS-derived sensitivity) and ecological realism. Finally, a stepwise procedure for model selection was applied to build the ensembles. Models trained with the large predictor dataset generally performed better than models trained with the reduced dataset, with some exceptions. Regarding algorithms and configurations, Random Forest (RF) and Generalized Boosting Models (GBM) scored the highest metric values on average, even though RF response curves were the most unrealistic and non-smooth and GBM showed overfitting for some species. Generalized Linear Models (GLM) and MAXENT, despite their lower scores, fitted smoother curves (especially at intermediate complexity levels). Reliable and biologically meaningful predictions were achieved. Inspecting the number of predictors to include in final ensembles and selecting the algorithms and their complexity have been demonstrated to be crucial for this purpose. Additionally, we highlight the importance of combining quantitative (based on multiple evaluation metrics) and qualitative (based on ecological realism) methods for selecting optimal configurations.
Dataset | - Ramos, E., Sainz-Villegas, S., de la Hoz, C.F., Puente, A., Juanes, J.A. (2023) Species Distribution Models for invasive macroalgae. Integrated data products created under the European Marine Observation Data Network (EMODnet) Biology project Phase IV (EMFF/2019/1.3.1.9/Lot 6/SI2.837974), funded by the European Union under Regulation (EU) No 508/2014 of the European Parliament and of the Council of 15 May 2014 on the European Maritime and Fisheries Fund.