Posted by: 抗性基因网 Date: 2023-06-09
Abstract
Antibiotic-resistant bacteria and antibiotic resistance genes (ARGs) are pollutants of worldwide concern that seriously threaten public health and ecosystems. Machine learning (ML) models have been applied to predict ARGs in beach waters. However, the existing studies were each conducted at a single location and achieved low prediction performance. Moreover, ML models are “black boxes” that do not reveal the internal nuances and mechanisms behind their predictions. This lack of transparency and trust can have serious consequences when such models are used in high-stakes decisions. In this study, we developed a gradient boosted regression tree (GBRT) based ML model and then described its behavior using six explainable artificial intelligence (XAI) model-agnostic explanation methods. We used hydro-meteorological and qPCR data from beaches in South Korea and Pakistan and developed ML prediction models for aac(6′-Ib-cr), sul1, and tetX, with 10-fold time-blocked cross-validation performances of 4.9, 2.06, and 4.4 root mean squared logarithmic error, respectively. We then analyzed the local and global behavior of the developed ML models using four interpretation methods. The developed ML models showed that water temperature, precipitation, and tide are the most important predictors of ARGs at recreational beaches. We show that the model-agnostic interpretation methods not only explain the behavior of the ML model but also provide insight into how it behaves under new, unseen conditions. Moreover, these post-processing techniques can serve as a debugging tool for ML-based modeling.
https://www.sciencedirect.com/science/article/abs/pii/S0301479722025427
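
For readers who want to see how the pieces named in the abstract fit together, the following is a minimal sketch, not the authors' code. It fits a GBRT on synthetic data, scores it with 10-fold blocked cross-validation using root mean squared logarithmic error (RMSLE), and then applies one model-agnostic global explanation method (permutation importance). Several things here are assumptions made purely for illustration: the synthetic data and its feature names, the choice of scikit-learn, contiguous unshuffled folds as the "time-blocked" scheme, training on a log-transformed target, and permutation importance standing in for the paper's interpretation methods, which the abstract does not name.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import KFold

# Synthetic, time-ordered samples (assumption: NOT the study's data).
rng = np.random.default_rng(0)
n = 300
feature_names = ["water_temp", "precipitation", "tide", "noise"]
X = np.column_stack([
    rng.uniform(5, 30, n),    # water temperature
    rng.exponential(5, n),    # precipitation
    rng.uniform(0, 3, n),     # tide level
    rng.normal(0, 1, n),      # irrelevant feature, for contrast
])
# Hypothetical gene abundance driven by the first three features only.
log_y = 0.15 * X[:, 0] + 0.05 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 0.3, n)
y = np.expm1(np.clip(log_y, 0, None))


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error; negative predictions are clipped to 0."""
    y_pred = np.clip(y_pred, 0, None)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))


def blocked_cv_rmsle(X, y, n_splits=10):
    """Score a GBRT on contiguous, unshuffled folds (one reading of 'time-blocked')."""
    scores = []
    for train_idx, test_idx in KFold(n_splits=n_splits, shuffle=False).split(X):
        model = GradientBoostingRegressor(random_state=0)
        # Fit on log1p(y) so back-transformed predictions stay near the valid range.
        model.fit(X[train_idx], np.log1p(y[train_idx]))
        pred = np.expm1(model.predict(X[test_idx]))
        scores.append(rmsle(y[test_idx], pred))
    return scores


scores = blocked_cv_rmsle(X, y)
print("mean blocked-CV RMSLE:", np.mean(scores))

# Global, model-agnostic explanation: how much does shuffling each feature hurt the fit?
final_model = GradientBoostingRegressor(random_state=0).fit(X, np.log1p(y))
result = permutation_importance(final_model, X, np.log1p(y), n_repeats=5, random_state=0)
importances = dict(zip(feature_names, result.importances_mean))
for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

Because permutation importance only needs a fitted model's predictions, the same call would work with any other regressor, which is what "model-agnostic" refers to. It also hints at the debugging use mentioned in the abstract: a high score for a feature that should be irrelevant (here, `noise`) would point to leakage or overfitting.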