Abstract
Machine learning models that use core data as dataset labels have become a mainstream method for predicting reservoir parameters. However, the high cost and insufficient spatial sampling density of core data acquisition often lead to weak nonlinear representation, poor generalization, and overfitting in these models. To address the challenge of limited core data, we propose a reliability analysis-driven workflow that optimally selects among multiple core data augmentation (CDA) methods to enhance reservoir parameter prediction. This workflow achieves two primary advancements. First, it mitigates data scarcity by treating core data as a minority class and applying diverse tabular data augmentation techniques to generate and rigorously evaluate reliable synthetic data, which effectively expands the usable core dataset. Second, leveraging this augmented data, the workflow integrates machine learning with pre-trained language models (PLMs) to develop and apply multiple combinations of augmentation-prediction models for both lithology classification and petrophysical parameter prediction. Field data applications demonstrate that the CDA combination of the Tabular Denoising Diffusion Probabilistic Model (TabDDPM) and the Tabular Prior Data Fitting Network (TabPFN) achieves outstanding performance on evaluation metrics and in case studies for both lithology classification and petrophysical parameter prediction. This study provides a reproducible framework for enhancing small-sample reservoir parameter prediction in oil and gas exploration, demonstrating that synthetic data augmentation can effectively mitigate data scarcity and open new pathways for geophysical data analysis.
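The augment-then-evaluate idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: TabDDPM is replaced here by a simple SMOTE-style interpolation, the prediction model by a nearest-centroid classifier, and the reliability check by a train-on-synthetic, test-on-real (TSTR) accuracy. All function names, feature choices (gamma ray, porosity), and sample values are hypothetical and serve only as an illustration.

```python
import math
import random


def smote_like_augment(rows, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating a randomly picked row
    toward one of its k nearest neighbours (SMOTE-style stand-in for a
    generative model; assumes at least two rows per class)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(rows)
        # Neighbours sorted by Euclidean distance, excluding the base row.
        neighbours = sorted(
            (r for r in rows if r is not base),
            key=lambda r: math.dist(base, r),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + t * (o - b) for b, o in zip(base, other)])
    return synthetic


def centroid(rows):
    """Column-wise mean of a list of feature rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def tstr_accuracy(real_by_class, synth_by_class):
    """Train on synthetic, test on real: fit class centroids on the
    synthetic rows, then score nearest-centroid predictions on real rows."""
    centroids = {label: centroid(rows) for label, rows in synth_by_class.items()}
    hits = total = 0
    for label, rows in real_by_class.items():
        for x in rows:
            predicted = min(centroids, key=lambda c: math.dist(centroids[c], x))
            hits += int(predicted == label)
            total += 1
    return hits / total


# Toy "core" samples: [gamma ray (API), porosity (fraction)].
# Values are illustrative only, not field data.
real = {
    "sandstone": [[45.0, 0.18], [50.0, 0.20], [48.0, 0.17], [52.0, 0.21]],
    "shale": [[110.0, 0.06], [120.0, 0.05], [115.0, 0.08], [105.0, 0.07]],
}

# Expand each class to 20 synthetic rows, then check how well a model
# trained only on synthetic rows recovers the real labels.
synth = {label: smote_like_augment(rows, n_new=20, seed=1)
         for label, rows in real.items()}
score = tstr_accuracy(real, synth)
print(f"TSTR accuracy on real core samples: {score:.2f}")
```

A low TSTR score would suggest that the synthetic rows do not preserve the class structure of the real data and that the augmentation method should be rejected or retuned. In a workflow of the kind the abstract describes, the interpolation step and the centroid classifier would be replaced by the actual augmentation and prediction models under comparison.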
Paper Information:
Luo Xin, Ci Xing-hua, Sun Jian-meng, et al. Enhancing Reservoir Parameter Prediction Workflows via Advanced Core Data Augmentation. Marine and Petroleum Geology, 2025: 107605. DOI: https://doi.org/10.1016/j.marpetgeo.2025.107605

