Although machine learning models can provide tremendous value to the unconventional oil and gas industry, interpreting their inner workings and outputs can be a laborious, time-consuming, and difficult process. Here we present a novel method for extracting an overall rock quality index from a machine learning model trained on well logs. This rock quality index (RQI), which we term geoSHAP, can be used for performance benchmarking, completions tailoring, and acreage evaluation workflows. We trained a decision-tree-based model on a regional Williston Basin dataset. The model predicts oil, gas, and water production at 30-day increments out to IP 720 from training features describing completions design, petrophysical grids, and spacing/stacking parameters. We started with over 400 petrophysical grids and reduced them to 5 principal components using Gaussian kernel principal component analysis. We then computed SHAP (SHapley Additive exPlanations) values, which reflect how much each individual feature contributed to the model prediction. To extract our RQI, we sum the SHAP values of the principal geologic components for each well at each IP day. These summed geoSHAP values reflect the overall rock quality around the basin, identifying sweet spots and low-performing areas. The model identifies high-performing areas on the Nesson Anticline, Antelope Anticline, Fort Berthold area, and Parshall/Sanish. We also show how geoSHAP trends with overall operator performance and can be used to benchmark performance relative to expectation. This method is repeatable across tree-based machine learning algorithms. It removes the need to construct partial dependence plots or to take the time-consuming step of running synthetic pads across the entire basin. Additionally, this method simplifies the selection of petrophysical grids and removes issues with multicollinearity that can debilitate machine learning models.
GeoSHAP provides a purely empirical perspective on rock quality that can be compared to more prescriptive, assumption-laden traditional methods, such as combining Archie’s equation with recovery factors. It also provides a generalizable method applicable to models built with simpler, easier-to-obtain data such as formation tops and isopachs.
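
The summation step described above can be illustrated with a minimal sketch. It assumes the SHAP values have already been computed (for instance with a tree explainer such as `shap.TreeExplainer`) and are available as one row per well and IP day, with one column per model feature. The feature names, the `geo_pc` prefix, the well names, and all numeric values below are hypothetical and chosen only for illustration; they are not taken from the paper's dataset or code.

```python
from typing import List, Sequence


def geoshap_index(
    shap_rows: Sequence[Sequence[float]],
    feature_names: Sequence[str],
    geo_prefix: str = "geo_pc",
) -> List[float]:
    """Sum the SHAP values of the geologic principal-component features.

    Each row of `shap_rows` holds the SHAP values for one (well, IP day)
    prediction, ordered like `feature_names`. The `geo_prefix` naming
    convention is an assumption made for this sketch.
    """
    geo_cols = [
        i for i, name in enumerate(feature_names) if name.startswith(geo_prefix)
    ]
    if not geo_cols:
        raise ValueError(f"no feature name starts with {geo_prefix!r}")
    return [sum(row[i] for i in geo_cols) for row in shap_rows]


# Hypothetical feature layout: completions, spacing, then five geologic PCs.
feature_names = [
    "proppant_lb_per_ft", "fluid_bbl_per_ft", "well_spacing_ft",
    "geo_pc1", "geo_pc2", "geo_pc3", "geo_pc4", "geo_pc5",
]

# Made-up SHAP values, one row per (well, IP day) pair.
row_keys = [("well_A", 30), ("well_A", 60), ("well_B", 30)]
shap_rows = [
    [1.5, -0.5, 0.25, 2.0, 1.0, -0.5, 0.25, 0.25],
    [2.0, -0.25, 0.5, 2.5, 1.5, -0.5, 0.25, 0.25],
    [1.0, 0.5, -0.25, -1.0, -0.5, 0.25, 0.0, -0.25],
]

# geoSHAP per (well, IP day): positive values indicate the geologic features
# pushed the prediction above the model's baseline, negative values below it.
geoshap = dict(zip(row_keys, geoshap_index(shap_rows, feature_names)))
for key, value in geoshap.items():
    print(key, value)
```

Because SHAP values are additive, the sum over the geologic columns can be read as the portion of each prediction attributed to geology as a whole, separate from the completions and spacing contributions in the remaining columns.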


Over the past several years, machine learning methods have become increasingly common for well performance prediction and design optimization in unconventional reservoirs. These algorithms offer several advantages that have made them attractive to engineers and geoscientists, including increased accuracy, the ability to handle complex problems, and reduced bias. However, difficulty in assembling datasets and a lack of interpretability have limited widespread usage. Because machine learning models thrive on large datasets, operators are often forced to incorporate publicly available data from non-operated wells (whether directly from a state database or through a vendor). Even if an operator does have hundreds or even thousands of horizontal wells within a given basin, their implemented completions and spacing/stacking configurations may poorly sample the distribution of design parameters, limiting the effectiveness of a single-operator model.
