This article, written by JPT Technology Editor Chris Carpenter, contains highlights of paper URTeC 3723023, “Using Machine Learning To Customize Development Unit Spacing for Maximum Acreage Value,” by M. Maguire, SPE, Diamondback Energy; A. Cui, Novi Labs; and T.E. Witham, Diamondback Energy, et al. The paper has not been peer reviewed.


In the complete paper, machine-learning (ML) models were trained using geologic, completion, and spacing parameters to predict production across the primary developed formations in the Midland Basin. The approach of using ML to test several different combinations of spacing and completion designs can be repeated across a basin to find an economical, customized solution for each development unit.


In contrast to conventional methods, ML offers a data-driven approach that can leverage the large amount of data generated by operators within unconventional plays. Several characteristics of ML models make them well suited for spacing optimization, including the following:

- ML models rely on statistical methods to establish relationships between the input variables and the output variables.

- ML handles nonlinear relationships well.

- ML can capture complex variable interactions that can be difficult to understand with traditional methods.

The downside is the time required to assemble the input data used for training an ML model. However, once a model has been built with an acceptable level of error and with training examples that cover the range of cases to be evaluated, many alternative development scenarios can be evaluated quickly. In addition, with the right variables (features) included, a model could be applicable to an area much larger than that of pilot wells.
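
The scenario screening described above can be sketched as a simple grid evaluation. This is a minimal illustration, not the authors' workflow: `predict_eur_mbo` is a hypothetical stand-in for a trained model, its formula is a toy assumption, and every price, cost, and design value is made up for the example.

```python
from itertools import product

def predict_eur_mbo(wells_per_unit, proppant_lb_per_ft):
    """Hypothetical stand-in for a trained model: per-well EUR in thousand bbl.

    Toy assumption: tighter spacing lowers per-well recovery, and larger
    completions help with diminishing returns. Not the paper's model.
    """
    spacing_penalty = 1.0 / (1.0 + 0.08 * wells_per_unit)
    completion_uplift = (proppant_lb_per_ft / 2000.0) ** 0.3
    return 600.0 * spacing_penalty * completion_uplift

def unit_value_usd(wells_per_unit, proppant_lb_per_ft,
                   price_usd_per_bbl=60.0, base_well_cost_usd=6.0e6,
                   lateral_ft=10000.0, proppant_cost_usd_per_lb=0.05):
    """Undiscounted unit value: revenue minus capital for all wells (illustrative)."""
    eur_bbl = predict_eur_mbo(wells_per_unit, proppant_lb_per_ft) * 1000.0
    well_cost = (base_well_cost_usd
                 + proppant_lb_per_ft * lateral_ft * proppant_cost_usd_per_lb)
    return wells_per_unit * (eur_bbl * price_usd_per_bbl - well_cost)

def rank_scenarios(well_counts, proppant_loadings):
    """Score every spacing/completion combination and sort best first."""
    scenarios = [
        {"wells_per_unit": n, "proppant_lb_per_ft": p,
         "value_usd": unit_value_usd(n, p)}
        for n, p in product(well_counts, proppant_loadings)
    ]
    return sorted(scenarios, key=lambda s: s["value_usd"], reverse=True)

# Illustrative design grid: 4 well counts x 3 proppant loadings = 12 scenarios.
ranked = rank_scenarios(well_counts=[4, 6, 8, 10],
                        proppant_loadings=[1500, 2000, 2500])
best = ranked[0]
```

Because each scenario is only a model prediction plus an economic calculation, widening the grid costs little once the model exists, which is the speed advantage the text describes.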

ML Methods

Each ML model used production data (monthly or daily format), directional survey data (to locate the wells and allow for spacing calculations), a header table with a variety of information on all the wells (e.g., completion information, formation, and completed lateral length), and grid data (typically geology data, such as effective porosity).
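
As a rough illustration of how directional survey data can support spacing calculations, the sketch below finds each well's nearest neighbor and reports the horizontal and vertical offsets. It assumes each lateral is reduced to a single midpoint (x, y, TVD), which is a simplification; the synopsis does not say how the authors computed spacing, and the function name and coordinates are hypothetical.

```python
import math

def nearest_neighbor_offsets(midpoints):
    """Return horizontal and vertical distance to each well's nearest neighbor.

    midpoints maps a well name to (x_ft, y_ft, tvd_ft) at the lateral midpoint.
    Nearest is judged by 3D distance; ties keep the first neighbor encountered.
    """
    offsets = {}
    for name, (x, y, tvd) in midpoints.items():
        best = None
        for other, (ox, oy, otvd) in midpoints.items():
            if other == name:
                continue
            horizontal = math.hypot(ox - x, oy - y)
            vertical = abs(otvd - tvd)
            distance_3d = math.hypot(horizontal, vertical)
            if best is None or distance_3d < best[0]:
                best = (distance_3d, other, horizontal, vertical)
        if best is not None:
            offsets[name] = {"neighbor": best[1],
                             "horizontal_ft": best[2],
                             "vertical_ft": best[3]}
    return offsets

# Made-up coordinates: A and B share a landing zone; C is staggered 300 ft deeper.
wells = {
    "A": (0.0, 0.0, 9000.0),
    "B": (660.0, 0.0, 9000.0),
    "C": (330.0, 0.0, 9300.0),
}
offsets = nearest_neighbor_offsets(wells)
```

A production workflow would more likely measure distances along the full lateral rather than at one point, but the resulting features (horizontal and vertical offset to neighbors) have the same shape.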

For all models, private data, such as daily production data and detailed completion data, had precedence over data acquired from public sources or a third-party data provider. For a well to be included in the ML model, it generally had to have all data values populated; wells with missing data were excluded. Once the data were gathered, the derivative variables were calculated, along with several spacing parameters.
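
The two data rules in this paragraph (private values take precedence, and wells with missing values are excluded) can be sketched as follows. The field names and sample records are hypothetical, and the record layout is an assumption made for illustration.

```python
# Hypothetical required fields; the synopsis does not list the actual columns.
REQUIRED_FIELDS = ("formation", "lateral_length_ft", "proppant_lb_per_ft")

def merge_with_precedence(private, public):
    """Start from public values, then overwrite with any populated private value."""
    merged = dict(public)
    for field, value in private.items():
        if value is not None:
            merged[field] = value
    return merged

def is_complete(record, required=REQUIRED_FIELDS):
    """A well qualifies only if every required field is populated."""
    return all(record.get(field) is not None for field in required)

def build_training_set(private_by_well, public_by_well):
    """Merge both sources per well and keep only fully populated wells."""
    training = {}
    for well in sorted(set(private_by_well) | set(public_by_well)):
        record = merge_with_precedence(private_by_well.get(well, {}),
                                       public_by_well.get(well, {}))
        if is_complete(record):
            training[well] = record
    return training

# Made-up records for illustration.
private_data = {"W1": {"proppant_lb_per_ft": 2100.0}}
public_data = {
    "W1": {"formation": "Formation A", "lateral_length_ft": 10000.0,
           "proppant_lb_per_ft": 1900.0},
    "W2": {"formation": "Formation B", "lateral_length_ft": 7500.0,
           "proppant_lb_per_ft": None},
}
training = build_training_set(private_data, public_data)
```

Here W1 keeps the private proppant value, while W2 is dropped because neither source populates its proppant loading.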

For ML studies of unconventional developments, practitioners face two main subsurface questions: how to subdivide the formations to represent the drainage heights and which rock properties to pass to the model. Drainage heights defined by the operator were based on geochemical typing data. These heights generally extended farther than traditionally defined formations (Fig. 1). Because these drainage boundaries are defined by the true vertical depth (TVD) positions of specific formational tops, the total vertical extent of these drainage heights mirrors the geospatial variation in the TVD positions of their boundary tops.
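
The relationship between boundary tops and drainage height is a subtraction at each map location: the TVD of the lower boundary minus the TVD of the upper boundary. A minimal sketch, assuming both surfaces are sampled on the same grid (the function names and depth values are hypothetical):

```python
def drainage_height_ft(top_tvd_ft, base_tvd_ft):
    """Vertical extent between the upper and lower boundary tops at one location."""
    if base_tvd_ft < top_tvd_ft:
        raise ValueError("base TVD must be at or below top TVD")
    return base_tvd_ft - top_tvd_ft

def height_grid(top_grid, base_grid):
    """Cell-by-cell drainage height for two TVD grids of identical shape."""
    return [
        [drainage_height_ft(top, base) for top, base in zip(top_row, base_row)]
        for top_row, base_row in zip(top_grid, base_grid)
    ]

# Made-up 2 x 2 grids of boundary-top depths (ft TVD).
upper_top = [[8800.0, 8850.0], [8900.0, 8950.0]]
lower_top = [[9200.0, 9300.0], [9250.0, 9400.0]]
heights = height_grid(upper_top, lower_top)
```

The height varies from cell to cell only because the two boundary surfaces vary, which is the point the paragraph makes.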

To generate the grids, geologic attributes were calculated from well logs. Log processing began after drainage heights were determined across the Midland Basin. Attributes with high correlation were selectively removed from the model inputs.
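
One common way to screen out highly correlated attributes is a greedy filter on pairwise Pearson correlation, sketched below. The synopsis says only that attributes were "selectively removed," so the greedy rule, the 0.9 threshold, and the attribute names other than effective porosity are assumptions for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def drop_correlated(features, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is below threshold.

    Features are visited in insertion order, so earlier entries win ties.
    """
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Made-up attribute values sampled at five locations.
features = {
    "effective_porosity": [0.06, 0.07, 0.08, 0.09, 0.10],
    "total_porosity": [0.08, 0.09, 0.10, 0.11, 0.12],
    "clay_volume": [0.30, 0.22, 0.35, 0.20, 0.28],
}
kept = drop_correlated(features)
```

Because the order of the dictionary decides which member of a correlated pair survives, listing the geologically preferred attribute first is one way to keep the selection deliberate rather than arbitrary.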
