## Summary

Nuclear magnetic resonance (NMR) relaxation responses in porous media provide a sensitive probe of the microstructure and yet are influenced by a number of factors which are not easily detangled. Low-field $T2$ transverse relaxation measurements can be carried out quickly and are frequently used as pore size distributions, while adding $T1$ longitudinal relaxation measurements provides additional insights into surface properties and fluid content. Here we present an inverse solution workflow extracting related intrinsic physical parameters of the system by fitting experiment and numerical simulation(s). An efficient NMR forward solver for the simultaneous calculation of $T1$ and $T2$ responses is introduced, which honors existing inequality relationships between $T1$ and $T2$ parameters. We propose a Bayesian optimization approach that jointly identifies $T1$- and $T2$-related properties satisfying physical constraints by simultaneously fitting $T1$ and $T2$ experiments to simulations. This dual-task inverse solution workflow (DT-ISW) identifies the solution by minimizing the sum of the L2 norm of the fitting residuals of both $T1$ and $T2$ distributions into a single objective and jointly models the two highly correlated objectives with high efficiency using the vector-valued Gaussian process (GP) kernel for transfer learning. A multimodal search strategy is used to identify nonunique solution sets of the problem. The workflow is demonstrated on Bentheimer sandstone, identifying five intrinsic physical parameters. The performance of the joint DT-ISW (DT-ISW-J) is compared to a sequential DT-ISW (DT-ISW-S) approach as well as an independent single-task ISW (ST-ISW) of the $T1$ and $T2$ responses. Both dual-task versions converge more than two times faster than the single-task version. DT-ISW-J equally minimizes the L2 norm of $T1$ and $T2$ fitting residuals whereas DT-ISW-S only preferentially minimizes the objective assigned higher importance. A Pareto optimal solution (POS) is provided to allow operators to subjectively balance the preference of $T1$ and $T2$ data fits for the slightly conflicting objectives. The ability to extract five intrinsic physical parameters simultaneously provides new techniques for tracking wettability alteration and assessing the influence of clay amount and distribution on petrophysical property estimates.

## Introduction

NMR relaxation and diffusion measurements as probe of the microstructure of porous media have a long history. Focusing on relaxation responses, this includes the estimation of pore size distributions (Prammer 1994; Kenyon 1997), (pseudo) capillary pressure curves (Marschall et al. 1995; Slijkerman et al. 2001), wettability index (Fleury and Deflandre 2003; Looyestijn and Hofman 2005; Chen et al. 2006; Al-Garadi et al. 2022), and permeability (Timur 1968; Straley et al. 1987; Banavar and Schwartz 1987; Sen et al. 1990; Borgia et al. 1998; Dunn et al. 1999). Length scale interpretations of NMR relaxation responses from saturated rocks are based on the assumption of the fast diffusion limit (Brownstein and Tarr 1979) (i.e., there is a direct correspondence between a specific relaxation time component and pore size). The assumption is valid if a range of conditions are fulfilled, including fast diffusion and diffusionally isolated pores, insignificant internal gradients (absence of paramagnetic impurities), and uniform mineralogy. Under those conditions, the relationship between relaxation time and pore size may be described by a mono-exponential decay with relaxation rate (Torrey 1956)

where *T*_{2} and $T2b$ are transverse relaxation time and bulk transverse relaxation time, respectively; $\rho 2$ is transverse surface relaxivity; $S$ and $V$ are surface area and volume of an individual pore, respectively; $De$ is effective diffusion coefficient; $\gamma $ is the gyromagnetic ratio; $G$ is the average or effective internal gradient; and $tE$ is echo time. Summing over all pores results in the magnetization decay of the sample. Assuming a pore shape (e.g., spherical, and constant surface relaxivity), the magnetization decay can then be converted to a pore size. Practically, the prediction of pore sizes requires a calibration of field data to NMR and mercury intrusion porosimetry laboratory measurements using surface relaxivity as the constant conversion parameter. In literature, such a parameter is often (and correctly) called an effective surface relaxivity (Chen et al. 2006) or apparent surface relaxivity (Müller-Petke et al. 2015; Costabel et al. 2018) to emphasize the averaged nature of the parameter (Marschall et al. 1995). Depending on the method utilized for estimation of the surface relaxivity (e.g., using surface area measurements by gas adsorption analysis, accounting for surface roughness or not, utilizing short time limit restricted diffusion, by Bayesian magnetic resonance derived pore size distribution or derived by utilizing nonground eigenstates following Brownstein-Tarr theory), the literature-reported surface relaxivity values vary by an order of magnitude. For the benchmark rock we use in this work, Bentheimer sandstone, the reported values of $\rho 2$ spread from 8 μm/s (Holland et al. 2013) to 26 μm/s (Afrough et al. 2019).

The application of the concept of constant effective surface relaxivity to realistic heterogeneous rocks leads to errors of various magnitude. Toumelin et al. (2003b) demonstrated possible errors approaching 50% in estimating immobile fluid fraction due to unaccounted diffusional coupling, Arns et al. (2006) reported a 30% spread of permeability estimates due to relaxivity heterogeneity, while Benavides et al. (2017) showed a very significant narrowing of the predicted pore size distribution (twice as narrow compared to a known reference on a standard log scale) if a uniform relaxivity value is utilized. Approaching this problem by introducing computational methods utilizing intrinsic *phase-specific* surface properties may allow one to quantify those uncertainties and in general enhance the interpretation of NMR relaxation responses.

Additional information about fluid content or compartmentalization of the pore space can be gained from *T*_{1}-*T*_{2} correlation experiments (Song et al. 2002; Tinni et al. 2014). Pseudo *T*_{1}-*T*_{2} correlations may be derived by utilizing independent *T*_{1} and *T*_{2} measurements (Williamson et al. 2018). As mentioned above, it is well known that NMR relaxation of fluids in porous media is very sensitive to the wetting properties of solid phases. The close relationship between affinity and surface relaxivity was utilized to develop an NMR wettability index under the assumption of uniform wetting properties. The concept was extended for the more practical case of heterogeneous wettability of rock [e.g., we demonstrated in Shikhov et al. (2018) the approach to determine fractional wetting properties of grains surfaces using a combination of transverse relaxation techniques, though the reference data for the rock of interest at two end wetting states in either case is required, limiting applications to laboratory core analysis]. Approaches relying on both types of relaxation (*T*_{1} and *T*_{2}) may not require such calibration, allowing estimates of arbitrary wet rocks at field conditions (e.g., well logs). Korb et al. (2018) utilized a combination of 2D *T*_{1}-*T*_{2} and *T*_{1} field-dependent relaxation (dispersion) measurements to probe the wetting properties of fluids confined in the pore space of rocks without prior knowledge of endpoint wettability state responses. Gizatullin et al. (2019) practically demonstrated a high sensitivity of certain types of aromatic compounds to wetting conditions of sandstones and carbonates for *T*_{1} dispersion measurements. This suggests that future NMR methods of formation wettability characterization from well logs are likely to be based on a combination of *T*_{2} and *T*_{1} field-dependent measurements. The interpretation of such measurements would require an ability to separate the responses from internal gradient effects when present and efficient extraction of multiple longitudinal and transverse surface relaxivity values $\rho 1$ and $\rho 2$. The current work presents a significant step toward this goal.

Numerical simulations based on digitized materials offer a way to test such methods and potentially overcome some of their limitations. Basic random walk methods on digitized media were introduced in several studies (Mendelson 1990; Wilkinson et al. 1991; Bergman et al. 1995). Straley and Schwartz (1996) compared experiments on glass beads to numerical simulations with reasonable accuracy. Hidajat et al. (2002) introduced a first-passage random walk method for NMR responses and correlated *T*_{2} log mean relaxation time to lattice Boltzmann calculations of permeability for synthetic random structures. Øren et al. (2002) modeled the NMR *T*_{2} response of Fontainebleau sandstone utilizing a random walk method with fixed timestep. Toumelin et al. (2003b) considered a periodic array of spheres to capture the response of vuggy carbonates at different water saturations conceptually, where the microporosity itself also formed a periodic array of spheres. More algorithmic detail was given in Toumelin et al. (2003a)—the approach captures dephasing in the presence of an external constant background gradient and utilizes the first-passage algorithm. Valckenborg et al. (2003) extended the modeling of NMR responses to nonuniform fields. Song et al. (2000) measured the magnetization decay due to diffusion in the internal field, a technique that probes higher eigenmodes of the NMR *T*_{2} relaxation response. It was first modeled by Arns (2004) on a series of micro-computed tomography (CT) images. Interestingly, the associated length scale estimates indicated a larger pore size as well as a wider pore size distribution than would have resulted from the direct use of the ground states. Arns et al. (2005) compared a range of length scale measures for permeability estimation including NMR-derived ones, showing strong correlations between NMR-based estimates and direct numerical simulation for Fontainebleau sandstone, in agreement with Sen et al. (1990). Numerical techniques operating on 3D representations of microstructure extracted from micro-CT images have since been refined further and were applied to higher-dimensional NMR measurements (e.g., $D$-*T*_{2}) (Arns et al. 2011). Internal field effects were accounted for by application of a dipole approximation with random walkers accumulating an effective phase; as in Arns (2004), a part of the dephasing behavior is thus accounted for explicitly, rather than by an effective relaxation rate.

Given the significant advances in numerical techniques, a standing problem is the determination of the correct intrinsic parameters associated with relaxation at surfaces or smaller-scale microstructure where bulk relaxation rates of effective materials may be specified. However, determination for those parameters using forward simulations requires an extensive search in the multidimensional solution space, posing an ill-conditioned inverse problem, the solution of which may lead to new NMR relaxation response interpretation techniques directly utilizing intrinsic physical properties. This is particularly so if both *T*_{1} and *T*_{2} responses are considered. There are two major issues related to the relaxation time inverse problem: The micro-CT image-based NMR simulation is expensive to evaluate and the solution sets are nonunique. For the computational burden, Bayesian optimization has become a powerful strategy for identifying the extrema of the expensive objective functions with a limited budget (Mockus 1994; Jones et al. 1998; Snoek et al. 2012), and we have developed ST-ISW demonstrating the efficiency and reproducibility of Bayesian optimization for *T*_{2} relaxation time inverse problem in which multiple solution sets are identified simultaneously (Li et al. 2021). The key advantage of Bayesian optimization is that it models the objective function using GPs or kriging known in geostatistics as the cheap proxy, with which the most promising candidate is identified and proposed for exact function evaluation so that the computational cost is significantly reduced. GPs as a surrogate or proxy model have been applied to solving history matching problems (Horowitz et al. 2013; Sayyafzadeh 2015; Rana et al. 2018; Dachanuwattana et al. 2019; Li et al. 2015) and to recovering physical, geometrical, or structural parameters where expensive measurements or simulations are involved (Vargas-Hernández et al. 2019; Deng et al. 2020). For the identification of nonunique solution sets under box constraints, the objective function is intrinsically multimodal. Li et al. (2021) recovered both the global optimal solution and (major) local optimal solutions using a multimodal search strategy, comprising a multistart Box-constrained limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm (L-BFGS-B) optimizer searching for local optimum solutions and a global optimizer social-learning particle swarm optimizer for global optimum solutions. With this technique, more than one physically valid solution set is identified by computationally reproducing the transverse relaxation time (*T*_{2}) measurement using the NMR simulation. From the mathematical point of view, the solutions are considered more physically valid if the agreement between the simulated longitudinal relaxation time (*T*_{1}) distribution arising from the identified physical quantities and the measured *T*_{1} distribution can also be achieved. From an NMR forward solver point of view, it is easy to output *T*_{1} and *T*_{2} relaxation responses with little extra computational effort, making it desirable to develop an inverse solution workflow that can use this advantage. However, compared to ST-ISW for *T*_{2} of Bentheimer sandstone where the clay regions effective transverse relaxation time, effective diffusion coefficient, and transverse surface relaxivity $\rho 2$ were determined, here the corresponding two longitudinal relaxation parameters would also be required, leading to a five-parameter inverse problem.

Incorporating the matching of *T*_{1} distributions in addition to *T*_{2} distributions following ST-ISW presented in Li et al. (2021) is straightforward: Both *T*_{1} and *T*_{2} fitting residuals can be jointly minimized by taking the sum of the two L2 norms of the fitting residuals as the objective. The inverse problem can then be solved by ST-ISW, except that incorporation of the *T*_{1} distribution leads to additional physical constraints, which are helpful for the identification of the most realistic solution sets. This leads to a strategy mainly focused on extending the expected improvement (EI) or entropy search acquisition functions to the constrained optimization scenarios (e.g., the expected constrained improvement; Schonlau et al. 1998), and the predictive entropy search with constraints (Hernández-Lobato et al. 2015). The family of EI acquisition functions is superior for multimodal optimization as it automatically balances exploration and exploitation by taking into account both value information and uncertainty information. In this study, we identify the nonunique solution sets by multimodal maximization of the expected constrained improvement acquisition function using the multistart sequential quadratic programming (SQP).

While conceptually simple and easy to implement, ST-ISW fails to account for correlations between the two objectives, which may lead to a waste in computational resources. Historically, the (spatial) correlation between related objectives are mostly studied in geostatistics [e.g., Goovaerts (1997) used the cokriging approach to account for the spatial correlations between the concentration of the heavy metals that are expensive to evaluate and the variables that are easy to evaluate, for example, pH levels]. Cokriging with a linear model of coregionalization (LCM) is widely applied in geostatistics (Babak and Deutsch 2009; Ochie and Rotimi 2018; Al-Mudhafar 2018) and is then extensively studied and applied in transfer learning (Bonilla et al. 2007; Álvarez et al. 2012; Swersky et al. 2013). In this study, we incorporate a transfer learning ability to ST-ISW using LCM which jointly models the *T*_{1} and *T*_{2} objectives by leveraging information from each other. We propose two approaches that differentiate themselves by whether the two objectives (matching *T*_{1} and *T*_{2} distributions) are considered of equal importance: DT-ISW-S and DT-ISW-J, indicating if the two objectives are optimized sequentially or jointly; here DT stands for the dual-task. In sequential optimization, unknown physical quantities related to the objective with higher importance are identified first, setting the bounds for another objective, whereas in joint optimization all unknowns are identified simultaneously. DT-ISW-S is formulated for the scenario where one objective is considered of higher importance and it does not matter if the other objective strictly matches, whereas DT-ISW-J is for the scenario where the two objectives are considered of equal importance and good matches of the two objectives are desirable. We demonstrate the performance of DT-ISW-S and DT-ISW-J, with ST-ISW as a baseline by the identification of the five parameters of a Bentheimer sandstone sample: one shared quantity (i.e., the effective diffusion coefficient in clay regions), together with four unique quantities (i.e., surface relaxivity of quartz and effective relaxation time in clay regions for *T*_{1} and *T*_{2}, respectively). We also provide the separate optimization (SEP) as the reference for which the physical quantities related to *T*_{1} and *T*_{2} are separately identified. Finally, we provide the POS, which allows operators to balance the preference of *T*_{1} and *T*_{2} data fits for the slightly conflicting objectives.

This paper is organized as follows: An overview of DT-ISW is presented, followed by a detailed description of the materials, NMR measurements, and NMR forward simulation. Next, a detailed description of DT-ISW for efficient modeling of the two related objectives using transfer learning is provided. Finally, the performances of DT-ISW-S and DT-ISW-J are investigated and compared against ST-ISW and SEP for a Bentheimer sandstone sample, determining five *T*_{1} and *T*_{2} relaxation-related intrinsic physical properties simultaneously.

## The Dual-Task Inverse Solution Workflow

There are five components in the developed DT-ISW for the inverse problem of *T*_{1}, *T*_{2} parameter estimation, namely (a) the observables, (b) the forward NMR solver, (c) the cost-function, (d) the biobjective Bayesian optimization, and (e) the solution analysis, as presented in Fig. 1.

We briefly summarize these components in the itemized list below, while detailed descriptions follow in the next section.

(a) Measure NMR *T*_{1} and *T*_{2} relaxation decays $ml,exp$ and $mt,exp$.

(b) Obtain the digitized representation of the sample via CT imaging (first iteration only), numerically derive the decays $ml,sim$ and $mt,sim$ for the chosen sets of unknown intrinsic parameters $x(T1)$ and $x(T2)$ (e.g., surface relaxivities of the resolved phase: quartz, effective diffusion coefficient of clay, effective relaxation times of clay regions only resolved on scanning electron miscroscope images) while honoring known quantities like porosity $\varphi $, volumetric magnetic susceptibility $\chi \nu $, bulk physical properties of water, and matching the physical kernel (see Eqs. 2 and 3) and regularization parameter $\lambda $.

(c) Acquire the probability density distribution for both *T*_{1} and *T*_{2} using both simulated and measured decays [i.e., $\zeta sim(T1)$, $\zeta exp(T1)$, $\zeta sim(T2)$, and $\zeta exp(T2)$]. Evaluate the cost-function and calculate the sum of the L2 norm of the fitting residuals for both *T*_{1} and *T*_{2} distributions.

(d) Jointly model the *T*_{1} and *T*_{2} objective functions using the vector-valued GP with correlation captured using the LCM, followed by maximum likelihood estimation of the model hyperparameters; then, the EI acquisition function is maximized to identify the promising candidates $x(T1)$ and $x(T2)$. Finally, solutions within sparsely explored regions are evaluated using exact simulation.

(e) When the optimization budget is depleted, the objective functions are modeled with the updated model hyperparameters followed by a division of the solution space into various unique partitions (UPs) associated with different modes of the objective function using the solution space partitioning introduced in Li et al. (2021). Finally, for each UP, the top three solutions and the POS are reported.

## Materials, Measurements, and Forward Simulations

### Sample Characterization

**Rock Sample**

NMR measurements were performed on a Bentheimer sandstone core plug (which is hereafter labeled as BHG2) fully saturated with 3 wt% NaCl brine (30,000 ppm). It was cored out of a rock block sourced from Bad Bentheim quarry (Lower Saxony, Germany) exhibiting 23.9% porosity. This sandstone is a remarkably clean quartz arenite, as five analyzed samples exhibit a narrow range of 97.94 to 98.07 wt% of silicon dioxide determined by X-ray fluorescence spectroscopy. The corresponding mineralogy associations are 95.67% quartz, 1.35% kaolinite, 2.39% feldspar, and 0.03% iron-rich components. Owing to the very small iron and manganese content, the rock is diamagnetic. The bulk volumetric magnetic susceptibility of three finely powdered samples (about 1 g each) measured using magnetic susceptibility balance (Sherwood Scientific, Cambridge, UK) is −7.45 × 10^{–6} (SI units).

**Digitized Image**

A 5-mm diameter core of Bentheimer sandstone was imaged on the UNSW Tyree X-ray CT facility’s HeliScan Micro-CT set equipped with a GE Phoenix NanoFocus X-ray source and a 3,072 × 3,072 flatbed detector. Radiographs were acquired over 12 hours with 80 kV X-ray energy using a 0.75-mm aluminum filter in double helix mode. The resulting spatial resolution of the 3D image was $\epsilon =2.16$μm. Following reconstruction, the gray-scale tomogram was segmented into five phases: (macro) pore space, quartz, clay region (a microporous effective phase), feldspar, and an iron-rich dense mineral phase. The segmentation first identified pore space, intermediate intensity clay region, and high density (quartz and denser) in a standard converging active contour approach (Sheppard et al. 2004)—see Li et al. (2021) for the actual sample—followed by a further partitioning of the solid phase into quartz, feldspar, and an iron-rich phase. Feldspar and quartz grains were separated using grain partitioning by a watershed algorithm (Sheppard et al. 2004) followed by a thresholding based on average grain intensity, while the high-intensity iron-rich phase was extracted directly based on intensity. For reference, a slice through the tomogram and resultant phase segmentation is given in Fig. 2.

Statistics of the Bentheimer sandstone segmentations as well as magnetic susceptibility values for the calculation of the internal magnetic field at lattice resolution *ε* that follows is listed in Table 1.

Sample . | $\epsilon $(μm)
. | Domain . | . | . | . | . | . | . |
---|---|---|---|---|---|---|---|---|

BHG2 | 2.226 | 1,240^{3} | $\varphi sim$ | $fp$ | $fc$ | $fq$ | $ff$ | $fh$ |

0.2373 | 0.2252 | 0.0241 | 0.7369 | 0.0126 | 0.0011 | |||

$\chi \nu $ | $\chi \nu ,w$ | $\chi \nu ,c$ | $\chi \nu ,q$ | $\chi \nu ,f$ | $\chi \nu ,h$ | |||

−7.45 | −9 | $(\chi \nu ,w+\chi \nu ,k)/2$ | −12 | −12 | 1,900 |

Sample . | $\epsilon $(μm)
. | Domain . | . | . | . | . | . | . |
---|---|---|---|---|---|---|---|---|

BHG2 | 2.226 | 1,240^{3} | $\varphi sim$ | $fp$ | $fc$ | $fq$ | $ff$ | $fh$ |

0.2373 | 0.2252 | 0.0241 | 0.7369 | 0.0126 | 0.0011 | |||

$\chi \nu $ | $\chi \nu ,w$ | $\chi \nu ,c$ | $\chi \nu ,q$ | $\chi \nu ,f$ | $\chi \nu ,h$ | |||

−7.45 | −9 | $(\chi \nu ,w+\chi \nu ,k)/2$ | −12 | −12 | 1,900 |

Since we focus on NMR relaxation at low external field strength of 0.047 T and given the small amount of feldspar, we assume that the two phases, feldspar and quartz, share the same surface relaxivity and volumetric magnetic susceptibility, so that the increased magnetic susceptibility is assigned to the iron content in the high-density phase, leading to a higher $\chi \nu ,h$, with the clay region considered as a mixture of kaolinite and NaCl brine, as shown in Table 1.

**T**_{1} and **T**_{2} Reference Data

**T**

**T**

_{2}Here we measured *T*_{1} relaxation time using the standard inversion recovery pulse sequence (Carr and Purcell 1954). It is formed by a 180° pulse which inverts the magnetization, followed by recovery at time delay $\tau $, and concluded by a 90° pulse that projects the attenuated magnetization on the transverse plane for the acquisition of the free induction decay. By varying time intervals $\tau $ in subsequent scans, a set of free induction decays with different amplitudes is obtained. *T*_{1}-inversion recovery measurements were performed with the following set of parameters: 64 time increments $\tau $ were spaced logarithmically between 0.2 and 10,000 ms. The signal was acquired as an average of four scans following a phase cycling sequence to suppress systematic noise due to hardware imbalances. Recording the initial value of each free induction decay as a function of the recovery time results in a mono-, or in a general case, in a multiexponential decay described by the following expression with the *T*_{1}-inversion recovery kernel $1-2e-t/T1$as (Callaghan 1994; Song et al. 2002; Wang et al. 2017)

where $mT1(t)$ is the longitudinal magnetization decay as a function of time $t$, $\eta n$ is noise assumed white and additive, and our objective is the estimation of the probability density function of *T*_{1} [i.e., $\zeta (T1)$].

The standard method to measure transverse relaxation time *T*_{2} is the Carr, Purcell, Meiboom, and Gill (CPMG) pulse sequence (Carr and Purcell 1954; Meiboom and Gill 1958). It consists of a 90° pulse that projects the initial magnetization on the transverse plane, followed by a series of 180° refocusing pulses separated by a time interval of $tE=2\tau E$ (interecho time). Each 180° pulse produces an attenuated signal (echo) with the maximum occurring at time $\tau E$ after the pulse. By acquiring the top of each echo, the exponential (or multiexponential, in a more general case of porous media) signal decay is obtained with the decaying rate equal to the inverse of the transverse relaxation time *T*_{2}, thus giving an $e-t/T2$ kernel for the multiexponential problem as (Callaghan 1994; Song et al. 2002)

where $mT2(t)$ is the transverse magnetization decay as a function of time $t$ and $\eta n$ is additive white noise, and the objective is the estimation of the probability density function of *T*_{2} [i.e., $\zeta (T2)$]. All NMR experiments were performed using the Magritek 2 MHz Rock Core Analyzer. Being equipped with the P54 RF probe (54-mm bore opening), it is capable of achieving signal-to-noise ratio in excess of 50 from four scans for the most saturated rock samples of about 20 cm^{3} in volume. The CPMG measurements were recorded with an echo spacing of 0.2 ms and 50,000 pulses per echo-train; the echo amplitude was determined by averaging 16 points on top of each echo.

The resultant magnetization decays are converted to relaxation time distributions by solving the respective Fredholm integrals of the first kind (i.e., Eqs. 2 and 3) for the probability density functions $\zeta (T1)$ and $\zeta (T2)$. Both are nonnegative least squares problems, for which various numerical solvers are available (Bertsekas 1982; Hansen 1992; Lawson and Hanson 1995). For consistency, all experimental and simulated decays use exactly the same kernels with 256 logarithmically spaced bins for both *T*_{1} and *T*_{2} and are inverted using the algorithm of Lawson and Hanson (1995).

### Joint **T**_{1} and **T**_{2} Simulation

**T**

**T**

The NMR simulations are carried out via a random walk method. Some modifications to our previous implementations have been made over time. Arns (2004) utilized a fixed displacement and fixed timestep random walk following Mendelson (1990) and Bergman et al. (1995) by directly using the natural resolution of the underlying micro-CT image, with the addition of accounting for dephasing. Arns et al. (2011) evolved that approach to one with fixed displacement, but variable timestep while also implementing a higher time resolution through a logical subgridding approach: Given the tomogram voxel size *ε*, random walks are carried out on a lattice with resolution $\epsilon w=\epsilon /ls$, where $\epsilon w$ is the step length of the random walk and *l*_{s} is a grid refinement factor. Furthermore, the approach allows for the existence of effective properties (e.g., voxels which are microporous and exhibit an effective diffusion coefficient). Alhwety et al. (2016) utilized the approach for a dual-scale synthetic laminated sphere pack, tracking the magnetization exchange of different microporosity domains with distance maps embedded into the NMR simulations for spatial analysis. SayedAkram et al. (2016) utilized effective medium approaches for describing the motion and NMR relaxation in Indiana limestone exhibiting significant microporosity, comparing experiment and simulation under mixed-wet conditions. Shikhov et al. (2017) measured and modeled the NMR *T*_{1}-*z* spatially resolved NMR relaxation response for a set of sandstones under centrifuge desaturation conditions resulting in good agreement between experiment and simulation. More recently, Zheng et al. (2018) considered the case of flow propagators and exchange of magnetization between microporous and macroporous regions of a glass-bead sample. The effective medium approach for nonresolved porosity was also utilized in several studies (Cui et al. 2021; Li et al. 2021). In this work, modifications are added, allowing the simultaneous extraction of *T*_{1} and *T*_{2} responses, which adds significant additional analysis options to the approach while also enabling the efficient determination of intrinsic parameters enabling these applications. For this reason, we summarize our method in the following in more detail.

Consider a microstructure given by a 3D regular grid from a micro-CT image segmented into $N$ phases. Then the material properties *at voxel scale* relevant for NMR response simulation can be presented by a set of arrays $T1b\u2208\u211c1\xd7N$, $T2b\u2208\u211c1\xd7N$, $\chi \nu \u2208\u211c1\xd7N$, $\rho 1\u2208\u211cN\xd7N$, $\rho 2\u2208\u211cN\xd7N. De\u2208\u211cN\xd7N$. $De\u2208\u211cN\xd7N$. Here, $T1b$ is longitudinal relaxation time, $T2b$ is transverse relaxation time, $\chi \nu $ is volumetric susceptibility, $\rho 1$ is longitudinal surface relaxivity, $\rho 2$ is transverse surface relaxivity, and $De$ is the effective diffusion coefficient. We note that the array entries represent either intrinsic properties for fully resolved voxels or effective properties. For relaxation-related parameters, we have the element-wise conditions $T2b\u2264T1b$ and $\rho 2\u2265\rho 1$. For a single fluid and a fully saturated sample considered here, random walks start only in voxels where porosity is greater than zero and with a probability proportional to that of porosity. Consequently, the introduced arrays are sparse—not all entries are meaningful and if a full notation is utilized for convenience, those entries may be set to zero. To translate the material properties into timesteps describing the motion and movement and stepwise signal damping factors, we introduce a set of additional quantities, namely $\Delta t\u2208\u211cN\xd7N$, $pmov\u2208\u211cN\xd7N$, $ST1\u2208\u211cN\xd7N$, and $ST2\u2208\u211cN\xd7N$ as timestep, movement probability, and *T*_{1} and *T*_{2} signal damping factors. A random walker at step count $s$ with position $rs$ in a voxel of phase $p$ and internal clock *t*_{s} attempts a step in a random direction (one of six nearest neighbors). If the attempted motion is successful, the random walker changes to position $rs+1$ of phase $q$ with timestep $\Delta tpq=\epsilon w26Dpq$ and signal damping factor following Bergman et al. (1995) as

advancing the clock to $ts+1=ts+\Delta tpq$. Here, the terms in round brackets correspond to survival probabilities for *T*_{1} and *T*_{2} surface relaxation processes, respectively, while the exponential terms account for bulk relaxation. The surface relaxivities are defined in the usual way (e.g., independently for a *T*_{1} or *T*_{2} experiment/simulation). This implies $\rho 2\u2265\rho 1$, as a *T*_{1} surface relaxation will also destroy phase coherence. When jointly simulating *T*_{1} and *T*_{2} relaxation, the relaxation already accounted for by *T*_{1} processes therefore needs to be corrected for in the *T*_{2} process, leading to the prefactor $1/ST1,pq$ in Eq. 5. If the attempted motion is not successful due to $pmov,pq\xa7lt;1$, the damping factors $ST1,pq$ and $ST2,pq$ are still applied for the attempted move. However, the timestep is taken to be $\Delta tpp$ and the internal field (see below) is considered to be the one of the current position. The above formulation allows one to write the signal decay of an individual walker $w$ carrying out $Ns$ steps with initial signal strength *M*_{0} as a product, while recording the *T*_{1} and *T*_{2} decays simultaneously

Here, $\varphi D$ is the accumulated phase shift due to diffusion in an inhomogeneous (offset) internal magnetic field $Bi$, with total magnetic field $B(r)=Bi(r)+B0$. The latter is calculated in the dipole approximation by convolution of the susceptibility field $\chi \nu (r)$ with the dipole field $Bdip$ as

Details of the implementation are given in Arns et al. (2011). Following the random path of a walker over steps $j$ and weighting the phase accumulation due to diffusion in the internal field by the timestep for changing locations, the total phase shift after $Ns$ steps can be written as

$r0$and $rj$ denote the location of the start and later positions of the random walker. The CPMG sequence is implemented by starting random walks with phase $\Delta \varphi D=0$ and switching the sign of $\Delta \varphi D$ at times $(nE+12)tE$, $nE=1,2,\u2026NE$, where $tE$ is the echo spacing and $NE$ is the total number of echoes recorded. Recording at times $t=nEtE$, $nE=1,2,\u2026NE$, and averaging over random walks result in the magnetization decays $MT1(t)$ and $MT2(t)$. We note that $t$ is an integer multiple of $tE$. As timesteps are allowed to vary, individual timesteps are split into two where necessary by linear interpolation for accurate timestepping of the CPMG sequence. Finally, a comment on the choice of *p*_{mov}: We assume continuity of the pore space (maximal correlation) at interfaces. This leads to $pmov,pq=1$ for $\varphi sim,p\u2264\varphi sim,q$, $pmov,pq=\varphi sim,q/\varphi sim,p$ for $\varphi sim,p\xa7gt;\varphi sim,q$, and $pmov,pq=0$ otherwise, where $\varphi sim,p$ denotes the calculated porosity of phase $p$ from simulation. Furthermore, we determine the off-diagonal entries of $De$ by taking the harmonic mean of the respective nonzero diagonal entries. This is the simplest assumption (e.g., treating the interface as two parallel half-spaces).

## Minimization of the **T**_{1}, **T**_{2} Objectives

**T**

**T**

_{2}The specific problem of minimization of the sum of the L2 norm of the fitting residuals for both *T*_{1} and *T*_{2} distributions can be formulated as:

where $xl$ and $xu$ are the lower and upper bounds of the prespecified search domain and $f(T1)$ and $f(T2)$ are L2 norm of the fitting residuals for *T*_{1} and *T*_{2} distributions, respectively, which may be expressed as

with weights for all data points of the *T*_{1} and *T*_{2} distribution being equally 1. Each lower-dimension component $x(Ti),i\u2208{1,2}$ comprises the unique components $x(Ti,unique)=[\rho i,q,log10\u2061Tie,c]$ and the shared component $x(shared)=log10\u2061De,c$, and therefore Eq. 11 specifies the complete solution. The constraints Eqs. 14 and 15 apply because in most practical situations for saturated porous media, *T*_{2} relaxation rate is faster than *T*_{1} due to near-surface interactions specific to transverse relaxation. An important nature of the problem is that the NMR simulator is adapted in a way that both $f(T1)$ and $f(T2)$ can be simulated simultaneously in a single function evaluation, enabling individual modeling of the two objectives to better reveal the structure of the true objective.

### Scalar-Valued GPs

GP is a distribution over functions and is characterized by its mean function $m(x)$ and covariance function $k(x,x\u2032)$ as (Rasmussen and Williams 2005; Li et al. 2021)

where $x=(x1,x2,\u2026,xD)\u22a4$ is a candidate in $\mathcal{X}$ of dimension $D\xd71$, $f(x)$ is the process evaluated at location $x$. In GP, the prior distribution over $f$ is expressed as

where $\mathcal{N}$ denotes normal distribution, $m0\u2208\u211cn\xd71$ denotes the mean vector for observations, $(X,\mathbf{y})={xi,yi}i=1n$ and $(x*,y*)$ denotes observed and predicted candidates, respectively, and $Kf(X,X)$ denotes the $n\xd7n$ covariance matrix for $f$, with the $(p,q)$ entry expressed by $kSE(xp,xq)$. Using the Gaussian likelihood function to accommodate for the noise in observations $\mathbf{y}$ yields $y|f=N(f,\sigma n2I)$. As we are interested in prediction of a single candidate, the joint distribution of the observations $\mathbf{y}$ and a single prediction $y*$ can be expressed as

where $Kf(X,x*)$ is the $n\xd71$ covariance between observed and predicted candidates, $Kf(x*,x*)$ is the variance of the predicted candidate, $m0*$ is the mean for $y*$, and $I$ is the $n\xd7n$ identity matrix. By conditioning the predictions on observations from Eq. 21, the posterior distribution for a single predicted candidate becomes

where $\mathcal{N}$ denotes normal distribution, $\theta s$ is a vector of model hyperparameters for the scalar-valued GP, and

### Vector-Valued GPs

The vector-valued GP indicates GP with multiple outputs. We use $Q$ to represent the number of correlated objective functions to be modeled, with $Q=2$ applied throughout this study. To model the correlations between the $Q$ outputs of $f$, the GP kernel in Eq. 19 has to be replaced by a vector-valued GP kernel, so that the vector-valued function $f$ can be expressed as (Bonilla et al. 2007; Swersky et al. 2013)

where $m(x)\u2208\u211cQ\xd71$ denotes the stacked mean function ${mq(x)}q=1Q$ for each output and $\mathbf{K}$ denotes a matrix-valued function whose entries $(\mathbf{K}(x,x\u2032))q,q\u2032$ express the correlation not only between the input pairs $x$ and $x\u2032$, but also between the outputs pairs $fq(x)$ and $fq\u2032(x\u2032)$. Similar as in Li et al. (2021), the null hypothesis is assumed as there is usually no prior knowledge about our problem; as a result, we use a constant mean function for each objective throughout this study (i.e., $m(x)=m0$), with a stationary covariance function: parameterized squared exponential for each objective. The squared exponential kernel $Kf,q\u2032(X,X)$ is parameterized using automatic relevance determination, that is,

where $\sigma f2$ is the signal amplitude, $\Lambda =diag(\u2113)-2$ accommodates different levels of sensitivity against each component of $x$, with $\u2113=(l1,\u2026,lD)\u22a4$ being a vector of positive values, referred to as length scales. Automatic relevance determination has been incorporated to infer the sensitivity against each dimension of the input, which has been used successfully for removing irrelevant input associated with a very large inferred length scale (Williams and Rasmussen 1995; Rasmussen and Williams 2005; Wipf and Nagarajan 2007). $\sigma f2$ is set to 1 when applied to vector-valued GP.

Various coregionalization models have been proposed in literature (Goovaerts 1997; Gelfand et al. 2004; Majumdar and Gelfand 2007; Krupskii and Joe 2015), and here we use the standard LCM originated from geostatistics to model $f$, with the complete derivation of LCM shown in ** Appendix A**. Assume that there are $n$ distinct observed candidates and observations for each objective [i.e.,$\mathcal{D}obs=(X,\mathbf{y})={(xt,i,yt,i)|i=1,\u2026,n,q=1,\u2026,Q}$]. Using LCM, the vector-valued GP kernel for the latent variable $f$ can be expressed as (Álvarez et al. 2012)

where $\u2297$ is the Kronecker product between matrices, $Q\u2032$ is the number of separable kernels, $Bq\u2032$ is a $Q\xd7Q$ covariance matrix measuring the relationship between the $Q$ outputs, and $Kf,q\u2032(X,X)$ is an $n\xd7n$ covariance matrix measuring the relationship between inputs, with the $(i,j)$ entry specified by $kSE(xi,xj)$. Typically $Q\u2032\u2264Q$ and Eq. 26 degrades to intrinsic coregionalization model if $Q\u2032=1$. Both matrices are symmetric and positive semidefinite. We use a full-rank LCM (i.e., $Q\u2032=Q$) throughout this study.

Eq. 26 is called the sum of separable kernels as the covariance function $\mathbf{K}f(X,X)$ is expressed as the sum of $Q\u2032$ product of two covariance functions. It is observed from Eq. 26 that for each latent function, the correlation within tasks and between tasks are decoupled. Similar to the scalar-valued GP case, we use a Gaussian likelihood to account for the noisy observations (i.e., $\mathbf{y}|f=\mathcal{N}(f,\Sigma n\u2297I)$, where $\Sigma n=diag\u2061(\sigma n2)$ is a $Q\xd7Q$ diagonal matrix accommodating task-dependent noise levels, and $\sigma n2=(\sigma n,12,\u2026,\sigma n,Q2)\u22a4$ is a vector of noise variances). Similar to the case of scalar-valued GP, the posterior distribution of mean and covariance for a single predicted candidate $x*$ can be expressed as

where $\mathbf{y}*=[\mathbf{y}1*,\u2026,\mathbf{y}Q*]\u22a4$ concatenates $Q$ outputs for $x*$, $\theta v$ is a vector of model hyperparameters where $v$ stands for vector-valued GP, and

where $m0*$ denotes the stacked constant mean function for each objective, $\mathbf{K}f(X,x*)$ is the covariance matrix for the latent variable $f$ evaluated at all pairs of observed candidates and predicted candidate, $\mathbf{K}f(x*,x*)$ is the covariance matrix for the predicted candidate, $\mu x*=[\mu 1(x*),\u2026,\mu Q(x*)]\u22a4$ concatenates $Q$ outputs, and $\Sigma x*\u2208\u211cQ\xd7Q$ is the covariance matrix for $Q$ tasks with variances on the diagonal. Fig. 3 shows a comparison of the prediction and uncertainty quantification between scalar-valued GP and vector-valued GP.

It is clear that given unevenly distributed observations for both tasks, the scalar-valued GP cannot truly reflect the unexplored region, whereas the vector-valued GP predicts the trend in unexplored region using knowledge from the related objective, leading to a more informative EI guiding the search.

### Proposal of Promising Candidates

Proposed by Mockus (1975), the EI acquisition function has been widely applied in Bayesian optimization with the standard GP, balancing exploration and exploitation by taking information on value and local uncertainty (Huang et al. 2006; Forrester et al. 2006; Quan et al. 2013). It is defined as the amount of improvement that the evaluation of a candidate is expected to induce.

Extending the EI to the multiobjective case is straightforward. For vector-valued GP, the EI acquisition function associated with the $qth$ component can be expressed as

where $xq-$ is the current best observed candidate for the $qth$ objective and $uq=yq*|X,\mathbf{y},x*,\theta v$ is the short-hand symbol for the Gaussian distributed variable given by Eq. 27. Eqs. 27 and 29 yield

where $\Phi (x)$ and $\phi (x)$ are the cumulative distribution function and the probability density function of the standard normal distribution respectively.

In each iteration, the choice of the promising candidate is based on maximization of the EI acquisition function. The order of maximization requires a choice: Sometimes the *T*_{1} data is only used for verification of the nonunique solution sets obtained from the *T*_{2} inverse problem, whereas sometimes *T*_{1} and *T*_{2} data are considered of equal importance and it might be desirable to have both *T*_{1} and *T*_{2} distributions fitted of equal quality. Formally, we propose two routines for maximization of the EI acquisition function corresponding to these two scenarios:

• $aEI,T2$ is first maximized, followed by maximization of $aEI,T1$ in a subspace constrained by Eqs. 14 and 15. For clarity we denote DT-ISW with sequential maximization of $aEI,Tq$ as DT-ISW-S.

• $aEI,T1$ and $aEI,T2$ are jointly maximized by maximization of the sum of $aEI,T1$ and $aEI,T2$. For clarity, we denote DT-ISW with joint maximization of $aEI,Tq$ as DT-ISW-J.

Throughout this study, $f(T2)$ is assigned more importance as the *T*_{2} decays acquired using CPMG measurement consist of significantly more data points than the *T*_{1} decays acquired during *T*_{1} inversion recovery measurement, making *T*_{2} distributions and the associated objective function more reliable. Maximization of $aEI,Tq$ under linear constraints can be tackled using SQP (Boggs and Tolle 1995). The initial guesses are placed randomly in the feasible domain, ensuring that both the global optimal solution and major local optimal solutions are identified. The performance of DT-ISW-J and DT-ISW-S is compared in the next section.

### Solution Analysis

After the optimization budget is depleted, the model hyperparameters are updated and the *T*_{1}, *T*_{2} objectives are jointly modeled under the new set of hyperparameters. The solution space of Eq. 10 is then divided into various UPs corresponding to various modes of the objective function. The solution with the lowest fitness value in each UP is referred to as the local minimum (LM). For details, see Li et al. (2021). As the two objectives involve noisy measurements which are never noise-free, it is expected that they will be slightly conflicting and there is no solution that simultaneously optimizes each objective precisely. To account for this, we provide the POS of the *T*_{1} and *T*_{2} data fits for the slightly conflicting objectives, as shown in Fig. 4.

POS is defined as the set of solutions that is not dominated by any other solutions. Generally speaking, $x1$ dominates $x2$ if $x1$ beats or ties $x2$ for each of the two objectives; mathematically, it is expressed as $f(i)(x1)\u2264f(i)(x2)$, $\u2200i\u2208{1,2}$, and $f(i)(x1)<f(i)(x2)$, $\u2203i\u2208{1,2}$.

## Results

In this study, we demonstrate the performance of the three approaches: DT-ISW-J, DT-ISW-S, and ST-ISW on Bentheimer sandstone, identifying five parameters by computationally producing *T*_{1} and *T*_{2} distributions almost identical with experiments using the dual-output NMR simulator. The individual optimization for the *T*_{1} and *T*_{2} are presented as reference methods. As was done in Li et al. (2021), practically the base-10 logarithm of the effective relaxation time and of the effective diffusion coefficient are used since they usually vary across orders of magnitude, and the search for $x$ is limited to a domain bounded by the following constraints:

0 $\u2264$$\rho 1,q$/(μm/s), $\rho 2,q$/(μm/s) $\u2264$ 16

−3 $\u2264$$log10\u2061(T1e,c/s)$, $log10\u2061(T2e,c/s)$$\u2264$ 0

−6.28 $\u2264$$log10\u2061[De,c/(cm2/s)]$$\u2264$ −4

and the corresponding 5D solution space $\mathcal{X}$ is rather large so that a grid search or random search is computationally intractable. We compare the performance of the following three approaches discussed in the previous section for the *T*_{1}, *T*_{2} inverse problem: DT-ISW-S, DT-ISW-J, and ST-ISW. As the reference method, the SEP minimizes *T*_{1} and *T*_{2} independently in the 3D solution space as was done in Li et al. (2021), ignoring that $log10\u2061De,c$ is a shared value. We summarize the description of the three approaches and the reference in Table 2 and note that the determined values $\rho 1,q$ and $\rho 2,q$ are the lattice-based discrete surface relaxivities of quartz, whereas the actual continuum values are $A$ times higher, and $A$ is 3/2 following Bergman et al. (1995). The optimization budget is 180 function evaluations.

Method . | Order of Optimization . | Transfer Learning . | Constraints . | Optimizer . |
---|---|---|---|---|

DT-ISW-J | Joint minimization | Yes | Linear | SQP |

DT-ISW-S | Sequential minimization | Yes | Box | SL-PSO and L-BFGS-B |

ST-ISW | Joint minimization | No | Linear | SQP |

SEP | – | No | Box | SL-PSO and L-BFGS-B |

Method . | Order of Optimization . | Transfer Learning . | Constraints . | Optimizer . |
---|---|---|---|---|

DT-ISW-J | Joint minimization | Yes | Linear | SQP |

DT-ISW-S | Sequential minimization | Yes | Box | SL-PSO and L-BFGS-B |

ST-ISW | Joint minimization | No | Linear | SQP |

SEP | – | No | Box | SL-PSO and L-BFGS-B |

### Comparison of the Speed of Convergence

Fig. 5a displays the best fitness value so far for $f(obj)$, together with its two components $f(T1)$ and $f(T2)$ in Figs. 5b and 5c for the three approaches, with SEP plotted as a reference. In terms of the objective $f(obj)$, both DT-ISW-S and DT-ISW-J are more efficient in the first 60 function evaluations, with the speed of convergence almost twice as fast as ST-ISW. In the following 120 steps, DT-ISW-J maintains the same rate of convergence until Step 83, where it finds the lowest fitness value, whereas DT-ISW-S continues to minimize $f(obj)$ and find a lower $f(obj)$ at Step 152. ST-ISW maintains the same convergence speed until the optimization budget is depleted. Minimization of $f(T1)$ is very similar to the case of minimization of $f(obj)$: In the first 32 steps, DT-ISW-J immediately reduces $f(T1)$, whereas due to low priority assigned to *T*_{1} fitting, DT-ISW-S cannot reduce $f(T1)$ since a good proposal of $x(T1)$ is subjected to the valid proposal of $x(T2)$. DT-ISW-S becomes effective since Step 32, and the overall rate of convergence is comparable to DT-ISW-J. However, ST-ISW, with a lower rate of convergence, steadily minimizes $f(T1)$ to the same level as DT-ISW-S, which is slightly higher than 1×10^{−3}. All approaches identified lower $f(T1)$ than the reference methods.

In terms of $f(T2)$, DT-ISW-J, DT-ISW-S, and SEP-*T*_{2} maintain a similar speed of convergence until the fitness values fall below 1$\xd710-3$ before Step 70. In the following steps, both SEP and DT-ISW-S achieve lower fitness values because they can freely evaluate in the solution space without constraints. Though at a slower rate of convergence, ST-ISW steadily reduces $f(T2)$ to the level of 1$\xd710-3$ until its last step. It can be inferred from Fig. 5 that the speed of convergence for both DT-ISW-S and DT-ISW-J are more than twice as fast as ST-ISW. DT-ISW-J achieves good *T*_{1} and *T*_{2} distribution agreement, whereas DT-ISW-S achieves better agreement for *T*_{2} and slightly worse agreement for *T*_{1}.

### Multimodal Solution Analysis by Solution Space Partitioning

As is described in the previous section, ST-ISW caters for the multimodal nature of the objective functions, with solution space partitioning applied in the solution analysis to cluster all solutions to different UPs. Here we apply the same solution analysis procedure to both DT-ISW-S and DT-ISW-J. Table 3 shows identified parameter values of the LMs for the three approaches and the reference. LMs with $f(obj)$ above 1$\xd710-2$ are not listed. All three approaches identified a consistent value of $x$, and all are capable of achieving lower $f(T1)$ values than the reference. SEP-*T*_{2} is affected by the problem of nonunique solutions—although achieving a lower $f(T2)$, the LM 1 identified by SEP-*T*_{2} is less physically valid than LM 2. As a comparison, all approaches identified at least 2 UPs, but from Table 3 we see that for each approach there is only one UP within which the solutions are both mathematically sound and physically valid.

UP . | R . | $\rho 1,q$(μm/s) . | (μm/s) $(\mu m/s)$ | $log10$ $[T1e,c/(s)]$ . | $log10$ $[T2e,c/(s)]$ . | $log10\u2061[De,c/$ $(cm2/s)]$ . | $f(T1)$ $\xd710\u22124$ . | $f(T2)$ $\xd710\u22124$ . | $f(obj)$ $\xd710\u22124$ . | Step . |
---|---|---|---|---|---|---|---|---|---|---|

DT-ISW-J | ||||||||||

1 | 1 | 4.557 | 7.886 | −1.739 | −1.739 | −5.127 | 19.72 | 5.82 | 25.55 | 83 |

2 | 5.749 | 8.232 | −1.775 | −1.853 | −4.891 | 16.69 | 9.85 | 26.54 | 108 | |

3 | 4.667 | 7.830 | −1.811 | −1.811 | −5.035 | 10.45 | 17.52 | 27.97 | 84 | |

2 | 35 | 4.852 | 7.206 | −2.520 | −2.656 | −4.258 | 29.99 | 17.31 | 47.30 | 168 |

36 | 4.392 | 7.219 | −2.606 | −2.635 | −4.306 | 28.67 | 19.19 | 47.86 | 62 | |

40 | 4.852 | 7.193 | −2.575 | −2.701 | −4.203 | 25.73 | 25.79 | 51.51 | 171 | |

DT-ISW-S | ||||||||||

1 | 1 | 4.937 | 7.888 | −1.797 | −1.797 | −5.026 | 14.35 | 4.01 | 18.36 | 152 |

2 | 5.045 | 8.005 | −1.742 | −1.742 | −5.035 | 12.38 | 8.24 | 20.62 | 127 | |

3 | 4.899 | 7.811 | −1.773 | −1.773 | −5.058 | 16.04 | 5.49 | 21.53 | 158 | |

2 | 53 | 5.770 | 8.689 | −2.581 | −2.581 | −4.000 | 30.25 | 18.88 | 49.13 | 92 |

54 | 5.656 | 8.517 | −2.614 | −2.614 | −4.000 | 29.64 | 19.53 | 49.18 | 100 | |

55 | 5.810 | 8.725 | −2.571 | −2.571 | −4.000 | 30.53 | 18.95 | 49.48 | 93 | |

ST-ISW | ||||||||||

1 | 1 | 5.155 | 7.113 | −1.794 | −2.040 | −5.001 | 15.87 | 10.36 | 26.23 | 159 |

2 | 5.134 | 7.092 | −1.774 | −2.006 | −5.045 | 13.96 | 14.51 | 28.47 | 150 | |

3 | 4.761 | 6.996 | −1.753 | −1.966 | −5.106 | 16.02 | 13.80 | 29.82 | 180 | |

2 | 17 | 5.163 | 7.988 | −2.738 | −2.738 | −4.000 | 29.65 | 23.88 | 53.53 | 166 |

19 | 6.291 | 8.490 | −2.469 | −2.614 | −4.000 | 35.61 | 20.00 | 55.61 | 158 | |

23 | 5.976 | 7.946 | −2.599 | −2.790 | −4.000 | 35.54 | 24.76 | 60.30 | 131 | |

SEP-T_{1} | ||||||||||

1 | 1 | 3.951 | - | −2.075 | - | −4.956 | 14.90 | - | - | 162 |

2 | 4.508 | - | −1.882 | - | −5.121 | 15.23 | - | - | 154 | |

3 | 4.612 | - | −1.842 | - | −5.135 | 15.45 | - | - | 177 | |

2 | 21 | 4.720 | - | −2.518 | - | −4.275 | 25.49 | - | - | 144 |

23 | 5.509 | - | −2.672 | - | −4.000 | 30.12 | - | - | 46 | |

26 | 4.860 | - | −2.863 | - | −4.000 | 38.66 | - | - | 99 | |

SEP-T_{2} | ||||||||||

1 | 1 | - | 7.288 | - | −2.068 | −4.907 | - | 2.91 | - | 115 |

2 | - | 7.471 | - | −2.032 | −4.895 | - | 5.11 | - | 139 | |

3 | - | 7.579 | - | −2.017 | −4.883 | - | 6.47 | - | 180 | |

2 | 78 | - | 6.552 | - | −1.958 | −5.247 | - | 23.18 | - | 94 |

99 | - | 7.986 | - | −1.691 | −5.236 | - | 39.85 | - | 164 | |

117 | - | 5.546 | - | −2.321 | −5.138 | - | 95.09 | - | 6 |

UP . | R . | $\rho 1,q$(μm/s) . | (μm/s) $(\mu m/s)$ | $log10$ $[T1e,c/(s)]$ . | $log10$ $[T2e,c/(s)]$ . | $log10\u2061[De,c/$ $(cm2/s)]$ . | $f(T1)$ $\xd710\u22124$ . | $f(T2)$ $\xd710\u22124$ . | $f(obj)$ $\xd710\u22124$ . | Step . |
---|---|---|---|---|---|---|---|---|---|---|

DT-ISW-J | ||||||||||

1 | 1 | 4.557 | 7.886 | −1.739 | −1.739 | −5.127 | 19.72 | 5.82 | 25.55 | 83 |

2 | 5.749 | 8.232 | −1.775 | −1.853 | −4.891 | 16.69 | 9.85 | 26.54 | 108 | |

3 | 4.667 | 7.830 | −1.811 | −1.811 | −5.035 | 10.45 | 17.52 | 27.97 | 84 | |

2 | 35 | 4.852 | 7.206 | −2.520 | −2.656 | −4.258 | 29.99 | 17.31 | 47.30 | 168 |

36 | 4.392 | 7.219 | −2.606 | −2.635 | −4.306 | 28.67 | 19.19 | 47.86 | 62 | |

40 | 4.852 | 7.193 | −2.575 | −2.701 | −4.203 | 25.73 | 25.79 | 51.51 | 171 | |

DT-ISW-S | ||||||||||

1 | 1 | 4.937 | 7.888 | −1.797 | −1.797 | −5.026 | 14.35 | 4.01 | 18.36 | 152 |

2 | 5.045 | 8.005 | −1.742 | −1.742 | −5.035 | 12.38 | 8.24 | 20.62 | 127 | |

3 | 4.899 | 7.811 | −1.773 | −1.773 | −5.058 | 16.04 | 5.49 | 21.53 | 158 | |

2 | 53 | 5.770 | 8.689 | −2.581 | −2.581 | −4.000 | 30.25 | 18.88 | 49.13 | 92 |

54 | 5.656 | 8.517 | −2.614 | −2.614 | −4.000 | 29.64 | 19.53 | 49.18 | 100 | |

55 | 5.810 | 8.725 | −2.571 | −2.571 | −4.000 | 30.53 | 18.95 | 49.48 | 93 | |

ST-ISW | ||||||||||

1 | 1 | 5.155 | 7.113 | −1.794 | −2.040 | −5.001 | 15.87 | 10.36 | 26.23 | 159 |

2 | 5.134 | 7.092 | −1.774 | −2.006 | −5.045 | 13.96 | 14.51 | 28.47 | 150 | |

3 | 4.761 | 6.996 | −1.753 | −1.966 | −5.106 | 16.02 | 13.80 | 29.82 | 180 | |

2 | 17 | 5.163 | 7.988 | −2.738 | −2.738 | −4.000 | 29.65 | 23.88 | 53.53 | 166 |

19 | 6.291 | 8.490 | −2.469 | −2.614 | −4.000 | 35.61 | 20.00 | 55.61 | 158 | |

23 | 5.976 | 7.946 | −2.599 | −2.790 | −4.000 | 35.54 | 24.76 | 60.30 | 131 | |

SEP-T_{1} | ||||||||||

1 | 1 | 3.951 | - | −2.075 | - | −4.956 | 14.90 | - | - | 162 |

2 | 4.508 | - | −1.882 | - | −5.121 | 15.23 | - | - | 154 | |

3 | 4.612 | - | −1.842 | - | −5.135 | 15.45 | - | - | 177 | |

2 | 21 | 4.720 | - | −2.518 | - | −4.275 | 25.49 | - | - | 144 |

23 | 5.509 | - | −2.672 | - | −4.000 | 30.12 | - | - | 46 | |

26 | 4.860 | - | −2.863 | - | −4.000 | 38.66 | - | - | 99 | |

SEP-T_{2} | ||||||||||

1 | 1 | - | 7.288 | - | −2.068 | −4.907 | - | 2.91 | - | 115 |

2 | - | 7.471 | - | −2.032 | −4.895 | - | 5.11 | - | 139 | |

3 | - | 7.579 | - | −2.017 | −4.883 | - | 6.47 | - | 180 | |

2 | 78 | - | 6.552 | - | −1.958 | −5.247 | - | 23.18 | - | 94 |

99 | - | 7.986 | - | −1.691 | −5.236 | - | 39.85 | - | 164 | |

117 | - | 5.546 | - | −2.321 | −5.138 | - | 95.09 | - | 6 |

Fig. 6 shows the fit corresponding to the LMs in Table 3. From appearance, the fittings for the *T*_{1} and *T*_{2} distributions are equally good for all three methods, but DT-ISW-J achieves good fit significantly faster using only 83 steps, as shown in Fig. 5 and Table 3. Apart from the good-enough-fit criteria, the physical validity should also be considered for selection of the best solutions, so that all candidates in Table 3 are checked if the parameters reach boundaries and if the effective diffusion coefficient is larger than bulk water (we use $Dw=2.15\xd710-5cm2/s$ under ambient conditions).

### Identification of the POS

Given Table 3, it is natural to nominate the solution corresponding to LM 1 as the final solution for the inverse problem of *T*_{1}, *T*_{2} parameter estimation. However, we observe that the two objectives are slightly conflicting. Taking DT-ISW-J as an example, the LM 1 achieves the lowest overall $f(obj)$ with a significantly lower value of $f(T2)$ compared with the second-best solution; however, the $f(T1)$ value of LM 1 is inferior to the second-best solution. Consequently, there is no *best* solution as no solution simultaneously minimizes both $f(T1)$ and $f(T2)$. The situation is also true for DT-ISW-J and ST-ISW, which implies that there is slight conflict between the two objectives—improvement of $f(T1)$ cannot be achieved without harming $f(T2)$ and vice versa. The conflict is also revealed in the result of SEP from Table 3 that if $f(T1)$ and $f(T2)$ are individually optimized, $x(T1)$ and $x(T2)$ will neither recover the same value of $log10\u2061De,c$ nor satisfy the constraints Eqs. 14 and 15. As a result, in addition to providing the LM 1 from the three approaches, we analyze the trade-off between the conflicting objectives and provide the POS to further balance the preference of *T*_{1} and *T*_{2} data fits.

The slight conflicts between the two objectives, which exists for all UPs and manifest themselves in log scale, are illustrated in Fig. 7.
As expected, for DT-ISW-J and ST-ISW the $f(T1)$ and $f(T2)$ are equally minimized, whereas for DT-ISW-S the $f(T2)$ is much more preferentially minimized. The identified POS as well as the corresponding *T*_{1} and *T*_{2} fits are shown in Table 4 and Fig. 8.
Comparing Table 4 with Table 3, we see that the POS does not always coincide with the top three solutions identified using solution space partitioning, and the POS gives us more flexibility to trade-off the conflicting objectives. Practically, we place slightly higher confidence in $f(T2)$ due to the larger number of acquired points in the acquisition sequence and nominate the candidate with the lowest $f(T2)$ under the condition that the $f(T1)$ is no more than 2 $\xd710-3$ as the final solution. Final solutions are in bold in Table 3. It is observed from Fig. 7 that candidates evaluated by either DT-ISW-J or DT-ISW-S are more clustered to the bottom-left, whereas candidates evaluated by ST-ISW seem to be evenly distributed in log scale. The following subsection discusses the evolution of the clusterization pattern of POS.

UP . | R . | $\rho 1,q$ (μm/s) . | $\rho 2,q$ (μm/s) . | $log10$ $[T1e,c/(s)]$ . | $log10$ $[T2e,c/(s)]$ . | $log10\u2061[De,c/$ $(cm2/s)]$ . | $f(T1)$ $\xd710\u22124$ . | $f(T2)$ $\xd710\u22124$ . | $f(obj)$ $\xd710\u22124$ . | Step . |
---|---|---|---|---|---|---|---|---|---|---|

DT-ISW-J | ||||||||||

1 | 1 | 4.557 | 7.886 | −1.739 | −1.739 | −5.127 | 19.72 | 5.82 | 25.55 | 83 |

2 | 5.749 | 8.232 | −1.775 | −1.853 | −4.891 | 16.69 | 9.85 | 26.54 | 108 | |

3 | 4.667 | 7.830 | −1.811 | −1.811 | −5.035 | 10.45 | 17.52 | 27.97 | 84 | |

8 | 4.600 | 7.293 | −1.855 | −1.909 | −5.027 | 7.95 | 25.26 | 33.22 | 94 | |

2 | 35 | 4.852 | 7.206 | −2.520 | −2.656 | −4.258 | 29.99 | 17.31 | 47.30 | 168 |

36 | 4.392 | 7.219 | −2.606 | −2.635 | −4.306 | 28.67 | 19.19 | 47.86 | 62 | |

40 | 4.852 | 7.193 | −2.575 | −2.701 | −4.203 | 25.73 | 25.79 | 51.51 | 171 | |

70 | 4.851 | 7.191 | −2.583 | −2.730 | −4.192 | 23.34 | 46.27 | 69.61 | 180 | |

DT-ISW-S | ||||||||||

1 | 1 | 4.937 | 7.888 | −1.797 | −1.797 | −5.026 | 14.35 | 4.01 | 18.36 | 152 |

2 | 5.045 | 8.005 | −1.742 | −1.742 | −5.035 | 12.38 | 8.24 | 20.62 | 127 | |

20 | 5.046 | 7.998 | −1.730 | −1.730 | −5.046 | 11.10 | 21.15 | 32.25 | 128 | |

2 | 109 | 4.899 | 8.870 | −2.554 | −2.554 | −4.000 | 79.58 | 18.79 | 98.37 | 44 |

53 | 5.770 | 8.689 | −2.581 | −2.581 | −4.000 | 30.25 | 18.88 | 49.13 | 92 | |

54 | 5.656 | 8.517 | −2.614 | −2.614 | −4.000 | 29.64 | 19.53 | 49.18 | 100 | |

169 | 4.745 | 15.663 | −2.103 | −2.219 | −4.754 | 15.08 | 2,683.22 | 2,698.30 | 164 | |

ST-ISW | ||||||||||

1 | 4 | 4.823 | 7.025 | −1.762 | −1.979 | −5.086 | 19.84 | 10.15 | 29.99 | 179 |

1 | 5.155 | 7.113 | −1.794 | −2.040 | −5.001 | 15.87 | 10.36 | 26.23 | 159 | |

2 | 5.134 | 7.092 | −1.774 | −2.006 | −5.045 | 13.96 | 14.51 | 28.47 | 150 | |

5 | 5.137 | 7.115 | −1.813 | −2.068 | −4.974 | 11.61 | 19.12 | 30.73 | 173 | |

2 | 19 | 6.291 | 8.490 | −2.469 | −2.614 | −4.000 | 35.61 | 20.00 | 55.61 | 158 |

17 | 5.163 | 7.988 | −2.738 | −2.738 | −4.000 | 29.65 | 23.88 | 53.53 | 166 | |

29 | 5.294 | 7.092 | −2.681 | −3.000 | −4.000 | 28.68 | 47.36 | 76.05 | 85 |

UP . | R . | $\rho 1,q$ (μm/s) . | $\rho 2,q$ (μm/s) . | $log10$ $[T1e,c/(s)]$ . | $log10$ $[T2e,c/(s)]$ . | $log10\u2061[De,c/$ $(cm2/s)]$ . | $f(T1)$ $\xd710\u22124$ . | $f(T2)$ $\xd710\u22124$ . | $f(obj)$ $\xd710\u22124$ . | Step . |
---|---|---|---|---|---|---|---|---|---|---|

DT-ISW-J | ||||||||||

1 | 1 | 4.557 | 7.886 | −1.739 | −1.739 | −5.127 | 19.72 | 5.82 | 25.55 | 83 |

2 | 5.749 | 8.232 | −1.775 | −1.853 | −4.891 | 16.69 | 9.85 | 26.54 | 108 | |

3 | 4.667 | 7.830 | −1.811 | −1.811 | −5.035 | 10.45 | 17.52 | 27.97 | 84 | |

8 | 4.600 | 7.293 | −1.855 | −1.909 | −5.027 | 7.95 | 25.26 | 33.22 | 94 | |

2 | 35 | 4.852 | 7.206 | −2.520 | −2.656 | −4.258 | 29.99 | 17.31 | 47.30 | 168 |

36 | 4.392 | 7.219 | −2.606 | −2.635 | −4.306 | 28.67 | 19.19 | 47.86 | 62 | |

40 | 4.852 | 7.193 | −2.575 | −2.701 | −4.203 | 25.73 | 25.79 | 51.51 | 171 | |

70 | 4.851 | 7.191 | −2.583 | −2.730 | −4.192 | 23.34 | 46.27 | 69.61 | 180 | |

DT-ISW-S | ||||||||||

1 | 1 | 4.937 | 7.888 | −1.797 | −1.797 | −5.026 | 14.35 | 4.01 | 18.36 | 152 |

2 | 5.045 | 8.005 | −1.742 | −1.742 | −5.035 | 12.38 | 8.24 | 20.62 | 127 | |

20 | 5.046 | 7.998 | −1.730 | −1.730 | −5.046 | 11.10 | 21.15 | 32.25 | 128 | |

2 | 109 | 4.899 | 8.870 | −2.554 | −2.554 | −4.000 | 79.58 | 18.79 | 98.37 | 44 |

53 | 5.770 | 8.689 | −2.581 | −2.581 | −4.000 | 30.25 | 18.88 | 49.13 | 92 | |

54 | 5.656 | 8.517 | −2.614 | −2.614 | −4.000 | 29.64 | 19.53 | 49.18 | 100 | |

169 | 4.745 | 15.663 | −2.103 | −2.219 | −4.754 | 15.08 | 2,683.22 | 2,698.30 | 164 | |

ST-ISW | ||||||||||

1 | 4 | 4.823 | 7.025 | −1.762 | −1.979 | −5.086 | 19.84 | 10.15 | 29.99 | 179 |

1 | 5.155 | 7.113 | −1.794 | −2.040 | −5.001 | 15.87 | 10.36 | 26.23 | 159 | |

2 | 5.134 | 7.092 | −1.774 | −2.006 | −5.045 | 13.96 | 14.51 | 28.47 | 150 | |

5 | 5.137 | 7.115 | −1.813 | −2.068 | −4.974 | 11.61 | 19.12 | 30.73 | 173 | |

2 | 19 | 6.291 | 8.490 | −2.469 | −2.614 | −4.000 | 35.61 | 20.00 | 55.61 | 158 |

17 | 5.163 | 7.988 | −2.738 | −2.738 | −4.000 | 29.65 | 23.88 | 53.53 | 166 | |

29 | 5.294 | 7.092 | −2.681 | −3.000 | −4.000 | 28.68 | 47.36 | 76.05 | 85 |

### Sensitivity Analysis

The inferred parameterization of LCM is displayed in Table 5**,** corresponding to the steps when the LM 1 is identified in Table 3. Taking DT-ISW-J as an example, both $f(T1)$ and $f(T2)$ can be described as a linear combination of samples from two GPs, *u*_{1} and *u*_{2}, in addition to the constant mean and Gaussian noise *u*_{3}:

Approach . | $\theta v,L,1$ . | $\theta v,L,2$ . | $m0$ . | $\sigma n2$ . | $\u2113$ . |
---|---|---|---|---|---|

DT-ISW-J | $[0.3440.6450]$ | $[1.4353.6800]$ | $[0.7101.819]$ | $[8.991\xd710\u221276.682\xd710\u22127]$ | $[5.030,1.179,1.71710.075,2.316,3.197]$ |

DT-ISW-S | $[0.2130.2590]$ | $[1.0542.3970]$ | $[1.8190.791]$ | $[1.164\xd710\u221262.533\xd710\u22127]$ | $[3.818,1.225,1.79211.077,2.017,1.871]$ |

ST-ISW | $2.287$ | – | $1.878$ | $[14.046,11.236,1.358,2.622,2.153]$ | |

SEP-T_{1} | $0.312$ | – | $0.560$ | $1.424\xd710\u22127$ | $[5.605,0.899,2.102]$ |

SEP-T_{2} | $0.182$ | – | $0.482$ | $1.219\xd710\u22126$ | $[4.788,0.973,2.062]$ |

Approach . | $\theta v,L,1$ . | $\theta v,L,2$ . | $m0$ . | $\sigma n2$ . | $\u2113$ . |
---|---|---|---|---|---|

DT-ISW-J | $[0.3440.6450]$ | $[1.4353.6800]$ | $[0.7101.819]$ | $[8.991\xd710\u221276.682\xd710\u22127]$ | $[5.030,1.179,1.71710.075,2.316,3.197]$ |

DT-ISW-S | $[0.2130.2590]$ | $[1.0542.3970]$ | $[1.8190.791]$ | $[1.164\xd710\u221262.533\xd710\u22127]$ | $[3.818,1.225,1.79211.077,2.017,1.871]$ |

ST-ISW | $2.287$ | – | $1.878$ | $[14.046,11.236,1.358,2.622,2.153]$ | |

SEP-T_{1} | $0.312$ | – | $0.560$ | $1.424\xd710\u22127$ | $[5.605,0.899,2.102]$ |

SEP-T_{2} | $0.182$ | – | $0.482$ | $1.219\xd710\u22126$ | $[4.788,0.973,2.062]$ |

where $u1\u223cGP(0,kSE,1(x,x\u2032))$, $u2\u223cGP(0,kSE,2(x,x\u2032))$, and $u3\u223c\mathcal{N}(0,1)$, respectively, and

It is clear from Table 5 that DT-ISW-J considers the two objective functions as the combination of a major component with larger length scales (*u*_{2}) capturing the trends of the objective functions and a secondary component with smaller length scales (*u*_{1}) catering for the local variations of the objective functions. Meanwhile, the amplitude of the secondary component is less than one-fourth of the main component. This is the same for DT-ISW-S. As introduced before, since automatic relevance determination determines the sensitivity of the objective function against each component of the input $x$; we immediately conclude from Eq. 33 that an increase or decrease of 10.075 in $\rho 1,q$/(μm/s) or $\rho 2,q$*ρ*_{2}_{,}_{q}/(μm/s) is equivalent to an increase or decrease of 2.316 in $log10\u2061(T1e,c/s)$ or $log10\u2061(T2e,c/s)$ or a change of 3.197 in $log10\u2061[De,c/(cm2/s)]$ in terms of the reduction in $f(obj)$. Therefore, comparison of the sensitivity can be expressed as: $log10\u2061T1e,c$, $log10\u2061T2e,c$ > $log10\u2061De,c$ > $\rho 1,q$, $\rho 2,q$. Ideally, the unknowns should be chosen so that $f(obj)$ is equally sensitive to all unknowns. Practically, a physical quantity is considered redundant or insensitive if the associated inferred length scale is too large. For the chosen experimental settings and sample this is the case for surface relaxivities of the minority phases and the spatial distribution of susceptibility. Namely, feldspar is considered identical with quartz as the change in $f(obj)$ is negligible even though the properties of feldspar change moderately. Similarly, the volumetric magnetic susceptibilities $\chi \nu $ of the constituent materials are regarded as fixed parameters in the presence of short $tE$ and small $B0$.

### Comparing the Efficiency Dynamically

Another observation is that for DT-ISW-J and DT-ISW-S, most function evaluations are spent in the promising region where both $f(T1)$ and $f(T2)$ are low, whereas ST-ISW selects significantly more candidates in the nonpromising region for exploration. To illustrate how the three approaches efficiently minimize the two objectives by trading off exploration and exploitation, we show in Figs. 9a through 9d the top 10 candidates corresponding to each UP identified using the three approaches at 60, 100, 140, and 180 steps. The four plots clearly show that all approaches successfully identified the promising region with various speeds of convergence.

In summary, DT-ISW-S and DT-ISW-J are more than twice as efficient as ST-ISW mainly due to the following: (1) Both DT-ISW-S and DT-ISW-J simplify the regression by decomposition of the 5D problem into a dual 3D problem; (2) both DT-ISW-S and DT-ISW-J incorporate transfer learning ability, so that the *T*_{1} and *T*_{2} objectives can leverage information from each other. It is worth noting that both DT-ISW-J and ST-ISW equally minimize $f(T1)$ and $f(T2)$, whereas DT-ISW-S is more powerful in minimization of $f(T2)$ due to the higher priority assigned to $f(T2)$. For optimization problems involving multiple objectives, the multimodal search strategy together with the Pareto solution analysis gives us more flexibility in trading off the conflicting objectives.

## Discussion

### Nonunique Solution Sets

One major concern of the inverse problem is that many different sets of property estimates may provide satisfactory data fits, requiring additional knowledge to identify the most probable and realistic solution set. The number of nonunique solution sets is positively correlated to the number of main minerals or main unknowns but is usually reduced by the accompanying constraints. If the main phase physical properties are dependent [e.g., via (in)equality constraints], incorporation of multiple measurements not only significantly speeds up the convergence but also reduces the number of nonunique solution sets and increases the confidence level of the results (e.g., the workflow for simultaneous fitting of *T*_{1} and *T*_{2} presented in this work identifies to two or less unique solution sets compared with our previous work of fitting *T*_{2} distributions under various signal-to-noise ratios in which usually two to five unique solution sets are identified) (Li et al. 2021). If they are independent, the number of nonunique solution sets will increase, but there is still very high chance that the global optimal solution set is captured due to application of the multistart SQP optimizer under a larger optimization budget.

### Extension to Higher Fields

In this paper, internal magnetic field effects on NMR relaxation resulting from phase susceptibility contrast are incorporated directly into the simulation of *T*_{2} relaxation responses. Furthermore, their influence on relaxation is small due to the chosen small interecho time ($tE=200$ μs) and the relatively low NMR proton frequency of 2 MHz. At higher fields, *T*_{2} is sensitive to internal field inhomogeneities, necessitating an accurate calculation of the internal magnetic field given the spatial distribution of the volumetric magnetic susceptibility. A first step toward this goal was recently presented by Cui et al. (2022), enabling a high-resolution multiscale calculation of the internal magnetic field. The latter may require an even more sophisticated characterization workflow to assign mineral phases in segmented images and the direct measurement of the susceptibility of some of the individual phases to constrain modeling. The advantage would be the access to *T*_{1} frequency dispersion [in particular $\rho 1(f)$] for wettability characterization and the fact that multiple *T*_{1} measurements at different field strength or *T*_{2} measurements with varying interecho time can further constrain the inversion for unknown intrinsic surface properties.

### Application to Other Rocks

The samples considered in this work are outcrops of the Bentheimer sandstone formation, which contains a relatively minute portion of clay minerals and is relatively homogeneous. Many sandstone reservoir formations exhibit a higher degree of heterogeneity and thus wider distribution in pore sizes but also a more complex mineralogy. This poses challenges in terms of balancing field of view and resolution in the micro-CT or generally the 3D image acquisition and processing part of the presented workflow. None of these are insurmountable however: for NMR responses, the self-diffusion coefficient of relevant fluids and gases, and hence traveled distance for relevant relaxation times, implies that responses of domains larger than about 1 mm^{3} are independent. Thus, heterogeneity in pore and grain sizes can be dealt with by a decomposition of larger cores (full core or core plugs) into smaller domains, potentially combined with a microstructure typing regime to enable a multiscale image acquisition workflow, for example, Jiang and Arns (2020a,-2021). NMR responses can then be derived as ensemble averages over respective domains. Regional variation in clay content or mineralogy may cause local apparent surface relaxivities; decomposing these into intrinsic mineral-specific ones as demonstrated in this work and accounting for local variations in clay distribution should provide an efficient pathway for samples with more complex mineralogy. Given the very efficient workflow presented, multiscale solvers such as Cui et al. (2022) can be utilized in this approach despite their high computational cost.

It is clear that for reservoir rocks the wettability state and wettability alteration for future enhanced oil recovery scenarios are important considerations. The presented workflow, in particular when combined with pore-scale modeling of fluid flow, may provide significant additional insight into the distribution of surfaces with different wetting properties. This may lead to enhanced interpretations of NMR *T*_{1} and *T*_{2} spectra or higher-dimensional NMR measurements like *T*_{1}-*T*_{2} correlation maps at partial fluid saturations.

The application of the presented technique to long cored sections of reservoirs would benefit from transferring knowledge from already processed cores when similarities between core samples exist. This leads to the concept of transfer learning for study of rock heterogeneity, which is out of scope for the work presented here (see Li et al. 2022).

Finally, we comment on future applications to carbonate rocks, which are often mono-mineralic but exhibit a complex multiscale pore structure. Here the spatial distribution of wettability and connectivity of the pore space may be considered the major target variables in an inverse solution workflow.

## Conclusions

In this paper, we introduced a DT-ISW enabling the extraction of multiple intrinsic physical properties associated with *T*_{1} and *T*_{2} relaxation responses by joint minimization of the deviation between simulated and measured *T*_{1} and *T*_{2} distributions. An LCM is used in the implementation of transfer learning. This, together with the adaption of the NMR simulator providing both *T*_{1} and *T*_{2} responses simultaneously, makes the implementation highly efficient.

The workflow is demonstrated on Bentheimer sandstone, identifying five parameters by computationally producing *T*_{1} and *T*_{2} distributions almost identical with experiments using the dual-output simulator. We compared the performance of a joint optimization (DT-ISW-J) of the L2 norm of *T*_{1} and *T*_{2} fitting residuals with a sequential implementation (DT-ISW-S) and a single-task optimization (ST-ISW) and draw the following specific conclusions:

DT-ISW-J and DT-ISW-S achieve twice the speed of convergence of ST-ISW due to the statistical modeling of the two highly correlated objectives aided by transfer learning using LCM.

DT-ISW-J and ST-ISW equally minimize the L2 norm of

*T*_{1}and*T*_{2}fitting residuals, whereas DT-ISW-S preferentially minimizes the objective assigned higher importance.A POS allows balancing

*T*_{1}and*T*_{2}data fits as fitting*T*_{1}and*T*_{2}responses can result in conflicting objectives.For both

*T*_{1}and*T*_{2}, measured and simulated relaxation time distributions are in excellent agreement over the entire distribution function including short components. This suggests that the conventional interpretation of NMR relaxation time distributions can be carried out on simulated responses, including cutoff analysis for permeability correlations or estimation of bound water. Given the excellent agreement to experiment and availability of micro-CT images, more complex mappings to petrophysical parameters may be introduced.The method extracts the intrinsic surface relaxivities associated with mineral-fluid contacts, not the heuristic surface relaxivities typically extracted when mapping to external length scale measurements (e.g., mercury intrusion capillary measurements). It also accounts for internal gradients directly. This can be of significant interest when interpreting

*T*_{1}and*T*_{2}measurements at different field strength.While the study is focused on identification of parameters governing

*T*_{1}and*T*_{2}relaxation, the method can be easily extended to correlation measurements (e.g.,*T*_{1}-*T*_{2}and $D$-*T*_{2}). In turn, respective simulations should provide significant insights into the cause of various features of these distributions and thus improve their interpretation.

## Nomenclature

*$aEI,T1andaEI,T2$*EI acqusitions function related to the

*T*_{1}and*T*_{2}objectives, respectively**B**or**B**(**r**)total magnetic field (at location

**r**), Gauss**B**_{0}applied static magnetic field, Gauss

**B**_{dip}dipole field, Gauss

**B**_{i}or**B**_{i}(**r**)inhomogeneous (offset) internal magnetic field (at location

**r**), Gauss**B**_{q′}*Q*×*Q*covariance matrix measuring the relationship between the*Q*outputs of the objective*q*′*B*_{z}*z*component of the total magnetic field, Gauss*D*length of the input

**x**, equivalent to the number of unknowns*D*_{e,c}effective diffusion coefficient of water in the clay region, cm

^{2}/s*D*_{e}or*D*_{e,pq}diffusion coefficient (from phase

*p*to phase*q*), cm^{2}/s*D*_{obs}observation set

*f*^{(obj)}or*f*^{(obj)}(**x**)objective function (as a function of

**x**)*f*_{p},*f*_{c},*f*_{q},*f*_{f},*f*_{h}phase fraction of bulk pore, clay, quartz, feldspar, and high-density phase, respectively, calculated from CT images

- $f(T1)andf(T2),orf(T1)(x(T1))andf(T2)(x(T2))$
objective functions associated with

*T*_{1}and*T*_{2}(as a function of $x(T1)$ and $x(T2)$, respectively *f*(**x**)scalar-valued GP

**f**(**x**)vector-valued GP

*G*average or effective internal gradient strength, Gauss

*$GP$*Gaussian process; for scalar-valued $GP,f(x)\u223cGP(m(x),k(x,x\u2032))$, the function is distributed as a Gaussian process with mean function

**m**(**x**) and covariance function $k(x,x\u2032)$; for vector-valued GP, $f(x)\u223cGP(m(x),K(x,x\u2032))$, the function is distributed as a Gaussian process with mean function**m**(**x**) and covariance function $K(x,x\u2032)$*I*identity matrix

*k*_{SE}, or*k*_{SE}(**x**,_{p}**x**)_{q}parameterized squared exponential kernel (for candidates

**x**and_{p}**x**)_{q}- $k(x,x\u2032)$
covariance function for scalar-valued GP

**K**or**K**_{f}(*X, X*)covariance matrix for the (noise-free) latent function values

**f**for vector-valued GP*$KforKf(X,X)$*covariance matrix for the (noise-free) latent function values

**f**for scalar-valued GP- $Kf,q\u2032orKf,q\u2032(X,X)$
covariance matrix for the (noise-free) latent function values

**f**for the objective*q*′ for vector-valued GP - $Kf(x\u2217,x\u2217)$
variance of predicted candidate in scalar-valued GP

- $Kf(x\u2217,X)$
covariance between predicted and observed candidates in scalar-valued GP

**K**_{f}(**x**^{∗},**x**^{∗})covariance of predicted candidate for vector-valued GP

**K**(_{f}**x**^{∗},*X*)covariance between predicted and observed candidates for vector-valued GP

- $K(x,x\u2032)$
covariance function for vector-valued GP, with its entries $(K(x,x\u2032))q,q\u2032$ expressing the correlation not only between the input pairs

**x**and**x**′ but also between the outputs pairs $fq(x)$ and $fq\u2032(x\u2032)$ *l*_{s}grid refinement factor

*m*_{0}scalar-valued constant mean function

**m**_{0}vector-valued constant mean function

- $m0\u2217$
constant mean for a single predicted candidate

- $m0\u2217$
vector of constant mean for a single predicted candidate

**m**_{1,exp}and**m**_{t,exp}measured longitudinal and transverse relaxation decay, respectively

**m**_{1,sim}and**m**_{t,sim}simulated longitudinal and transverse relaxation decay, respectively

- $mT1(t)andmT2(t)$
*T*_{1}and*T*_{2}magnetization decay as function of*t*, respectively *m*(**x**)scalar-valued mean function

**$m(x)$**vector-valued mean function, equivalent to ${mq(x)}q=1Q$

*M*_{0}initial signal strength for a random walker

- $MT1(t)andMT2(t)$
overall

*T*_{1}and*T*_{2}magnetization decays, respectively - $MT1,w(tNs)andMT2,w(tNs)$
recorded

*T*_{1}and*T*_{2}decays for a random walker after*N*time steps, respectively_{s} *n*number of observed candidates

*n*_{E}echo count

*N*number of phases

*N*_{E}total number of echoes

*N*_{s}total number of steps carried out by a random walker

- $N(\mu (X\u2217),\u2211(X\u2217))$
normal distribution: a Gaussian (normal) distribution with mean vector

**μ**and covariance matrix Σ *p*_{mov}or*p*_{mov,pq}movement probability (traversing from phase

*p*to phase*q*)*Q*number of correlated objective functions

- $Q\u2032$
total number of seperable kernels

**r**_{0}and**r**_{j}location of the start and later positions of a random walker, respectively, µm

**r**_{s}and**r**_{s+1}position of a random walker at step count

*s*and*s*+ 1, respectively, µm*s*step count

*S*surface area of an individual pore, µm

^{2}- $ST1andST2orST1,pqandST2,pq$
*T*_{1}and*T*_{2}signal damping factors (traversing from phase*p*to phase*q*), respectively *t*elapsed time, seconds

*t*_{E}echo time, seconds

*t*_{s}and*t*_{s+1}internal clock for a random walker at step count

*s*and*s*+ 1, respectively, seconds*T*_{1b}and*T*_{2b}*T*_{1}and*T*_{2}relaxation time, respectively, seconds*T*_{1e,c},*T*_{1b,w}effective

*T*_{1}of the clay region and bulk water, respectively, seconds*T*_{2e,c},*T*_{2b,w}effective

*T*_{2}of the clay region and bulk water, respectively, seconds*V*volume of an individual pore, μm

^{3}*w*an individual walker

**x**input for the objective function, $x=(x1,x2,...xD)T$

**x***a single predicted candidate

**x**_{l}and**x**_{u}lower and upper bounds of the pre-specified search domain

- $xq\xaf$
the current best observed candidate for the

*q*th objective function **x**^{(shared)}shared component of $x(Ti)$

- $x(Ti)$
lower-dimension component of

**x**,*i*$\u2208$ {1, 2} - $x(Ti,unique)$
unique component of $x(Ti)$

- $x(T1)andx(T2)$
input for the objective function associated with

*T*_{1}and*T*_{2}, respectively *X*matrix of observed candidates

- $X$
solution space

*χ*_{ν}volumetric magnetic susceptibility, µSI

*χ*_{ν,w},*χ*_{ν,c},*χ*_{ν,q},*χ*_{ν,f},*χ*_{ν,h}volumetric magnetic susceptibility of water, clay, quartz, feldspar and high-density phase, respectively, µSI

**y**vector of observations

*y*or*y*_{q}prediction (for the objective

*q*)**y***vector of predictions

*y**prediction for a single candidate

*γ*gyromagnetic ratio, 2.6753×10

^{4}, rad / (Gauss⋅s)*ε*lattice resolution, or voxelsize, µm

*ε*_{w}step length of a random walk, µm

- Δ
*t*or Δ*t*_{pq}timestep for a random walker (traversing from phase

*p*to phase*q*), seconds *$\Delta \varphi D,w$*total phase shift for a random walker, rad

- $\theta s\u2032or\theta s$
vector of model hyperparameters for scalar-valued GP (in log

_{10}scale) - $\theta v\u2032or\theta v$
vector of model hyperparameters for vector-valued GP (in log

_{10}scale) - $\u2211x\u2217$
*Q*×*Q*predictive covariance matrix for a single observed candidate using vector-valued GP - $\u2211n$
*Q*×*Q*diagonal matrix accommodating task-dependent noise levels, $\u2211n=diag(\sigma n2)$ - $\zeta sim(T1),\zeta exp(T1),\zeta sim(T2)and\zeta exp(T2)$
probability density function of

*T*_{1}and*T*_{2}, inverted from simulated and experimental decay, respectively - $\zeta sim(T1|x(T1))and\zeta sim(T2|x(T2))$
probability density function of

*T*_{1}and*T*_{2}, inverted from simulated decay and conditioned on the input $x(T1)andx(T2)$, respectively *η*(_{n}*t*)white noise

**Λ**matrix of characteristic length scales,

**Λ**= diag (ℓ)^{−2}*λ*regularization parameter

- $\mu (x\u2217)$
predictive mean for a single predicted candidate using scalar-valued GP

- $\mu (x\u2217)$
predictive mean for a single predicted candidate using vector-valued GP

*ρ*_{1,q}and*ρ*_{2,q}*T*_{1}and*T*_{2}surface relaxivity of quartz, respectively, µm/ s- $\rho 1\u2032and\rho 2\u2032$
longitudinal and transverse surface relaxivities derived by adjusting the dominant mode of the NMR relaxation time measurements to an external length scale reference measurement, respectively, µm/ s

*ρ*_{1}and*ρ*_{2}, or*ρ*_{1,pq}and*ρ*_{2,pq}longitudinal and transverse surface relaxivity (traversing from phase

*p*to phase*q*), respectively, µm/s- $\sigma f2$
signal amplitude of covariance function

- $\sigma n2$
noise variance of covariance function

- $\sigma n2$
vector of noise variance,$\sigma n2=(\sigma n,12,...,\sigma n,Q2)T$

- $\sigma 2(x\u2217)$
predictive variance for a single predicted candidate using scalar-valued GP

*τ*time delay in inversion-recovery pulse sequence, seconds

*τ*_{E}interecho time, seconds

*φ*(*x*)probability density function

- Φ(
*x*)cumulative distribution function

- $\varphi $
porosity from experiment

- $\varphi D$
accumulated phase shift due to diffusion in

**B**_{i}, rad - $\varphi sim$
porosity from simulation

- $\varphi sim,pand\varphi sim,q$
calculated porosity from simulation, related to phase

*p*and*q*, respectively

## Data Statement

The tomogram data that support the findings of this study are made available online at the Digital Rocks Portal (https://doi.org/10.17612/1J6K-SH07).

## Acknowledgments

CHA acknowledges the Australian Research Council for funding through ARC discovery project DP200103548. This work was supported by computational resources provided by the Australian Government through gadi under the National Computational Merit Allocation Scheme (grant m65).

### Appendix A—Derivation of LCM

Given the data set $\mathcal{D}obs$, the *j*th outputs are indeed expressed as the sum of the mean function and linear combinations of independent random functions, that is,

where the latent functions $uq(x)$ are independent samples from a specific GP following $\mathcal{N}(0,kq(x,x\u2032))$, $aj,q$ is the scalar coefficient specifying relative contribution associating with each $uq(x)$ for the *j*th task. The independence assumption can be relaxed when the objective functions are expressed through a linear combination of nonorthogonal latent functions (Vargas-Guzmán et al. 2002). The significance of nonorthogonality is that in some real cases, a linear combination may be required to couple several correlated attributes in a single model to include negative cross-covariance between components. Since the modeled objectives are usually considered positively correlated, expressing the objective function by nonorthogonal latent functions are not thoroughly discussed here and we leave it to the interested readers for in-depth studies. Here we implement automatic relevance determination to the covariance function $kq(x,x\u2032)$. Under independency assumption, the covariance between $uq(x)$ is given by $cov\u2061[uq(x),uq\u2032(x\u2032)]=kq(x,x\u2032)$ if $q=q\u2032$, and Eq. (A.1) can be rewritten by grouping together the latent functions $uq(x)$ which share the same covariance, that is,

Eq. (A-2) can be viewed as linear combinations of $Q$ groups of functions $uqi(x)$ within which the independent functions $uqi(x)$ share the same covariance. Under independency assumption, the covariance between any pair of $uqi(x)$ can expressed as

which exactly specifies the entries of $(\mathbf{K}(x,x\u2032))j,j\u2032$. The multitask kernel can be then expressed as

where $bj,j\u2032q=\u2211i=1Rqaj,qiaj\u2032,qi$ specifies entries of the coregionalization matrix $Bq$.

### Appendix B—Parameterization of LCM

Compared with parameterization of the scalar-valued GP in Li et al. (2021), the vector-valued GP requires inference on the additional correlation matrix $Bq\u2032$. To enforce positive definiteness and also suggested by Swersky et al. (2013), we represent $Bq\u2032$ using its Cholesky factor $Lq\u2032$, where $Bq\u2032=Lq\u2032Lq\u2032\u22a4$, as

whereas enforcing that the diagonal of $Lq\u2032$ is nonnegative. Assuming constant mean functions for each task with a full-rank LCM, we give the parameterization of the vector-valued GP in Table B-1.

Task . | $\theta v,L$ . | $\sigma n2$ . | $m0$ . | $\u2113$ . |
---|---|---|---|---|

1 | $L1,1,1,\u2026,LQ,Q,1$ | $\sigma n,12$ | $m0,1$ | $l1,12,\u2026l1,D2$ |

2 | $L1,1,2,\u2026,LQ,Q,2$ | $\sigma n,22$ | $m0,2$ | $l2,12,\u2026l2,D2$ |

… | … | … | … | … |

$Q$ | $L1,1,Q,\u2026,LQ,Q,Q$ | $\sigma n,Q2$ | $m0,Q$ | $lQ,12,\u2026lQ,D2$ |

Task . | $\theta v,L$ . | $\sigma n2$ . | $m0$ . | $\u2113$ . |
---|---|---|---|---|

1 | $L1,1,1,\u2026,LQ,Q,1$ | $\sigma n,12$ | $m0,1$ | $l1,12,\u2026l1,D2$ |

2 | $L1,1,2,\u2026,LQ,Q,2$ | $\sigma n,22$ | $m0,2$ | $l2,12,\u2026l2,D2$ |

… | … | … | … | … |

$Q$ | $L1,1,Q,\u2026,LQ,Q,Q$ | $\sigma n,Q2$ | $m0,Q$ | $lQ,12,\u2026lQ,D2$ |

In addition to the $Q$ mean and $Q$ noise variance hyperparameters, LCM calls for $Q2(Q+1)/2$ hyperparameters in the Cholesky factor and $DQ$ in length scales. In summary, the parameterization of the vector-valued GP using LCM can be expressed as $\theta v\u2032=[\theta v,L,\sigma n2,m0,\u2113],$ and practically the following parameterization scheme is adopted, that is,

The full-rank LCM ($Q\u2032=Q$) and reduced-rank LCM ($Q\u2032\u2264Q$) are suitable for objectives with moderate and high similarities respectively, and in this study the full-rank LCM is adopted given no information about the intertask correlation.

In ST-ISW, we adopted a fully Bayesian treatment on model hyperparameters (Snoek et al. 2012; Li et al. 2021). However, such treatment soon becomes intractable due to the cubic growth of the computational burden with the number of observed candidates. As a result, the model hyperparameters are determined by maximum likelihood estimation. We use the multistart L-BFGS-B to ensure that the hyperparameter settings is globally optimum.

## Article History

Original SPE manuscript received for review 22 December 2021. Revised manuscript received for review 26 April 2022. Paper (SPE 210563) peer approved 28 April 2022.

## References

_{1}- T

_{2}Response of Moveable and Non-Moveable Fluids in Conventional and Unconventional Rocks