# Introduction

# #1.1Research context

One of the goals of the applied geosciences is to gain an understanding of subsurface structures and processes. These understandings are often used to make predictions and decisions associated with commercial and environmental challenges, including contaminant delineation, resource exploration, reservoir optimization, and watershed characterization. The accuracy of these predictions can have far-reaching economic and environmental implications. There are many disciplines and skills that are involved in providing predictions, and increasingly these disciplines must collaborate and integrate their domain-specific knowledge. In a managed aquifer recharge project, for example, the goal is to infiltrate water into the subsurface for storage and subsequent recovery. Throughout the lifetime of the project, monitoring and management of the infiltration site is necessary (e.g. Racz *et al.*, 2011Daily *et al.*, 1992Park, 1998). Such projects require input from geology, hydrology, and geophysics in order to map the hydrostratigraphy, collect and interpret time-lapse geophysical measurements, and integrate all results to make predictions and decisions about fluid movement at the site. The quantitative integration of the geosciences is far from trivial as each discipline has differing descriptive terminology, as well as software tools that are domain specific with limited interoperability.

In the following two subsections, I will independently introduce two disciplines within applied geoscience: (a) geophysical inversions, and (b) hydrogeophysics in the vadose zone. The current state of these disciplines gives context to the work that follows and motivates research into a computational framework that improves the quantitative communication between methodologies and researchers. The subsequent two sections expand on these ideas and identify a significant computational challenge of hydrogeophysics in the vadose zone.

## #1.1.1Geophysical inverse problems

Geophysical methods involve making measurements at or above the earth's surface, or in boreholes. The data acquired with these methods are then used to create models of the subsurface; this is similar to non-invasive medical imaging, but the spatial and temporal scales are typically much larger. The models, which can be 1D, 2D, or 3D distributions of various physical properties, are used for monitoring and extracting information about fluid flow and subsurface structures. The physical properties are linked to the data through various partial differential equations. The task of generating a quantitative understanding of the data requires the ability to carry out forward simulations of these equations and, in many situations, inverting the data to estimate a static or time-lapse model of the subsurface. Forward simulations use the physics of the underlying measurement approach to simulate the response of a given distribution of physical properties. Inversion is a mathematical, algorithmic, and occasionally heuristic process that constructs a model consistent with the field measurements and *a priori* geologic, geophysical, and hydrologic information.

Many of these geophysical methods (e.g. electromagnetics, magnetotellurics, gravity, direct current resistivity) have mature solutions for both simulation and inversion in three dimensions and through time Oldenburg, 2016. There is, of course, continued research into improving computational efficiency for large-scale geophysical surveys (cf. Haber & Schwarzbach, 2014Yang *et al.*, 2014Haber & Heldmann, 2007). In parallel to this effort, there is ongoing work to *integrate* these geophysical methodologies to create more informed interpretations of the subsurface from multiple data types and surveys (e.g. Devriese *et al.* (2017)Kang *et al.* (2017)Fournier *et al.* (2017)). This research trend is true in exploration geophysics as well as in cross-disciplinary fields such as hydrogeophysics where hydrologic simulations and geophysical simulations can be combined to better inform predictions about groundwater flow.

The development of new methodologies to address the evolving challenges in quantitative geoscience integration will build upon and extend standard practices. These extensions and integrations will require experimentation with, and recombination of, existing techniques. This presupposes that researchers have access to consistent, well-tested tools that can be extended, adapted, and combined. One of the main goals of my thesis is to organize the components of geophysical simulations and inverse problems into a comprehensive, modular framework in order to support this *combinatorial* experimentation and exploration.

## #1.1.2Hydrogeophysics in the vadose zone

The majority of groundwater recharge is derived through water that percolates through the vadose zone, the region between the earth's surface and the fully saturated zone. As such, studying the processes that occur in the vadose zone is of critical importance for understanding our groundwater resources. Much attention has been given to monitoring, describing and predicting processes that occur in this region of the earth. Traditionally, monitoring has been conducted by taking point-measurements of saturation or pressure, or laboratory measurements of soil hydraulic properties. More recently, geophysical methods are being used in conjunction with hydrologic data to create more informed models of and predictions about the subsurface Linde & Doetsch, 2016. The advantages of employing geophysics to hydrogeology problems are numerous; geophysical methods allow data to be gathered remotely, and the data can then be used to create an image of a distributed physical property of interest (e.g. electrical conductivity) in the subsurface. However, the geophysical problem is inherently non-unique and when viewed independently, images produced by a geophysical inversion often lack the detail necessary to make informed hydrogeologic predictions. Some level of prediction can be offered by hydrogeologic simulations within the structural geologic context; however, these simulations are difficult to verify due to lack of constraining hydrologic data. Taken separately, each methodology involved in this monitoring challenge yields distinct interpretations and predictions that are often dissonant or actually conflicting.

Fluid flow in the vadose zone is described by the Richards equation and is parameterized by hydraulic conductivity, which is a nonlinear function of pressure head. Hydraulic conductivity defines how fluids move in the subsurface, and is an important physical property to estimate for accurate predictions Pollock & Cirpka, 2012Šimunek & Senja, 2012. It is not possible to directly image hydraulic conductivity with geophysical data, however, geophysical electromagnetic methods are sensitive to bulk electrical conductivity, which changes significantly depending on the saturating fluid (e.g. gas or water) Archie (1942)Liang *et al.* (2012)Mendelson & Cohen (1982). Changes in saturation over time, as fluids move, can be related to changes in electrical properties, and can be observed by electromagnetic geophysical methods. Knowing where and how the fluids move can subsequently be related to hydraulic conductivity (or other hydraulic properties). This technique has been used to estimate hydraulic properties directly from geophysical data. For example, in Binley *et al.* (2002), a cross-well tomography experiment was conducted using radar and direct current resistivity methods. The movement of a vadose zone tracer was tracked and a single parameter was estimated using the Richards equation, through trial and error, for homogeneous hydraulic conductivity. Both, the quality, and the spatial and temporal density of geophysical data available for monitoring vadose zone processes will continue to proliferate (e.g. Pidlisecky *et al.* (2013)). The increased data density and quality opens up the possibility to estimate many more distributed hydraulic parameters.

Time-lapse estimation problem presents a significant conceptual and computational challenge Pollock & Cirpka, 2012Haber & Gazit, 2013Towara *et al.*, 2015Linde & Doetsch, 2016. It requires large-scale, time-lapse hydrogeologic simulations that must be efficiently solved and then integrated with geophysical methods. This multiphysics integration of geophysical and hydrologic simulation can be completed in a variety of ways. For example, this integration can be through direct coupling of the simulations or through qualitative observations and uncoupled workflows. Hinnell *et al.* (2010) presents uncoupled integrations as: (a) using the geophysical data to estimate a physical property, such as electrical conductivity; (b) using an empirical relation, such as Archie's equation (eq. eq:archies-law), to transform the geophysical estimate into a hydrological parameter, such as water content; and (c) using hydrological estimates to help inform or test a hydrogeologic simulation. A coupled inversion formulates the entire process as a single forward model and uses stochastic or deterministic parameter estimation to directly update the hydrogeologic and empirical parameters Finsterle & Kowalsky, 2008Ferré *et al.*, 2009. Increasingly, there are instances of these sorts of collaborations and studies in near surface hydrogeophysics (cf. Linde & Doetsch (2016) and references within). The integration of geophysical and hydrologic data increases the scale of simulations and inversions that must be considered -- hundreds of thousands to millions of hydrologic parameters must be estimated. Currently, this is not computationally feasible for large 3D inversions of vadose zone parameters using the Richards equation. For example, the relatively few parameters that can be estimated by stochastic inversions may not be sufficient for 3D inversions Linde & Doetsch (2016). Alternatively, deterministic inversions can be used, but will need to draw on improvements across the field of geophysical inverse problems. For example, regularization techniques developed in other areas of exploration geophysics (e.g. Paasche & Tronicke (2007)Sun & Li (2012)), have potential to be helpful in introducing known parameter distributions into a vadose zone inversion. In Hinnell *et al.* (2010), the authors conclude that, "the coupled approach [for hydrogeophysics] requires that the hydrologic and geophysical models be merged, [which] forces the hydrologist and the geophysicist to formulate a consistent framework." This consistent framework was identified as "*the primary limit* to the routine implementation of coupled inversion[s]. The formulation of common solution grids, time steps, and simulation accuracies requires an uncommon level of collaboration during scientific analysis."

## #1.1.3Research motivation

The quantitative integration of hydrology, geophysics, and geology remains an open problem Liu & Gupta, 2007Ferré *et al.*, 2009Pollock & Cirpka, 2012Knight *et al.*, 2013Linde & Doetsch, 2016. This task is being worked on from many different perspectives in various research communities, and much progress has been made in case studies, new algorithms, and novel integrations. The complexity of this integration "intertwines various disciplines/subjects including geophysics, hydrology, petrophysics, geostatistics, [and] inverse theory" Knight *et al.*, 2013. Although each subdiscipline (e.g. flow modelling, electromagnetic simulation) invokes many of the same concepts and numerical pieces for solving simulation and inversion problems, the approaches developed and applied are not easily shared between subdisciplines. This is due to differing terminology, organization of methodologies, differing data densities and sensitivities, model conceptualizations, as well as software implementations. For example, in geophysics a *model* is often taken to be a volumetric distribution of physical properties (e.g. Oldenburg & Li (2005)); in geology a *model* is often more qualitative, represented by a sketch, description, 3D surfaces, or a cross section that opaquely embeds knowledge about geologic processes (e.g. Harder *et al.* (2009)Porwal & Kreuzer (2010)); in hydrology a *model* often refers to the representation of a physical simulation or empirical equation, frequently containing simplifying assumptions of homogeneity or dimensionality (cf. Devi *et al.* (2015)). As another example, in hydrogeology data is often collected as high precision point measurements with low spatial and/or temporal coverage; in geophysics, however, physical property recoveries are often less precise and are averaged over a larger spatial scale.

The inclusion of relevant information from one sub-discipline into another is difficult due to these differences in terminology, knowledge representation (e.g. quantitative or qualitative), knowledge mapping (e.g. through empirical or structural relations), model conceptualization (e.g. volumetric or surfaces or parametric), data sensitivities (e.g. point or bulk measurements), and simplifying or implicit assumptions (e.g. one dimensional or homogeneous). These disconnects are exceptionally apparent in software implementations, even though software is precisely where quantitative integration must occur! Software is often developed *ad hoc* for specific outcomes, and the algorithmic components, which are conceptually generic and could be shared with others, are deeply embedded and not easily transferred to other applications. Within a given subdiscipline this can create challenges, as the system under consideration can potentially embed hard-coded, tacit assumptions. Furthermore, this lack of transportability and interoperability severely hinders the advancement of novel geophysical *applications* since geoscientists in different subgroups often find themselves having to develop a complete software solution from scratch prior to investigating their scientific questions of interest. Overcoming these bottlenecks, and establishing a simulation and inversion framework that works across many subdisciplines of (hydro)geophysics, is the overarching goal of this thesis.

Based on the current state of the geoscience inversion and hydrogeophysics community and the observations outlined above, I have arranged this thesis to address two research topics:

- the development of an extensible framework for geophysical inversions, and
- formulation of the Richards equation inversion for computational scalability.

The overarching goal is to promote both quantitative integration and collaboration between geoscience disciplines and communities. Interdisciplinary integration requires dissemination, reproducibility, accessibility, and collaboration; as such these are crucial to my work and demonstrated throughout the following thesis.

# #1.2A framework for geophysical inversions

Geophysical inversions are the mathematical process of creating subsurface models to fit measured data and geophysical simulations. The language, workflows, and resulting software implementations of geoscience inversions vary across disciplines. These inconsistencies are among the large barriers to sustained cross-disciplinary integrations. One research approach to addressing these interdisciplinary barriers is the development of a framework that organizes, synthesizes and abstracts diverse methodologies. A framework should (a) serve as a means of organizing an approach to simulation and inverse problems, (b) facilitate quantitative communication between researchers and geophysics methodologies, and (c) act as a blueprint for both ideation and software implementations. The disciplines and methodologies that have been used to inform the research of this framework include: vadose zone flow using the Richards equation, direct current resistivity, time and frequency domain electromagnetics, magnetotellurics, potential fields including gravity and magnetics, and using geologic parameterizations to inform model conceptualization. Figure 1a shows a conceptual layout of several geoscience methodologies as largely 'black box' inversions that take data and produce models. For example, in vadose zone flow, pressure or saturation data can be captured and many forward simulations completed to estimate a simplified model that reproduces the observed data. As noted in Oldenburg (2016), many of the geophysical inversion techniques can now be completed in three dimensions using computationally efficient inversion algorithms. This is significant as geological structures and processes such as electromagnetics and fluid flow often require treatment in three dimensions. In both geophysics and hydrogeology the data is a field or a flux, sampled at various locations, times, or frequencies. Additionally, point samples of physical properties or (hydro)stratigraphy can be inferred or tested from borehole cores and geologically interpolated between wells and surface observations. These can be included into the inverse problem formulation either implicitly through weightings and reference models (cf. Williams (2008)) or more explicitly by forcings of geologic priors (cf. Linde *et al.* (2015)). The geologic observations can also be modelled, for example, using radial basis functions (RBFs), to implicitly reproduce the geologic contacts and drillhole data; this results in geologically interpreted interpolations dividing the subsurface into lithological units Hillier *et al.*, 2014.

The initial integration of all of these methodologies is conceptually shown in Figure 1b. Each geophysical technique is sensitive to different physical properties and/or different spatial scales. The differing sensitivities of these techniques motivates combination of methodologies to better understand and image the subsurface and time-lapse processes. This is an active field of study, for example, (a) investigating cooperative electromagnetics inversions in realistic settings by externally combining existing tools through custom workflows McMillan & Oldenburg (2014), (b) joint inversion and model fusion algorithms for direct current resistivity and borehole tomography Haber & Gazit (2013), (c) integrating multiple types of airborne geophysical data into a consistent geologic model for mineral exploration Kang *et al.*, 2017Fournier *et al.*, 2017Devriese *et al.*, 2017, and (d) combining one dimensional vadose zone flow and direct current resistivity to invert for hydrological parameters Hinnell *et al.* (2010). Many of these studies rely on externally integrating existing software tools through purpose-built scripts and workflows; limiting the transferability to other disciplines. However, recent work has seen an increased focus by the geophysical community on a framework approach that targets multiple geophysical and hydrogeologic methodologies (e.g. JInv Ruthotto *et al.*, 2016 and PyGIMLi Rücker *et al.*, 2017). In many electromagnetic geophysical applications, for example, a common model for electrical conductivity can be produced through cycling a common model through the relevant problems until a sufficient misfit is achieved. In hydrogeophysics, however, the model from a geophysical simulation is the data for a hydrogeologic simulation. As such, for a deterministic large-scale inversion the sensitivity from one problem is empirically coupled to another problem and must be efficiently calculated. In hydrogeophysics, coupled hydrologic and geophysical interpretations are moving into three dimensions, and standard probabilistic and finite difference techniques are becoming "computationally infeasible" Linde & Doetsch, 2016. In order to support the custom parameterizations, couplings, and integrations that are necessary for a new application, a general framework must provide combinatorial building blocks that are independently accessible and extensible, while maintaining computational efficiencies. The PEST framework for model independent parameter estimation and uncertainty analysis is a concrete example of where parts of this have been done with success Doherty, 2004. The software is widely cited in academia ($>2$K citations) especially in hydrology and hydrogeophysics, and is heavily used in industry. The advantage of being model independent has given this technique wide application due to the flexibility to adapt to *new scientific questions*. However, this also comes at quite a cost because the structure of the simulation and modelling cannot be used to the advantage of the algorithm. As with vadose zone flow or electromagnetics, when moving to three dimensions there may be hundreds of thousands to millions of parameters to estimate. Not taking the structure of the problem into account severely limits types and size of problems that can be considered. In the context of geophysical simulations and inversions there are two significant challenges/opportunities for such a framework:

- to explicitly expose the components of geophysical simulations and inversions to interdisciplinary manipulation in a
*combinatorial*manner; and - maintaining computational scalability, especially with respect to efficient calculation of sensitivities.

Adapting interdisciplinary methodologies to formalize geophysical simulations and inversions inherently requires that a diverse suite of methods and applications be considered across geophysics, hydrogeology, and geology. This process will take the form of deriving, from the existing body of literature, a consistent conceptual and computational framework, which supports reproducible inversion workflows. By formalize, I do not mean mathematically, rather taking practices of ontology and framework development in biology and other more mature interdisciplinary fields and applying them to geophysical inversions. A Nature Review paper in genetics by Bard & Rhee (2004) concludes: *"An ontology makes explicit knowledge that is usually diffusely embedded in notebooks, textbooks and journals or just held in academic memories, and therefore represents a formalization of the current state of a field."*

## #1.2.1Approach

Throughout my thesis, I have looked to more mature, interdisciplinary fields to guide and motivate my approach to this interdisciplinary research. For example, the use of ontologies in the sciences to formally describe domain knowledge has exploded in recent decades, especially in domains of artificial intelligence, chemistry, and biology, but also more recently in the geosciences Sharman *et al.*, 2004Ma, 2011. A computational ontology (rather than the underlying discipline of philosophical ontology) is a "formal explicit specification of a shared conceptualization" Gruber, 1993. Noy & McGuinness (2002) summarizes the purpose of computational ontologies as enabling a shared understanding of the structure of information and systematically enabling knowledge and information reuse. Practically ontologies can (a) provide access and discoverability to heterogeneous information; (b) act as a common language to lower the barrier to transfer of ideas; and (c) act as a specification for interoperability, for example, as a communication protocol or application programming interface Sharman *et al.*, 2004. The techniques for building ontologies amount to capturing, synthesizing, organizing, and digitizing the relationships between concepts, conceptual inheritance patterns, and behaviour. Ontologies are most commonly used for storing and organizing data, for example, connecting genetic data with phenotypic data in bioinformatics. However, ontologies are also used in defining tasks, workflows, and problem solving methods Fensel *et al.*, 1997Bard & Rhee, 2004. In more mature interdisciplinary fields, this research is becoming core to scientists' day to day research; for example, Stein (2008) notes that "all current biomedical cyberinfrastructure efforts use ontologies." As a result of successes in other fields, geoscience integration is currently the target of major funding initiatives across the world (e.g. EarthCube - 11 year NSF project, $35M in 2015; CIMIC Footprints Project - NSERC Project, 24 Universities, 30 Industry, $13M). Many of the current efforts are focused on software infrastructure, formally describing geoscientific data (using ontologies), and formally describing methods of integrating disciplines.

The growth in complexity of geophysical data and analysis and the necessity for cross-disciplinary integrations is coincident with the revolution of open source software communities, largely enabled through web-based interactions. Other research communities, for example `Astropy`

in astronomy and `SciPy`

in numerical computing, have embraced the open source approach for collaboration and research Astropy Collaboration *et al.*, 2013Jones *et al.*, 2001. These pioneering efforts are now complemented by easy-to-use, ubiquitous web-based repositories and version-control systems (e.g. GitHub), that have removed many of the barriers associated with management and collaboration. The growth of such systems, coupled with the maturity of individual geophysical subdisciplines (e.g. potential fields, electromagnetics), presents an opportunity to develop a conceptual framework and computational ontology for geophysical simulation and inversion. An ontology is an embodiment of concepts, relationships, and behaviours in a discipline and can be (a) captured in special purpose languages (e.g. Web Ontology Language, Resource Description Framework), or (b) captured in general purpose computer programming languages (e.g. Python, Java, C++) Sharman *et al.*, 2004. To research a geophysical simulation and inversion framework, I have chosen the latter approach for the purposes of utility, testing, and creating a dynamic ontology that can be openly used and evolved by the geoscience community.

## #1.2.2Take home points

Sustained, reproducible integration of geophysical simulations and inversions requires that methodologies be accessible, consistent, numerically documented, interoperable, and extensible. This can be enabled by a comprehensive framework that is validated and rigorously tested against reality and leading edge research. To do this, research is required to:

- identify the composable components of geophysical inversions and simulations, as well as the interfaces between the components;
- abstract commonalities between methodologies to a consistent, supporting subset; and
- capture geoscience inversion heuristics in a reproducible manner.

The output of this will be a conceptual framework and computational ontology that is numerically tested and demonstrates the capability to support existing and future research directions. Ideally this framework will catalyze and accelerate interdisciplinary collaborations.

# #1.3Application to vadose zone parameter estimation

The development of a geophysical framework requires considering a number of disciplines and geophysical problems to ensure generality as well as extensibility. I am working with collaborators in many of these geophysical methods (electromagnetics, direct current resistivity, magnetotellurics, magnetics, gravity) and am ensuring that the framework that I am researching supports these applications. However, the goal is also to have the framework work outside of geophysics and most notably in hydrogeology, as such, I have chosen vadose zone fluid flow as a model problem.

Fluid flow in the vadose zone is described by the Richards equation (eq eq:richards-mixed) and parameterized by hydraulic conductivity, which is a nonlinear function of pressure head Richards, 1931Celia *et al.*, 1990. Investigations in the vadose zone typically require identification of distributed hydraulic properties. This is increasingly being done through an inversion approach, which is also known as data assimilation, model calibration, or history matching Liu & Gupta, 2007. These methods use changes in water content or pressure head to infer hydraulic parameters Binley *et al.*, 2002Deiana *et al.*, 2007Hinnell *et al.*, 2010. Hydrogeophysics allows many more proxy measurements, such as direct current resistivity data, to be taken to help characterize these sites spatially, as well as through time. As such, the number of distributed hydraulic parameters to be estimated in a Richards equation inversion will continue to grow. Figure 2a shows this conceptually as taking the output of a (time-lapse) direct current resistivity inversion, and using this estimate of electrical conductivity as a proxy for water content data in hydraulic parameter estimation. The proxy data can be directly incorporated through an empirical relation (e.g. Archie (1942)) or time-lapse estimations can be structurally incorporated through some sort of regularization technique Haber & Gazit, 2013Haber & Oldenburg, 1997Hinnell *et al.*, 2010. Previous studies have either estimated homogeneous soil profiles estimating less than five parameters (e.g. Binley *et al.* (2002)Deiana *et al.* (2007)) or heterogeneous soil profiles, estimating less than thousands of parameters (c.f. Irving & Singha (2010)Jardani *et al.* (2013)Orgogozo *et al.* (2014)). Parameter estimation is currently completed by trial and error or using stochastic techniques, neither of which scale to the scenario that requires estimation of hundreds of thousands to millions of parameters Linde & Doetsch (2016). This scalability, especially in the context of hydrogeophysics has been explicitly noted in the literature (cf. Binley *et al.* (2002)Deiana *et al.* (2007)Towara *et al.* (2015)Linde & Doetsch (2016)).

There has been much research into the scalability of the inverse problem in geophysical applications (e.g. electromagnetics) that allow the calculation of an optimization step in the inversion *without* explicitly calculating or storing the sensitivity matrix (cf. Haber *et al.* (2000)). This is extremely important in large problems as the computer memory available to store this large, dense matrix can often be a limitation. For example, although there have been significant advances for massively parallel forward simulations of the Richards equation (cf. Orgogozo *et al.* (2014)), the computational "memory may simply not be large enough" to run the inverse problem using automatic differentiation Towara *et al.*, 2015. Previous work uses either automatic differentiation or finite difference in order to explicitly compute the sensitivity matrix (e.g. PEST) Finsterle & Zhang, 2011Bitterlich & Knabner, 2002Towara *et al.*, 2015. Finite difference is computationally slower and can generate inaccuracies in the sensitivity computation and tarry convergence of the optimization algorithm. With regard to implicit sensitivity calculations, there is an opportunity to apply some of the learnings from the geophysical inversion literature to this hydrogeologic problem. Note that the implicit sensitivity calculation is necessary in *any* gradient based technique as well as modern stochastic methods Bui-Thanh & Ghattas, 2015.

The application of the implicit sensitivity calculation to the Richards equation, however, is not straightforward. Hydraulic conductivity is the *function* we are inverting for - it is empirically determined and depends on pressure head; pressure head is the field that is calculated using the Richards equation. This nonlinear coupling requires iterative optimization methods in the forward simulation between each time step (e.g. Picard or Newton). This makes the implicit calculation of the effect of the sensitivity on a vector rather involved and intricate. Furthermore, the nonlinear relationship of hydraulic conductivity is empirically determined and depending on the relation used could involve the estimation of up to ten spatially-distributed parameters from a finite dataset. The implicit use of the sensitivity matrix should have the ability to calculate the sensitivity to any of these model parameters; however, any inversion algorithm must be tested as to the limits of estimating all distributed parameters at once. One goal of the work in this thesis is to tackle the sensitivity calculation *implicitly*. This would further allow for exploration into different inversion methodologies and parameterizations of the empirical relationships. By not storing the sensitivity, and instead computing its effect on a vector, the size of the problem that we can invert will become much larger. This will allow large 3D hydraulic parameter inversions using the Richards equation to be run on modest computational resources. However, directly jumping into a 3D inversion for heterogeneous hydraulic properties, with many parameters per cell in the isotropic case, is highly non-unique. As conceptually indicated in Figure 2b, I will explore some inversion schemes, model conceptualizations, and ways to explore interfacing to *a priori* information. Unsaturated hydraulic conductivity as well as the water retention curve are both empirically described functions. The parameterization and estimation of these functions in the context of collecting proxy saturation data will be explored in a number of numerical experiments.

## #1.3.1Take home points

With advances in geophysical data collection as well as spatial and temporal density, time-lapse water content estimates can be made with increasing accuracy. These proxy time-lapse measurements can be used to directly estimate hydraulic properties from non-invasive geophysical methods. This is increasingly being completed in field studies, however, conceptual and computational simplifications are consistently made. Part of the bottleneck is the scalability of current inverse methods applied to the Richards equation. Research is required to:

- reframe the Richards equation inversion for computational scalability when moving to large 3D distributed parameter estimation,
- ensure that any parameter in the empirically determined hydraulic conductivity function and water retention curve can be estimated, and
- investigate and explore the effectiveness of distributed joint inversions for hydraulic parameters from a water content dataset.

This research will inform the conceptual framework and contribute an implicit sensitivity calculation for the Richards equation inverse problem that can be coupled to other geophysical methods.

# #1.4Thesis outline

The content of this thesis is divided into three chapters and two appendices; these are shown visually in Figure 3. Each chapter and appendix provides a stand alone introduction and conclusion to the specific topic under consideration. Chapter \ref{ch:framework} presents a comprehensive framework for geophysical simulations and inversions. This includes an overview of current research and outlines a modular, object-oriented approach for structuring deterministic, large-scale inversion methodology in geophysics that has application to hydrogeology and can incorporate and interface to geologic information. A direct current resistivity forward simulation and inversion are used throughout this chapter as an example. An overview of finite volume discretization techniques, which are heavily used throughout this thesis, has been included as Appendix \ref{ch:discretize}. In this appendix, I examine and comment on the organization, structure, and formulation of three different classes of mesh: (a) tensor product orthogonal mesh, including formulation in cylindrical coordinates; (b) quadtree and octree meshes; and (c) logically rectangular, non-orthogonal meshes. These meshes in 1D, 2D, 3D, and 4D are used throughout the thesis as well as extensions to my work. The software used to inform my work and refine my interdisciplinary approach to simulation and parameter estimation in geophysics is open source and available under the permissive MIT license (https://github.com/simpeg/simpeg). This repository includes forward and inverse software by me and my colleagues of the framework for vadose zone flow, time and frequency domain electromagnetics, direct current resistivity, induced polarization, magnetotellurics, magnetics, gravity, and a number of example linear inverse problems; these are described in the online documentation (https://docs.simpeg.xyz).

Chapter \ref{ch:richards} focuses on the Richards equation, which is the partial differential equation that describes vadose zone flow. Using the previously developed framework and finite volume tools tailored specifically for inverse problems (Appendix \ref{ch:discretize}), I have reframed the Richards equation to be scalable with respect to large-scale inverse problems. The majority of this work is focused on enabling access to the sensitivity implicitly, through multiplication with a vector. The validity of this technique as well as comments on numerical performance are provided. The scalability of the algorithm developed is shown with comparison to other techniques. This work has built upon as well as informed the research into the organization of the framework. The Richards equation is more complex than many other geophysical methods analyzed because of the nonlinear, time dependent forward problem and multiple empirical relationships that may or may not require estimation.

Chapter \ref{ch:applications} explores the estimation of hydraulic parameters from water content data. This is motivated by the hydrogeophysical problem where there is an availability of proxy saturation data. This chapter explores a set of empirical relations that inform both the hydraulic conductivity function and the water retention curve. A joint inversion is formulated to estimate for five spatially distributed hydraulic parameters. The number of spatially distributed unknowns are experimented with, and the response of the inversion algorithm is tested under various levels of *a priori* knowledge. The efficacy of these approaches is commented upon, which may help inform laboratory or field based experiments of this kind. Finally, a three dimensional inversion is completed using the Richards equation. Due to the scalability issues that were addressed in Chapter \ref{ch:richards}, an inversion for distributed hydraulic parameters at this scale is not possible with standard techniques. This work is enabled both by the framework introduced in Chapter \ref{ch:framework} and the implicit sensitivity calculation that allows large-scale parameter estimation problems to be tackled (Chapter \ref{ch:richards}). Other examples, extensions, and applications of the framework including other geophysical methodologies, case studies, and geoscience integrations are included in Appendix \ref{ch:extensions}; much of this work is collaborative in nature and focuses on the parameterization of the forward problem. As the focus of this thesis is on the geophysical inversion framework, I have distilled my observations from these case studies and provided these learnings in a general form.

Finally, Chapter \ref{ch:conclusions} provides a brief discussion on the thesis contributions and summarizes some opportunities for future research and collaborations. These research areas may seem disparate, but collectively they are united by a common theme of addressing the complexities of bringing together the disciplines of geophysics, hydrology, geology, and inverse theory in a computationally tractable manner that is accessible and extensible by other researchers.

- Racz, A. J., Fisher, A. T., Schmidt, C. M., Lockwood, B. S., & Huertos, M. L. (2011). Spatial and Temporal Infiltration Dynamics During Managed Aquifer Recharge.
*Ground Water*, 1–9. 10.1111/j.1745-6584.2011.00875.x - Daily, W., Ramirez, A., LaBrecque, D., & Nitao, J. (1992). Electrical resistivity tomography of vadose water movement.
*Water Resources Research*,*28*(5), 1429–1442. 10.1029/91WR03087 - Park, S. (1998). Fluid migration in the vadose zone from 3-D inversion of resistivity monitoring data.
*Geophysics*,*63*(1), 41–51. 10.1190/1.1444326 - Oldenburg, D. W. (2016). Looking back and moving forward.
*Banff International Research Station*,*16*(2695). - Haber, E., & Schwarzbach, C. (2014). Parallel inversion of large-scale airborne time-domain electromagnetic data with multiple OcTree meshes.
*Inverse Problems*,*30*(5), 55011.