Comparative Study of Geospatial Techniques for Interpolating Groundwater Quality Data in Agricultural Areas of Punjab, Pakistan

Tayyab, Muhammad; Aslam, Rana Ammar; Farooq, Umar; Ali, Sikandar; Khan, Shahbaz Nasir; Iqbal, Mazhar; Khan, Muhammad Imran; Saddique, Naeem

doi:10.3390/w16010139

¹ Department of Structures and Environmental Engineering, University of Agriculture Faisalabad, Faisalabad 38040, Pakistan

² Department of Irrigation and Drainage, University of Agriculture Faisalabad, Faisalabad 38040, Pakistan

³ University of Agriculture Faisalabad, Sub-Campus Burewala, Vehari 61010, Pakistan

* Authors to whom correspondence should be addressed.

Water 2024, 16(1), 139; https://doi.org/10.3390/w16010139

Submission received: 9 August 2023 / Revised: 31 August 2023 / Accepted: 5 September 2023 / Published: 29 December 2023

(This article belongs to the Special Issue Effects of Drought on Agriculture Water Resources and Crop Productivity)

Abstract

Groundwater Arsenic (As) data are often sparse and location-specific, making them insufficient to represent the heterogeneity in groundwater quality status at unsampled locations. Interpolation techniques have been used to map groundwater As data at unsampled locations. However, the results obtained from these techniques are affected by various inherent and external factors, which lead to uncertainties in the interpolated data. This study was designed to determine the best technique to interpolate groundwater As data. We selected ten interpolation techniques to predict the As concentration in the groundwater resources of Punjab, Pakistan. Two external factors, the spatial extent of the study area and data density, were considered to assess their impact on the performance of interpolation techniques. Our results show that the Inverse Distance Weighting (IDW) and Spline interpolation techniques demonstrate the highest accuracy with the lowest RMSE (13.5 ppb and 16.7 ppb) and MAE (87.8 ppb and 89.5 ppb), respectively, while the Natural Neighbor technique shows the lowest accuracy with the highest RMSE (2508.7 ppb) and MAE (712.1 ppb) to interpolate groundwater As data. When the study area’s extent was modified, IDW showed the best performance, with errors within ±1.5 ppb for 95% of the wells across the study area. While data density has a positive correlation with interpolation accuracy among all techniques, the IDW remained the best method for interpolation. It is therefore concluded that IDW should be used to interpolate groundwater quality data when observed data are sparse and randomly distributed. The utilization of IDW can be useful for As monitoring and management in groundwater resources.

Keywords:

groundwater quality; geostatistics; interpolation techniques; cross validation; As in groundwater

1. Introduction

Groundwater is widely recognized as a vital and dependable source of freshwater worldwide [1,2,3]. It plays a crucial role in providing drinking water to both rural and urban populations, especially in areas where surface water availability is limited [2,4,5,6]. Groundwater is also essential for sustainable agriculture to meet crop water requirements, ensuring food security [3,4]. Moreover, groundwater serves as a lifeline for terrestrial ecosystems, supporting the health and biodiversity of rivers, wetlands, and lakes [7,8,9]. Therefore, protecting groundwater resources from contamination by adopting effective water management strategies directly affects food security, livelihoods, and overall socio-economic development [3,10].

Pakistan ranks 14th among the 17 countries in the world facing extremely high water stress [11]. The mean annual per capita water availability in the country has plummeted from 5229 m³ in 1962 to 930 m³ in 2023 [12]; as a result, approximately 80 percent of the country’s population faces severe water scarcity [11]. Consequently, about 90 percent of Pakistan’s population relies on groundwater for drinking purposes [13,14,15,16]. However, it is estimated that 70 percent of the surface and groundwater sources in Pakistan are contaminated with organic, inorganic, and biological pollutants, particularly As [17,18,19]. Waterborne diseases due to the use of contaminated water account for about 30 percent of all diseases and 40 percent of deaths [13,14,15,16]. Additionally, patients with waterborne diseases occupy 20 to 40 percent of the available hospital beds in the country. The situation is further aggravated by rapid population growth, which increases the extraction of groundwater [20]. Hence, determining the spatial distribution of water quality is critical to supplying good-quality water to domestic users.

As is a naturally occurring element in the earth’s crust and is widely found in the environment (i.e., air, water, and soil) in the form of various minerals and ores [21,22]. As is considered highly toxic in water due to its ability to disrupt cellular functions and interfere with essential biogeochemical processes in the human body [21]. When ingested, As can cause serious health consequences such as cardiovascular damage, neurological effects, oxidative stress, enzyme inhibition, and developmental abnormalities [21]. Several studies have reported elevated levels of As in the country’s groundwater resources, ranging from 10 ppb to 600 ppb, which significantly exceed the standards set by the World Health Organization [23,24,25,26,27]. For instance, ref. [25] discovered that 95 percent of the groundwater wells in the Vehari district of Punjab are unsuitable for drinking purposes due to their higher levels of As. Since the As concentration in drinking water is widely used to estimate water toxicity indices, such as Average Daily Dose, Hazard Quotient, and Carcinogenic Risk [25,28,29], it is critically important to map the spatial distribution of groundwater As levels, which will serve as an important factor in controlling waterborne diseases.

The sparse and location-specific nature of groundwater quality data makes them insufficient to represent the spatial heterogeneity in water quality at unsampled points. While the Geographic Information System (GIS) has been widely used for mapping, monitoring, and modeling groundwater quality across large areas to overcome this challenge, the choice of interpolation methods within the GIS can lead to significant discrepancies in the results [25,30,31,32,33]. Comparing the IDW, kriging, and Cokriging interpolation techniques, ref. [30] determined that Cokriging outperformed the other methods in determining water quality. Ref. [31] assessed the IDW, Ordinary Kriging, Universal Kriging, and Cokriging methods for rainfall spatial analysis and found that Ordinary Kriging yielded the best results. In a study by [32], the kriging method demonstrated higher accuracy in predicting groundwater levels, while Ordinary Kriging was considered the most suitable technique for the spatial analysis of As concentration. While these studies focused on a limited range of the interpolation methods available in the GIS, a comprehensive comparative analysis of all GIS interpolation methods for interpolating the As concentration in groundwater remains understudied.

The focus of this study is to quantify the uncertainties in mapping the As concentration in the groundwater of the province of Punjab using different interpolation techniques. The province of Punjab was chosen due to the following factors: it is the most populous province of Pakistan, with an average population density of approximately 536 persons per square kilometer; its surface freshwater resources are very limited, and therefore a majority of the population mainly relies on groundwater resources; and it is most susceptible to higher As levels due to anthropogenic activities [33,34]. Specifically, we aim to answer the following questions: (a) what is the magnitude of uncertainties in groundwater As concentration when it is interpolated using deterministic and stochastic interpolation techniques?; (b) which interpolation technique yields the best performance for interpolating the As concentration in groundwater?; and (c) which factors contribute to uncertainties in mapping As concentrations using different interpolation techniques?

2. Materials and Methods

2.1. Study Area

This study was conducted in the province of Punjab, which lies in the northeast of Pakistan (Figure 1a,b). The province shares a border with the province of Sindh in the south, the province of Khyber Pakhtunkhwa in the northwest, Azad Jammu Kashmir and Gilgit-Baltistan in the north, and India in the east. The province covers about 205,344 km² of land area, making it the country’s second-largest region by land area. Punjab has a diverse topography, with flat and fertile land in the eastern part and mountains in the western part (Figure 1c). In the western part of this province, the elevation ranges from 300 to 1500 m above sea level, whereas in the central and eastern regions, the elevation is generally low, ranging from 150 to 300 m above sea level (Figure 1c).

2.2. Data Collection

The As concentration data for the province of Punjab were obtained from [23]. In their study, Podgorski et al. (2017) measured the As concentration in 84 observation wells of Punjab between 2013 and 2015 (Figure 1b). These observation wells mainly included hand and motor pumps, while data from municipal and agricultural tube wells were also considered. For the collection of representative groundwater samples, hand pumps were purged with one stroke per 30 cm of depth, and electric pumps were run for 10 min. Precleaned polyethylene bottles of a one-liter capacity were used to collect water samples. Before sampling, the water bottles were rinsed with deionized water. Water samples were filtered on-site using 0.45 µm cellulose acetate filters. To analyze trace metals (e.g., As), a few drops of nitric acid were added to the samples to reduce their pH to less than two. The analysis of As in the acidified samples was performed using an Agilent 7500cx inductively coupled plasma mass spectrometer. A detailed description of the groundwater data collection, analysis, and quality control standards is given in [23].

2.3. Interpolation Techniques

We evaluated the performance of ten interpolation techniques to attribute the interpolation bias to their driving parameters. Based on these results, we determined the optimum technique for the interpolation of groundwater quality data. The interpolation techniques considered in this study are categorized into two types: deterministic methods and stochastic methods. Deterministic methods use simple statistical models to calculate unknown data points using the known surrounding points. However, it is not possible to determine errors in the forecasted values using these techniques. The deterministic methods evaluated in this study are Inverse Distance Weighting (IDW), Spline interpolation, Radial Basis Function, Trend Surface Analysis, Natural Neighbor interpolation, Diffusion with barrier, Global polynomial, and Local polynomial. Stochastic techniques use complex models to forecast data points and associated biases based on known data points. The two stochastic techniques evaluated in this study are Empirical Bayesian Kriging and Ordinary Kriging (Figure 2). A brief description of all interpolation techniques is provided in the following sections.

2.3.1. Inverse Distance Weighting (IDW)

IDW uses a linear combination of known data points, weighted by an inverse of the distance between the known and unknown data points, to estimate the unknown data points. Points closer to the unknown data points are considered more like the unknowns compared to known points located farther away. The weight is expressed by the following equation [35]:

λ_{i} = \frac{\frac{1}{d_{i}^{p}}}{\sum_{j = 1}^{n} \frac{1}{d_{j}^{p}}}

(1)

where λ_i is the weight assigned to the i-th known data point, d_i is the distance between that known point and the unknown point, p represents the power parameter, and n is the number of the known data points used for interpolation. Besides distance, p also affects the accuracy of the IDW technique. An increase in p makes the weights decay faster with distance, reducing the influence of distant known points [36]. The choice of p and of the neighborhood size (i.e., the number of known data points considered for interpolation) is arbitrary. The value of p (a positive number) is selected based on the minimum mean absolute error; however, most often, the default value of p (i.e., two) is assumed [37].
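
The weighting in Equation (1) can be illustrated with a minimal Python sketch. It is not the ArcGIS implementation used in this study; the function names (`idw_weights`, `idw_predict`) and the three wells are hypothetical, and all known points are used as the neighborhood.

```python
import math


def idw_weights(distances, power=2.0):
    """Normalized inverse-distance weights, following Equation (1)."""
    inverse = [1.0 / d ** power for d in distances]
    total = sum(inverse)
    return [w / total for w in inverse]


def idw_predict(known_points, target, power=2.0):
    """Estimate the value at `target` from (x, y, value) triples."""
    distances = []
    for x, y, value in known_points:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # target coincides with a sampled location
        distances.append(d)
    weights = idw_weights(distances, power)
    return sum(w * v for w, (_, _, v) in zip(weights, known_points))


# Hypothetical wells: (x in km, y in km, As concentration in ppb)
wells = [(0.0, 0.0, 10.0), (3.0, 0.0, 50.0), (0.0, 4.0, 90.0)]
estimate = idw_predict(wells, (1.0, 1.0))
```

Because the weights are positive and sum to one, an IDW estimate always lies between the smallest and largest known values in the neighborhood.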

2.3.2. Spline Interpolation

The Spline interpolation technique estimates unknown data points by calculating a smooth surface that passes through the known data points while minimizing the curvature of the surface. This technique divides the known data points into subgroups to interpolate a smooth surface or a polynomial function. The interpolated polynomials from the subgroups are then fitted together, forming one smooth polynomial. Spline interpolation is divided into three categories depending on the degree (p) of the polynomial. If the value of p is one, it represents a Linear Spline; if it is two, it represents a Quadratic Spline; and if it is three, it represents a Cubic Spline [38]. In this study, Cubic Spline interpolation was used to interpolate groundwater quality data, and its mathematical form is as follows [39]:

s(x_{i}) = f(x_{i}), \quad i = 1, 2, 3, \dots, n

(2)

where s(x) is a smooth piecewise polynomial of degree three, represented by a cubic s_i on each subinterval [x_i, x_{i+1}], and f(x_i) is the known value at data point x_i.
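
The interpolation condition in Equation (2) can be demonstrated with a one-dimensional sketch. This is an illustration only: the Spline tool in a GIS fits a two-dimensional surface, whereas the hypothetical function `natural_cubic_spline` below fits a curve, and it assumes natural boundary conditions (zero second derivative at both ends), which the text does not specify.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return s(x), a natural cubic spline through the points (xs[i], ys[i]).

    xs must be strictly increasing. On each subinterval the spline is
    s_j(x) = ys[j] + b[j]*dx + c[j]*dx**2 + d[j]*dx**3 with dx = x - xs[j].
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = (3.0 * (ys[i + 1] - ys[i]) / h[i]
                    - 3.0 * (ys[i] - ys[i - 1]) / h[i - 1])
    # Forward sweep of the tridiagonal system for the quadratic coefficients.
    diag = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]
    # Back substitution; c[n] = 0 encodes the natural boundary condition.
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def s(x):
        # Locate the subinterval containing x (clamped to the data range).
        j = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return s


# Hypothetical samples along a transect: position (km) and value (ppb)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 2.0, 5.0]
spline = natural_cubic_spline(xs, ys)
```

Evaluating `spline` at any of the sample positions returns the sampled value, which is exactly the requirement s(x_i) = f(x_i).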

2.3.3. Radial Basis Function

Radial Basis Function approximates or smooths a function from a set of known and scattered data points. The approximation depends on the distance from a known center point or origin. The Radial Basis Function interpolation technique is suitable for a small number of known data points distributed non-uniformly. It is also useful when confining boundaries are absent. However, it does not work well when there are too many known data points, for example, several thousand. A function is said to be a radial function if it satisfies the following condition [40]:

Φ(X) = Φ(‖X‖)

(3)

where ‖X‖ is the Euclidean norm of input vector X.
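
A small sketch can show how a radial function, which depends only on the distance ‖X‖, is turned into an interpolator. The Gaussian kernel, the shape parameter `epsilon`, and the helper names below are assumptions made for illustration; the kernel actually used by the GIS tool in this study is not stated in the text.

```python
import math


def gaussian_kernel(r, epsilon=1.0):
    """A radial function: its value depends only on the distance r."""
    return math.exp(-(epsilon * r) ** 2)


def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_rbf(points, values, epsilon=1.0):
    """Return a predictor that passes exactly through the known values."""
    matrix = [[gaussian_kernel(math.dist(p, q), epsilon) for q in points]
              for p in points]
    weights = solve_linear_system(matrix, values)

    def predict(target):
        return sum(w * gaussian_kernel(math.dist(target, p), epsilon)
                   for w, p in zip(weights, points))

    return predict


# Hypothetical wells: coordinates (km) and As concentrations (ppb)
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [10.0, 50.0, 90.0, 30.0]
rbf = fit_rbf(points, values)
center_estimate = rbf((0.5, 0.5))
```

The dense linear system grows with the square of the number of known points, which is consistent with the remark that the technique is poorly suited to datasets of several thousand points.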

2.3.4. Trend Interpolation

The Trend interpolation technique applies a global polynomial function to the known data points to determine the unknown data points. The values of the interpolated points, when combined, form a smooth surface with a coarse-scale pattern. This technique allows for up to ten polynomials based on bends in the interpolated surface. An interpolated flat surface is a first-order polynomial, one bend (valley) in the surface is a second-order polynomial, two bends result in a third-order polynomial, and so forth. This technique is useful when the known data points are uniformly scattered and equally spaced. However, it does not work well when the known points represent an undulating surface, for example, a land containing valleys and slopes, etc. The following equation represents the Trend interpolation polynomial in its general form [41]:

P(x) = a_{0} + a_{1} x + a_{2} x^{2} + \dots + a_{n} x^{n}

(4)

where a₀, a₁, …, a_n are the coefficients of the polynomial to be determined.

2.3.5. Natural Neighbor Technique

The Natural Neighbor interpolation technique constructs Thiessen polygons of the known data points. These polygons are unique except when the known data points are distributed on a regular rectangular grid. This technique uses the highest corners of polygons as sample points to interpolate the unknown points. It is achieved by inserting the unknown points into the existing polygons and then calculating the areas of neighboring known points overlapped by the polygon of the unknown points. The calculated areas are then scaled to sum to one, and these values are used as weights for the known data points. This technique is considered useful when the unknown data points are scattered irregularly.

2.3.6. Diffusion with Barrier

The Diffusion with barrier interpolation technique uses the heat transfer equation to represent how particles diffuse through a barrier and redefines the distance between the known data points by using raster and elemental barriers. When barriers are absent, its interpolated results become identical to the results of kernel interpolation, which are obtained by using Gaussian kernels. The unique feature of Diffusion with barrier interpolation is that, unlike other models, it interpolates the unknown data points by using automatically selected grids. The contours of the kernel, especially near barriers, vary according to the diffusion equation.

2.3.7. Global Polynomial Technique

The Global polynomial interpolation technique applies a mathematical function to the known data points to determine the unknown data points. The values of the interpolated points, when combined, form a smooth surface with a coarse-scale pattern. This technique allows for up to ten polynomials based on bends in the interpolated surface. An interpolated flat surface is a first-order polynomial, one bend (valley) in the surface is a second-order polynomial, two bends result in a third-order polynomial, and so forth. This technique is useful when the known data points are uniformly scattered and equally spaced. However, it does not work well when the known points represent an undulating surface, for example, a land containing valleys and slopes, etc. The following equation represents the Global polynomial in its general form [41]:

P(x) = a_{0} + a_{1} x + a_{2} x^{2} + \dots + a_{n} x^{n}

(5)

where a₀, a₁, …, a_n are the coefficients of the polynomial to be determined.

2.3.8. Local Polynomial Technique

The Local polynomial technique chooses a set of neighboring known data points around an unknown point in a local region and then constructs a best-fit polynomial (i.e., one having a polynomial degree equal to or less than the number of contributing known neighboring data points) [42]. The following equation represents a local polynomial in its general form:

f(x) = c_{0} + c_{1} (x - x_{0}) + c_{2} {(x - x_{0})}^{2} + \dots + c_{n} {(x - x_{0})}^{n}

(6)

where f(x) is the interpolated function value at the unknown data point; x₀ is the known center point around which a polynomial is constructed; c₀, c₁, c₂, …, c_n are the coefficients of the fitted polynomial; and (x − x₀), (x − x₀)², and (x − x₀)ⁿ are the powers of the difference between the input unknown point and the known center point. This technique is useful when the known data points within the searching neighborhood are uniformly scattered, equally spaced, and normally distributed.

2.3.9. Empirical Bayesian Kriging

Empirical Bayesian Kriging is a stochastic interpolation technique that automatically optimizes the parameters required to build a valid kriging model. It allows for accurate predictions of moderately nonstationary data and works well for small datasets. For large datasets, it first divides known data points into subsets and fits the first semivariogram. Then, it uses this semivariogram as the prediction model and calculates a new dataset at the unknown points in each subset. Subsequently, it simulates a second semivariogram for each subset from the newly calculated datasets, followed by the unknown data point calculations by merging semivariograms from the neighboring subsets. It assigns weights to each semivariogram depending on the number of known data points in each subset. Subsets having more neighboring known data points carry more weight (influence) on the predicted value.

2.3.10. Ordinary Kriging

Ordinary Kriging is another stochastic technique that estimates unknown data points by considering the average of the subsets of neighboring data points. The interpolation involves four steps: (a) calculate the spatial autocorrelations to determine if the nearby known data points are similar; (b) determine the semivariogram (variance) to understand the dependence of the known data points as a function of the distance between points; (c) fit a semivariogram model that describes the spatial correlation structure of the known data points; and (d) use the semivariogram model and the weights of the known data points to predict the unknown values. During the prediction process, Ordinary Kriging assigns more weight to the known data points that are closer to the unknown data points. Higher weights indicate a greater influence of the known data points.
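
Step (b) above can be sketched as an empirical semivariogram, using the common estimator that averages half the squared value differences of all point pairs falling in each distance bin. The function name, the bin settings, and the three sample points are hypothetical; model fitting and the kriging system itself (steps c and d) are not shown.

```python
import math
from itertools import combinations


def empirical_semivariogram(points, values, bin_width, n_bins):
    """Return (bin center, semivariance) pairs for non-empty lag bins."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        lag = math.dist(p, q)
        bin_index = int(lag // bin_width)
        if bin_index < n_bins:
            sums[bin_index] += (values[i] - values[j]) ** 2
            counts[bin_index] += 1
    return [((k + 0.5) * bin_width, sums[k] / (2.0 * counts[k]))
            for k in range(n_bins) if counts[k] > 0]


# Hypothetical samples along a line (coordinates in km, values in ppb)
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
values = [1.0, 3.0, 7.0]
gamma = empirical_semivariogram(points, values, bin_width=1.0, n_bins=3)
# gamma == [(1.5, 5.0), (2.5, 18.0)]
```

A semivariance that rises with lag, as in this toy example, indicates that nearby points are more alike than distant ones, which is the property kriging exploits when assigning weights.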

All of the interpolation techniques considered in this study are embedded in the GIS environment. Therefore, ArcGIS version 10.5 was used to prepare the input files, to forecast the missing water quality data, and to evaluate the performance of all interpolation techniques.

2.4. Performance Evaluation of Interpolation Techniques

The performance of the selected interpolation techniques was assessed using the cross-validation method (Figure 2). In this approach, the interpolation techniques were evaluated at 15 existing observation wells. The observed As data of 15 wells were temporarily removed, and new data at those wells were estimated using the remaining 70 wells’ data. Subsequently, the new As data was compared with the observed data. This procedure was repeated with all ten interpolation techniques. This study also investigated the effects of various factors, such as the working principle of the interpolation techniques, data density, and the spatial extent of the study area, on the interpolation results. Such effects were quantified using the following scenarios:

The interpolation techniques were replaced, and the data density and spatial extent of the study area were kept constant.

The data density was varied, and the spatial area extent was kept constant. The effect of the data density on the performance of the interpolation techniques was quantified under five scenarios. In the first scenario, As data from 90% of the wells (i.e., 63 wells) were used to calculate the data of the missing wells. Likewise, in the other scenarios, 80%, 70%, 60%, and 50% of the total wells were used, respectively.

The spatial extent was changed, and the data density was kept constant. The effect of the spatial extent on the performance of the interpolation techniques was quantified by using the processing extent function in ArcGIS, considering two scenarios. In the first scenario, the interpolation was performed without imposing boundary conditions, while, for the second scenario, the boundary conditions were imposed using the shapefile of the province of Punjab.
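
The hold-out validation and the data-density scenarios described above can be sketched as follows. This is a simplified illustration under stated assumptions: wells are withheld at random with a fixed seed (the text does not say how the validation wells were chosen), a compact IDW predictor stands in for any of the ten techniques, and the synthetic `wells` list and helper names are hypothetical.

```python
import math
import random


def idw(train, target, power=2.0):
    """Compact IDW predictor used here as a stand-in interpolator."""
    numerator = denominator = 0.0
    for x, y, value in train:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value
        weight = 1.0 / d ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def holdout_errors(wells, n_holdout, predict, seed=0):
    """Withhold n_holdout wells, predict them from the rest, return errors."""
    rng = random.Random(seed)
    held_out = set(rng.sample(range(len(wells)), n_holdout))
    train = [w for i, w in enumerate(wells) if i not in held_out]
    errors = []
    for i in sorted(held_out):
        x, y, observed = wells[i]
        errors.append(predict(train, (x, y)) - observed)
    return errors


def thin(wells, fraction, seed=0):
    """Data-density scenario: keep a random fraction of the wells."""
    rng = random.Random(seed)
    keep = max(1, round(fraction * len(wells)))
    return rng.sample(wells, keep)


# Synthetic wells on a 5 x 4 grid: (x in km, y in km, As in ppb)
wells = [(float(i % 5), float(i // 5), 10.0 + 3.0 * i) for i in range(20)]
errors = holdout_errors(wells, 5, idw)
half = thin(wells, 0.5)
errors_half = holdout_errors(half, 3, idw)
```

Passing the predictor as an argument keeps the validation loop identical for every technique, so differences in the resulting errors reflect the technique and the scenario rather than the evaluation procedure.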

The interpolated As data were compared against observations using descriptive statistics. The Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) were used to estimate the bias in the interpolation results. The RMSE and MAE were calculated using the following equations, after Chai and Draxler [43]:

MAE = \frac{1}{n} \sum_{i = 1}^{n} |e_{i}|

(7)

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} e_{i}^{2}}

(8)

where n is the number of data points and e is the model error.
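
For reference, the two error measures can be computed as in the sketch below, which uses the standard definitions in which the RMSE is the square root of the mean squared error. The function names and the four observed and predicted values are hypothetical.

```python
import math


def mae(observed, predicted):
    """Mean Absolute Error: average magnitude of the model errors."""
    errors = [p - o for o, p in zip(observed, predicted)]
    return sum(abs(e) for e in errors) / len(errors)


def rmse(observed, predicted):
    """Root Mean Square Error: square root of the mean squared error."""
    errors = [p - o for o, p in zip(observed, predicted)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical As concentrations in ppb
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 35.0]

mae_value = mae(observed, predicted)    # errors 2, -2, 3, -5 -> 12 / 4 = 3.0
rmse_value = rmse(observed, predicted)  # sqrt(42 / 4) = sqrt(10.5)
```

With these definitions both measures carry the units of the data (ppb), and the MAE can never exceed the RMSE for the same set of errors; the two are equal only when all errors have the same magnitude.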

3. Results and Discussion

3.1. As Concentration in the Groundwater of Punjab

The results of the descriptive statistical parameters, which were selected to explore the spatial characteristics of the As concentration in Punjab, are presented in Table 1. The mean As concentration in the groundwater is 86.2 ppb, ranging from 0.1 to 530.0 ppb. Notably, only 29% of the wells have As concentration levels below the permissible threshold value set by the World Health Organization (i.e., 10 ppb), while the concentration in the remaining wells is significantly higher (Figure 3). The spatial distribution of the mean As concentration shows higher levels around rivers (Figure 3). The presence of As in the groundwater of Punjab stems primarily from high levels of organic materials in surface waters, which infiltrate with the water and deplete the oxygen concentration in soils. This process leads to the release of As from oxy(hydr)oxides into groundwater. In the eastern and central areas of Punjab, the elevated As concentration is attributed to higher soil pH levels (8.0–8.5), which are prevalent throughout the Indus Plain [44]. Due to the soil’s higher pH, As desorption is triggered in the upper soil layers, and subsequently, leaching occurs down to the groundwater [23,27]. In Southern Punjab, however, aridity is the dominating cause of the elevated As levels, resulting from higher evaporation rates [45]. Irrigation also has a strong influence on As concentration, which can be related to the role of irrigation in evaporative concentration and the associated As desorption [46]. Organic waste from domestic and industrial sources as well as intensive agricultural practices also cause As enrichment by reductive dissolution. As is among the most common naturally occurring pollutants in the world. When As-contaminated groundwater is used, it can adversely affect public health. Long-term exposure to As can cause cancer, cardiovascular problems, diabetes, and skin lesions [21,47]. The groundwater in Punjab is predominantly used for drinking, posing a severe threat to 20% of the public’s health at 10 ppb and 3% at 50 ppb [21,48]. The use of As-contaminated groundwater for irrigation is also prevalent in Punjab [34], exposing plants to elevated concentrations of As and causing morphological, physiological, and biochemical damage to them [49,50]. Small quantities of As accumulate in plant tissues and reach humans through the food chain [49,51].

3.2. Prediction Accuracy of Interpolation Techniques under Default Spatial Extent

The cross-validation results of the interpolation techniques based on descriptive statistics are shown in Table 2. Among the selected interpolation techniques, IDW and Spline interpolation demonstrate the highest accuracy, with RMSE (MAE) values of 13.5 ppb (87.8 ppb) and 16.7 ppb (89.5 ppb), respectively, whereas the Natural Neighbor interpolation technique produced the least accurate results, with RMSE = 2508.7 ppb and MAE = 712.1 ppb. The remaining techniques show moderate accuracies, with RMSE (MAE) values ranging from 78.9 ppb (129.7 ppb) to 89.7 ppb (140.8 ppb). Radial Basis Function, however, stands out, with its RMSE and MAE at 30.6 ppb and 96.7 ppb, respectively. IDW has shown superior results over RBF, OK, and kriging in predicting As concentrations in some studies (e.g., [52,53]).

The spatial distribution of the difference between the observed and interpolated As concentration data is shown in Figure 4. Consistent with the cross-validation results (Table 3), IDW shows the lowest difference and thereby has the highest accuracy (Figure 4d). The superior performance of IDW in interpolating nitrate concentrations has also been reported by [54]. Natural Neighbor and Spline interpolation show high accuracies for 43% and 38% of the wells, respectively, in the central and southern areas of Punjab where the observation wells were closely spaced (Figure 4a,c). While Natural Neighbor has shown satisfactory results in predicting nitrate concentrations in some studies (e.g., [55]), others (e.g., [54]) have reported its poor performance in predicting groundwater quality.

However, Spline interpolation underestimates, and the Natural Neighbor technique completely fails to accurately estimate As concentrations of wells that are sparsely located in the northern and southern areas of Punjab (Figure 4a,c). This failure in predictions could be attributed to the fact that both techniques perform well only for high-density datasets [56]. Moreover, the data sampling design (spatial distribution) also affects the performance of interpolation techniques [37,56]. Since the As data used in this study was irregularly spaced, this might have decreased the prediction accuracy of the Spline interpolation technique. Ref. [57] also reported low performances of the Spline and Natural Neighbor interpolation techniques for a highly dense and irregularly spaced dataset. Furthermore, all the remaining techniques tend to either overestimate or underestimate As concentrations for 43 to 56% of the wells across Punjab (Figure 4). The interpolation techniques of Diffusion with barrier, Global polynomial, the Trend Surface Analysis, Local polynomial, Ordinary Kriging, and Empirical Bayesian Kriging overestimate (underestimate), where the As concentration is low (high). Overestimated (underestimated) predicted As values vary from 102.2 ppb (410.7 ppb) to 127.0 ppb (453.6 ppb). The bias in the prediction values could be the result of the failure of the interpolation techniques to account for the static fluctuations in the dense As data. The observed As data has high variance which has a strong effect on the performance of the interpolations. Previous studies, e.g., ([37,58,59]), have reported a decrease in their performance with an increase in covariance.

3.3. Prediction Accuracy of Interpolation Techniques under Varying Boundary Conditions

The impact of the spatial extent or imposing boundary conditions on the prediction accuracy of the interpolation techniques was quantified using the processing extent tool in ArcGIS. Two scenarios were considered: the first scenario used the default setting “Without Boundary Conditions”, while in the second scenario, the shapefile of Punjab was used to impose boundary conditions. The results, presented in Table 3, clearly demonstrate that imposing boundary conditions significantly affects the prediction accuracy of the interpolation techniques. In the first scenario, where no boundary conditions were imposed, the RMSE (MAE) varies from 13.5 ppb (183.2 ppb) to 90.3 ppb (8153.9 ppb), except for Natural Neighbor for which the RMSE (MAE) is 2524.3 ppb (6.4 × 10⁶ ppb).

When boundary conditions were imposed, the RMSE (MAE) varies from 55.7 ppb (3112.2 ppb) to 4484.2 ppb (2.1 × 10⁷ ppb) (Table 3). Among the ten interpolation techniques, four show an increase in error when boundary conditions were imposed, while six techniques show a decrease in RMSE and MAE. The increase in RMSE (MAE) for IDW, Spline interpolation, Natural Neighbor, and Radial Basis Function is 71.0 ppb (6965.8 ppb), 1612.2 ppb (2.7 × 10⁶ ppb), 1959.8 ppb (1.4 × 10⁶ ppb), and 37.8 ppb (3771.3 ppb), respectively (Table 3), whereas the Trend Surface Analysis, Diffusion with barrier, Global polynomial, Local polynomial, Empirical Bayesian Kriging, and Ordinary Kriging show a decrease in RMSE and MAE. The decrease in RMSE (MAE) for the Trend Surface Analysis and Global polynomial is 34.3 ppb (5008.1 ppb); for Diffusion with barrier, it is 29.8 ppb (4493.1 ppb); for Local polynomial, it is 14.4 ppb (2083.0 ppb); for Empirical Bayesian Kriging, it is 22.7 ppb (3498.9 ppb); and for Ordinary Kriging, it is 22.6 ppb (3477.9 ppb). The impact of the spatial extent or boundary conditions is significant at the 95% confidence level except for the Trend Surface Analysis and Local polynomial (Table 3). Interpolation techniques are widely used to create spatial maps from data points, and their use is always restricted to some limited area, ranging from a hundred to a few thousand square kilometers. ArcGIS requires the boundary shapefile of the study area to recognize the spatial extent by using the processing extent tool. Notably, none of the earlier studies evaluated its effect. The current study explores its effect on the prediction accuracy of the interpolation techniques and finds mixed effects, i.e., some techniques show an increase while others show a decrease in error when boundary conditions are imposed. The possible causes of this effect are as yet unknown.

Table 3. Effects of the spatial extent (imposing boundary conditions) on the prediction of As concentration in the groundwater of Punjab. Statistical parameters significant at 95% CI and 99% CI are highlighted in bold and italic-bold, respectively. RMSE is reported in ppb and MSE in ppb².

Type	Sr. #	Interpolation Technique	RMSE (Without Boundary)	MSE (Without Boundary)	RMSE (With Boundary)	MSE (With Boundary)
Deterministic Techniques	1	Inverse Distance Weighting (IDW)	13.5	1.8 × 10²	84.6	7.1 × 10³
	2	Spline interpolation	16.8	2.8 × 10²	1629.0	2.7 × 10⁶
	3	Radial Basis Function	90.1	8.1 × 10³	55.8	3.1 × 10³
	4	Trend Surface Analysis	2524.4	6.4 × 10⁶	4484.2	2.0 × 10⁷
	5	Natural Neighbor Interpolation	90.3	8.2 × 10³	60.5	3.7 × 10³
	6	Diffusion with barrier	90.1	8.1 × 10³	55.8	3.1 × 10³
	7	Global polynomial	79.4	6.3 × 10³	64.9	4.2 × 10³
	8	Local polynomial	30.8	9.5 × 10²	68.7	4.7 × 10³
Stochastic Techniques	9	Empirical Bayesian Kriging	88.3	7.9 × 10³	65.6	4.3 × 10³
Stochastic Techniques	10	Ordinary Kriging	88.2	7.8 × 10³	65.6	4.3 × 10³
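
The relationship between the two error columns of Table 3 can be illustrated with a short sketch. MSE is the square of RMSE (for example, 13.5² ≈ 1.8 × 10² for the first row), so the two columns carry the same information on different scales. The function name `error_metrics` and the five well values below are hypothetical and chosen only for illustration; they are not taken from the study's data.

```python
import math

def error_metrics(observed, predicted):
    """Return (rmse, mse, mae) for paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [p - o for o, p in zip(observed, predicted)]
    mse = sum(r * r for r in residuals) / len(residuals)   # mean squared error
    mae = sum(abs(r) for r in residuals) / len(residuals)  # mean absolute error
    return math.sqrt(mse), mse, mae

# Hypothetical As concentrations (ppb) at five validation wells.
observed = [10.0, 60.0, 100.0, 250.0, 5.0]
predicted = [12.0, 55.0, 110.0, 240.0, 8.0]

rmse, mse, mae = error_metrics(observed, predicted)
print(f"RMSE = {rmse:.2f} ppb, MSE = {mse:.2f} ppb^2, MAE = {mae:.2f} ppb")
```

Because squaring amplifies large residuals, a technique whose RMSE roughly doubles between scenarios shows about a fourfold change in MSE, which is why the MSE columns span several orders of magnitude.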

3.4. Prediction Accuracy of Interpolation Techniques under Data Density Scenarios

The effect of data density on the prediction errors of the selected interpolation techniques is shown in Figure 5 and Table 4. Five scenarios were developed by considering 90%, 80%, 70%, 60%, and 50% of the total data points. Except for Natural Neighbor, the error in the predicted As concentration increases as the data density decreases. Previous studies have reported a negative correlation between data density and the error of interpolated results (e.g., [57,59,60]). Ref. [57] reports an increase in the RMSE for IDW, Natural Neighbor, Spline with Barrier, and some other techniques with a decrease in the data density. In contrast to the present study, these authors also report cases in which the RMSE decreases with a decrease in the data density. To further understand the relationship between data density and bias, a multiple linear regression model or a logarithmic regression model, chosen on the basis of the data distribution, was fitted for each interpolation technique (Figure 6). The coefficient of determination for the regression models varies from 0.39 to 0.80, except for Natural Neighbor, for which it is 0.06. Among the interpolation techniques fitted by linear models, Spline interpolation shows the highest correlation (0.71), while Radial Basis Function shows the lowest (0.46). For the techniques fitted by logarithmic models, Local polynomial shows the highest coefficient of determination (0.81), while Kriging exhibits the lowest (0.39).
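
As a minimal sketch of how a linear and a logarithmic model can be compared through the coefficient of determination, the example below fits both forms to the five RMSE values listed for Spline interpolation in Table 4. The helper `fit_r2` is a hypothetical name, and the sketch assumes a single predictor (data density). Because it uses only the five tabulated values rather than the authors' underlying data, it is not expected to reproduce the coefficients reported in the text.

```python
import math

def fit_r2(x, y):
    """Ordinary least-squares line y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot

# Data density (%) and the RMSE values (ppb) of the Spline row in Table 4.
density = [90.0, 80.0, 70.0, 60.0, 50.0]
rmse = [0.5, 1.2, 3.1, 2.8, 4.1]

# Linear model: RMSE = a + b * density.
a_lin, b_lin, r2_lin = fit_r2(density, rmse)
# Logarithmic model: RMSE = a + b * ln(density).
a_log, b_log, r2_log = fit_r2([math.log(d) for d in density], rmse)

print(f"linear:      slope = {b_lin:.3f}, R^2 = {r2_lin:.2f}")
print(f"logarithmic: slope = {b_log:.3f}, R^2 = {r2_log:.2f}")
```

A negative slope in either model corresponds to the pattern described above: the error grows as fewer data points are retained.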

For IDW, Spline interpolation, and Radial Basis Function, reducing the data density to 60% has little effect on the errors; however, the errors increase rapidly with a further decrease in data density. As the data density decreases from 90% to 60%, the error varies from 0.0 to 0.03 ppb for IDW, from 0.05 to 2.0 ppb for Spline interpolation, and from 0.5 to 1.5 ppb for Radial Basis Function. It has been argued that when data density is high, its effect on the performance of the interpolation techniques decreases [61]. This argument is not supported by the results of this study, which suggest that, besides data density, the sampling design or spatial distribution of data points also affects the magnitude of error in the interpolated results [56]. In addition, the data show a random distribution for all the data density scenarios except at 50% data density, where they show a dispersed distribution. The other techniques, however, show a gradual increase in bias with decreasing data density (Figure 5).
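
The data density scenarios can be sketched with a plain IDW estimator and random subsampling. This is an illustration under stated assumptions, not the authors' ArcGIS workflow: the validation scheme (scoring on the wells that were removed), the power parameter of 2, the function names `idw_predict` and `rmse_at_density`, and the synthetic As field are all hypothetical.

```python
import math
import random

def idw_predict(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (xs, ys, value) samples."""
    num = den = 0.0
    for xs, ys, value in samples:
        dist = math.hypot(x - xs, y - ys)
        if dist == 0.0:
            return value  # exact hit: IDW returns the observed value
        weight = 1.0 / dist ** power
        num += weight * value
        den += weight
    return num / den

def rmse_at_density(wells, fraction, rng):
    """Keep `fraction` of the wells for interpolation; score on the held-out rest."""
    shuffled = wells[:]
    rng.shuffle(shuffled)
    n_keep = max(1, round(fraction * len(shuffled)))
    kept, held_out = shuffled[:n_keep], shuffled[n_keep:]
    if not held_out:
        return 0.0
    squared = [(idw_predict(x, y, kept) - v) ** 2 for x, y, v in held_out]
    return math.sqrt(sum(squared) / len(squared))

rng = random.Random(0)
# Synthetic wells: random coordinates with a smooth, hypothetical As field (ppb).
wells = []
for _ in range(200):
    x, y = rng.uniform(0, 100), rng.uniform(0, 100)
    wells.append((x, y, 50.0 + 0.5 * x + 0.3 * y))

for fraction in (0.9, 0.8, 0.7, 0.6, 0.5):
    score = rmse_at_density(wells, fraction, rng)
    print(f"{int(fraction * 100)}% density: RMSE = {score:.2f} ppb")
```

Each scenario draws a fresh random subset, so the spatial pattern of the retained wells changes along with their number; this is one way the sampling design can influence the error independently of density.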

4. Limitations and Directions for Future Research

This study covers only the province of Punjab, which has a flat topography. Future studies should include the entire country so that the effects of all types of topography (i.e., flat, rugged, and hilly) can be evaluated. Urban areas, due to their smaller spatial extents and high As concentrations, are hotspots, while rural areas have larger spatial extents and low As concentrations, as is the case in this study. A combined consideration of both types of areas may affect the prediction accuracy of the interpolation techniques. Future studies should consider urban and rural areas separately and then evaluate the performance of the interpolation techniques.

5. Conclusions

This study conducted a comparative analysis of ten well-known interpolation techniques to quantify the errors in predicting the As concentration in the groundwater of Punjab, Pakistan. This study considered various factors such as the type of interpolation technique, spatial extent, and data density to draw the following conclusions:

The As concentration in a majority of the wells is higher than the threshold limit set by the World Health Organization. Among both deterministic and stochastic interpolation techniques, the best performing technique is IDW, while the Natural Neighbor technique has the lowest performance. At the spatial scale, IDW demonstrates the highest accuracy, whereas Spline interpolation and Natural Neighbor fail to predict As concentrations in areas where observation wells are sparsely located.
The change in spatial extent has a significant impact on the prediction accuracy of the interpolation techniques. The IDW, Spline interpolation, Natural Neighbor, and Radial Basis Function techniques show an increase in the error magnitude, whereas Trend Surface Analysis, Diffusion with barrier, Global polynomial, Local polynomial, Empirical Bayesian Kriging, and Ordinary Kriging show a decrease in error. The effect of the spatial extent or boundary conditions is significant at the 95% confidence level for all techniques except Trend Surface Analysis and Local polynomial.
Except for Natural Neighbor, the data density exhibits a negative correlation with the prediction error: the error in predicted As concentrations increases as the data density decreases. For the IDW, Spline interpolation, and Radial Basis Function techniques, the data distribution pattern also influences accuracy.

Author Contributions

Conceptualization, R.A.A.; methodology, R.A.A. and M.T.; software, M.T.; validation, R.A.A. and M.T.; formal analysis, M.T.; resources, R.A.A. and S.N.K.; data curation, M.T.; writing—original draft preparation, R.A.A., U.F. and M.T.; writing—review and editing, R.A.A., U.F., S.A., S.N.K., M.I.K., M.I. and N.S.; visualization, R.A.A.; supervision, R.A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data can be made available on a reasonable request to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Megdal, S.B. Invisible Water: The Importance of Good Groundwater Governance and Management. NPJ Clean Water 2018, 1, 15.
2. Foster, S.; Chilton, J.; Nijsten, G.-J.; Richts, A. Groundwater—A Global Focus on the ‘Local Resource’. Curr. Opin. Environ. Sustain. 2013, 5, 685–695.
3. United Nations. The United Nations World Water Development Report 2022: Groundwater: Making the Invisible Visible; UNESCO: Paris, France, 2022; p. 246.
4. Amanambu, A.C.; Obarein, O.A.; Mossa, J.; Li, L.; Ayeni, S.S.; Balogun, O.; Oyebamiji, A.; Ochege, F.U. Groundwater System and Climate Change: Present Status and Future Considerations. J. Hydrol. 2020, 589, 125163.
5. Döll, P.; Hoffmann-Dobrev, H.; Portmann, F.T.; Siebert, S.; Eicker, A.; Rodell, M.; Strassberg, G.; Scanlon, B.R. Impact of Water Withdrawals from Groundwater and Surface Water on Continental Water Storage Variations. J. Geodyn. 2012, 59, 143–156.
6. Nguyen, T.G.; Phan, K.A.; Huynh, T.H.N. Application of Integrated-Weight Water Quality Index in Groundwater Quality Evaluation. Civ. Eng. J. 2022, 8, 2661–2674.
7. Jin, K.; Rao, W.; Tan, H.; Song, Y.; Yong, B.; Zheng, F.; Chen, T.; Han, L. H-O Isotopic and Chemical Characteristics of a Precipitation-Lake Water-Groundwater System in a Desert Area. J. Hydrol. 2018, 559, 848–860.
8. Karan, S.; Sebok, E.; Engesgaard, P. Air/Water/Sediment Temperature Contrasts in Small Streams to Identify Groundwater Seepage Locations. Hydrol. Process. 2017, 31, 1258–1270.
9. Winter, T.C. Relation of Streams, Lakes, and Wetlands to Groundwater Flow Systems. Hydrogeol. J. 1999, 7, 28–45.
10. Alley, W.; Alley, R. High and Dry: Meeting the Challenges of the World’s Growing Dependence on Groundwater; Yale University Press: New Haven, CT, USA, 2017; ISBN 978-0-300-22038-4.
11. Rizvi, O. Pakistan’s Water Crisis. Available online: https://thediplomat.com/2022/06/pakistans-water-crisis/ (accessed on 27 August 2023).
12. WWF Freshwater|Initiatives|WWF. Available online: https://www.worldwildlife.org/initiatives/freshwater (accessed on 27 August 2023).
13. Akbar, A.; Sitara, U.; Khan, S.; Muhammad, N.; Khan, M.; Khan, Y.; Kakar, S. Drinking Water Quality and Risk of Waterborne Diseases in the Rural Mountainous Area of Azad Kashmir Pakistan. Int. J. Biosci. 2013, 3, 245–251.
14. Daud, M.K.; Nafees, M.; Ali, S.; Rizwan, M.; Bajwa, R.A.; Shakoor, M.B.; Arshad, M.U.; Chatha, S.A.S.; Deeba, F.; Murad, W.; et al. Drinking Water Quality Status and Contamination in Pakistan. BioMed Res. Int. 2017, 2017, e7908183.
15. Qureshi, R.H.; Ashraf, M. Water Security Issues of Agriculture in Pakistan. PAS Islamabad Pak 2019, 1, 41.
16. WWF-Pakistan-2007-Pakistans. Pakistan’s Water at Risk: Water and Health-Related Issues in Pakistan and Key Recommendations: A Special Report. Available online: https://www.ircwash.org/resources/pakistans-water-risk-water-and-health-related-issues-pakistan-and-key-recommendations (accessed on 8 August 2023).
17. Azizullah, A.; Khattak, M.N.K.; Richter, P.; Häder, D.-P. Water Pollution in Pakistan and Its Impact on Public Health—A Review. Env. Int. 2011, 37, 479–497.
18. Khalid, S.; Shahid, M.; Dumat, C.; Niazi, N.K.; Bibi, I.; Gul Bakhat, H.F.S.; Abbas, G.; Murtaza, B.; Javeed, H.M.R. Influence of Groundwater and Wastewater Irrigation on Lead Accumulation in Soil and Vegetables: Implications for Health Risk Assessment and Phytoremediation. Int. J. Phytoremediation 2017, 19, 1037–1046.
19. Khalid, S.; Shahid, M.; Niazi, N.; Rafiq, M.; Bakhat, H.; Imran, M.; Abbas, T.; Bibi, I.; Dumat, C. Arsenic Behaviour in Soil-Plant System: Biogeochemical Reactions and Chemical Speciation Influences. In Enhancing Cleanup of Environmental Pollutants; Springer International Publishing: Cham, Switzerland, 2017; ISBN 978-3-319-55422-8.
20. Dars, G.; Lashari, B.; Soomro, M.; Strong, C.; Ansari, K. Pakistan’s Water Resources in the Era of Climate Change. In Water Resources of Pakistan: Issues and Impacts; Springer International Publishing: Cham, Switzerland, 2021; pp. 95–108. ISBN 978-3-030-65678-2.
21. WHO Arsenic. Available online: https://www.who.int/news-room/fact-sheets/detail/arsenic (accessed on 21 August 2023).
22. Arsenic|Definition, Symbol, Uses, & Facts|Britannica. Available online: https://www.britannica.com/science/arsenic (accessed on 27 August 2023).
23. Podgorski, J.E.; Eqani, S.A.M.A.S.; Khanam, T.; Ullah, R.; Shen, H.; Berg, M. Extensive Arsenic Contamination in High-pH Unconfined Aquifers in the Indus Valley. Sci. Adv. 2017, 3, e1700935.
24. Qurat-ul-Ain; Farooqi, A.; Sultana, J.; Masood, N. Arsenic and Fluoride Co-Contamination in Shallow Aquifers from Agricultural Suburbs and an Industrial Area of Punjab, Pakistan: Spatial Trends, Sources and Human Health Implications. Toxicol. Ind. Health 2017, 33, 655–672.
25. Shahid, M.; Khalid, M.; Dumat, C.; Khalid, S.; Niazi, N.K.; Imran, M.; Bibi, I.; Ahmad, I.; Hammad, H.M.; Tabassum, R.A. Arsenic Level and Risk Assessment of Groundwater in Vehari, Punjab Province, Pakistan. Expo. Health 2018, 10, 229–239.
26. Shakoor, A.; Khan, Z.M.; Farid, H.U.; Sultan, M.; Ahmad, I.; Ahmad, N.; Mahmood, M.H.; Ali, M.U. Delineation of Regional Groundwater Vulnerability Using DRASTIC Model for Agricultural Application in Pakistan. Arab. J. Geosci. 2020, 13, 195.
27. Ullah, Z.; Rashid, A.; Ghani, J.; Nawab, J.; Zeng, X.-C.; Shah, M.; Alrefaei, A.F.; Kamel, M.; Aleya, L.; Abdel-Daim, M.M.; et al. Groundwater Contamination through Potentially Harmful Metals and Its Implications in Groundwater Management. Front. Environ. Sci. 2022, 10, 1–12.
28. Wu, J.; Sun, Z. Evaluation of Shallow Groundwater Contamination and Associated Human Health Risk in an Alluvial Plain Impacted by Agricultural and Industrial Activities, Mid-West China. Expo. Health 2016, 8, 311–329.
29. Zeng, Y.; Zhou, Y.; Zhou, J.; Jia, R.; Wu, J. Distribution and Enrichment Factors of High-Arsenic Groundwater in Inland Arid Area of P. R. China: A Case Study of the Shihezi Area, Xinjiang. Expo. Health 2018, 10, 1–13.
30. Babiker, I.S.; Mohamed, M.A.A.; Hiyama, T. Assessing Groundwater Quality Using GIS. Water Resour. Manag. 2007, 21, 699–715.
31. Balakrishnan, P.; Saleem, A.; Mallikarjun, N.D. Groundwater Quality Mapping Using Geographic Information System (GIS): A Case Study of Gulbarga City, Karnataka, India. Afr. J. Environ. Sci. Technol. 2011, 5, 1069–1084.
32. Nas, B.; Berktay, A. Groundwater Quality Mapping in Urban Groundwater Using GIS. Env. Monit. Assess 2010, 160, 215–227.
33. Rabah, F.; Ghabayen, S.; Salha, A. Effect of GIS Interpolation Techniques on the Accuracy of the Spatial Representation of Groundwater Monitoring Data in Gaza Strip. J. Environ. Sci. Technol. 2011, 4, 579–589.
34. Population Census|Pakistan Bureau of Statistics. Available online: https://www.pbs.gov.pk/content/population-census (accessed on 27 August 2023).
35. Qureshi, A.S. Groundwater Governance in Pakistan: From Colossal Development to Neglected Management. Water 2020, 12, 3017.
36. Chen, F.-W.; Liu, C.-W. Estimation of the Spatial Rainfall Distribution Using Inverse Distance Weighting (IDW) in the Middle of Taiwan. Paddy Water Env. 2012, 10, 209–222.
37. Hohn, M.E. An Introduction to Applied Geostatistics. Comput. Geosci. 1991, 17, 471–473.
38. Collins, F.C. A Comparison of Spatial Interpolation Techniques in Temperature Estimation. 1995. Available online: http://hdl.handle.net/10919/38139 (accessed on 8 August 2023).
39. Burrough, P.A.; McDonnell, R.A.; Lloyd, C.D. Principles of Geographical Information Systems, 3rd ed.; Oxford University Press: New York, NY, USA, 2015; ISBN 978-0-19-874284-5.
40. McKinley, S.; Levine, M. Cubic Spline Interpolation. Coll. Redw. 1998, 45, 1049–1060.
41. Adhikary, P.P.; Dash, C.H.J. Comparison of Deterministic and Stochastic Methods to Predict Spatial Variation of Groundwater Depth. Appl. Water Sci. 2017, 7, 339–348.
42. Chapter_3.Pdf. Available online: www1.maths.leeds.ac.uk/~kersale/2600/Notes/chapter_3.pdf (accessed on 17 August 2023).
43. Schaum, A. Principles of Local Polynomial Interpolation. In Proceedings of the 2008 37th IEEE Applied Imagery Pattern Recognition Workshop, Washington, DC, USA, 15–17 October 2008; IEEE: Washington, DC, USA, 2008; pp. 1–6.
44. Farooqi, A.; Masuda, H.; Siddiqui, R.; Naseem, M. Sources of Arsenic and Fluoride in Highly Contaminated Soils Causing Groundwater Contamination in Punjab, Pakistan. Arch. Env. Contam. Toxicol. 2009, 56, 693–706.
45. Rasool, A.; Farooqi, A.; Masood, S.; Hussain, K. Arsenic in Groundwater and Its Health Risk Assessment in Drinking Water of Mailsi, Punjab, Pakistan. Hum. Ecol. Risk Assess. Int. J. 2016, 22, 187–202.
46. Nicolli, H.B.; Suriano, J.M.; Gomez Peral, M.A.; Ferpozzi, L.H.; Baleani, O.A. Groundwater Contamination with Arsenic and Other Trace Elements in an Area of the Pampa, Province of Córdoba, Argentina. Environ. Geol. Water Sci. 1989, 14, 3–16.
47. Nickson, R.T.; McArthur, J.M.; Shrestha, B.; Kyaw-Myint, T.O.; Lowry, D. Arsenic and Other Drinking Water Quality Issues, Muzaffargarh District, Pakistan. Appl. Geochem. 2005, 20, 55–68.
48. Ahmad, T.; Kahlown, M.A.; Tahir, A.; Rashid, H. Arsenic an Emerging Issue: Experiences from Pakistan; Loughborough University: Loughborough, UK, 2004.
49. Abbas, G.; Murtaza, B.; Bibi, I.; Shahid, M.; Niazi, N.K.; Khan, M.I.; Amjad, M.; Hussain, M.; Natasha. Arsenic Uptake, Toxicity, Detoxification, and Speciation in Plants: Physiological, Biochemical, and Molecular Aspects. Int. J. Env. Res. Public Health 2018, 15, 59.
50. Zhang, J.; Hamza, A.; Xie, Z.; Hussain, S.; Brestic, M.; Tahir, M.A.; Ulhassan, Z.; Yu, M.; Allakhverdiev, S.I.; Shabala, S. Arsenic Transport and Interaction with Plant Metabolism: Clues for Improving Agricultural Productivity and Food Safety. Environ. Pollut. 2021, 290, 117987.
51. Fu, Z.; Li, W.; Xing, X.; Xu, M.; Liu, X.; Li, H.; Xue, Y.; Liu, Z.; Tang, J. Genetic Analysis of Arsenic Accumulation in Maize Using QTL Mapping. Sci. Rep. 2016, 6, 21292.
52. Gong, G.; Mattevada, S.; O’Bryant, S.E. Comparison of the Accuracy of Kriging and IDW Interpolations in Estimating Groundwater Arsenic Concentrations in Texas. Environ. Res. 2014, 130, 59–69.
53. Manjarrez-Domínguez, C.B.; Prieto-Amparán, J.A.; Valles-Aragón, M.C.; Delgado-Caballero, M.D.R.; Alarcón-Herrera, M.T.; Nevarez-Rodríguez, M.C.; Vázquez-Quintero, G.; Berzoza-Gaytan, C.A. Arsenic Distribution Assessment in a Residential Area Polluted with Mining Residues. Int. J. Env. Res. Public Health 2019, 16, 375.
54. Kazemi, E.; Karyab, H.; Emamjome, M.-M. Optimization of Interpolation Method for Nitrate Pollution in Groundwater and Assessing Vulnerability with IPNOA and IPNOC Method in Qazvin Plain. J. Env. Health Sci. Eng. 2017, 15, 23.
55. Merwade, V.M.; Maidment, D.R.; Goff, J.A. Anisotropic Considerations While Interpolating River Channel Bathymetry. J. Hydrol. 2006, 331, 731–741.
56. Li, J.; Heap, A.D. A Review of Comparative Studies of Spatial Interpolation Methods in Environmental Sciences: Performance and Impact Factors. Ecol. Inform. 2011, 6, 228–241.
57. Garnero, G.; Godone, D. Comparisons between Different Interpolation Techniques. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. ISPRS Arch. 2013, 40, 139–144.
58. Schläpfer, F.; Schmid, B. Ecosystem Effects of Biodiversity: A Classification of Hypotheses and Exploration of Empirical Results. Ecol. Appl. 1999, 9, 893–912.
59. Martínez-Cob, A. Multivariate Geostatistical Analysis of Evapotranspiration and Precipitation in Mountainous Terrain. J. Hydrol. 1996, 174, 19–35.
60. Long, J.; Liu, Y.; Xing, S.; Qiu, L.; Huang, Q.; Zhou, B.; Shen, J.; Zhang, L. Effects of Sampling Density on Interpolation Accuracy for Farmland Soil Organic Matter Concentration in a Large Region of Complex Topography. Ecol. Indic. 2018, 93, 562–571.
61. Li, J.; Heap, A.D. A Review of Spatial Interpolation Methods for Environmental Scientists; Geoscience Australia: Canberra, Australia, 2008.

Figure 1. Map of study area. (a) Geographic location of Punjab. (b) Location of observation wells in Punjab. (c) Elevation map of Punjab.

Figure 2. Methodological flowchart.

Figure 3. Spatial distribution of As concentration in the groundwater of Punjab. Each data point represents the concentration of As at an observation well.

Figure 4. Spatial distribution of the difference between observed and interpolated As concentration in the groundwater of Punjab. Subplots (e) and (i) show results for stochastic interpolation techniques, while the remaining subplots are for deterministic interpolation techniques.

Figure 5. Effect of data density on the accuracy of interpolation techniques. Subplots (e) and (i) show results for stochastic interpolation techniques, while the remaining subplots are for deterministic interpolation techniques.

Figure 6. Type of data distribution (Clustered, Random, and Dispersed) at 90% (a), 80% (b), 70% (c), 60% (d), and 50% (e) data density.

Table 1. Descriptive statistics of As concentration in the groundwater of Punjab.

Sr. #	Descriptive Statistics	As [ppb]
1	Arithmetic Mean	86.2
2	Median	60.0
3	Mode	100.0
4	Standard Deviation	95.2
5	Sample Variance	9066.1
6	Kurtosis	5.2
7	Skewness	1.8
8	Range	529.9
9	Minimum	0.1
10	Maximum	530.0

Table 2. Statistical analysis of the bias in interpolation techniques. Statistical parameters significant at 95% CI are highlighted in bold.

Type	Sr.	Interpolation Technique	RMSE [ppb]	MAE [ppb]
Deterministic Techniques	1	Inverse Distance Weighting (IDW)	13.5	87.8
	2	Spline interpolation	16.7	89.5
	3	Radial Basis Function	30.6	96.7
	4	Trend Surface Analysis	89.6	137.8
	5	Natural Neighbor Interpolation	2508.7	712.1
	6	Diffusion with barrier	89.7	140.9
	7	Global polynomial	89.6	137.8
	8	Local polynomial	78.9	129.7
Stochastic Techniques	9	Empirical Bayesian Kriging	87.8	138.2
Stochastic Techniques	10	Ordinary Kriging	87.7	136.7

Table 4. Effect of data density on the prediction of As concentration in the groundwater of Punjab. Statistical parameters significant at 95% CI and 99% CI are highlighted in bold and italic-bold, respectively. RMSE is reported in ppb and MSE in ppb².

Sr. #	Interpolation Technique	RMSE (90%)	MSE (90%)	RMSE (80%)	MSE (80%)	RMSE (70%)	MSE (70%)	RMSE (60%)	MSE (60%)	RMSE (50%)	MSE (50%)
1	Inverse Distance Weighting (IDW)	0.0	0.0	0.04	0.0	0.2	0.0	0.1	0.0	0.2	0.0
2	Spline interpolation	0.5	0.3	1.2	1.3	3.1	9.7	2.8	7.7	4.1	16.8
3	Radial Basis Function	37.2	1380.7	55.4	3075.0	69.5	4830.3	94.4	8902.0	83.0	6893.2
4	Trend Surface Analysis	7119.4	5.1 × 10⁷	5645.5	3.2 × 10⁷	4596.8	2.1 × 10⁷	4728.5	2.2 × 10⁷	3896.6	1.5 × 10⁷
5	Natural Neighbor Interpolation	43.6	1902.7	58.3	3408.8	71.5	5113.2	87.5	7649.4	81.1	6573.3
6	Diffusion with barrier	49.5	2454.2	55.8	3114.6	68.9	4750.1	88.1	7766.9	80.2	6427.5
7	Global polynomial	37.1	1380.7	55.4	3075.0	69.5	4830.3	88.5	7839.6	83.0	6893.2
8	Local polynomial	30.0	901.0	44.5	1988.5	55.5	3085.2	67.7	4588.4	64.6	4167.3
9	Empirical Bayesian Kriging	0.3	0.1	0.4	0.1	0.8	0.6	0.9	0.8	1.2	1.4
10	Ordinary Kriging	47.6	2262.6	39.8	1581.9	73.7	5427.2	49.2	2421.7	79.1	6254.2

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
