1. Introduction
Malaria is one of the longest-known diseases affecting humankind [1], yet the numbers of malaria cases and deaths remain high, with an estimated 1.7 billion cases and 10.6 million deaths recorded from 2000 to 2020 [2]. Clarifying the global impact of malaria is crucial, as it remains one of the most severe public health problems, with its burden predominantly borne by poorer regions, particularly in Africa. In 2020 alone, an estimated 241 million malaria cases and 627,000 deaths occurred, with substantial economic costs exceeding $12 billion annually. Despite major progress, including a 36% reduction in malaria mortality from 2010 to 2020, challenges remain, particularly in sub-Saharan Africa, where reductions in incidence and mortality rates have slowed. These statistics underscore the urgent need for improved malaria control and eradication strategies supported by crucial political and scientific initiatives [3,4]. The World Health Organization (WHO) has set the goal of a world free of malaria by 2030, defined as reductions in the mortality rate and incidence of malaria by at least 90% compared to 2015 [2]. To achieve this goal, the development of a high-quality and sustainable malaria risk forecasting system is essential [5].
Numerous researchers have attempted to map malaria risk using either geographic information system (GIS)-based spatial analysis or machine learning techniques. For example, Ferrao et al. developed a model for GIS-based spatial analysis and produced a map of malaria risk in Mozambique using climatic, sociodemographic, and clinical data [6]. Wieland et al. used machine learning to combine climate data with mosquito samples collected in the field to model mosquito habits [7].
We report the creation of a model using the multi-criteria evaluation (MCE) method for predictions with high temporal and spatial resolution. MCE is a standard method used with GIS for various purposes, including the mapping of malaria risk. Rakotoarison et al. used this method to estimate malaria risk in Madagascar using land-cover classifications based on satellite images, elevation data, temperature, rainfall, and population density [8]. Adeola et al. used it in combination with the normalized difference vegetation index (NDVI), the normalized difference water index (NDWI), and land surface temperature (LST) data from satellite images of South Africa [9]. Researchers have extensively studied the estimation of malaria risk, considering various factors, including temperature, humidity, hydrogeomorphology, land usage, and precipitation. Notably, Asgarian et al. highlighted the complex influence of meteorological parameters on mosquito abundance, emphasizing a strong positive correlation between temperature and mosquito populations [10]. Minale and Alemu developed a detailed malaria risk map for Bahir Dar, Ethiopia, considering factors like rainfall, temperature, and land use, which are crucial determinants of malaria hazard levels in different areas [11]. Furthermore, Bhatt and Joshi utilized an integrated geospatial and MCE approach to identify malaria risk zones in Vadodara district, demonstrating the utility of geospatial tools in public health planning for malaria [12]. However, little research has focused on the use of meteorological data with the MCE model, despite meteorological factors having strong impacts on mosquitoes [13].
Malaria cases depend on the number of female Anopheles mosquitoes. According to the U.S. Centers for Disease Control and Prevention, these insects fly no further than 2 km and generally live for 2–4 weeks (1–2 weeks of the egg, larva, and pupa stages and 1–2 weeks of the adult stage) [14,15]. Therefore, a model that can predict those time intervals is needed. Kim et al. developed a model to forecast these stages from 1 to 16 weeks in advance, while Panzi et al. forecast malaria morbidity up to the year 2036, and other attempts have been made to develop models that make such predictions over short time horizons [5,16]. These studies have achieved good accuracy, but the spatial resolution of the predictions remains quite low.
In this study, we aim to address the research question: How can we improve the temporal and spatial resolution of malaria risk prediction models using meteorological data? The motivation behind these efforts is not only to enhance malaria mapping but also to enable more targeted and effective malaria control interventions, optimize resource allocation, and ultimately contribute to the global goal of malaria eradication. By increasing the temporal and spatial resolution of the predictions, we can better characterize the dynamics of mosquito populations and their habitats, leading to more precise identification of high-risk areas and facilitating timely and targeted interventions to reduce the burden of malaria.
Therefore, the objective of this research is to develop a sustainable prediction model based on mosquito habitat and life span information obtained from meteorological data that have higher spatial resolution and shorter time intervals than existing methods.
3. Methods
3.1. Research Outline
The research methodology of this study is illustrated in Figure 2. Following data collection, preprocessing steps were undertaken, including conversion of file types if required, resampling, and splitting into grids of the selected size. In this study, the term malaria risk refers to an index ranging from 0 to 1, with higher values indicating a greater risk of malaria occurrence. We developed this index to quantify and analyze the malaria risk within each grid. The number of malaria cases within each grid was computed using the population distribution data and total malaria cases in each health zone.
The MCE technique employed in this study includes the analytical hierarchy process (AHP) and multiple linear regression (LR method). AHP is a structured technique for organizing and analyzing complex decisions based on mathematics and psychology. This method was developed by Saaty [25] and is particularly useful for group decision-making. AHP can capture both subjective and objective aspects of a decision, and allows the intuitive judgments of the decision-maker to be quantified, ensuring consistency in comparisons among factors [26].
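Meanwhile, the LR method is a statistical technique that allows a dependent variable to be predicted based on the values of multiple independent variables. This method extends the simple linear regression model to include multiple predictors, and its general form is represented by Equation (1) [27], as follows:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon \tag{1}$$

where $y$ is the dependent variable, $x_1, \ldots, x_p$ are the independent variables, $\beta_0, \ldots, \beta_p$ are the regression coefficients, and $\varepsilon$ is the error term.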
In our study, the MCE technique was used to determine the malaria risk in each grid by summing the weighted variables that indicate the risk for malaria associated with each factor. The risk for each factor was calculated using the probability density of each factor, which was estimated using the beta distribution. The weight of each factor was determined using either the AHP or LR method for comparison of the results of manual and mathematical methods. The accuracy of the model was evaluated through a comparison of the training and testing periods as well as a comparison of models that used the AHP technique versus the LR technique.
3.2. Preprocessing Data
The complete preprocessing workflow for the data described in this section is illustrated in the orange section of Figure 2. Distributed ERA5 data (meteorological data) were in the form of a GRIB file and SRTM data (elevation data) were in an HGT file. Both the GRIB file and the HGT file were translated into GeoTIFF files. CIESIN population density data were distributed as a GeoTIFF file, and this file was used without translation. After translation of the necessary files, resampling was conducted as outlined in Table 3; this process was selected to simplify the subsequent splitting of the data into 2 km × 2 km grids. Resampling was conducted using a bilinear method with the translator library GDAL [28]. Meteorological data were collected hourly, and daily data for each pixel were calculated as the average hourly values over each day. After resampling, the data were split into 2 km × 2 km grids.
The meteorological and elevation data were processed by averaging the values of all pixels within each 2 km × 2 km grid. Humidity data were calculated using the August–Roche–Magnus formula (Equation (2)), which incorporates the 2 m temperature ($T$), 2 m dewpoint temperature ($T_d$), coefficient $a$ (equal to 17.625), coefficient $b$ (equal to 243.05), and the value of $T_0$ (equal to 273.15 K, which is equivalent to 0 °C) [29]. This formula was used to calculate relative humidity ($RH$).

$$RH = 100 \times \frac{\exp\left(\dfrac{a\,(T_d - T_0)}{b + (T_d - T_0)}\right)}{\exp\left(\dfrac{a\,(T - T_0)}{b + (T - T_0)}\right)} \tag{2}$$
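The humidity calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the temperature and dewpoint inputs are in kelvin, returns relative humidity as a percentage, and uses the coefficient values quoted in the text. The function and constant names are hypothetical.

```python
import math

A = 17.625   # Magnus coefficient (dimensionless), value quoted in the text
B = 243.05   # Magnus coefficient (degrees Celsius), value quoted in the text
T0 = 273.15  # offset between kelvin and degrees Celsius


def relative_humidity(t_kelvin, td_kelvin):
    """Relative humidity (%) from 2 m temperature and 2 m dewpoint, both in kelvin.

    Assumption: percent output and kelvin input; names are illustrative only.
    """
    t_c = t_kelvin - T0    # air temperature in degrees Celsius
    td_c = td_kelvin - T0  # dewpoint temperature in degrees Celsius
    # Ratio of actual to saturation vapour pressure under the Magnus approximation
    actual = math.exp(A * td_c / (B + td_c))
    saturation = math.exp(A * t_c / (B + t_c))
    return 100.0 * actual / saturation
```

When the dewpoint equals the air temperature the function returns 100%, which is a quick sanity check on the sign and unit conventions.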
Wind speed was calculated from the 10 m u and v components of wind using Equation (3), where $W$ is wind speed (m/s), $U$ is the 10 m u component of wind (m/s), and $V$ is the 10 m v component of wind (m/s).

$$W = \sqrt{U^2 + V^2} \tag{3}$$

Then, 2-week average factor values were calculated for each day. To obtain population density data, the total population within each 2 km × 2 km grid was calculated.
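As a rough sketch of these two steps, the snippet below computes wind speed from the two wind components and a 2-week average for each day. The text does not state whether the 2-week window is trailing or centered; a trailing window (the current day plus the preceding 13 days) is assumed here, and the function names are hypothetical.

```python
import math


def wind_speed(u, v):
    """Wind speed (m/s) from the 10 m u and v wind components (m/s)."""
    return math.hypot(u, v)


def trailing_mean(daily_values, window=14):
    """Mean of each day's value and the preceding window - 1 days.

    Assumption: a trailing window; the first days use however many values exist.
    """
    averaged = []
    for i in range(len(daily_values)):
        chunk = daily_values[max(0, i - window + 1): i + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged
```

A trailing window uses only values up to the day being assessed, which suits a forecasting setting, but this is a design assumption rather than something stated in the text.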
3.3. Calculation of Malaria Data per 2 km × 2 km Grid
The workflow described here is depicted in red in Figure 2. The malaria case number for each 2 km × 2 km grid was calculated using the total number of malaria cases in the health zone where the grid is located. Population percentage was calculated using the total population within the 2 km × 2 km grid and the total health zone population, which was calculated by summing the population density data of each health zone (Equation (4), where $a_k$ is the malaria case number in grid $k$, $M$ is the health zone containing grid $k$, $A_M$ is the total malaria case number in health zone $M$, $p_k$ is the total population of location $k$, and $P_M$ is the total population of health zone $M$).

$$a_k = A_M \times \frac{p_k}{P_M}, \quad k \in M \tag{4}$$
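The population-weighted allocation described above can be illustrated as follows. This is a sketch under the assumption that each grid receives a share of the health zone's cases proportional to its share of the zone population; the function name and the dictionary layout are hypothetical.

```python
def allocate_cases(zone_cases, grid_populations):
    """Distribute a health zone's malaria cases across its grids by population share.

    zone_cases: total malaria cases reported for the health zone.
    grid_populations: mapping of grid id -> total population in that grid.
    Returns a mapping of grid id -> estimated (possibly fractional) case number.
    """
    zone_population = sum(grid_populations.values())
    if zone_population == 0:
        # No population data: nothing to allocate against
        return {grid: 0.0 for grid in grid_populations}
    return {
        grid: zone_cases * population / zone_population
        for grid, population in grid_populations.items()
    }
```

For example, a zone with 100 cases and two grids holding 300 and 100 people would be assigned 75 and 25 cases, respectively, so the grid totals always sum back to the zone total.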
3.4. Outline of the Model
We developed a model, shown in Equation (5), using the MCE method [30], where $S_k$ is the risk for malaria with a range of 0 to 1 for the 2 km × 2 km grid at location $k$, $c_i$ is constraint $i$, $n$ is the number of constraints, $w_j$ is the weight of factor $j$, $f_j$ is a function that indicates the risk for malaria for factor $j$, $x_{jk}$ is the normalized factor $j$ value in location $k$, and $m$ is the number of factors.

$$S_k = \prod_{i=1}^{n} c_i \times \sum_{j=1}^{m} w_j \, f_j(x_{jk}) \tag{5}$$
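A minimal sketch of this scoring step is given below, assuming the usual MCE form in which the product of the binary constraints multiplies the weighted sum of per-factor risks. The per-factor risks are passed in already evaluated, and the function name is hypothetical.

```python
import math


def mce_risk(constraints, weights, factor_risks):
    """Malaria risk for one grid: product of constraints times weighted factor risks.

    constraints: iterable of 0/1 values, one per constraint.
    weights: factor weights, assumed to sum to 1.
    factor_risks: per-factor risk values in [0, 1], in the same order as weights.
    """
    if len(weights) != len(factor_risks):
        raise ValueError("weights and factor_risks must have the same length")
    weighted_sum = sum(w * r for w, r in zip(weights, factor_risks))
    # Any constraint equal to 0 forces the overall risk to 0
    return math.prod(constraints) * weighted_sum
```

Because the weights sum to 1 and each factor risk lies in 0–1, the result stays within 0–1, and a single violated constraint masks the grid entirely.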
3.5. Estimation of the Parameters in Function f
3.5.1. Calculation of the Probability Distribution
The green section in Figure 2 depicts the process detailed in this section. The f function is derived using Equation (6), wherein $g_j$ is the probability density function of factor $j$ (Equation (7)), and $\alpha_j$ and $\beta_j$ are beta distribution parameters of factor $j$. To derive the parameters $\alpha_j$ and $\beta_j$, the maximum likelihood estimation (MLE) algorithm of the open-source Python library SciPy (version 1.9.3) was used with data for the training period (1 January 2018–31 December 2020).

$$f_j(x) = \frac{g_j(x; \alpha_j, \beta_j)}{\max_{0 \le t \le 1} g_j(t; \alpha_j, \beta_j)} \tag{6}$$

$$g_j(x; \alpha_j, \beta_j) = \frac{x^{\alpha_j - 1} (1 - x)^{\beta_j - 1}}{B(\alpha_j, \beta_j)} \tag{7}$$

After each beta distribution parameter was estimated, the maximum of each beta distribution within the range 0–1 was calculated and the beta distribution was normalized to 0–1, creating the f function (Equation (6)). This expresses the malaria risk associated with each factor over the range 0–1.
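The normalization step can be sketched with the standard library alone. The example assumes both beta parameters are greater than 1, so that the density has a single interior maximum at the mode; the parameter values in the usage note are hypothetical and are not taken from Table 4. The parameter fitting itself is not shown.

```python
import math


def beta_pdf(x, alpha, beta):
    """Beta probability density on (0, 1); returns 0 at or outside the endpoints."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(
        log_norm + (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log(1.0 - x)
    )


def factor_risk(x, alpha, beta):
    """Beta density rescaled so that its maximum over 0-1 equals 1.

    Assumption: alpha > 1 and beta > 1, so the maximum sits at the interior mode.
    """
    if alpha <= 1.0 or beta <= 1.0:
        raise ValueError("this sketch requires alpha > 1 and beta > 1")
    mode = (alpha - 1.0) / (alpha + beta - 2.0)
    return beta_pdf(x, alpha, beta) / beta_pdf(mode, alpha, beta)
```

With hypothetical parameters alpha = 2 and beta = 4, the mode is at 0.25, so `factor_risk(0.25, 2.0, 4.0)` evaluates to 1 and values further from the mode fall toward 0.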
3.5.2. Calculation of the Mean Value of Each Range
To determine the parameter values ($\alpha_j$ and $\beta_j$) of the f function, we calculated the average number of malaria cases per 2 km × 2 km grid.
Table 4 presents selected parameter values for each factor. For example, to analyze the impact of elevation, the mean number of malaria cases was computed for each interval of
m. The workflow described here is indicated by the green portions of
Figure 2.
3.5.3. Normalizing Factors and Deriving Constraints
Each factor was adjusted using a threshold and then normalized to the range 0–1. The threshold value for each factor is listed in Table 5. The thresholds were selected based on locations where mosquitoes can survive and the relationships between malaria case numbers and the factors described in Section 4.1 [31]. Data falling outside of the thresholds were normalized to 0 and the constraint ($c_i$ in Equation (5)) was set to 0; meanwhile, for values within the range, the constraint was set to 1. Using this normalized factor and the mean number of malaria cases, we estimated the f function parameter values using the MLE algorithm. The workflow described in this section is shown in green in Figure 2.
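The thresholding and normalization step might look like the sketch below. It assumes each factor has a lower and an upper threshold and that values inside the range are scaled linearly (min-max) to 0–1; the text does not specify the scaling, and the threshold values used in the usage note are hypothetical rather than those of Table 5.

```python
def normalize_factor(value, lower, upper):
    """Return (normalized value, constraint) for one factor at one grid.

    Values outside [lower, upper] get a normalized value of 0 and constraint 0.
    Values inside the range are min-max scaled to 0-1 and get constraint 1.
    Assumption: linear scaling between the two thresholds.
    """
    if upper <= lower:
        raise ValueError("upper threshold must exceed lower threshold")
    if value < lower or value > upper:
        return 0.0, 0
    return (value - lower) / (upper - lower), 1
```

For instance, with hypothetical temperature thresholds of 15 and 35, a value of 25 maps to `(0.5, 1)` and a value of 40 maps to `(0.0, 0)`, which removes that grid from the risk map through the constraint product.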
3.6. Calculation of Weights
The weight ($w_j$) of each factor was calculated using two methods, one based on the answers to a questionnaire from eight specialists on malaria or mosquitoes working in the United States and Japan (AHP method), and the other based on the LR method. These two methods were used to compare manual and mathematical derivation methods.
3.6.1. AHP Method
For the method based on specialist knowledge, the questionnaire was collected using Saaty’s Continuous Rating Scale [25]. All experts were asked to compare the importance of two factors to the occurrence of mosquitoes, and the resulting scores for each factor are presented in Table 6 with respect to all combinations of paired factors. In the MCE process, the primary criterion was the professional expertise of the respondents. All participants in the study were required to be affiliated with universities or research institutes, ensuring that their responses were grounded in specialized knowledge and experience in the field. The data collected in the questionnaire were input into the AHP formula to obtain weights. Then, those weights were normalized to the range 0–1 such that the sum of the weights equals 1, thereby enabling comparison with the LR method. The workflow steps noted in this section are marked in yellow in Figure 2. The answers received from each expert are shown in Table A1.
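To make the weighting step concrete, the sketch below derives normalized weights from a pairwise comparison matrix. It uses the row geometric-mean method, a common approximation to the principal-eigenvector weights of AHP that is exact when the matrix is perfectly consistent; whether the authors used the eigenvector or this approximation is not stated. The three-factor matrix in the usage note is hypothetical.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP weights from a reciprocal pairwise comparison matrix.

    pairwise[i][j] is how much more important factor i is than factor j.
    Uses the row geometric-mean method (an assumption, see lead-in),
    then normalizes so the weights sum to 1.
    """
    n = len(pairwise)
    if any(len(row) != n for row in pairwise):
        raise ValueError("pairwise comparison matrix must be square")
    geometric_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geometric_means)
    return [g / total for g in geometric_means]
```

For a hypothetical consistent matrix `[[1, 2, 6], [1/2, 1, 3], [1/6, 1/3, 1]]`, the function recovers weights of approximately 0.6, 0.3, and 0.1.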
3.6.2. LR Method
To determine factor weights for the model, we extracted meteorological and elevation data associated with the top 1–20% of malaria cases. Then, we utilized the LR method, applying Equation (8), where the dependent variable represents the number of malaria cases. Subsequently, we derived the weight ($w_j$) by normalizing the explanatory variables to the range 0–1 using Equation (9). The top 1–20% was selected to prioritize cases with higher percentages, emphasizing areas where malaria poses a significant risk over areas of lower risk. The workflow outlined in this section is visualized in the purple portion of Figure 2.
3.7. Accuracy Evaluation
We evaluated the accuracy of our model from three perspectives. First, we compared the selected ranges of both models. Second, we evaluated model performance by selecting the top 1, 5, 10, and 20 percentiles of malaria cases for each risk range and comparing results between the training and validation periods. Finally, we compared the plotted risk and malaria case numbers on two dates, representing the rainy and dry seasons.
To compare the models using AHP and LR, we calculated the top 1–5% of malaria infection numbers for every 0.1 increment of risk and plotted those values on the grid. This process allowed us to determine whether an overall increase in malaria cases occurs with increasing risk.
For the percentile comparisons, a linear function was derived from the data points corresponding to each percentile. Finally, for the seasonal comparison, we compared the plotted risk and malaria cases across all 34 health zones on two dates, 1 April 2021 and 21 June 2021, representing the rainy and dry seasons, respectively. We selected these dates because they align with distinct seasons and the DES provided malaria case counts for all 34 health zones on these dates.
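One possible reading of the per-risk-bin comparison is sketched below: grids are grouped into 0.1-wide risk bins, and for each bin the case number at the boundary of the top p percent is reported. The nearest-rank percentile definition and the function name are assumptions made for illustration; the text does not specify how the percentiles were computed.

```python
import math
from collections import defaultdict


def top_percentile_by_risk_bin(risks, cases, top_percent, n_bins=10):
    """Case number marking the top `top_percent` of grids within each risk bin.

    risks: risk values in [0, 1], one per grid.
    cases: malaria case numbers, in the same order as risks.
    Returns a mapping of bin index (0 = risk 0.0-0.1, ...) -> threshold case number.
    Assumption: nearest-rank percentile definition.
    """
    bins = defaultdict(list)
    for risk, case in zip(risks, cases):
        index = min(int(risk * n_bins), n_bins - 1)  # risk 1.0 joins the last bin
        bins[index].append(case)
    thresholds = {}
    for index, values in sorted(bins.items()):
        values.sort()
        rank = max(1, math.ceil((100 - top_percent) / 100 * len(values)))
        thresholds[index] = values[rank - 1]
    return thresholds
```

Plotting these thresholds against the bin index would show whether the upper tail of case numbers rises with predicted risk, which is the pattern the comparison is meant to reveal.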
5. Conclusions
Despite continuing demands and efforts to develop sustainable systems for predicting malaria risk, such systems have remained elusive [5]. To address this challenge, we developed a short-term prediction model for sustainable malaria risk forecasting. Malaria transmission is directly affected by mosquito lifespan and habitat, which are in turn influenced by climate and other factors [32,33]. Therefore, our model uses climate and elevation data with a resolution of 2 km and considers the behavior of mosquitoes. Future research will address the model’s sensitivity to various timeframes of climate data, such as data from a month or less prior to the time of analysis, as the impact of such data on prediction accuracy remains unknown. This study represents an important step toward the development of a sustainable malaria risk forecasting system, which has been a long-standing challenge. By providing a robust tool for predicting malaria outbreaks, this research can aid governmental and public health agencies in devising preemptive strategies, optimizing resource allocation, and enhancing community engagement and education efforts to mitigate the impact of the disease. More benefit will be obtained from targeted and effective disease prevention measures, and planners can incorporate the model’s findings into broad health and urban development policies. For example, the predictive power of our model enables health agencies to identify areas at elevated risk of malaria outbreaks with unprecedented precision. This capability allows for more strategic deployment of limited resources, such as mosquito nets, insecticides, and antimalarial drugs. This model can support targeted campaigns for mosquito control and community health education, especially during high-risk periods identified by the model. In conclusion, the integration of our model into the public health policy framework can serve as a catalyst for more dynamic and evidence-based strategies for malaria prevention and treatment, ultimately decreasing malaria incidence and improving public health outcomes.
Overall, our findings provide insights into the development of a model for predicting malaria risk with high temporal and spatial resolution, thereby supporting malaria control and management efforts. Although this study focused on South Kivu, DRC, the model is designed to be versatile and not inherently region-specific; therefore, further studies conducted on a region-by-region basis are essential to fully ascertaining the applicability of the model to distinct climate patterns and environmental factors that influence malaria transmission, thereby supporting adaptation of the model to other regions. Moreover, as the LR method showed promising results but has not been widely applied, exploring the development of novel, user-friendly methodologies remains an important avenue of future research. Collaborative efforts between modelers, public health officials, and local governments can enhance the effectiveness of malaria interventions across diverse regions.