Optimizing Sporting Actions Effectiveness: A Machine Learning Approach to Uncover Key Variables in the Men’s Professional Doubles Tennis Serve

Vives, Fernando; Lázaro, Javier; Guzmán, José Francisco; Martínez-Gallego, Rafael; Crespo, Miguel

doi:10.3390/app132413213

¹ Department of Sport and Physical Education, University of Valencia, 46010 Valencia, Spain

² Independent Researcher, 46100 Valencia, Spain

³ Development Department, International Tennis Federation, London SW15 5XZ, UK

* Author to whom correspondence should be addressed.

Appl. Sci. 2023, 13(24), 13213; https://doi.org/10.3390/app132413213

Submission received: 5 November 2023 / Revised: 3 December 2023 / Accepted: 8 December 2023 / Published: 13 December 2023

(This article belongs to the Special Issue Analytics in Sports Sciences: State of the Art and Future Directions)

Abstract

This study used a novel machine learning approach to uncover key serve variables that maximize effectiveness in men’s professional doubles tennis. A large dataset of 14,146 serves from 97 Davis Cup doubles matches played between 2010 and 2019 was analyzed using explainable AI techniques. The angle and distance from the bounce to the sidelines of the serves were found to best distinguish the points won with aces from rallies lasting more than three strokes. Optimal serve angles of 5.7–8.7° substantially increased the probability of serving an ace to over 80%, compared with around 30% for serves with more central angles. Lateral bounce distances of 0–28 cm from the sidelines also boosted the ace probability by over 50 percentage points. The serve speed was shown to have less influence on serve effectiveness than in singles tennis, with velocities above 187 km h⁻¹ only increasing the probability of serving an ace by about 10%. These findings have important practical implications for the tactical decision-making and technical training of serves in men’s professional doubles tennis. The data highlight that the angle and placement of serves are more important than velocity for attaining effective serves in doubles. Coaches and players can use this knowledge to pay special attention to the most important variables in the effectiveness of serves, such as the line distance and angle, in order to maximize the performance of the doubles serve. The novel methodology used in this study provides a valid and reliable way to calculate the efficiency of actions in various sport disciplines using tracking data and machine learning approaches.

Keywords:

racquet sports; tactics; performance analysis; sport analytics; tracking technology

1. Introduction

The serve has become the most decisive stroke in modern professional tennis, particularly in the doubles game [1]. It has been shown to be a key shot for controlling the game and even winning points directly through aces, thus being a critical technical and tactical skill for success at the professional levels of the game [2,3].

Previous research has established that the serve provides a key advantage in professional tennis, and this effect is amplified in doubles matches. Servers demonstrate clear dominance over returners in singles tennis, winning approximately 60% of points lasting fewer than four strokes [4]. Moreover, serve metrics like the second serve points won are among the strongest predictors of overall success [5,6]. The positional dynamics and rules of doubles tennis further accentuate the serve’s impact, with data showing an even greater serving advantage for doubles compared to singles, especially in short points [1,7,8]. Additional factors related to the surface speed [9] and team experience [10] also influence professional doubles servers’ effectiveness. Apart from the heightened serve advantage in doubles, service tactics are also shaped by contextual information like the score and the returner position, which influence players’ decision-making [11]. By leveraging such situational data, servers can increase serve variety, target specific opponent weaknesses, maintain consistency, and increase unpredictability [12]. Collectively, these findings underscore the serve as a critically decisive stroke with major tactical and strategic implications in high-level doubles tennis.

The introduction of tracking technology like Hawk-Eye in tennis has enabled access to rich datasets, which researchers have utilized to predict game indicators and parameters especially related to the serve [6,13]. For singles tennis, an analysis of serving patterns shows accuracy measures are more impactful than speed [11]. For doubles, tracking data reveals that the serving placement and direction strongly influence the effectiveness of and anticipation by returners [14,15].

Machine learning techniques have also been applied in recent tennis literature leveraging these large datasets [16]. Broad applications include the stroke classification [17], an automated recognition of movement patterns [18,19], evaluating player performance [20,21], and modeling shot variety and effectiveness [22]. Significant attention has focused on predicting serve characteristics and overall match outcomes. The different modeling approaches explored include regression, point-based comparisons, neural networks, random forests, and ensemble methods [23,24,25,26,27,28,29,30,31,32,33,34].

In summary, ample research has investigated match prediction in tennis using diverse machine learning techniques on increasing volumes of tracking data. However, doubles tennis has received less specific focus, representing an open area for deeper investigation, particularly around service analytics. Therefore, the aim of this study was to uncover the variables associated with a greater serve effectiveness in men’s professional doubles matches using a large dataset from Davis Cup doubles matches and machine learning techniques. The results obtained highlight the angle and the distance of the bounce of the ball from the line as the most important and decisive variables related to the effectiveness of serves.

2. Materials and Methods

2.1. Sample

A total of 14,146 serves were analysed from 97 full men’s doubles matches played during the Davis Cup (qualifying ties) between 2010 and 2019. Our study featured the participation of 123 teams and 160 players from 34 different countries, with an average age of 30.03 ± 4.73 years.

2.2. Instruments

The data utilized in this study were obtained from the Hawk-Eye system [35] deployed during Davis Cup events. It consists of a group of ten cameras which are placed around the circumference of the court and capture and record the trajectory of the ball and the players’ movement on the court [14]. The Hawk-Eye System is well-known for its high accuracy and reliability in tracking and analyzing tennis matches. It is widely used in professional tennis and has provided data for numerous research papers [6,11,14,15].

2.3. Procedure

A complete end-to-end (E2E) AI pipeline was built to experiment with data processing and explainable artificial intelligence (XAI). It comprised the following steps:

1. Experimental setup.
2. Data processing.
3. Training the deep learning model.
4. Feature importance algorithms.
5. Probabilistic and statistical analysis. Values selection based on maximizing the desired effectiveness and minimizing the undesired one.
6. Synthetic dataset generation, and simulation based on the predictions of this dataset and an analysis of improvements in effectiveness.

2.3.1. Experimental Setup

The XAI process can be summarized in steps 2 to 6. A pipeline covering all XAI processes, from data processing to training the deep learning model, was built to calculate feature importance, to select values, and to test the results on a synthetic dataset. The whole process is summarized in Figure 1.

2.3.2. Data Processing

Dataset

The dataset consisted of the following variables:

Grouping and target (dependent) variables:

Court side: Side of the court from where the serve is played. This variable was used to divide the dataset and perform the analysis for each side of the court.
Effectiveness (target variable predicted by the model):
- Type 1: The point finishes with one shot.
- Type 2: The point finishes with two shots.
- Type 3: The point finishes with three shots.
- Type 4: The point finishes with four or more shots.

Predictor (independent) variables:

Speed: Mean speed of the serve.
Position: Position of the server when hitting the serve (A in Figure 2).
Time: Time between ball impact and ball bounce.
Speed loss: Loss of speed of the ball after its bounce.
Impact Z: Height of the ball at impact.
Net clearance: Height of the ball when passing over the net.
δ: Serve angle (Figure 2).
β: Vertical projection angle.
dL: Distance from ball bounce to the sideline of the service box.

EDA and Data Processing

After some exploratory data analysis (correlation matrices, boxplots, histograms, kernel density estimation plots (KDE), etc.), the data were prepared to apply machine learning classification algorithms. Some variables were not informative as their variance was equal or close to 0. The main data processing steps were the following:

The minority target classes were oversampled to address the class imbalance in the dataset.
Features were selected using a combination of tennis knowledge criteria, correlation analysis and feature selection algorithms.
Outliers were detected using statistical methods and then filtered out.
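
The oversampling step listed above can be illustrated with a minimal sketch. The source does not state which oversampling technique was used, so plain random duplication of minority-class rows is assumed here; the function name `random_oversample` and the toy rows are hypothetical.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (assumed technique)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Draw the missing rows with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels

# Toy data (invented): [speed in km/h, serve angle in degrees];
# effectiveness type 1 is rarer than type 4.
rows = [[190.0, 6.1], [185.0, 3.0], [178.0, 2.5], [181.0, 4.0], [176.0, 3.3]]
labels = [1, 4, 4, 4, 4]
balanced_rows, balanced_labels = random_oversample(rows, labels)
class_counts = Counter(balanced_labels)
```

Oversampling of this kind would normally be applied to the training subset only, so that duplicated rows cannot appear in the validation or test subsets.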

Furthermore, some features did not follow a simple normal distribution but a mixture of Gaussians. The collinearity between variables, which did not affect the model performance but did affect the interpretability of the results, was also addressed. Thus, feature importance was computed only with uncorrelated features. This process was conducted separately for δ and dL.

2.3.3. Training the Deep Learning Model

The dataset was divided into training (70%), validation (15%) and test (15%) subsets before any data processing was applied, which kept the validation and test subsets free of data leakage from the training process. The validation subset was used to drive callbacks such as early stopping and model checkpointing. Cross-validation was used to validate the model. During validation, the balanced accuracy score and the loss at each epoch were monitored. For the final evaluation of the model, the accuracy and balanced accuracy (as well as the F1 score, precision and recall) on the test subset were calculated.
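
A minimal sketch of the 70/15/15 split and of the balanced accuracy metric follows. Whether the original split was stratified by effectiveness type is not stated in the text, so stratification is an assumption here, and the helper names `stratified_split` and `balanced_accuracy` are hypothetical.

```python
import random

def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Split row indices into train/validation/test subsets, keeping the
    class proportions roughly equal in each subset (assumed behaviour)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, valid, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_valid = round(fractions[1] * len(indices))
        train += indices[:n_train]
        valid += indices[n_train:n_train + n_valid]
        test += indices[n_train + n_valid:]
    return train, valid, test

def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recall values."""
    recalls = []
    for cls in set(y_true):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)

# Toy labels (invented): 20 serves of type 1 and 80 serves of type 4.
labels = [1] * 20 + [4] * 80
train_idx, valid_idx, test_idx = stratified_split(labels)
score = balanced_accuracy([1, 1, 4, 4], [1, 4, 4, 4])
```

Balanced accuracy averages recall over classes, so a model that always predicts the majority class scores poorly even when its plain accuracy looks high.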

AutoGluon v0.5.2 and FastAI v2.5.6 were utilized in this study for this purpose. AutoGluon presents commendable tools for explanation and automated feature enhancement. However, it exhibited a lack of the flexibility required for this research, as certain outcomes were inconsistent. Consequently, it was decided to employ FastAI for the analysis of tabular data.

The implementation of AutoGluon was straightforward, as this tool undertakes the preprocessing of the dataset, trains multiple models, selects the optimal one, assesses feature importance, and employs various XAI algorithms to deliver global explanations for the models.

On the other hand, FastAI is a tool that offers functionalities complementary to those of PyTorch and AutoGluon. It provides certain data processing functionalities while facilitating the fine-tuning of the training process. Although it demands a greater degree of software development, it offers advantageous high-level flexibility options. These include the ability to select the number of hidden layers, determine the loss function, and automate the supervision of the training process, commonly known as “babysitting”, through the application of beneficial callbacks such as a state-of-the-art learning rate finder, early stopping, and model checkpointing.

The backbone architecture of the deep learning model, the loss function and the remaining training components were the following:

Backbone Network: A tabular model with 5 layers of feature size: 256, 128, 128, 128, 64. The basic layer block (named LinBnDrop) is formed by the following transformations:

Linear Layer (torch.nn.Linear);
Rectified Linear Unit, ReLU (torch.nn.ReLU);
Batch Normalization, BatchNorm1d (torch.nn.BatchNorm1d).

The optimal number of layers and the feature sizes were determined by experimentation. A standard backbone for tabular data was initially employed in the investigation. For the dataset under consideration, it was observed that a shallow deep neural network (DNN) possessed an adequate capacity to extract relevant patterns. The introduction of additional layers resulted in overfitting the model, whereas a reduction in the number of layers led to underfitting. Similar considerations were applied to feature sizes, commencing with a standard size and subsequently adjusting them iteratively until the size yielding optimal performance on the dataset was identified.

Loss function: Focal loss flat [36]. The focal loss works especially well with imbalanced data as it adapts its weights to focus learning on hard misclassified examples. The primary rationale behind its utilization stems from the presence of imbalanced classes in the dataset. Additionally, an exploration of the flat cross entropy loss was conducted, revealing a comparable performance to the focal loss. However, it was observed that the flat cross entropy loss exhibited a slightly diminished robustness in addressing class imbalances when compared to the focal loss on our dataset.
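
To illustrate why the focal loss focuses learning on hard examples, the sketch below compares it with the cross entropy loss for a single sample, using the form FL(p_t) = -(1 - p_t)^γ log(p_t) from the focal loss paper [36]. The value γ = 2 and the example probabilities are assumptions chosen for illustration; the γ used in the study is not reported.

```python
import math

def cross_entropy(probs, target):
    """Cross entropy for one sample: -log of the true-class probability."""
    return -math.log(probs[target])

def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one sample: cross entropy scaled by (1 - p_t)^gamma,
    which shrinks the loss of well-classified samples (gamma is assumed)."""
    p_t = probs[target]
    return ((1.0 - p_t) ** gamma) * cross_entropy(probs, target)

# Invented predicted probabilities; the true class has index 0 in both cases.
easy_sample = [0.9, 0.1]   # confidently correct
hard_sample = [0.3, 0.7]   # misclassified

easy_ratio = focal_loss(easy_sample, 0) / cross_entropy(easy_sample, 0)
hard_ratio = focal_loss(hard_sample, 0) / cross_entropy(hard_sample, 0)
```

With these numbers the easy sample keeps about 1% of its cross entropy loss while the hard sample keeps about 49%, so the gradient signal is dominated by the hard, typically minority-class, examples.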

Optimization algorithm: The Adam optimization method [37] was employed in this study. Adam is characterized as an adaptive learning rate optimization algorithm, with its principal hyperparameters being the learning rate and momentum. The rationale behind incorporating momentum in conjunction with re-scaling lacks clear theoretical motivation. Nevertheless, Adam is widely acknowledged for its robustness across a range of hyperparameter choices.

Callbacks: Several callbacks were implemented to enhance training performance in this study. Early stopping was employed to halt training when the model ceased to exhibit improvement over the last epochs. The model checkpoint callback was also utilized to create copies of the model at specific intervals during training. This allowed for the subsequent selection of the best-performing model among the saved copies.
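
The interplay of early stopping and model checkpointing can be sketched by replaying a sequence of validation losses. This is not the FastAI callback implementation; the function name `replay_early_stopping`, the `patience` value and the loss values are hypothetical.

```python
def replay_early_stopping(valid_losses, patience=3, min_delta=0.0):
    """Replay recorded validation losses and return (best_epoch, last_epoch).

    best_epoch is where a checkpoint of the best model would be kept;
    last_epoch is where training would be halted."""
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(valid_losses):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch          # a checkpoint would be saved here
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch  # stop: no recent improvement
    return best_epoch, len(valid_losses) - 1

# Invented validation losses, one per epoch.
losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.63, 0.59]
best_epoch, stopped_at = replay_early_stopping(losses, patience=3)
```

In this toy run the training halts before the last recorded loss is reached, and the model restored for evaluation is the checkpoint from the best epoch rather than the final one.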

2.3.4. Feature Importance Algorithms for Feature Explanation

Feature importance algorithms were used to select the variables that have a higher impact on the outcome of the models. Feature permutation importance and SHAP summary plots are the two algorithms commonly accepted in the scientific community [38,39].

Feature permutation importance [39] is based on repeated permutations of the outcome vector to estimate the distribution of measured importance for each variable in a non-informative setting.
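
As a hedged illustration, the sketch below implements the widely used feature-column variant of permutation importance: one feature is shuffled at a time and the resulting drop in accuracy is averaged over repeats. This differs from the outcome-vector permutation described above, which additionally builds a null distribution of importances. The rule-based `toy_model` and the data are invented for the example.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the true label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean accuracy drop when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

def toy_model(row):
    """Hypothetical stand-in for the trained network: uses the angle only."""
    return 1 if row[0] >= 5.7 else 4

# Invented rows: [serve angle in degrees, speed in km/h].
rows = [[6.0, 180.0], [7.5, 200.0], [8.0, 175.0], [6.5, 190.0], [7.0, 185.0],
        [3.0, 195.0], [4.0, 182.0], [2.5, 205.0], [5.0, 178.0], [4.5, 188.0]]
labels = [1, 1, 1, 1, 1, 4, 4, 4, 4, 4]
importances = permutation_importance(toy_model, rows, labels)
```

Because the toy model ignores speed, shuffling that column never changes a prediction and its importance is exactly zero, whereas shuffling the angle degrades accuracy.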

SHAP assigns each feature an importance value for a particular prediction [40]. SHAP values attribute to each feature the change in the expected model prediction when conditioning on that feature.
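
For a small number of features, Shapley values can be computed exactly by enumerating feature subsets, which makes the attribution idea concrete. The sketch below is a simplification: features outside a subset are replaced by a fixed baseline value rather than conditioned on, and the SHAP library relies on faster approximations instead of full enumeration. The linear `toy_score` model, the instance and the baseline are invented.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, instance, baseline):
    """Exact Shapley values of each feature for one prediction."""
    n = len(instance)

    def value(subset):
        # Features in the subset take the instance value, others the baseline.
        x = [instance[i] if i in subset else baseline[i] for i in range(n)]
        return predict(x)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis

def toy_score(x):
    """Hypothetical additive score over [angle (deg), speed (km/h), dL (m)]."""
    angle, speed, dl = x
    return 0.10 * angle + 0.002 * speed - 0.5 * dl

instance = [7.0, 190.0, 0.2]   # serve being explained (invented)
baseline = [4.0, 180.0, 0.8]   # reference serve (invented)
phis = shapley_values(toy_score, instance, baseline)
total = toy_score(instance) - toy_score(baseline)
```

The attributions always sum to the difference between the prediction for the instance and the prediction for the baseline, which is the property that makes the mean absolute SHAP values comparable across features.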

2.3.5. Probabilistic and Statistical Analysis: Values Selection Based on Maximizing the Desired Effectiveness and Minimizing the Undesired One

Shapley values and their variants are adequate “local model-agnostic methods”, which provide interesting insights into a single prediction. Global explanation methods can help understand the model through visualizations and explanations of the final model weights [40]. As the aim of the study was to understand how to maximize the serve effectiveness (how and why specific values of the selected features affect the predicted target variables), the goal was to calculate which value thresholds of the selected features maximize the likelihood of effectiveness type 1 and minimize the likelihood of effectiveness type 4. For this purpose, a novel semiautomated algorithm based on classical statistics was developed. The first steps of this process were the following:

1. First, a smooth probability distribution of the data using the kernel density estimation algorithm (KDE), together with statistical estimators, was calculated for each selected feature. The estimators were mainly the mean, the MLE, and percentiles 5 and 95 for each type of effectiveness.
2. Then, the values were mostly identified by selecting the area of the plot which combined a higher density of the points with the desired result (type 1) and a lower density of the undesired result (type 4). The statistical estimators also helped to understand the probability distribution.
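
The two steps above can be sketched with a Gaussian KDE written from scratch. In the study the area was selected partly by visual inspection, so the automatic rule used here (keep grid points where the type 1 density exceeds both the type 4 density and a floor `min_density`) is only an assumed stand-in, and the bandwidth, floor and sample angles are invented.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function giving the Gaussian kernel density estimate at x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

def favourable_range(desired, undesired, grid, bandwidth, min_density):
    """Smallest and largest grid value where the desired outcome is both
    dense enough and denser than the undesired outcome (assumed rule).
    The selected points are not guaranteed to be contiguous."""
    dens_desired = gaussian_kde(desired, bandwidth)
    dens_undesired = gaussian_kde(undesired, bandwidth)
    keep = [x for x in grid
            if dens_desired(x) >= min_density and dens_desired(x) > dens_undesired(x)]
    return (min(keep), max(keep)) if keep else None

# Invented serve angles (degrees) for aces (type 1) and long points (type 4).
type1_angles = [6.0, 6.5, 7.0, 7.5, 8.0]
type4_angles = [2.0, 3.0, 3.5, 4.0, 4.5]
grid = [i / 10 for i in range(101)]   # 0.0 to 10.0 in steps of 0.1
low, high = favourable_range(type1_angles, type4_angles, grid,
                             bandwidth=0.5, min_density=0.1)
```

The bandwidth controls how smooth the density curves are, so the selected limits shift with it; this sensitivity is one reason a manual check of the plots remains useful.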

2.3.6. Synthetic Dataset Generation and Simulation Based on the Predictions on the Dataset and Analysis of Improvements in Effectiveness

In the final step, a synthetic dataset using the selected values was generated, and the prediction was simulated. Since the model reached 93% f1-score, the predictions were reliable, at least for this specific dataset. By doing this, the approach was validated, and values could be adjusted to improve the rate of effectiveness. The next steps in the process were the following:

3. The values outside the desired selected threshold were substituted by random values inside it and, therefore, a synthetic dataset was generated.
4. Then, predictions on the whole dataset were performed using the deep learning model previously trained. Finally, the rates of Ef1 and Ef4 were calculated.

It is important to comment that the synthetic dataset assumes a role in the validation of the explanatory algorithm. Following the application of “feature importance” algorithms to identify the most significant variables, an assessment is conducted to determine values that contribute to increased efficiency, relying on insights derived from probability density plots. The values selected are oriented towards maximizing the probability of efficiency 1 within the confines of the real dataset.

To validate this optimization, the original dataset undergoes modification, wherein the values of each variable are adjusted in accordance with the proposed recommendations. Specifically, the dataset is ‘clipped’, entailing the substitution of values outside the established range limits with randomly generated values falling within the recommended limits. Subsequently, the trained model is employed to generate new predictions on this adjusted dataset. This methodology serves to validate the potential impact on efficiency, providing insights into how the metric might evolve when utilizing the recommended values for the selected variables.
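
The substitution described above can be sketched as follows. The text only says that the replacement values are random, so drawing them uniformly within the recommended limits is an assumption, and `toy_model` is a hypothetical stand-in for the trained network.

```python
import random

def resample_outside_range(values, low, high, seed=0):
    """Keep values inside [low, high]; replace the others by random values
    drawn uniformly inside the range (uniform sampling is assumed)."""
    rng = random.Random(seed)
    return [v if low <= v <= high else rng.uniform(low, high) for v in values]

def type1_rate(model, values):
    """Share of serves that the model predicts as effectiveness type 1."""
    return sum(model(v) == 1 for v in values) / len(values)

def toy_model(angle):
    """Hypothetical stand-in for the trained classifier."""
    return 1 if 5.7 <= angle <= 8.7 else 4

# Invented serve angles in degrees; 5.7-8.7 is the range reported in the study.
angles = [2.1, 6.3, 9.4, 7.0, 4.8]
synthetic = resample_outside_range(angles, 5.7, 8.7)
rate_before = type1_rate(toy_model, angles)
rate_after = type1_rate(toy_model, synthetic)
```

Comparing the predicted type 1 rate before and after the substitution gives the kind of estimate reported for the recommended ranges, with the caveat that it measures what the model predicts rather than what would happen on court.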

Also, it should be noted that the most relevant part is the one described in step 2, where the threshold values for each variable were selected. Steps 3 and 4 also helped with the validation and fine-tuning of the selection but were mainly an evaluation of the final results to quantify the success of the process.

To efficiently perform multiple experimentation scenarios, all the processes were automated, except one aspect of the second step, in which the areas that maximize and minimize the target variable in the desired way were selected.

3. Results

3.1. Training of a Deep Neural Network

Table 1 displays the test set metrics for the different types of effectiveness. As can be observed, the model achieved high performance for types 1 and 4, with F1 scores of 0.95 and 0.94, respectively. The overall classification accuracy on the test set was 0.94. Therefore, this model, which serves as the basis for the other algorithms, has been shown to be reliable for defining the characteristics of the serve when points end in an ace and when they last four or more shots.

3.2. Calculation of the Most Relevant Variables

Once the reliability of the model in distinguishing between points that end with an ace and points that last for four or more strokes was established, the next step was to calculate the serve variables that had the greatest relevance for each type of effectiveness. Figure 3, Figure 4 and Figure 5 depict the mean SHAP values, which indicate the average impact on the model outputs for each variable on the deuce and advantage sides, respectively. As it can be observed, the results of the most important variables were very similar for both sides. Regardless of the serving side, the variables that had the most relevance in the model were δ and dL. Additionally, “Speed” was the other variable that had significant importance in the developed model.

The mean SHAP values show the average impact on the model outputs for each variable. The results of the most important variables were very similar for the deuce and advantage sides. After the highly correlated variables (δ and dL) were separated, the SHAP summary showed that δ and dL were the most important variables in both cases, followed by Speed.

3.3. Values That Maximize the Effectiveness of Type 1 (and Minimize the Effectiveness of Type 4): That Is, Recommended Values

Table 2 shows the distributions of the variables dL, δ, and Speed of the serves for points ending with an ace or with more than three shots for both sides of the court. Regarding dL, aces showed a similar distribution on both sides of the court. However, the serves for points with four or more shots showed significant differences in the distribution of their population. It can be observed that on the advantage side, a greater number of serves with higher values of dL were found.

Regarding δ, the distribution of serves where the point lasted more than three strokes was similar on both sides of the court. However, it can be observed that aces have a wider range on the deuce side compared to the advantage side which indicates that, on the deuce side, larger angles would maximize the probability of achieving an ace to a greater extent than on the advantage side.

Regarding Speed, again, the distribution of serves where the point lasted more than three strokes was similar on both sides of the court. However, the recommended values to maximize the probability of serving an ace were slightly lower on the deuce side than on the advantage side.

For the other types of effectiveness, no significant differences were found.

3.4. Generation of a Dataset Substituting Original Values for Recommended Values of the Variable in Question and Prediction of Efficiencies (the Model Trained in Point 1 Was Used to Make the Predictions)

Using the variables associated with the population density graphs, a statistical estimation was performed on the values that would, a priori, maximize the probability of serving an ace and minimize the probability of the point lasting four or more strokes. Table 3 shows the results of the predictions using the trained model described earlier and the synthetic dataset, which resulted from replacing values that fell outside the recommended range with random values within the recommended range. As can be observed, dL values between 0 and 0.28 m increase the probability of achieving an ace to values close to 90%. Similarly, δ values between 5.7° and 8.7° increase the probability of achieving an ace to values above 80%. On the other hand, Speed values between 187 and 220 km h⁻¹ only increase the probability of achieving an ace to approximately 40–45%.

4. Discussion

Using a novel methodological proposal, this study sought to identify which characteristics of serves determine their efficiency in men’s professional doubles tennis. The results showed that the characteristics of the serves in which the point ended in two, three or more shots were very similar. However, it was possible to identify different characteristics for the aces, compared to those in which the point lasted more than three shots. Therefore, these results allow us to establish the variables and values which are associated with a higher probability of serving an ace in men’s professional doubles tennis. In addition, the methodology used has been shown to be a novel, valid and reliable way that can be replicated for the calculation of the efficiency of actions in different sports disciplines.

The results have shown that the service angle is one of the variables that best determines the probability of serving an ace in men’s professional doubles tennis. These results are consistent with a previous study of the serve in singles tennis, which indicated that if the serve angle was less than 5.88°, the serve would be returned with a probability of 92.52% [11]. In doubles tennis, according to the data obtained in the present study, an angle of between 5.7° and 8.7° theoretically increases the probability of serving an ace from 29.77% to 88.21% on the advantage side and from 33.70% to 88.68% on the deuce side. These results support those obtained by previous studies, where Whiteside and Reid [11] reported that there is a higher probability of getting an ace when the serve is played wide than when it is played to the T zone, with probabilities of 42% on the advantage side and 52% on the deuce side. Wide serves generate a greater angle than those played to the T, so it is logical, according to the data obtained, that they are also more effective. Furthermore, from a technical point of view, and specifically related to the different types of serve spin used, it has been shown that players use the slice serve to generate a greater serve angle on the deuce side, whereas they opt for flat or topspin serves for a greater angle when serving to the T on the advantage side [6,19,41].

Another of the variables found to have a greater relevance in determining the probability of serving an ace has been the distance from the bounce of the ball to the sideline of the service box. This reinforces the idea that one of the main objectives of the server is to gain an advantage over the receiver, trying to get the ball as far away from the receiver as possible [6,14,15]. Whiteside and Reid [11] determined that in 77.53% of cases, if the distance to the line was less than 15.27 cm and, in addition, the serve angle was greater than or equal to 5.88°, the serve would be an ace. In the case of men’s professional doubles tennis, in this study, it has been shown that, if the serve distance is between 0 and 28 cm, the probability of a serve being an ace would increase from 30.64% to 81.26% on the advantage side and from 33.05% to 83.40% on the deuce side.
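
Because an increase in a probability can be read either in percentage points or in relative terms, the short sketch below works through both for the angle and distance figures reported above. Only the four pairs of probabilities come from the study; the helper `probability_change` is hypothetical.

```python
def probability_change(before_pct, after_pct):
    """Return (change in percentage points, relative change in %)."""
    absolute = after_pct - before_pct
    relative = absolute / before_pct * 100.0
    return absolute, relative

# Ace probabilities (%) before and after applying the recommended ranges.
reported = {
    ("angle", "advantage"): (29.77, 88.21),
    ("angle", "deuce"): (33.70, 88.68),
    ("distance", "advantage"): (30.64, 81.26),
    ("distance", "deuce"): (33.05, 83.40),
}
changes = {key: probability_change(*pair) for key, pair in reported.items()}
```

Read in percentage points, the recommended angle range adds roughly 55 to 58 points and the recommended distance range roughly 50 points; in relative terms every one of these probabilities more than doubles.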

The results on the importance of the angle of the serve and the distance of the bounce of the ball to the line also support what has been found in previous studies on the professional doubles game. According to Vives et al. [14], players try to take the initiative of the point by moving the receiver out of the court, both on the advantage and deuce sides. This study reported a high efficiency of the serve when players serve wide on both sides of the court, with values of 40.71% on the deuce side and 37.54% on the advantage side. Similarly, the percentages of the effectiveness of serves to the T were considerably high, with values of 38.45% and 39.28%, respectively. The information provided in this study regarding the most important variables and the values that maximize the probabilities of direct serves could be very valuable for coaches and players in order to further increase the effectiveness of this type of service.

In singles, the serve speed has been shown to be one of the most important variables when serving an ace. Previous studies have reported that the probability of serving an ace if the serve speed is greater than 198 km h⁻¹ is 58.68% [11]. However, the data obtained in our study show that the serve speed is less important in doubles tennis. Thus, it was found that once 187 km h⁻¹ is exceeded, the probability of serving an ace would only increase by about 10%, regardless of the side of the serve.

Therefore, considering the results from this study and the values found for the most important variables that determine the execution of serving an ace, it can be stated that, although speed is an important variable, very high speeds do not considerably increase the probability of winning the point directly with the serve. However, the angle of the serve and the bounce of the ball close to the sidelines do significantly increase the probability of serving an ace.

The primary advantages of this method lie in its capacity for variable explainability. Feature importance scores are computed not only to discern the most pivotal variables, but also to propose optimal value ranges for these variables to maximize efficiency. It is crucial to note, however, that this process is not readily automated and necessitates human intervention for analysis. A future direction is proposed, aiming to establish mathematical foundations to formulate the selection of value ranges for selected variables that maximize the probability of the desired variable. Furthermore, noteworthy correlations between variables, indicative of collinearity, were identified in the study. Although these correlations did not adversely impact model prediction performance, they significantly influenced feature importance algorithms. Moving forward, addressing such collinearity could be pivotal for more refined analyses.

On the other hand, the exclusivity of the dataset to Davis Cup hard court matches introduces a limitation to the generalizability of the findings, precluding straightforward extrapolations for all tennis surfaces, levels and genders. While augmenting the dataset with diverse surfaces, levels and genders could enhance the dataset variance, it concurrently poses challenges in deriving specific and universally applicable conclusions. Furthermore, the study underscores the prominence of a player’s playing style over physical attributes concerning the maximization of serve efficiency. For future endeavors, an exploration of player grouping based on similar characteristics, facilitated by clustering algorithms, could be pursued to train distinct models for each group.

These results have very relevant practical implications related to the training regime for serve practices in men’s professional doubles. From a tactical perspective, the insights found in this study can assist tennis players in their decision-making to maximize their serve advantage. In addition, technically, physically, and mentally, it is also important to prepare doubles tennis players for efficient serve executions with respect to the values of the indicated variables. Therefore, the effective application of the values found for variables such as the serve angle, distance to the line, or speed will determine the design of specific training tasks and help define specific patterns of play [11,14].

5. Conclusions

In this study, key characteristics of a direct serve (ace) in men’s professional doubles tennis were identified. The serve angle was found to be a crucial variable, and the distance from the ball bounce to the sideline was also relevant. These findings are consistent with previous research on doubles tennis. A serve angle between 5.7° and 8.7° raised the predicted probability of a direct serve from roughly 30–34% to over 88% (Table 3). Likewise, keeping the ball bounce between 0 and 28 cm from the sideline raised this probability to over 80%. Serve speed showed less influence in doubles than in singles tennis: speeds above 187 km h⁻¹ increased the probability by only about 10 percentage points.

The novel methodology employed here, which combines tracking data with explainable AI techniques, provides a valid and reliable way to model service efficiency in tennis. The major contribution of this work is the extraction of specific parameter values to optimize the doubles serve through data-driven analytics. From a practical perspective, the serve angle and placement recommendations offer directly implementable guidelines for players and coaches. Technical, physical and mental preparation targeting these serve metrics should become an integral part of training programs and match tactics.

This research focused exclusively on elite men’s hard court doubles matches, so future investigations should expand the dataset to include different surfaces, levels and genders. Furthermore, grouping players by style and developing individualized models could provide more tailored strategic recommendations. The analysis framework presented can be extended to other facets of tennis performance or to other sports that rely on spatiotemporal tracking data. As machine learning approaches, wearable sensors and tracking tools continue to proliferate in sports, there will be ample opportunities to turn such data into actionable guidance for improving performance.

Author Contributions

Conceptualization, F.V., J.F.G., M.C. and R.M.-G.; methodology, F.V., J.F.G., M.C. and R.M.-G.; formal analysis, J.L., F.V. and R.M.-G.; investigation, F.V., J.F.G., M.C. and R.M.-G.; data curation, J.L. and F.V.; writing—original draft preparation, F.V., J.F.G., M.C. and R.M.-G.; writing—review and editing, F.V., J.F.G., M.C. and R.M.-G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Restrictions apply to the availability of these data. The data were obtained from the International Tennis Federation and are available from the authors with the permission of the International Tennis Federation.

Acknowledgments

The authors wish to acknowledge the contributions of the International Tennis Federation Participation and Education Department.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Martínez-Gallego, R.; Crespo, M.; Ramón-Llin, J.; Micó, S.; Guzmán, J.F. Men’s doubles professional tennis on hard courts: Game structure and point ending characteristics. J. Hum. Sport Exerc. 2020, 15, 633–642.
2. Kostoglou, G.T.; Tsitas, K.; Kostoglou, V.; Kostoglou, P. The importance of the serve in tennis: A systematic review. Int. J. Perform. Anal. Sport 2017, 17, 505–518.
3. Filipcic, A.; Zecic, M.; Reid, M.; Crespo, M.; Panjan, A.; Nejc, S. Differences in performance indicators of elite tennis players in the period 1991–2010. J. Phys. Educ. Sport 2015, 15, 671–677.
4. O’Donoghue, G.P.; Brown, E. The importance of service in Grand Slam singles tennis. Int. J. Perform. Anal. Sport 2008, 8, 70–78.
5. Reid, M.; Whiteside, D.; Elliott, B. Serving to different locations: Set-up, toss, and racket kinematics of the professional tennis serve. Sports Biomech. 2011, 10, 407–414.
6. Mecheri, S.; Rioult, F.; Mantel, B.; Kauffmann, F.; Benguigui, N. The serve impact in tennis: First large-scale study of big Hawk-Eye data. Statistical Anal. Data Min. ASA Data Sci. J. 2016, 9, 310–325.
7. Kocib, T.; Carboch, J.; Cabela, M.; Kresta, J. Tactics in tennis doubles: Analysis of the formations used by the serving and receiving teams. Int. J. Phys. Educ. Fit. Sport 2020, 9, 45–50.
8. Martínez-Gallego, R.; Salvador, S.M.; Luján, J.F.G.; Reid, M.; Ramón-Llin, J.; Crespo, M. Challenging serve myths in doubles tennis. Int. J. Sports Sci. Coach. 2021, 16, 1305–1311.
9. Carboch, J.; Kočíb, T. A comparison of service efficiency between players of male and female doubles at professional tennis tournaments. Auc Kinanthropol. 2016, 51, 56–62.
10. Martínez-Gallego, R.; Vives, F.; Guzmán, J.F.; Ramón-Llin, J.; Crespo, M. Time structure in men’s professional doubles tennis: Does team experience allow finishing the points faster? Int. J. Perform. Anal. Sport 2021, 21, 215–225.
11. Whiteside, D.; Reid, M. Spatial characteristics of professional tennis serves with implications for serving aces: A machine learning approach. J. Sports Sci. 2017, 35, 648–654.
12. Giampolo, F.; Levey, J. Serves. In Championship Tennis; Human Kinetics: Champaign, IL, USA, 2013; p. 59.
13. Wei, X.; Lucey, P.; Morgan, S.; Carr, P.; Reid, M.; Sridharan, S. Predicting serves in tennis using style priors. In Proceedings of the KDD ’15: The 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, 10–13 August 2015; pp. 2207–2215.
14. Vives, F.; Crespo, M.; Guzmán, J.F.; Martínez-Gallego, R. Effective serving strategies in men’s doubles Davis cup matches: An analysis using tracking technology. Int. J. Perform. Anal. Sport 2022, 22, 638–648.
15. Martínez-Gallego, R.; Crespo, M.; Jiménez, J. Analysis of the differences in serve effectiveness between Billie Jean King Cup (former Fed Cup) and Davis Cup doubles tennis matches. Int. J. Sports Sci. Coach. 2021, 16, 777–783.
16. Mahesh, B. Machine learning algorithms—A review. Int. J. Sci. Res. 2020, 9, 381–386.
17. Mlakar, M.; Luštrek, M. Analyzing tennis game through sensor data with machine learning and multi-objective optimization. In Proceedings of the UbiComp ’17: The 2017 ACM International Joint Conference on Pervasive and Ubiquitous Computing, Maui, HI, USA, 11–15 September 2017; pp. 153–156.
18. Giles, B.; Kovalchik, S.; Reid, M. A machine learning approach for automatic detection and classification of changes of direction from player tracking data in professional tennis. J. Sports Sci. 2020, 38, 106–113.
19. Giles, B.; Peeling, P.; Kovalchik, S.; Reid, M. Differentiating movement styles in professional tennis: A machine learning and hierarchical clustering approach. Eur. J. Sport Sci. 2021, 23, 44–53.
20. Bayram, F.; Garbarino, D.; Barla, A. Predicting Tennis Match Outcomes with Network Analysis and Machine Learning. In SOFSEM 2021: Theory and Practice of Computer Science; Bureš, T., Ed.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2021; Volume 12607, pp. 525–536.
21. Perri, T.; Reid, M.; Murphy, A.; Howle, K.; Duffield, R. Prototype Machine Learning Algorithms from Wearable Technology to Detect Tennis Stroke and Movement Actions. Sensors 2022, 22, 8868.
22. Kovalchik, S.; Reid, M. A shot taxonomy in the era of tracking data in professional tennis. J. Sports Sci. 2018, 36, 2096–2104.
23. Sipko, M.; Knottenbelt, W. Machine Learning for the Prediction of Professional Tennis Matches; MEng Computing-Final Year Project; Imperial College London: London, UK, 2015; Volume 2.
24. Kovalchik, S.A. Searching for the GOAT of tennis win prediction. J. Quant. Anal. Sports 2016, 12, 127–138.
25. Yang, M.; Qu, Q.; Shen, Y.; Zhao, Z.; Chen, X.; Li, C. An effective hybrid learning model for real-time event summarization. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 4419–4431.
26. Cornman, A.; Spellman, G.; Wright, D. Machine Learning for Professional Tennis Match Prediction and Betting; Working Paper; Stanford University: Stanford, CA, USA, 2017.
27. Sekar, A. Predicting the Winner of a Tennis Match Using Machine Learning Techniques. Ph.D. Thesis, College of Ireland, Dublin, Ireland, 2019.
28. Candila, V.; Palazzo, L. Neural networks and betting strategies for tennis. Risks 2020, 8, 68.
29. Van Rooij, C. Machine Learning in Tennis: Predicting the Outcome of a Tennis Match Based on Match Statistics and Player Characteristics. Master’s Thesis, School of Humanities and Digital Sciences of Tilburg University, Tilburg, The Netherlands, 2021.
30. Wilkens, S. Sports prediction and betting models in the machine learning age: The case of tennis. J. Sports Anal. 2021, 7, 99–117.
31. De Seranno, A. Predicting Tennis Matches Using Machine Learning. Master’s Thesis, Ghent University, Ghent, Belgium, 2020.
32. Yue, J.C.; Chou, E.P.; Hsieh, M.H.; Hsiao, L.C. A study of forecasting tennis matches via the Glicko model. PLoS ONE 2022, 17, e0266838.
33. Solanki, S.; Jakir, V.; Jatav, A.; Sharma, D. Prediction of tennis match using machine learning. Int. J. Progress. Res. Eng. Manag. Sci. 2022, 2, 5–7.
34. Gupta, M.; Gupta, R.; Ansari, S.; Tambe, D.N. Tennis Match Prediction using Machine Learning. Int. J. Res. Publ. Rev. 2022, 3, 693–696.
35. Hawk-Eye Innovations. Hawk-Eye’s Accuracy & Reliability: Electronic Line Calling; Hawk-Eye Innovations: Basingstoke, UK, 2015.
36. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
37. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014.
38. Altmann, A.; Toloşi, L.; Sander, O.; Lengauer, T. Permutation importance: A corrected feature importance measure. Bioinformatics 2010, 26, 1340–1347.
39. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4765–4774.
40. Molnar, C. Model-Agnostic Methods. In Interpretable Machine Learning; Leanpub: Victoria, BC, Canada, 2020; pp. 143–221.
41. Matsuzaki, C. Tennis Fundamentals; Human Kinetics: Champaign, IL, USA, 2004; p. 91.

Figure 1. Pipeline workflow architecture.

Figure 2. Angle of the serve. δ: Serve angle; A: Position of the server when hitting the serve; B: ball direction; C: central point of the service box line.
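The geometry in this caption can be illustrated with a short sketch. It assumes, since the figure itself is not reproduced here, that δ is the angle at A between the reference line A→C and the ball direction A→B, and that all three points are given as 2D court coordinates in metres. The function name and the sample coordinates are hypothetical.

```python
import math

def serve_angle_deg(a, b, c):
    """Unsigned angle at A between the line A->C and the ball direction A->B, in degrees.

    Assumption: a, b and c are (x, y) court coordinates in metres.
    """
    acx, acy = c[0] - a[0], c[1] - a[1]   # reference vector A->C
    abx, aby = b[0] - a[0], b[1] - a[1]   # ball direction vector A->B
    cross = acx * aby - acy * abx         # proportional to sin(delta)
    dot = acx * abx + acy * aby           # proportional to cos(delta)
    return abs(math.degrees(math.atan2(cross, dot)))

# Hypothetical coordinates: the ball ends 1 m to the side of C, 10 m down the court.
delta = serve_angle_deg(a=(0.0, 0.0), b=(1.0, 10.0), c=(0.0, 10.0))
print(f"serve angle = {delta:.2f} deg")
```

Using `atan2` on the cross and dot products avoids the loss of precision that `acos` suffers for the small angles typical of serves.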

Figure 3. Effectiveness of type 1 vs. type 4 on the DEUCE side.

Figure 4. Effectiveness of type 1 vs. type 4 on the ADVANTAGE side.

Figure 5. Effectiveness of type 1 vs. type 4 on BOTH sides.

Table 1. Sklearn classification report type 1 vs. type 4—deuce and advantage.

Effectiveness	Precision	Recall	F1-Score	Macro Average	Weighted Average
Type 1	0.91	0.99	0.95	0.95	0.94
Type 4	0.99	0.89	0.94	0.95	0.94
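The F1-scores in Table 1 are consistent with the reported precision and recall, since F1 is their harmonic mean. The following sketch reproduces that check. The function name is illustrative, and the two-decimal rounding is assumed to match the table.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in Table 1.
reported = {"Type 1": (0.91, 0.99), "Type 4": (0.99, 0.89)}
f1_by_type = {label: round(f1_score(p, r), 2) for label, (p, r) in reported.items()}
print(f1_by_type)
```

Because the table gives values rounded to two decimals, averages recomputed from them may differ from the reported macro and weighted averages in the last digit.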

Table 2. Descriptive values of the variables.

Side	Effectiveness	Population (%)	dL MaxLike (m)	dL Percentiles (m)	dL Mean (m)	dL SD (m)	Serve Angle MaxLike (deg)	Serve Angle Percentiles (deg)	Serve Angle Mean (deg)	Serve Angle SD (deg)	Speed MaxLike (km h⁻¹)	Speed Percentiles (km h⁻¹)	Speed Mean (km h⁻¹)	Speed SD (km h⁻¹)
DEUCE SIDE	Type 1	9.68	0.19	0.04–1.41	0.41	0.41	5.78	2.04–7.78	5.76	1.52	179.86	159.01–210.86	185.72	16.85
DEUCE SIDE	Type 2	25.15	0.63	0.21–1.93	0.95	0.54	4.92	0.87–6.31	3.71	1.76	186.24	160.89–205.72	184.20	14.15
DEUCE SIDE	Type 3	19.58	0.59	0.22–1.97	1.02	0.57	4.37	0.76–6.19	3.44	1.77	184.82	159.01–204.54	183.13	14.31
DEUCE SIDE	Type 4	35.54	0.77	0.28–1.93	1.05	0.52	4.67	0.86–6.13	3.45	1.70	180.57	156.56–202.56	180.39	14.32
ADVANTAGE SIDE	Type 1	9.17	0.18	0.03–1.4	0.40	0.42	5.85	2.11–7.51	5.62	1.43	193.91	162.58–211.39	189.23	16.44
ADVANTAGE SIDE	Type 2	24.34	0.52	0.18–1.95	0.99	0.57	5.06	0.84–6.32	3.66	1.78	189.45	155.56–206.65	185.25	14.67
ADVANTAGE SIDE	Type 3	19.03	0.49	0.19–1.97	1.00	0.59	4.79	0.77–6.42	3.69	1.85	186.48	155.56–204.71	183.03	15.26
ADVANTAGE SIDE	Type 4	37.33	0.78	0.27–1.95	1.07	0.54	4.56	0.86–6.04	3.46	1.67	187.22	153.23–203.38	181.26	15.37
BOTH SIDES	Type 1	9.46	0.18	0.03–1.41	0.40	0.41	5.79	2.08–7.69	5.70	1.47	193.91	160.65–211.20	187.32	16.75
BOTH SIDES	Type 2	24.78	0.61	0.19–1.94	0.97	0.58	5.05	0.84–6.32	3.69	1.77	187.22	160.26–206.36	184.68	14.40
BOTH SIDES	Type 3	19.34	0.52	0.2–1.97	1.01	0.58	4.59	0.77–6.25	3.56	1.81	185.73	157.61–204.66	183.08	14.76
BOTH SIDES	Type 4	36.38	0.76	0.28–1.95	1.06	0.53	4.63	0.86–6.08	3.46	1.69	187.22	155.33–202.94	180.81	14.76

Table 3. Results of the predictions using the trained model.

Variable	Side	Values (Min–Max)	Effectiveness before (%)	Effectiveness after (%)
dL (m)	DEUCE	0–0.28	33.05	83.40
dL (m)	AD	0–0.28	30.64	81.26
dL (m)	BOTH	0–0.28	33.49	88.48
Serve angle (deg)	DEUCE	5.7–8.7	33.70	88.68
Serve angle (deg)	AD	5.7–8.7	29.77	88.21
Serve angle (deg)	BOTH	5.7–8.7	33.25	89.04
Speed (km h⁻¹)	DEUCE	187–220	35.17	45.75
Speed (km h⁻¹)	AD	187–220	34.36	44.13
Speed (km h⁻¹)	BOTH	187–220	30.38	39.91
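The "before" and "after" columns of Table 3 come from replacing one variable with recommended values and re-predicting with the trained model. The sketch below shows that substitution step under stated assumptions and is not the authors' implementation. The rule in `toy_predict` is a hypothetical stand-in for the trained network, the sample rows are invented, and drawing replacement values uniformly from the recommended range is an assumption about how the synthetic values were generated.

```python
import random

def substitute_and_score(rows, variable, low, high, predict, seed=0):
    """Return the share (%) of rows predicted as Type 1 before and after
    replacing `variable` with values drawn uniformly from [low, high]."""
    rng = random.Random(seed)
    before = 100 * sum(predict(r) for r in rows) / len(rows)
    replaced = [{**r, variable: rng.uniform(low, high)} for r in rows]
    after = 100 * sum(predict(r) for r in replaced) / len(rows)
    return before, after

def toy_predict(row):
    """Hypothetical stand-in for the trained model: True means 'Type 1'."""
    return row["angle_deg"] >= 5.7 or row["dL_m"] <= 0.28

# Invented serves, for illustration only.
rows = [
    {"angle_deg": 3.4, "dL_m": 1.05, "speed_kmh": 180.0},
    {"angle_deg": 6.0, "dL_m": 0.40, "speed_kmh": 185.0},
    {"angle_deg": 4.6, "dL_m": 0.20, "speed_kmh": 190.0},
    {"angle_deg": 3.7, "dL_m": 0.95, "speed_kmh": 184.0},
]

before, after = substitute_and_score(rows, "angle_deg", 5.7, 8.7, toy_predict)
print(f"Type 1 share: {before:.1f}% before, {after:.1f}% after")
```

Only the substituted variable changes while the other columns keep their original values, so the difference between the two shares isolates the effect of that one variable on the model's predictions.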

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

