
Abstract

INTRODUCTION: Injuries are a significant concern in athletic populations: they limit performance, shorten careers, and negatively affect health. There is growing evidence that intrinsic factors such as body composition and biomechanical characteristics are associated with injury risk, with metrics like fat and muscle distribution linked to injury likelihood across multiple sports. Moreover, recent research has demonstrated that predictive models incorporating physiological and biomechanical variables can effectively estimate injury risk, supporting the feasibility of data-driven injury risk prediction in sport science. PURPOSE: The purpose of this study was to assess whether body composition and biomechanical measurements can be used within a machine learning framework to predict short-term injury risk and provide individualized, body-region–specific injury risk estimates for athletes. METHODS: A retrospective dataset of 1,258 records from NCAA Division I athletes (males = 825; females = 433; height = 178.7 ± 10.37 cm; weight = 86.9 ± 23.32 kg) was analyzed, of which 246 involved an injury occurring within 180 days of assessment. Scans were collected between August 8, 2022, and December 9, 2025. Body composition variables from dual-energy X-ray absorptiometry (DXA) and biomechanical variables (DARI® Motion Analysis System; Y Balance Test, YBT) were used as model inputs, and a two-stage machine learning framework was implemented: a binary classifier to predict overall injury risk within 180 days, followed by a multiclass classifier to estimate body-region–specific injury risk among injured athletes using CatBoost (Categorical Boosting). Model performance was evaluated using the area under the receiver operating characteristic curve (AUC) for injury prediction and class-based performance metrics for body region estimation, with a hold-out test set reserved for final evaluation.
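The AUC named in the methods can be read as the probability that a randomly chosen injured case receives a higher predicted risk than a randomly chosen non-injured case. The sketch below computes it that way in plain Python. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations, not the study's data or code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (injured, non-injured) pairs ranked correctly.

    labels: 1 = injured within the follow-up window, 0 = not injured.
    scores: predicted injury risk for each record. Ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): two injured, three non-injured.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of records, which is fine for illustration; a rank-based implementation would be preferable on larger datasets.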
RESULTS: The injury risk prediction model achieved an AUC of 0.74, indicating good discrimination between injured and non-injured athletes. For injured athletes, the body-region prediction model achieved a top-1 accuracy of 50.0%, with performance improving to 62.5% and 77.1% when the true injury location was required to be within the top two and top three predicted regions, respectively. CONCLUSION: Machine learning models utilizing body composition and biomechanical data can reliably estimate short-term injury risk in athletes and provide body-region–specific risk profiles. Although precise prediction of a single injury location remains challenging, particularly in the presence of class imbalance and limited sample sizes, ranking-based body region risk estimates show substantial promise for supporting injury prevention strategies and individualized athlete monitoring.
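The top-1, top-2, and top-3 figures reported above are top-k accuracies: a prediction counts as correct when the true injury region appears among the k regions with the highest predicted probability. A minimal sketch of that calculation follows. The function name `top_k_accuracy`, the region labels, and the probabilities are hypothetical and chosen only to illustrate the metric; they are not the study's data.

```python
def top_k_accuracy(true_regions, predicted_probs, k):
    """Share of cases whose true region is among the k highest-ranked regions.

    true_regions: list of true injury regions, one per injured athlete.
    predicted_probs: list of dicts mapping region name -> predicted probability.
    """
    hits = 0
    for truth, probs in zip(true_regions, predicted_probs):
        # Rank regions from most to least likely for this athlete.
        ranked = sorted(probs, key=probs.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_regions)


# Hypothetical toy data (not from the study): four injured athletes, three regions.
true_regions = ["knee", "ankle", "shoulder", "knee"]
predicted_probs = [
    {"knee": 0.5, "ankle": 0.3, "shoulder": 0.2},  # true region ranked 1st
    {"knee": 0.5, "ankle": 0.3, "shoulder": 0.2},  # true region ranked 2nd
    {"knee": 0.5, "ankle": 0.3, "shoulder": 0.2},  # true region ranked 3rd
    {"ankle": 0.6, "shoulder": 0.3, "knee": 0.1},  # true region ranked 3rd
]

top1 = top_k_accuracy(true_regions, predicted_probs, k=1)
top2 = top_k_accuracy(true_regions, predicted_probs, k=2)
top3 = top_k_accuracy(true_regions, predicted_probs, k=3)
print(top1, top2, top3)
```

Because each larger k includes every hit from the smaller one, top-k accuracy can only stay the same or rise as k grows, which is why the ranking-based view is more forgiving than requiring a single exact location.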
