Abstract
Dual-energy X-ray absorptiometry (DXA) is commonly used to assess body composition and bone mineral density in sports medicine, orthopedic, wellness, and research settings. Each scan produces an individualized report that can be uploaded to a patient chart or entered into a dataset to analyze trends or differences between population groups, or within groups over time, in response to monitoring or interventions (nutritional, pharmacologic, therapeutic, activity). However, for large sample sizes, manual data entry from individual reports can be cumbersome and subject to human error. Exporting raw data directly from the DXA machine also presents challenges.
PURPOSE: To develop a software algorithm, implemented in Python, that uses optical character recognition to rapidly extract key DXA parameters from standardized enCORE-generated reports specific to the iDXA (GE®) system. The algorithm organizes and prepares datasets of any desired sample size for statistical analysis and reporting.
METHODS: The algorithm was developed in Python and deployed through a Streamlit-based interface, using pdfplumber for text extraction and regular expressions to structure key DXA parameters. These included patient demographics, total body composition metrics (lean mass, fat mass, and bone mineral density), and regional analysis values for the arms, legs, trunk, android, and gynoid regions. The algorithm was initially tested on 10 DXA reports to assess extraction accuracy. After validation, testing expanded to 50 reports to refine efficiency and reduce errors. The system was then scaled to process 100 reports to confirm robustness for larger datasets. Optimization efforts focused on handling formatting inconsistencies, improving text recognition, and automating data categorization. Extracted values were written to an Excel sheet with labeled sections for seamless integration into statistical workflows.
RESULTS: The algorithm extracted DXA parameters with an accuracy exceeding 99%, significantly reducing manual entry errors. Processing time per report fell from an average of 3 minutes for manual input to approximately 15 seconds with the automated system. The system handled large datasets efficiently without accuracy degradation. The automated extraction and formatting also streamlined data organization, allowing immediate integration into statistical analysis workflows.
CONCLUSION: The Python-based software developed here allows body composition and bone densitometry reports produced by the iDXA system to be rapidly extracted and organized, reducing manual input error and accelerating analyses of large samples. Efforts are ongoing to integrate the algorithm into a web-based platform that will allow DXA report uploads and dataset organization to be further customized.
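The extraction step described in METHODS (pdfplumber text extraction followed by regular-expression parsing) can be illustrated with a minimal sketch. This is not the authors' code: the field names, the label wording in the patterns, and the sample text are all hypothetical, since the abstract does not show the enCORE report layout. Only pdfplumber.open and Page.extract_text are real library calls.

```python
import re

# Hypothetical label patterns; the wording on a real enCORE report may differ.
FIELD_PATTERNS = {
    "total_fat_mass_g": r"Total\s+Fat\s+Mass\s*\(g\)\s*[:=]?\s*([\d,]+(?:\.\d+)?)",
    "total_lean_mass_g": r"Total\s+Lean\s+Mass\s*\(g\)\s*[:=]?\s*([\d,]+(?:\.\d+)?)",
    "total_bmd_g_cm2": r"Total\s+BMD\s*\(g/cm2\)\s*[:=]?\s*([\d,]+(?:\.\d+)?)",
}


def read_pdf_text(path):
    """Return the text of every page of a PDF report, joined by newlines."""
    import pdfplumber  # third-party; imported here so parsing works without it

    with pdfplumber.open(path) as pdf:
        # extract_text() can return None for a page with no text layer.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def parse_report(text):
    """Map each hypothetical field name to a float, or None if not found."""
    row = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        # Strip thousands separators before converting to a number.
        row[name] = float(match.group(1).replace(",", "")) if match else None
    return row


# Made-up sample text standing in for one extracted report.
SAMPLE_TEXT = """\
Total Fat Mass (g): 21,480
Total Lean Mass (g): 52,913
Total BMD (g/cm2): 1.234
"""

if __name__ == "__main__":
    print(parse_report(SAMPLE_TEXT))
```

In a batch workflow, one such row per report could be collected into a list and written to a spreadsheet, for example with pandas DataFrame.to_excel. Returning None for a field that is not found makes extraction failures visible in the output for later review.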
Recommended Citation
Rivera, Ana S.; Ankersen, Jordan; and Lambert, Bradley (2025) "Extraction and Organization of Densitometry Data from Imaging Reports Produced by the GE iDXA System Using Optical Character Recognition," International Journal of Exercise Science: Conference Proceedings: Vol. 2: Iss. 17, Article 108.
Available at: https://digitalcommons.wku.edu/ijesab/vol2/iss17/108
Included in
Health and Physical Education Commons, Medical Education Commons, Sports Sciences Commons