Rheumatology Research Center, Tehran University of Medical Sciences, Tehran, Iran & Clinical Research Development Unit, Shariati Hospital, Tehran University of Medical Sciences, Tehran, Iran
Abstract:
Missing data are among the most common, and often unavoidable, challenges in data science and clinical research. They can compromise the accuracy, internal validity, and interpretation of research findings. A thorough understanding of the dataset allows health data analysts to adopt strategies that prevent or minimize missing data during the design and conduct phases of a study. Nevertheless, because of the inherent nature of clinical research, some incomplete data remain unavoidable, which calls for practical and robust approaches to managing them. This article reviews the primary methods for addressing missing data and describes the mechanisms and patterns of missingness, as well as the proportion of missing data that may be considered ignorable. Finally, using an example based on a hypothetical rheumatoid arthritis dataset, it introduces one of the most widely used approaches to imputing missing data, multiple imputation by chained equations (MICE). The corresponding code is implemented and interpreted using the mice package in R. Provided that the relevant assumptions are met, researchers with varying levels of expertise in biostatistics and R can apply the code included in this article to impute missing values in their own research datasets.
Madreseh E, Hosseingholizadeh N, Akhlaghi M, Alikhani M, Sadeghi S. Handling Missing Data in Clinical and Medical Research: Concepts, Challenges, and Implementation in R Software. Journal title 2025; 1(3): 7. URL: http://idap.ir/article-1-52-en.html