The analysis of primary care data plays an important role in understanding health at both the individual and the population level. Currently, the utilization of computerized medical records is low due to the complexity, heterogeneity and uncertain veracity of these data. We present a deep learning methodology that clusters 11,000 records in an unsupervised manner, identifying non-linear patterns in the data. This provides a useful tool for visualization and for identifying the features that drive the formation of clusters. Further analysis reveals the features that differentiate sub-groups, which can aid clinical decision making. Our results uncover the subsets that contain the highest proportion of missing data, specifically for Episode type, as well as the sources that provide the most complete data.
Keywords: Episode type; computerized medical record; deep learning; general practice; visualization.