Data Curation Core


The overall goal of the Data Curation Core (DCC) is to establish and maintain scalable cyberinfrastructure capabilities that promote interdisciplinary collaboration, interoperability, and sustainability. The cyberinfrastructure supports the computational and biostatistical tools, models, and analytics that the Data Analytics Core (DAC) uses to identify the complex etiologic and mediating/moderating relationships, and the spatial/temporal mechanisms and pathways, that link genetics, omics, behavior, environmental exposures, and social determinants.



Aim 1. To establish a data governance structure that tracks the acquisition, curation, storage, quality, version control, compliance, access, and usage of data.

Aim 2. To curate and manage the data cyberinfrastructure.

Aim 3. To promote use of the data cyberinfrastructure resources among internal and external investigators.

Data Storage. The linked data sets will be stored in a data lake within a customized, secure cloud environment maintained by the MMC Data Science Center. This will provide a HIPAA-compliant workspace where investigators can share and analyze data securely. Access can be controlled down to the level of individual variables, and a complete audit trail is available. The MMC Data Science Center will make unlimited storage available to investigators without charge (see Attachment 4: letter of support from President Hildreth). In addition, ESRI ArcView, SAS, and other statistical software packages will be run within the secure cloud data lake environment.
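As an illustration only, variable-level access control with an audit trail can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the MMC Data Science Center's actual implementation: the class names, user name, and variable names below are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditEntry:
    """One line of the audit trail: who asked for which variable, and the outcome."""
    timestamp: str
    user: str
    variable: str
    granted: bool


@dataclass
class VariableAccessPolicy:
    """Hypothetical policy object: per-user grants at the variable (column) level."""
    grants: dict = field(default_factory=dict)     # user -> set of variable names
    audit_log: list = field(default_factory=list)  # list of AuditEntry

    def grant(self, user, variables):
        """Allow `user` to read the named variables."""
        self.grants.setdefault(user, set()).update(variables)

    def read(self, user, record, variables):
        """Return only the permitted variables and log every attempt."""
        allowed = self.grants.get(user, set())
        result = {}
        for name in variables:
            granted = name in allowed
            self.audit_log.append(AuditEntry(
                timestamp=datetime.now(timezone.utc).isoformat(),
                user=user,
                variable=name,
                granted=granted,
            ))
            if granted:
                result[name] = record.get(name)
        return result


# Hypothetical usage: the analyst may see exposure variables but not identifiers.
policy = VariableAccessPolicy()
policy.grant("analyst_a", {"pm25", "heat_index"})
record = {"pm25": 9.4, "heat_index": 31.0, "medicaid_id": "hypothetical-id"}
visible = policy.read("analyst_a", record, ["pm25", "medicaid_id"])
```

In this sketch the denied request for the identifier is still written to the audit log, so the trail records refused attempts as well as successful reads.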

Quality Assurance. The collection, evaluation, and use of environmental data for these research projects will conform to ANSI/ASQC E4-1994, Specifications and Guidelines for Quality Systems for Environmental Data Collection and Environmental Technology Programs [110]. This standard provides a basis for planning, implementing, documenting, and assessing an effective quality system. It applies to all measurements and information in the PHE database that describe environmental processes, locations, or conditions; ecological or health effects and consequences; and the performance of environmental technology. The following individuals will be responsible for the quality assurance and quality control of the identified databases. Each will document the activities and oversight processes used to ensure the appropriate type, amount, and quality of data: PM2.5 (Mohammad Al-Hamdan, PhD); Heat Metrics (Bill Crosson, PhD); Public Health Exposome (Wansoo Im, PhD); SCCS surveys and linkage with Medicaid, Medicare, and Death Index files (Mike Mumma, PhD); and linkage of databases for computational analysis (Gary Rogers, PhD).



Paul D. Juarez, PhD
Director, Data Curation Core
Department of Family & Community Medicine, Meharry Medical College
Phone: 615 327-6992

Wansoo Im, PhD
Data Science Officer for the Center, Meharry Medical College

Chris Sohn, BS
Data Curator, Meharry Medical College


Data Curation Committee

Paul Juarez, PhD, co-director
Wansoo Im, PhD, co-director
Patricia Cifuentes, PhD
Darryl Hood, PhD
Michael Langston, PhD
Cynthia Colen, PhD
Mohammad Al-Hamdan, PhD