R and Data Mining: Finding & Storing Data

These resources have been compiled to address the needs of students enrolled in HINF 5008: Computational Methods in Health Informatics.

More Places to Find and Store Data

General Data Sources

  • Re3Data Most comprehensive list to date of research data repositories with downloadable datasets. Browse by subject, content type, and country
  • DataBib List of research data repositories with downloadable datasets organized by subject specialty. The list can be ordered A-Z as well.
  • Data.gov As a priority Open Government Initiative, Data.gov aims to increase the ability of the public to easily find, download, and use datasets that are generated and held by the Federal Government. 
  • Open Data Sites via Data.gov Complete list of open data sites organized geographically. The full list can be downloaded as a CSV or Excel file.
  • US Census Data Access Tools Data downloads, tools, and resources provided by the US Census Bureau
  • DataMarket An open portal for downloading, uploading, sharing and exploring datasets.
  • The World Bank Data Catalog Listing of available World Bank datasets, including databases, pre-formatted tables, reports, and other resources.
  • Dryad International repository of data underlying scientific and medical publications.
  • FigShare Research repository where users can make all of their research outputs available in a citable, shareable, and discoverable manner.
  • Mendeley Data Multidisciplinary data repository

 

Healthcare Specific Data Sources

  • HealthData.gov Datasets and statistics. Site managed by the U.S. Department of Health & Human Services
  • European Bioinformatics Institute EMBL-EBI provides freely available data from life science experiments, performs basic research in computational biology, and offers an extensive user training programme, supporting researchers in academia and industry.
  • Public Access Health Data compiled by the Emory University Health Sciences Library
  • Sanger Institute As one of the largest sequencing centres in the world for more than 15 years, the Wellcome Trust Sanger Institute has produced more than 100 finished genomes. All can be accessed from the link here.
  • UCSC Genome Browser This page contains links to sequence and annotation data downloads for the genome assemblies featured in the UCSC Genome Browser. Table downloads are also available via the Genome Browser FTP server.
  • NCBI National Center for Biotechnology Information dataset downloads and software
  • NHDS National Hospital Discharge Survey conducted annually from 1965-2010. National probability survey designed to meet the need for information on characteristics of inpatients discharged from non-Federal short-stay hospitals in the United States.
  • SEER Data Cancer data provided by the Surveillance, Epidemiology, and End Results Program including incidence and population data associated by age, sex, race, year of diagnosis, and geographic areas. 
  • VirtualRDC @ Cornell Access to synthetic data constructed to statistically approximate the data available within the secure and restricted access environment of the Census Bureau's Research Data Centers (RDCs)
  • CDC Data and Statistics Downloadable datasets searchable by topic provided by the Centers for Disease Control and Prevention
  • CDC Public-Use Data Files Downloadable public-use data files provided by the National Center for Health Statistics through the Centers for Disease Control and Prevention's (CDC) FTP file server. Access data sets, documentation, and questionnaires from NCHS surveys and data collection systems.
  • World Health Organization Data & Statistics WHO's portal providing access to data and analyses for monitoring global health
  • ClinicalTrials.gov Registry and results database of publicly and privately supported clinical studies of human participants conducted around the world
  • ADNI Alzheimer's Disease Neuroimaging Initiative Data from the North American ADNI’s study participants, including Alzheimer’s disease patients, mild cognitive impairment subjects, and elderly controls, are available from this site to researchers worldwide.