
Grant Writing: Resource Descriptions: WCM Technology, Computing, and IT Services

Computing for Researchers at Weill Cornell Medicine (WCM)

ITS services for researchers are provided by three core groups: Research Administrative Computing (RAC), the Scientific Computing Unit (SCU), and Research Informatics (RI). SCU and RI also work closely with colleagues at the Center for Advanced Computing (CAC) and Cornell Information Technologies (CIT) on our Ithaca campus as well as the Advanced Computing (AC) division at our Qatar campus. The Cornell University CyberCommons (https://academicintegration.cornell.edu/cybercommons) brings together all university research-related resources into one central website to allow researchers to easily find the IT services they need.

Infrastructure

Located on the Upper East Side of New York City, the Weill Cornell Medicine (WCM) Belfer Research Building (BRB) Data Center provides an enterprise hosting facility for the conduct of research. The 4,000-square-foot state-of-the-art facility employs resilient architecture to lessen downtime, maintains an enhanced cooling environment, and is generator-supported to ensure continuous power.  Our staff employs innovative monitoring, disaster recovery plans, backup systems, and other measures to protect services and ensure that all systems are running smoothly.

 

The Network Operations Center (NOC) is responsible for overseeing the day-to-day operations of all WCM data centers. The NOC is open 24 hours a day with various shifts, ensuring a manager is always onsite to supervise any scheduled work and perform quality checks. All requests to access the facility are carefully screened, with visitors always escorted to maintain strong security.

 

  • Cloud Services

ITS provides cloud infrastructure services via secured offerings with Amazon (AWS), Google (GCP), and Microsoft (Azure). ITS wraps the raw commercial services with WCM identity and security tools to allow secure computing in the cloud. In addition to these services, ITS also provides consulting services as needed to help researchers take advantage of cloud offerings and configure them to meet their computational needs. ITS has a dedicated network connection to AWS and VPN tunnels to GCP and Azure.

  • Network

The WCM network has two primary connections to the internet, as well as an Internet2 connection to enable high-speed connectivity to peer research institutions. Our network connections to our major institutional collaborators allow for a virtual campus between The Rockefeller University, Hunter College, Cornell University (Ithaca campus), Weill Cornell Medicine in Qatar, NewYork-Presbyterian Hospital, Memorial Sloan Kettering Cancer Center, and the Hospital for Special Surgery.

  • Server Infrastructure (on-premises)

Through multiple data centers on campus, WCM ITS provides managed physical and virtual hosting services for researcher compute infrastructure. This includes scalable, tiered storage arrays to accommodate diverse researcher needs. Disaster recovery services are provided through multiple data centers in NYC as well as through our Ithaca campus. Cloud-based infrastructure further supplements our disaster recovery capabilities.

Workplace Productivity

  • Collaboration Services

Researchers have access to a variety of tools to enable collaboration with their peers. Zoom provides video-conferencing capabilities on and off campus. Microsoft Office 365 tools are extensively used on campus to enable instant messaging, voice calls, group editing of documents, and secure file sharing. Microsoft Teams and SharePoint allow for more extensive group collaboration.

  • Remote Access

All of WCM’s critical tools for workplace productivity can be accessed securely from off campus. Many research-specific applications are accessible remotely through our Data Core service. Library journals and other publications can also be accessed from off campus.

Security

WCM uses an extensive set of tools to secure researchers’ desktop and mobile devices, as well as institution-wide tools for network and server environments. Combined with user training on HIPAA regulations and best practices to secure their data, WCM has built a strong ethos around secure data management. Users are provided with tools (e.g., password management) to help them better manage their security. Anti-spam, anti-virus, and anti-malware tools further protect users from malicious attacks on their personal devices. WCM provides single sign-on and multi-factor authentication for users combined with federation services with partner institutions. Institutional identities are managed through a central enterprise directory.

 

WCM’s cloud and on-premises infrastructure are bolstered by intrusion prevention and detection tools integrated with next-generation firewalls. Data loss prevention services monitor and prevent the egress of sensitive data. Server logs are analyzed in near real-time to monitor and prevent attacks on infrastructure.

Research Administrative Computing (RAC)

Research Administrative Computing (RAC), a division within ITS, works with several central research offices to provide a streamlined system. The Weill Research Gateway (WRG) is the hub of both vendor and homegrown applications for the management of research funding, compliance, and integrity. WRG contains application forms, management tools, and integrations, and provides reporting for grants and contracts, IRB, IACUC, safety committees, and clinical trials. Additionally, WRG contains custom applications for transparency and tracking where vendor systems lack these options.

Data Core

The Data Core is a secure, scalable computing and storage environment managed by WCM ITS. Within a Data Core project, users can share access to a collection of data sets and process the data with a variety of software tools, while meeting appropriate regulatory requirements to protect sensitive and confidential data. The Data Core is useful for analysis of both secure and non-secure data by research teams and can be configured to allow access for external collaborators. To provide additional protections for sensitive research data, the Data Core provides a secure enclave with scalable virtual Windows environments, which researchers (with IRB approval) can use to collaborate with peers. GPU and Linux services are also available. Data Core projects are actively monitored by a dedicated staff of data management specialists to ensure project operations comply with all established governance requirements.

 

A Data Catalog sits on top of the Data Core to allow researchers throughout the institution to see what data sets are available, along with information on whom to contact, and how, for potential collaborations. The WCM Data Archive is being added to the Data Core service for long-term storage of research data that may be useful later for sharing.

Scientific Computing Unit (SCU)

Research-specific data systems offer two multi-petabyte DataDirect Networks (DDN) storage units, one with a Lustre filesystem and another with a GPFS filesystem. Data are served to users via a Mellanox EDR InfiniBand network infrastructure with uplinks at 100 Gb/s. Analytical systems include GPU clusters using a variety of NVIDIA cards (P100, V100, and RTX 6000) as well as traditional CPU-based clusters, offering mid- to high-memory computation. Automatic computational capacity extension is currently available as “burst” to Cornell University’s Red Cloud and to AWS.

Computational research, analytics, application installation, cluster configurations, storage deployments, cluster job and data management are supported by a dedicated staff from WCM’s Scientific Computing Unit. To maintain the highest level of technical skill, our staff participates in regular cross-training exercises with their Cornell Ithaca CAC (Center for Advanced Computing) and Qatar HPC counterparts.

Research Informatics

The Weill Cornell Medicine (WCM) Research Informatics (RI) team, which consists of twenty FTE as of July 2021, helps investigators obtain electronic patient and research data to support clinical and translational science.  Analysts, engineers, and project managers from WCM RI work with clinicians, basic scientists, study coordinators, and biostatisticians to optimize electronic data strategies with respect to study design, regulatory approval, and budget needs.

 

Source: Thomas Campion, PhD, Director, Research Informatics

Date: June 21, 2021

Secondary Use of Patients’ Electronic Records (SUPER)

The Weill Cornell Medicine (WCM) Research Informatics (RI) warehouse platform, Secondary Use of Patients’ Electronic Records (SUPER), aggregates and transforms clinical, billing, and research data from Weill Cornell Physicians, NewYork-Presbyterian Hospital, and numerous internal and external Next Generation Sequencing laboratories and healthcare institutions. SUPER consists of Microsoft SQL Server 2016 database servers with up to 50 virtual central processing units (CPUs) and 150 terabytes of storage, plus Red Hat Enterprise Linux virtual machines supporting software development using Java, Python, PHP, and other programming languages. The WCM RI team, which consists of twenty FTE as of July 2021, adheres to best practices for software engineering, including use of version control, code review, and project management methods spanning from Agile to Waterfall. SUPER provides the electronic infrastructure for the WCM research enterprise.

 

Source: Thomas Campion, PhD, Director, Research Informatics

Date: June 21, 2021

Information Technologies & Services Department (ITS)

The Information Technologies & Services Department (ITS) is the primary group providing secure computing infrastructure for Weill Cornell Medicine (WCM).  ITS provides comprehensive IT infrastructure, management, service, and support for the Weill Cornell Medical College community. With its wide range of expertise in new and emerging technologies, ITS plays a vital role in advancing the institution’s mission in educational technologies, biomedical research, and patient care. With over 400 employees, ITS supports more than 10,000 computer users, over 1,000 servers, and numerous clinical, research, and administrative software applications, as well as WCM’s cloud services in Amazon, Google, and Microsoft. ITS staff's expertise and service offerings span all areas of technology, from ensuring institutional security standards and creating high-speed networking to developing websites and assisting with hardware purchasing.

Architecture for Research Computing in Health (ARCH)

Through Weill Cornell Medicine (WCM) Research Informatics (RI) service, the Architecture for Research Computing in Health (ARCH) program matches investigators with tools and services to obtain EHR data, collect novel measures, and integrate data from disparate sources.  

 

To facilitate patient cohort discovery preparatory to research, i2b2 provides WCM investigators with a self-service tool to query EHR data for more than 3 million patients seen by Weill Cornell physicians.  Structured data available in i2b2 include diagnoses (ICD-9/10), procedures (CPT), laboratory results (LOINC), medications (RxNorm), and tumor registry codes (ICD-O-3) plus allergies, demographics, encounters, family history, social history, vital signs, and other domains. After determining a cohort of interest using i2b2 de-identified data, investigators with IRB approval can request identified medical record numbers (MRNs).   Researchers can also request customized, detailed reports of EHR data from outpatient and inpatient settings through an iterative process with a database analyst.  

 

To support data collection for prospective studies and retrospective chart reviews, REDCap is available free of charge to the WCM community.  A premium version, SUPER REDCap, automatically pre-populates case report forms with data from EHR systems and enables investigators to adjudicate values, saving time and streamlining data collection for study teams.

 

To support big data analytics, ARCH makes available an Observational Medical Outcomes Partnership (OMOP) common data model instance containing EHR data for 3 million patients, mapped to standard reference terminologies. Additionally, to extract clinical concepts from unstructured clinical notes, ARCH employs natural language processing (NLP) using the UIMA-based Leo framework.

 

In addition to local studies, ARCH contributes EHR data for WCM to several multi-institutional data sharing initiatives, including the NCATS Accrual to Clinical Trials (ACT) Network, PCORI-funded INSIGHT Clinical Research Network, TriNetX, NIH All of Us Research Program, and NCATS National COVID Cohort Collaborative (N3C).

Recognizing that patient data exist across multiple electronic systems and scientists in different disease areas have specific information needs, ARCH provides custom research data repositories (RDRs) to groups of investigators. Each RDR integrates data from disparate clinical and research source systems, contains data only for patients of interest to an investigator group, and has three user interfaces to support scientific workflows: i2b2 for cohort discovery, SUPER REDCap for data collection, and the Observational Medical Outcomes Partnership (OMOP) data model via Microsoft SQL Server Management Studio for data querying and analysis. As of July 2021, twenty RDRs are in production at WCM, including but not limited to Anesthesiology, Digestive Care, Myeloproliferative Neoplasms, and Pulmonary. Notably, the RDR model supported COVID response efforts, including clinical care and 19 peer-reviewed publications.

 

After aggregating and transforming data from disparate systems into a single representation, RDRs make data available to investigators using multiple secure user interfaces. First, a custom version of i2b2 makes all data visible to scientists to discover relationships between variables and query through a point-and-click experience.  Second, SUPER REDCap enables annotation of clinical data from EHR systems and capture of novel measures; the research-grade data will then be made available for query alongside clinical data from EHR systems in the custom i2b2 interface. Third, Microsoft SQL Server Management Studio enables researchers to generate rows and columns of data for export and analysis in a statistical software package such as SAS, Stata, or R. With the ability to run pre-processed “canned” reports or custom SQL queries using data in the OMOP format, Microsoft SQL Server Management Studio provides power and flexibility for scientists to manipulate data. To access the RDR user interfaces, investigators must authenticate using a valid WCM campus-wide identifier (CWID) and have application-level authorization.
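As a rough illustration of the kind of custom query an investigator might run against OMOP-formatted data, the sketch below builds a tiny in-memory database and selects patients with a given condition. It is not taken from any WCM system. The table and column names (`person`, `condition_occurrence`, `condition_concept_id`, and so on) follow the public OMOP common data model but show only a reduced subset of columns. SQLite stands in for Microsoft SQL Server, and the concept identifier `999001`, the helper `cohort_rows`, and all rows are hypothetical, synthetic values.

```python
import sqlite3

# In-memory stand-in for an OMOP-formatted database (assumption: a real RDR
# would be queried on Microsoft SQL Server, not SQLite).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        year_of_birth INTEGER
    );
    CREATE TABLE condition_occurrence (
        condition_occurrence_id INTEGER PRIMARY KEY,
        person_id INTEGER,
        condition_concept_id INTEGER,
        condition_start_date TEXT
    );
""")

# Synthetic rows for illustration only; no real patient data.
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [(1, 1950), (2, 1980), (3, 1965)],
)
conn.executemany(
    "INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)",
    [
        (10, 1, 999001, "2020-03-01"),
        (11, 1, 999001, "2020-06-15"),
        (12, 2, 999002, "2021-01-10"),
        (13, 3, 999001, "2019-11-30"),
    ],
)

TARGET_CONCEPT_ID = 999001  # hypothetical placeholder, not a real OMOP concept


def cohort_rows(connection, concept_id):
    """Return one row per patient with the condition: id, birth year, first date."""
    query = """
        SELECT p.person_id,
               p.year_of_birth,
               MIN(co.condition_start_date) AS first_condition_date
        FROM person AS p
        JOIN condition_occurrence AS co ON co.person_id = p.person_id
        WHERE co.condition_concept_id = ?
        GROUP BY p.person_id, p.year_of_birth
        ORDER BY p.person_id
    """
    return connection.execute(query, (concept_id,)).fetchall()


rows = cohort_rows(conn, TARGET_CONCEPT_ID)
for row in rows:
    print(row)
```

The result is a plain set of rows and columns, which is the shape of output that could then be exported to a statistical package such as SAS, Stata, or R.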

 

The WCM ITS Security team oversees identity management, including authentication via Active Directory (AD) and Security Assertion Markup Language (SAML). The WCM ARCH team controls authorization to specific applications according to standard operating procedures that emphasize the principle of minimum necessary access. In coordination with the WCM Institutional Review Board and Privacy Office, the ARCH team serves as the Honest Broker of patient data, removing identifiers from protected health information (PHI) in accordance with the HIPAA Privacy Rule Safe Harbor definition.
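To make the Safe Harbor idea concrete, here is a minimal sketch of a few of the transformations that method calls for: dropping direct identifiers, reducing a birth date to a year, grouping ages over 89 into a single "90+" category, and truncating ZIP codes to three digits. It is an illustration only, not the ARCH team's procedure. It covers just a handful of the 18 identifier categories, and the field names, the `deidentify` function, and the `low_population_zip3` parameter are all hypothetical.

```python
from datetime import date

# Hypothetical field names for direct identifiers; a real source system differs.
DIRECT_IDENTIFIERS = {"name", "mrn", "ssn", "phone", "email", "street_address"}


def deidentify(record, as_of, low_population_zip3=frozenset()):
    """Return a copy of `record` with a few Safe Harbor-style changes applied.

    `low_population_zip3` is an assumed, caller-supplied set of three-digit
    ZIP prefixes covering 20,000 or fewer people; those are replaced by "000".
    """
    # Drop fields that directly identify the individual.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    birth_date = out.pop("birth_date", None)
    if birth_date is not None:
        # Age in whole years as of the reference date.
        had_birthday = (as_of.month, as_of.day) >= (birth_date.month, birth_date.day)
        age = as_of.year - birth_date.year - (0 if had_birthday else 1)
        if age > 89:
            # Ages over 89 are grouped, and the birth year is withheld as well.
            out["age"] = "90+"
        else:
            out["age"] = age
            out["birth_year"] = birth_date.year

    zip_code = out.pop("zip", None)
    if zip_code is not None:
        zip3 = str(zip_code)[:3]
        out["zip3"] = "000" if zip3 in low_population_zip3 else zip3

    return out


example = {
    "name": "Jane Doe",          # synthetic values, not real patient data
    "mrn": "12345678",
    "birth_date": date(1980, 12, 15),
    "zip": "10065",
    "diagnosis_code": "E11.9",
}
print(deidentify(example, as_of=date(2021, 7, 1)))
```

A production honest-broker workflow would also need to handle the remaining identifier categories, service dates, and free-text fields, which this sketch does not attempt.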

 

Source: Thomas Campion, PhD, Director, Research Informatics

Date: June 21, 2021