Methodology for harmonization and integration

Harmonization strategies

Prospective harmonization
Typically used in multi-center studies, this strategy imposes strict standards and protocols from the beginning: all cohort studies share the same study design, survey instruments, metadata, and so on. Some adaptations may occur at individual data collection sites, but the goal is to maintain comparability.
Ex-ante retrospective harmonization
This strategy combines data from cohort studies that were not specifically designed to be comparable but that used standard collection tools and standard operating procedures, allowing the data to be integrated easily.
Ex-post retrospective harmonization
This strategy combines data from cohort studies that were not specifically designed to be comparable and that, in general, followed no standard formats or protocols. The data can nevertheless be assessed and edited to achieve commonality through data processing procedures.

Data processing methods

Algorithmic
Harmonize the same measures (continuous, categorical, or both) that have different but combinable ranges or categories.
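For illustration, a minimal R sketch with hypothetical education variables, recoding two different category schemes into one combinable set:

    # Hypothetical example: two studies record education with different
    # category schemes; both are recoded into a common three-level variable.
    study_a <- data.frame(educ = c(1, 2, 3, 4, 5))                      # 1-5 scale
    study_b <- data.frame(educ = c("primary", "secondary", "tertiary")) # text labels

    # Study A: collapse the five-point scale into three combinable categories
    study_a$educ_h <- cut(study_a$educ, breaks = c(0, 2, 4, 5),
                          labels = c("low", "medium", "high"))

    # Study B: map the text labels onto the same three categories
    study_b$educ_h <- factor(c(primary = "low", secondary = "medium",
                               tertiary = "high")[study_b$educ],
                             levels = c("low", "medium", "high"))
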
Calibration
Harmonize measures to a common metric using a known calibration or conversion method.
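For illustration, a minimal R sketch with hypothetical weight measures brought to a common metric through a known unit conversion:

    # Hypothetical example: study A recorded body weight in pounds and study B
    # in kilograms; a known conversion factor calibrates both to kilograms.
    weight_a_lb <- c(150, 180, 200)
    weight_b_kg <- c(70, 82, 95)

    weight_a_kg <- weight_a_lb * 0.453592        # exact unit conversion
    weight_kg   <- c(weight_a_kg, weight_b_kg)   # both studies on one metric
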
Standardization
Harmonize the same constructs measured with different scales when no calibration method or bridging items are available.
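For illustration, a minimal R sketch with hypothetical scale scores, standardized (z-scored) within each study before pooling:

    # Hypothetical example: two different symptom scales with no known
    # crosswalk are each standardized within study before being combined.
    score_a <- c(10, 14, 20, 25)   # scale A, e.g. range 0-30
    score_b <- c(3, 7, 12, 18)     # scale B, e.g. range 0-21

    z_a <- as.numeric(scale(score_a))   # (x - mean(x)) / sd(x)
    z_b <- as.numeric(scale(score_b))
    score_h <- c(z_a, z_b)              # scores on a comparable standardized scale
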
Latent variable model
Harmonize the same constructs measured with different scales when no calibration method is available but bridging items are present.
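For illustration, a sketch assuming the lavaan package and hypothetical items: b1 and b2 are bridging items collected in both studies, while a1 and c1 are study-specific and therefore missing by design; a one-factor model estimated with full-information maximum likelihood yields harmonized factor scores:

    # Hypothetical simulated data: one latent construct drives all items.
    library(lavaan)

    set.seed(1)
    n <- 100
    eta_a <- rnorm(n)   # construct in study A
    eta_b <- rnorm(n)   # construct in study B
    study_a <- data.frame(b1 = eta_a + rnorm(n, sd = 0.5),
                          b2 = eta_a + rnorm(n, sd = 0.5),
                          a1 = eta_a + rnorm(n, sd = 0.5),
                          c1 = NA)                          # not asked in study A
    study_b <- data.frame(b1 = eta_b + rnorm(n, sd = 0.5),
                          b2 = eta_b + rnorm(n, sd = 0.5),
                          a1 = NA,                          # not asked in study B
                          c1 = eta_b + rnorm(n, sd = 0.5))
    pooled <- rbind(study_a, study_b)

    model <- "construct =~ b1 + b2 + a1 + c1"            # one latent factor
    fit   <- cfa(model, data = pooled, missing = "fiml") # handles missing by design
    pooled$construct_h <- as.numeric(lavPredict(fit))    # harmonized factor scores
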
Multiple imputation
Harmonize entire datasets (rather than individual variables) to a common set of variables, using bridging variables to impute information not collected in some studies.
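For illustration, a sketch assuming the mice package and hypothetical variables: smoking status was never collected in study B, so it is imputed from the bridging variables age and sex observed in both studies:

    # Hypothetical example: the systematically missing variable is imputed so
    # that both datasets end up with the same set of variables.
    library(mice)

    set.seed(42)
    study_a <- data.frame(age = rnorm(100, 50, 10), sex = rbinom(100, 1, 0.5),
                          smoking = rbinom(100, 1, 0.3))
    study_b <- data.frame(age = rnorm(100, 55, 10), sex = rbinom(100, 1, 0.5),
                          smoking = NA)
    pooled <- rbind(study_a, study_b)

    imp <- mice(pooled, m = 5, printFlag = FALSE)   # five imputed datasets
    completed <- complete(imp, 1)                   # first completed dataset
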

Types of infrastructure

Data are centrally located
Data from all studies are stored on the same server.
Data are in different locations
Data from each study are stored on that study's own local server, and each study imposes its own data access restrictions.
Some centrally, others locally
Some studies share their datasets for storage on a common server, while other studies keep their datasets on their own local servers.

Integrative data analysis

Meta-analysis
Combines the results of multiple studies addressing the same variable.
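For illustration, a minimal base R sketch with hypothetical study-level results combined using inverse-variance (fixed-effect) weights; dedicated packages such as metafor offer fuller implementations:

    # Hypothetical effect estimates and standard errors reported by each study.
    est <- c(0.30, 0.18, 0.42)
    se  <- c(0.10, 0.08, 0.15)

    w          <- 1 / se^2                # inverse-variance weights
    pooled_est <- sum(w * est) / sum(w)
    pooled_se  <- sqrt(1 / sum(w))
    c(estimate = pooled_est,
      lower = pooled_est - 1.96 * pooled_se,
      upper = pooled_est + 1.96 * pooled_se)
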
Pooled analysis
Analyses are carried out at the individual level after pooling data from all studies into a single dataset.
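For illustration, a minimal R sketch with hypothetical datasets: individual-level records are stacked into one dataset and analyzed together, adjusting for study membership:

    # Hypothetical example: a single regression on the pooled individual data.
    set.seed(7)
    study_a <- data.frame(study = "A", age = rnorm(100, 50, 10),
                          outcome = rbinom(100, 1, 0.2))
    study_b <- data.frame(study = "B", age = rnorm(100, 60, 10),
                          outcome = rbinom(100, 1, 0.3))

    pooled <- rbind(study_a, study_b)
    fit <- glm(outcome ~ age + study, family = binomial, data = pooled)
    summary(fit)
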
Federated analysis
Analyses are coordinated centrally while individual-level data remain on the studies' local servers.
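For illustration, a conceptual base R sketch (not the DataSHIELD API): each site computes non-disclosive summary statistics locally, and the central analysis combines only those summaries:

    # Individual-level data never leave the sites; only summaries are shared.
    site_summary <- function(x) list(n = length(x), sum = sum(x), sumsq = sum(x^2))

    s1 <- site_summary(rnorm(120, mean = 50, sd = 10))   # run locally at site 1
    s2 <- site_summary(rnorm(200, mean = 55, sd = 12))   # run locally at site 2

    # Central combination from the summaries alone
    n_tot    <- s1$n + s2$n
    mean_tot <- (s1$sum + s2$sum) / n_tot
    var_tot  <- (s1$sumsq + s2$sumsq - n_tot * mean_tot^2) / (n_tot - 1)
    c(n = n_tot, mean = mean_tot, sd = sqrt(var_tot))
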

Software

OBiBa (Opal/Mica)
The OBiBa software suite (obiba.org), developed by Maelstrom Research (maelstrom-research.org) and Epigeny (epigeny.io), provides software components for data harmonization and federation, enabling study networks to harmonize and share data securely among their members.
DataSHIELD
DataSHIELD is a method that enables advanced statistical analysis of individual-level data from several sources without actually pooling the data from these sources together (datashield.ac.uk).
Molgenis
Molgenis is an open-source web application to collect, manage, analyze, visualize, and share large and complex biomedical datasets (https://www.molgenis.org/).
CharmStats
CharmStats allows you to work with your variables, document the harmonization process as you go, and even publish your completed harmonization electronically for review and citation (gesis.org/en/services/data-analysis/data-harmonization).
R / R Markdown
R is a free software environment for statistical computing and graphics (r-project.org). R Markdown turns R analyses into reproducible documents (rmarkdown.rstudio.com).
Stata
Stata is a statistical software for data management, statistical analysis, graphics, simulations, regression, and custom programming (stata.com).
SAS
SAS is a statistical software suite for data management, advanced analytics, multivariate analysis, business intelligence, criminal investigation, and predictive analytics (sas.com).
SPSS
SPSS is a software platform that offers advanced statistical analysis, a vast library of machine learning algorithms, text analysis, open-source extensibility, integration with big data and seamless deployment into applications (ibm.com/analytics/spss-statistics-software).