I Introduction
Prediction and classification models trained on a single study often perform considerably worse in external validation than in cross-validation [1, 2]. Their generalizability is compromised by overfitting, but also by various sources of study heterogeneity, including differences in study design, data collection and measurement methods, unmeasured confounders, and study-specific sample characteristics [3]. Using multiple training studies can potentially address these challenges and lead to more replicable prediction models. In many settings, such as precision medicine, multi-study learning is motivated by systematic data sharing and data curation initiatives. For example, the establishment of gene expression databases such as Gene Expression Omnibus [4] and ArrayExpress [5] and neuroimaging databases such as OpenNeuro [6] has facilitated access to sets of studies that provide comparable measurements of the same outcome and predictors (even if the original measurements are not comparable, they can often be made comparable through preprocessing and normalization procedures [7, 8]). For problems where such a set of studies is available, it is important to systematically integrate information across datasets when developing prediction and classification models.
One approach is to merge all of the datasets and treat the observations as if they are all from the same study (for example, see [9, 10]). The resulting increase in sample size can lead to improved training and better performance when the datasets are relatively homogeneous. Also, the merged dataset is often representative of a broader reference population than any of the individual datasets. Xu et al. [9] showed that a prognostic test for breast cancer metastases developed from merged data performed better than the prognostic tests developed using individual studies. Zhou et al. [11] proposed hypothesis tests for determining when it is beneficial to pool data across multiple sites for linear regression, compared to using data from a single site.
Another approach is to combine results from separately trained models. Meta-analysis and ensembling both fall under this approach. Meta-analysis combines summary measures from multiple studies to increase statistical power (for example, see [12, 13]). A common combination strategy is to take a weighted average of the study-specific summary measures. In fixed effects meta-analysis, the weights are based on the assumption that there is a single true parameter underlying all of the studies, while in random effects meta-analysis, the weights are derived from a model where the true parameter varies across studies according to a probability distribution. When learners are indexed by a finite number of common parameters, meta-analysis applied to these parameters can be used for multi-study learning, with useful results [12]. Various studies have compared meta-analysis to merging. For effect size estimation, Bravata and Olkin [14] showed that merging heterogeneous datasets can lead to spurious results while meta-analysis protects against such problematic effects. Taminau et al. [15] and Kosch and Jung [16] found that merging had higher sensitivity than meta-analysis in gene expression analysis, while Lagani et al. [17] found that the two approaches performed comparably in reconstruction of gene interaction networks. Ensemble learning methods [18], which combine predictions from multiple models, can also be used to leverage information in a multi-study setting. By combining predictions, ensembling leads to lower variance and higher accuracy, and is applicable to more general classes of learners than meta-analysis. Patil and Parmigiani [19] proposed cross-study learning, defined as weighted average ensembles of prediction models trained on different studies, as an alternative to merging. They showed empirically that when the datasets are heterogeneous, cross-study learning can lead to improved generalizability and replicability compared to merging and meta-analysis.

In this paper, we provide theoretical guidelines for determining whether it is more beneficial to merge or to ensemble. We consider both low-dimensional and high-dimensional linear regression settings by studying merged and cross-study learners (CSLs) based on ordinary least squares (LS) and ridge regression. We hypothesize a mixed effects model for heterogeneity and show that merging has lower prediction error than cross-study learning when heterogeneity is low, but as heterogeneity increases, there exists a transition point beyond which cross-study learning outperforms merging. We characterize this transition point analytically and study it via simulations. We also compare merging and cross-study learning in practice, using microbiome data.
II Problem Definition
We will use the following matrix notation: $I_n$ is the $n \times n$ identity matrix, $0_{m \times n}$ is an $m \times n$ matrix of 0's, $0_n$ is a vector of 0's of length $n$, $\mathrm{tr}(A)$ is the trace of matrix $A$, $\mathrm{diag}(a)$ is a diagonal matrix with the entries of $a$ along its diagonal, and $A_{ij}$ is the entry in row $i$ and column $j$ of matrix $A$. Other notation introduced throughout the paper is summarized in Table 1 of Appendix C.

Suppose we have $K$ comparable studies that measure the same outcome and the same predictors, and the datasets have been harmonized so that measurements across studies are on the same scale. For study $k$ ($k = 1, \dots, K$), let $n_k$ denote the number of observations, $Y_k \in \mathbb{R}^{n_k}$ the outcome vector, and $X_k \in \mathbb{R}^{n_k \times p}$ the design matrix, where the first column of $X_k$ is a vector of 1's if there is an intercept. Assume the data are generated from the linear mixed effects model
(1) $Y_k = X_k \beta + Z_k \gamma_k + \epsilon_k, \qquad k = 1, \dots, K,$

where $\beta \in \mathbb{R}^p$ is the vector of fixed effects, $Z_k \in \mathbb{R}^{n_k \times q}$ is the design matrix for the random effects obtained by subsetting the columns of $X_k$, $\gamma_k \in \mathbb{R}^q$ is the vector of random effects with $E[\gamma_k] = 0_q$ and $\mathrm{Cov}(\gamma_k) = G = \mathrm{diag}(\sigma_1^2, \dots, \sigma_q^2)$ where $\sigma_j^2 \ge 0$, $\epsilon_k \in \mathbb{R}^{n_k}$ is the vector of residual errors with $E[\epsilon_k] = 0_{n_k}$ and $\mathrm{Cov}(\epsilon_k) = \sigma_\epsilon^2 I_{n_k}$, and the $\gamma_k$'s and $\epsilon_k$'s are mutually independent. For $j = 1, \dots, q$, if $\sigma_j^2 > 0$, then the effect of the corresponding predictor differs across studies, and if $\sigma_j^2 = 0$, then the predictor has the same effect in each study.
The relationship between the predictors and the outcome in a given study can be seen as a perturbation of the population-level effect vector $\beta$. The degree of heterogeneity in predictor-outcome relationships across studies can be summarized by the sum of the variances of the random effects divided by the number of fixed effects: $\bar\sigma^2 = \frac{1}{p}\sum_{j=1}^{q} \sigma_j^2$. We are interested in comparing the performance of two multi-study learning approaches as $\bar\sigma^2$ varies: 1) merging all of the studies and fitting a single linear regression model, and 2) fitting a linear regression model on each study and forming a CSL by taking a weighted average of the predictions.
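To make the setup concrete, the generating process in Model (1) can be sketched as follows. This is a minimal illustration: the dimensions, variances, and seed are our own assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
K, n_k, p, q = 4, 40, 5, 2               # studies, study size, fixed effects, random effects
beta = rng.normal(size=p)                # population-level fixed effects
sigma2 = np.array([0.5, 0.2])            # random-effect variances (diagonal of G)
sigma_eps = 1.0                          # residual standard deviation

studies = []
for k in range(K):
    Xk = rng.normal(size=(n_k, p))
    Zk = Xk[:, :q]                       # random-effect design: a subset of the columns of Xk
    gamma_k = rng.normal(scale=np.sqrt(sigma2))   # study-specific slope perturbations
    eps_k = rng.normal(scale=sigma_eps, size=n_k)
    Yk = Xk @ beta + Zk @ gamma_k + eps_k
    studies.append((Xk, Yk))

# heterogeneity summary: sum of random-effect variances over the number of fixed effects
sigma2_bar = sigma2.sum() / p            # ≈ 0.14
```

Each study shares the same fixed effects $\beta$ but receives its own draw of random slopes, which is exactly the "perturbation of the population-level effect vector" described above.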
Learners
For low-dimensional settings where $n_k > p$ for all $k$, we consider merged and cross-study learners based on LS. The LS estimator of $\beta$ based on the merged data is

(2) $\hat\beta_{LS,m} = (X^\top X)^{-1} X^\top Y$

where $X = (X_1^\top, \dots, X_K^\top)^\top$ and $Y = (Y_1^\top, \dots, Y_K^\top)^\top$. The LS estimator based on study $k$ is

(3) $\hat\beta_{LS,k} = (X_k^\top X_k)^{-1} X_k^\top Y_k$

and the LS cross-study estimator is

(4) $\hat\beta_{LS,e} = \sum_{k=1}^{K} w_k \hat\beta_{LS,k}$

where $w_k \ge 0$ and $\sum_{k=1}^{K} w_k = 1$.
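The two LS learners can be sketched in a few lines of numpy; the function names and toy data below are ours, for illustration only.

```python
import numpy as np

def ls_merged(studies):
    # Equation 2: stack all studies and fit a single least-squares model
    X = np.vstack([Xk for Xk, _ in studies])
    Y = np.concatenate([Yk for _, Yk in studies])
    return np.linalg.solve(X.T @ X, X.T @ Y)

def ls_csl(studies, w):
    # Equations 3-4: fit per study, then take a weighted average of the coefficients
    betas = [np.linalg.solve(Xk.T @ Xk, Xk.T @ Yk) for Xk, Yk in studies]
    return sum(wk * bk for wk, bk in zip(w, betas))

# toy usage
rng = np.random.default_rng(1)
studies = [(rng.normal(size=(40, 3)), rng.normal(size=40)) for _ in range(4)]
w = np.full(4, 0.25)          # equal weights summing to 1
beta_m, beta_e = ls_merged(studies), ls_csl(studies, w)
```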
For high-dimensional settings where $n_k < p$ for some $k$, we consider merged and cross-study learners based on ridge regression. When $n_k < p$, $X_k^\top X_k$ is not invertible, so $\beta$ is not estimable using LS. Ridge regression overcomes the non-invertibility problem [20], which can also arise in low-dimensional settings with highly correlated predictors, by penalizing the $\ell_2$ norm of the coefficient vector. The predictors are typically standardized prior to fitting the model since ridge regression shrinks all predictors proportionally (for example, Hoerl and Kennard's seminal paper [20] assumes the predictors have mean 0 and variance 1). The coefficient estimates based on the standardized data are then transformed back to the original scale. Note that ridge regression is location-invariant [21], so without loss of generality, we assume that the predictors are scaled but not centered prior to applying ridge regression. We first provide the form of the ridge regression estimators in the case where an intercept is included. Let the scaled versions of $X$ and $X_k$ be denoted by $\tilde X = X S$ and $\tilde X_k = X_k S_k$, where $S, S_k \in \mathbb{R}^{p \times p}$ are positive definite scaling matrices. If scaling is not necessary or desirable (for example, if the predictors are measured in the same units), then set $S = S_k = I_p$. Otherwise, let $S$ be diagonal with $S_{11} = 1$ and $S_{jj}$
equal to the inverse standard deviation of column $j$
of $X$ for $j = 2, \dots, p$, and let $S_k$ be diagonal with $(S_k)_{11} = 1$ and $(S_k)_{jj}$ equal to the inverse standard deviation of column $j$ of $X_k$ for $j = 2, \dots, p$. The merged ridge regression estimator of $\beta$ can be written as

(5) $\hat\beta_{R,m}(\lambda) = S \, \tilde\beta_{R,m}(\lambda)$

(6) $\tilde\beta_{R,m}(\lambda) = (\tilde X^\top \tilde X + \lambda I_0)^{-1} \tilde X^\top Y$

where $\lambda > 0$ is the regularization parameter and $I_0$ is obtained from $I_p$ by setting $(I_0)_{11} = 0$, so that the intercept is not regularized [21]. The estimator of $\beta$ from study $k$ is

(7) $\hat\beta_{R,k}(\lambda_k) = S_k \, \tilde\beta_{R,k}(\lambda_k)$

(8) $\tilde\beta_{R,k}(\lambda_k) = (\tilde X_k^\top \tilde X_k + \lambda_k I_0)^{-1} \tilde X_k^\top Y_k$

and the CSL estimator is

(9) $\hat\beta_{R,e} = \sum_{k=1}^{K} w_k \hat\beta_{R,k}(\lambda_k)$

If there is no intercept, then we set the diagonal entries of $S$ and $S_k$ to be the inverse standard deviations of the columns of $X$ and $X_k$ respectively and replace $I_0$ with $I_p$ in the expressions above.
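A sketch of a single-dataset ridge fit using the scaling and unpenalized-intercept conventions above (our own function, assuming the first column of the design is the intercept). As $\lambda \to 0$ it recovers ordinary LS, which is a useful sanity check.

```python
import numpy as np

def ridge(X, Y, lam):
    # X includes an intercept column of 1's as its first column
    p = X.shape[1]
    S = np.eye(p)
    S[1:, 1:] = np.diag(1.0 / X[:, 1:].std(axis=0))  # inverse-sd scaling, intercept untouched
    Xt = X @ S                                       # scaled design
    I0 = np.eye(p)
    I0[0, 0] = 0.0                                   # leave the intercept unpenalized
    beta_scaled = np.linalg.solve(Xt.T @ Xt + lam * I0, Xt.T @ Y)
    return S @ beta_scaled                           # back-transform to the original scale

# toy usage
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 4))])
Y = X @ np.array([1.0, 0.5, -0.5, 0.0, 2.0]) + rng.normal(size=50)
beta_hat = ridge(X, Y, lam=1.0)
```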
For simplicity, we assume the weights $w_1, \dots, w_K$ and the regularization parameters $\lambda$ and $\lambda_1, \dots, \lambda_K$ are predetermined. Note that for linear regression, averaging predictions across study-specific learners is equivalent to averaging the estimated coefficient vectors across study-specific learners and then computing predictions. Thus, cross-study learning based on linear regression is similar to meta-analysis of effect sizes. In particular, when $p = 1$, calculating the CSL estimator is equivalent to performing a standard univariate meta-analysis. When $p > 1$, the CSL weights each predictor equally in a given study while meta-analytic approaches, which involve either performing separate univariate meta-analyses for each predictor or performing a multivariate meta-analysis (for example, see [22, 23]), do not impose this constraint.
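The equivalence noted above (averaging predictions equals predicting from averaged coefficients) is a one-line consequence of linearity, easily checked numerically with placeholder estimates:

```python
import numpy as np

rng = np.random.default_rng(3)
X0 = rng.normal(size=(10, 3))                    # test design matrix
betas = [rng.normal(size=3) for _ in range(4)]   # stand-ins for study-specific coefficient estimates
w = np.array([0.1, 0.2, 0.3, 0.4])               # weights summing to 1

avg_of_preds = sum(wk * (X0 @ bk) for wk, bk in zip(w, betas))
pred_of_avg = X0 @ sum(wk * bk for wk, bk in zip(w, betas))
assert np.allclose(avg_of_preds, pred_of_avg)    # identical, by linearity of X0 @ (.)
```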
Performance Comparison
Given a test set with design matrix $X_0$ and outcome vector $Y_0$, the goal is to identify conditions under which cross-study learning has lower mean squared prediction error (MSPE) than merging, i.e.

$E\|Y_0 - X_0\hat\beta_e\|_2^2 < E\|Y_0 - X_0\hat\beta_m\|_2^2,$

where the expectations are taken with respect to the random effects and residual errors, and $\|\cdot\|_2$ is the $\ell_2$ norm.
III Results
We consider two cases for the structure of the random effects covariance matrix $G$: equal variances and unequal variances. Let $\sigma^2_{(1)} > \cdots > \sigma^2_{(V)}$ be the distinct values on the diagonal of $G$ and let $p_v$ be the number of random effects with variance $\sigma^2_{(v)}$. In the equal variances case, where $V = 1$ and all random effects share a common variance $\sigma^2$, we provide a necessary and sufficient condition for the CSL to outperform the merged learner. In the unequal variances case, we provide sufficient conditions under which the CSL outperforms the merged learner and vice versa. These conditions allow us to characterize a transition point in terms of $\bar\sigma^2$ between a regime that favors merging and a regime that favors cross-study learning.
In order to present the results more concisely, let , , , and . Also, let be a matrix where if random effect is the th random effect with variance and otherwise, so that subsets to the columns corresponding to .
Proofs of the results are provided in Appendix A.
i Least Squares
For the LS results, assume we are in a low-dimensional setting where $n_k > p$ for all $k$.
i.1 Equal Variances
Definition 1.
Define the quantity $\tau$:
(10) 
Theorem 1.
Suppose the random effects have equal variances ($\sigma_j^2 = \sigma^2$ for $j = 1, \dots, q$) and
(11) 
Then the CSL has lower MSPE than the merged learner if and only if $\sigma^2 > \tau$.
By Theorem 1, for any fixed weighting scheme that does not depend on $\sigma^2$, satisfies Equation 11, and leads to a finite, positive $\tau$, $\tau$ represents a transition point from a regime where merging outperforms cross-study learning to a regime where cross-study learning outperforms merging. When equal weights are used and $X_k^\top X_k$ is not identical for all $k$, it follows from Jensen's operator inequality [24] that Equation 11 holds and the numerator of $\tau$ is positive, so the transition point always exists.
Corollary 1.1.
Suppose the random effects have equal variances and there exist positive definite matrices to which the suitably scaled study Gram matrices converge almost surely as the study sizes grow. If the weights are set correspondingly, then
(12) 
For example, suppose all study sizes are equal to $n$, the predictors are independent and identically distributed within and across studies, and the limiting matrices are positive definite. Then Corollary 1.1 applies with all of the limiting matrices equal. In the special case where $p = 1$ and the predictor has mean 0 and variance $\sigma_x^2$, the limit becomes
(13) 
and the asymptotic transition point is controlled simply by the variance of the residuals, the variance of the predictor, and the study sample size.
i.2 Unequal Variances
Definition 2.
Define
(14) 
(15) 
Theorem 2.
Suppose
(16) 
Then when .
Suppose
(17) 
Then when .
Corollary 2.1.
Suppose there exist positive definite matrices such that as ,



for
and we set .
If
(18) 
then
(19) 
If
(20) 
then
(21) 
In the unequal variances scenario, we can establish a transition interval such that the merged learner outperforms the CSL when $\bar\sigma^2$ is smaller than the lower bound of the interval and the CSL outperforms the merged learner when $\bar\sigma^2$ is greater than the upper bound of the interval. Note that the conditions and results for the equal variances scenario are special cases of the conditions and results for the unequal variances scenario. When $V = 1$, the quantities in Definition 2 reduce to those in Definition 1.
i.3 Optimal Weights
It can be shown (see Appendix A) that the optimal weights for the CSL are given by

(22) $w_k = \dfrac{(\mathrm{MSPE}_k)^{-1}}{\sum_{j=1}^{K} (\mathrm{MSPE}_j)^{-1}}, \qquad \mathrm{MSPE}_k = E\|Y_0 - X_0\hat\beta_{LS,k}\|_2^2,$

i.e., the weight for study $k$ is proportional to the inverse MSPE of the LS learner trained on that study.
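The inverse-MSPE weighting of Equation 22 can be sketched as follows; the MSPE values are placeholders, since in practice they would themselves have to be estimated.

```python
import numpy as np

def inverse_mspe_weights(mspe):
    # normalize the inverse MSPEs so the weights sum to 1
    inv = 1.0 / np.asarray(mspe, dtype=float)
    return inv / inv.sum()

w = inverse_mspe_weights([2.0, 1.0, 4.0])
# the most accurate study (MSPE = 1.0) receives the largest weight, 4/7
```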
In the equal variances setting, the optimal weights depend on $\sigma^2$. We saw previously that when the weights satisfy Equation 11 and do not depend on $\sigma^2$, $\tau$ characterizes the value of $\sigma^2$ beyond which cross-study learning outperforms merging. Since the optimal weights depend on $\sigma^2$, $\tau$ depends on $\sigma^2$ under the optimal weighting scheme. Thus, it is difficult to obtain a closed-form expression for the transition point, though numerical methods can be used to solve for the value of $\sigma^2$ at which the two learners have equal MSPE. In Appendix A, we provide a closed-form approximation of the transition point. Note that the transition point under any fixed weighting scheme provides an upper bound for the transition point under the optimal weighting scheme. We also remark that there is a special case in which the optimally weighted CSL has the same variance as the estimator from the true mixed effects model. If $G$ and $\sigma_\epsilon^2$ are known, then the CSL will always be at least as efficient as the merged learner under optimal weighting, but in practice, $G$ and $\sigma_\epsilon^2$ need to be estimated.
ii Ridge Regression
Below, we present results for ridge regression that are applicable to both low- and high-dimensional settings.
ii.1 Equal Variances
Definition 3.
Define
(23)  
(24)  
(25) 
Theorem 3.
Suppose the random effects have equal variances and
(26) 
Then the CSL has lower MSPE than the merged learner if and only if $\sigma^2$ exceeds the transition point given in Definition 3.
ii.2 Unequal Variances
Definition 4.
Define
(27) 
(28) 
Theorem 4.
Suppose
(29) 
Then when .
Suppose
(30) 
Then when .
Again, the conditions and results for the equal variances scenario are special cases of the conditions and results for the unequal variances scenario.
iii Interpretation
The covariance matrices of linear regression coefficient estimators can be written as a sum of two components, one driven by between-study variability and one driven by within-study variability. For example, for LS we have
where is the matrix such that , and
Since the merged learner ignores between-study heterogeneity, the trace of its first component is generally larger than that of the CSL. However, since the merged learner is trained on a larger sample, the trace of its second component is generally smaller than that of the CSL. The merged and cross-study learners based on LS are unbiased, so the transition point depends on the tradeoff between these two components. When $p = 1$, Expression 13 shows that having a higher-variance predictor favors cross-study learning over merging, since increasing the variance of the predictor amplifies the impact of the random effect.
Unlike LS estimators, ridge regression estimators are biased as a result of regularization. The transition point for ridge regression depends on the regularization parameters used on the merged and individual datasets. It also depends on the true coefficient vector through the squared bias terms in the MSPEs of the merged and cross-study learners, so an estimate of $\beta$ is needed to compute the expressions in Theorems 3 and 4. These expressions can vary considerably for different choices of regularization parameters and different values of $\beta$. We did not provide asymptotic results for ridge regression as the number of studies grows (with the study sizes held constant) because this scenario is not entirely fair to the CSL. For fixed study sizes and a sufficiently large number of studies, the merged learner will be in the low-dimensional setting while the CSL will remain in the high-dimensional setting. As the number of studies grows, the bias term approaches 0 for the merged learner (assuming an appropriately scaled regularization parameter) but not for the CSL, which suggests that when the number of studies is sufficiently large, merging will always yield lower MSPE than cross-study learning. Also, due to the squared bias term in the MSPE, it is not straightforward to derive optimal CSL weights for ridge regression.
In general, the transition points for LS and ridge regression depend on the design matrix of the test set. However, the test design matrix drops out when it is a scalar multiple of an orthogonal matrix.

IV Simulations
We conducted simulations to verify the theoretical results for LS and ridge regression and to compare them to the empirical transition points for three methods for which we could not find a closed-form solution: LASSO, single hidden layer neural network, and random forest. We also made performance comparisons with a linear mixed effects model, univariate random effects meta-analyses, and multivariate random effects meta-analysis. We used the R packages glmnet, nnet, randomForest, nlme, metafor, and mvmeta for ridge regression/LASSO, neural networks, random forests, linear mixed effects models, univariate meta-analyses, and multivariate meta-analyses respectively.

We considered four simulation scenarios corresponding to the settings in Theorems 1, 2, 3, and 4. We used 4 training studies and 4 test studies of size 40 for all scenarios. For the low-dimensional settings, we used 10 predictors, generating 5 of the true coefficients from each of two distributions, with 5 of the predictors having random slopes. For the high-dimensional settings, we used 100 predictors, generating 30 of the true coefficients from one distribution and 70 from the other, with 10 of the predictors having random slopes. For each simulation scenario, we fixed the predictor values in the training and test sets and the model hyperparameters. Predictor values were sampled from datasets in the curatedOvarianData R package [25]. Model hyperparameters were tuned once using 5-fold cross-validation with outcomes generated under $\bar\sigma^2 = 0$. For various values of $\bar\sigma^2$, including 0 and the theoretical transition point, we generated random slopes, residual errors, and outcomes for each training and test study according to Model (1), then trained and tested the following approaches: linear mixed effects model, random effects meta-analysis of univariate LS estimates, random effects multivariate meta-analysis, and merged learners and CSLs based on LS, ridge regression, LASSO, neural networks, and random forests. For ridge regression and LASSO, the predictors were standardized prior to model fitting. For linear mixed effects, we fit the true model, using restricted maximum likelihood to estimate the variance components. For meta-analysis of univariate LS estimates, we used the DerSimonian and Laird method. For multivariate meta-analysis, we used restricted maximum likelihood, constraining the between-study covariance matrix to be diagonal. LS, linear mixed effects, and meta-analysis were only applied in the low-dimensional setting. We performed 1000 replicates for each value of $\bar\sigma^2$ and estimated the MSPE of each estimator by averaging the squared error across replicates.

As seen in Figures 1 and 2, the empirical transition points for LS and ridge regression agree with the theoretical results from Theorems 1 and 3 (similar figures for Theorems 2 and 4 are provided in Appendix B). The methods all have similar empirical transition points except for random forest, which performed considerably worse than all of the other approaches (see Figure 8 in the Appendix). The poor performance of random forest could be because the data were generated from a linear model. The univariate meta-analysis approach also performed poorly (Figure 8), which is unsurprising because the generating model is a multivariate model. The performance of the other models relative to the data generating model is summarized in Figure 3 for three values of $\bar\sigma^2$.
When $\bar\sigma^2 = 0$, the merged regression learners and multivariate meta-analysis perform as well as or slightly better than the mixed effects model and outperform the CSLs. The merged neural network learner does slightly worse than the regression learners. At the LS transition point, all models perform similarly. Beyond the transition point, the models continue to perform similarly (when heterogeneity is high, all models perform poorly), with the CSLs slightly outperforming the merged learners and multivariate meta-analysis performing as well as the mixed effects model. For each of the three values of $\bar\sigma^2$, LASSO performed best, even slightly outperforming the mixed effects model and multivariate meta-analysis. This is likely because several of the true coefficients were close to 0.
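As an illustration of the qualitative behavior described above, the following self-contained Monte Carlo sketch compares the merged and equal-weight CSL versions of LS at low and high heterogeneity. The dimensions, unequal study sizes, and all-predictors random-slope structure are our own simplifications, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes, p = [100, 20, 20, 20], 5          # unequal study sizes sharpen the contrast
beta = rng.normal(size=p)
X_tr = [rng.normal(size=(n, p)) for n in sizes]
X_te = rng.normal(size=(40, p))

def mspe_pair(sigma2, reps=300):
    """Monte Carlo MSPE of the merged LS learner and the equal-weight CSL."""
    err_m = err_e = 0.0
    for _ in range(reps):
        bs, Ys = [], []
        for Xk in X_tr:
            gamma = rng.normal(scale=np.sqrt(sigma2), size=p)  # every predictor has a random slope here
            Yk = Xk @ (beta + gamma) + rng.normal(size=Xk.shape[0])
            bs.append(np.linalg.solve(Xk.T @ Xk, Xk.T @ Yk))
            Ys.append(Yk)
        X, Y = np.vstack(X_tr), np.concatenate(Ys)
        b_m = np.linalg.solve(X.T @ X, X.T @ Y)                # merged learner
        b_e = np.mean(bs, axis=0)                              # equal-weight CSL
        g0 = rng.normal(scale=np.sqrt(sigma2), size=p)         # test study's own random slopes
        Y0 = X_te @ (beta + g0) + rng.normal(size=40)
        err_m += np.mean((Y0 - X_te @ b_m) ** 2)
        err_e += np.mean((Y0 - X_te @ b_e) ** 2)
    return err_m / reps, err_e / reps

low, high = mspe_pair(0.0), mspe_pair(4.0)
# with no heterogeneity the merged learner tends to win;
# with high heterogeneity the equal-weight CSL tends to win
```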
V Metagenomics Application
To illustrate in a practical example, we compared the performance of merging and cross-study learning using datasets from the curatedMetagenomicData R package [26], which contains a collection of curated, uniformly processed human microbiome data. We focused on three gut microbiome studies that measured cholesterol as well as gene marker abundance in stool, restricting to samples from female patients: 1) Qin et al.'s 2012 study of Chinese type 2 diabetes patients and non-diabetic controls ( samples from independent female patients) [27], 2) Karlsson et al.'s 2013 study of middle-aged European women with normal, impaired, or diabetic glucose control ( samples from independent female patients) [28], and 3) Heintz-Buschart et al.'s 2016 study of patients with a family history of type 1 diabetes ( samples from 13 female patients) [29]. We used merging and cross-study learning to train linear regression models to predict cholesterol, calculated the theoretical transition interval, and evaluated the performance of the two approaches.
We considered two scenarios: 1) training on different subsets of the same study and testing on a held-out subset, and 2) training on different studies and testing on an independent study. In the first scenario, we randomly split the Qin et al. 2012 samples into five datasets of approximately equal size, using four for training and the remaining one for testing. We used age and the top five marker abundances most correlated with the outcome in the training set as the predictors. In the second scenario, we used the Qin et al. 2012 and Karlsson et al. 2013 datasets for training and the Heintz-Buschart et al. 2016 dataset for testing. We used age and the top twenty marker abundances most correlated with the outcome in the training set as the predictors. In each scenario, we fit merged and CSL versions of LS and ridge regression. We estimated the random-effect variances and residual variance by fitting a linear mixed effects model using restricted maximum likelihood, allowing each predictor to have a random effect. For the CSLs, we used the optimal weights given by Equation 22, plugging in the variance estimates. We calculated the theoretical transition bounds from Theorems 2 and 4 and compared them to the estimate of $\bar\sigma^2$. We evaluated the performance of the models empirically by calculating the prediction error on the test set.
In the first scenario, the estimate of $\bar\sigma^2$ was , the lower transition bound was , and the upper bound was , suggesting that merging was expected to outperform cross-study learning. In the test set, the merged versions of LS and ridge regression both had lower prediction error than the respective CSL versions (Figure 4). In the second scenario, the estimate of $\bar\sigma^2$ was , the lower transition bound was , and the upper bound was . In the test set, the CSL versions of LS and ridge regression both had lower prediction error than the respective merged versions (Figure 5).
VI Discussion
The availability of large and increasingly heterogeneous collections of data for training classifiers is challenging traditional approaches for training and validating prediction and classification algorithms. At the same time, it is also opening opportunities for new and more general paradigms. One of these is cross-study machine learning via CSLs, motivated by variation in the relation between predictors and outcomes across collections of similar studies. A natural benchmark for these methods is to combine all training studies, to exploit the power of larger training sample sizes. In previous work [19], merged learners perform better than CSLs in low-heterogeneity settings. As heterogeneity increases, however, our earlier simulations indicated a "transition point" in the heterogeneity scale beyond which acknowledging cross-study heterogeneity becomes preferable, and the CSLs outperform the merged learners.

In this paper, we approached this problem analytically for the first time by characterizing cross-study heterogeneity using a linear mixed effects model. We derived closed-form transition points for standard and ridge-regularized linear regression models. We confirmed the analytic results in simulation and demonstrated that when the data are generated by a linear model, the LS and ridge regression solutions can serve as proxies for the transition point under other learning strategies (LASSO, neural network) for which closed-form derivation is difficult. Finally, we estimated the transition point in cases of low and high cross-study heterogeneity in microbiome data and showed how it can be used as a guide for deciding when and when not to merge studies together in the course of learning a prediction rule.
We focused on deriving analytic results for LS and ridge regression because of the opportunity to pursue closed-form solutions. Other widely used methods such as LASSO, neural networks, and random forests are not as easily amenable to a closed-form solution, so we used simulations to study the performance of merging versus ensembling for these methods. In our simulation settings, the merged learners based on LS, ridge regression, LASSO, and neural networks had comparable accuracy, as did the corresponding CSLs. The methods all had similar empirical transition points, perhaps as a consequence of their similar performance. An exception is random forest, which did not reach a transition point within the specified heterogeneity levels, and also performed less well in general, as is expected in data generated by linear models. The analytic results for LS/ridge regression could potentially serve as an approximation for other methods that perform comparably, though it is important to consider how the reliability of such an approximation could be affected by the nature of the data and choice of model hyperparameters.
In practice, the analytic transition point and transition interval expressions could be used to help guide decisions about whether to merge data from multiple studies when there is potential heterogeneity in predictor-outcome relationships across the study populations. The heterogeneity level $\bar\sigma^2$ can be estimated from the training data and compared to the theoretical transition points or bounds for LS and/or ridge regression. Various methods can be used to estimate the random-effect variances, including maximum likelihood and method-of-moments-based approaches used in meta-analysis (for example, see [30]), with the caveat that estimates will be imprecise when the number of studies is small.

Under Model (1), fitting a correctly specified mixed effects model will generally be more efficient than both the merged and cross-study versions of LS. However, more flexible machine learning algorithms can potentially yield better prediction accuracy than the true model. For example, in the low-dimensional simulations, the mixed effects model was outperformed by either the merged learner or CSL based on LASSO for most levels of heterogeneity. Moreover, fitting a mixed effects model can be computationally difficult when the number of predictors is large, and standard mixed effects models are not appropriate for high-dimensional data, though there are methods for penalized mixed effects models [31, 32, 33, 34].

A limitation of our derivations is that they treat the following quantities as known: the subset of predictors with random effects, the CSL weights, and the regularization parameters for ridge regression. In practice, these are usually selected using statistical procedures that introduce additional variability. Furthermore, we obtained closed-form transition point expressions for cases where the CSL weighting scheme does not depend on the variances of the random effects. Such weighting schemes are generally suboptimal (for example, the optimal weights for LS given by Equation 22 depend on the random-effect variances), so the closed-form results are based on a conservative estimate of the maximal performance of cross-study learning. Another limitation is the assumption that the random effects are uncorrelated, which is often not true in practice.
In summary, although this work is predicated upon the assumption that crossstudy heterogeneity manifests through random effects and assumes that weights and regularization parameters are known, we believe it provides a theoretical rationale for multistudy machine learning, and a strong foundation for developing practical rules and guidelines to implement it.
VII Reproducibility
Code to reproduce the simulations and data application is available at
https://github.com/zoeguan/transition_point
VIII Acknowledgements
Work supported by NIH grants 4P30CA00651651 (Parmigiani) and 2T32CA00933736 (Patil), NSERC PGSD Scholarship (Guan) and NSF grant DMS1810829 (Patil and Parmigiani). We thank Lorenzo Trippa and Boyu Ren for useful discussions.
References
 [1] P. J. Castaldi, I. J. Dahabreh, and J. P. Ioannidis. An empirical assessment of validation practices for molecular classifiers. Briefings in bioinformatics, 12(3):189–202, 2011.
[2] C. Bernau, M. Riester, A.-L. Boulesteix, G. Parmigiani, C. Huttenhower, L. Waldron, and L. Trippa. Cross-study validation for the assessment of prediction algorithms. Bioinformatics, 30(12):i105–i112, Jun 2014. PMID: 24931973.
 [3] Y. Zhang, C. Bernau, G. Parmigiani, and L. Waldron. The impact of different sources of heterogeneity on loss of accuracy from genomic prediction models. Biostatistics, page kxy044, 2018.
 [4] R. Edgar, M. Domrachev, and A. E. Lash. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic acids research, 30(1):207–210, 2002.
[5] H. Parkinson, U. Sarkans, N. Kolesnikov, N. Abeygunawardena, T. Burdett, M. Dylag, I. Emam, A. Farne, E. Hastings, E. Holloway, et al. ArrayExpress update—an archive of microarray and high-throughput sequencing-based functional genomics experiments. Nucleic acids research, 39(suppl_1):D1002–D1004, 2010.
 [6] K. Gorgolewski, O. Esteban, G. Schaefer, B. Wandell, and R. Poldrack. OpenNeuro—a free online platform for sharing and analysis of neuroimaging data. Organization for Human Brain Mapping. Vancouver, Canada, page 1677, 2017.
[7] C. Lazar, S. Meganck, J. Taminau, D. Steenhoff, A. Coletta, C. Molter, D. Y. Weiss-Solís, R. Duque, H. Bersini, and A. Nowé. Batch effect removal methods for microarray gene expression data integration: a survey. Briefings in bioinformatics, 14(4):469–490, 2012.
 [8] M. Benito, J. Parker, Q. Du, J. Wu, D. Xiang, C. M. Perou, and J. S. Marron. Adjustment of systematic microarray data biases. Bioinformatics, 20(1):105–114, 2004.
 [9] L. Xu, A. C. Tan, R. L. Winslow, and D. Geman. Merging microarray data from separate breast cancer studies provides a robust prognostic test. BMC bioinformatics, 9(1):125, 2008.
[10] H. Jiang, Y. Deng, H.-S. Chen, L. Tao, Q. Sha, J. Chen, C.-J. Tsai, and S. Zhang. Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes. BMC bioinformatics, 5(1):81, 2004.
[11] H. H. Zhou, Y. Zhang, V. K. Ithapu, S. C. Johnson, G. Wahba, and V. Singh. When can Multi-Site Datasets be Pooled for Regression? Hypothesis Tests, consistency and Neuroscience Applications. arXiv preprint arXiv:1709.00640, 2017.
[12] M. Riester, J. M. Taylor, A. Feifer, T. Koppie, J. E. Rosenberg, R. J. Downey, B. H. Bochner, and F. Michor. Combination of a novel gene expression signature with a clinical nomogram improves the prediction of survival in high-risk bladder cancer. Clinical Cancer Research, 18(5):1323–1333, March 2012.
[13] G. C. Tseng, D. Ghosh, and E. Feingold. Comprehensive literature review and statistical considerations for microarray meta-analysis. Nucleic acids research, 40(9):3785–3799, 2012.
[14] D. M. Bravata and I. Olkin. Simple pooling versus combining in meta-analysis. Evaluation & the health professions, 24(2):218–230, 2001.
[15] J. Taminau, C. Lazar, S. Meganck, and A. Nowé. Comparison of merging and meta-analysis as alternative approaches for integrative gene expression analysis. ISRN bioinformatics, 2014, 2014.
[16] R. Kosch and K. Jung. Conducting gene set tests in meta-analyses of transcriptome expression data. Research synthesis methods, 2018.
[17] V. Lagani, A. D. Karozou, D. Gomez-Cabrero, G. Silberberg, and I. Tsamardinos. A comparative evaluation of data-merging and meta-analysis methods for reconstructing gene-gene interactions. BMC bioinformatics, 17(5):S194, 2016.
 [18] T. G. Dietterich. Ensemble methods in machine learning. In International workshop on multiple classifier systems, pages 1–15. Springer, 2000.
 [19] P. Patil and G. Parmigiani. Training replicable predictors in multiple studies. Proceedings of the National Academy of Sciences, 115(11):2578–2583, 2018.
 [20] A. E. Hoerl and R. W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1):55–67, 1970.
 [21] P. J. Brown. Centering and scaling in ridge regression.