Abstract
The National Science Foundation has recently released a Dear Colleague Letter urging more attention to the topics of reproducibility and replicability in science.
Building on prior research, we examine the effects of sampling bias on simulated
reproducibility rates in a toy example centered on linear regression models.
We test a range of statistical parameter values in simulation and find that true
reproducibility rates increase or decrease depending on the amount of sampling bias
present and on which explanatory variable the sampling bias is proportional to.
The effects of noise in the data and of the values of the regression coefficients are
also analyzed. We find that when the sampling bias is proportional to an explanatory
variable present in all linear models under consideration, the effect of sampling bias is
negligible, but when the bias is proportional to an explanatory variable present in only
some of the models under consideration, reproducibility rates are affected.
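
The kind of simulation described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the setup used in this work: the two candidate models, the use of BIC for model selection, the definition of reproducibility as an unbiased original study and a biased replication selecting the same model, and all names and parameter values (`draw_sample`, `bic`, `reproducibility_rate`, `strength`, the coefficients 1.0 and 0.3) are hypothetical.

```python
import numpy as np

def draw_sample(rng, X, y, n, bias_col=None, strength=0.0):
    """Draw n rows without replacement; when strength > 0, the inclusion
    weight of a row grows linearly with its value in column bias_col."""
    weights = np.ones(len(y))
    if bias_col is not None and strength > 0:
        col = X[:, bias_col]
        # Shift by the minimum so every weight stays positive.
        weights = 1.0 + strength * (col - col.min())
    idx = rng.choice(len(y), size=n, replace=False, p=weights / weights.sum())
    return X[idx], y[idx]

def bic(X, y):
    """BIC of an ordinary least squares fit with intercept (Gaussian errors)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ coef) ** 2)
    return n * np.log(rss / n) + A.shape[1] * np.log(n)

def select_model(X, y, candidates):
    """Index of the candidate column set with the lowest BIC."""
    return min(range(len(candidates)), key=lambda i: bic(X[:, candidates[i]], y))

def reproducibility_rate(bias_col, strength, n_pairs=200, n=100, noise_sd=1.0, seed=0):
    """Fraction of study pairs in which an unbiased original sample and a
    biased replication sample select the same model (assumed definition)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(5000, 2))  # hypothetical finite population
    y = 1.0 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=noise_sd, size=5000)
    # Column 0 appears in both candidate models, column 1 in only one.
    candidates = [[0], [0, 1]]
    agree = 0
    for _ in range(n_pairs):
        Xo, yo = draw_sample(rng, X, y, n)
        Xr, yr = draw_sample(rng, X, y, n, bias_col=bias_col, strength=strength)
        agree += select_model(Xo, yo, candidates) == select_model(Xr, yr, candidates)
    return agree / n_pairs

if __name__ == "__main__":
    for col in (0, 1):
        rate = reproducibility_rate(bias_col=col, strength=2.0)
        print(f"bias proportional to x{col + 1}: rate = {rate:.2f}")
```

Setting `bias_col=0` ties the bias to a variable shared by both candidate models, while `bias_col=1` ties it to a variable present in only one of them; varying `strength`, `noise_sd`, and the coefficients corresponds loosely to the parameter sweeps summarized above.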