The InteractionPoweR Package

This vignette describes the basics of interaction analyses and how to use this package. A more updated version of our discussion of many of the topics listed here can be found in our tutorial paper (open access), which can be found at: https://journals.sagepub.com/doi/full/10.1177/25152459231187531

Introduction

Interaction analyses take the form:

\[ Y \sim \beta_0 + X_1\beta_1 + X_2\beta_2 + X_1X_2\beta_3 + \epsilon \]

Where \(Y\) is the dependent variable, \(X_1\) and \(X_2\) are our independent variables, and our interaction term is \(X_1X_2\). The \(\beta\)s in this equation are our regression coefficients and \(\epsilon\) is our error. Note that in this equation, and throughout the code in this package, we refer to the two interacting variables as \(X_1\) and \(X_2\), as opposed to \(X\) and \(Z\), or \(X\) and \(M\). This is to emphasize that, as far as these simulations are concerned, \(X_1\) and \(X_2\) are interchangeable, and any conclusions about causality (i.e. “moderation”) will rely on the specifics of the variables.

The goal of the power analyses supported by this package is to determine how much power an analysis has to detect whether \(\beta_3\) (the interaction term regression coefficient) is different from 0 at some pre-specified \(\alpha\) value (alpha, or p-value). \(\alpha\) refers to our false positive rate, which is how frequently we will accept that our analysis has incorrectly rejected the null hypothesis. Power, refers to our true positive rate, how frequently do we want to correctly accept the alternative hypothesis? It may be easier to think about the inverse of power, the false negative rate: how frequently will we incorrectly accept the null hypothesis? A “typical” (though not necessarily recommended) value for power is 0.8, which means that 20%, or \(1/5\), of the time, the analysis will incorrectly conclude that there is no effect when there actually is one. (We recommend striving for at least a power of .9)

Effect sizes

Any simulation in this package requires at minimum 5 input variables:

N: the sample size.
r.x1.y: the Pearson’s correlation (\(r\)) between \(X_1\) and \(Y\)
r.x2.y: the Pearson’s correlation (\(r\)) between \(X_2\) and \(Y\)
r.x1.x2: the Pearson’s correlation (\(r\)) between \(X_1\) and \(X_2\)
r.x1x2.y: the Pearson’s correlation (\(r\)) between \(X_1X_2\) and \(Y\) - this is the interaction effect

It is important to emphasize here that inputs 2-5 are the population-level Pearson’s correlation between each pair of variables. This is true even if any or all of the variables in the simulation are binary. These correlations are used to derive the regression coefficients via path tracing rules. Also note that these effect sizes are the cross-sectional correlation. This is in contrast to how one specifies effects for experimental manipulations, where the effects are the correlation in each of the experimental conditions. The Pearson’s correlation is equivalent to the effect size \(\beta\) (i.e. in the regression \(Y \sim \beta_0 + X\beta + \epsilon\)) when both \(Y\) and \(X\) are normalized (mean = 0, sd = 1). For inputs 2-4, we imagine that it will be relatively straightforward for most users to identify the appropriate values (i.e. by surveying the literature and identifying large independent studies where the effects have been reported). However, in the case of the interaction effect size, r.x1x2.y, users may be less used to thinking about interaction effects as correlations. The interaction effect size is the how much the correlation between one of the two independent variables and the dependent variable changes when conditioned on the other independent variable. It is both how much \(corr(X_1,Y)\) changes when conditioned on \(X_2\), and equivalently how much \(corr(X_2,Y)\) changes when conditioned on \(X_1\).

A common way of thinking about interaction effect sizes is to plot the data as “simple slopes”. A simple slopes plot shows the correlation between one of the independent variables (e.g. \(X_1\)) and the dependent variable (\(Y\)) in different subsets of the data, where each subset is defined by their value at the second independent variable (e.g. \(X_2\)). For example, we can plot \(Y \sim X_1\) separately in participants with an \(X_2\) value in the lower-half of the distribution and the upper-half of the distribution.

Simulating single data sets

To facilitate user’s understanding of interaction effect sizes, InteractionPoweR includes functions for simulating single data sets and plotting the interaction as a simple slopes plot:

The function generate_interaction() simulates a single data set:

set.seed(2020)
library(InteractionPoweR)
example_data = generate_interaction(N = 350,        # sample size
                                    r.x1.y = .2,    # correlation between x1 and y
                                    r.x2.y =  .1,   # correlation between x2 and y
                                    r.x1.x2 = .2,   # correlation between x1 and x2
                                    r.x1x2.y = .15  # correlation between x1x2 and y
                                      )

The data can then be plotted using the plot_interaction() function:

plot_interaction(data = example_data, # simulated data
                 q = 2                # number of simple slopes
                 )

The function test_interaction() provides easy access to the results of the interaction regression, the adjusted \(R^2\) of the interaction term, the 95% confidence interval of the interaction term, the shape of the interaction (crossover.point = the value of \(X_1\) where the \(X_2\) simple-slopes intersect, shape = the shape of the interaction, >1 = cross-over, 1 = knock-out, <1 = attenuated), the simple slopes of \(X_2\), and the correlation between the variables:

test_interaction(data = example_data, # simulated data
                 q = 2                # number of simple slopes
                 )

## $linear.model
##       Estimate Std. Error  t value     Pr(>|t|)
## x1   0.1888967 0.05153863 3.665148 2.859696e-04
## x2   0.1356407 0.05165265 2.626016 9.022851e-03
## x1x2 0.2129386 0.05001033 4.257892 2.661829e-05

A simple power analysis

The simplest power analysis we can run is one in which all the parameters are already known. All the correlations are known, the sample size is known, and the interaction effect size is known. We additionally specify alpha, which is the p-value we’re using (0.05 is default). We can compute power analytically using the power_interaction_r2() function.

power_interaction_r2(
                  alpha = 0.05,   # p-value
                  N = 350,        # sample size
                  r.x1.y = .2,    # correlation between x1 and y
                  r.x2.y =  .1,   # correlation between x2 and y
                  r.x1.x2 = .2,   # correlation between x1 and x2
                  r.x1x2.y = .15) # correlation between x1x2 and y

##         pwr
## 1 0.8131373

We find that our analysis has 80% power (pwr), and the interaction has a shape of 0.75 (the interaction effect x1x2 is 75% of the magnitude of x1 - the interaction is an attenuated effect where the simple-slope lines are all in the same direction).

We can also use the function power_interaction to compute power via monte-carlo simulation. Additional parameters include n.iter, which is the number of simulations run. In this example we’ll use just 1,000 simulations, but we recommend using 10,000 for more stable results.

set.seed(290115)
power_interaction(n.iter = 1000,  # number of simulations
                  alpha = 0.05,   # p-value
                  N = 350,        # sample size
                  r.x1.y = .2,    # correlation between x1 and y
                  r.x2.y =  .1,   # correlation between x2 and y
                  r.x1.x2 = .2,   # correlation between x1 and x2
                  r.x1x2.y = .15) # correlation between x1x2 and y

## Warning: executing %dopar% sequentially: no parallel backend registered

##     N   pwr
## 1 350 0.805

We find that our analysis has 80% power (pwr).

Exploring the parameter space

Typically not all variables are known in a power analysis. For example, we know the magnitude of the interaction effect we’re interested in, and we want to learn what sample size would be needed to detect that effect with 90% power. Or we have a sample already, and we want to learn what is the smallest effect we can detect with 90% power. To answer these questions, the user simply needs to provide the range of parameters that they would like the analysis to use. Almost any of the input parameters can be ranges, and the power_interaction runs n.iter simulations for every combination of input parameters.

As the number of input parameters increase, so too does the total number of simulations. To reduce the amount of time an analysis takes, power_interaction() supports running simulations in parallel. The number of cores to be used for the parallel simulation is indicated by the cl flag (we recommend a number between 4 - 6 on most personal computers).

Finding the optimal sample size

For example, to explore multiple sample sizes we can set N = seq(200,600,by = 50), which runs a simulation for N = 200, 250, 300 etc, up to N=500. Equivalently, we could also set N = c(200,250,300,350,400,450,500,550,600), but the former is faster to write.

power_test = power_interaction_r2(
                  alpha = 0.05,                  # p-value
                  N = seq(200,600,by = 10),      # sample size
                  r.x1.y = .2,                   # correlation between x1 and y
                  r.x2.y =  .1,                  # correlation between x2 and y
                  r.x1.x2 = .2,                  # correlation between x1 and x2
                  r.x1x2.y = .15)                # correlation between x1x2 and y
                 
power_test

##      N       pwr
## 1  200 0.5724275
## 2  210 0.5935829
## 3  220 0.6139697
## 4  230 0.6335895
## 5  240 0.6524470
## 6  250 0.6705498
## 7  260 0.6879081
## 8  270 0.7045340
## 9  280 0.7204417
## 10 290 0.7356468
## 11 300 0.7501662
## 12 310 0.7640182
## 13 320 0.7772214
## 14 330 0.7897955
## 15 340 0.8017607
## 16 350 0.8131373
## 17 360 0.8239461
## 18 370 0.8342077
## 19 380 0.8439429
## 20 390 0.8531723
## 21 400 0.8619163
## 22 410 0.8701952
## 23 420 0.8780286
## 24 430 0.8854362
## 25 440 0.8924369
## 26 450 0.8990493
## 27 460 0.9052915
## 28 470 0.9111809
## 29 480 0.9167346
## 30 490 0.9219691
## 31 500 0.9269002
## 32 510 0.9315434
## 33 520 0.9359132
## 34 530 0.9400240
## 35 540 0.9438894
## 36 550 0.9475223
## 37 560 0.9509354
## 38 570 0.9541406
## 39 580 0.9571493
## 40 590 0.9599725
## 41 600 0.9626206

We can plot these results using the function plot_power_curve():

plot_power_curve(power_data = power_test, # output from power_interaction()
                 power_target = .9,        # the power we want to achieve 
                 x = "N" # x variable
                 )

By eye-balling this plot, we can see that N=450 yields approximately 90% power, and N=330 yields approximately 80% power. Since it is so fast to compute analytic power, we can re-run the analysis over a restricted range to get the exact N.

power_test = power_interaction_r2(
                  alpha = 0.05,                  # p-value
                  N = seq(450,470,by = 1),      # sample size
                  r.x1.y = .2,                   # correlation between x1 and y
                  r.x2.y =  .1,                  # correlation between x2 and y
                  r.x1.x2 = .2,                  # correlation between x1 and x2
                  r.x1x2.y = .15)                # correlation between x1x2 and y
                 
power_test

##      N       pwr
## 1  450 0.8990493
## 2  451 0.8996899
## 3  452 0.9003268
## 4  453 0.9009600
## 5  454 0.9015896
## 6  455 0.9022156
## 7  456 0.9028379
## 8  457 0.9034566
## 9  458 0.9040718
## 10 459 0.9046834
## 11 460 0.9052915
## 12 461 0.9058960
## 13 462 0.9064970
## 14 463 0.9070945
## 15 464 0.9076886
## 16 465 0.9082792
## 17 466 0.9088663
## 18 467 0.9094501
## 19 468 0.9100304
## 20 469 0.9106073
## 21 470 0.9111809

We achieve 90% power with N=460.

In the case where we are using simulations, running a lot of simulations is computationally quite expensive. The function power_estimate() can thus be used to obtain a more precise answer when we haven’t sampled the parameter space as densely. This function fits a regression model to the power results to estimate when a specific power will be achieved.

power_test = power_interaction_r2(
                  alpha = 0.05,                  # p-value
                  N = seq(200,600,by = 50),      # sample size
                  r.x1.y = .2,                   # correlation between x1 and y
                  r.x2.y =  .1,                  # correlation between x2 and y
                  r.x1.x2 = .2,                  # correlation between x1 and x2
                  r.x1x2.y = .15)                # correlation between x1x2 and y
                

power_estimate(power_data = power_test, # output from power_interaction()
                        x = "N",        # the variable we want a precise number for
             power_target = 0.9         # the power we want to achieve 
)

## [1] 445.3621

With only 9 samples of the parameter space, we can estimate that we nee N=453 to achieve 90% power, which isn’t that far off.

Finding the smallest detectable effect size

Another common use-case is when the sample size and variables-of-interest are known, and we want to know how small of an interaction effect can be detected at a certain power level. We can repeat the same steps as above, except this time r.x1x2.y will be a range of values.

power_test = power_interaction_r2(
                  alpha = 0.05,                  # p-value
                  N = 350                ,       # sample size
                  r.x1.y = .2,                   # correlation between x1 and y
                  r.x2.y =  .1,                  # correlation between x2 and y
                  r.x1.x2 = .2,                  # correlation between x1 and x2
                  r.x1x2.y = seq(.1,.2,by=.005))  # correlation between x1x2 and y
                  
power_test

##    r.x1x2.y       pwr
## 1     0.100 0.4760123
## 2     0.105 0.5138689
## 3     0.110 0.5516069
## 4     0.115 0.5888872
## 5     0.120 0.6253837
## 6     0.125 0.6607914
## 7     0.130 0.6948340
## 8     0.135 0.7272701
## 9     0.140 0.7578978
## 10    0.145 0.7865584
## 11    0.150 0.8131373
## 12    0.155 0.8375646
## 13    0.160 0.8598129
## 14    0.165 0.8798947
## 15    0.170 0.8978583
## 16    0.175 0.9137828
## 17    0.180 0.9277732
## 18    0.185 0.9399539
## 19    0.190 0.9504639
## 20    0.195 0.9594509
## 21    0.200 0.9670667

plot_power_curve(power_data = power_test, # output from power_interaction()
                 power_target = .9        # the power we want to achieve 
                 )

We see that we have approximately 90% power to detect effects as small as r.x1x2.y = 0.1725.

In some cases it may also be useful to take a look the distribution of simple slopes across the range of parameters tested. If we run our analysis as a simulation and use detailed_results=T, we can look at the distribution of effect sizes and simple slopes.

set.seed(316834)
power_test = power_interaction(n.iter = 1000,    # number of simulations
                  alpha = 0.05,                  # p-value
                  N = 350                ,       # sample size
                  r.x1.y = .2,                   # correlation between x1 and y
                  r.x2.y =  .1,                  # correlation between x2 and y
                  r.x1.x2 = .2,                  # correlation between x1 and x2
                  r.x1x2.y = seq(.1,.2,by=.01),   # correlation between x1x2 and y
                  cl = 2,                         # number of clusters for parallel analyses
                  detailed_results = T           # detailed results
                )

## Performing 11000 simulations

plot_simple_slope(power_data = power_test)

From this we can learn that the range of slopes that would be consistent with the smallest effect we are powered to detect, r.x1x2.y=0.17, is quite large. In particular, note that the lower slope could be negative, 0, or even fairly large in the positive direction - all consistent with r.x1x2.y=0.17. See the section below on Detailed Results for more information on all the outputs when detailed.results = TRUE.

Varying multiple parameters

It is not uncommon that multiple parameters in the simulation are unknown. To find the power at every combination of the parameters, simply input a range of values for every unknown parameter in the simulation. We generally recommend to only vary up to 3 parameters at a time, any more and it can be difficult to parse the results (and simulations can take a very long time to run). For example, lets say there’s a range of plausible effect sizes for the interaction, and we want to know how large of a sample we would need to detect each of them. To test this, we can vary both N and r.x1x2.y:

power_test = power_interaction_r2(
                  alpha = 0.05,                   # p-value
                  N = seq(100,600,by = 10),       # sample size
                  r.x1.y = .2,                    # correlation between x1 and y
                  r.x2.y = .1,                    # correlation between x2 and y
                  r.x1.x2 = .2       ,            # correlation between x1 and x2
                  r.x1x2.y = seq(.15,.2,by=.005), # correlation between x1x2 and y
                  detailed_results = T)

plot_power_curve(power_data = power_test, # output from power_interaction()
                 power_target = .9,       # the power we want to achieve 
                 x = "N",                 # x-axis
                 group = "r.x1x2.y"       # grouping variable
                 )

power_estimate(power_data = power_test %>% dplyr::select(c("N","r.x1x2.y","pwr")), # select the variables used for the estimate
                        x = "N",        # the variable we want a precise number for
             power_target = 0.9         # the power we want to achieve 
)

##    r.x1x2.y estimate
## 1     0.150 440.7966
## 2     0.155 412.6445
## 3     0.160 387.7582
## 4     0.165 365.0462
## 5     0.170 344.5132
## 6     0.175 325.8309
## 7     0.180 308.7349
## 8     0.185 293.3426
## 9     0.190 278.8560
## 10    0.195 265.4087
## 11    0.200 253.3799

From this we’ve learned that, depending on exactly what effect size we’re aiming for, we’ll need an N between 260 and 450.

Why correlations are important

Users may wonder why this package requires so many effect sizes and correlations. Here is a simple example, where the magnitude of the correlation between x1 and x2, as well as their main effects, jointly influence power, in somewhat surprising ways.

power_test = power_interaction_r2(
                  alpha = 0.05,                   # p-value
                  N = 600,                       # sample size
                  r.x1.y = c(-.3,0,.3),                    # correlation between x1 and y
                  r.x2.y = round(seq(-.3,.3,.05),2),                    # correlation between x2 and y
                  r.x1.x2 = seq(-.8,.8,.05),        # correlation between x1 and x2
                  r.x1x2.y = .1,                  # correlation between x1x2 and y
                  detailed_results = T)

plot_power_curve(power_data = power_test, # output from power_interaction()
                 power_target = .8,       # the power we want to achieve 
                 x = "r.x1.x2",
                 group = "r.x2.y",
                 facets =  "r.x1.y"          
                 )

Each line shows how power varies depending on the correlation between x1 and x2. This effect then also depends on the main effects of x1 and x2. Thus, ignoring correlations could result in a very inaccurate power calculation.

Reliability

Reliability is an important issue in statistics. Even if effects are large, if your measurements are unreliable, then you’ll be under-powered. This is especially true for interactions, as the reliability of both \(X_1\) and \(X_2\) influence power, as well as the reliability of \(Y\). In the context of InteractionPoweR, “reliability” means the proportion of the variance of each variable that is attributable to true signal, as opposed to measurement error. A reliability of ‘1’ (the default), means that your variables were measured with no error (reliability must be greater than 0, and less than or equal to 1). Common statistics that reflect reliability include test-retest reliability, inter-rater reliability, and Cronbach’s alpha. The flags rel.x1, rel.x2, and rel.y control the reliability of each simulated variable. We recommend exploring how much reliability will affect your power. For example:

power_test = power_interaction_r2(
                  alpha = 0.05,                   # p-value
                  N = 450,                        # sample size
                  r.x1.y = .2,                    # correlation between x1 and y
                  r.x2.y = .1,                    # correlation between x2 and y
                  r.x1.x2 = .2,                   # correlation between x1 and x2
                  r.x1x2.y = .15,                 # correlation between x1x2 and y
                  rel.x1 =  seq(.2,1,by=.1),      # x1 reliability    
                  rel.x2 =  seq(.2,1,by=.1),      # x2 reliability          
                  rel.y  =  seq(.2,1,by=.2))      # y reliability          

plot_power_curve(power_data = power_test, # output from power_interaction()
                 power_target = .9,       # the power we want to achieve 
                 x     = "rel.x1",        # x-axis
                 group = "rel.x2",        # grouping variable
                 facets= "rel.y"          # facets variable
                 )

These plots make it clear how large of an effect reliability will have on your results. In this simulation, even a reliability of “acceptable” or “good” (reliability = 0.8), is enough to bring our power down, from 0.9 to 0.636!

Binary/ordinal variables

power_interaction_r2() assumes all variables are continuous, with a normal distribution. power_interaction() simulates all variables as continuous and normal. However, variables can be transformed so that they are binary/ordinal (e.g., 5 discrete levels, though any value under 20 is possible). Typically, when a continuous and normal variable is transformed to be binary/ordinal, the correlations between that variable and all other variables are reduced or altered. Sometimes, it makes sense for this to happen in the power analysis. For example, say a variable is continuous in the literature, and the effect sizes in the power analysis are drawn from prior work with the continuous variable, but in your analysis you’ve chosen to dichotomize that variable or to measure that trait with a ordinal scale. In that case, the reduction in correlations makes sense, because that accurately reflects your data analysis and what your power will be. On the other hand, say a variable is binary (at least in the literature), maybe you’re looking at sex or diagnosis for a disorder. In that case, your input correlations are the correlations with the binary/ordinal variable, so you don’t want them to be reduced or altered.

power_interaction() distinguishes between these cases with the adjust.correlations flag. The default is adjust.correlations = TRUE. This indicates that the input correlations are with the binary/ordinal variables. To circumvent the problem of variable transformations altering correlations, power_interaction() runs a function to compute how much the input correlations need to be adjusted so that the final output variables have the correlation structure specified by the user. When adjust.correlations = FALSE, this function is not run, allowing the user to see the impact of variable transformations on the correlation structure, and on their power.

The flags k.x1, k.x2, and k.y control the number of discrete values each variable takes (i.e., k.x1 =2 means that x1 is a binary variable, while k.x2 = 5 means that x2 is a ordinal-variable).

Binary

Here’s a single data set where x1, x2, and y are all binary. Note that when \(Y\) is binary, the analysis is run as a logistic regression.

test_data = generate_interaction(
                  N = 450,                        # sample size
                  r.x1.y = .2,                    # correlation between x1 and y
                  r.x2.y = .1,                    # correlation between x2 and y
                  r.x1.x2 = .2,                   # correlation between x1 and x2
                  r.x1x2.y = .15,                 # correlation between x1x2 and y
                  k.x1 = 2,                       # x1 is binary
                  k.x2 = 2,                       # x2 is binary
                  k.y = 2,                        # y is binary
                  adjust.correlations = TRUE)     # Adjust correlations?  

plot_interaction(data =test_data )

ordinal

Here’s an example showing the effects of artificially discretizing x1 on power. We see that power is lower when there are fewer discrete values.

power_test = power_interaction(
                  n.iter=1000,
                  N = 450,                        # sample size
                  r.x1.y = .2,                    # correlation between x1 and y
                  r.x2.y = .1,                    # correlation between x2 and y
                  r.x1.x2 = .2,                   # correlation between x1 and x2
                  r.x1x2.y = .15,                 # correlation between x1x2 and y
                  k.x1 = c(2,3,4,6,8),             # x1 has 2-10 discrete values
                  adjust.correlations = FALSE)    # Adjust correlations?  

plot_power_curve(power_test,power_target = .9)

Detailed and full results

Beyond power and the range of simple slopes, power_interaction() and power_interaction_r2() generate a lot of additional information about the simulations. These can be optionally returned using detailed_results = TRUE and full_simulation = TRUE (for power_interaction() only). By default, both of these flags are FALSE. detailed_results returns additional information for each unique setting combination, including the mean correlation structure between the simulated variables across niter simulations, and the mean regression coefficients. full_simulation, as the name suggests, returns the output of test_interaction() for every single simulated data set that power_interaction() generates. The output can be quite large (i.e. if 10,000 simulations are run, it will have 10,000 rows).

Detailed results

Let’s return to our example of examining a range of interaction effect sizes, except now with detailed_results = TRUE. This yields a lot more information about what the simulated data looked like, on average.

power_test = power_interaction(n.iter = 1000,    # number of simulations
                  alpha = 0.05,                  # p-value
                  N = seq(50,200,50)  ,          # sample size
                  r.x1.y = .2,                   # correlation between x1 and y
                  r.x2.y =  .1,                  # correlation between x2 and y
                  r.x1.x2 = .2,                  # correlation between x1 and x2
                  r.x1x2.y = .4,                 # correlation between x1x2 and y
                  detailed_results = TRUE        # return detailed results       
                   )
power_test

##     N   pwr x1_pwr x2_pwr x1x2_est_mean x1x2_r2_mean crossover_mean shape_mean
## 1  50 0.780  0.277  0.079     0.4426144    0.1763593     -0.1539197   3.210472
## 2 100 0.979  0.509  0.100     0.3955305    0.1523973     -0.1629060   2.961055
## 3 150 0.998  0.695  0.135     0.3910607    0.1526577     -0.1656292   3.206382
## 4 200 0.998  0.823  0.155     0.3936075    0.1546993     -0.1607875   2.736218
##   shape_q_2.5 shape_q_97.5 crossover_q_2.5 crossover_q_97.5    min.lwr
## 1 -20.6471929    20.103591      -0.8424624        0.4985743 -0.5779247
## 2   0.6913046    11.439831      -0.7250498        0.3597690 -0.4340121
## 3   0.9342058     7.868203      -0.6388662        0.2561451 -0.3734257
## 4   1.0795667     6.020076      -0.5102325        0.1646625 -0.3482832
##     min.upr   max.lwr   max.upr x1x2_95_CI_2.5_mean x1x2_95_CI_97.5_mean
## 1 0.2391543 0.1499189 0.9550447           0.1730314            0.7121974
## 2 0.1677606 0.2346053 0.7952463           0.2112746            0.5797864
## 3 0.1279905 0.2774962 0.7592573           0.2442872            0.5378341
## 4 0.1071991 0.3042969 0.7259304           0.2672668            0.5199481
##   x1x2_95_CI_width_mean r_y_x1x2_q_2.5 r_y_x1x2_q_50.0 r_y_x1x2_q_97.5
## 1             0.5391660      0.2404203       0.4415142       0.6305032
## 2             0.3685118      0.2190781       0.4014453       0.5758182
## 3             0.2935468      0.2375604       0.3996108       0.5472230
## 4             0.2526813      0.2658015       0.3986438       0.5217490
##   x1_est_mean x2_est_mean r_x1_y_mean r_x2_y_mean r_x1_x2_mean r_y_x1x2_mean
## 1   0.1922320  0.06429109   0.1969611  0.10240447    0.2083746     0.4360222
## 2   0.1877368  0.06183341   0.1938392  0.09652172    0.1993411     0.3960011
## 3   0.1915144  0.06219301   0.2028635  0.10097609    0.1989817     0.3964685
## 4   0.1909169  0.06193294   0.2005727  0.09768533    0.1993219     0.3961706
##   r_x1_x1x2_mean r_x2_x1x2_mean
## 1  -1.058919e-02   -0.002177245
## 2  -8.850879e-03   -0.002288713
## 3   7.677160e-06    0.003463617
## 4  -4.323312e-03   -0.003413765

What is all this?

The first two columns are our standard output, the setting that was varied across simulations, and the power. Next we have x1_pwr1 and x2_pwr. This is the power to detect an effect of x1 and x2 in the interaction model. x1x2_est_mean and x1x2_r2_mean are then the mean effect size (\(B_3\)) and mean change in the adjusted \(R^2\) (or pseudo-\(R^2\) when \(Y\) is binary) when the interaction term \(X_1X_2\) is added to the model. crossover and shape are the value of \(X_1\) where the \(X_2\) simple slopes intersect, and the shape of the interaction (\(B_3 / B_1\)), which reflects whether it is a knock-out, attenuated, or crossover interaction. We also give the mean upper and lower bounds of the 95% confidence interval of the shape and crossover.

For example, one interesting insight we can gain here is that even though n=50 has 76% power in this analysis, the observed shape will vary widely. The confidence interval indicates that it is even likely that we will observe significant effects where the interaction has the opposite shape from anticipated. In fact, we don’t see 95% of simulations yielding a shape that is consistent with our hypothesis that r.x1x2.y > r.x1.y, until N=200, which has 99.99% power!

Next we have min.lwr, min.upr, max.lwr, and max.upr. These reflect the range of simple-slopes that were observed in the simulations where the interaction was significant. By default, the number of simple slopes (q) is 2 (i.e. a 50/50 split), and \(X_2\) is the variable being conditioned on. The ranges reflect that the majority of the lower simple slope ranges from -0.05 to .2, and the upper simple slope ranges from .2 to .45. The proportion of the simple slopes reflected by these ranges can be controlled with the IQR flag. The default value for IQR is 1.5, which means that the output ranges are the median lower and upper simple slope, +/- 1.5 IQRs (an IQR is the 75th percentile - 25th percentile). This output is intended to give further insight into the effect sizes detected by the simulation.

x1x2_95_CI_2.5_mean and x1x2_95_CI_97.5_mean are the mean lower and upper 95% confidence intervals of the interaction term, and x1x2_95_CI_width_mean is the mean width of the confidence interval. Similarly, r_y_x1x2_q_2.5, r_y_x1x2_q_50.0, and r_y_x1x2_q_97.5 are quantiles (2.5%, 50%, and 97.5%) of the correlation between \(Y\) and \(X_1X_2\) when the interaction is significant.

x1_est_mean and x2_est_mean are the mean main effects of \(X_1\) and \(X_2\), and pwr_x1 and pwr_x2 is the power to detect those main effects, in the context of the full interaction regression.

Full results

full_results can be useful when one wants a better grasp of what goes into the power analysis. If we return to our first example, we can see what range of sample-level correlations one can expect to see, given a population-level correlation:

set.seed(942141)
power_test = power_interaction(n.iter = 1000,  # number of simulations
                  alpha = 0.05,   # p-value
                  N = 350,        # sample size
                  r.x1.y = .2,    # correlation between x1 and y
                  r.x2.y =  .1,   # correlation between x2 and y
                  r.x1.x2 = .2,   # correlation between x1 and x2
                  r.x1x2.y = .15, # correlation between x1x2 and y
                  full_simulation = T, # return the full simulation results     
                  detailed_results = T # get detailed results, including correlations
                   )
# the standard output:
power_test$results

##     N   pwr x1_pwr x2_pwr x1x2_est_mean x1x2_r2_mean crossover_mean shape_mean
## 1 350 0.808  0.949  0.207     0.1652355   0.02668037     -0.3782685  0.9575423
##   shape_q_2.5 shape_q_97.5 crossover_q_2.5 crossover_q_97.5    min.lwr
## 1   0.4320577     1.978912        -1.19942        0.3104526 -0.0780142
##     min.upr   max.lwr   max.upr x1x2_95_CI_2.5_mean x1x2_95_CI_97.5_mean
## 1 0.2031697 0.1853961 0.4726258          0.06420129            0.2662696
##   x1x2_95_CI_width_mean r_y_x1x2_q_2.5 r_y_x1x2_q_50.0 r_y_x1x2_q_97.5
## 1             0.2020683     0.09287497       0.1652633        0.258831
##   x1_est_mean x2_est_mean r_x1_y_mean r_x2_y_mean r_x1_x2_mean r_y_x1x2_mean
## 1   0.1916138  0.05885155    0.202843  0.09743667    0.2009862     0.1675586
##   r_x1_x1x2_mean r_x2_x1x2_mean
## 1   -0.001093734    0.002505613

# range of correlations when the test is significant

quants = c(0,.025,.25,.5,.75,.975,1) #quantiles

power_test$simulation %>% 
  dplyr::filter(sig_int ==1 ) %>%  # only significant results
  dplyr::summarise(prob = quants,
                   qs = stats::quantile(r_y_x1x2,quants))

## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

##    prob         qs
## 1 0.000 0.05200316
## 2 0.025 0.09287497
## 3 0.250 0.13391873
## 4 0.500 0.16526331
## 5 0.750 0.19832108
## 6 0.975 0.25883097
## 7 1.000 0.30228835

When \(B_3\) is significant (sig_int == 1), 95% of the observed sample-level correlations range from 0.1 to 0.25, even though the population-level correlation is 0.15! Also note that more than half of the observed significant effect sizes are greater than 0.15, even though the median of all effects is 0.15. This is why post-hoc power-analyses using the observed effect-sizes in your sample are typically not a great idea. Because the choice to run the power analysis is conditioned on the result being significant, you’ve effectively subjected yourself to publication bias, and as a result the power analysis will tend to yield an inflated estimate of power.

David AA Baranger