I wonder if I'm missing something obvious, but couldn't you do this statistically via ANCOVA? An important issue is that the slopes in the two regressions are estimated with error. They are estimates of the slopes in the population at large. If the concern is whether or not the two regression lines are parallel in the population, then it doesn't make sense to compare $a_1$ with $a_2$ directly for exact equivalence; both are subject to error or uncertainty that needs to be taken into account.
If we think about this from a statistical point of view, and we can combine the $x$ and $y$ data for both data sets in some meaningful way (i.e. the $x$ and $y$ in both sets are drawn from two populations with similar ranges for the two variables; it is just the relationship between them that differs in the two populations), then we can fit the following two models:
$$\hat{y} = b_0 + b_1x + b_2g$$
and
$$\hat{y} = b_0 + b_1x + b_2g + b_3xg$$
where $b_i$ are the model coefficients and $g$ is a grouping variable/factor indicating which data set each observation belongs to.
We can use an ANOVA table or F-ratio to test if the second, more complex model fits the data better than the simpler model. The simpler model states that the slopes of the two lines are the same ($b_1$) but the lines are offset from one another by an amount $b_2$.
The more complex model includes an interaction between the slope of the line and the grouping variable. If the coefficient for this interaction term is significantly different from zero, or the ANOVA/F-ratio indicates that the more complex model fits the data better, then we must reject the null hypothesis that the two lines are parallel.
Here is an example in R using dummy data. First, data with equal slopes:
set.seed(2)
## random group labels, 50 observations per group
samp <- factor(sample(rep(c("A","B"), each = 50)))
## group-specific intercepts (2 and 5) but a common slope of 0.5
d1 <- data.frame(y = c(2,5)[as.numeric(samp)] + (0.5 * (1:100)) + rnorm(100),
                 x = 1:100,
                 g = samp)
m1 <- lm(y ~ x * g, data = d1)        ## interaction model
m1.null <- lm(y ~ x + g, data = d1)   ## parallel-lines model
anova(m1.null, m1)
Which gives:
> anova(m1.null, m1)
Analysis of Variance Table

Model 1: y ~ x + g
Model 2: y ~ x * g
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     97 122.29
2     96 122.13  1   0.15918 0.1251 0.7243
Indicating that we fail to reject the null hypothesis of equal slopes in this sample of data. Of course, we'd want to assure ourselves that we had sufficient power to detect a difference if there really was one, so that we were not led to erroneously fail to reject the null because our sample size was too small for the expected effect.
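As a quick aside, we can gauge power by simulation under the same data-generating setup used above. This is just a sketch; the function pow.sim and the chosen slope difference delta are my own illustrative choices, not part of the analysis:

## simulate data with slope difference delta and record how often
## the interaction t-test rejects at level alpha
pow.sim <- function(delta, nsim = 1000, alpha = 0.05) {
    reject <- logical(nsim)
    for (i in seq_len(nsim)) {
        samp <- factor(sample(rep(c("A","B"), each = 50)))
        y <- c(2,5)[as.numeric(samp)] +
            (0.5 + delta * (samp == "B")) * (1:100) + rnorm(100)
        fit <- lm(y ~ x * g, data = data.frame(y = y, x = 1:100, g = samp))
        reject[i] <- summary(fit)$coefficients["x:gB", "Pr(>|t|)"] < alpha
    }
    mean(reject)   ## estimated power
}
pow.sim(delta = 0.02)   ## power for a hypothetical slope difference of 0.02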
Now with different slopes.
set.seed(42)
x <- seq(1, 100, by = 2)
## now the slopes differ: 0.5 for group "A", 1.5 for group "B"
## (x is recycled by data.frame() so both groups share the same x values)
d2 <- data.frame(y = c(2 + (0.5 * x) + rnorm(50),
                       5 + (1.5 * x) + rnorm(50)),
                 x = x,
                 g = rep(c("A","B"), each = 50))
m2 <- lm(y ~ x * g, data = d2)
m2.null <- lm(y ~ x + g, data = d2)
anova(m2.null, m2)
Which gives:
> anova(m2.null, m2)
Analysis of Variance Table

Model 1: y ~ x + g
Model 2: y ~ x * g
  Res.Df     RSS Df Sum of Sq     F    Pr(>F)
1     97 21132.0
2     96   103.8  1     21028 19439 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Here we have substantial evidence against the null hypothesis and thus we can reject it in favour of the alternative (in other words, we reject the hypothesis that the slopes of the two lines are equal).
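If a visual check helps, a simple base-graphics plot of the second data set with per-group fitted lines shows clearly that the lines are not parallel (the colour choices here are mine, purely illustrative):

## plot the data coloured by group, with each group's fitted line
with(d2, plot(x, y, col = ifelse(g == "A", "blue", "red"), pch = 19))
abline(lm(y ~ x, data = d2, subset = g == "A"), col = "blue")
abline(lm(y ~ x, data = d2, subset = g == "B"), col = "red")
legend("topleft", legend = c("A", "B"), col = c("blue", "red"), pch = 19)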
The interaction term in each of the two models I fitted ($b_3xg$) gives the estimated difference in slopes between the two groups. For the first model, the estimate of the difference in slopes is small (~0.003):
> coef(m1)
(Intercept)           x          gB        x:gB
2.100068977 0.500596394 2.659509181 0.002846393
and a $t$-test on this would fail to reject the null hypothesis that this difference in slopes is 0:
> summary(m1)

Call:
lm(formula = y ~ x * g, data = d1)

Residuals:
     Min       1Q   Median       3Q      Max
-2.32886 -0.81224 -0.01569  0.93010  2.29984

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.100069   0.334669   6.275 1.01e-08 ***
x           0.500596   0.005256  95.249  < 2e-16 ***
gB          2.659509   0.461191   5.767 9.82e-08 ***
x:gB        0.002846   0.008047   0.354    0.724
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.128 on 96 degrees of freedom
Multiple R-squared: 0.9941,     Adjusted R-squared: 0.9939
F-statistic:  5347 on 3 and 96 DF,  p-value: < 2.2e-16
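Equivalently, and as a small supplement to the output above, a confidence interval for the interaction term spans 0, which is consistent with failing to reject the null hypothesis of equal slopes:

## 95% confidence interval for the difference in slopes
confint(m1, parm = "x:gB")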
If we turn to the model fitted to the second data set, where we made the slopes for the two groups differ, we see that the estimated difference in slopes of the two lines is ~1 unit.
> coef(m2)
(Intercept)           x          gB        x:gB
  2.3627432   0.4920317   2.8931074   1.0048653
The slope for group "A" is ~0.49 (x in the above output), whilst to get the slope for group "B" we need to add the difference in slopes (given by the interaction term, remember) to the slope for group "A": ~0.49 + ~1 = ~1.49. This is pretty close to the stated slope for group "B" of 1.5. A $t$-test on this difference in slopes also indicates that the estimate of the difference is significantly different from 0:
> summary(m2)

Call:
lm(formula = y ~ x * g, data = d2)

Residuals:
    Min      1Q  Median      3Q     Max
-3.1962 -0.5389  0.0373  0.6952  2.1072

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.362743   0.294220   8.031 2.45e-12 ***
x           0.492032   0.005096  96.547  < 2e-16 ***
gB          2.893107   0.416090   6.953 4.33e-10 ***
x:gB        1.004865   0.007207 139.424  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.04 on 96 degrees of freedom
Multiple R-squared: 0.9994,     Adjusted R-squared: 0.9994
F-statistic: 5.362e+04 on 3 and 96 DF,  p-value: < 2.2e-16
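As a final supplement (again just a sketch, using only base R), the group-specific slopes and a confidence interval for their difference can be extracted directly from the fitted model:

b <- coef(m2)
b["x"]               ## slope for group "A", ~0.49
b["x"] + b["x:gB"]   ## slope for group "B", ~0.49 + ~1 = ~1.49
confint(m2, parm = "x:gB")   ## 95% CI for the difference in slopes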