9 votes

How can I calculate the difference of two slopes?

Is there a method to determine whether two lines are (more or less) parallel? I have two lines generated from linear regressions and I would like to find out whether they are parallel. In other words, I would like to obtain the difference of the slopes of the two lines.

Is there an R function to calculate this?

EDIT: ... and how can I obtain the slope (in degrees) of a linear regression line?

10 votes

Issac Kelly Points 3014

... Following up on @mpiktas' answer, this is how you extract the slope from an lm object and apply the formula above.

# prepare some data, see ?lm
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels=c("Ctl","Trt"))
weight <- c(ctl, trt)

lm.D9 <- lm(weight ~ group)
# extract the slope (this is also what is used to draw a regression line with abline(lm.D9))
coefficients(lm.D9)["groupTrt"]
groupTrt 
  -0.371 
# use the atan(a1) * (360 / (2*pi)) formula provided by mpiktas
atan(coefficients(lm.D9)["groupTrt"]) * (360/(2 * pi))
 groupTrt 
-20.35485 
# the slope is negative, so shift the angle into [0, 180) by adding 180 degrees
180 + atan(coefficients(lm.D9)["groupTrt"]) * (360/(2 * pi))
groupTrt 
159.6451 

7 votes

Marc-Andre R. Points 789

The first question is really one of geometry. If you have two lines of the form:

$$y=a_1x+b_1$$ $$y=a_2x+b_2$$

then they are parallel if $a_1=a_2$. So if the slopes are equal, then the lines are parallel.

For the second question, use the fact that $\tan \alpha=a_1$, where $\alpha$ is the angle the line makes with $x$-axis, and $a_1$ is the slope of the line. So

$$\alpha=\arctan a_1$$

and to convert to degrees, recall that $2\pi$ radians $=360$ degrees. So the answer in degrees will be

$$\alpha=\arctan a_1\cdot \frac{360}{2\pi}.$$

The R function for $\arctan$ is called atan.

Sample R code:

> x <- rnorm(100)
> y <- x + 1 + rnorm(100)/2
> mod <- lm(y ~ x)
> mod$coef
(Intercept)           x 
  0.9416175   0.9850303 
> mod$coef[2]
        x 
0.9850303 
> atan(mod$coef[2])*360/2/pi
       x 
44.56792 

The last line gives the angle in degrees.

Update. For negative slope values the conversion to degrees should follow a different rule. Note that the angle with the x-axis takes values from 0 to 180, since we assume that the angle is above the x-axis. Since $\arctan a_1$ is negative for negative $a_1$, the formula becomes:

$$\alpha=180+\arctan a_1\cdot \frac{360}{2\pi}.$$
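As a quick sanity check of the two conversion rules, one can wrap them in a small helper (the name slope_to_degrees is purely illustrative, not part of any answer here):

```r
# Sketch of the degree-conversion rules above; the helper name is hypothetical.
slope_to_degrees <- function(a) {
  deg <- atan(a) * 360 / (2 * pi)  # in (-90, 90) degrees
  if (deg < 0) deg <- deg + 180    # negative slopes map into (90, 180)
  deg
}

slope_to_degrees(1)    # a line rising at 45 degrees
slope_to_degrees(-1)   # a falling line, 135 degrees above the x-axis
```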

Note. While it was fun for me to recall high-school trigonometry, the really useful answer is the one given by Gavin Simpson. Since the slopes of regression lines are random variables, a statistical hypothesis-testing framework should be used to compare them.

3 votes

David J. Sokol Points 1730

I wonder if I am missing something obvious, but couldn't you do this statistically using ANCOVA? An important issue is that the slopes in the two regressions are estimated with error. They are estimates of the slopes in the population at large. If the question is whether the two regression lines are parallel or not in the population, then it makes no sense to compare $a_1$ with $a_2$ directly for exact equality; both are subject to error or uncertainty that must be taken into account.

If we think about this from a statistical point of view, and we can combine the data on $x$ and $y$ for both data sets in some meaningful way (i.e. $x$ and $y$ in both sets are drawn from two populations with similar ranges for the two variables and it is only the relationship between them that differs in the two populations), then we can fit the following two models:

$$\hat{y} = b_0 + b_1x + b_2g$$

and

$$\hat{y} = b_0 + b_1x + b_2g + b_3xg$$

Where $b_i$ are the model coefficients, and $g$ is a grouping variable/factor, indicating which data set each observation belongs to.

We can use an ANOVA table or F-ratio to test if the second, more complex model fits the data better than the simpler model. The simpler model states that the slopes of the two lines are the same ($b_1$) but the lines are offset from one another by an amount $b_2$.

The more complex model includes an interaction between the slope of the line and the grouping variable. If the coefficient for this interaction term is significantly different from zero, or the ANOVA/F-ratio indicates that the more complex model fits the data better, then we must reject the null hypothesis that the two lines are parallel.

Here is an example in R using dummy data. First, data with equal slopes:

set.seed(2)
samp <- factor(sample(rep(c("A","B"), each = 50)))
d1 <- data.frame(y = c(2,5)[as.numeric(samp)] + (0.5 * (1:100)) + rnorm(100),
                 x = 1:100,
                 g = samp)
m1 <- lm(y ~ x * g, data = d1)
m1.null <- lm(y ~ x + g, data = d1)
anova(m1.null, m1)

Which gives

> anova(m1.null, m1)
Analysis of Variance Table

Model 1: y ~ x + g
Model 2: y ~ x * g
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     97 122.29                           
2     96 122.13  1   0.15918 0.1251 0.7243

Indicating that we fail to reject the null hypothesis of equal slopes in this sample of data. Of course, we'd want to assure ourselves that we had sufficient power to detect a difference if there really was one, so that we were not led to fail to reject the null erroneously because our sample size was too small for the expected effect.
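That power caveat can be checked by simulation. A minimal sketch, where the function power_sim and all its settings (slopes, noise level, sample size) are illustrative assumptions rather than part of the original answer:

```r
# Simulation-based estimate of the power to detect a slope difference `delta`
# via the x:g interaction term at the 5% level. All names are hypothetical.
power_sim <- function(delta, n = 50, sigma = 1, nsim = 200) {
  x <- 1:n
  rejections <- replicate(nsim, {
    d <- data.frame(y = c(0.5 * x + rnorm(n, sd = sigma),
                          (0.5 + delta) * x + rnorm(n, sd = sigma)),
                    x = x,
                    g = rep(c("A", "B"), each = n))
    fit <- lm(y ~ x * g, data = d)
    summary(fit)$coefficients["x:gB", "Pr(>|t|)"] < 0.05
  })
  mean(rejections)  # proportion of simulations that rejected equal slopes
}

set.seed(1)
power_sim(0.05)  # power for a small slope difference with n = 50 per group
```

A large simulated power reassures us that a non-significant interaction is not merely an artefact of a small sample.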

Now with different slopes.

set.seed(42)
x <- seq(1, 100, by = 2)
d2 <- data.frame(y = c(2 + (0.5 * x) + rnorm(50),
                       5 + (1.5 * x) + rnorm(50)),
                 x = x,
                 g = rep(c("A","B"), each = 50))
m2 <- lm(y ~ x * g, data = d2)
m2.null <- lm(y ~ x + g, data = d2)
anova(m2.null, m2)

Which gives:

> anova(m2.null, m2)
Analysis of Variance Table

Model 1: y ~ x + g
Model 2: y ~ x * g
  Res.Df     RSS Df Sum of Sq     F    Pr(>F)    
1     97 21132.0                                 
2     96   103.8  1     21028 19439 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Here we have substantial evidence against the null hypothesis and thus we can reject it in favour of the alternative (in other words, we reject the hypothesis that the slopes of the two lines are equal).

The interaction terms in the two models I fitted ($b_3xg$) give the estimated difference in slopes for the two groups. For the first model, the estimate of the difference in slopes is small (~0.003):

> coef(m1)
(Intercept)           x          gB        x:gB 
2.100068977 0.500596394 2.659509181 0.002846393

and a $t$-test on this would fail to reject the null hypothesis that this difference in slopes is 0:

> summary(m1)

Call:
lm(formula = y ~ x * g, data = d1)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.32886 -0.81224 -0.01569  0.93010  2.29984 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 2.100069   0.334669   6.275 1.01e-08 ***
x           0.500596   0.005256  95.249  < 2e-16 ***
gB          2.659509   0.461191   5.767 9.82e-08 ***
x:gB        0.002846   0.008047   0.354    0.724    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 1.128 on 96 degrees of freedom
Multiple R-squared: 0.9941, Adjusted R-squared: 0.9939 
F-statistic:  5347 on 3 and 96 DF,  p-value: < 2.2e-16 

If we turn to the model fitted to the second data set, where we made the slopes for the two groups differ, we see that the estimated difference in slopes of the two lines is ~1 unit.

> coef(m2)
(Intercept)           x          gB        x:gB 
  2.3627432   0.4920317   2.8931074   1.0048653 

The slope for group "A" is ~0.49 (x in the above output), whilst to get the slope for group "B" we need to add the difference in slopes (given by the interaction term, remember) to the slope of group "A": ~0.49 + ~1 = ~1.49. This is pretty close to the stated slope for group "B" of 1.5. A $t$-test on this difference in slopes also indicates that the estimate of the difference is bounded away from 0:

> summary(m2)

Call:
lm(formula = y ~ x * g, data = d2)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.1962 -0.5389  0.0373  0.6952  2.1072 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 2.362743   0.294220   8.031 2.45e-12 ***
x           0.492032   0.005096  96.547  < 2e-16 ***
gB          2.893107   0.416090   6.953 4.33e-10 ***
x:gB        1.004865   0.007207 139.424  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 1.04 on 96 degrees of freedom
Multiple R-squared: 0.9994, Adjusted R-squared: 0.9994 
F-statistic: 5.362e+04 on 3 and 96 DF,  p-value: < 2.2e-16
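The per-group slope arithmetic above can be done programmatically from the coefficient vector. A sketch that re-runs the d2 simulation from this answer and then combines the coefficients (the names slope.A and slope.B are just illustrative):

```r
# Recover per-group slopes from the treatment-contrast parameterisation.
# Reuses the d2 / m2 simulation defined earlier in this answer.
set.seed(42)
x <- seq(1, 100, by = 2)
d2 <- data.frame(y = c(2 + (0.5 * x) + rnorm(50),
                       5 + (1.5 * x) + rnorm(50)),
                 x = x,
                 g = rep(c("A", "B"), each = 50))
m2 <- lm(y ~ x * g, data = d2)

b <- coef(m2)
slope.A <- unname(b["x"])              # slope of the reference group "A"
slope.B <- unname(b["x"] + b["x:gB"])  # "B" slope = reference slope + interaction
slope.A  # ~0.49
slope.B  # ~1.50
```

This makes explicit that under R's default treatment contrasts the interaction coefficient is the slope difference, not group "B"'s slope itself.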
