The model is conventionally expressed in the form
$$y = X\beta + \varepsilon$$
for an $n$-vector $y$ of responses, an $n\times k$ model matrix $X$, and a $k$-vector of parameters $\beta$, under the assumptions that the random errors $\varepsilon = (\varepsilon_i)$ are uncorrelated with equal variances $\sigma^2$ and zero means: that is,
$$E(\varepsilon)=0; \ \operatorname{Var}(\varepsilon) = \sigma^2 I_{n}.$$
When this is the case, the ordinary least squares estimate is
$$\hat\beta = (X^\prime X)^{-1} X^\prime y.$$
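As a quick illustration (a minimal sketch with simulated data; the sample size, coefficients, and regressors are invented for the example), this formula reproduces what R's lm computes:

set.seed(17)                               # purely hypothetical data for illustration
n <- 100
X <- cbind(1, x1=rnorm(n), x2=rnorm(n))    # model matrix with an intercept column
y <- drop(X %*% c(1, 2, -1) + rnorm(n, sd=0.5)) # responses from a known linear model
solve(crossprod(X), crossprod(X, y))       # (X'X)^{-1} X'y, the OLS estimate
coef(lm(y ~ X[, -1]))                      # lm agrees (up to coefficient names)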
Let $Z$ be a $2\times k$ matrix whose rows $z_R$ and $z_T$ give the values of the regressors for Rachel and Thomas, respectively. The predicted responses are in the $2$-vector $Z\hat\beta$. The actual responses are $z_R\beta+\varepsilon_R$ and $z_T\beta+\varepsilon_T$ where these new epsilons are zero-mean uncorrelated random variables, independent of the original $\varepsilon$, and with common variance $\sigma^2$.
The difference between those values for Rachel minus Thomas, which I will call $\delta$, is simply $$\delta=(z_R\beta+\varepsilon_R ) - (z_T\beta + \varepsilon_T) = (1,-1)Z\beta + \varepsilon_R - \varepsilon_T.$$
Both sides are $1\times 1$ matrices (that is, numbers) and evidently they are random by virtue of the appearance of the $\varepsilon$'s on the right hand side. (The right hand side is the expected difference between Rachel's and Thomas's responses, plus the deviation $\varepsilon_R$ between Rachel's actual and expected responses, minus the deviation $\varepsilon_T$ between Thomas's actual and expected responses.) We may compute its expectation term by term:
$$\eqalign{
E(\delta) &= E\left((1,-1)Z\beta + \varepsilon_R - \varepsilon_T\right)\\
&= (1,-1)Z\beta + 0 - 0\\
&= z_R\beta - z_T\beta.
}$$
This is exactly what one would suppose: the expected difference is the difference in predicted values. It can be estimated by replacing the parameters by their estimates. To indicate this, let's place a hat over the "$E$":
$$\hat{E}(\delta) = (1,-1)Z\hat\beta = z_R\hat\beta - z_T\hat\beta.\tag{1}$$
That's the $2.88-2.51$ appearing in the question.
We may continue the analysis of the difference between Rachel and Thomas by identifying the two components of uncertainty in that difference: one arises because $\beta$ and $\sigma$ are estimated from random data; the other is the appearance of the random deviations $\varepsilon_R$ and $\varepsilon_T$.
$$\eqalign{
\operatorname{Var}(\text{Rachel}-\text{Thomas}) &= \operatorname{Var}\left((1,-1)Z\hat\beta + \varepsilon_R - \varepsilon_T\right) \\
&= (1,-1)Z \operatorname{Var}(\hat\beta) Z^\prime (1,-1)^\prime + \operatorname{Var}(\varepsilon_R) + \operatorname{Var}(\varepsilon_T) \\
&=(1,-1)Z \operatorname{Var}(\hat\beta) Z^\prime (1,-1)^\prime + 2\hat\sigma^2.\tag{2}
}$$
The variances of the epsilons are estimated by $\hat\sigma^2$. We don't know $\operatorname{Var}(\hat\beta)$ because it depends on $\sigma$. It is routine to estimate this variance by replacing $\sigma^2$ by its least-squares estimate $\hat\sigma^2$, producing a quantity sometimes written $\widehat{\operatorname{Var}}(\hat\beta)$.
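In R, this estimate $\widehat{\operatorname{Var}}(\hat\beta)=\hat\sigma^2(X^\prime X)^{-1}$ is what vcov returns for a least-squares fit. A small sketch, assuming fit is the lm object fitted in the code further below:

X <- model.matrix(fit)                  # the model matrix of the fit
V <- sigma(fit)^2 * solve(crossprod(X)) # sigma.hat^2 (X'X)^{-1}
all.equal(V, vcov(fit))                 # TRUE: vcov implements this estimate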
These estimates can be converted into probabilities only by making more specific assumptions about the conditional distributions of $y$ on $X$. By far the simplest is to assume $y$ is multivariate Normal, for then $\delta$ (being a linear transform of the vector $y$) itself is Normal and therefore its mean and variance completely determine its distribution. Its estimated distribution is obtained by placing the hats on $E$ and $\operatorname{Var}$.
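With the fit computed below in hand, that Normality assumption can be checked informally from the residuals (a sketch, not a formal test):

qqnorm(rstandard(fit))   # standardized residuals against Normal quantiles
qqline(rstandard(fit))   # reference line: strong curvature would cast doubt on Normality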
Finally we have assembled all the information needed for a solution. The OLS procedure estimates the distribution of Rachel's response minus Thomas's response to be Normal with a mean equal to the difference in predicted values $(1)$ and with a variance estimated by $(2)$, which involves the estimated error variance $\hat\sigma^2$ and the variance-covariance matrix of the coefficient estimates, $\operatorname{Var}(\hat\beta)$.
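The data frame df used below contains the question's data (columns cgpa, hgpa, sat, and ltrs) and is not reproduced here; to make the code runnable end to end, one could first create a purely synthetic stand-in (the numbers are invented and will not reproduce the $0.67$ reported below):

set.seed(42)                             # synthetic stand-in, NOT the question's data
n <- 1000
df <- data.frame(hgpa=round(runif(n, 2, 4), 2),
                 sat=round(rnorm(n, 1100, 150)),
                 ltrs=sample(1:10, n, replace=TRUE))
df$cgpa <- with(df, 0.1 + 0.6*hgpa + 0.001*sat + 0.05*ltrs + rnorm(n, sd=0.5))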
This R code directly carries out the calculations exhibited in formulas $(1)$ and $(2)$:
fit <- lm(cgpa ~ hgpa + sat + ltrs, data=df) # model to predict College GPA
Z <- as.matrix(data.frame(intercept=1, hgpa=c(4,3), sat=c(1168,1168),ltrs=c(6,6)))
cont <- matrix(c(1,-1), 1, 2) # Rachel - Thomas "contrast".
beta.hat <- coef(fit) # Estimated coefficients for prediction
delta.hat <- cont %*% Z %*% beta.hat # Predicted mean difference
sigma.hat <- sigma(fit) # Estimated error SD
var.delta.hat <- cont %*% Z %*% vcov(fit) %*% t(Z) %*% t(cont) + 2 * sigma.hat^2
pnorm(0, -delta.hat, sqrt(var.delta.hat)) # Chance Rachel > Thomas
The output for these data is $0.67$: OLS estimates that there is a $67\%$ chance that Rachel's CGPA exceeds that of Thomas. (It turns out in this case, because Rachel and Thomas are so similar, the model fits so well, and the amount of data is so large, that $\widehat{\operatorname{Var}}(\hat\delta)$ is tiny compared to $2\hat\sigma^2$ and therefore could be neglected. That will not always be the case.)
This is the machinery underlying the computation of prediction intervals: we could compute prediction intervals for the difference between Rachel's and Thomas's CGPAs using this distribution.
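For instance, using delta.hat and var.delta.hat from the code above, a 95% prediction interval for the difference is just a pair of Normal quantiles:

se <- sqrt(c(var.delta.hat))                 # SD of the estimated distribution of delta
c(delta.hat) + qnorm(c(0.025, 0.975)) * se   # 95% prediction interval for Rachel - Thomas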