No; ordinarily the observed value isn't called an 'estimated value'.
However, the observed value is nevertheless technically an estimate of the mean at its particular $x$, and treating it as one lets us see the sense in which the OLS fit is a better estimate of the mean there.
Generally speaking, regression is used in situations where, if you were to take another sample with the same $x$'s, you would not get the same values for the $y$'s. In linear regression we treat the $x_i$ as fixed/known quantities and the responses, the $Y_i$, as random variables (with observed values denoted by $y_i$).
In the most common notation, we would write
$$Y_i = \alpha + \beta x_i + \varepsilon_i$$
The noise term, $\varepsilon_i$, is important because the observations don't lie right on the population line (if they did, there'd be no need for regression; any two points would give you the population line); the model for $Y$ must account for the values it takes, and in this case the distribution of the random error accounts for the deviations from the ('true') line.
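As a quick illustration of that point, here's a tiny simulation (with made-up values of $\alpha$, $\beta$ and $\sigma$, not taken from anything above): the $x$'s are held fixed, yet the two samples of $y$'s differ because of the noise term.

```r
# Tiny simulation: same fixed x's, two samples of y's
# (alpha, beta and sigma here are arbitrary, for illustration only)
set.seed(1)
x     <- 1:10
alpha <- 2; beta <- 0.5; sigma <- 1
y1 <- alpha + beta * x + rnorm(length(x), sd = sigma)
y2 <- alpha + beta * x + rnorm(length(x), sd = sigma)
rbind(y1, y2)  # the y's differ from sample to sample, the x's don't
```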
The estimate of the mean at point $x_i$ for ordinary linear regression has variance
$$\Big(\frac{1}{n} + \frac{(x_i-\bar{x})^2}{\sum(x_i-\bar{x})^2}\Big)\,\sigma^2$$
while the estimate based on the observed value has variance $\sigma^2$.
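For reference, here's a quick sketch of where that factor comes from, using the standard results for simple least squares (writing $\hat{y}_i$ for the fitted mean at $x_i$):

$$\hat{y}_i = \bar{y} + \hat{\beta}(x_i-\bar{x}),\qquad \operatorname{Var}(\bar{y})=\frac{\sigma^2}{n},\qquad \operatorname{Var}(\hat{\beta})=\frac{\sigma^2}{\sum_j (x_j-\bar{x})^2},\qquad \operatorname{Cov}(\bar{y},\hat{\beta})=0,$$

so that

$$\operatorname{Var}(\hat{y}_i) = \frac{\sigma^2}{n} + (x_i-\bar{x})^2\,\frac{\sigma^2}{\sum_j (x_j-\bar{x})^2} = \Big(\frac{1}{n} + \frac{(x_i-\bar{x})^2}{\sum_j(x_j-\bar{x})^2}\Big)\,\sigma^2.$$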
It's possible to show that for $n$ at least 3, $\,\frac{1}{n} + \frac{(x_i-\bar{x})^2}{\sum(x_i-\bar{x})^2}$ is no more than 1 (but it may be - and in practice usually is - much smaller). [Further, when you estimate the fit at $x_i$ by $y_i$ you're also left with the issue of how to estimate $\sigma$.]
But rather than pursue the formal demonstration, ponder an example, which I hope might be more motivating.
Let $v_f = \frac{1}{n} + \frac{(x_i-\bar{x})^2}{\sum(x_i-\bar{x})^2}$, the factor by which the observation variance is multiplied to get the variance of the fit at $x_i$.
However, let's work on the scale of relative standard error rather than relative variance (that is, let's look at the square root of this quantity); confidence intervals for the mean at a particular $x_i$ will be a multiple of $\sqrt{v_f}$.
So to the example. Let's take the `cars` data in R; this is 50 observations collected in the 1920s on the speed of cars and the distances taken to stop:
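As a rough sketch of the setup in R (the exact code behind the figures isn't shown here, so take this as one way to do it):

```r
# Simple linear regression of stopping distance on speed for the cars data
data(cars)
fit <- lm(dist ~ speed, data = cars)
summary(fit)
```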
So how do the values of $\sqrt{v_f}$ compare with 1? Like so:
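One way to compute and plot this comparison (a sketch; the original figure may have been drawn differently):

```r
# v_f = 1/n + (x_i - xbar)^2 / sum((x_j - xbar)^2), the variance factor for
# the fitted mean at each observed speed
x   <- cars$speed
n   <- length(x)
v_f <- 1/n + (x - mean(x))^2 / sum((x - mean(x))^2)

# blue circles: multiplier of sigma when the observed y_i is used (always 1)
# black circles: multiplier of sigma for the least-squares fit, sqrt(v_f)
plot(x, rep(1, n), col = "blue", ylim = c(0, 1.1),
     xlab = "speed", ylab = "multiple of sigma")
points(x, sqrt(v_f), col = "black")
```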
The blue circles show the multiples of $\sigma$ for your estimate, while the black ones show it for the usual least squares estimate. As you see, using the information from all the data makes our uncertainty about where the population mean lies substantially smaller - at least in this case, and of course given that the linear model is correct.
As a result, if we plot (say) a 95% confidence interval for the mean at each value of $x$ (including at places other than an observation), the interval at the various $x$'s is typically narrow compared to the variation in the data:
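A sketch of such a plot, using `predict()` to get pointwise 95% confidence intervals for the mean over a grid of speeds:

```r
# Pointwise 95% CI for the mean response over a grid of speeds,
# overlaid on the data and the fitted line
fit <- lm(dist ~ speed, data = cars)
xg  <- seq(min(cars$speed), max(cars$speed), length.out = 200)
ci  <- predict(fit, newdata = data.frame(speed = xg), interval = "confidence")

plot(dist ~ speed, data = cars)
abline(fit)
lines(xg, ci[, "lwr"], lty = 2)
lines(xg, ci[, "upr"], lty = 2)
```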
This is the benefit of 'borrowing' information from data values other than the present one.
Indeed, we can use the information from other values - via the linear relationship - to get good estimates of the value at places where we don't even have data. Consider that there's no data in our example at $x = 5$, $6$ or $21$. With the suggested estimator we have no information there - but with the regression line we can not only estimate the mean at those points (and at $5.5$ and $12.8$ and so on), we can give an interval for it -- though, again, one that relies on the suitability of the assumptions of linearity (and of constant variance of the $Y$s, and independence).
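For instance, interval estimates of the mean at those unobserved speeds (again just a sketch, and again conditional on the model being right):

```r
# Confidence intervals for the mean stopping distance at speeds with no data
fit <- lm(dist ~ speed, data = cars)
predict(fit, newdata = data.frame(speed = c(5, 6, 21)),
        interval = "confidence")
```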