Summary
The generalization of least squares regression to complex-valued variables is straightforward, consisting chiefly of replacing matrix transposes by conjugate transposes in the usual matrix formulas. A complex-valued regression, though, corresponds to a complicated multivariate multiple regression whose solution would be much more difficult to obtain using standard (real-variable) methods. Thus, when the complex-valued model is meaningful, using complex arithmetic to obtain a solution is strongly recommended. This answer also includes some ways to display the data and to present diagnostic plots of the fit.
For simplicity, let's discuss the case of ordinary (univariate) regression, which can be written
$$z_j = \beta_0 + \beta_1 w_j + \varepsilon_j.$$
I have taken the liberty of naming the independent variable $W$ and the dependent variable $Z$, which is conventional (see, for instance, Lars Ahlfors, Complex Analysis). All that follows is straightforward to extend to the multiple regression setting.
Interpretation
This model has an easily visualized geometric interpretation: multiplication by $\beta_1$ will rescale $w_j$ by the modulus of $\beta_1$ and rotate it around the origin by the argument of $\beta_1$. Subsequently, adding $\beta_0$ translates the result by this amount. The effect of $\varepsilon_j$ is to "jitter" that translation a little bit. Thus, regressing the $z_j$ on the $w_j$ in this manner is an effort to understand the collection of 2D points $(z_j)$ as arising from a constellation of 2D points $(w_j)$ via such a transformation, allowing for some error in the process. This is illustrated below with the figure titled "Fit as a Transformation."
Note that the rescaling and rotation are not just any linear transformation of the plane: they rule out skew transformations, for instance. Thus this model is not the same as a bivariate multiple regression with four parameters.
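To see the geometry numerically, here is a small illustrative snippet of my own (using the same $\beta$ values that appear in the example below): multiplying by $\beta_1$ rescales the modulus of $w$ and rotates its argument, and adding $\beta_0$ then translates the result.

beta0 <- -20 + 5i
beta1 <- complex(modulus=3/2, argument=2*pi/3) # expand by 3/2, rotate 120 degrees
w <- 2 + 1i
z0 <- beta0 + beta1 * w
Mod(beta1 * w) / Mod(w)              # 1.5: the modulus is rescaled by Mod(beta1)
(Arg(beta1 * w) - Arg(w)) * 180/pi   # 120: the argument is rotated by Arg(beta1)
z0 - beta1 * w                       # -20+5i: the final step is a translation by beta0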
Ordinary Least Squares
To connect the complex case with the real case, let's write
$z_j = x_j + i y_j$ for the values of the dependent variable and
$w_j = u_j + i v_j$ for the values of the independent variable.
Furthermore, for the parameters write
$\beta_0 = \gamma_0 + i \delta_0$ and $\beta_1 = \gamma_1 +i \delta_1$.
Every one of the new terms introduced is, of course, real; $i$ is the imaginary unit (so $i^2 = -1$), and $j=1, 2, \ldots, n$ indexes the data.
OLS finds $\hat\beta_0$ and $\hat\beta_1$ that minimize the sum of squares of deviations,
$$\sum_{j=1}^n ||z_j - \left(\hat\beta_0 + \hat\beta_1 w_j\right)||^2
= \sum_{j=1}^n \left(\bar z_j - \left(\bar{\hat\beta_0} + \bar{\hat\beta_1} \bar w_j\right)\right) \left(z_j - \left(\hat\beta_0 + \hat\beta_1 w_j\right)\right).$$
Formally this is identical to the usual matrix formulation: compare it to $\left(z - X\beta\right)'\left(z - X\beta\right).$ The only difference we find is that the transpose of the design matrix $X'$ is replaced by the conjugate transpose $X^* = \bar X '$. Consequently the formal matrix solution is
$$\hat\beta = \left(X^*X\right)^{-1}X^* z.$$
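This formula translates directly into R (the same one-liner appears in the Code section below). As a quick sanity check, a sketch of my own rather than part of the analysis that follows, note that with no error term it recovers $\beta$ exactly:

w <- complex(real=c(1, 2, 0, -1), imaginary=c(0, 1, -2, 1))
beta <- c(-20+5i, complex(modulus=3/2, argument=2*pi/3))
X <- cbind(1, w)
z <- X %*% beta                             # noiseless responses
solve(Conj(t(X)) %*% X, Conj(t(X)) %*% z)   # returns beta (up to rounding)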
At the same time, to see what might be accomplished by casting this into a purely real-variable problem, we may write the OLS objective out in terms of the real components:
$$\sum_{j=1}^n \left(x_j-\gamma_0-\gamma_1u_j+\delta_1v_j\right)^2
+ \sum_{j=1}^n\left(y_j-\delta_0-\delta_1u_j-\gamma_1v_j\right)^2.$$
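One intermediate step may help here: expanding the fitted value into its real and imaginary parts gives
$$\beta_0 + \beta_1 w_j = \left(\gamma_0 + \gamma_1 u_j - \delta_1 v_j\right) + i\left(\delta_0 + \delta_1 u_j + \gamma_1 v_j\right),$$
so the squared modulus of each residual splits into the two squared real residuals displayed above.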
Evidently this represents two linked real regressions: one of them regresses $x$ on $u$ and $v$, the other regresses $y$ on $u$ and $v$; and we require that the $v$ coefficient for $x$ be the negative of the $u$ coefficient for $y$, and that the $u$ coefficient for $x$ equal the $v$ coefficient for $y$. Moreover, because the combined sum of squared residuals from the two regressions is to be minimized, it will usually not be the case that either set of coefficients gives the best estimate for $x$ or $y$ alone. This is confirmed in the example below, which carries out the two real regressions separately and compares their solutions to the complex regression.
This analysis makes it apparent that rewriting the complex regression in terms of the real parts (1) complicates the formulas, (2) obscures the simple geometric interpretation, and (3) would require a generalized multivariate multiple regression (with nontrivial correlations among the variables) to solve. We can do better.
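To make that claim concrete, here is a small sketch of my own (not part of the code below): the complex least-squares point estimates can be reproduced as a single real regression in which the equations for the real and imaginary parts are stacked and forced to share the four coefficients $(\gamma_0, \delta_0, \gamma_1, \delta_1)$.

set.seed(1)
n <- 50
w <- complex(real=rnorm(n), imaginary=rnorm(n))
beta <- c(-20+5i, complex(modulus=3/2, argument=2*pi/3))
z <- beta[1] + beta[2]*w + complex(real=rnorm(n), imaginary=rnorm(n))
u <- Re(w); v <- Im(w)
# Rows (1, 0, u, -v) predict Re(z) and rows (0, 1, v, u) predict Im(z);
# the shared coefficients are (gamma0, delta0, gamma1, delta1).
D <- rbind(cbind(1, 0, u, -v),
           cbind(0, 1, v,  u))
print(coef(lm(c(Re(z), Im(z)) ~ D - 1)), digits=4)
X <- cbind(1, w)
print(solve(Conj(t(X)) %*% X, Conj(t(X)) %*% z), digits=4) # same components, as a complex vector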
Example
As an example, I take a grid of $w$ values at integral points near the origin in the complex plane. To the transformed values $w\beta$ are added iid errors having a bivariate Gaussian distribution: in particular, the real and imaginary parts of the errors are not independent.
It is difficult to draw the usual scatterplot of $(w_j, z_j)$ for complex variables, because it would consist of points in four dimensions. Instead we can view the scatterplot matrix of their real and imaginary parts.
Ignore the fit for now and look at the top four rows and four left columns: these display the data. The circular grid of $w$ is evident in the upper left; it has $81$ points. The scatterplots of the components of $w$ against the components of $z$ show clear correlations. Three of them are negative correlations; only $y$ (the imaginary part of $z$) and $u$ (the real part of $w$) are positively correlated.
For these data, the true value of $\beta$ is $(-20 + 5i, -3/4 + (3\sqrt{3}/4)\,i)$. It represents an expansion by $3/2$ and a counterclockwise rotation of 120 degrees followed by translation of $20$ units to the left and $5$ units up. I compute three fits: the complex least squares solution and two OLS solutions for $(x_j)$ and $(y_j)$ separately, for comparison.
Fit              Intercept           Slope(s)
True             -20    + 5    i     -0.75 + 1.30 i
Complex          -20.02 + 5.01 i     -0.83 + 1.38 i
Real only        -20.02              -0.75, -1.46
Imaginary only     5.01               1.30, -0.92
It will always be the case that the real-only intercept agrees with the real part of the complex intercept and the imaginary-only intercept agrees with the imaginary part of the complex intercept. It is apparent, though, that the real-only and imaginary-only slopes neither agree with the complex slope coefficients nor with each other, exactly as predicted.
Let's take a closer look at the results of the complex fit. First, a plot of the residuals gives us an indication of their bivariate Gaussian distribution. (The underlying distribution has marginal standard deviations of $2$ and a correlation of $0.8$.) Then, we can plot the magnitudes of the residuals (represented by sizes of the circular symbols) and their arguments (represented by colors exactly as in the first plot) against the fitted values: this plot should look like a random distribution of sizes and colors, which it does.
Finally, we can depict the fit in several ways. The fit appeared in the last rows and columns of the scatterplot matrix (q.v.) and may be worth a closer look at this point. Below on the left the fits are plotted as open blue circles and arrows (representing the residuals) connect them to the data, shown as solid red circles. On the right the $(w_j)$ are shown as open black circles filled in with colors corresponding to their arguments; these are connected by arrows to the corresponding values of $(z_j)$. Recall that each arrow represents an expansion by $3/2$ around the origin, rotation by $120$ degrees, and translation by $(-20, 5)$, plus that bivariate Gaussian error.
These results, the plots, and the diagnostic plots all suggest that the complex regression formula works correctly and achieves something different than separate linear regressions of the real and imaginary parts of the variables.
Code
The R code to create the data, fits, and plots appears below. Note that the actual solution for $\hat\beta$ is obtained in a single line of code. Additional work, though not very much, would be needed to obtain the usual least squares output: the variance-covariance matrix of the fit, standard errors, p-values, and so on.
#
# Synthesize data.
# (1) the independent variable `w`.
#
w.max <- 5 # Max extent of the independent values
w <- expand.grid(seq(-w.max,w.max), seq(-w.max,w.max))
w <- complex(real=w[[1]], imaginary=w[[2]])
w <- w[Mod(w) <= w.max]
n <- length(w)
#
# (2) the dependent variable `z`.
#
beta <- c(-20+5i, complex(argument=2*pi/3, modulus=3/2))
sigma <- 2; rho <- 0.8 # Parameters of the error distribution
library(MASS) #mvrnorm
set.seed(17)
e <- mvrnorm(n, c(0,0), matrix(c(1,rho,rho,1)*sigma^2, 2))
e <- complex(real=e[,1], imaginary=e[,2])
z <- as.vector((X <- cbind(rep(1,n), w)) %*% beta + e)
#
# Fit the models.
#
print(beta, digits=3)
print(beta.hat <- solve(Conj(t(X)) %*% X, Conj(t(X)) %*% z), digits=3)
print(beta.r <- coef(lm(Re(z) ~ Re(w) + Im(w))), digits=3)
print(beta.i <- coef(lm(Im(z) ~ Re(w) + Im(w))), digits=3)
#
# Show some diagnostics.
#
par(mfrow=c(1,2))
res <- as.vector(z - X %*% beta.hat)     # Complex residuals
fit <- z - res                           # Fitted values
s <- sqrt(Re(mean(Conj(res)*res)))       # RMS residual magnitude
col <- hsv((Arg(res)/pi + 1)/2, .8, .9)  # Color encodes the residual's argument
size <- Mod(res) / s                     # Symbol size encodes its relative magnitude
plot(res, pch=16, cex=size, col=col, main="Residuals")
plot(Re(fit), Im(fit), pch=16, cex = size, col=col,
main="Residuals vs. Fitted")
plot(Re(c(z, fit)), Im(c(z, fit)), type="n",
main="Residuals as Fit --> Data", xlab="Real", ylab="Imaginary")
points(Re(fit), Im(fit), col="Blue")
points(Re(z), Im(z), pch=16, col="Red")
arrows(Re(fit), Im(fit), Re(z), Im(z), col="Gray", length=0.1)
col.w <- hsv((Arg(w)/pi + 1)/2, .8, .9)
plot(Re(c(w, z)), Im(c(w, z)), type="n",
main="Fit as a Transformation", xlab="Real", ylab="Imaginary")
points(Re(w), Im(w), pch=16, col=col.w)
points(Re(w), Im(w))
points(Re(z), Im(z), pch=16, col=col.w)
arrows(Re(w), Im(w), Re(z), Im(z), col="#00000030", length=0.1)
#
# Display the data.
#
par(mfrow=c(1,1))
pairs(cbind(w.Re=Re(w), w.Im=Im(w), z.Re=Re(z), z.Im=Im(z),
fit.Re=Re(fit), fit.Im=Im(fit)), cex=1/2)
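As a closing note on the "additional work" mentioned above: if one were additionally willing to assume circularly-symmetric (proper) errors with variance $\sigma^2 = E\,||\varepsilon||^2$, the covariance matrix of $\hat\beta$ would be $\sigma^2\left(X^*X\right)^{-1}$, suggesting the naive standard errors sketched below (this continues with the objects already defined; the simulated errors in this example are deliberately not circular, so take it only as a sketch).

sigma2.hat <- sum(Mod(res)^2) / (n - 2)    # Naive estimate of E|e|^2 (two complex parameters)
V <- sigma2.hat * solve(Conj(t(X)) %*% X)  # Approximate covariance of beta.hat
print(sqrt(Re(diag(V))), digits=3)         # Standard errors of intercept and slope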