The proximity of a function $f$ to "straightness" measures the degree to which $f$ is "close" to a linear function of time. Great flexibility and power in specifying straightness can be achieved by extending the linear functions, which are linear combinations of the constant function $1$ and the identity function $t\to t$, to a basis $E$ of the (Hilbert) space of square-integrable ($L^2$) functions of the times in the series.
Conventional extensions include the polynomials
$$E = (e_0, e_1, e_2, \ldots, e_k, \ldots) = (1, t, t^2, \ldots, t^k, \ldots)$$
but can be any set of independent functions. Intuitively, the further out we go into one of these bases, the more we "depart from linearity."
Given $p$ time series
$$\mathrm{x}_i = ((t_1, x_{i1}), (t_2, x_{i2}), \ldots, (t_j, x_{ij}), \ldots, (t_n, x_{in}))$$
let us compute their projections onto the first $p+1$ elements of this basis using ordinary least squares, giving
$$x_{ij} = b_{i0} + b_{i1}t_j + b_{i2}e_2(t_j) + \cdots + b_{ip}e_p(t_j) + \varepsilon_{ij}.$$
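Concretely, each of these projections is just an ordinary least-squares regression of a series on the basis functions. Here is a minimal R sketch with made-up times and values (the full listing below does the same thing for all $p$ series at once using `poly`, whose orthogonal polynomials span the same space as $1, t, t^2, \ldots$):

#
# Illustration only (made-up data): project a single series onto a cubic polynomial basis.
#
t.obs <- c(0.2, 0.9, 2.1, 2.8, 4.0, 5.5)         # Irregularly spaced times
x.1 <- sin(t.obs) + rnorm(length(t.obs), sd=0.1) # One fake series
b.1 <- coef(lm(x.1 ~ poly(t.obs, degree=3)))     # Estimates of b_{10}, ..., b_{13}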
We seek a linear combination of the $\mathrm{x}_i$, with coefficients $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_p)$, that is "straightest" in the sense that all the coefficients of $e_j$ for $j\gt 1$ vanish. That is,
$$\sum_{i=1}^p \lambda_i b_{ij} = 0, \ j = 2, 3, \ldots, p.$$
This is the most we can hope for with $p$ series: if we tried to make one more coefficient vanish, we would have $p$ simultaneous linear equations governing the $\lambda_i$ and usually only $\lambda_i=0$ would be the solution. By invoking only $p-1$ equations, we are guaranteed to have a system with a nontrivial kernel.
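Written out, these vanishing conditions form a homogeneous linear system in $\lambda$,
$$\begin{pmatrix} b_{12} & b_{22} & \cdots & b_{p2} \\ b_{13} & b_{23} & \cdots & b_{p3} \\ \vdots & \vdots & & \vdots \\ b_{1p} & b_{2p} & \cdots & b_{pp} \end{pmatrix} \begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_p \end{pmatrix} = \mathbf{0},$$
with $p-1$ equations in the $p$ unknowns $\lambda_i$.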
We are left to choose a basis element of that kernel. Assuming it is just one-dimensional (which will generically be the case), we may impose one more condition. A convenient one is to make the resulting linear combination look like an arithmetic mean of the time series. I do that by standardizing it so that its variance is $1/p$ times the collective variance of all the time series data. This can easily be done in two steps: first, require that the coefficients sum to unity:
$$\lambda_1 + \lambda_2 + \ldots + \lambda_p = 1.$$
Then, perform the standardization. All other solutions will be linear functions of this one plus a linear function of time.
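Concretely, writing $\hat{\mathrm{y}} = \sum_i \lambda_i \mathrm{x}_i$ for the combination found in the first step, the standardization in the second step amounts to
$$\tilde{\mathrm{y}} = \bar{x} + \frac{s/\sqrt{p}}{s_{\hat{y}}}\left(\hat{\mathrm{y}} - \bar{\hat{y}}\right),$$
where $\bar{x}$ and $s$ are the mean and standard deviation of all the data pooled together and $\bar{\hat{y}}$, $s_{\hat{y}}$ are those of $\hat{\mathrm{y}}$. Because shifting and rescaling cannot reintroduce any of the basis terms whose coefficients were made to vanish, straightness is preserved.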
Let's turn to a worked example. This proposal is implemented in the following R code, which generates an array of time series (which may have irregular spacing and multiple observations per time), finds the coefficients $b_{ik}$ using least squares fits, replaces the intercept row of the resulting coefficient matrix by the sum-to-unity vector $(1,1,\ldots, 1)$, and solves for $\lambda$. That directly produces the linear combination of the original series, $\sum_{i}\lambda_i \mathrm{x}_i$, which is then standardized. The original series are plotted (in color, using dashed lines) and the "straightest" linear combination is overplotted (solid black line) for comparison.
#
# Specify the problem.
#
n <- 96 # Number of time steps
k <- 1 # Observations per time step
p <- 5 # Number of series
shape <- 3 # Higher (positive) values make times more evenly spaced
set.seed(17) # Makes the results reproducible
#
# Create time series, one per column of `y`. The times themselves are in `times`.
#
q <- p
times <- rep(c(cumsum(rgamma(n, shape, shape)), NA), k)
n.k <- length(times)
beta <- round(matrix(rnorm(q*p), q), 1)
y <- matrix(rnorm(q*n.k), ncol=p) %*% beta +
  100 * outer(times, 1:p, function(i,j) sin(2*j*i/n)) + rnorm(p*n.k, sd=sqrt(n.k))
#
# Construct fits using a basis of orthogonal polynomials.
#
x <- times; x[is.na(times)] <- 0 # (`poly` chokes on NA values, so zero them out)
basis <- poly(x, degree=p-1) # Includes a linear term
fits <- apply(y, 2, function(z) coef(lm(z ~ basis)))
fits["(Intercept)", ] <- 1 # Make coefficients sum to unity
lambda <- solve(fits, c(1, rep(0, p-1))) # Solve the square system for the weights
y.hat <- y %*% lambda                    # Form the straightest linear combination
#
# Standardize the combination to look like an average of the time series
#
y.hat <- (y.hat - mean(y.hat, na.rm=TRUE)) / sd(y.hat, na.rm=TRUE) *
  sd(y, na.rm=TRUE) / sqrt(p) + mean(y, na.rm=TRUE)
#
# Display the results.
#
colors <- as.list(rainbow(p))
times.range <- range(times, na.rm=TRUE)
plot(times.range, range(c(y, y.hat), na.rm=TRUE), type="n",
     xlab="Time", ylab="Value", main="Data and Fit")
invisible(mapply(function(z, c) lines(times, z, col=c, lwd=2, lty=3),
                 as.data.frame(y), colors))
lines(times, y.hat, lwd=2, col="Black", lty=1)
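As a quick sanity check (a minimal sketch reusing the objects created above), the weights should sum to unity, and refitting the combination against the same basis should give coefficients on the polynomial terms that vanish up to rounding error:

#
# Optional check: the weights sum to unity and the polynomial coefficients
# of the combination are (numerically) zero.
#
print(sum(lambda))
print(round(coef(lm(as.vector(y.hat) ~ basis)), 8))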