a) Gilbert Strang's OpenCourseWare (MIT) lectures on YouTube are really difficult to beat. Geometric insight, clear examples. One of the best lecturers in the world.
b) The CRC Handbook of Linear Algebra, chapter 3. Short, "Just the Facts," with links to more detail.
c) The "eigshow" demo in Matlab. Launch MATLAB. Go to "Get More Apps". Get "Cleve_Lab". Now the command "eigshow()" works, with some pre-loaded demos; or you can choose your own 2 by 2 matrix and call it, e.g. "eigshow( [0.5, 0.5; 0.5, 0.5] )".
I cannot do better on my own. So I recommend those resources to you. Of course all of Gil Strang's lectures are online and are very accessible and useful, and the Handbook of Linear Algebra is a daunting resource, covering about 1/3 of all of linear algebra. And Cleve's Laboratory (written by Cleve Moler, the co-founder of MATLAB) is amazing.
Instead I will do a little "application" of linear algebra, and show you how you can invert an infinite (but upper triangular) matrix using Cramer's Rule.
If we know a series for $y= f(x) = a_0 + a_1 (x-\alpha) + a_2 (x-\alpha)^2 + \cdots$ and if we want to know $x$ for a given $y$, then an old-fashioned technique for doing so is to find a series for $x$ in terms of $y$.
y = arctan(x);
x = tan(y);
Order := 8;
series( arctan(x), x );
series(tan(y),y);
This is called series reversion, which is different from what happens when you take the reciprocal $1/f(x)$ of the series. There is opportunity for confusion because the phrases "taking the reciprocal" and "solving for $x$" are both sometimes called "inversion" because a reciprocal is a multiplicative inverse, and solving for $x$ means computing the functional inverse $x = \mathrm{inv}f(y)$. We will use the word "reversion" consistently.
If the series is centred at $x=\alpha$ then $y=a_0$ is where the $y$-series should be centred. We are also going to assume that $a_1 \ne 0$ so that $y-a_0 = O(x-\alpha)$ as $x \to \alpha$. In the terminology of Henrici, this makes the series for $y$ into what he called an "almost unit". This requires that $f'(\alpha)$, which is $a_1$, not be zero: $x=\alpha$ can't be a local maximum, local minimum, or other kind of critical point.
First we clean things up a little bit. We write $v = (y-a_0)/a_1$ and $u = (x-\alpha)$. Then the series can be written as $$ v = u + \beta_2 u^2 + \beta_3 u^3 + \cdots $$ where $\beta_k = a_k/a_1$. So, without loss of generality, we may go back to our original notation using $x$ and $y$ and just assume that the original series was of the form $y = x + a_2 x^2 + a_3 x^3 + \cdots$.
Powers of the series are going to turn out to be important.
Order := 5:
Y := series( x + a__2 * x^2 + a__3 * x^3 + a__4*x^4 + O(x^5), x );
We're using the built-in Maple "series" command here, which can actually do series reversion, so our whole exercise here is to see if we can figure out for ourselves a way to make it work. The idea is that we want to know what's going on, to understand the process. If all we want to do is (instead) just reverse a series or two, then we could just use the Maple command:
solve( Y=y, x );
That's probably faster than any code we will ever write, because the "series" command is actually in the "Maple kernel," being one of the handful of routines that was deemed important enough to be written as efficiently as possible.
But if our purpose is understanding what's going on, or at least convincing ourselves that we could build a system that did what "series" does, then we should look into methods. And, besides, it's rather fun.
Since we have written code to do series algebra ourselves, and published it on occasion, we feel comfortable in using "series" just for multiplying series together: the Cauchy convolution is easy enough to program, and fast enough for our purposes here. Yes, it's a cheat to use "series", but we could use our own code if we had to.
There are two naive ways to proceed, one worse than the other. The worse one is to take the unknown series for $x$ as a function of $y$ and substitute it into the equation for $y = x + a_2 x^2 + \cdots$. Equating coefficients on both sides will give a sequence of nonlinear equations to solve for the unknown coefficients. Surprisingly, this works! It works because the nonlinear equations are actually triangular, and one may solve them in sequence.
The better one, though, is to put the known series for $y$ into the equation $$ x = y + \gamma_2 y^2 + \gamma_3 y^3 + \cdots $$ and equate coefficients. This gives a triangular linear system of equations for the unknowns $\gamma_k$, which is much easier to solve.
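To make the triangular structure concrete, here is a short Python sketch (ours, not part of the original worksheet) of this better method: compute truncated powers of the known series, then solve for the unknown coefficients one at a time. The function name `revert` is our own.

```python
# A sketch (names ours) of the "better" method: substitute the known series
# for y into x = y + g2*y^2 + g3*y^3 + ... and solve the resulting
# triangular system one coefficient at a time.
def revert(a, n):
    """Coefficients g[1..n] of the reversion of y = x + a[2]x^2 + ...
    `a` maps k -> a_k (missing keys are 0, and a_1 is taken to be 1)."""
    # y and its powers as truncated coefficient lists: P[j][k] = [x^k] y^j
    y = [0.0]*(n+1)
    y[1] = 1.0
    for k in range(2, n+1):
        y[k] = a.get(k, 0.0)
    P = [None, y]
    for j in range(2, n+1):
        cur = [0.0]*(n+1)
        for p in range(n+1):
            for q in range(n+1-p):
                cur[p+q] += P[j-1][p]*y[q]     # Cauchy convolution
        P.append(cur)
    # triangular solve: [x^k] of sum_j g_j y^j must vanish for k >= 2
    g = [0.0]*(n+1)
    g[1] = 1.0
    for k in range(2, n+1):
        g[k] = -sum(g[j]*P[j][k] for j in range(1, k))
    return g

# y = x + x^2 reverses to x = y - y^2 + 2y^3 - 5y^4 + 14y^5 - ...
# (signed Catalan numbers), which the triangular solve reproduces:
print(revert({2: 1.0}, 5))   # [0.0, 1.0, -1.0, 2.0, -5.0, 14.0]
```

Note that the system is triangular because $[x^k]\, y^j = 0$ for $j > k$ and $[x^k]\, y^k = 1$, so each new coefficient depends only on earlier ones.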
This suggests that we should compute the series for $y^2$, $y^3$, and so on.
This cell contains some macros for LaTeX notation used later.
$\newcommand{\goes}[1]{{\>{}_{\overleftarrow{#1}}\>}}$ $\newcommand{\mat}[1]{\mathbf{#1}}$
Y2 := series(Y*Y,x);
Y3 := series( Y*Y2, x );
Y4 := series( Y*Y3, x );
Combining all the above information, we see that we can explicitly construct, up to as high an order as we desire, the matrix $\mat{T}$ in the equation $$ \begin{bmatrix} y \\ y^2 \\ y^3 \\ \vdots \end{bmatrix} = \mat{T} \begin{bmatrix} x \\ x^2 \\ x^3 \\ \vdots \end{bmatrix}. $$
Here's code to construct that triangular matrix, given as input a function $y = f(x)$ with $f(0)=0$. It doesn't even assume that the derivative at $0$ is $1$; we want that for simplicity, but it isn't really needed, so long as $f'(0) \ne 0$.
Pretty clearly "all we need to do" to reverse the series is to invert that (triangular!) matrix. This is easy-peasy!
T := proc(f, x, {n::posint := Order-1})
local A, i, j, fpoweri;
A := Matrix(n, n, 0);
fpoweri := 1;
for i to n do
fpoweri := series(f*fpoweri, x, n + 1);  # now holds the series for f^i
for j from i to n do
A[i, j] := coeff(fpoweri, x, j);  # row i gets the coefficients of f^i
end do;
end do;
return A;
end proc:
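For readers following along without Maple, here is an illustrative Python translation (ours, not the author's code) of the procedure above, together with a pure-Python computation of the first row of the inverse by substitution; the function names are our own.

```python
# Illustrative Python translation (ours) of the Maple procedure T above:
# row i of the matrix holds the series coefficients of y^i, so the matrix
# is unit upper triangular when a_1 = 1.
def build_T(a, n):
    """a maps k -> a_k for y = x + a[2]x^2 + ... (a_1 taken as 1);
    returns T with T[i-1][j-1] = coefficient of x^j in y^i."""
    y = [0.0]*(n+1)
    y[1] = 1.0
    for k in range(2, n+1):
        y[k] = a.get(k, 0.0)
    T = [[0.0]*n for _ in range(n)]
    power = [1.0] + [0.0]*n              # y^0
    for i in range(1, n+1):
        nxt = [0.0]*(n+1)
        for p in range(n+1):
            for q in range(n+1-p):
                nxt[p+q] += power[p]*y[q]
        power = nxt                       # truncated series for y^i
        for j in range(i, n+1):
            T[i-1][j-1] = power[j]
    return T

def first_row_of_inverse(T):
    """Solve v*T = e_1 column by column; T is unit upper triangular."""
    n = len(T)
    v = [0.0]*n
    v[0] = 1.0
    for k in range(1, n):
        v[k] = -sum(v[i]*T[i][k] for i in range(k))
    return v

T = build_T({2: 1.0}, 4)         # y = x + x^2
print(first_row_of_inverse(T))   # [1.0, -1.0, 2.0, -5.0]
```

The first row of the inverse is exactly the list of reversion coefficients, which is the point of the whole construction.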
Order := 8;
f := x -> x*(1+x)^(1/3);
series( f(x),x);
A := T( f(x), x );
Ai := A^(-1);
hex := add(Ai[1,k]*y^k, k=1..Order-1);
series( f(hex)-y, y, Order+2);
This test shows that the method works, as-is.
But it seems a bit wasteful, somehow. All we want is the first row of the inverse matrix. Can we find that without inverting the whole matrix?
Yes. The answer is yes. One method is the venerable Cramer's Rule, in spite of its (fairly well-deserved) reputation for being extremely inefficient on general systems. As we will see, for solving a triangular system where we want just the first row of the answer, it works out quite neatly: all we have to do is compute a sequence of leading principal minors, and this winds up costing only $O(n^2)$ operations, which is much less than the $O(n^3)$ cost of inverting $\mat{T}$, or indeed of computing the matrix $\mat{T}$ in the first place, at least in the naive way that we have presented here.
Here's the code (which, unlike the previous, assumes that $a_1 = 1$). The name is an acronym for "Unit Upper Hessenberg Determinant," although the procedure actually computes a whole sequence of determinants: the leading principal minors $\det \mat{H}^{(j)}$ (the determinants of the $1..j$ by $1..j$ leading blocks) of a unit upper Hessenberg matrix $\mat{H}$, that is, one with entries $h_{i,j} = 0$ if $i>j+1$, $h_{i+1,i} = 1$, and potentially anything in the upper triangle. It uses the following recurrence relation:
$$ \det\mat{H}^{(j)} = (-1)^{j+1} \left( h_{1,j} + \sum_{i=2}^j (-1)^{i-1} h_{i,j} \det\mat{H}^{(i-1)} \right) $$
To prove that recurrence relation, use Laplace expansion about the final column of the unit upper Hessenberg matrix $\mat{H}$.
UUHDet := proc(H::Matrix)
local i, j, m, n, dH;
use LinearAlgebra in
(m, n) := Dimensions(H);
dH := Array(1..n);
dH[1] := H[1, 1];
for j from 2 to n do
dH[j] := (-1)^(j+1)*(H[1, j] +
add( (-1)^(i-1)*H[i, j]*dH[i-1], i = 2..j ) );
end do;
return dH;
end use;
end proc:
V := UUHDet( A[1..Order-2, 2..Order-1] );
We see above the sequence of answers we got before, apart from alternating sign and beginning with the leading $1$. Inspection of the code shows that yes, the cost was quadratic in the dimension $n$, as stated.
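As a sanity check (ours, in Python rather than Maple), we can compare the recurrence encoded in UUHDet against a brute-force Leibniz-formula determinant of each leading block, using exact rational arithmetic; the test matrix below is our own invention.

```python
from fractions import Fraction
from itertools import permutations

# A check (ours) of the minor recurrence encoded in UUHDet, compared with a
# brute-force determinant on the leading blocks of a small unit upper
# Hessenberg matrix with arbitrary rational entries.
def uuh_minors(H):
    """Leading principal minors of a unit upper Hessenberg H
    (unit subdiagonal, zeros below it)."""
    n = len(H)
    d = [Fraction(1)]*(n+1)              # d[j] = j x j leading minor
    d[1] = H[0][0]
    for j in range(2, n+1):
        s = H[0][j-1]
        for i in range(2, j+1):
            s += (-1)**(i-1)*H[i-1][j-1]*d[i-1]
        d[j] = (-1)**(j+1)*s
    return d[1:]

def det(M):
    """Leibniz-formula determinant (fine for tiny matrices)."""
    n = len(M)
    total = Fraction(0)
    for perm in permutations(range(n)):
        sign = 1
        for i in range(n):
            for j in range(i+1, n):
                if perm[i] > perm[j]:
                    sign = -sign
        prod = Fraction(1)
        for i in range(n):
            prod *= M[i][perm[i]]
        total += sign*prod
    return total

n = 5
# arbitrary unit upper Hessenberg test matrix (entries ours)
H = [[Fraction((i+2)*(j+1), i+j+1) if j >= i-1 else Fraction(0)
      for j in range(n)] for i in range(n)]
for i in range(1, n):
    H[i][i-1] = Fraction(1)              # unit subdiagonal
direct = [det([row[:j] for row in H[:j]]) for j in range(1, n+1)]
print(uuh_minors(H) == direct)           # True
```

Because everything is done in exact rational arithmetic, the comparison is exact, not merely approximate.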
So, how does it work? We remind you what Cramer's Rule is, and indeed give a proof.
We need a special notation. Let $\mat{A} \goes{i} \mat{v}$ mean the matrix where the $i$th column of the matrix $\mat{A}$ has been replaced by the (column) vector $\mat{v}$. Now, suppose that we wish to solve $\mat{A}\mat{x} = \mat{b}$, and that the matrix $\mat{A}$ has columns $\mat{a}_1$, $\mat{a}_2$, $\ldots$, $\mat{a}_n$. Consider the following matrix-matrix product: $$ \mat{A}\left( \mat{I} \goes{i} \mat{x} \right) = \mat{A} \goes{i} \mat{b} $$
because $\mat{I} \goes{i} \mat{x} = [\mat{e}_1, \mat{e}_2, \ldots, \mat{e}_{i-1}, \mat{x}, \mat{e}_{i+1}, \ldots, \mat{e}_n]$ and, by the matrix-times-matrix-expressed-as-columns view of matrix-matrix multiplication, the multiplication gives $[\mat{a}_1, \mat{a}_2, \ldots, \mat{a}_{i-1}, \mat{b}, \mat{a}_{i+1}, \ldots, \mat{a}_n]$. Taking determinants of both sides, $$ \mathrm{det}\left(\mat{A}\left( \mat{I} \goes{i} \mat{x} \right) \right) = \mathrm{det}\left(\mat{A}\right)\cdot \det\left( \mat{I} \goes{i} \mat{x} \right) = \mathrm{det}\left(\mat{A}\right) \cdot x_i = \det\left( \mat{A} \goes{i} \mat{b} \right)\>, $$ where the appearance of the $i$th component of the vector $x$ falls out from Laplace expansion of the determinant of $\mat{I} \goes{i} \mat{x}$ along the $i$th row. The last equation gives Cramer's Rule. We learned this proof from Carlson et al, 1992; the authors of that paper cite older sources for the basic ideas.
We will use Cramer's Rule $n$ times, to get the first row (and only the first row) of $\mat{T}^{-1}$. We remind you that you can compute the $k$th column of $\mat{T}^{-1}$ by solving the linear system $\mat{T}\mat{v}_k = \mat{e}_k$. Therefore the first entry of that column will be $$ v_{1,k} = \frac{\det\left( \mat{T} \goes{1} \mat{e}_k \right)}{\det \mat{T}} $$
The matrix $\mat{T}$ is unit upper triangular, so $\det\mat{T} = 1$.
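Here is a small Python check (the example matrix is ours) of that formula: since $\det\mat{T} = 1$ for a unit upper triangular $\mat{T}$, the $(1,k)$ entry of $\mat{T}^{-1}$ should equal the determinant of $\mat{T}$ with its first column replaced by $\mat{e}_k$.

```python
from fractions import Fraction
from itertools import permutations

# A check (example matrix ours) of Cramer's Rule for the first row of the
# inverse of a unit upper triangular T: with det(T) = 1, the (1,k) entry of
# T^(-1) equals det(T with its first column replaced by e_k).
def det(M):
    """Leibniz-formula determinant (fine for tiny matrices)."""
    n = len(M)
    total = Fraction(0)
    for perm in permutations(range(n)):
        sign = 1
        for i in range(n):
            for j in range(i+1, n):
                if perm[i] > perm[j]:
                    sign = -sign
        prod = Fraction(1)
        for i in range(n):
            prod *= M[i][perm[i]]
        total += sign*prod
    return total

n = 4
# arbitrary unit upper triangular example (entries ours)
T = [[Fraction(1) if i == j else (Fraction(i+j, 3) if j > i else Fraction(0))
      for j in range(n)] for i in range(n)]
# first row of T^(-1), found by solving v*T = e_1
v = [Fraction(0)]*n
v[0] = Fraction(1)
for k in range(1, n):
    v[k] = -sum(v[i]*T[i][k] for i in range(k))
# Cramer: replace column 1 of T by e_k and take the determinant
for k in range(n):
    Tk = [row[:] for row in T]
    for i in range(n):
        Tk[i][0] = Fraction(1) if i == k else Fraction(0)
    assert det(Tk) == v[k]
print("Cramer first-row check passed")
```

Again the arithmetic is exact, so the assertions verify the identity itself, not a floating-point approximation of it.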
What happens when we replace the first column of an upper triangular matrix with an elementary vector that has zeros in all but the $k$th row? Peter Lancaster's dictum—use the 5 by 5 case as an example—explains it all.
Matrix(5,5, (i,j)->`if`(i=j,1,`if`(j>i, tau[i,j], 0)) );
Replacing the first column by $\mat{e}_1$ just gives us $\mat{T}$ back again, and its determinant is $1$, so the leading coefficient of the reversion is $1$. This is correct.
Replacing the first column by $\mat{e}_2$ gives $$ \left[\begin{array}{c|cccc} 0 & \tau_{1,2} & \tau_{1,3} & \tau_{1,4} & \tau_{1,5} \\ \hline 1 & 1 & \tau_{2,3} & \tau_{2,4} & \tau_{2,5} \\ \hline 0 & 0 & 1 & \tau_{3,4} & \tau_{3,5} \\ 0 & 0 & 0 & 1 & \tau_{4,5} \\ 0 & 0 & 0 & 0 & 1 \end{array}\right] $$ and now Laplace expansion about the first column says that this determinant is the negative of the determinant of $$ \left[\begin{array}{cccc} \tau_{1,2} & \tau_{1,3} & \tau_{1,4} & \tau_{1,5} \\ 0 & 1 & \tau_{3,4} & \tau_{3,5} \\ 0 & 0 & 1 & \tau_{4,5} \\ 0 & 0 & 0 & 1 \end{array}\right] $$ which gives $-\tau_{1,2}$. Again this is correct.
Replacing the first column by $\mat{e}_3$ gives $$ \left[\begin{array}{c|cccc} 0 & \tau_{1,2} & \tau_{1,3} & \tau_{1,4} & \tau_{1,5} \\ 0 & 1 & \tau_{2,3} & \tau_{2,4} & \tau_{2,5} \\ \hline 1 & 0 & 1 & \tau_{3,4} & \tau_{3,5} \\ \hline 0 & 0 & 0 & 1 & \tau_{4,5} \\ 0 & 0 & 0 & 0 & 1 \end{array}\right] $$ and now Laplace expansion about the first column gives the determinant of $$ \left[\begin{array}{cccc} \tau_{1,2} & \tau_{1,3} & \tau_{1,4} & \tau_{1,5} \\ 1 & \tau_{2,3} & \tau_{2,4} & \tau_{2,5} \\ 0 & 0 & 1 & \tau_{4,5} \\ 0 & 0 & 0 & 1 \end{array}\right]\>. $$ The upper left 2 by 2 block is, although it's not so obvious, unit upper Hessenberg in shape, and the lower right 2 by 2 block is unit upper triangular. Looking back at the previous case, we can say that the 1 by 1 matrix $[\tau_{1,2}]$ there was upper Hessenberg too, even if only "trivially."
If we are working on the fourth column, then replacing the first column of $\mat{T}$ by $\mat{e}_4$ gives
$$ \left[\begin{array}{c|cccc} 0 & \tau_{1,2} & \tau_{1,3} & \tau_{1,4} & \tau_{1,5} \\ 0 & 1 & \tau_{2,3} & \tau_{2,4} & \tau_{2,5} \\ 0 & 0 & 1 & \tau_{3,4} & \tau_{3,5} \\ \hline 1 & 0 & 0 & 1 & \tau_{4,5} \\ \hline 0 & 0 & 0 & 0 & 1 \end{array}\right] $$
Evaluating the determinant by Laplace expansion on the first column gives a partitioned matrix to take the determinant of: it is block diagonal, where the upper left block is the unit upper Hessenberg matrix $$ \left[\begin{array}{ccc} \tau_{1,2} & \tau_{1,3} & \tau_{1,4} \\ 1 & \tau_{2,3} & \tau_{2,4} \\ 0 & 1 & \tau_{3,4} \end{array}\right] $$ and the lower block is just $1$ (more generally, it will be an upper triangular matrix with unit diagonal).
The final entry is the determinant of $$ \left[\begin{array}{c|cccc} 0 & \tau_{1,2} & \tau_{1,3} & \tau_{1,4} & \tau_{1,5} \\ 0 & 1 & \tau_{2,3} & \tau_{2,4} & \tau_{2,5} \\ 0 & 0 & 1 & \tau_{3,4} & \tau_{3,5} \\ 0 & 0 & 0 & 1 & \tau_{4,5} \\ \hline 1 & 0 & 0 & 0 & 1 \end{array}\right] $$ which, by Laplace expansion about the first column, is just the determinant of the unit upper Hessenberg matrix that occupies the $1..4$ by $2..5$ block.
We conclude that the first entry of the inverse will just be $1$, the next entry will be the negative of the determinant of $[\tau_{1,2}]$ which is just $-\tau_{1,2}$, the following will be the determinant of the two-by-two matrix $$ \left[\begin{array}{cc} \tau_{1,2} & \tau_{1,3} \\ 1 & \tau_{2,3} \end{array}\right] $$ and so on. We get a bonus: each determinant is a leading principal minor of the same unit upper Hessenberg matrix, and there is a recurrence relation for the determinant of a unit upper Hessenberg matrix that computes exactly those. That recurrence relation was encoded in the Maple procedure above, and its proof (by Laplace expansion about the final column) is very similar to the construction that we just went through.
A := T( x + a[2]*x^2 + a[3]*x^3 + a[4]*x^4 + a[5]*x^5, x, n=5 );
map( expand, UUHDet( A[1..4,2..5] ) );
solve( series(x + a[2]*x^2 + a[3]*x^3 + a[4]*x^4 + a[5]*x^5-y +O(x^6),x), x);
Order := 13:
A := T( x*exp(x), x );
V := UUHDet( A[1..Order-2, 2..Order-1] );
series( LambertW(x), x );
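As a final cross-check (ours, in Python), we can revert $y = x e^x$ with the same triangular solve and compare against the classical Lambert W series $W(x) = \sum_{k\ge 1} \frac{(-k)^{k-1}}{k!}\, x^k$; the variable names below are our own.

```python
from math import factorial

# A check (ours) of this example in Python: revert y = x*e^x term by term
# and compare with the classical Lambert W series coefficients
# (-k)^(k-1)/k!.
n = 8
a = [0.0]*(n+1)
for k in range(1, n+1):
    a[k] = 1.0/factorial(k-1)            # [x^k] of x*exp(x)
P = [None, a]                            # P[j][k] = [x^k] y^j, truncated
for j in range(2, n+1):
    cur = [0.0]*(n+1)
    for p in range(n+1):
        for q in range(n+1-p):
            cur[p+q] += P[j-1][p]*a[q]
    P.append(cur)
g = [0.0]*(n+1)                          # reversion coefficients
g[1] = 1.0
for k in range(2, n+1):
    g[k] = -sum(g[j]*P[j][k] for j in range(1, k))
w = [0.0] + [(-k)**(k-1)/factorial(k) for k in range(1, n+1)]
print(all(abs(g[k] - w[k]) < 1e-9 for k in range(1, n+1)))   # True
```

The agreement, term by term, is the same check that the Maple session above performs with "series( LambertW(x), x )".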
Series reversion is an old (and old-fashioned) technique. We believe that Newton knew it well. There is a very clearly specified algorithm for it in Knuth's Volume 2: Seminumerical Algorithms. The book Applied and Computational Complex Analysis, Volume 1, by Henrici describes it in the first chapter. In 1978 Brent and Kung published the first "Fast Algorithm" for it, and in 2012 Fredrik Johansson published an even faster one, by a constant factor.
There is a further refinement, called Lagrange Inversion, which for certain functions $f(x)$ allows an explicit formula for the $n$th term of the reversion. This amounts to finding an explicit formula for the first row of the matrix inverse. As a side effect, it provides explicit formulas for all the rows of the matrix inverse. We will take this up later. Interestingly, Lambert claimed to have invented it first, but blamed the journal Acta Helvetica for losing his papers with the proof. We now believe that this was entirely plausible.