-----
References:
	[1] My post in this document on 2019_12_04, covering 
	    my intuitive understanding of the SVD.
	[2] Linear Algebra and its Applications, second edition
	    by David C Lay. p 450-451 on how diagonal matrices remove the
	    "cross product" of the Quadratic Form and thus simplify its
	    expansion.
	[3] Lay, p459, Example 1, on how Quadratic Forms with no 
	    cross product are easily minmaxed.
	[4] Lay, p445, Theorem 2, on diagonalizability of symmetric
	    matrices
	[5] (A*A)* = (A*A**) = A*A
	[6] Lay, p467, Example 1, the expansion of ||Ax||^2
	[7] Lay, p467, Example 1, "The Quantity ||Ax||^2 is maximized at 
	    the same x that maximizes ||Ax||, and ||Ax||^2 is easier to 
	    study."
	[8] Lay, p451, Change of Variable in a Quadratic Form
	[9] Lay, p461, on the relation of x*Ax = y*Dy, when x = Py
	[10] Lay, p460-461, Theorem 6, and its proof
	[11] Lay, p314-315, Theorem 5, The Diagonalization Theorem
	[12] Lay, p468, The Singular Values of an MxN Matrix
-----

-----
Notation:
 ||A||  = "Length of A"
 A*     = "A Transpse"
-----


The whole point of the SVD is to find the vectors x that minmax
||Ax||. See [1].  The SVD works because of the following.


1. If B is a symmetric matrix then
		y = x*Bx
	is a Quadratic Form.

2. y = x*Bx is easily minmaxed whenever B is a diagonal matrix.  The
largest entry of B is the max, the smallest entry is the min.  See
[2], [3]

3. if B is not diagonal it can be made diagonal, because B is
symmetric and all symmetric matrices can be orthogonally
diagonalized. See [4]

4. x*(A*A)x is always a valid Quadratic Form, no matter what A is,
because A*A is always symmetric.  See [5].

5. x*(A*A)x is easily minmaxed, because A*A is always symmetric, and
symmetric matrices can always be diagonalized, and Quadratic Forms
with diagonal matrices are easy to minimax

6. The vectors x that minmax x*(A*A)x, also minmax ||Ax||^2 because:
		x*(A*A)x = ||Ax||^2 
	because: 
		||Ax||^2 = (Ax)*(Ax) = x*A*Ax =
		x*(A*A)x see [6].

7. The vectors x that minmax ||Ax||^2, also minmax ||Ax|| See [7]

8. So...  
	To minmax ||Ax||, you minmax ||Ax||^2.  
	To do that, you minmax x*(A*A)x.  
	To do that, you diagonalize (A*A) so that A*A = PDP*

9. Using a change of variable, you can say that:

	given:
		A*A = PDP*
		P*P = I
		x = Py   <-- change of variable

	we can show:
		x*(A*A)x = y*Dy

	because: 
		x*(A*A)x
		  = x*(PDP*)x
		  = (Py)*(PDP*)Py 
		  = (y*P*)(PDP*)Py 
		  = y*(P*P)D(P*P)y
		= y*Dy

	see [8],[9]


10. Since x*(A*A)x = y*Dy, x*(A*A)x is minmaxed when y*Dy is minmaxed.
y*Dy is minmaxed when y is a standard basis vector such as <1,0,0>.
And since x = Py we can say x*(A*A)x is minmaxed when x is a column of
P.  See [10]

11. Since x*(A*A)x = y*Dy, the minmax values of x*(A*A)x are the
minmax values y*Dy, which are the diagonal entries of D.

12. Since A*A = PDP*, we know the diagonal entries of D are the
eigenvalues of A*A, and the columns of P are the eigenvectors of A*A.
See [11]

13. Thus ||Ax|| is minmaxed by the eigenvalues and eigenvectors of A*A
The whole chain of minmax equivalence is:

		||Ax||
		is minmaxed by ||Ax||^2 
		is minmaxed by x*(A*A)x 
		is minmaxed by y*Dy
		is minmaxed by the eigenvalues and eigenvectors of A*A

Important Note!: ||Ax|| = sqrt(||Ax||^2).  Thus the squareroots of the
eigenvalues of A*A are the singular values of A.  See [12].


14. Then from here, we can get the SVD by construction:

For any set of vectors {v1,...,vr} (of the right size), you can derive
A = UEV*, as follows:

{v1,...,vr} --> {Av1,...,Avr}  // v --> Av

{Av1,...,Avr} = {Av1,...,Avr}  // A set of column vectors, 
                               // equals itself, trivially.

{Av1,...,Avr} = {u1,...,ur}    // where u1 = Av1, ur = Avr, etc
                               // important: when v is transformed
                               // by A, it can change dimension!
                               // so v might be a 3-vector, but
                               // Av = u, might be a 2-vector, or
                               // 20-vector, etc.

 A|v1,...,vr| = |u1,...,ur|

 A|v1,...,vr| = |U1,...,Ur|E  // where U1 = u1 normalized,
                              // and E is a diagonal matrix where
                              // each diagonal entry is the length
                              // of the corresponding u,
                              // aka the corresponding Av,
                              // aka the singular values of A

         AV   =   UE
          A   =   UEV*


15. However, A = UEV* is not very useful when {v1,...,vr} is just any
set of vectors.  Its real power is when {v1,...,vr} are the
eigenvectors of A*A, because it's those vectors that minmax ||Av||,
per 13 above.