----- References: [1] My post in this document on 2019_12_04, covering my intuitive understanding of the SVD. [2] Linear Algebra and its Applications, second edition by David C Lay. p 450-451 on how diagonal matrices remove the "cross product" of the Quadratic Form and thus simplify its expansion. [3] Lay, p459, Example 1, on how Quadratic Forms with no cross product are easily minmaxed. [4] Lay, p445, Theorem 2, on diagonalizability of symmetric matrices [5] (A*A)* = (A*A**) = A*A [6] Lay, p467, Example 1, the expansion of ||Ax||^2 [7] Lay, p467, Example 1, "The Quantity ||Ax||^2 is maximized at the same x that maximizes ||Ax||, and ||Ax||^2 is easier to study." [8] Lay, p451, Change of Variable in a Quadratic Form [9] Lay, p461, on the relation of x*Ax = y*Dy, when x = Py [10] Lay, p460-461, Theorem 6, and its proof [11] Lay, p314-315, Theorem 5, The Diagonalization Theorem [12] Lay, p468, The Singular Values of an MxN Matrix ----- ----- Notation: ||A|| = "Length of A" A* = "A Transpse" ----- The whole point of the SVD is to find the vectors x that minmax ||Ax||. See [1]. The SVD works because of the following. 1. If B is a symmetric matrix then y = x*Bx is a Quadratic Form. 2. y = x*Bx is easily minmaxed whenever B is a diagonal matrix. The largest entry of B is the max, the smallest entry is the min. See [2], [3] 3. if B is not diagonal it can be made diagonal, because B is symmetric and all symmetric matrices can be orthogonally diagonalized. See [4] 4. x*(A*A)x is always a valid Quadratic Form, no matter what A is, because A*A is always symmetric. See [5]. 5. x*(A*A)x is easily minmaxed, because A*A is always symmetric, and symmetric matrices can always be diagonalized, and Quadratic Forms with diagonal matrices are easy to minimax 6. The vectors x that minmax x*(A*A)x, also minmax ||Ax||^2 because: x*(A*A)x = ||Ax||^2 because: ||Ax||^2 = (Ax)*(Ax) = x*A*Ax = x*(A*A)x see [6]. 7. The vectors x that minmax ||Ax||^2, also minmax ||Ax|| See [7] 8. So... To minmax ||Ax||, you minmax ||Ax||^2. To do that, you minmax x*(A*A)x. To do that, you diagonalize (A*A) so that A*A = PDP* 9. Using a change of variable, you can say that: given: A*A = PDP* P*P = I x = Py <-- change of variable we can show: x*(A*A)x = y*Dy because: x*(A*A)x = x*(PDP*)x = (Py)*(PDP*)Py = (y*P*)(PDP*)Py = y*(P*P)D(P*P)y = y*Dy see [8],[9] 10. Since x*(A*A)x = y*Dy, x*(A*A)x is minmaxed when y*Dy is minmaxed. y*Dy is minmaxed when y is a standard basis vector such as <1,0,0>. And since x = Py we can say x*(A*A)x is minmaxed when x is a column of P. See [10] 11. Since x*(A*A)x = y*Dy, the minmax values of x*(A*A)x are the minmax values y*Dy, which are the diagonal entries of D. 12. Since A*A = PDP*, we know the diagonal entries of D are the eigenvalues of A*A, and the columns of P are the eigenvectors of A*A. See [11] 13. Thus ||Ax|| is minmaxed by the eigenvalues and eigenvectors of A*A The whole chain of minmax equivalence is: ||Ax|| is minmaxed by ||Ax||^2 is minmaxed by x*(A*A)x is minmaxed by y*Dy is minmaxed by the eigenvalues and eigenvectors of A*A Important Note!: ||Ax|| = sqrt(||Ax||^2). Thus the squareroots of the eigenvalues of A*A are the singular values of A. See [12]. 14. Then from here, we can get the SVD by construction: For any set of vectors {v1,...,vr} (of the right size), you can derive A = UEV*, as follows: {v1,...,vr} --> {Av1,...,Avr} // v --> Av {Av1,...,Avr} = {Av1,...,Avr} // A set of column vectors, // equals itself, trivially. {Av1,...,Avr} = {u1,...,ur} // where u1 = Av1, ur = Avr, etc // important: when v is transformed // by A, it can change dimension! // so v might be a 3-vector, but // Av = u, might be a 2-vector, or // 20-vector, etc. A|v1,...,vr| = |u1,...,ur| A|v1,...,vr| = |U1,...,Ur|E // where U1 = u1 normalized, // and E is a diagonal matrix where // each diagonal entry is the length // of the corresponding u, // aka the corresponding Av, // aka the singular values of A AV = UE A = UEV* 15. However, A = UEV* is not very useful when {v1,...,vr} is just any set of vectors. Its real power is when {v1,...,vr} are the eigenvectors of A*A, because it's those vectors that minmax ||Av||, per 13 above.