This didactic synthesis compares three solution methods for polynomial approximation and systematically presents their common characteristics and their close interrelations:
1. Classical Gram–Schmidt orthonormalization and Fourier approximation in $L^2[a,b]$
2. Linear least-squares solution via QR factorization on an equally spaced grid in $[a,b]$
3. Linear least-squares solution via the normal equations method, in $L^2[a,b]$ and on an equally spaced grid in $[a,b]$
The first two methods lead to linear least-squares systems with Vandermonde matrices; the normal equations contain matrices of Hilbert type. The solutions on equally spaced grids in $[a,b]$ converge to the solutions in $L^2[a,b]$. All solution characteristics and their relations are illustrated by symbolic or numeric examples and graphs.
Definitions and Notation
Let $[a,b] \subset \mathbb{R}$ be a finite interval. Consider the Hilbert space $L^2[a,b]$ of real-valued square-integrable functions (or $L^2$, for short), equipped with Lebesgue measure and scalar product
(1)   $\langle f, g \rangle = \int_a^b f(x)\, g(x)\, dx$
and the corresponding $L^2$-norm $\|f\|_2 = \sqrt{\langle f, f \rangle}$.
$L^2$ scalar products can be approximated by scalar products on discrete grids in $[a,b]$ based on Riemann sums, and similarly for norms.
Partition the finite interval $[a,b]$ into $m$ subintervals by the points
$a = t_0 < t_1 < \dots < t_m = b$
and set $\Delta t_i = t_i - t_{i-1}$, $i = 1, \dots, m$.
Suppose that $f$ is a bounded function on $[a,b]$. Let $x_i$ be any point in the subinterval $[t_{i-1}, t_i]$ and define the grid $X = (x_1, \dots, x_m)$.
The Riemann sums on the partition and grid are defined by
(2)   $\sum_{i=1}^{m} f(x_i)\, \Delta t_i$
(3)   $\sum_{i=1}^{m} f(t_{i-1})\, \Delta t_i$
(4)   $\sum_{i=1}^{m} f(t_i)\, \Delta t_i$
(5)   $\sum_{i=1}^{m} f\big((t_{i-1}+t_i)/2\big)\, \Delta t_i$
Equation (3) is called the left-hand Riemann sum, (4) the right-hand Riemann sum and (5) the (composite) midpoint rule.
For an equally spaced partition, the step size is $h = (b-a)/m$.
The equally spaced partition points are $t_i = a + i\,h$, $i = 0, \dots, m$,
and the equally spaced grid of length $m$ (excluding the endpoint $b$) is $x_i = a + (i-1)\,h$, $i = 1, \dots, m$.
It is also possible to use the grid shifted by an amount $s\,h$, where $0 \le s \le 1$ so that $x_i \in [t_{i-1}, t_i]$, as
(6)   $x_i = a + (i - 1 + s)\, h, \qquad i = 1, \dots, m.$
For equally spaced grids, the Riemann sums simplify to
(7)   $R_s(f) = h \sum_{i=1}^{m} f\big(a + (i - 1 + s)\, h\big).$
Setting $s = 0$, $s = \tfrac12$ and $s = 1$ gives the left-hand Riemann sum, the composite midpoint rule and the right-hand Riemann sum, respectively. The error of the Riemann sums is defined as $E_s(f) = \int_a^b f(x)\, dx - R_s(f)$.
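To make the equally spaced Riemann sums of equation (7) concrete, here is a minimal Wolfram Language sketch; the function name riemannSum and the argument layout are illustrative assumptions, not the article's original definitions.

riemannSum[f_, {a_, b_}, m_, s_] := Module[{h = (b - a)/m},
  h*Sum[f[a + (i - 1 + s) h], {i, 1, m}]]
(* s = 0, 1/2 and 1 give the left-hand sum, the midpoint rule and the right-hand sum *)
Table[riemannSum[#^2 &, {0., 1.}, 100, s], {s, {0, 1/2, 1}}]
(* -> approximately {0.32835, 0.333325, 0.33835}, all approximating the exact integral 1/3 *)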
The set $C[a,b]$ of continuous real-valued functions forms a dense subspace of $L^2[a,b]$ ([1], Theorem 13.21). For $f, g \in C[a,b]$, the restrictions of $f$ and $g$ to this grid are well defined. Define the $m$-dimensional scalar product on this grid:
(8)   $\langle f, g \rangle_m = h \sum_{i=1}^{m} f(x_i)\, g(x_i).$
The $m$-dimensional 2-norm is $\|f\|_m = \sqrt{\langle f, f \rangle_m}$.
The factor $h$ ensures that the norms of constant functions agree: $\|1\|_m = \sqrt{h\, m} = \sqrt{b-a} = \|1\|_2$.
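The following is a minimal sketch of the discrete scalar product (8) and the grid 2-norm; gridPoints, scalarProductGrid and normGrid are illustrative names and may differ from the article's own definitions.

gridPoints[{a_, b_}, m_, s_: 0] := With[{h = (b - a)/m}, a + (Range[m] - 1 + s) h]
scalarProductGrid[f_, g_, {a_, b_}, m_, s_: 0] :=
 (b - a)/m*Total[f[#] g[#] & /@ gridPoints[{a, b}, m, s]]
normGrid[f_, {a_, b_}, m_, s_: 0] := Sqrt[scalarProductGrid[f, f, {a, b}, m, s]]
(* the factor h makes the grid norm of a constant agree with its L^2 norm *)
normGrid[1 &, {0, 2}, 10]  (* -> Sqrt[2], the L^2[0, 2] norm of the constant function 1 *)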
Denote the linear space of polynomials with real coefficients of degree at most $n-1$ by $\mathcal{P}_{n-1}$ and define the polynomial $p \in \mathcal{P}_{n-1}$ by
$p(x) = c_1 + c_2\, x + \dots + c_n\, x^{n-1}.$
The polynomial can be written as a scalar product (or dot product) of two $n$-tuples, the monomials up to degree $n-1$ and the coefficients:
$p(x) = (1, x, \dots, x^{n-1}) \cdot (c_1, c_2, \dots, c_n).$
Introducing the Vandermonde matrix
(9)   $V(x) = (1, x, x^2, \dots, x^{n-1}),$
every polynomial of degree at most $n-1$ can be written as the product of a matrix and a vector as
$p(x) = V(x)\, c, \qquad c = (c_1, \dots, c_n)^T.$
The product of a $1 \times n$ matrix and an $n$-vector is regarded as a 1-vector, not a scalar, as in Mathematica.
Restricting the Vandermonde matrix to the interval $[a,b]$ gives an operator mapping $\mathbb{R}^n$ into $L^2[a,b]$:
(10)   $V : \mathbb{R}^n \to L^2[a,b], \qquad (V c)(x) = \sum_{k=1}^{n} c_k\, x^{k-1}, \quad x \in [a,b].$
Whereas the monomials on all of $\mathbb{R}$ do not define a bounded operator, $V$ restricted to $[a,b]$ is a bounded operator with respect to the 2-norms on $\mathbb{R}^n$ and $L^2[a,b]$.
The polynomial $p = V c$ approximates $f \in L^2[a,b]$ in the $L^2$-norm, as measured by
(11)   $\|f - V c\|_2.$
In matrix-vector notation, this constitutes a linear least-squares problem for the coefficients $c$:
(12)   $\min_{c \in \mathbb{R}^n} \|V c - f\|_2,$
where $V$ is the operator of equation (10) and $f \in L^2[a,b]$.
Now take a discrete grid
$X = (x_1, x_2, \dots, x_m)$
and sample $f$ on this discrete grid:
$y = \big(f(x_1), f(x_2), \dots, f(x_m)\big).$
The polynomial $p$ of degree at most $n-1$ approximates $f$ in the 2-norm on this grid, as measured by
(13)   $\|y - p(X)\|_m.$
In matrix-vector notation, this constitutes a linear least-squares problem for the coefficients $c$:
(14)   $\min_{c \in \mathbb{R}^n} \|V c - y\|_m,$
where
(15)   $V = \begin{pmatrix} 1 & x_1 & \cdots & x_1^{n-1} \\ 1 & x_2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & x_m & \cdots & x_m^{n-1} \end{pmatrix}.$
A rectangular or square matrix of this form is called a Vandermonde matrix.
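A short sketch of building such a Vandermonde matrix on a grid; the name vandermonde is an assumption for illustration, not necessarily the article's own definition.

vandermonde[x_List, n_Integer] := Outer[Power, x, Range[0, n - 1]]
vandermonde[{1, 2, 3, 4}, 3] // MatrixForm   (* rows (1, x_i, x_i^2) *)
MatrixRank[vandermonde[{1, 2, 3, 4}, 3]]     (* -> 3: full rank for pairwise distinct points *)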
Let $H_1$, $H_2$ be Hilbert spaces with scalar products $\langle \cdot, \cdot \rangle_{H_1}$, $\langle \cdot, \cdot \rangle_{H_2}$, and let $A : H_1 \to H_2$ be a bounded linear operator defined everywhere on $H_1$. Then the adjoint operator $A^*$ of $A$ is defined by the identity ([2], p. 196, Definition 1)
(16)   $\langle A u, v \rangle_{H_2} = \langle u, A^* v \rangle_{H_1} \qquad \text{for all } u \in H_1, \; v \in H_2.$
For Hilbert spaces $H_1$, $H_2$ over the reals, one writes $A^T$ instead of $A^*$.
Theorems and Properties for Later Use
Simple Quadrature Formula
All Riemann sums integrate constant functions exactly on any grid, since
$\sum_{i=1}^{m} c\, \Delta t_i = c\,(b-a) = \int_a^b c\, dx.$
If $f$ is bounded on $[a,b]$ and continuous except for finitely many points, it has a Riemann integral. Consult [3], Chapter 2, for proofs and further references on quadrature formulas.
Theorem 1
Let $f \in C^1[a,b]$ and consider the right-hand Riemann sum ($s = 1$) on the equally spaced grid with step size $h$. Then
(17)   $\Big|\int_a^b f(x)\, dx - h \sum_{i=1}^{m} f(a + i\,h)\Big| \le \frac{b-a}{2}\, h \max_{x \in [a,b]} |f'(x)|.$
A similar result holds for the left-hand Riemann sums (with $s = 0$).
If $f \in C^2[a,b]$, the error term of the elementary midpoint formula is given in [3], (2.6.5):
(18)   $\int_{t_{i-1}}^{t_i} f(x)\, dx - \Delta t_i\, f\!\Big(\frac{t_{i-1}+t_i}{2}\Big) = \frac{(\Delta t_i)^3}{24}\, f''(\xi_i), \qquad \xi_i \in (t_{i-1}, t_i).$
Therefore the error of the composite midpoint formula can be bounded by
(19)   $\Big|\int_a^b f(x)\, dx - h \sum_{i=1}^{m} f\big(a + (i - \tfrac12)\, h\big)\Big| \le \frac{b-a}{24}\, h^2 \max_{x \in [a,b]} |f''(x)|.$
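Here is a quick numeric check of the composite midpoint bound (19), for the sample choice $f(x) = e^x$ on $[0, 1]$; the variable names and the sample function are illustrative assumptions.

With[{a = 0., b = 1., m = 100},
 Module[{h = (b - a)/m, mid, err, bound},
  mid = h*Sum[Exp[a + (i - 1/2) h], {i, 1, m}];  (* composite midpoint rule *)
  err = Abs[(E - 1) - mid];                      (* exact integral is E - 1 *)
  bound = (b - a)/24*h^2*E;                      (* (b - a) h^2 Max|f''| / 24 *)
  {err, bound}]]
(* -> roughly {7.2*10^-6, 1.1*10^-5}: the observed error stays below the bound *)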
Convergence of Discrete to $L^2$ Scalar Products
By Theorem 1, for functions $f, g \in C^1[a,b]$, the discrete scalar product converges at least as fast as $O(h)$ to the $L^2$ scalar product:
$|\langle f, g \rangle - \langle f, g \rangle_m| = O(h).$
By equation (19), for functions $f, g \in C^2[a,b]$ and the midpoint grid ($s = \tfrac12$), the discrete scalar product converges at least as fast as $O(h^2)$ to the $L^2$ scalar product:
$|\langle f, g \rangle - \langle f, g \rangle_m| = O(h^2).$
Singular Value Decomposition
See [4], sections 2.4 and 2.6.
Theorem 2
Every real $m \times n$ matrix $A$ has a singular value decomposition $A = U\, \Sigma\, W^T$ with orthogonal matrices $U \in \mathbb{R}^{m \times m}$, $W \in \mathbb{R}^{n \times n}$ and a diagonal matrix $\Sigma \in \mathbb{R}^{m \times n}$ whose diagonal entries are the singular values $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_p \ge 0$, $p = \min(m, n)$.
Matrix Properties from SVD
(20)   $\sigma_i(A^T) = \sigma_i(A), \qquad \sigma_i(A^T A) = \sigma_i(A)^2, \quad i = 1, \dots, p.$
Define $r$ to be the index of the last positive singular value, $\sigma_r > 0 = \sigma_{r+1} = \dots = \sigma_p$.
Then
(21)   $\operatorname{rank}(A) = r.$
The condition number of a rectangular matrix $A$ with full column rank with respect to the 2-norm (in short, the 2-condition number) is defined as the quotient of its largest to its smallest singular value ([4], equation (2.6.5)):
(22)   $\kappa_2(A) = \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)}.$
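A minimal definition of the 2-condition number via the singular values, matching equation (22); cond2 is an illustrative name that may differ from the article's own definition.

cond2[a_?MatrixQ] := Max[#]/Min[#] &[SingularValueList[a, Tolerance -> 0]]
cond2[N[Outer[Power, Range[10]/10, Range[0, 4]]]]  (* 10 x 5 Vandermonde matrix on (0.1, ..., 1) *)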
By equation (20), the 2-condition number has the properties
(23)   $\kappa_2(A^T) = \kappa_2(A),$
(24)   $\kappa_2(A^T A) = \kappa_2(A)^2.$
If $A$ is an invertible $n \times n$ matrix, all its singular values are positive. The SVD of $A^{-1}$ is obtained from the SVD of $A$ as
(25)   $A^{-1} = W\, \Sigma^{-1}\, U^T,$
(26)   $\kappa_2(A^{-1}) = \kappa_2(A).$
If $Q_1$ is a real $m \times n$ matrix ($m \ge n$) with orthonormal columns, it can be completed to an orthogonal $m \times m$ matrix $Q = (Q_1 \; Q_2)$. Therefore the SVD of $Q_1$ is
(27)   $Q_1 = Q \begin{pmatrix} I_n \\ 0 \end{pmatrix} I_n,$
with all singular values equal to 1. The 2-condition number of $Q_1$ is
(28)   $\kappa_2(Q_1) = 1.$
Adjoint Linear Operators
See [2], sections VII.1 and VII.2.
Theorem 3
.
Finite-Dimensional Linear Operators
For a Hilbert space $H$ and a finite subset $\{v_1, \dots, v_n\} \subset H$, linear combinations from the subset define a finite-dimensional linear operator
(29)   $A : \mathbb{R}^n \to H, \qquad A c = \sum_{k=1}^{n} c_k\, v_k.$
Obviously, $\operatorname{rank}(A) \le n$, and $\operatorname{rank}(A) = n$ if and only if $\{v_1, \dots, v_n\}$ is linearly independent.
$A$ is bounded and has the matrix representation $A = (v_1 \; v_2 \; \cdots \; v_n)$, with the elements $v_k$ as columns.
Apply the definition (16) of the adjoint operator and notice that the first scalar product is that of $H_2 = H$; then
$\langle A c, u \rangle_H = \sum_{k=1}^{n} c_k\, \langle v_k, u \rangle_H = \Big\langle c, \big(\langle v_k, u \rangle_H\big)_{k=1}^{n} \Big\rangle_{\mathbb{R}^n}.$
Therefore the adjoint operator is
(30)   $A^* u = \big(\langle v_1, u \rangle_H, \dots, \langle v_n, u \rangle_H\big)^T, \qquad u \in H.$
Substituting the elements $v_j$ for $u$ in the preceding equation gives the representation of $A^* A$ as an $n \times n$ matrix (note the scalar products are taken in $H$):
(31)   $(A^* A)_{jk} = \langle v_j, v_k \rangle_H, \qquad j, k = 1, \dots, n.$
Here $A^* A$ is Hermitian positive semidefinite if $H$ is over the complex numbers, and symmetric positive semidefinite if $H$ is over the reals, and it is positive definite if and only if $A$ has full rank $n$.
Properties of Vandermonde Matrices
A nonzero polynomial of degree at most $n-1$ has at most $n-1$ distinct zeros; therefore the set of monomials $\{1, x, \dots, x^{n-1}\}$ is a linearly independent subset of $L^2[a,b]$, and
(32)   $\operatorname{rank}(V) = n.$
By [5], the determinant of the square Vandermonde matrix ($m = n$) is the product
$\det V = \prod_{1 \le j < k \le n} (x_k - x_j).$
Therefore, the rectangular Vandermonde matrix (15) has full rank $n$ if and only if the points $x_1, \dots, x_m$ are pairwise distinct.
Mathematica Definitions for Polynomial Approximation
Names are chosen according to previous notation and terminology.
This defines the 2-condition number.
This defines symbolic integration with time constrained to 10 seconds.
This defines numerical integration.
The combined integration function first attempts symbolic integration; if that is unsuccessful or takes too long, it performs numerical integration.
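Since the original input cells are not reproduced here, the following is a sketch of such a two-stage integration function; the name integrate, the fallback logic and the 10-second limit follow the description above, but the article's actual definition may differ in detail.

integrate[f_, {a_, b_}] := Module[{x, result},
  result = TimeConstrained[Integrate[f[x], {x, a, b}], 10, $Failed];
  If[result === $Failed || ! FreeQ[result, Integrate],
   NIntegrate[f[x], {x, a, b}],    (* numerical fallback *)
   result]]                        (* symbolic result *)

integrate[Exp, {0, 1}]              (* -> -1 + E, found symbolically *)
integrate[Sin[Sin[#]] &, {0, 1}]    (* no elementary antiderivative, falls back to NIntegrate *)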
This defines the scalar product in .
Since the scalar product is listable in its function arguments, it also implements the adjoint operator for a set of functions according to equation (30).
This defines the norm in .
This defines functions for discrete grids.
If any of the arguments is a machine number, these functions return machine numbers.
This defines the functions , , and .
To avoid potential clashes with predefined values for the variables, the corresponding script letters are used for symbolic results.
These sample functions are used in the commands.
Linear Least-Squares Solution via QR Factorization
Let $m \ge n$. Given a rectangular data matrix and an observation vector
(33)   $A \in \mathbb{R}^{m \times n}, \qquad y \in \mathbb{R}^m,$
the linear least-squares (LS) problem is to find
(34)   $\min_{c \in \mathbb{R}^n} \|A c - y\|_2.$
A comprehensive description of the QR factorization of an matrix via Householder, Givens, fast Givens, classical Gram–Schmidt and modified Gram–Schmidt methods is given in [4], section 5.2. Here only the essential steps are presented.
Principle of QR Factorization
Let $Q \in \mathbb{R}^{m \times m}$ be an orthogonal matrix. Such a matrix preserves lengths of vectors: $\|Q z\|_2 = \|z\|_2$ for all $z \in \mathbb{R}^m$.
Given the real $m \times n$ matrix $A$, $m \ge n$, the goal is to construct an orthogonal matrix $Q$ such that
$Q^T A = R,$
where $R$ is an upper-triangular matrix of the form
$R = \begin{pmatrix} R_1 \\ 0 \end{pmatrix}, \qquad R_1 \in \mathbb{R}^{n \times n} \text{ upper triangular}.$
Obviously,
$\|A c - y\|_2 = \|Q^T (A c - y)\|_2 = \|R c - Q^T y\|_2,$
so the least-squares problem can be solved via the triangular factor.
Linear Least-Squares Solution by “Thin” QR Factorization
The Mathematica function QRDecomposition deviates from the full QR factorization as follows:
The orthogonal factor is output as an $n \times m$ matrix whose rows are orthonormal (the transposed first $n$ columns of the full orthogonal matrix). Only the upper-triangular $n \times n$ submatrix $R_1$ is output.
Then the unique solution of the upper-triangular system $R_1 c = Q y$ is straightforward to obtain by back substitution.
It gives the unique solution of the full-rank linear least-squares problem (34).
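A minimal sketch of this solution path with Mathematica's thin QRDecomposition; lsViaQR is an illustrative name, and the quadratic fit to samples of Cos is an assumed example.

lsViaQR[a_?MatrixQ, y_?VectorQ] := Module[{q, r},
  {q, r} = QRDecomposition[a];   (* a == Transpose[q].r; rows of q orthonormal, r upper triangular *)
  LinearSolve[r, q.y]]           (* back substitution for the triangular system r.c == q.y *)

(* fit a quadratic to samples of Cos on an equally spaced grid of 20 points *)
With[{x = N[Range[20]/20]}, lsViaQR[Outer[Power, x, Range[0, 2]], Cos[x]]]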
Since multiplication with an orthogonal matrix does not change the singular values, the condition numbers do not change either:
(35)   $\kappa_2(A) = \kappa_2(R) = \kappa_2(R_1).$
This holds in particular for the Vandermonde matrices, both in the discrete and continuous case.
Mathematica Definitions for QR Orthonormalization
This defines and .
Classical Gram–Schmidt Orthonormalization
Polynomial Approximation via Classical Gram–Schmidt Orthonormalization
Applying the classical Gram–Schmidt orthonormalization process in a pre-Hilbert space, described in [2], p. 88 ff., to the monomials $1, x, \dots, x^{n-1}$ in $L^2[a,b]$ gives an orthonormal system of polynomials
(36)   $q_1, q_2, \dots, q_n, \qquad \deg q_k = k - 1,$
that satisfy
(37)   $\langle q_j, q_k \rangle = \delta_{jk}, \qquad j, k = 1, \dots, n.$
The Fourier coefficients of any function $f \in L^2[a,b]$ with respect to this orthonormal system are defined according to [2], p. 86, equation (1) (the dot here denotes the placeholder for the integration argument in the scalar product):
$c_k = \langle f, q_k(\cdot) \rangle, \qquad k = 1, \dots, n.$
The best approximation to $f$ is given as the Fourier sum of $n$ terms:
$p = \sum_{k=1}^{n} \langle f, q_k \rangle\, q_k.$
Mathematica Definitions for Classical Gram–Schmidt Orthonormalization
This gives the orthonormal system of polynomials together with functions for the Fourier coefficients and the Fourier approximation.
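Since the article's own definitions are not reproduced here, the following is a minimal sketch of the same idea using the built-in Orthogonalize; orthoPolys and fourierSum are illustrative names, and the article's implementation may well differ.

orthoPolys[n_, {a_, b_}, x_] :=
 Orthogonalize[Table[x^k, {k, 0, n - 1}], Integrate[#1 #2, {x, a, b}] &]
fourierSum[fx_, n_, {a_, b_}, x_] := With[{q = orthoPolys[n, {a, b}, x]},
  Total[Table[Integrate[fx q[[k]], {x, a, b}] q[[k]], {k, 1, n}]]]

(* best approximation of degree 2 to Exp[x] on [0, 1] in the L^2 norm *)
Simplify[fourierSum[Exp[x], 3, {0, 1}, x]]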
Example Interval $[-1, 1]$: Fourier Approximation by Legendre Polynomials
This defines the orthonormal polynomials on $[-1, 1]$.
These polynomials differ from the classical Legendre polynomials LegendreP built into Mathematica only by normalization factors.
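A quick check of this statement, assuming the normalization factor Sqrt[(2 k + 1)/2], which gives unit $L^2$ norm on $[-1, 1]$; a sketch, not the article's code.

With[{q = Orthogonalize[Table[x^k, {k, 0, 4}], Integrate[#1 #2, {x, -1, 1}] &]},
 Simplify[Table[q[[k + 1]] - Sqrt[(2 k + 1)/2] LegendreP[k, x], {k, 0, 4}]]]
(* -> {0, 0, 0, 0, 0}: the Gram-Schmidt polynomials are the normalized Legendre polynomials *)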
This shows the Fourier approximation of a sample set of functions.
The Classical Gram–Schmidt Orthonormalization Interpreted as QR Factorization
Proposition 1
(38) |
(39) |
Proof
Theorem 4
The Gram–Schmidt orthonormalization of the monomials in $L^2[a,b]$ is a QR factorization of the Vandermonde operator: $V = Q R$, where the columns of $Q = (q_1, \dots, q_n)$ are the orthonormal polynomials and $R$ is an upper-triangular $n \times n$ matrix.
Proof
For this QR factorization, the inverse $G = R^{-1}$ of the upper-triangular matrix $R$ is already calculated by the Gram–Schmidt process: the orthonormal polynomials are obtained from the monomials as $Q = V G$.
Equations (26), (28) and (35) give the relations on the 2-condition numbers of the operators or matrices:
Corollary 1
$\kappa_2(V) = \kappa_2(R) = \kappa_2(G).$
The numerical instability of the classical Gram–Schmidt process in machine arithmetic is discussed in [4], section 5.2.8. However, since the Gram–Schmidt orthonormalization of the monomials with respect to the $L^2[a,b]$ scalar product can be performed by symbolic calculations, numerical algorithm stability is not an issue here, contrary to the $m$-dimensional scalar product on a discrete grid.
Mathematica Definitions: Gram–Schmidt Orthonormalization as QR Factorization
This defines the Gram–Schmidt coefficient matrix .
This defines the lower- and upper-triangular matrices of QR decomposition.
This defines the orthogonal matrix .
Illustrating Proposition 1 and Theorem 4 for an Arbitrary Interval
Here is a set of orthonormal polynomials (, ).
This gives the Gram–Schmidt coefficient matrix , with .
Multiply the matrix of monomials from the right or left.
This reproduces the orthogonal matrix $Q$.
By construction, the polynomials contained in the matrix columns are orthonormal with respect to the scalar product in .
This verifies the QR decomposition $V = Q R$, as in Theorem 4.
Correspondence between Gram–Schmidt Orthonormalization and QR Factorization Interpretation
Select one of the sample functions and compare the results from the Gram–Schmidt orthonormalization and QR factorization interpretation.
Examples: Condition Numbers of Vandermonde Matrices
For the continuous case, because $V = Q R$ and the columns of $Q$ are orthonormal, the singular values of $V$ equal the singular values of $R$.
Here is the case of a discrete grid.
Example Interval $[0, 1]$: Relation to Hilbert Matrices
Obviously, $R^T$ is the lower-triangular Cholesky factor of the Hilbert matrix.
Consequently, the 2-condition number of the Vandermonde matrix of size $n$ on $[0, 1]$ is the square root of the 2-condition number of the Hilbert matrix of dimension $n$.
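A numerical check of this statement: on $[0, 1]$ the Hilbert matrix plays the role of $V^T V$, and its upper Cholesky factor has the same singular values as the continuous Vandermonde operator. A sketch with illustrative names, not the article's code.

With[{n = 5},
 Module[{h = HilbertMatrix[n], r, kappaH, kappaR},
  kappaH = Max[#]/Min[#] &[SingularValueList[N[h, 30]]];
  r = CholeskyDecomposition[h];                     (* upper triangular, Transpose[r].r == h *)
  kappaR = Max[#]/Min[#] &[SingularValueList[N[r, 30]]];
  {Sqrt[kappaH], kappaR}]]
(* the two numbers agree: kappa2(V) = kappa2(R) = Sqrt[kappa2(H)] *)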
Summary of Results for Orthogonalization Methods
This gives summary results.
Characterizing Orthogonalization Solutions
- The system matrix is a Vandermonde matrix: the operator $V$ of equation (10) in the continuous setting and the $m \times n$ matrix of equation (15) in the discrete setting.
- $V$ has full rank $n$ in the continuous and discrete settings.
- The discrete case is solved by the QR factorization of $V$ into an orthogonal matrix $Q$ and an upper-triangular matrix $R$.
- In the continuous case, the Gram–Schmidt orthonormalization applied to the monomials in $L^2[a,b]$ reads in matrix-vector notation as $Q = V G$.
- This is reformulated as QR factorization of the Vandermonde matrix: $V = Q R$ with $R = G^{-1}$.
- Consequently, the 2-condition number of $V$ equals the 2-condition number of the upper-triangular matrix $R$, since $Q$ is orthogonal. Therefore the 2-condition number is much smaller than for the normal-equations matrix $V^T V$.
The Normal Equations Method for Finite Dimensions
The approach for deriving the normal equations for the least-squares problem (34) is described in [4], section 5.3, for example. Define
$\phi(c) = \tfrac{1}{2}\, \|A c - y\|_2^2.$
A necessary condition for a minimum is $\nabla \phi(c) = 0$, or equivalently,
(40)   $A^T A\, c = A^T y.$
These are called the normal equations. The minimum is unique if the Hessian matrix $A^T A$ has full rank $n$. Then there is a unique solution of the linear least-squares problem (34) or (40):
(41)   $c = (A^T A)^{-1} A^T y.$
For an equally spaced grid, defined as in equation (6), the Vandermonde matrix has full rank, so the Hessian matrix for polynomial approximation has full rank as well.
If $A$ is rank deficient ($\operatorname{rank}(A) < n$), then there are infinitely many solutions to the least-squares problem. There is still a unique solution with minimum 2-norm, which can be written with the pseudo-inverse matrix $A^{+}$ ([4], section 5.5):
$c = A^{+} y.$
For full rank, $A^{+} = (A^T A)^{-1} A^T$.
The Normal Equations Algorithm for the Case $\operatorname{rank}(A) = n$
Suppose $A \in \mathbb{R}^{m \times n}$ with $\operatorname{rank}(A) = n$ (see [4], section 5.3.1). Then this algorithm (a sketch in Mathematica follows the list) computes the unique solution of the linear least-squares problem (34):
- Compute the lower-triangular portion of $C = A^T A$.
- Compute $d = A^T y$.
- Compute the Cholesky factorization $C = L L^T$ (or $C = U^T U$ via CholeskyDecomposition).
- Solve the triangular systems $L w = d$ and $L^T c = w$ (or $U^T w = d$ and $U c = w$).
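Here is a compact sketch of these four steps in Mathematica; lsViaNormalEquations is an illustrative name, and CholeskyDecomposition returns the upper-triangular factor.

lsViaNormalEquations[a_?MatrixQ, y_?VectorQ] := Module[{c, d, u},
  c = Transpose[a].a;              (* Hessian matrix A^T A *)
  d = Transpose[a].y;              (* right-hand side A^T y *)
  u = CholeskyDecomposition[c];    (* upper triangular, Transpose[u].u == c *)
  LinearSolve[u, LinearSolve[Transpose[u], d]]]   (* two triangular solves *)

(* same quadratic fit as in the QR sketch; in exact or well-conditioned cases the results agree *)
With[{x = N[Range[20]/20]}, lsViaNormalEquations[Outer[Power, x, Range[0, 2]], Cos[x]]]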
The Normal Equations Method on $L^2[a,b]$
The approach for deriving the normal equations for the finite-dimensional case applies with one modification to the continuous least-squares approximation equation (12) (see [6]):
the matrix transpose has to be replaced by the adjoint operator $V^T$ of equation (30).
A necessary condition for a minimum of $\phi(c) = \tfrac{1}{2} \|V c - f\|_2^2$ is $\nabla \phi(c) = 0$, or equivalently,
(42)   $V^T V\, c = V^T f.$
These are called the normal equations.
(43)   $H = V^T V$
is called the Hessian matrix. The minimum is unique if $H$ has full rank $n$. The elements can be calculated via equation (31):
$H_{jk} = \langle x^{j-1}, x^{k-1} \rangle = \int_a^b x^{\,j+k-2}\, dx = \frac{b^{\,j+k-1} - a^{\,j+k-1}}{j+k-1}, \qquad j, k = 1, \dots, n.$
Obviously, the Hessian matrix is symmetric and positive semidefinite. Since $V$ has full rank $n$ for any interval of positive length (equation (32)), $H$ has full rank as well and is therefore positive definite. Then there exists a unique solution of equations (42) and (12).
Finally, calculate the elements of the right-hand side $V^T f$ via equation (30):
$(V^T f)_k = \langle x^{k-1}, f \rangle = \int_a^b x^{k-1} f(x)\, dx, \qquad k = 1, \dots, n.$
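A minimal sketch of the continuous Hessian matrix and the moment vector, and of the resulting normal equations solution; hessC and rhsC are illustrative names, and the cubic example for Exp on [0, 1] is an assumption.

hessC[n_, {a_, b_}] := Table[Integrate[x^(j + k - 2), {x, a, b}], {j, n}, {k, n}]
rhsC[fx_, n_, {a_, b_}, x_] := Table[Integrate[fx x^(k - 1), {x, a, b}], {k, n}]

hessC[4, {0, 1}] == HilbertMatrix[4]      (* True: the Hilbert matrix appears for [0, 1] *)
coeffs = LinearSolve[hessC[3, {0, 1}], rhsC[Exp[x], 3, {0, 1}, x]];
Expand[coeffs.Table[x^(k - 1), {k, 3}]]   (* the same polynomial as the Fourier sum of 3 terms *)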
Convergence of Discrete Approximations to Continuous Approximations
This subsection investigates under which conditions and how fast the polynomial approximations on discrete grids converge to the continuous polynomial approximation in $L^2[a,b]$.
For an equally spaced grid, the normal equations, multiplied by the step size $h$, read
(44)   $h\, V^T V\, c = h\, V^T y.$
Define
(45)   $H_m = h\, V^T V;$
then by equation (7), the matrix elements of $H_m$ are just the Riemann sums for the matrix elements (integrals) of the continuous Hessian $H$. Therefore, the Hessian matrices on the discrete grid converge to the continuous Hessian matrix in any matrix norm according to
$\|H_m - H\| = O(h) \quad (\text{respectively } O(h^2) \text{ for the midpoint grid}).$
Define
(46)   $b_m = h\, V^T y;$
then by equation (7), the elements of $b_m$ are just the Riemann sums for the elements of $V^T f$, the moments of $f$. Therefore, the right-hand side of the normal equations on the discrete grid converges to the right-hand side of the continuous normal equations in any vector norm according to
(47)   $\|b_m - V^T f\| = O(h) \quad (\text{respectively } O(h^2) \text{ for the midpoint grid}).$
Proposition 2
If $f \in C^1[a,b]$ (respectively $C^2[a,b]$ for the midpoint grid), the solutions of the discrete normal equations converge to the solution of the continuous normal equations with the same order, $O(h)$ (respectively $O(h^2)$).
Proof
From equation (42), the solution of the polynomial approximation in $L^2[a,b]$ is
$c = H^{-1}\, V^T f.$
From equation (44), the solution for the discrete grid is
(48)   $c_m = H_m^{-1}\, b_m.$
For the matrix inverses,
$H_m^{-1} - H^{-1} = H_m^{-1} (H - H_m)\, H^{-1}.$
Expanding the difference of the solution coefficient vectors completes the proof:
$c_m - c = H_m^{-1} (b_m - V^T f) + (H_m^{-1} - H^{-1})\, V^T f.$
The Relation of the Normal Equations Method to Gram–Schmidt Orthonormalization on $L^2[a,b]$
Theorem 5
The upper-triangular matrix $R$ of the QR factorization of the Vandermonde operator $V$ is the upper-triangular Cholesky factor of the Hessian matrix: $H = V^T V = R^T R$.
Proof
From Theorem 4 and because the columns of $Q$ are orthonormal,
$H = V^T V = R^T Q^T Q\, R = R^T R.$
By [4], Theorem 4.2.7, the Cholesky decomposition of a symmetric positive-definite square matrix is unique.
Equations (24) and (26) give the relation for the 2-condition numbers of the matrices.
Corollary 2
$\kappa_2(H) = \kappa_2(R)^2 = \kappa_2(V)^2.$
Definitions to Solve Normal Equations
This defines the continuous and discrete Hessian matrices.
These are the right‐hand sides of the continuous and discrete normal equations.
This gives the solution of the normal equations system.
This gives the approximation polynomials for the continuous and discrete cases.
Illustrating Theorem 4 for the General Interval
This gives the Gram–Schmidt coefficient matrix, its inverse and inverse transpose.
The matrix $G^{-T}$ times its transpose equals the Hessian matrix of the normal equations: $G^{-T} G^{-1} = H$.
Equivalently, $G^{-T}$ is the lower-triangular and $G^{-1}$ is the upper-triangular Cholesky factor of the Hessian matrix $H$.
Special Case: The Interval $[0, 1]$ for Hilbert Matrices
For $[a, b] = [0, 1]$, the Hessian matrix is identical to the Hilbert matrix of dimension $n$:
$H_{jk} = \int_0^1 x^{\,j+k-2}\, dx = \frac{1}{j+k-1}, \qquad j, k = 1, \dots, n.$
Hilbert matrices are ill-conditioned already for dimension $n = 4$ (i.e. have condition numbers greater than about 10,000) and soon reach the limits of 64-bit IEEE arithmetic.
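A brief illustration of this growth with the built-in HilbertMatrix, evaluated with 50-digit precision so that the smallest singular values are not lost; a sketch, not the article's code.

Table[{n, Max[#]/Min[#] &[SingularValueList[N[HilbertMatrix[n], 50]]]}, {n, 2, 8}] // N
(* the 2-condition number grows rapidly: about 1.5*10^4 for n = 4 and 1.5*10^10 for n = 8 *)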
Summary of Results for the Normal Equations Method
Here are the summary results. They take some time to compute for larger dimensions.
Characterization of the Normal Equations Solution
- Continuous case: The Hessian matrix originates from the full-rank Vandermonde operator of the orthogonalization solutions by taking $H = V^T V$.
- Discrete case: The Hessian matrix comes from the full-rank Vandermonde matrix of the orthogonalization solutions by taking $H_m = h\, V^T V$.
- Consequently, the interval $[0, 1]$ gives the exact Hilbert matrix of dimension $n$.
- For other intervals $[a, b]$, the Hessian matrix is close to a Hilbert matrix (“Hilbert-type matrix”).
- Hilbert matrices and Hilbert-type matrices are very ill-conditioned already for dimensions greater than four and return extremely inaccurate numerical solutions.
- For intervals symmetric around zero, $[-d, d]$, the Hessian matrix has a chessboard pattern of zeros, since the columns of $V$ are alternately even and odd functions and hence mutually orthogonal.
- The normal equations are solved by Cholesky decomposition in the continuous and discrete settings.
- The 2-condition number of the Hessian matrix is the square of the 2-condition number of the Vandermonde matrix. This is the root cause of the inherent ill-conditioning of the normal equations.
Comparison of Solution Methods
Gram–Schmidt Orthonormalization versus Normal Equations
This performs Gram–Schmidt orthonormalization for a special case. For other cases, change the 6 to an integer between 1 and 10.
Here are the results from the normal equations.
The two solutions agree both symbolically and in exact arithmetic.
But there are numeric differences in IEEE 754 machine arithmetic.
These differences come from the lower error amplification of the Gram–Schmidt approach, expressed in a lower 2-condition number: the numerical solution via Gram–Schmidt orthonormalization is usually more accurate than the normal equations solution.
QR Factorization versus Normal Equations on an Equally Spaced Grid
Here is the QR factorization. Again, for other cases, change the 3 to an integer between 1 and 10.
Here are the results from the normal equations.
The difference between the numerical solutions is due to the difference in how their rounding errors propagate.
Because of the lower error amplification expressed in a lower 2-condition number, the numerical solution via QR factorization is more accurate than the normal equations solution.
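A sketch of such a comparison in machine arithmetic; the names are illustrative, and the sample function $e^x$, the grid of 200 points in $[0, 1]$ and the polynomial degree 7 are assumptions for the example.

With[{m = 200, n = 8},
 Module[{x, a, y, q, r, u, cQR, cNE},
  x = N[Range[m]/m]; a = Outer[Power, x, Range[0, n - 1]]; y = Exp[x];
  {q, r} = QRDecomposition[a]; cQR = LinearSolve[r, q.y];           (* QR solution *)
  u = CholeskyDecomposition[Transpose[a].a];                        (* normal equations *)
  cNE = LinearSolve[u, LinearSolve[Transpose[u], Transpose[a].y]];
  Norm[cQR - cNE, Infinity]]]
(* the coefficient difference reflects the squared 2-condition number of the Vandermonde matrix *)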
Convergence of the Right-Hand Sides of Normal Equations
This calculates the convergence order for grids of powers of 2.
Choose problem parameters. Yet again, for other cases, change the 7 to an integer between 1 and 10.
Here are the approximation errors for grids of powers of 2.
Here is the convergence order. To see the result for another function, select another sample function. Zero errors are suppressed.
For case 3, all but the first two elements of the right-hand side vector are zero. For case 5, all but the first element are zero; therefore the logarithmic plots look incomplete.
Sample functions 3, 4, 5 and 6 have discontinuities in the zeroth, first or second derivative; 7 and 8 have singularities in the first or second derivative. These sample functions do not satisfy all the assumptions of Theorem 1. Therefore the convergence order of the right-hand side can be lower than 1 (respectively 2), as predicted by equation (47). Sample functions 1, 2, 9 and 10 are infinitely often continuously differentiable; therefore they have the maximum convergence order 1 (respectively 2) according to equation (47).
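The observed order can be read off from base-2 logarithms of successive error ratios when the number of grid points doubles; here is a small self-contained sketch with the midpoint rule for $e^x$ on $[0, 1]$ (illustrative names and example).

observedOrder[errors_List] := Log2[Most[errors]/Rest[errors]]
errs = Table[With[{m = 2^k, h = 2.^-k},
    Abs[(E - 1) - h Sum[Exp[(i - 1/2) h], {i, 1, m}]]], {k, 3, 8}];
observedOrder[errs]  (* entries close to 2, the convergence order of the midpoint rule *)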
Convergence of the Normal Equations Solution
These are the approximation errors for grids of powers of 2.
This takes a few minutes.
This plots the convergence order.
Relations between Solution Methods
Analyzing polynomial approximation, this article has systematically worked out the close relations between the solutions obtained by:
- Gram–Schmidt orthonormalization and Fourier approximation
- QR factorization
- Normal equations
The interrelations are:
- The Gram–Schmidt orthonormalization applied to the monomials on $[a, b]$ is reformulated as QR factorization of the Vandermonde matrix $V$ with the following one-to-one correspondences.
- For $[a, b] = [-1, 1]$, Gram–Schmidt orthonormalization returns scalar multiples of the classical Legendre polynomials.
- The Hessian matrix in the normal equations originates from the full-rank Vandermonde matrix of the orthogonalization solutions by taking $H = V^T V$.
- The 2-condition number of $H$ is the square of the 2-condition number of the Vandermonde matrix $V$.
- The upper Cholesky factor of $H$ is identical to the upper-triangular matrix $R$ of the QR factorization of $V$ in both the continuous and discrete cases.
- The normal equations solution is identical to the solution obtained from the QR factorization in both the continuous and discrete cases.
- The QR and normal equations solutions on an equally spaced grid of $m$ points in the interval $[a, b]$ converge to the solution in $L^2[a, b]$ with the same order as the error of the quadrature formulas.
- The columns of the orthogonal factor $Q$ are the orthonormal polynomials $q_1, \dots, q_n$.
- The Fourier sum with respect to these orthonormal polynomials is the best polynomial approximation with respect to the 2-norm.
Summary
This article has analyzed the polynomial approximation of a real-valued function with respect to the least-squares norm in different settings:
Three different solution methods for this least-squares problem have been compared:
- The orthogonalization solutions
- The normal equations solution in the continuous and discrete settings
All definitions and solution methods were implemented in Mathematica 11.1.
All solution characteristics and their relations were illustrated by symbolic or numeric examples and graphs.
Acknowledgments
The author thanks Alexandra Herzog and Jonas Gienger for critical proofreading and the anonymous referees for valuable improvements of the paper. The paper evolved from a lecture series in numerical analysis, given by the author for ESOC staff and contractors from 2012 to 2016, using Mathematica:
- Lecture #4: Linear Least-Squares Systems
- Lecture #6: Numerical Quadrature
- Lecture #9: Interpolation and Approximation of Functions, Curve and Surface Fitting
References
[1] E. Hewitt and K. Stromberg, Real and Abstract Analysis, New York: Springer-Verlag, 1975.
[2] K. Yosida, Functional Analysis, 5th ed., New York: Springer-Verlag, 1978.
[3] P. J. Davis and P. Rabinowitz, Methods of Numerical Integration, 2nd ed., London: Academic Press, 1984.
[4] G. H. Golub and C. F. Van Loan, Matrix Computations, 4th ed., Baltimore: The Johns Hopkins University Press, 2013.
[5] E. W. Weisstein. “Vandermonde Determinant” from Wolfram MathWorld, a Wolfram Web Resource. mathworld.wolfram.com/VandermondeDeterminant.html.
[6] J. D. Faires and R. Burden, Numerical Methods, 3rd ed., Pacific Grove, CA: Thomson/Brooks/Cole, 2003.
G. Gienger, “Polynomial Approximation,” The Mathematica Journal, 2017. dx.doi.org/doi:10.3888/tmj.19-2.
About the Author
2011–2016 ESA/ESOC research and technology management office
2010–2011 ESOC navigation support office
1987–2009 Mathematical analyst in ESOC Flight Dynamics division; supported more than 20 space missions
1988 Received Dr. rer. nat. from Heidelberg University, Faculty of Science and Mathematics
1984–1987 Teaching assistant at University of Giessen
1982–1984 Research scientist at Heidelberg University
1975–1981 Studied mathematics and physics at Heidelberg University
Gottlob Gienger
European Space Operations Centre ESOC
Research and Technology Management Office OPS-GX
Robert-Bosch-Strasse 5, 64293 Darmstadt, Germany
Retired staff
Gottlob.Gienger@gmail.com