What Is Information-Based Complexity?
HENRYK WOŹNIAKOWSKI
The purpose of this short note is to informally introduce information-based complexity (IBC). We describe the basic notions of IBC for the approximate solution of continuous mathematically posed problems.
IBC is the branch of computational complexity that studies continuous mathematical problems. Typically, such problems are defined on spaces of functions of d variables and often d is huge. Since one cannot enter a function of real or complex variables into a digital computer, the available information is usually given by finitely many function values at some prescribed sample points.¹ The sample points can be chosen adaptively, that is, the choice of the jth point may be a function of the already computed function values at the j – 1 previously used points.
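To make adaption concrete, here is a minimal sketch (our Python illustration, not from the original note): bisection for root finding, where each new sample point is determined by the previously computed function values.

```python
# Adaptive information: the j-th sample point depends on the function values
# computed at the j-1 previous points. Bisection is the classic example:
# the next point is the midpoint of the current sign-change interval,
# which itself is determined by the past function values.

def adaptive_root(f, a, b, n):
    """Approximate a root of f on [a, b] from n adaptively chosen function
    values, assuming f(a) and f(b) have opposite signs."""
    fa = f(a)
    for _ in range(n):
        m = 0.5 * (a + b)        # next sample point, chosen adaptively
        fm = f(m)
        if fa * fm <= 0:         # sign change on [a, m]
            b = m
        else:                    # sign change on [m, b]
            a, fa = m, fm
    return 0.5 * (a + b)

approx = adaptive_root(lambda x: x * x - 2.0, 0.0, 2.0, 40)
```

With n function values, bisection shrinks the interval of uncertainty by a factor of 2^n; for this problem adaption is known to help exponentially over any nonadaptive placement of the points.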
Such information is
partial, contaminated and priced.
It is partial since, knowing only finitely many function values, we cannot in general recover the function exactly, and so we are unable to find the exact solution of a continuous problem. It is contaminated since the function values are computed with model, experimental, and/or rounding errors. It is also priced since we are charged for each experiment leading to a function value or for each computation needed to obtain a function value. Often it is expensive to obtain a function value; for example, some functions occurring in computational practice require thousands or millions of arithmetic operations to compute one function value.
Continuous problems for which partial, contaminated and priced information is available arise in many areas including numerical analysis, statistics, physics, chemistry and many computational sciences. Such problems can only be solved approximately to within some error threshold ε.
The goal of IBC is to create a theory of computational complexity for such problems. Intuitively, the complexity is defined as the minimal cost over all possible algorithms that compute the solution of a continuous problem to within error at most ε. For many problems, the minimal cost is determined by the minimal number of function values needed to compute the solution to within ε.
Depending on precisely how the error and the cost are defined, we have various settings. In the worst case setting, the error and cost of algorithms are defined by their worst case performance. That is, the error is the supremum of the distance between the exact solution and the approximation computed by the algorithm, taken over all functions from a given set. The distance may be defined by a norm or by a metric, and by a specific error criterion; for example, we may have the absolute, relative or normalized error criterion. Similarly, the cost is defined as the supremum of the costs of the algorithm over all functions from the same set. The cost for a single function is the sum of the information and combinatory costs. The information cost is the number of function values times the cost of computing one function value. The combinatory cost is the number of all arithmetic operations needed to combine the already computed function values. Here, for simplicity, the cost of one arithmetic operation is taken as unity, and by arithmetic operations we mean such operations as addition, multiplication, subtraction, division and comparison of numbers. The set of permissible arithmetic operations can be extended by permitting the computation of special functions such as logarithms, exponentials and trigonometric functions. The sets of functions studied in IBC are usually unit balls or whole spaces, depending on the error criterion. Typical spaces are Hilbert or Banach spaces of infinite dimension.
In the average case setting, the error and the cost of algorithms are defined by their average performance. That is, we assume a probability measure on a given set of functions and take the expectation of the errors and costs of the algorithm over all functions from this set with respect to the measure. The probability measures studied in the average case setting of IBC are usually Gaussian or truncated Gaussian measures defined on infinite-dimensional spaces.
In the probabilistic setting, the error and the cost of algorithms are defined as in the worst case setting by taking the supremum over a given set of functions modulo a subset of small measure. That is, we agree that the algorithms behave properly on a set of measure at least, say, 1 – δ, and we do not control their behavior on a set of measure at most δ.
In the worst case, average case and probabilistic settings discussed so far, we consider only deterministic algorithms. In the randomized setting, we also permit randomization. That is, we use randomized algorithms that can compute function values at randomized points and can combine these values using randomization as well. The randomized error and the randomized cost are defined analogously, by taking the expectation with respect to the random element of the algorithm and then the worst case, average case or probabilistic performance with respect to functions from a given set.
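The classical Monte Carlo algorithm for multivariate integration is a randomized algorithm in exactly this sense (our Python sketch, not from the note): the sample points are random, and the randomized error decays like n^{-1/2} independently of d.

```python
import random

# Classical Monte Carlo for integrating f over the d-dimensional unit cube:
# sample n independent uniform random points and average the function values.
# The randomized error behaves like sigma(f) / sqrt(n), independently of d.

def monte_carlo(f, d, n, rng):
    total = 0.0
    for _ in range(n):
        x = [rng.random() for _ in range(d)]   # random sample point in [0,1]^d
        total += f(x)
    return total / n

rng = random.Random(7)                          # fixed seed for reproducibility
d = 10
estimate = monte_carlo(lambda x: sum(x), d, 2000, rng)  # true integral: d/2 = 5
```

The integrand and seed are illustrative; the point is that the cost to reach error ε is of order ε^{-2} regardless of the number of variables, which no deterministic worst case algorithm can match for merely Lipschitz functions.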
Quite recently, one more setting of IBC has been added. This is the quantum setting where computations are performed on a (still) hypothetical quantum computer. This leads to different definitions of the error and the cost of algorithms, and the quantum complexity is defined as the minimal cost needed to compute the solution of a continuous problem to within error at most ε with a given probability.
The model of computation used in IBC is the real number model, which is consistent with the fact that most continuous problems are solved today using floating point arithmetic, usually with a fixed precision. The cost of operations in floating point arithmetic does not depend on the size of the input and, modulo rounding errors, floating point arithmetic is equivalent to the real number model. We simplify the IBC analysis by not considering rounding errors. Surprisingly enough, for most algorithms whose cost in the real number model is close to the complexity of a continuous problem, we can find numerically stable implementations. We then obtain essentially the same results in floating point arithmetic as in the real number model, provided the problem is not too ill-conditioned and ε is related to the relative precision of floating point arithmetic.
Today we know rather tight bounds on the complexity of many continuous problems, and this holds in all the settings mentioned above. There are many multivariate problems whose complexity in the worst case, average case and randomized settings is exponential in d. These IBC results may be contrasted with discrete problems, such as factorization, where exponential complexity is conjectured but not yet proved. Of course, for discrete problems we have complete information, and we cannot use the powerful proof techniques for partial information that allow us to obtain lower bounds and prove intractability.
We now briefly explain how lower and upper bounds are obtained in IBC. For simplicity, we assume that function values can be computed exactly. We start with lower bounds since they are usually harder to obtain.
Lower bounds can be obtained by using so-called adversary arguments. That is, we want to identify two functions that are indistinguishable with respect to the finitely many function values used by the algorithm, yet have the most widely separated solutions: they share the same function values used by the algorithm, but the distance between the solutions we are trying to approximate is maximal. Clearly, the algorithm cannot distinguish between these two functions, and the best it can do is to take the mean of their two solutions. So no matter how the algorithm is defined, there is no way to beat half of the distance between these two solutions. Hence, half of the maximal distance between the solutions for indistinguishable functions gives us a lower bound on the error. This is usually expressed as a function of n, where n is the number of function values used by the algorithm. We stress that in many cases it is quite hard to find this function of n, although we may use a whole arsenal of mathematical tools to help us. Today there are many techniques for finding lower bounds. To name just a few, they are based on n-widths (especially Gelfand and Kolmogorov n-widths), on decomposable reproducing kernels, on the optimality of linear algorithms for linear problems, on adaption versus non-adaption techniques, etc.
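A concrete instance of the adversary argument (our Python sketch) for integrating functions on [0, 1] with Lipschitz constant 1: whatever sample points the algorithm uses, the functions +g and -g below vanish at all of them, so they are indistinguishable, while their integrals differ by twice the integral of g.

```python
# Adversary argument for integration of Lipschitz-1 functions on [0,1].
# Given the sample points used by an algorithm, let g(x) be the distance
# from x to the nearest sample point: g is Lipschitz with constant 1 and
# vanishes at every sample point, so +g and -g produce identical data
# while their integrals are +I and -I. Hence no algorithm using these
# points can have worst case error below I.

def fooling_function(points):
    return lambda x: min(abs(x - p) for p in points)

points = [0.1, 0.35, 0.62, 0.9]   # sample points of a hypothetical algorithm
g = fooling_function(points)

m = 100000                         # midpoint Riemann sum for I = integral of g
I = sum(g((k + 0.5) / m) for k in range(m)) / m
```

Here I ≈ 0.0634, so the worst case error of any algorithm using these four points is at least that much; minimizing I over all choices of n points yields the lower bound as a function of n.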
Upper bounds can be obtained, for example, by using so-called interpolatory algorithms. Namely, when we have already computed, say, n function values f(x_j) for j = 1, 2, . . . , n, we want to find a function g belonging to the same set of functions as f and sharing the same function values as f. That is, g(x_j) = f(x_j) for all j = 1, 2, . . . , n, so we interpolate the data. The interpolatory algorithm then takes the exact solution for g as the approximate solution for f. For given information consisting of n function values at the points x_j, the error of the interpolatory algorithm is almost minimal, since it can differ from the lower bound only by a factor of at most 2. Obviously, we still need to find optimal information, i.e., points x_j for which the error is minimal, and this is usually a hard nonlinear problem.
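To illustrate (our sketch, again for integration of Lipschitz-1 functions on [0, 1]): the upper Lipschitz envelope min_j (f(x_j) + |x - x_j|) is an interpolant of the data belonging to the same class, and the interpolatory algorithm returns its integral.

```python
# Interpolatory algorithm for integrating a Lipschitz-1 function on [0,1]
# from the data (x_j, f(x_j)): pick a Lipschitz-1 interpolant g of the data
# (here the upper envelope) and output the integral of g, computed below
# by a fine midpoint Riemann sum.

def interpolatory_integral(xs, ys, m=100000):
    def g(x):                  # Lipschitz-1 interpolant of the data
        return min(y + abs(x - xj) for xj, y in zip(xs, ys))
    return sum(g((k + 0.5) / m) for k in range(m)) / m

xs = [0.125, 0.375, 0.625, 0.875]        # sample points
ys = list(xs)                            # data from f(x) = x, a Lipschitz-1 function
result = interpolatory_integral(xs, ys)  # true integral of f is 0.5
```

The output is 0.515625, an error of 0.015625, comfortably within the factor-of-2 guarantee relative to the adversary lower bound for these sample points.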
The cost of the interpolatory algorithm is, in general, trickier. For some spaces and solutions, it turns out that splines are interpolatory, and we can use the vast knowledge about splines to compute them efficiently. For some spaces or solutions, it may, however, happen that the cost of an interpolatory algorithm is large. In some cases, we can use different algorithms. For example, for many linear problems, it has been proved that the error is minimized by linear algorithms. This is a vast simplification that helps enormously in the search for easy-to-implement algorithms with almost minimal error. The first such result, for general linear functionals defined over balanced and convex sets of functions, was proved by S. Smolyak in 1965. There are many generalizations of this result for linear operators, but they are beyond the scope of this note.
We want to mention one more technique for obtaining upper bounds, which uses a randomization argument even though the original problem is studied, say, in the worst case setting. It turns out that for Hilbert spaces and linear functionals, we can explicitly compute the worst case error of any linear algorithm. This worst case error obviously depends on the sample points used by the algorithm. So assume for a moment that the sample points are independent randomly chosen points. We then compute the expected worst case error with respect to some distribution of these points. It turns out that in many cases the expectation is small. By the mean value theorem, we know that there must be sample points for which the worst case error is at most the expectation. Furthermore, using Chebyshev's inequality, we know that the measure of sample points whose worst case error does not exceed the expectation by more than a constant factor is large. This proves non-constructively that good linear algorithms exist. Obviously, we now face the problem of how to construct them. There are a number of options, but we mention only one of them, for multivariate integration of d-variate functions with d in the hundreds or thousands. There is a beautiful algorithm, called the CBC algorithm, that constructs n sample points component by component (hence the name CBC) by using the fast Fourier transform (FFT) in time of order d n ln(n). The CBC algorithm was designed by the Australian school of I. Sloan, S. Joe, F. Kuo and J. Dick starting from 2001, and its fast implementation was proposed by R. Cools and D. Nuyens in 2006. In this way, today we approximate integrals of functions with even 9125 variables.
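The following is a naive sketch of the CBC idea in Python (ours, not the fast Cools–Nuyens version; the weights are illustrative). It builds the generating vector of a rank-1 lattice rule component by component, at each step minimizing the explicitly computable squared worst case error in a weighted Korobov space of smoothness 2. The fast algorithm organizes the inner loops as FFTs, reducing the cost from order d n² here to order d n ln(n).

```python
from math import gcd, pi

# Naive component-by-component (CBC) construction of a rank-1 lattice rule.
# For product weights gamma_j and smoothness 2, the squared worst case error
# of the rule with generating vector (z_1,...,z_j) in the weighted Korobov
# space is  -1 + (1/n) * sum_k prod_i (1 + gamma_i * omega({k z_i / n})),
# with omega(x) = 2 pi^2 (x^2 - x + 1/6).  CBC greedily picks each z_j.

def cbc(n, d, gamma):
    def omega(x):
        return 2 * pi * pi * (x * x - x + 1.0 / 6.0)
    prod = [1.0] * n               # running product over already chosen components
    z = []
    best_err = None
    for j in range(d):
        best_z, best_err = None, None
        for cand in range(1, n):
            if gcd(cand, n) != 1:  # generating components must be coprime to n
                continue
            err = sum(prod[k] * (1.0 + gamma[j] * omega((k * cand % n) / n))
                      for k in range(n)) / n - 1.0
            if best_err is None or err < best_err:
                best_z, best_err = cand, err
        z.append(best_z)
        prod = [prod[k] * (1.0 + gamma[j] * omega((k * best_z % n) / n))
                for k in range(n)]
    return z, best_err             # squared worst case error of the final rule

z, err2 = cbc(n=101, d=5, gamma=[0.5 ** (j + 1) for j in range(5)])
```

The resulting sample points are {(k z_1/n mod 1, ..., k z_d/n mod 1) : k = 0, ..., n-1}, and the same greedy structure scales to d in the thousands once the FFT speed-up is used.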
For many multivariate problems defined on spaces of d-variate functions, we know that the worst case complexity of computing the solution to within ε is Θ(ε^{-p_d}). That is, lower and upper bounds are proportional to ε^{-p_d} with factors in the Θ notation independent of ε^{-1} but possibly dependent on d. Sometimes such estimates hold modulo a power of ln ε^{-1}, which we omit for simplicity. The exponent p_d usually depends on the smoothness of the set of functions. If the functions are r times differentiable then usually

p_d = d/r.
Note that for fixed r and varying d, we have an arbitrarily large power of ε^{-1}. In general, we cannot, however, claim that the complexity is exponential in d, since it also depends on how the factor in the upper bound in the Θ notation depends on d. For Lipschitz functions we have r = 1, and for multivariate integration it is known, due to A. Sukharev from 1979, who used the proof technique and results of N. S. Bakhvalov mostly from 1959, that the complexity in the worst case setting is roughly

(1/e) (1/(2ε))^d
with e = exp(1). Hence, it depends exponentially on d. If the complexity of a multivariate problem is exponential in d, we say that the problem is intractable, or suffers from the curse of dimensionality, following Bellman who coined this phrase in 1957. One central issue of IBC research today is to determine for which multivariate problems and for which settings we have tractability, that is, when the complexity is exponential neither in ε^{-1} nor in d. Depending on how we measure the lack of exponential dependence, we have various notions of tractability, such as polynomial, strong polynomial and weak tractability. There is a huge literature on the complexity of multivariate problems. However, most of these papers and books have results which are sharp with respect to ε^{-1} but have unfortunately unknown dependence on d. To prove tractability we must also establish sharp dependence on d. Therefore tractability requires new proof techniques that yield bounds sharp also in d. The second book in the list below is devoted to tractability of multivariate problems.
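Assuming the complexity of integrating Lipschitz functions grows roughly like (1/(2ε))^d / e, a few numbers show how quickly the curse of dimensionality sets in (our illustrative Python arithmetic; the exact constant is not the point):

```python
from math import e

# Rough number of function values needed to integrate a Lipschitz-1 function
# on [0,1]^d to within eps, using the growth (1/(2*eps))**d / e.
# What matters is the exponential dependence on d, not the constant.

def rough_cost(eps, d):
    return (1.0 / (2.0 * eps)) ** d / e

for d in (1, 2, 5, 10, 20):
    print(f"d = {d:2d}: about {rough_cost(0.01, d):.3e} function values")
```

Already for d = 10 and ε = 0.01 the count exceeds 10^16 function values, which is why tractability, rather than sharpness in ε^{-1} alone, is the central question.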
We end this note with a list of IBC books, where the reader can find more information, results and proofs on the complexity of continuous problems.
1. Sometimes we may use more general information consisting of finitely many linear functionals but, for simplicity, we restrict ourselves in this note only to function values as available information.
•E. Novak, Deterministic and stochastic error bounds in numerical analysis. Lecture Notes in Math. 1349, Springer-Verlag, Berlin, 1988.
•E. Novak and H. Woźniakowski, Tractability of multivariate problems. Volume I: Linear information, EMS Tracts Math. 6, European Mathematical Society Publishing House, Zürich, 2008.
•L. Plaskota, Noisy information and computational complexity. Cambridge University Press, Cambridge, UK, 1996.
•K. Ritter, Average-case analysis of numerical problems. Lecture Notes in Math. 1733, Springer-Verlag, Berlin, 2000.
•K. Sikorski, Optimal solution of nonlinear equations. Oxford University Press, 2000.
•J. F. Traub and H. Woźniakowski, A general theory of optimal algorithms. Academic Press, New York, 1980.
•J. F. Traub, G. W. Wasilkowski, and H. Woźniakowski, Information, uncertainty, complexity. Addison-Wesley, Reading, MA, 1983.
•J. F. Traub, G. W. Wasilkowski, and H. Woźniakowski, Information-based complexity. Computer Science and Scientific Computing, Academic Press, New York, 1988.
•J. F. Traub and A. G. Werschulz, Complexity and information. Lezioni Lincee, Cambridge University Press, Cambridge, UK, 1998.
•A. G. Werschulz, The computational complexity of differential and integral equations: An information-based approach. Oxford Math. Monogr., Oxford University Press, Oxford, 1991.