# Interpolating the population of China¶

using FundamentalsNumericalComputation


We create two vectors for data about the population of China. The first has the years of census data, the other has the numbers of millions of people.

year = 1980:10:2010
pop = [984.736, 1148.364, 1263.638, 1330.141];


It’s convenient to measure time in years since 1980. We use .- to subtract a scalar from a vector elementwise.

t = year .- 1980
y = pop;


Now we have four data points $$(t_1,y_1),\dots,(t_4,y_4)$$, so $$n=4$$ and we seek an interpolating cubic polynomial. We construct the associated Vandermonde matrix:

V = [ t[i]^j for i=1:4, j=0:3 ]

4×4 Array{Int64,2}:
1   0    0      0
1  10  100   1000
1  20  400   8000
1  30  900  27000


To solve for the vector of polynomial coefficients, we use a backslash:

c = V \ y

4-element Array{Float64,1}:
984.736
18.766600000000025
-0.23968500000000276
-6.949999999993395e-5


The algorithms used by the backslash operator are the main topic of this chapter. For now, observe that the coefficients of the cubic polynomial vary over several orders of magnitude, which is typical in this context. By our definitions, these coefficients are given in ascending order of power in $$t$$.

We can use the resulting polynomial to estimate the population of China in 2005:

p = Polynomial(c)    # construct a polynomial
p(2005-1980)   # apply the 1980 time shift

1303.0119375


The official figure is 1297.8, so our result is not bad.

We can visualize the interpolation process. First, we plot the data as points. We’ll shift the $$t$$ variable back to actual years.

scatter(t,y, label="actual", legend=:topleft,
xlabel="years since 1980", ylabel="population (millions)", title="Population of China")


We want to superimpose a plot of the polynomial. We do that by evaluating it at a vector of points in the interval. The dot after the name of the polynomial is a universal way to apply a function to every element of an array (known as vectorization).

tt = LinRange(0,30,500)   # 500 times from 0 to 30 years
yy = p.(tt)               # use dot to apply to all elements of the vector

500-element Array{Float64,1}:
984.736
985.8633861620615
986.9890395778162
988.1129601566497
989.2351478079472
990.3556024410941
991.4743239654758
992.5913122904778
993.7065673254856
994.8200889798843
995.9318771630594
997.0419317843964
998.1502527532807
⋮
1327.2573255559355
1327.5283639685024
1327.797625414837
1328.0651098043252
1328.3308170463522
1328.5947470503033
1328.8568997255638
1329.1172749815196
1329.3758727275556
1329.6326928730575
1329.8877353274104
1330.141


Now note the use of plot! to add to the current plot, rather than replacing it.

plot!(tt,yy, label="interpolant")


Let’s redo it, this time continuing the curve outside of the original date range.

scatter(t,y, label="actual", legend=:topleft,
xlabel="years since 1980", ylabel="population (millions)", title="Population of China")
tt = LinRange(-10,50,500)
plot!(tt,p.(tt), label="interpolant")


While the interpolation is plausible, the extrapolation to the future is highly questionable! As a rule, extrapolation more than a short distance beyond the original interval is not reliable.