{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Interpolating the population of China"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"using FundamentalsNumericalComputation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We create two vectors holding data about the population of China. The first contains the census years, and the second contains the population in millions of people."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"year = 1980:10:2010 \n",
"pop = [984.736, 1148.364, 1263.638, 1330.141];"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It's convenient to measure time in years since 1980. We use `.-` to subtract a scalar from a vector elementwise."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"t = year .- 1980\n",
"y = pop;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we have four data points $(t_1,y_1),\\dots,(t_4,y_4)$, so $n=4$ and we seek an interpolating cubic polynomial, which has exactly four coefficients to determine. We construct the associated Vandermonde matrix:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4×4 Array{Int64,2}:\n",
" 1 0 0 0\n",
" 1 10 100 1000\n",
" 1 20 400 8000\n",
" 1 30 900 27000"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"V = [ t[i]^j for i=1:4, j=0:3 ]"
]
},
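{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, here is a sketch of the linear system behind this matrix. It assumes the coefficients are labeled $c_1,\\dots,c_4$ in ascending order of power, so that $p(t) = c_1 + c_2 t + c_3 t^2 + c_4 t^3$; that labeling is an assumption made for this note. The interpolation conditions $p(t_i)=y_i$ for $i=1,\\dots,4$ then read\n",
"\n",
"$$\n",
"\\begin{bmatrix} 1 & t_1 & t_1^2 & t_1^3 \\\\ 1 & t_2 & t_2^2 & t_2^3 \\\\ 1 & t_3 & t_3^2 & t_3^3 \\\\ 1 & t_4 & t_4^2 & t_4^3 \\end{bmatrix} \\begin{bmatrix} c_1 \\\\ c_2 \\\\ c_3 \\\\ c_4 \\end{bmatrix} = \\begin{bmatrix} y_1 \\\\ y_2 \\\\ y_3 \\\\ y_4 \\end{bmatrix}.\n",
"$$\n",
"\n",
"Each row of the matrix computed above is $1, t_i, t_i^2, t_i^3$ for one of the shifted times $t_i = 0, 10, 20, 30$."
]
},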
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To solve for the vector of polynomial coefficients, we use a backslash:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4-element Array{Float64,1}:\n",
" 984.736\n",
" 18.766600000000025\n",
" -0.23968500000000276\n",
" -6.949999999993395e-5"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"c = V \\ y"
]
},
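{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, the coefficients should reproduce the data: multiplying the Vandermonde matrix by the coefficient vector should return the populations up to roundoff. The following is a minimal sketch and not part of the original computation. It restates the data under the hypothetical names `ts`, `ys`, `A`, and `coef` so that it runs on its own, and it uses `norm` from the standard `LinearAlgebra` library.\n",
"\n",
"```julia\n",
"using LinearAlgebra                      # provides norm\n",
"\n",
"ts = [0, 10, 20, 30]                     # years since 1980 (restated from above)\n",
"ys = [984.736, 1148.364, 1263.638, 1330.141]\n",
"A = [ ts[i]^j for i=1:4, j=0:3 ]         # same Vandermonde matrix as V\n",
"coef = A \\ ys                            # same solve as c = V \\ y\n",
"residual = A*coef - ys                   # zero in exact arithmetic\n",
"relres = norm(residual) / norm(ys)       # relative size of the residual\n",
"```\n",
"\n",
"A relative residual close to machine precision would indicate that the polynomial passes through all four data points."
]
},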
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The algorithms used by the backslash operator are the main topic of this chapter. For now, observe that the coefficients of the cubic polynomial vary over several orders of magnitude, which is typical in this context. By our definitions, these coefficients are given in ascending order of power in $t$. \n",
"\n",
"\n",
"We can use the resulting polynomial to estimate the population of China in 2005:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1303.0119375"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"p = Polynomial(c) # construct a polynomial\n",
"p(2005-1980) # apply the 1980 time shift"
]
},
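{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the role of the coefficient ordering concrete, here is a hedged sketch of evaluating the same polynomial by hand with Horner's rule, one standard way to evaluate a polynomial from coefficients in ascending order. The helper `horner` is hypothetical (it is not part of any package used here), and the values in `coef` are rounded copies of the output of `V \\ y` above.\n",
"\n",
"```julia\n",
"# Evaluate c[1] + c[2]*x + ... + c[n]*x^(n-1) by Horner's rule.\n",
"function horner(c, x)\n",
"    s = c[end]                 # start with the highest-order coefficient\n",
"    for k in length(c)-1:-1:1\n",
"        s = s*x + c[k]         # fold in the next lower coefficient\n",
"    end\n",
"    return s\n",
"end\n",
"\n",
"coef = [984.736, 18.7666, -0.239685, -6.95e-5]   # rounded copy of c\n",
"est = horner(coef, 2005 - 1980)                  # should agree closely with p(2005-1980)\n",
"```\n",
"\n",
"Because the coefficients are stored from the constant term upward, the loop runs from the last entry back to the first."
]
},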
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The official figure is 1297.8 million, so our estimate is too high by about 5.2 million, or roughly 0.4%.\n",
"\n",
"We can visualize the interpolation process. First, we plot the data as points, keeping time measured in years since 1980 on the horizontal axis."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"scatter(t,y, label=\"actual\", legend=:topleft,\n",
" xlabel=\"years since 1980\", ylabel=\"population (millions)\", title=\"Population of China\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We want to superimpose a plot of the polynomial. We do that by evaluating it at a vector of points in the interval. Placing a dot after the name of a function, here the polynomial `p`, applies it to every element of an array. Julia calls this **broadcasting**; it is also commonly known as vectorization."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"500-element Array{Float64,1}:\n",
" 984.736\n",
" 985.8633861620615\n",
" 986.9890395778162\n",
" 988.1129601566497\n",
" 989.2351478079472\n",
" 990.3556024410941\n",
" 991.4743239654758\n",
" 992.5913122904778\n",
" 993.7065673254856\n",
" 994.8200889798843\n",
" 995.9318771630594\n",
" 997.0419317843964\n",
" 998.1502527532807\n",
" ⋮\n",
" 1327.2573255559355\n",
" 1327.5283639685024\n",
" 1327.797625414837\n",
" 1328.0651098043252\n",
" 1328.3308170463522\n",
" 1328.5947470503033\n",
" 1328.8568997255638\n",
" 1329.1172749815196\n",
" 1329.3758727275556\n",
" 1329.6326928730575\n",
" 1329.8877353274104\n",
" 1330.141"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tt = LinRange(0,30,500) # 500 times from 0 to 30 years\n",
"yy = p.(tt) # use dot to apply to all elements of the vector "
]
},
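{
"cell_type": "markdown",
"metadata": {},
"source": [
"The dot syntax is not specific to polynomials. As a small illustration (the function `f` and the vector `xs` are made up for this sketch), a dotted call applies an ordinary scalar function to each element, giving the same result as `map`:\n",
"\n",
"```julia\n",
"f(x) = x^2 + 1            # an ordinary scalar function\n",
"xs = [1, 2, 3]\n",
"a = f.(xs)                # dotted call: apply f to each element\n",
"b = map(f, xs)            # equivalent elementwise application\n",
"a == b                    # the two results match\n",
"```\n",
"\n",
"The same dot works with operators, as in `year .- 1980` earlier."
]
},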
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now note the use of `plot!` to add to the current plot, rather than replacing it."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"plot!(tt,yy, label=\"interpolant\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's redo it, this time continuing the curve outside of the original date range."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"scatter(t,y, label=\"actual\", legend=:topleft,\n",
" xlabel=\"years since 1980\", ylabel=\"population (millions)\", title=\"Population of China\")\n",
"tt = LinRange(-10,50,500) \n",
"plot!(tt,p.(tt), label=\"interpolant\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"While the interpolation is plausible, the extrapolation to the future is highly questionable! As a rule, extrapolation more than a short distance beyond the original interval is not reliable."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Julia (faststart)",
"language": "julia",
"name": "julia-fast"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "3.7.3-final"
}
},
"nbformat": 4,
"nbformat_minor": 4
}