Floating point in Julia

In Julia, 1 and 1.0 are distinct values because they have different types, even though they are numerically equal:

@show typeof(1);
@show typeof(1.0);
typeof(1) = Int64
typeof(1.0) = Float64
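
As a small illustration of that distinction (a sketch using only built-in operators; the variable names are made up for this example), `==` compares numerical value, while `===` also requires the two operands to have the same type:

```julia
# Numerically equal, but not identical objects
same_value = (1 == 1.0)      # compares numeric value after promotion
same_object = (1 === 1.0)    # also requires identical type and bits
@show same_value same_object;
```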

The standard choice for floating-point values is Float64, which is double precision using 64 binary bits.

bitstring(1.0)
"0011111111110000000000000000000000000000000000000000000000000000"

The first bit determines the sign of the number:

[bitstring(1.0); bitstring(-1.0)]
2-element Array{String,1}:
 "0011111111110000000000000000000000000000000000000000000000000000"
 "1011111111110000000000000000000000000000000000000000000000000000"

The next 11 bits determine the exponent (scaling) of the number, and the remaining 52 bits encode the mantissa.

[bitstring(1.0); bitstring(2.0)]
2-element Array{String,1}:
 "0011111111110000000000000000000000000000000000000000000000000000"
 "0100000000000000000000000000000000000000000000000000000000000000"

Floating-point values have three parts: the sign bit, the exponent, and the mantissa or significand. These are all directly accessible.

x = 1.0; @show x,sign(x),exponent(x),significand(x);
(x, sign(x), exponent(x), significand(x)) = (1.0, 1.0, 0, 1.0)
x = 0.125; @show x,sign(x),exponent(x),significand(x);
(x, sign(x), exponent(x), significand(x)) = (0.125, 1.0, -3, 1.0)
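
To connect these functions with the bit patterns shown earlier, here is a minimal sketch that pulls the three fields out of the raw bits. The helper name `fields` is hypothetical, not part of Julia. It assumes the standard double-precision layout, in which the 11-bit exponent is stored with an offset (bias) of 1023. Note that in Julia `significand(x)` carries the sign of `x`, so for a nonzero finite value, scaling it by the power of 2 given by `exponent(x)` should recover `x`.

```julia
# Hypothetical helper (not part of Julia): split a Float64 into its raw bit fields.
function fields(x::Float64)
    bits = reinterpret(UInt64, x)
    sign_field = Int(bits >> 63)                   # 1 bit
    exponent_field = Int((bits >> 52) & 0x7ff)     # 11 bits, stored with a bias of 1023
    fraction_field = bits & 0x000fffffffffffff     # 52 bits
    return (sign_field, exponent_field, fraction_field)
end

for x in (1.0, 0.125, -47.0)
    s, e, f = fields(x)
    # The stored exponent minus the bias matches exponent(x)
    @show x, s, e - 1023 == exponent(x)
    # significand(x) carries the sign, so scaling by 2^exponent(x) recovers x
    @show ldexp(significand(x), exponent(x)) == x
end
```

Using `ldexp` rather than an explicit power keeps the scaling exact, since it only adjusts the exponent.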

The spacing between floating-point values in \([2^e,2^{e+1})\) is \(2^e \epsilon_\text{mach}\), where \(\epsilon_\text{mach}\) is known as machine epsilon. You can get it from the eps function in Julia.

eps()
2.220446049250313e-16

Because double precision allocates 52 bits to the mantissa, the default value of machine epsilon is \(2^{-52}\).

log2(eps())
-52.0
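
One way to see what machine epsilon means in practice is to look at the neighborhood of 1.0. The sketch below uses the built-in `nextfloat`; the variable name `gap` is just for illustration.

```julia
# eps() is the gap between 1.0 and the next larger Float64
gap = nextfloat(1.0) - 1.0
@show gap == eps();
@show gap == ldexp(1.0, -52);    # exactly 2^-52

# Adding a full eps() changes 1.0; adding half of it rounds back to 1.0
@show 1.0 + eps() > 1.0;
@show 1.0 + eps()/2 == 1.0;
```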

The spacing between adjacent floating-point values is proportional to the magnitude of the value itself. This is how relative precision is kept roughly constant throughout the range of values. You can get the adjusted spacing by calling eps with a value.

eps(2.0^20)
2.3283064365386963e-10
log2(ans)
-32.0
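
The scaling rule can be checked directly. This is a rough sketch: the names `scaling_holds` and `x` are chosen for this example, and the exponents tested are arbitrary.

```julia
# The gap just above 2^e should equal 2^e * eps(), for several exponents e
scaling_holds = all(eps(ldexp(1.0, e)) == ldexp(eps(), e) for e in -10:10:30)
@show scaling_holds;

# eps(x) is the distance from x to the next larger value
x = 47.0
gap_matches = (nextfloat(x) - x == eps(x))
@show gap_matches;

# The spacing relative to x stays between eps()/2 and eps()
@show eps()/2 < eps(x)/x <= eps();
```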
bitstring(47.0)
"0100000001000111100000000000000000000000000000000000000000000000"
bitstring(47.0+eps(47.0))
"0100000001000111100000000000000000000000000000000000000000000001"

A common mistake is to think that \(\epsilon_\text{mach}\) is the “smallest floating-point number.” In fact, the scaling of values is limited by the exponent, not the mantissa. The actual range of positive normalized values in double precision is

@show [floatmin(),floatmax()];
[floatmin(), floatmax()] = [2.2250738585072014e-308, 1.7976931348623157e308]
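
A brief sketch comparing these limits with machine epsilon follows. It assumes IEEE double precision, where values even smaller than `floatmin()` exist as subnormal numbers with reduced precision; the name `tiny` is made up for this example.

```julia
@show floatmin() < eps();                 # far smaller than machine epsilon
@show floatmin() == ldexp(1.0, -1022);    # smallest normalized positive value

tiny = nextfloat(0.0)                     # smallest positive value of any kind
@show tiny, issubnormal(tiny);

@show floatmax() * 2;                     # exceeding the range overflows to Inf
```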

For the most part you can mix integers and floating-point values and get what you expect.

1/7
0.14285714285714285
37.3 + 1
38.3
2^(-4)
0.0625
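
What happens behind the scenes in these mixed expressions is type promotion. The lines below are a short illustration using the built-in `promote` and `div` functions, not a full account of Julia's promotion rules.

```julia
@show typeof(1/7);           # dividing two integers with / gives a Float64
@show typeof(37.3 + 1);      # the integer is promoted to Float64 before adding
@show promote(37.3, 1);      # the conversion that is applied to the operands
@show typeof(div(7, 2));     # integer division with div stays an integer
```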

There are some exceptions. A floating-point value can’t be used as an index into an array, for example, even if it is numerically equal to an integer. In such cases you use Int to convert it.

@show 5.0,Int(5.0);
(5.0, Int(5.0)) = (5.0, 5)
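
The indexing case can be sketched as follows. The vector `v` and index `k` are invented for this example, and the error is caught generically rather than assuming a particular error type.

```julia
v = [10, 20, 30]
k = 2.0                      # numerically an integer, but stored as a Float64

outcome = try
    v[k]                     # indexing with a Float64 is rejected
catch err
    "error: " * string(typeof(err))
end
@show outcome;

@show v[Int(k)];             # converting the index first works
```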

If you try to convert a noninteger floating-point value into an integer, you get an InexactError. This occurs whenever you try to force a type conversion whose result cannot be represented exactly in the target type.
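
Here is a minimal sketch of that failure, together with one way around it: stating the rounding rule explicitly through the two-argument forms of `round`, `floor`, and `ceil`. The value 5.3 and the name `result` are arbitrary choices for this example.

```julia
result = try
    Int(5.3)                 # 5.3 has no exact integer representation
catch err
    err
end
@show result isa InexactError;

# Choose the rounding rule explicitly instead
@show round(Int, 5.3);       # nearest integer
@show floor(Int, 5.3);       # round down
@show ceil(Int, 5.3);        # round up
```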