Data Science with Julia

THE PURPOSE of this chapter is to expose the reader to the core parts of the Julia language. While it is intended to serve as an introduction to the language, it should also prove useful as a reference as the reader moves through the remainder of the book. The topics covered can be considered under six headings: variable names, operators, types, data structures, control flow, and functions. Furthermore, the chapter is laid out so that the contents build in a cumulative fashion, e.g., by the time one reaches the material on functions, all of the other material is being, or could usefully be, used. Accordingly, the material in this chapter is best read in order upon first reading.

In Julia, variable names are associated with a particular value. This value is stored so it can be used in future computations or other operations. Variable names are case sensitive and Unicode names (in UTF-8 encoding) may be used. Specifically, Unicode math symbols can be entered by using the L^AT_EX2ε symbol followed by a tab. In general, variable names must begin with a letter (uppercase or lowercase), an underscore _ or a Unicode code point > 00A0, and subsequent characters can include the ! symbol, digits and other Unicode points. It is also possible to redefine constants (e.g., π); however, built-in statements cannot be used as variable names (example below).

## Some playing around with variable names

## These are all correct

z = 100

y = 10.0

s = "Data Science"

ϑ = 5.0

µ = 1.2

datascience = true

data_science = true

## These are incorrect and will return an error

if = true

else = 1.5

The recommended convention is that variable names are typed in lowercase letters and words are separated by an underscore. In general, this convention will be followed herein.

In programming languages, operators are symbols used to do specific mathematical, logical or relational operations. Julia operators will be familiar to R and Python users. There are four main categories of operators in Julia: arithmetic, updating, numeric comparison, and bitwise. Examples of the first three are given in the following code block.

x = 2

y = 3

z = 4

## Arithmetic operators

x + y

# 5

x^y

# 8

## Updating operators

x += 2

# 4

y -= 2

# 1

z *= 2

# 8

## Numeric Comparison

x == y

# false

x != y

# true

x <= z

# true

When constructing expressions with multiple operators, the order in which these operators are applied to the expression is known as operator precedence. Operator precedence is an important practical consideration and can lead to unexpected results. The following code block uses the variables instantiated above to evaluate three different expressions. The first one is evaluated solely based on the operator precedence. The following two are forced to evaluate in specific ways based on the parentheses () included in the expression. Parentheses are evaluated first in the precedence hierarchy. We recommend programming expressions with parentheses to minimize bugs and improve code clarity.

## Operator precedence

x*y+z^2

# 68

x*(y+(z^2))

# 260

(x*y)+(z^2)

# 68

Numeric literals are representations of numbers in code. Numeric primitives are representations of numbers as objects in memory. Julia has many primitive numeric types, e.g., Int32, Int64, Float32, and Float64. Julia offers full support for real and complex numbers. The internal variable Sys.WORD_SIZE displays the architecture type of the computer (e.g., 32 bit or 64 bit). The minimum and maximum values of numeric primitives can be displayed with the functions typemin() and typemax(), respectively. They take the name of numeric primitives as an argument and are detailed in the following code block. The default size of the primitive depends on the type of computer architecture.

## Computer's architecture type

Sys.WORD_SIZE

# 64

## Size of the default primitive

typemax(Int)

# 9223372036854775807

## Size of a specific primitive

## same as the default

typemax(Int64)

# 9223372036854775807

# Note that the above results are machine-specific.

A signed type can hold positive or negative integers, and uses the leftmost bit to identify the sign of the integer (e.g., Int64). An unsigned type can hold positive values only and stores larger values by using the leading bit as part of the value (e.g., UInt128). Boolean values are 8-bit integers, with false being 0 and true being 1. When doing arithmetic with integers, occasionally one will encounter overflow errors. This occurs when the result of an arithmetic expression is a value outside the representable range of the numeric primitive being used. This can happen if the result is larger or smaller than its allowable size. Examples are given in the code block below. If this is a possibility in a particular application, consider using unsigned integers or arbitrary precision integers, available in Julia as the BigInt type.

## Some examples of the Int type

# Integers

literal_int = 1

println("typeof(literal_int): ", typeof(literal_int))

# typeof(literal_int): Int64

# Boolean values

x = Bool(0)

y = Bool(1)

## Integer overflow error

x = typemax(Int64)

# 9223372036854775808

x += 1

# -9223372036854775807

x == typemax(Int64)

#false

## Integer underflow error

x = typemin(Int64)

# -9223372036854775808

x -= 1

# 9223372036854775807

x == typemin(Int64)

#false

Floats are similar to scientific notation. They are made up of three components: a signed integer whose length determines the precision (the significand); the base used to represent the number (usually base 10); and a signed integer that changes the magnitude of the floating point number (the exponent). The value of a float is determined by multiplying the significand by the base raised to the power of the exponent. Float64 literals are distinguished by having an e before the power, and can be defined in hexadecimal. Float32 literals are distinguished by having an f in place of the e. There are three Float64 values that do not occur on the real line:

1. Inf, positive infinity: a value larger than all finite floating point numbers, equal to itself, and greater than every other floating point value but NaN.

2. −Inf, negative infinity: a value less than all finite floating point numbers, equal to itself, and less than every other floating point value but NaN.

3. NaN, not a number: a value not equal to any floating point value, and not ==, < or > than any floating point value, including itself.

It is good practice to check for NaN, Inf and −Inf values in floating point operations. The following code block gives some examples of how to do this.

## Some examples of floats

x1 = 1.0

x64 = 15e-5

x32 = 2.5f-4

typeof(x32)

# Float32

## digit separation using an _

9.2_4 == 9.24

# true

isnan(0/0)

# true

isinf(1/0)

# true

isinf(-11/0)

# true

y1 = 2*3

# 6

isnan(y1)

# false

isinf(y1)

# false

y2 = 2/0

# Inf

isnan(y2)

# false

isinf(y2)

# true

The IEEE 754 standard (Zuras et al., 2008) sets out the technical standard for floating point arithmetic. The Julia Float type and all operations performed on it adhere to this standard. Representations of Float64 numbers are not evenly spaced, with more occurring closer to zero on the real number line. This is due to machine epsilon, an upper-bound on the rounding error in floating point arithmetic. It is defined to be the smallest value of z such that 1+z ≠ 1. In Julia, the value of epsilon for a particular machine can be found via the eps() function. The spacing between floating point numbers and the value of machine epsilon is important to understand because it can help avoid certain types of errors. Integer overflow errors have been mentioned, but there are also float underflow errors, which occur when the result of a calculation is smaller than machine epsilon or when numbers of similar precision are subtracted. We give more details in Appendix D, and readers are directed to Higham (2002) if further reading on the topic is desired.

## Some examples of machine epsilon

eps()

# 2.220446049250313e-16

## spacing between a floating point number x and adjacent number is

## at most eps * abs(x)

n1 =[1e-25, 1e-5, 1., 1e5, 1e25]

for i in n1

println(*(i, eps()))

end

# 2.2204460492503132e-41

# 2.2204460492503133e-21

# 2.220446049250313e-16

# 2.220446049250313e-11

# 2.2204460492503133e9

In Julia, a string is a sequence of Unicode code points, using UTF-8 encoding. The first 128 Unicode characters are the ASCII characters. Characters in strings have an index value within the string. It is worth noting that Julia indices start at position 1, similar to R but different to Python, which starts its indices at position 0. The keyword end can be used to represent the last index. Herein, we will deal with ASCII characters only. Note that String is the built-in type for strings and string literals, and Char is the built-in type used to represent single characters. In fact, Char is a numeric value representing a Unicode code point. The value of a string cannot be changed, i.e., strings are immutable, and a new string must be built from another string. Strings are defined by double or triple quotes.

## String examples

s1 = "Hi"

# "Hi"

s2 = """I have a "quote" character"""

# "I have a \"quote\" character"

Strings can be sliced using range indexing, e.g., my_string[4:6] would return a substring of my_string containing the 4th, 5th and 6th characters of my_string. Concatenation can be done in two ways: using the string() function or with the * operator. Note this is a somewhat unusual feature of Julia — many other languages use + to perform concatenation. String interpolation takes place when a string literal is defined with a variable inside its instantiation. The variable is prepended with $. By using variables inside the string’s definition, complex strings can be built in a readable form, without multiple string multiplications.

## Some examples of strings

str = "Data science is fun!"

str[1]

# 'D'

str[end]

#'! '

## Slicing

str[4:7]

# "a sc"

str[end-3: end]

# "fun! "

## Concatenation

string(str, " Sure is :)")

# "Data science is fun! Sure is :)"

str * " Sure is :)"

# "Data science is fun! Sure is :)"

## Interpolation

"1 + 2 = $(1 + 2)"

#"1 +2 = 3"

word1 = "Julia"

word2 = "data"

word3 = "science"

"$word1 is great for $word2 $word3"

#"Julia is great for data science"

Strings can be compared lexicographically using comparison operators, e.g., ==, >, etc. Lexicographical comparison involves sequentially comparing string elements with the same position, until one pair of elements falsifies the comparison, or the end of the string is reached. Some useful string functions are:

• findfirst(pat, str) returns the indices of the characters in the string str matching the pattern pat.

• occursin(substr, str) returns true/false depending on the presence/absence of substr in str.

• repeat(str, n) generates a new string that is the original string str repeated n times.

• replace(str, ptn => rep) searches string str for the pattern ptn and, if it is present, replaces it with rep.

Julia fully supports regular expressions (regexes). Regexes in Julia are fully Perl compatible and are used to hunt for patterns in string data. They are defined as strings with a leading r outside the quotes. Regular expressions are commonly used with the following functions:

• occursin(regex, str) returns true/false if the regex has a match in the string str.

• match(regex, str) returns the first match of regex in the string. If there is no match, it returns the special Julia value nothing.

• eachmatch(regex, str) returns all the matches of regex in the string str as an array.

Regexes are a very powerful programming tool for working with text data. However, an in-depth discussion of them is beyond the scope of this book, and interested readers are encouraged to consult Friedl (2006) for further details.

## Lexicographical comparison

s1 = "abcd"

s2 = "abce"

s1 == s2

# false

s1 < s2

# true

s1 > s2

#false

## String functions

str = "Data science is fun!"

findfirst("Data", str)

# 1:4

occursin("ata", str)

# true

replace(str, "fun" => "great")

# "Data science is great! "

## Regular expressions

## match alpha-numeric characters at the start of the str

occursin(r"^[a-zA-Z0-9]", str)

# true

## match alpha-numeric characters at the end of the str

occursin(r"[a-zA-Z0-9]$", str)

# false

## matches the first non-alpha-numeric character in the string

match(r"[^a-zA-Z0-9]", str)

#RegexMatch(" ")

## matches all the non-alpha-numeric characters in the string

collect(eachmatch(r"[^a-zA-Z0-9]", str))

#4-element Array{RegexMatch,1}:

# RegexMatch(" ")

# RegexMatch("!")

Tuples are a Julia type. They are an abstraction of function arguments without the function. The arguments are defined in a specific order and have well-defined types. Tuples can have any number of parameters, and they do not have field names. Fields are accessed by their index, and tuples are defined using brackets () and commas. A very useful feature of tuples in Julia is that each element of a tuple can have its own type. Variable values can be assigned directly from a tuple where the value of each variable corresponds to a value in the tuple.

## A tuple comprising only floats

tup1 = (3.0, 9.1, 0.8, 1.9)

tup1

# (3.0, 9.1, 0.8, 1.9)

typeof(tup1)

# NTuple{4,Float64}

## A tuple comprising strings and floats

tup2 = ("Data", 2.5, "Science", 8.8)

typeof(tup2)

# Tuple{String,Float64,String,Float64}

## variable assignment

a,b,c = ("Fast", 1, 5.2)

#"Fast"

# 1

#5.2

An array is a multidimensional grid that stores objects of any type. To improve performance, arrays should contain only one specific type, e.g., Int. Arrays do not require vectorizing for fast array computations. The array implementation used by Julia is written in Julia and relies on the compiler for performance. The compiler uses type inference to make optimized code for array indexing, which makes programs more readable and easier to maintain. Arrays are a subtype of the AbstractArray type. As such, they are laid out as a contiguous block of memory. This is not true of other members of the AbstractArray type, such as SparseMatrixCSC and SubArray.

The type and dimensions of an array can be specified using Array{T}(D), where T is any valid Julia type and D is the dimension of the array. The first entry in the tuple D is a singleton that specifies how the array values are initialized. Users can specify undef to create an uninitialized array, nothing to create arrays with no values, or missing to create arrays of missing values. Arrays with different types can be created with type Any.

## A vector of length 5 containing integers

a1 = Array{Int64}(undef, 5)

## A 2x2 matrix containing integers

a2 = Array{Int64}(undef, (2,2))

#2x2 Array{Int64,2}:

#493921239145 425201762420

#416611827821 104

## A 2x2 matrix containing Any type

a2 = Array{Any}(undef, (2,2))

#2x2 Array{Any,2}:

# #undef #undef

In Julia, [] can also be used to generate arrays. In fact, the Vector(), Matrix() and collect() functions can also be used.

## A three-element row "vector"

a4 = [1,2,3]

## A 1x3 column vector -- a two-dimensional array

a5 = [1 2 3]

## A 2x3 matrix, where ; is used to separate rows

a6 = [80 81 82 ; 90 91 92]

## Notice that the array a4 does not have a second dimension, i.e., it is

## neither a 1x3 vector nor a 3x1 vector. In other words, Julia makes a

## distinction between Array{T,1} and Array{T,2}.

# 3-element Array{Int64,1}:

# 1

# 2

# 3

## Arrays containing elements of a specific type can be constructed like:

a7 = Float64[3.0 5.0 ; 1.1 3.5]

## Arrays can be explicitly created like this:

Vector(undef, 3)

# 3-element Array{Any,1}:

# #undef

Matrix(undef, 2,2)

# 2x2 Array{Any,2}:

# #undef #undef

## A 3-element Float array

a3 = collect(Float64, 3:-1:1)

# 3-element Array{Float64,1}:

# 3.0

# 2.0

# 1.0

Julia has many built-in functions that generate specific kinds of arrays. Here are some useful ones:

• rand(T, d1, ..): if T is Float, a d1 -dimensional array of random numbers between 0 and 1 is returned; if an array is specified as the first argument, d1 random elements from the array are returned.

• randn(T, d1, ..) is a d1 -dimensional array of random numbers from the standard normal distribution with mean zero and standard deviation 1.

• MatrixT(I, (n,n)) is the n×n identity matrix. The identity operator I is available in the LinearAlgebra.jl package.

Arrays can easily be concatenated in Julia. There are two functions commonly used to concatenate arrays:

• vcat(A1, A2, ..) concatenates arrays vertically, i.e., stacks A1 on top of A2.

• hcat(A1, A2, ..) concatenates arrays horizontally, i.e., adds A2 to the right of A1.

The following code block illustrates some useful array functions as well as slicing. Slicing for arrays works similarly to slicing for strings.

## Create a 2x2 identity matrix

using LinearAlgebra

imat = Matrix{Int8}(I, (2,2))

## return random numbers between 0 and 1

rand(2)

#2-element Array{Float64,1}:

# 0.86398

# 0.491484

B = [80 81 82 ; 90 91 92]

# 2x3 Array{Int64,2}:

# 80 81 82

# 90 91 92

## return random elements of B

rand(B,2)

#2-element Array{Int64,1}:

# 80

# 91

## The number of elements in B

length(B)

# 6

## The dimensions of B

size(B)

# (2, 3)

## The number of dimensions of B

ndims(B)

# 2

## A new array with the same elements (data) as B but different dimensions

reshape(B, (3, 2))

# 3x2 Array{Int64,2}:

# 80 91

# 90 82

# 81 92

## A copy of B, where elements are recursively copied

B2 = deepcopy(B)

## When slicing, a slice is specified for each dimension

## The first two rows of the first column done two ways

B[1:2, ]

# 2-element Array{Int64,1}:

# 80

# 90

B[1:2,1]

## The first two rows of the second column

B[1:2,2]

# 2-element Array{Int64,1}:

# 81

# 91

## The first row

B[1,:]

# 3-element Array{Int64,1}:

# 80

# 81

# 82

## The third element

B[3]

#81

# Another way to build an array is using comprehensions

A1 = [sqrt(i) for i in [16,25,64]]

# 3-element Array{Float64,1}:

# 4.0

# 5.0

# 8.0

A2 = [i^2 for i in [1,2,3]]

# 3-element Array{Int64,1}:

# 1

# 4

# 9

From a couple of examples in the above code block, we can see that Julia counts array elements by column, i.e., the kth element of the n × m matrix X is the kth element of the nm-vector vec(X). Array comprehensions, illustrated above, are another more sophisticated way of building arrays. They generate the items in the array with a function and a loop. These items are then collected into an array by the brackets [] that surround the loop and function.

In Julia, dictionaries are defined as associative collections consisting of a key-value pair, i.e., the key is associated with a specific value. These key-value pairs have their own type in Julia, Pairtypeof(key), typeof(value) which creates a Pair object. Alternatively, the => symbol can be used to separate the key and value to create the same Pair object. One use of Pair objects is in the instantiation of dictionaries. Dictionaries in Julia can be used analogously to lists in R. Dictionaries are created using the keyword Dict and types can be specified for both the key and the value. The keys are hashed and are always unique.

## Three dictionaries, D0 is empty, D1 and D2 are the same

D0 = Dict()

D1 = Dict(1 => "red", 2 => "white")

D2 = Dict{Integer, String}(1 => "red", 2 => "white")

## Dictionaries can be created using a loop

food = ["salmon", "maple syrup", "tourtiere"]

food_dict = Dict{Int, String}()

## keys are the foods index in the array

for (n, fd) in enumerate(food)

food_dict[n] = fd

end

## Dictionaries can also be created using the generator syntax

wine = ["red", "white", "rose"]

wine_dict = Dict{Int,String}(i => wine[i] for i in 1:length(wine))

Values can be accessed using [] with a value of a dictionary key inserted between them or get(). The presence of a key can be checked using haskey() and a particular key can be accessed using getkey(). Keys can also be modified, as illustrated in the below code block. Here, we also demonstrate adding and deleting entries from a dictionary as well as various ways of manipulating keys and values. Note that the following code block builds on the previous one.

## Values can be accessed similarly to an array, but by key:

food_dict[1]

## The get() function can also be used; note that "unknown" is the

## value returned here if the key is not in the dictionary

get(food_dict, 1, "unknown")

get(food_dict, 7, "unknown")

## We can also check directly for the presence of a particular key

haskey(food_dict, 2)

haskey(food_dict, 9)

## The getkey() function can also be used; note that 999 is the

## value returned here if the key is not in the dictionary

getkey(food_dict, 1, 999)

## A new value can be associated with an existing key

food_dict

food_dict[1] = "lobster"

## Two common ways to add new entries:

food_dict[4] = "bannock"

get! (food_dict, 4, "bannock")

## The advantage of get!() is that is will not add the new entry if

## a value is already associated with the the key

get!(food_dict, 4, "toutiere")

## Just deleting entries by key is straightforward

delete!(food_dict,4)

## But we can also delete by key and return the value associated with

## the key; note that 999 is returned here if the key is not present

deleted_fd_value = pop!(food_dict,3, 999)

# Keys can be coerced into arrays

collect(keys(food_dict))

# Values can also be coerced into arrays

collect(values(food_dict))

# We can iterate over both keys and values

for (k, v) in food_dict

println("food_dict: key: ", k, " value: ", v)

end

# We could also just loop over keys

for k in keys(food_dict)

println("food_dict: key: ", k)

end

# Or could also just loop over values

for v in values(food_dict)

println("food_dict: value: ", v)

end

In Julia, a compound expression is one expression that is used to sequentially evaluate a group of subexpressions. The value of the last subexpression is returned as the value of the expression. There are two ways to achieve this: begin blocks and chains.

## A begin block

b1 = begin

c = 20

d = 5

c * d

end

println("b1: ", b1)

# 100

## A chain

b2 = (c = 20 ; d = 5 ; c * d)

println("b2: ", b2)

# 100

Conditional evaluation allows parts of a program to be evaluated, or not, based on the value of a Boolean expression, i.e., an expression that produces a true/false value. In Julia, conditional evaluation takes the form of an if-elseif-else construct, which is evaluated until the first Boolean expression evaluates to true or the else statement is reached. When a given Boolean expression evaluates to true, the associated block of code is executed. No other code blocks or condition expressions within the if-elseif-else construct are evaluated. An if-elseif-else construct returns the value of the last executed statement. Programmers can use as many elseif blocks as they wish, including none, i.e., an if-else construct. In Julia, if, elseif and else statements do not require parentheses; in fact, their use is discouraged.

# An if-else construct

k = 1

if k == 0

"zero"

else

"not zero"

end

# not zero

# An if-elseif-else construct

k = 11

if k % 3 == 0

elseif k % 3 == 1

else

end

# 2

An alternative approach to conditional evaluation is via shortcircuit evaluation. This construct has the form a ? b : c, where a is a Boolean expression, b is evaluated if a is true, and c is evaluated if a is false. Note that ? : is called the "ternary operator", it associates from right to left, and it can be useful for short conditional statements. Ternary operators can be chained together to accommodate situations analogous to an if-elseif-else construct with one or more ifelse blocks.

# A short-circuit evaluation

b= 10; c = 20;

println("SCE: b < c: ", b < c ? "less than" : "not less than")

# A short-circuit evaluation with nesting

d = 10; f = 10;

println("SCE: chained d vs e: ",

d < f ? "less than " :

d > f ? "greater than" : "equal")

# Note that we do not use e in the above example because it is a literal

# in Julia (the exponential function); while it can be overwritten, it is

# best practice to avoid doing so.

# ERROR: UndefVarError: e not defined

using Base.MathConstants

# e = 2.7182818284590…

Two looping constructs exist in Julia: for loops and while loops. These loops can iterate over any container, such as a string or an array. The body of a loop ends with the end keyword. Variables referenced inside loops are typically in the local scope of the loop. When using variables defined outside the body of the loop, pre-append them with the global keyword inside the body of the loop. A for loop can operate over a range object representing a sequence of numbers, e.g., 1:5, which it uses to get each index to loop through the range of values in the range, assigning each one to an indexing variable. The indexing variable only exists inside the loop. When looping over a container, for loops can access the elements of the container directly using the in operator. Rather than using simple nesting, nested for loops can be written as a single outer loop with multiple indexing variables forming a Cartesian product, e.g., if there are two indexing variables then, for each value of the first index, each value of the second index is evaluated.

str = "Julia"

## A for loop for a string, iterating by index

for i = 1:length(str)

print(str[i])

end

## A for loop for a string, iterating by container element

for s in str

print(s)

end

## A nested for loop

for i in str, j = 1:length(str)

println((i, j))

end

# ('J', 1)

# ('J', 2)

#..

# ('a', 4)

# ('a', 5)

## Another nested for loop

odd = [1,3,5]

even = [2,4,6]

for i in odd, j in even

println("i*j: $(i*j)")

end

# i*j: 2

# i*j: 4

# ..

# i*j: 20

# i*j: 30

A while loop evaluates a conditional expression and, as long as it is true, the loop evaluates the code in the body of the loop. To ensure that the loop will end at some stage, an operation inside the loop has to falsify the conditional expression. Programmers must ensure that a while loop will falsify the conditional expression, otherwise the loop will become “infinite” and never finish executing.

## Example of an infinite while loop (nothing inside the loop can falsify

## the condition x<10)

n=0

x=1

while x<10:

global n

n=n+1

end

## A while loop to estimate the median using an MM algorithm

using Distributions, Random

Random.seed!(1234)

iter = 0

N = 100

x = rand(Normal(2,1), N)

psi = fill!(Vector{Float64}(undef,2), 1e9)

while(true)

global iter, x, psi

iter += 1

if iter == 25

println("Max iteration reached at iter=$iter")

break

end

num, den = (0,0)

## elementwise operations in wgt

wgt = (abs.(x .- psi[2])).^-1

num = sum(wgt .* x)

den = sum(wgt)

psi = circshift(psi, 1)

psi[2] = num / den

dif = abs(psi[2] - psi[1])

if dif < 0.001

print("Converged at iteration $iter")

break

end

# gives an estimate of the median

median(x)

# 1.959

psi[2]

# 1.956

When writing loops, it is often advantageous to allow a loop to terminate early, before it has completed. In the case of a while loop, the loop would be broken before the test condition is falsified. When iterating over an iterable object with a for loop, it is stopped before the end of the object is reached. The break keyword can accomplish both tasks. The following code block has two loops, a while loop that calculates the square of the index variable and stops when the square is greater than 16. Note that without the break keyword, this is an infinite loop. The second loop does the same thing, but uses a for loop to do it. The for loop terminates before the end of the iterable range object is reached.

## break keyword

i = 0

while true

global i

sq = i^2

println("i: $i --- sq: = $sq")

if sq > 16

break

end

i += 1

end

# i: 0 --- sq: = 0

# i: 1 --- sq: = 1

# i: 2 --- sq: = 4

# i: 3 --- sq: = 9

# i: 4 --- sq: = 16

# i: 5 --- sq: = 25

for i = 1:10

sq = i^2

println("i: $i --- sq: = $sq")

if sq > 16

break

end

In some situations, it might be the case that a programmer wants to move from the current iteration of a loop immediately into the next iteration before the current one is finished. This can be accomplished using the continue keyword.

## continue keyword

for i in 1:5

if i % 2 == 0

continue

end

sq = i^2

println("i: $i --- sq: $sq")

end

# i: 1 --- sq: 1

# i: 3 --- sq: 9

# i: 5 --- sq: 25

In real world scenarios, continue could be used multiple times in a loop and there could be more complex code after the continue keyword.

Exceptions are unexpected conditions that can occur in a program while it is carrying out its computations. The program may not be able to carry out the required computations or return a sensible value to its caller. Usually, exceptions terminate the function or program that generates it and prints some sort of diagnostic message to standard output. An example of this is given in the following code block, where we try and take the logarithm of a negative number and the log() function throws an exception.

## Generate an exception

log(-1)

# ERROR: DomainError with -1.0:

# log will only return a complex result if called with a complex argument.

# Try log(Complex(x)).

# Stacktrace:

# [1] throw_complex_domainerror(::Symbol, ::Float64) at ./math.jl:31

# [2] log(::Float64) at ./special/log.jl:285

# [3] log(::Int64) at ./special/log.jl:395

# [4] top-level scope at none:0

In the above code block, the log() function threw a DomainError exception. Julia has a number of built-in exceptions that can be thrown and captured by a Julia program. Any exception can be explicitly thrown using the throw() function.

## throw()

for i in [1, 2, -1, 3]

if i < 0

throw(DomainError())

else

println("i: $(log(i))")

end

# i: 0.0

# i: 0.6931471805599453

# ERROR: MethodError: no method matching DomainError()

# Closest candidates are:

# DomainError(::Any) at boot.jl:256

# DomainError(::Any, ::Any) at boot.jl:257

# Stacktrace:

# [1] top-level scope at ./none:3

## error

for i in [1, 2, -1, 3]

if i < 0

error("i is a negative number")

else

println("i: $(log(i))")

end

# i: 0.0

# i: 0.6931471805599453

# ERROR: i is a negative number

# Stacktrace:

# [1] top-level scope at ./none:3

In the previous code block, we throw the DomainError() exception when the input to log() is negative. Note that DomainError() requires the brackets () to return an exception object. Without them, it is referring to the exception type. The error() function can be used in a similar way. It produces an object of type ErrorException that will immediately stop all execution of the Julia program.

If we want to test for an exception and handle it gracefully, we can use a try-catch statement to do this. These statements allow us to catch an exception, store it in a variable if required, and try an alternative way of processing the input that generated the exception.

## try/catch

for i in [1, 2, -1, "A"]

try log(i)

catch ex

if isa(ex, DomainError)

println("i: $i --- Domain Error")

log(abs(i))

else

println("i: $i")

println(ex)

error("Not a DomainError")

end

# i: -1 --- Domain Error

# i: A

# MethodError(log, ("A",), 0x00000000000061f0)

# ERROR: Not a DomainError

# Stacktrace:

# [1] top-level scope at ./none:10

In the previous code block, the exception is stored in the ex variable and when the error is not a DomainError(), its value is returned along with the ErrorException defined by the call to error(). Note that try-catch blocks can degrade the performance of code because of the overhead they require. For high-performance code, it is better to use standard conditional evaluation to handle known exceptions.

A function is an object that takes argument values as a tuple and maps them to a return value. Functions are first-class objects in Julia. They can be:

A first-class object is one that accommodates all operations other objects support. Operations typically supported by first-class objects in all programming languages are listed above. The basic syntax of a function is illustrated in the following code block.

function add(x,y)

return(x+y)

end

In Julia, function names are all lowercase, without underscores, but can include Unicode characters. It is best practice to avoid abbreviations, e.g., fibonacci() is preferable to fib(). The body of the function is the part contained on the lines between the function and end keywords. Parenthesis syntax is used to call a function, e.g., add(3, 5) returns 8. Because functions are objects, they can be passed around like any value and, when passed, the parentheses are omitted.

addnew = add

addnew(3,5)

# 8

Functions may also be written in assignment form, in which case the body of the function must be a single expression. This can be a very useful approach for simple functions because it makes code much easier to read.

Argument passing is done by reference. Modifications to the input data structure (e.g., array) inside the function will be visible outside it. If function inputs are not to be modified by a function, a copy of the input(s) should be made inside the function before doing any modifications. Python and other dynamic languages handle their function arguments in a similar way.

## Argument passing

function f1!(x)

x[1] = 9999

return(x)

end

ia = Int64[0,1,2]

println("Array ia: ", ia)

# Array ia: [0, 1, 2]

f1!(ia)

println("Argument passing by reference: ", ia)

# Argument passing by reference: [9999, 1, 2]

By default, the last expression that is evaluated in the body of a function is its return value. However, when the function body contains one or more return keywords, it returns immediately when a return keyword is evaluated. The return keyword usually wraps an expression that provides a value when returned. When used with the control flow statements, the return keyword can be especially useful.

## A function with multiple options for return

function gt(g1, g2)

if(g1 >g2)

return("$g1 is largest")

elseif(g1<g2)

return("$g2 is largest")

else

return("$g1 and $g2 are equal")

end

gt(2,4)

# "4 is largest"

The majority of Julia operators are actually functions and can be called with parenthesized argument lists, just like other functions.

## These are equivalent

2*3

# 6

*(2,3)

# 6

Functions can also be created without a name, and such functions are called anonymous functions. Anonymous functions can be used as arguments for functions that take other functions as arguments.

## map() applies a function to each element of an array and returns a new

## array containing the resulting values

a = [1,2,3,1,2,1]

mu = mean(a)

sd = std(a)

## centers and scales a

b = map(x -> (x-mu)/sd, a)

Julia accommodates optional arguments by allowing function arguments to have default values, similar to R and many other languages. The value of an optional argument does not need to be specified in a function call.

## A function with an optional argument. This is a recursive function,

## i.e., a function that calls itself, for computing the sum of the first n

## elements of the Fibonacci sequence: 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, …

function fibonacci(n=20)

if (n<=1)

return 1

else

return fibonacci(n-1)+fibonacci(n-2)

end

## Sum the first 12 elements of the Fibonacci sequence

fibonacci(12)

# 233

## Because the optional argument defaults to 20, these are equivalent

fibonacci()

fibonacci(20)

Function arguments determine its behaviour. In general, the more arguments a function has, the more varied its behaviour will be. Keyword arguments are useful because they help manage function behaviour; specifically, they allow arguments to be specified by name and not just position in the function call. In the below code block, an MM algorithm is demonstrated. Note that we have already used an MM algorithm in Section 2.5.3.1, but now we construct an MM algorithm as a function. MM algorithms are blueprints for algorithms that either iteratively minimize a majorizing function or iteratively maximize a minorizing function — see Hunter and Lange (2000, 2004) for further details.

## A function with a keyword argument

## Arguments after the ; are keyword arguments

## The default values are evaluated from left-to-right.

## This allows keyword arguments to refer to previously defined keywords

## Keyword arguments can have explicit types

## estimate the median of a 1D array using an MM algorithm

## for clarity (too many m's!) we use an _ in the function name

function mm_median(x, eps = 0.001; maxit = 25, iter::Int64=Int(floor(eps)))

## initalizations

psi = fill!(Vector{Float64}(undef,2), 1e2)

while(true)

iter + = 1

if iter == maxit

println("Max iteration reached at iter=$iter")

break

end

num, den = (0,0)

## use map() to do elementwise operations in wgt

wgt = map(d -> (abs(d - psi[2]))"(-1), x)

num = sum(map(*, wgt, x))

den = sum(wgt)

psi = circshift(psi, 1)

psi [2] = num / den

dif = abs(psi[2] - psi[1])

if dif < eps

print("Converged at iteration $iter")

break

end

return(Dict(

"psi_vec" => psi,

"median" => psi [2]

))

end

## Run on simulated data

using Distributions, Random

Random.seed!(1234)

N = Int(1e3)

dat = rand(Normal(0,6), N)

## Function calls using different types of arguments

median(dat)

# 0.279

mm_median(dat, 1e-9)["median"]

# Max iteration reached at iter=25

mm_median(dat, maxit=50)["median"]

# Converged at iteration 26

# 0.296

mm_median(dat, 1e-9, maxit=100)["median"]

# Converged at iteration 36

# 0.288

1. Write programs as a series of functions because: functions are testable and reusable, they have well-defined inputs and outputs, and code inside functions usually runs much faster because of how the Julia compiler works.

3. Functions with ! at the end of their names modify their arguments instead of copying them. Julia library functions often come in two versions, distinguished by the !.

5. When writing functions that take numbers as arguments, use the Int type when possible. When combined in numerical computations, they change the resulting Julia type of the result less frequently. This is known as type promotion.

6. If function arguments have specific types, the function call should force its arguments to be the required type.

## Tip3: Function with a ! in the name

a1 = [2,3,1,6,2,8]

sort!(a1)

#6-element Array{Int64,1}:

# 1

# 2

# 3

# 6

# 8

## Tip 4

## Do not wrap abs() in an anonymous function

A = [1, -0.5, -2, 0.5]

map(x -> abs(x), A)

# Rather, do this

## abs() is not wrapped in an anonymous function

map(abs, A)

##Tip 5: Type promotion

times1a(y) = *(y, 1)

times1b(y) = *(y, 1.0)

println("times1a(1/2): ", times1a(1/2))

println("times1a(2): ", times1a(2)) ## preserves type

println("times1a(2.0): ", times1a(2.0))

println("times1b(1/2): ", times1b(1/2))

println("times1b(2): ", times1b(2)) ## changes type

println("times1b(2.0): ", times1b(2.0))

## Tip6: Function with typed arguments

times1c(y::Float64) = *(y, 1)

times1c(float(23))