A short introduction to R



Juan C. Rocha
Stockholm Resilience Centre, Stockholm University

About Juan



  • Born in Colombia
  • Ecologist
    • Research assistant and consultant: UniAndes, WWF, Inter American Development Bank
  • MSc Ecosystems Governance
  • PhD Sustainability Science
  • Research interests:
    • Regime Shifts
    • Complex systems
    • Networks
    • Collective action

Slides: juanrocha.se/presentations/R_Intro
Recommended browser: Chrome (for nicer fonts)

Learning objectives

What you will learn

Session  Content                                                            When
1        How the R language works: data types, functions, and other basics  April
2        Data cleaning, wrangling, and visualization                        May
3        Tools for reproducible research and workflows                      June
4        Useful packages for SES research: clustering and ordination        September
5        Tools for geographic analysis and mapping                          October
6        Tools for time series analysis                                     November
7        Tools for network analysis                                         December
8        Tools for text analysis/text mining                                January
9        Web-based interactive data tools                                   February
10       Multilingual programming                                           March

What you won’t learn

  • Statistics*
  • Machine learning
  • Advanced programming



You will get a soft intro to each topic, but each topic is worth a course on its own

Why R?

  • Wrong question!…However:
  • Open source, dynamic language & welcoming community of developers
  • Many people at SRC and in the environmental science community speak R
  • R is a programming language with strong roots in statistics, widely used in the social and natural sciences

Why programming?

  • Empower you to do different kinds of science
    • Accessing data
    • Applying state-of-the-art statistical techniques
    • Updating as new data becomes available
    • Communicating to others
  • Helps you structure your thinking
  • Reproducibility
  • Laziness is a core value: don’t waste time in repetition

There are >350 programming languages (Valverde 2016)

How many languages do you speak?

R language

R is an object-oriented programming language. Learning R takes time and practice, as with any other language. The more you use it, the better you become at speaking it. Do you have R already installed? Do you have RStudio? If not, please follow:

Outline

  • How does R look?
  • R as calculator
  • Basic objects and classes
  • Assignments and comparisons
  • Functions & Packages
  • Graphics
  • Getting help

Good looking R

  • In basic R you will have the console and some scripts
  • In RStudio you have:
    1. Console
    2. Scripts
    3. Environment
    4. Plots

RStudio

R as calculator

You can use R to calculate simple and complex operations:

4 + 5
[1] 9
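
Beyond addition, a few more operators are worth knowing. The lines below are a minimal sketch using only base R arithmetic; the numbers are arbitrary examples.

```r
7 %/% 2      # integer division: 3
7 %% 2       # remainder (modulo): 1
2^10         # exponentiation: 1024
sqrt(16)     # square root: 4
(4 + 5) * 2  # parentheses control the order of operations: 18
```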

You can store the results of any operation by assigning it to an object. The assignment operator <- does it for you

a <- (35 * 43)/3
a
[1] 501.6667

Sometimes people use = instead of <- for assignments

b = 45/9  # But <- is preferred
b
[1] 5

Note that you can add comments to your code with #

Objects & Classes

Object classes by number of dimensions and type of content

Dimensions  Homogeneous  Heterogeneous
1           Vector       List
2           Matrix       Data frame
3           Array        NA

One can think of a vector as a line of objects of the same kind

a <- c(1, 3, 5, 8)  # numeric vector

# Let's check its class
class(a)
[1] "numeric"

vector: 1d object

b <- c("Celinda", "Juan", "Miriam")  # character vector
c <- c(TRUE, TRUE, FALSE)  # logical vector

# Check
class(b)
[1] "character"
class(c)
[1] "logical"
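
Because a vector holds objects of one kind only, R silently converts (coerces) values when you mix types. The sketch below illustrates this; the object names mixed and num_lgl are made up for the example.

```r
mixed <- c(1, "a", TRUE)  # numeric, character and logical together
mixed                     # everything became text: "1" "a" "TRUE"
class(mixed)              # "character"

num_lgl <- c(1, TRUE, FALSE)  # logical values become 1 and 0
num_lgl                       # 1 1 0
class(num_lgl)                # "numeric"
```

If a vector unexpectedly has class "character", a stray text value somewhere in the data is a likely cause.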

Vector indexing

You can access the value of any element of your vectors by referring to their position using []

a[2]  # second element of vector a
[1] 3
a[2:4]  # you can use : to make a sequence of numbers, here 2 to 4
[1] 3 5 8
b[c]  # you can pass an indexing vector to another vector!
[1] "Celinda" "Juan"   

Unlike many other languages (e.g. Python or C), R indexing starts at 1
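
A few more ways of indexing, sketched with the same numeric vector as in the slides (redefined here so the example runs on its own):

```r
a <- c(1, 3, 5, 8)

a[c(1, 4)]    # first and fourth elements: 1 8
a[-1]         # a negative index drops that position: 3 5 8
a[a > 2]      # keep elements that satisfy a condition: 3 5 8
a[length(a)]  # last element: 8
```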

matrix: 2d objects

m <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)
m
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
dim(m)  # How big is it?
[1] 3 4
m[3, 4]  # Indexing
[1] 12
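
Leaving one index empty selects a whole row or column. A minimal sketch with the same matrix (redefined so the example is self-contained):

```r
m <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)

m[2, ]           # second row: 5 6 7 8
m[, 3]           # third column: 3 7 11
m[1:2, c(1, 4)]  # rows 1 to 2 of columns 1 and 4
dim(t(m))        # t() transposes, so the dimensions swap: 4 3
```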

array: multi-dim objects

z <- array(1:12, dim = c(2, 2, 3))
z
, , 1

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 2

     [,1] [,2]
[1,]    5    7
[2,]    6    8

, , 3

     [,1] [,2]
[1,]    9   11
[2,]   10   12
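
Arrays are indexed like matrices, with one index per dimension. A short sketch using the same array (redefined here):

```r
z <- array(1:12, dim = c(2, 2, 3))

dim(z)      # 2 2 3
z[1, 2, 3]  # row 1, column 2, slice 3: 11
z[, , 2]    # the whole second slice, a 2 x 2 matrix
```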

list: mixed objects

A mix of objects that don’t need to be of the same class or length

l <- list(a, b, c, m)  # unnamed list
l[[1]]
[1] 1 3 5 8
l <- list(a = a, b = b, c = c, m = m)  # named list
l$m  # you can use $ to access elements by name
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12

To access the elements of a list you should use [[]] by position, or $ by name
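
The difference between single and double brackets is easy to miss. In this sketch (a smaller version of the named list, redefined so it runs on its own), [ ] returns a shorter list, while [[ ]] returns the element itself:

```r
l <- list(a = c(1, 3, 5, 8), b = c("Celinda", "Juan", "Miriam"))

class(l["a"])    # single brackets keep the container: "list"
class(l[["a"]])  # double brackets extract the element: "numeric"
l[["a"]][2]      # second value of element a: 3
names(l)         # "a" "b"
```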

list: mixed objects

class(l)
[1] "list"
class(l[[4]])
[1] "matrix" "array" 
length(l)
[1] 4

If lists are mixed objects, can you put a list within a list? – Yes!

Data frames

Data frames are among the most used objects in R. Each column of a data frame can hold a different type of data

dat <- data.frame(books = a[2:4], student = b, pass = c)
dat
  books student  pass
1     3 Celinda  TRUE
2     5    Juan  TRUE
3     8  Miriam FALSE
str(dat)  # reveals the structure of your data
'data.frame':   3 obs. of  3 variables:
 $ books  : num  3 5 8
 $ student: chr  "Celinda" "Juan" "Miriam"
 $ pass   : logi  TRUE TRUE FALSE

summary()

  • str() and summary() are very useful functions to understand what your data looks like
  • head(dat) gives you the first 6 rows of your dataset, and
  • tail(dat, n = 3) gives you the last 3.

summary(dat)
     books         student             pass        
 Min.   :3.000   Length:3           Mode :logical  
 1st Qu.:4.000   Class :character   FALSE:1        
 Median :5.000   Mode  :character   TRUE :2        
 Mean   :5.333                                     
 3rd Qu.:6.500                                     
 Max.   :8.000                                     
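
A quick sketch of head() and tail() on the small data frame from the slides (rebuilt here so the example is self-contained). With only three rows, the n argument is set lower than the default of 6:

```r
dat <- data.frame(
  books   = c(3, 5, 8),
  student = c("Celinda", "Juan", "Miriam"),
  pass    = c(TRUE, TRUE, FALSE)
)

head(dat, n = 2)  # first two rows
tail(dat, n = 1)  # last row
nrow(dat)         # number of rows: 3
names(dat)        # column names: "books" "student" "pass"
```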

Tricks for later…

Tibbles are data frames with fancier printing. They behave like a data.frame but print more compactly and show the type of each column.

tibble::as_tibble(gapminder::gapminder)
# A tibble: 1,704 × 6
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# … with 1,694 more rows

Indexing data frames

You can index by position

dat[2, 3]
[1] TRUE
dat
  books student  pass
1     3 Celinda  TRUE
2     5    Juan  TRUE
3     8  Miriam FALSE
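
Besides position, columns can be selected by name and rows by a logical condition. A minimal sketch with the same data frame (rebuilt here so it runs on its own):

```r
dat <- data.frame(
  books   = c(3, 5, 8),
  student = c("Celinda", "Juan", "Miriam"),
  pass    = c(TRUE, TRUE, FALSE)
)

dat$student                 # one column by name, returned as a vector
dat[["books"]]              # same idea with double brackets
dat[2, "student"]           # row 2 of column student: "Juan"
dat[dat$pass, ]             # only the rows where pass is TRUE
dat[, c("student", "pass")] # several columns by name
```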

Comparison

Remember

m  # our matrix with 1:12
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
a  # our vector
[1] 1 3 5 8

We can do some element-wise comparison with binary operators

Comparison

x <- stats::rnorm(5)
x < 1
[1]  TRUE  TRUE FALSE  TRUE  TRUE
x[x > 0]
[1] 0.9917544 1.3410942
x1 <- 0.5 - 0.3
x2 <- 0.3 - 0.1
x1
[1] 0.2
x2
[1] 0.2
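
The two values above print the same, but comparing them with == can still be surprising because of how computers store decimal numbers. A sketch of the safer comparison with all.equal(), which allows a small tolerance:

```r
x1 <- 0.5 - 0.3
x2 <- 0.3 - 0.1

x1 == x2                   # FALSE on most machines, despite both printing 0.2
isTRUE(all.equal(x1, x2))  # TRUE: equal up to a small numerical tolerance
print(x2, digits = 17)     # shows the tiny difference hidden by default printing
```

As a rule of thumb, avoid == for numbers that result from decimal arithmetic.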

Assignment & the pipe

The assignment operator <- allows you to update or create new objects

l[[5]] <- l  # A list inside a list
length(l)
[1] 5
class(l)
[1] "list"
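
The pipe passes the result on its left as the first argument of the function on its right, so a chain of operations reads from left to right. The sketch below assumes R version 4.1 or later, where the native pipe |> is available; the names nested and piped are made up for the example.

```r
a <- c(1, 3, 5, 8)

nested <- sqrt(sum(a))          # nested calls are read from the inside out
piped  <- a |> sum() |> sqrt()  # the same computation, read left to right

identical(nested, piped)        # TRUE
```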

Functions

Where the magic starts…

What is magic?

Discuss in pairs for 3 minutes

Functions

Everything that exists is an object. Everything that happens is a function call. - John Chambers

# You can see how a function looks by typing its name without ()
matrix
function (data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL) 
{
    if (is.object(data) || !is.atomic(data)) 
        data <- as.vector(data)
    .Internal(matrix(data, nrow, ncol, byrow, dimnames, missing(nrow), 
        missing(ncol)))
}
<bytecode: 0x7ff0030aa320>
<environment: namespace:base>

  • The names inside () are the arguments; the values shown are their defaults, e.g. byrow = FALSE
  • <environment: namespace:base> means that matrix is a function of the package base
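
To see what a default value does in practice, the sketch below calls matrix() with and without byrow; the names by_col and by_row are made up for the example.

```r
by_col <- matrix(1:6, nrow = 2)                # default byrow = FALSE: filled column by column
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)  # override the default: filled row by row

by_col[1, 2]           # 3
by_row[1, 2]           # 2
formals(matrix)$byrow  # the default value stored in the function: FALSE
args(matrix)           # prints the arguments without the function body
```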

Functions & Packages 📦

We will see more about functions and how to write them in the future. For now, it’s good to know that functions are procedures or routines written by other R users and programmers to make your life simpler (most of the time!). R comes with a number of basic functions by default, such as mean() or matrix(). Functions that work together for a certain purpose (e.g. network analysis, linear regression) come in packages that you need to download and install once, and then load in every session where you want to use them.

# install.packages("network")  # Install a new package (only needed once)
library(network)  # Load it into your session

'network' 1.17.1 (2021-06-12), part of the Statnet Project
* 'news(package="network")' for changes since last version
* 'citation("network")' for citation information
* 'https://statnet.org' for help, support, and other information
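
Two related idioms, sketched here under the assumption that you may or may not have the network package installed: requireNamespace() checks whether a package is available without loading it, and the package::function notation calls a function from a package without library(). The names has_network and med are made up for the example.

```r
# TRUE if the package is installed, FALSE otherwise (nothing is installed here)
has_network <- requireNamespace("network", quietly = TRUE)
if (!has_network) {
  message("Package 'network' is not installed; run install.packages(\"network\") once.")
}

# Call a function by its full name: package::function
med <- stats::median(c(1, 3, 5, 8))
med  # 4
```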

Functions & Packages 📦

# Now we can build a network
m <- matrix(rbinom(25^2, 1, 0.1), 25, 25)
net <- network(m)
net
 Network attributes:
  vertices = 25 
  directed = TRUE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 60 
    missing edges= 0 
    non-missing edges= 60 

 Vertex attribute names: 
    vertex.names 

No edge attributes

Packages 📦

At the time of writing, the CRAN package repository features 19021 available packages.

Graphics

One of the most powerful features of R is its graphics capabilities. Visualizations help you explore and better understand your data, and they inspire new questions and approaches to analysis. Some useful packages and functions:

  • base::plot
  • grid
  • lattice
  • ggplot2
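
As a first taste, here is a minimal base R scatter plot. The data are invented for illustration: books follows the toy example from earlier slides, and hours is a hypothetical variable.

```r
books <- c(3, 5, 8)     # books read per student
hours <- c(10, 14, 25)  # hypothetical hours of study

plot(
  books, hours,
  xlab = "Books read",
  ylab = "Hours of study",
  main = "A first scatter plot",
  pch  = 19  # solid circles
)
```

In RStudio the figure appears in the Plots pane.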

Plot examples: base

Plot examples: ggplot

Plot examples: circos

Plot examples: networks

Plot examples: ggplot or heatmaps

Getting help

Learning R is an iterative process of doing something, getting stuck, asking for help, solving your problem, and carrying on.

help("mean")

You can also type ?mean to open the same help page, or ??mean to search all installed documentation for the term mean
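
A few other base R helpers are handy when you do not remember a function's exact name or arguments. This is a small sketch; mean_like is a made-up object name.

```r
args(mean)                      # show the arguments of a function
mean_like <- apropos("^mean")   # find objects whose names start with "mean"
mean_like

# example(mean)  # uncomment to run the examples from the help page of mean
```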

Getting help

All languages have dialects

stairs | escalators, lift | elevators, trucks | lorries, pants | trousers

base R

dat[dat$books < 5 & dat$pass == TRUE, ]
  books student pass
1     3 Celinda TRUE
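
The same filter can be written in another base R "dialect" with subset(), which lets you use column names directly. This is a sketch on the toy data frame (rebuilt here so it runs on its own); the tidyverse version in the last comment assumes the dplyr package is installed.

```r
dat <- data.frame(
  books   = c(3, 5, 8),
  student = c("Celinda", "Juan", "Miriam"),
  pass    = c(TRUE, TRUE, FALSE)
)

res_bracket <- dat[dat$books < 5 & dat$pass, ]  # bracket indexing
res_subset  <- subset(dat, books < 5 & pass)    # subset() dialect

res_subset  # one row: Celinda, 3 books, pass TRUE

# dplyr::filter(dat, books < 5, pass)  # tidyverse dialect, needs dplyr
```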


Be patient: diversity requires an open mind, but it is good because it gives you redundancy.

Resources

Exercise

  1. Open an RStudio session
  2. Load the gapminder data
  3. Which country has the highest gdpPercap in history?
    • Tip: sort() and order()
  4. Do countries with larger populations have higher GDP?
    • Tip: plot()
  5. What is the highest lifeExp, where and when does it happen?
    • Tip: max()
  6. Visualize the life expectancy over time for Sweden
    • Tip: plot(), with()
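
Before tackling gapminder, the functions in the tips can be tried on the toy data frame from the slides. This is only a warm-up sketch on invented data (rebuilt here so it runs on its own), not a solution to the exercise; the names sorted and top_student are made up.

```r
dat <- data.frame(
  books   = c(3, 5, 8),
  student = c("Celinda", "Juan", "Miriam"),
  pass    = c(TRUE, TRUE, FALSE)
)

# order() returns row positions, which you can use to sort the data frame
sorted <- dat[order(dat$books, decreasing = TRUE), ]
sorted$student[1]  # "Miriam"

max(dat$books)                                   # highest value: 8
top_student <- dat$student[which.max(dat$books)] # who has it: "Miriam"

# with() lets you refer to columns without typing dat$ each time
with(dat, mean(books[pass]))  # mean books among students who passed: 4
```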