- Structure of simple functions
- Advanced functions for iteration
  - if, ifelse, for, while
  - apply() and its family (tapply, lapply, sapply, mapply)
- Homework
- Exercises
Juan C. Rocha
Stockholm Resilience Centre, Stockholm University
- In the gapminder data, country is of the class factor. Discuss in pairs: what are factors, and when are they useful? See help(factor) and help(levels).
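- How many countries are there in the gapminder data?
- Use for(), median(), and range() to compute, for each country, the average life expectancy, the population range, and the median GDP per capita.
- Plot lifeExp vs median gdpPercap. See ??base::plot.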
How many countries are there in the gapminder data?
[1] "factor"
[1] 142
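The two values above look like the output of `class()` and `nlevels()` applied to `gapminder$country`. That pairing is an assumption, since the slide does not show the calls. A minimal sketch, with a small made-up factor `sizes` added to illustrate what levels are:

```r
library(gapminder)

class(gapminder$country)    # "factor"
nlevels(gapminder$country)  # 142 distinct countries

# A factor stores categorical data as integer codes plus a set of labels (levels).
# `sizes` is a hypothetical example, not part of the gapminder data.
sizes <- factor(c("small", "large", "small"), levels = c("small", "large"))
levels(sizes)      # "small" "large"
as.integer(sizes)  # 1 2 1
table(sizes)       # counts per level
```

Factors are useful whenever a variable takes a fixed set of categories, for example as a grouping variable in summaries, models, or plots.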
Base R solution:
library(gapminder)

countries <- levels(gapminder$country)
# when working with loops, it is good practice to
# declare the objects that will collect your results
ale <- list()     # average life expectancy
pop_rng <- list() # population range
med_gdp <- list() # median gdp
for (i in seq_along(countries)) {
  # logical index of the rows belonging to the i-th country
  rows <- gapminder$country == countries[i]
  # use $ so that each function receives a plain numeric vector
  ale[[i]] <- mean(gapminder$lifeExp[rows])
  pop_rng[[i]] <- range(gapminder$pop[rows])
  med_gdp[[i]] <- median(gapminder$gdpPercap[rows], na.rm = TRUE)
}
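The same per-country summaries can also be computed with the apply family named in the outline. The following is a minimal sketch, not the lecture's own solution. The names `ale`, `pop_rng`, and `med_gdp` are reused from the loop for comparison, while `pop_by_country`, `pop_rng_mat`, and `pop_span` are hypothetical helpers.

```r
library(gapminder)

# tapply(): apply a function to a vector, grouped by a factor
ale <- tapply(gapminder$lifeExp, gapminder$country, mean)
med_gdp <- tapply(gapminder$gdpPercap, gapminder$country, median)

# split() + lapply(): one list element per country, each holding c(min, max)
pop_by_country <- split(gapminder$pop, gapminder$country)
pop_rng <- lapply(pop_by_country, range)

# sapply() tries to simplify the result: here a matrix with 2 rows
# (min, max) and one column per country
pop_rng_mat <- sapply(pop_by_country, range)

# mapply() walks over several inputs in parallel: here min and max
pop_span <- mapply(function(lo, hi) hi - lo, pop_rng_mat[1, ], pop_rng_mat[2, ])

head(ale, 3)
```

Compared with the loop, there is no container to declare and no index to manage, and the results keep the country names.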
[Plot: lifeExp vs median gdpPercap]
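One way to draw this plot with base graphics is sketched below. The log scale on the x axis and the axis labels are assumptions made for readability, not taken from the original slide.

```r
library(gapminder)

# per-country summaries as named numeric vectors
ale <- sapply(split(gapminder$lifeExp, gapminder$country), mean)
med_gdp <- sapply(split(gapminder$gdpPercap, gapminder$country), median)

# one point per country
plot(
  x = med_gdp, y = ale,
  log = "x",  # assumption: GDP per capita is easier to read on a log scale
  xlab = "Median GDP per capita",
  ylab = "Average life expectancy (years)"
)
```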
- for (i in something) … do something: you know how many iterations you need
- while some condition is F/T … do something: you don’t know how many iterations it will take
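A small illustration of the difference, using made-up inputs. The `for` loop runs once per element of `x`. The `while` loop follows the Collatz rule (halve even numbers, otherwise multiply by 3 and add 1) and stops when it reaches 1, so the number of steps is only known once it finishes.

```r
# for: the number of iterations is known up front (one per element)
x <- c(2, 5, 8)
squares <- numeric(length(x))  # declare the container first
for (i in seq_along(x)) {
  squares[i] <- x[i]^2
}
squares  # 4 25 64

# while: keep going until a condition is met
n <- 6
steps <- 0
while (n != 1) {
  n <- if (n %% 2 == 0) n / 2 else 3 * n + 1
  steps <- steps + 1
}
steps  # 8
```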
purrr
All functions are type-stable:
- map(): list
- map_dbl(): double
- map_chr(): character
- map_int(): integer
- map_raw(): raw vector
- map_dfr(): data frame binding rows
- map_dfc(): data frame binding columns
- modify(): same as input
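A short sketch of what type stability means in practice. The list `x` and the object names are made up for illustration. Each variant either returns the promised type or fails with an error, instead of silently changing type.

```r
library(purrr)

x <- list(a = 1:3, b = 4:6, c = 7:9)

as_list <- map(x, mean)                              # always a list
as_dbl  <- map_dbl(x, mean)                          # double vector: 2 5 8
as_int  <- map_int(x, length)                        # integer vector
as_chr  <- map_chr(x, \(v) paste(v, collapse = "-")) # character vector
as_same <- modify(x, \(v) v * 2L)                    # same type as the input (a list)

# asking for doubles but returning text is an error, not a coercion
res <- tryCatch(map_dbl(x, \(v) "text"), error = function(e) "error")
res
```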
Call:
lm(formula = gdpPercap ~ year, data = gapminder)

Coefficients:
(Intercept)         year
  -249693.7        129.8
library(gapminder)
library(ggplot2)
gapminder |>
  ggplot(aes(year, gdpPercap)) +
  geom_smooth(method = "lm") + theme_light(base_size = 10)
library(gapminder)
library(tidyverse)
# fit one linear model per country and collect the slope on year
split(gapminder, ~country) |>
  map(~ lm(gdpPercap ~ year, data = .)) |>
  map(summary) |>
  map(coef) |>
  map_dfr(function(x) x["year", "Estimate"]) |>
  pivot_longer(cols = everything(), names_to = "country", values_to = "slope") |>
  arrange(slope)
# A tibble: 142 × 2
country slope
<chr> <dbl>
1 Kuwait -1584.
2 Iraq -48.6
3 Nicaragua -29.4
4 Angola -23.3
5 Djibouti -20.1
6 Madagascar -14.1
7 Congo, Dem. Rep. -12.9
8 Central African Republic -9.81
9 Haiti -9.55
10 Somalia -7.91
# … with 132 more rows
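Since each model contributes a single number, the same slopes can also be collected directly into a named double vector. This is an alternative sketch, not the pipeline used in the lecture. It assumes R 4.1 or later for the `\(x)` shorthand, and `slopes` is a hypothetical name.

```r
library(gapminder)
library(purrr)

slopes <- split(gapminder, gapminder$country) |>
  map(\(df) lm(gdpPercap ~ year, data = df)) |>
  map_dbl(\(fit) coef(fit)[["year"]])  # one slope per country

# countries with the steepest decline in GDP per capita
head(sort(slopes), 3)
```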
[1] 2 5 8
When a function fails, the whole process typically stops. What if you need the iteration to run to completion while keeping track of the errors?
Error in log("juan"): non-numeric argument to mathematical function
List of 2
$ result: num 2.3
$ error : NULL
List of 2
$ result: NULL
$ error :List of 2
..$ message: chr "non-numeric argument to mathematical function"
..$ call : language .Primitive("log")(x, base)
..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
When there is an error, the result is NULL and the error is captured in the error element.
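The listings above have the shape returned by a function wrapped with purrr's `safely()`. A minimal sketch follows. The names `safe_log`, `ok`, and `bad` are hypothetical, and the inputs are chosen to match the output shown.

```r
library(purrr)

# log("juan") on its own throws an error and stops the script.
# safely() wraps a function so that it always returns a list with
# two elements, result and error, instead of failing.
safe_log <- safely(log)

ok <- safe_log(10)
str(ok)   # result is a number, error is NULL

bad <- safe_log("juan")
str(bad)  # result is NULL, error holds the condition object

conditionMessage(bad$error)
```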
List of 3
$ :List of 2
..$ result: num 0
..$ error : NULL
$ :List of 2
..$ result: num 2.3
..$ error : NULL
$ :List of 2
..$ result: NULL
..$ error :List of 2
.. ..$ message: chr "non-numeric argument to mathematical function"
.. ..$ call : language .Primitive("log")(x, base)
.. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
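The three-element listing above is what you get when a safely-wrapped function is mapped over a list containing one bad input. The input `x` below is an assumption, chosen to match the printed results (it follows the example in R for Data Science).

```r
library(purrr)

x <- list(1, 10, "a")

# every element is processed; the failure is recorded, not raised
y <- map(x, safely(log))
str(y)
```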
List of 2
$ result:List of 3
..$ : num 0
..$ : num 2.3
..$ : NULL
$ error :List of 3
..$ : NULL
..$ : NULL
..$ :List of 2
.. ..$ message: chr "non-numeric argument to mathematical function"
.. ..$ call : language .Primitive("log")(x, base)
.. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
Here you can spot the errors by eye, but what if you have too many objects to inspect?
[[1]]
[1] "a"
[1] 0.000000 2.302585
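The outputs above (a list of results and a list of errors, then the failing input and the successful values) can be reproduced along these lines. This is a sketch assuming the same input `x` as in R for Data Science. `transpose()` still works but is superseded by `list_transpose()` in purrr 1.0 and later.

```r
library(purrr)

x <- list(1, 10, "a")

# turn a list of (result, error) pairs into a pair of lists
y <- x |> map(safely(log)) |> transpose()

# TRUE where no error was recorded
is_ok <- map_lgl(y$error, is.null)

x[!is_ok]                # inputs that failed
unlist(y$result[is_ok])  # results that succeeded
```

Because `is_ok` is an ordinary logical vector, the same lines work for three inputs or three thousand.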
Advantages:
purrr offers other adverb functions:
- possibly()
- quietly()
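A brief sketch of both adverbs, using the same made-up input as before. `possibly()` replaces a failure with a default value. `quietly()` captures printed output, messages, and warnings instead of errors.

```r
library(purrr)

x <- list(1, 10, "a")

# possibly(): return NA instead of stopping on the bad input
vals <- map_dbl(x, possibly(log, otherwise = NA_real_))
vals

# quietly(): log(-1) normally warns; here the warning is stored
quiet_log <- quietly(log)
res <- quiet_log(-1)
res$result    # NaN
res$warnings  # the captured warning text
```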
- Use the tidyverse to summarize numeric and non-numeric variables. Extract, for example, means, max, min, number of observations, and missing values.
- Use map() to automate the process for all variables of the same type.
- See keep() and discard().
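To get started, here is a small illustration of how `keep()` and `discard()` select columns by type. The data frame `df` is made up, and this is a starting point rather than a full solution to the exercise.

```r
library(purrr)

# hypothetical toy data, not from gapminder
df <- data.frame(
  height = c(1.6, 1.8, NA),
  weight = c(60, 82, 75),
  group  = c("a", "b", "a")
)

num_vars   <- keep(df, is.numeric)     # columns for which the test is TRUE
other_vars <- discard(df, is.numeric)  # columns for which it is FALSE

# summarize all numeric columns in one call
map_dbl(num_vars, mean, na.rm = TRUE)

# count missing values in every column
n_missing <- map_int(df, \(v) sum(is.na(v)))
n_missing
```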
Prepare for next class an explanation of the pmap() and walk() functions. Bring an example of when they are useful.

All lecture notes are based on Hadley Wickham’s books R for Data Science and Advanced R.