2 Basic Concepts

R is an object-oriented language. The main data structures (which are objects) include:

  • Vectors
  • Matrices
  • DataFrame & tibbles
  • Lists
  • Factors

These data objects can be manipulated using functions (which are also objects). There are built-in functions, but you can also create your own custom functions.
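As a minimal sketch, a built-in function and a custom function are called in the same way. The function name to_fahrenheit is hypothetical and chosen only for illustration.

```r
# Built-in function: arithmetic mean of a numeric vector
mean(c(2, 4, 6))
# [1] 4

# Custom function (hypothetical name, for illustration only)
to_fahrenheit <- function(celsius) {
  celsius * 9 / 5 + 32
}

to_fahrenheit(100)
# [1] 212
```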

R functions are stored in packages in your R library. Packages can be installed that give you additional functionalities (such as machine learning or graphing capabilities).

2.1 Vectors

A vector holds one or more values of the same type, like a one-dimensional array. A single value is simply a vector of length 1.

Each element in the vector is indexed from 1 to n. You can use these index numbers to extract an element, for example to get the 2nd element in the colours vector:

## [1] "orange"
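Output like this could come from code along the following lines. The contents of colours are an assumption, since only the 2nd element ("orange") appears in the text.

```r
# Hypothetical contents; only the 2nd element ("orange") is known from the text
colours <- c("red", "orange", "yellow")

colours[2]        # extract the 2nd element by index
# [1] "orange"

length(colours)   # number of elements (n)
# [1] 3
```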

2.2 Matrices

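These are 2-dimensional vectors. All elements must be of the same type; numeric matrices are the most common, but character and logical matrices are also allowed.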
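A small sketch of creating a matrix and indexing it by row and column. The values are arbitrary and chosen only for illustration.

```r
# Fill a 2 x 3 matrix column by column (R's default fill order)
m <- matrix(1:6, nrow = 2, ncol = 3)

dim(m)      # number of rows, number of columns
# [1] 2 3

m[2, 3]     # element in row 2, column 3
# [1] 6

m[, 1]      # the whole first column, returned as a vector
# [1] 1 2
```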

2.3 Dataframes

A dataframe is a 2-dimensional table in which each column is a vector, and different columns can hold different types. Dataframes can be created manually by combining vectors of equal length:
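One way this might look, using the names that appear in the printed tables in section 2.5. The age and gender of Jolie Hope are not shown in the text, so the values used for her here are assumptions.

```r
# Column vectors; the last Age and Gender values are assumed for illustration
name   <- c("John Doe", "Alice Liddel", "Peter Piper", "Jolie Hope")
age    <- c(25, 29, 34, 31)
gender <- c("M", "F", "M", "F")

# Combine the vectors into a dataframe, one vector per column
people <- data.frame(Name = name, Age = age, Gender = gender,
                     stringsAsFactors = FALSE)

dim(people)     # rows, columns
# [1] 4 3
```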

2.4 Tibbles

Tibbles are dataframes that can be easier to use, but you will need the tibble package, and usually other tidyverse packages such as dplyr, to work with them.
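A minimal sketch of creating a tibble and converting to and from a regular dataframe, assuming the tibble package is installed. The last Age and Gender values are assumptions for illustration.

```r
library(tibble)  # assumes the tibble package is installed

# Create a tibble directly (last Age and Gender values are assumed)
people_tbl <- tibble(
  Name   = c("John Doe", "Alice Liddel", "Peter Piper", "Jolie Hope"),
  Age    = c(25, 29, 34, 31),
  Gender = c("M", "F", "M", "F")
)

class(people_tbl)   # a tibble is still a data.frame underneath
# [1] "tbl_df"     "tbl"        "data.frame"

# Convert between the two representations
people_df   <- as.data.frame(people_tbl)
people_tbl2 <- as_tibble(people_df)
```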

2.5 Extracting data from Dataframes/Tibbles

Rows and columns can be extracted from dataframes using index numbers or column names.
To extract the FIRST COLUMN of values as a DataFrame:

## # A tibble: 4 x 1
##   Name        
##   <chr>       
## 1 John Doe    
## 2 Alice Liddel
## 3 Peter Piper 
## 4 Jolie Hope

To extract the FIRST ROW of values as a DataFrame:

## # A tibble: 1 x 3
##   Name       Age Gender
##   <chr>    <dbl> <chr> 
## 1 John Doe  25.0 M

To extract the FIRST ROW of the FIRST COLUMN (with a tibble this is returned as a 1 x 1 tibble, not as a Factor or bare value):

## # A tibble: 1 x 1
##   Name    
##   <chr>   
## 1 John Doe

If you need more than 1 row/column you can use a colon (:) to give a range. For example, to get the first 3 rows:

## # A tibble: 3 x 3
##   Name           Age Gender
##   <chr>        <dbl> <chr> 
## 1 John Doe      25.0 M     
## 2 Alice Liddel  29.0 F     
## 3 Peter Piper   34.0 M
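The four extractions above could be produced by code like the following sketch, assuming the tibble package is installed. The object name people and Jolie Hope's Age and Gender values are assumptions, since they do not appear in the text.

```r
library(tibble)  # assumes the tibble package is installed

# Hypothetical object name; last Age and Gender values are assumed
people <- tibble(
  Name   = c("John Doe", "Alice Liddel", "Peter Piper", "Jolie Hope"),
  Age    = c(25, 29, 34, 31),
  Gender = c("M", "F", "M", "F")
)

people[1]        # first column (4 x 1 tibble)
people["Name"]   # the same column, selected by name
people[1, ]      # first row (1 x 3 tibble)
people[1, 1]     # first row of the first column (1 x 1 tibble)
people[1:3, ]    # rows 1 to 3 (3 x 3 tibble)
```

The general pattern is `object[rows, columns]`; leaving one side of the comma empty selects everything along that dimension.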

If you are using a tibble the syntax is the same, but single-bracket subsetting always returns another tibble, whereas a base dataframe may simplify a single column to a vector.

You can convert a tibble column into a vector by using double brackets [[ ]] instead of single brackets. This can also be done using the pull() function from dplyr.

## [1] "John Doe"     "Alice Liddel" "Peter Piper"  "Jolie Hope"
## [1] "John Doe"
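A sketch of code that would give these two results. It uses a base dataframe so that it runs without extra packages; double brackets behave the same way on a tibble. The object name people and the last Age value are assumptions.

```r
# Hypothetical object name; the last Age value is assumed
people <- data.frame(
  Name = c("John Doe", "Alice Liddel", "Peter Piper", "Jolie Hope"),
  Age  = c(25, 29, 34, 31),
  stringsAsFactors = FALSE
)

people[["Name"]]      # whole column as a character vector
# [1] "John Doe"     "Alice Liddel" "Peter Piper"  "Jolie Hope"

people[["Name"]][1]   # first element of that vector
# [1] "John Doe"

# dplyr alternative, run only if dplyr is installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr::pull(people, Name)
}
```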

2.6 Lists

Lists can be used to group objects together. They can contain different types of object. They are a bit like dictionaries in Python. Each item in a List can be given a name.

You can access Lists in the same way as vectors (by index number or name), but the results will also be in a list structure. To prevent this you need to use double brackets:

## [1] 2019
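The difference between single and double brackets can be sketched as follows. The list name report and its contents are hypothetical; only the value 2019 appears in the text.

```r
# Hypothetical list mixing different types of object
report <- list(
  title   = "Annual sales",
  year    = 2019,
  regions = c("North", "South")
)

class(report["year"])    # single brackets return a list
# [1] "list"

report[["year"]]         # double brackets return the element itself
# [1] 2019

report$year              # $ is shorthand for [[ ]] with a name
# [1] 2019
```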

2.7 Factors

These are like vectors but more complicated: they store categorical data as integer codes plus a set of labels (levels). You will often want to convert them to regular vectors.
If you have a Factor you can convert it to a character or numeric vector:

## [1] "North" "South" "West"  "West"  "South"
## [1] 100 200 300 300 200
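Results like these could come from the conversions sketched below. The variable names region and amount are hypothetical. Note that a numeric factor should go through as.character() first, because as.numeric() on its own returns the internal level codes.

```r
# Hypothetical factors matching the values shown in the text
region <- factor(c("North", "South", "West", "West", "South"))
amount <- factor(c(100, 200, 300, 300, 200))

as.character(region)               # factor to character vector
# [1] "North" "South" "West"  "West"  "South"

as.numeric(as.character(amount))   # factor to numeric vector
# [1] 100 200 300 300 200

as.numeric(amount)                 # pitfall: integer level codes, not values
# [1] 1 2 3 3 2
```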

2.8 Dates

Dates are stored as the number of days since 1st Jan 1970. To store a date you can use the as.Date() function, which accepts dates written in the format ‘YYYY-MM-DD’. Other formats can be used if specified with the format argument.
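A brief sketch of these points; the specific dates are arbitrary examples.

```r
# Default format: YYYY-MM-DD
d <- as.Date("2019-03-15")

# Another format, described with the format argument
d2 <- as.Date("15/03/2019", format = "%d/%m/%Y")
d == d2
# [1] TRUE

# Internally a date is a day count from 1970-01-01
as.numeric(as.Date("1970-01-10"))
# [1] 9

# Subtracting dates gives the difference in days
as.numeric(d - as.Date("2019-03-01"))
# [1] 14
```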

2.9 Style Guide

  • R is case sensitive, so make sure you use the correct upper/lower case spelling
  • The three main data types are numeric, character and factor
  • Use the setwd("C:/…") function to indicate the working directory for your files
  • File paths must use forward slashes …/…/…/ (or doubled backslashes), because a single backslash is an escape character
  • Use == when evaluating equivalence, e.g. if (a==b) …
  • To time a block of code, run ptm <- proc.time() before it and proc.time() - ptm after it
  • To add comments, start them with #; R ignores everything from # to the end of the line
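A short sketch that combines comments, timing and an equality test. The code being timed and the variable names are arbitrary examples.

```r
# A full-line comment starts with a hash character

ptm <- proc.time()               # start the timer
total <- sum(sqrt(1:1e6))        # code being timed (arbitrary example)
elapsed <- proc.time() - ptm     # user, system and elapsed times in seconds
elapsed[["elapsed"]]             # wall-clock seconds

a <- 3
b <- 3
if (a == b) {                    # == compares, <- assigns
  print("a and b are equal")
}
```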