Data Structures in R

Md Zulquar Nain

AMU

February 15, 2024

Data Structures in R

  • For analysis, data should be structured in a well-defined manner.
  • R stores data, including imported data, in one of the following structures:
    • Vectors
    • Matrices
    • Lists
    • Arrays
    • Data frames
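
As a quick orientation, the sketch below builds one small object of each type listed above. The object names (`v`, `m`, `l`, `arr`, `df`) and all values are illustrative assumptions, not taken from the slides.

```r
# One small example of each structure (names and values are illustrative)
v   <- c(1.5, 2.0, 3.5)                           # vector: values of one type
m   <- matrix(1:6, nrow = 2)                      # matrix: two dimensions, one type
l   <- list(id = 1, tag = "a")                    # list: components of any type
arr <- array(1:24, dim = c(2, 3, 4))              # array: any number of dimensions (here three)
df  <- data.frame(x = 1:3, y = c("a", "b", "c"))  # data frame: columns may differ in type

# Inspect the dimensions of the array and the column types of the data frame
dim(arr)
sapply(df, class)
```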

Vectors

  • Scalars vs. vectors?
  • A vector is a group of values of the same data type
  • The simplest way to define a vector is the function c(value1, value2, ...)
  • All the operators and functions discussed earlier can also be used with vectors
  • Arithmetic operators and most mathematical functions are applied element by element
# define a vector `vec` with the function `c`
vec <- c(2, 4, -4, 0)
vec
[1]  2  4 -4  0
print(vec)
[1]  2  4 -4  0
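
To make the element-by-element behaviour concrete, here is a minimal sketch using the same `vec`. The second vector `w` is an illustrative assumption.

```r
vec <- c(2, 4, -4, 0)

vec + 1        # adds 1 to every element: 3 5 -3 1
vec * 2        # doubles every element: 4 8 -8 0
vec^2          # squares every element: 4 16 16 0
abs(vec)       # absolute value of every element: 2 4 4 0

# Two vectors of equal length are combined position by position
w <- c(1, 1, 2, 2)   # illustrative second vector
vec + w              # 3 5 -2 2
```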

Vectors

  • Other ways to generate vectors
Special functions to create vectors
Function            Explanation
numeric(n)          Vector of n zeros
rep(x, n)           Vector with x repeated n times
seq(x) or 1:x       Sequence from 1 to x
seq(f, x) or f:x    Sequence from f to x
seq(f, x, s)        Sequence from f to x in steps of s
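
A short sketch of these functions in use. The variable names and the argument values are illustrative assumptions.

```r
z  <- numeric(3)        # 0 0 0
r  <- rep(7, 4)         # 7 7 7 7
s1 <- seq(5)            # 1 2 3 4 5
s2 <- seq(2, 6)         # 2 3 4 5 6
s3 <- 2:6               # same values as seq(2, 6)
s4 <- seq(0, 1, 0.25)   # 0.00 0.25 0.50 0.75 1.00

# Wrapping a range in seq() counts along the range instead of returning it
s5 <- seq(3:6)          # 1 2 3 4, not 3 4 5 6
```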

Basic Operations of a Vector

Function     Explanation
length(v)    Number of elements in vector v
max(v)       Largest number in vector v
min(v)       Smallest number in vector v
sum(v)       Sum of the elements in vector v
prod(v)      Product of the elements in vector v
sort(v)      Elements of vector v sorted in ascending order

Basic Operations of a Vector

# define a vector `vec` with the function `c`
vec <- c(2, 4, -4, 0, 1, 3)
vec
[1]  2  4 -4  0  1  3
#length
length(vec)
[1] 6
# largest number
max(vec)
[1] 4
#smallest number
min(vec)
[1] -4
#sum of elements
sum(vec)
[1] 6
#product of elements
prod(vec)
[1] 0
# sorting
sort(vec)
[1] -4  0  1  2  3  4

Special Types of Vectors

  • Character Vectors
  • Logical Vectors
  • Factors
# Character Vectors
sname <- c("Raju","abdul","Soha")
sname
[1] "Raju"  "abdul" "Soha" 
# logical Vectors
a <- c(7, 2, 6, 10, 4, 1, 3)
a
[1]  7  2  6 10  4  1  3
b <- a<3 | a>=6
b
[1]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE
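
A logical vector is most useful as an index. The sketch below reuses `a` and `b` from above and shows three common uses.

```r
a <- c(7, 2, 6, 10, 4, 1, 3)
b <- a < 3 | a >= 6

a[b]        # elements where b is TRUE: 7 2 6 10 1
sum(b)      # TRUE counts as 1, so this is the number of matches: 5
which(b)    # positions of the TRUE values: 1 2 3 4 6
```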

Special Types of Vectors

  • Factors
  • Rate my teaching😂
  • Good = 1, Very good = 2, Best = 3
  • Ten students' responses are recorded as numbers
# ratings in numbers
qt <- c(3, 2, 1, 1, 2, 3, 2, 1, 3, 3)
labelnames <- c("good", "very good", "best")
#creating a factor vector
fqt <- factor(qt,labels = labelnames)
qt
 [1] 3 2 1 1 2 3 2 1 3 3
fqt
 [1] best      very good good      good      very good best      very good
 [8] good      best      best     
Levels: good very good best
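
Once the ratings are a factor, they can be summarised by label. A minimal sketch with the same data:

```r
qt <- c(3, 2, 1, 1, 2, 3, 2, 1, 3, 3)
fqt <- factor(qt, labels = c("good", "very good", "best"))

levels(fqt)       # "good" "very good" "best"
table(fqt)        # counts per level: good 3, very good 3, best 4
as.integer(fqt)   # underlying integer codes: 3 2 1 1 2 3 2 1 3 3
```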

Matrices

  • A matrix is a collection of data elements arranged in a two-dimensional rectangular layout.
  • All the elements in a matrix are of the same data type
  • The function matrix() is used to create a matrix
m1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow= 3, ncol = 2, byrow = TRUE)
# nrow: number of rows
# ncol: number of columns
# byrow: if TRUE, fill the matrix row by row; if FALSE (the default), column by column
m1
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
m2 <- matrix(c(1, 2, 3, 4, 5, 6), nrow= 3, ncol = 2, byrow = FALSE)
m2
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

Other functions to create matrices

  • Other ways to create a matrix
  • By using the functions dim, cbind and rbind
# Use of `dim` function
x <- 1:12
dim(x) <- c(3,4) 
x
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
# Use of `dim` again to reshape the same data to 4 rows and 3 columns
dim(x) <- c(4,3)
x
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12
# Use of `rbind` function
# rows must have same length 
r1 <- 1:3
r2 <- c(4, 5, 6)
# named `rmat` to avoid masking the base function rm()
rmat <- rbind(r1, r2)
rmat
   [,1] [,2] [,3]
r1    1    2    3
r2    4    5    6
# Use of `cbind` function
# columns must have same length 
c1 <- 1:3
c2 <- c(10, 9, 8)
c3 <- c(20, 5, 15)
cm <- cbind(c1, c2, c3)  
cm 
     c1 c2 c3
[1,]  1 10 20
[2,]  2  9  5
[3,]  3  8 15

Names and indexing in Matrices

  • Column names and row names
  • Extracting a specific element(s)
cm 
     c1 c2 c3
[1,]  1 10 20
[2,]  2  9  5
[3,]  3  8 15
# Names of columns and rows
colnames(cm) <- c("Eco", "Hist", "Pol")

rownames(cm) <- c("SI", "SII", "SIII")
cm
     Eco Hist Pol
SI     1   10  20
SII    2    9   5
SIII   3    8  15
# Extracting `Pol` in `SIII`
cm[3,3]
[1] 15
  • Extract first row and all columns
  • Extract second column and all rows
  • Extract all rows, second and third columns
# Extract first row and all columns
cm[1,]
 Eco Hist  Pol 
   1   10   20 
# Extract second column and all rows
cm[,2]
  SI  SII SIII 
  10    9    8 
# Extract all rows, second and third columns
cm[,c(2,3)]
     Hist Pol
SI     10  20
SII     9   5
SIII    8  15
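
Once rows and columns are named, the names can be used in place of positions. A minimal sketch that rebuilds the same `cm`:

```r
cm <- cbind(c1 = 1:3, c2 = c(10, 9, 8), c3 = c(20, 5, 15))
colnames(cm) <- c("Eco", "Hist", "Pol")
rownames(cm) <- c("SI", "SII", "SIII")

cm["SIII", "Pol"]            # same element as cm[3, 3]: 15
cm["SI", ]                   # first row by name
cm[, c("Hist", "Pol")]       # second and third columns by name
cm[, "Hist", drop = FALSE]   # drop = FALSE keeps the result a one-column matrix
```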

Matrix manipulations

  • Arithmetic operators and mathematical functions available for vectors can also be applied to a matrix
  • They act on each element of the matrix
  • Example: multiply the matrix m1 by 3
# create the 3 x 2 matrix m1
m1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow= 3, ncol = 2, byrow = TRUE)
# Multiplying matrix m1 by 3
m3 <- m1*3
m1
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
m3
     [,1] [,2]
[1,]    3    6
[2,]    9   12
[3,]   15   18
  • Matrix operations, for example matrix multiplication with %*%
  • The dimensions must conform: the number of columns of the first matrix must equal the number of rows of the second
# create the 3 x 2 matrix m1
m1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow= 3, ncol = 2, byrow = TRUE)
# print m1 and check its dimensions
m1
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
dim(m1)
[1] 3 2
  • Create another matrix, m4, with 2 rows and 3 columns
m4 <- matrix(c(1, 2, 3, 4, 5, 6), nrow= 2, ncol = 3, byrow = TRUE)
dim(m4)
[1] 2 3
# Multiplying matrix m1 with m4 
m5 <- m1%*%m4
m5
     [,1] [,2] [,3]
[1,]    9   12   15
[2,]   19   26   33
[3,]   29   40   51
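
The sketch below contrasts the element-wise operator `*` with the matrix product `%*%` and shows what happens when the dimensions do not conform. It uses `1:6` as shorthand for the same values; the name `res` is an illustrative assumption.

```r
m1 <- matrix(1:6, nrow = 3, ncol = 2, byrow = TRUE)   # 3 x 2
m4 <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)   # 2 x 3

dim(m1 %*% m4)      # (3 x 2) times (2 x 3) gives 3 x 3
dim(m4 %*% m1)      # (2 x 3) times (3 x 2) gives 2 x 2
m1 * m1             # `*` is element-wise, so this squares each entry
dim(t(m1))          # t() transposes m1 to 2 x 3

# m1 %*% m1 fails: a 3 x 2 matrix cannot be multiplied by a 3 x 2 matrix
res <- try(m1 %*% m1, silent = TRUE)
inherits(res, "try-error")   # TRUE
```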

Lists

  • In R, a list is a generic collection of objects
  • Each component can hold a different type of data
  • General form: mylist <- list(name1 = component1, name2 = component2, ...)
# generate a list with three components: a vector, a string and a matrix

mylist <- list(A=seq(10, 30,5), student="aadil", idm=diag(3))

#print my list

mylist
$A
[1] 10 15 20 25 30

$student
[1] "aadil"

$idm
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
# print component names
names(mylist)
[1] "A"       "student" "idm"    
#print component A
mylist$A
[1] 10 15 20 25 30
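
Besides `$`, list components can be reached with double and single brackets, which behave differently. A minimal sketch with the same `mylist`; the added component `marks` and its values are illustrative assumptions.

```r
mylist <- list(A = seq(10, 30, 5), student = "aadil", idm = diag(3))

mylist[["A"]]       # same as mylist$A: the component itself
mylist[[2]]         # second component by position: "aadil"
mylist["A"]         # single brackets return a list of length 1
mylist$idm[2, 2]    # index into a component: 1

# Add a new component (the name `marks` is illustrative)
mylist$marks <- c(55, 60)
length(mylist)      # 4
```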

Data Frames

  • An object consisting of several variables
  • Rectangular shape
  • Rows represent observational units
  • Columns represent the variables
  • Is it a matrix?
  • No
  • A matrix can hold only one type of data
  • A data frame can hold different types of data in different columns
  • Use data.frame() to create a data frame, or as.data.frame() to convert an existing object (such as a matrix) into one
# Creating a data frame

# Define the vectors
year <- c(1991, 1992, 1993, 1994, 1995)

GDP <- c(200, 250, 300, 320, 400)

INV <- c(100, 150, 100, 120, 200)

macro_mat <- cbind(GDP, INV)

rownames(macro_mat) <- year

macro_mat
     GDP INV
1991 200 100
1992 250 150
1993 300 100
1994 320 120
1995 400 200
# create data frame

macro <- as.data.frame(macro_mat)
macro
     GDP INV
1991 200 100
1992 250 150
1993 300 100
1994 320 120
1995 400 200
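
The example above converts a matrix, so every column is numeric. To illustrate the point about mixed types, the sketch below builds a data frame directly with `data.frame()`. The name `macro2` and the character column `label` are hypothetical additions for illustration.

```r
macro2 <- data.frame(
  year  = c(1991, 1992, 1993),
  GDP   = c(200, 250, 300),
  label = c("a", "b", "c")    # hypothetical character column
)

sapply(macro2, class)   # each column keeps its own type

# A matrix cannot mix types: combining numbers and strings coerces everything to character
mixed <- cbind(1:2, c("a", "b"))
is.character(mixed)     # TRUE
```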

Accessing Data frame

  • Accessing a single variable
  • Generating a new variable
# Existing data frame
macro
     GDP INV
1991 200 100
1992 250 150
1993 300 100
1994 320 120
1995 400 200
# Accessing only the GDP variable
macro$GDP
[1] 200 250 300 320 400
# Generating a new variable in the data frame
macro$lnGDP <- log(macro$GDP)

# Using `with` function
macro$lnINV <- with(macro,log(INV))

# Using `attach()` (always pair it with `detach()`; `with()` is usually the safer choice)
attach(macro)
macro$total <- GDP+INV
detach(macro)

# Results
macro
     GDP INV    lnGDP    lnINV total
1991 200 100 5.298317 4.605170   300
1992 250 150 5.521461 5.010635   400
1993 300 100 5.703782 4.605170   400
1994 320 120 5.768321 4.787492   440
1995 400 200 5.991465 5.298317   600
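
Another way to generate a variable, sketched here as an alternative and not taken from the slides, is `transform()`, which evaluates the expression inside the data frame and returns a modified copy. The variable name `share` is a hypothetical example.

```r
macro <- data.frame(GDP = c(200, 250, 300, 320, 400),
                    INV = c(100, 150, 100, 120, 200),
                    row.names = as.character(1991:1995))

# transform() looks up INV and GDP inside `macro` and returns a copy with the new column
macro <- transform(macro, share = INV / GDP)   # `share` is a hypothetical variable

macro[["share"]]             # double brackets also extract a single column
macro[, c("GDP", "share")]   # several columns by name
```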

Creating a subset of data frame

#Existing data frame
macro
     GDP INV    lnGDP    lnINV total
1991 200 100 5.298317 4.605170   300
1992 250 150 5.521461 5.010635   400
1993 300 100 5.703782 4.605170   400
1994 320 120 5.768321 4.787492   440
1995 400 200 5.991465 5.298317   600
# Taking subset
subdata <- subset(macro, total>400)

subdata
     GDP INV    lnGDP    lnINV total
1994 320 120 5.768321 4.787492   440
1995 400 200 5.991465 5.298317   600
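
A few variations on subsetting, as a minimal sketch. The data frame is rebuilt here with only the columns needed so the block runs on its own; the names `sub1`, `sub2` and `sub3` are illustrative.

```r
macro <- data.frame(GDP = c(200, 250, 300, 320, 400),
                    INV = c(100, 150, 100, 120, 200),
                    row.names = as.character(1991:1995))
macro$total <- macro$GDP + macro$INV

# Keep only selected columns with the `select` argument
sub1 <- subset(macro, total > 400, select = c(GDP, total))

# Bracket indexing selects the same rows as subset(macro, total > 400)
sub2 <- macro[macro$total > 400, ]

# Conditions can be combined with & and |
sub3 <- subset(macro, total >= 400 & INV > 100)   # rows 1992, 1994 and 1995
```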

THANKS