Data Types

Md Zulquar Nain

Objects

  • Most statistical software (e.g., SPSS, Stata) operates on datasets, which consist of rows of observations and columns of variables
  • R is an object-oriented programming language (like Python or JavaScript)
  • So, what is an object?
  • The formal computer-science definition is complex
  • In practice: anything to which you assign a value

Important

"Objects are like boxes in which we can put things: data, functions, and even other objects." (Ben Skinner)
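A minimal sketch of the box idea in R. The names wage, square, and box are illustrative and do not come from the slides.

```r
# Illustrative names only: wage, square, and box are made up for this sketch
wage <- 250.5                            # an object holding a number
square <- function(v) v^2                # an object holding a function
box <- list(data = wage, fun = square)   # an object holding other objects

box$fun(box$data)   # call the stored function on the stored data
ls()                # list the objects created so far
```

Each assignment with <- creates one object; the list shows that an object can contain other objects.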

Data Objects

  • Data are usually obtained from external sources
  • Or collected by researchers through surveys
  • In R, these data files are imported into R objects
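One possible way to import a file, sketched with base R's read.csv(). The file contents and the names csv_path and survey are hypothetical; a temporary file is written first so the example runs on its own.

```r
# Hypothetical survey data written to a temporary file so the sketch is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,wage,employed", "1,250.5,TRUE", "2,310.0,FALSE"), csv_path)

# Import the file into an R object (a data frame)
survey <- read.csv(csv_path)
str(survey)   # inspect the column types R inferred
```

With real data, csv_path would simply be the path to the file on disk.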

Data Types

  • Use the function typeof() to check the type of an object
  • typeof() returns the internal storage type; class() reports higher-level classes such as factor and Date
  • The data types covered here are
    • Double
    • Integer
    • Complex
    • Logical
    • Character
    • Factor
    • Date and time
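A short sketch comparing typeof() and class() on one value of each type listed above. The variable names f and d are illustrative.

```r
typeof(2.5)                 # "double"
typeof(2L)                  # "integer"
typeof(1 + 2i)              # "complex"
typeof(TRUE)                # "logical"
typeof("a")                 # "character"

f <- factor(c("low", "high"))
typeof(f)                   # "integer": factors are stored as integer codes
class(f)                    # "factor"

d <- as.Date("2023-03-25")
typeof(d)                   # "double": dates are stored as days since 1970-01-01
class(d)                    # "Date"
```

Factor and Date are therefore classes built on top of the integer and double storage types.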

Data Types

  • Double
    • Doubles are numbers like 2.0, 2.2, 2.999
    • They may or may not have a fractional part
    • Mostly used to represent continuous variables like wage, height, age
    • By default, R stores every number you type as a double
x <- 10.5
is.double(x)
[1] TRUE
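Because doubles are stored in binary floating point, exact equality tests can surprise. A small sketch using base R's all.equal():

```r
x <- 0.1 + 0.2
x == 0.3                    # FALSE: 0.1 and 0.2 have no exact binary representation
isTRUE(all.equal(x, 0.3))   # TRUE: comparison within a small tolerance
```

When comparing computed doubles, a tolerance-based check like this is usually the safer choice.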

Data Types

  • Integer
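    • Integers are whole numbers (negative, zero, or positive) with no fractional part
    • Mostly used for count variables
    • However, R stores a number typed without the L suffix as a double; use as.integer() or write 6L to get an integer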
x <- 6
typeof(x)
[1] "double"
x <- as.integer(9)
typeof(x)
[1] "integer"
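A brief sketch of the L suffix and of how integers behave in arithmetic. The variable name n is illustrative.

```r
n <- 6L                 # the L suffix creates an integer directly
typeof(n)               # "integer"
typeof(n + 1L)          # "integer": integer arithmetic stays integer
typeof(n / 2L)          # "double": division always returns a double
as.integer(9.7)         # 9: conversion truncates toward zero, it does not round
```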

Data Types

  • Complex
    • Complex numbers have the form \(x+yi\), where \(x\) is the real part and \(y\) the imaginary part
x <- -1
typeof(x)
[1] "double"
sqrt(x)
[1] NaN
# Complex: adding 0i makes the value complex, so sqrt() can return an imaginary result
y <- -1 + 0i
sqrt(y)
[1] 0+1i
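A short sketch of the base R helpers for taking a complex number apart. The variable name z is illustrative.

```r
z <- complex(real = 3, imaginary = 4)   # same as 3 + 4i
Re(z)      # real part: 3
Im(z)      # imaginary part: 4
Mod(z)     # modulus: 5
Conj(z)    # complex conjugate: 3-4i
```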

Data Types

  • Logical
    • A logical variable takes the values TRUE or FALSE
    • Usually the result of a comparison or condition
x <- 20
y <- 15
a <- x > y
a
[1] TRUE
typeof(a)
[1] "logical"
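Comparisons also work element by element on vectors, and the resulting logical values can be counted or used to select data. A sketch with hypothetical wage values:

```r
wage <- c(120, 340, 90, 410)   # hypothetical values
high <- wage > 200             # element-wise comparison gives a logical vector
sum(high)                      # 2: TRUE counts as 1, FALSE as 0
mean(high)                     # 0.5: share of TRUE values
wage[high]                     # 340 410: keep only elements where high is TRUE
```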

Data Types

  • Character
    • The character type represents string values in R
    • A string, or collection of characters, is written inside double or single quotes
    • Anything inside quotes is treated as a string, even a number such as "5"
x <- "You are smart"
print(x)
[1] "You are smart"
y <- "5"
typeof(y)
[1] "character"
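A few base R functions for working with strings, sketched below. Note that a quoted number must be converted before it can be used in arithmetic.

```r
y <- "5"
# y + 1 would fail: arithmetic is not defined for character values
as.numeric(y) + 1               # 6: convert to a number first
nchar("You are smart")          # 13: number of characters, spaces included
paste("You", "are", "smart")    # "You are smart"
toupper('single quotes work too')
```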

Data Types

  • Factor
    • Factor represents categorical data
    • For example, gender (Male, Female) or employment status (employed, unemployed)
    • Also handy in case of panel and longitudinal data
    • Factor objects can be created from character or numeric objects
# Create a factor from a character vector
cat <- c("A", "B", "C")
# as.factor() converts the vector to the factor type
cat <- as.factor(cat)
cat
[1] A B C
Levels: A B C
levels(cat)
[1] "A" "B" "C"
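Factors can also be built from numeric codes, as is common in survey data. A sketch assuming a hypothetical coding of 1 = Male and 2 = Female:

```r
# Hypothetical coding: 1 = Male, 2 = Female
gender_code <- c(1, 2, 2, 1, 2)
gender <- factor(gender_code, levels = c(1, 2), labels = c("Male", "Female"))
levels(gender)        # "Male" "Female"
table(gender)         # counts per category: Male 2, Female 3
as.integer(gender)    # 1 2 2 1 2: the underlying integer codes
```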

Data Types

  • Date and Time
    • Dates and times are needed often, especially when dealing with time series models
    • The function as.Date() creates Date objects; date-times with a clock time use as.POSIXct()
    • See help(as.Date)
today <- "25-03-2023"
today <- as.Date(today, format = "%d-%m-%Y")
today
[1] "2023-03-25"
today1 <- Sys.Date()  # current date, so the output depends on the day the code is run
today1
[1] "2024-02-15"
today2 <- format(today1, "%d %B %Y")
today2
[1] "15 February 2024"
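Once a value is a Date object, it supports arithmetic and sequences. A sketch with illustrative names start, end, and stamp:

```r
start <- as.Date("25-03-2023", format = "%d-%m-%Y")
end   <- as.Date("2023-04-01")             # ISO format needs no format string
end - start                                # Time difference of 7 days
as.numeric(end - start)                    # 7
seq(start, by = "month", length.out = 3)   # "2023-03-25" "2023-04-25" "2023-05-25"
format(start, "%Y")                        # "2023"

# Date-times use a different class
stamp <- as.POSIXct("2023-03-25 14:30:00", tz = "UTC")
class(stamp)                               # "POSIXct" "POSIXt"
```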

Missing Values in R

  • In R, missing data are represented by NA, meaning Not Available
  • Another symbol that appears in R is NaN, meaning Not a Number
  • NULL represents an object of zero length
  • -Inf and Inf represent negative and positive infinity
# Create a character vector using the function c()
ownd <- c("10", "30", "missing")
# Convert to double; "missing" cannot be parsed, so it becomes NA (with a coercion warning)
ownd <- as.double(ownd)
# Check for missing values
is.na(ownd)
[1] FALSE FALSE  TRUE
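A short sketch of how these special values behave in calculations and how to test for each of them. The vector x is illustrative.

```r
x <- c(10, 30, NA)
mean(x)                          # NA: missing values propagate
mean(x, na.rm = TRUE)            # 20: drop NA before averaging
is.nan(0 / 0)                    # TRUE: 0/0 is NaN
is.na(0 / 0)                     # TRUE: NaN also counts as missing for is.na()
1 / 0                            # Inf
is.finite(c(1, Inf, NA, NaN))    # TRUE FALSE FALSE FALSE
length(NULL)                     # 0
is.null(NULL)                    # TRUE
```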

THANKS