Data visualization is the process of transforming information (data) into a visual presentation, for example a graph.
Why visualisation/plotting?
An image speaks louder than words.
Data visualizations make data easier for the human brain to understand.
Visualization also makes it easier to detect patterns, trends, and outliers in groups of data.
Good data visualizations should bring out the meaning in complicated datasets so that their message is clear and concise.
R
Visualisation/plotting is one of the greatest strengths of R.
This course covers it only within a limited scope.
The basic plot function in R is plot().
Each graph function has a number of options.
Graphical parameters are set with par().
Data
The material to visualise is data. No data, no visualisation!
Mapping: contextual relationship
Mapping depends on what YOU want to show!
Import
We learned this in previous lectures.
Now we will learn to draw a basic graph:
plot(cars$speed, cars$dist, pch = 19, col = "red", las = 1, xlab = "Speed", ylab = "Distance", main = "Speed Vs Distance")
R
Type data() and execute it to list the datasets available in R.
data(cars) loads the cars dataset.
The basic plot function in R is plot().
data(cars)
Do you remember head(), tail(), and nrow()?
head(cars, 2)
##   speed dist
## 1     4    2
## 2     4   10
tail(cars, 2)
##    speed dist
## 49    24  120
## 50    25   85
ncol(cars)
## [1] 2
str(cars)
## 'data.frame':    50 obs. of  2 variables:
##  $ speed: num  4 4 7 7 8 9 10 10 10 11 ...
##  $ dist : num  2 10 4 22 16 10 18 26 34 17 ...
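The slide recalls nrow(), while the output above shows only ncol(). A minimal sketch of the row and column helpers, using only base R functions on the same built-in dataset (the variable names n_rows, n_cols, and dims are chosen here for illustration):

```r
data(cars)             # load the built-in cars dataset

n_rows <- nrow(cars)   # number of observations (rows)
n_cols <- ncol(cars)   # number of variables (columns)
dims   <- dim(cars)    # both at once: c(rows, columns)

print(n_rows)
print(n_cols)
print(dims)

summary(cars)          # minimum, quartiles, mean, and maximum per variable
```

The row count agrees with the "50 obs." reported by str(cars).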
Plot()
data(cars) contains two variables, speed and dist (distance).
Plotting speed and distance:
plot(cars$speed, cars$dist)
Here, cars$speed is for the x-axis and cars$dist is for the y-axis.
In cars$speed, cars is the name of the dataset and speed is the variable name.
plot() is the command to plot.
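The same scatterplot can also be requested without repeating the cars$ prefix. The sketch below shows two base R alternatives; it is an illustration, not necessarily the form used elsewhere in this course:

```r
data(cars)

# Formula interface: the left side of ~ goes on the y-axis,
# the right side on the x-axis.
plot(dist ~ speed, data = cars)

# with() evaluates the expression inside the data frame,
# so the column names can be used directly.
with(cars, plot(speed, dist))
```

Both calls draw the same points as plot(cars$speed, cars$dist); only the default axis labels differ.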
Plot()
x <- seq(-pi, pi, 0.1)
plot(x, sin(x))
Here x is for the x-axis (data generated using the seq() command) and sin(x) is for the y-axis.
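seq() can generate the x values either with a fixed step or with a fixed number of points. A small sketch of both, assuming nothing beyond base R (the names x_by and x_len are illustrative):

```r
x_by  <- seq(-pi, pi, by = 0.1)           # fixed step of 0.1
x_len <- seq(-pi, pi, length.out = 200)   # fixed number of points

print(length(x_by))
print(length(x_len))

# More points give a smoother curve when drawn as a line.
plot(x_len, sin(x_len), type = "l")

# curve() builds the x values itself from the given range.
curve(sin, from = -pi, to = pi)
```

With by = 0.1 the last value stops just short of pi, because the step does not divide the interval exactly.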
Plot()
plot(cars$speed, cars$dist, xlab = "Speed", ylab = "Distance", main = "Speed Vs Distance")
Here, to add the labels, we have added the xlab, ylab, and main arguments.
Label text should always be in quotes ("").
Plot()
type
"p" - points
"l" - lines
"b" - both points and lines
"c" - empty points joined by lines
"o" - overplotted points and lines
"s" and "S" - stair steps
"h" - histogram-like vertical lines
"n" - does not produce any points or lines
plot(x, sin(x), main = "The Sine Function", ylab = "sin(x)", type = "l")
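To compare several type values side by side, one option is to loop over them in a grid of panels. This is a sketch only; it uses par(mfrow = ...), which is covered later in these slides, and the names types and tp are illustrative:

```r
x <- seq(-pi, pi, 0.1)
types <- c("p", "l", "b", "o", "s", "h")

op <- par(mfrow = c(2, 3))   # 2 x 3 grid of panels; old settings kept in op
for (tp in types) {
  # One panel per plot type, titled with the type code
  plot(x, sin(x), type = tp, main = paste0("type = ", tp))
}
par(op)                      # restore the previous layout
```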
Plot()
col="color name"
plot(cars$speed, cars$dist, xlab = "Speed", ylab = "Distance", main = "Speed Vs Distance", col="red" )
AR contains data on the average rainfall for each day of a week.
AR <- c(12, 15, 11, 16, 18, 15, 14)
barplot(AR)
There are many other parameters that can be added to barplot().
Use ?barplot to explore.
barplot(AR,
        main = "Average rainfall in a Day",
        xlab = "Centimeters (cm)",
        ylab = "Day",
        names.arg = c("Mon", "Tues", "Wed", "Thu", "Fri", "Sat", "Sun"),
        border = "blue",
        col = "red",
        density = 20,
        horiz = TRUE,
        cex.names = .8)  # To change the size of the bar labels
Note the arguments added after AR: they set the title, axis labels, bar names, colours, shading density, orientation, and label size.
MM
## [1] 17 16 18 17 18 19 18 16 18 18
Does it serve the purpose?
No.
First convert the data into a categorical representation using table().
Check out ?table.
table(MM)
## MM
## 16 17 18 19 
##  2  2  5  1 
barplot(table(MM), main = "Marks of 10 Students", xlab = "Marks", ylab = "Count", border = "blue", col = "red", density = 10)
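The counting step can be reproduced from the printed values. The sketch below rebuilds MM from the output shown above and also converts the counts to proportions with prop.table(), which is an addition for illustration and not part of the original slides:

```r
MM <- c(17, 16, 18, 17, 18, 19, 18, 16, 18, 18)   # marks as printed above

counts <- table(MM)            # frequency of each distinct mark
props  <- prop.table(counts)   # the same counts as proportions of the total

print(counts)
print(round(props, 2))

# Same bar chart, but with proportions on the y-axis
barplot(props, main = "Marks of 10 Students", xlab = "Marks", ylab = "Proportion")
```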
print(titanic_surv)
##               train.Pclass
## train.Survived   1   2   3
##              0  80  97 372
##              1 136  87 119
Here, 1, 2, and 3 represent the 1st, 2nd, and 3rd passenger class in the train data.
0 and 1 stand for passengers who did not survive and who survived the Titanic disaster, respectively.
barplot(titanic_surv, main = "Survival of Each Class", xlab = "Class", ylab = "No of Passenger", col = c("red", "green"))
legend("topleft", c("Not survived", "Survived"), fill = c("red", "green"))
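The slides do not show how titanic_surv was built; it is presumably a cross-tabulation of a data frame named train that is not included here. As a stand-in, the sketch below types the printed counts into a matrix by hand (an assumption made only so the example runs) and draws the bars side by side with beside = TRUE instead of stacked:

```r
# Counts copied from the printed table above.
# Rows: 0 = did not survive, 1 = survived. Columns: passenger class 1, 2, 3.
# matrix() fills column by column, so each pair below is one class.
titanic_surv <- matrix(
  c(80, 136,    # class 1
    97, 87,     # class 2
    372, 119),  # class 3
  nrow = 2,
  dimnames = list(Survived = c("0", "1"), Pclass = c("1", "2", "3"))
)

print(titanic_surv)
print(colSums(titanic_surv))   # passengers per class

barplot(titanic_surv, beside = TRUE,
        main = "Survival of Each Class",
        xlab = "Class", ylab = "No of Passenger",
        col = c("red", "green"))
legend("topleft", c("Not survived", "Survived"), fill = c("red", "green"))
```

Grouped bars make it easier to compare survivors and non-survivors within a class, while stacked bars emphasise the class totals.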
hist()
A histogram is a visual representation of the distribution of a dataset.
We will use data(AirPassengers), which is built into R.
Explore the function hist() using ?hist.
A Basic Histogram
Put the name of your dataset between the parentheses, like hist(AirPassengers).
A histogram for a specific variable can be drawn as hist(datasetName$VariableName).
hist(AirPassengers)
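hist()
hist(AirPassengers, main = "Histogram for Air Passengers", xlab = "Passengers", border = "blue", col = "green", xlim = c(100, 700), las = 1, breaks = 5)
xlim = c() and ylim = c() fix the range of the X and Y axes.
The two values inside c() set the starting and ending points.
las = 1 draws the axis tick labels horizontally, which rotates those on the Y-axis.
Check out las in ?par.
breaks controls the histogram bins: a single number is a suggested number of bins, and a vector gives the exact bin boundaries.
Check out breaks in ?hist.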
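To see what breaks actually does, the histogram can be computed without drawing it (plot = FALSE) and the resulting bin boundaries inspected. This is a sketch using base R only; the object names h5, h20, and h_exact are illustrative:

```r
data(AirPassengers)

# A single number is only a suggestion; R picks "pretty" boundaries near it.
h5  <- hist(AirPassengers, breaks = 5,  plot = FALSE)
h20 <- hist(AirPassengers, breaks = 20, plot = FALSE)

# A vector of boundaries is used exactly as given.
h_exact <- hist(AirPassengers, breaks = seq(100, 700, by = 50), plot = FALSE)

print(h5$breaks)             # boundaries chosen for breaks = 5
print(length(h20$counts))    # number of bins chosen for breaks = 20
print(h_exact$breaks)        # exactly the boundaries supplied
print(h_exact$counts)        # observations per bin
```

Whatever the binning, the counts always add up to the number of observations.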
Pie Chart
A pie chart is drawn using the pie() function in R.
This function takes a vector of non-negative numbers.
The basic syntax is pie(x, labels, radius, main, col, clockwise).
x is a vector containing the numeric values used in the pie chart.
labels gives a description to each slice.
radius indicates the radius of the circle of the pie chart (a value between −1 and +1).
main indicates the title of the chart.
col indicates the color palette.
clockwise is a logical value indicating whether the slices are drawn clockwise or anticlockwise.
Check out ?pie.
Pie Chart
In pie(scores$Obt.Marks, scores$Subjects), scores$Obt.Marks is the vector of positive numbers for which the pie chart is drawn.
scores$Subjects gives the labels.
Note: scores$Subjects means that the Subjects variable has been selected from the scores dataset.
Pie Chart
pie(scores$Obt.Marks, scores$Subjects)
print(scores)
##   Subjects Obt.Marks
## 1     Math        70
## 2      Eng        80
## 3     Urdu        60
## 4       Sc        80
## 5     Soc.        90
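The scores data frame is printed above but its construction is not shown. A minimal sketch that recreates it from the printed values, so that the pie chart examples can be run as they are:

```r
# Values copied from the printed output above
scores <- data.frame(
  Subjects  = c("Math", "Eng", "Urdu", "Sc", "Soc."),
  Obt.Marks = c(70, 80, 60, 80, 90)
)

print(scores)

# Basic pie chart: slice sizes from Obt.Marks, slice labels from Subjects
pie(scores$Obt.Marks, labels = scores$Subjects)
```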
Pie Chart
piepercent <- round(100 * scores$Obt.Marks / sum(scores$Obt.Marks), 1)  # percentage calculation
pie(scores$Obt.Marks,
    labels = piepercent,                          # Labels
    main = "Scores pie chart",                    # Title of chart
    col = rainbow(length(scores$Obt.Marks)))      # Color of chart
legend("topright",                                # legend position
       scores$Subjects,                           # legend labels
       cex = 0.8,                                 # size of legend text
       fill = rainbow(length(scores$Obt.Marks)))  # legend color
When there are more than two variables and we want to see how each variable relates to the others, we use a scatterplot matrix. The pairs() function creates matrices of scatterplots.
pairs(formula, data)
We will use data(mtcars), available within R; explore ?mtcars.
##               mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4      21   6  160 110  3.9 2.620 16.46  0  1    4    4
## Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4
pairs(~wt+mpg+disp+cyl, data = mtcars, main = "Scatterplot Matrix")
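The scatterplot matrix shows the relationships visually; the correlations themselves can be computed with cor() for the same four variables. This is a sketch that goes slightly beyond the slides, with vars and cor_mat as illustrative names:

```r
data(mtcars)

vars <- c("wt", "mpg", "disp", "cyl")   # same variables as in the pairs() call

# Pearson correlation between every pair of the selected columns
cor_mat <- cor(mtcars[, vars])

print(round(cor_mat, 2))
```

Each off-diagonal entry corresponds to one panel of the scatterplot matrix; a negative value, as for wt against mpg, matches a downward-sloping cloud of points.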
par()
For drawing multiple graphs in a single plot, use par().
Check out ?par.
par(mfrow = c(1, 2))  # set the plotting area into a 1*2 array (1 row and 2 columns)
barplot(scores$Obt.Marks, names.arg = scores$Subjects, main = "Barplot", las = 2)  # Bar plot
pie(scores$Obt.Marks, scores$Subjects, main = "Piechart", radius = 1)  # Pie chart
Here the parameter mfrow is used to specify the number of subplots we need.
It takes a vector of the form c(m, n), which divides the plotting area into an m*n array of subplots.
For the above example, to plot the two graphs side by side, we have m = 1 and n = 2.
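par(mfrow = ...) changes a setting that stays in effect for all later plots on the same device. A common pattern, sketched here with mtcars as example data (not the data used on the slide), is to store the old settings and restore them afterwards:

```r
# par() returns the previous values of the parameters it changes
op <- par(mfrow = c(2, 1))   # 2 rows, 1 column: plots stacked vertically

hist(mtcars$mpg, main = "Histogram", xlab = "Miles Per Gallon")
boxplot(mtcars$mpg, horizontal = TRUE, main = "Boxplot")

par(op)                      # back to the layout that was active before

print(par("mfrow"))          # current layout after restoring
```

Without the final par(op), the next plot would still be drawn into a 2*1 grid.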
par()
Explore it for more control parameters.
All types of graphs (bar plot, pie chart, histogram, etc.) can be saved.
Graphs can be saved as bitmap images (i.e. .png, .jpeg, .tiff, etc.), which have a fixed size.
Graphs can also be saved as vector images (.pdf, .eps), which are easily resizable.
We will use the Temp (temperature) column of the built-in dataset airquality.
.jpeg
jpeg(file = "saving_plot1.jpeg")          # File name
hist(airquality$Temp, col = "darkgreen")
dev.off()                                 # Close the device to write the file
Saved Graph
The image will be saved in the working (default) directory.
We need to call the function dev.off() after all the plotting, to save the file and return control to the screen.
By default, the image size will be 480×480 pixels.
.png
png(file = "saving_plot2.png", width = 600, height = 350)
hist(airquality$Temp, col = "gold")
dev.off()
Saved Graph
You can give a full path in file to save the image at a desired place.
You can also set the image size (in pixels by default) using the arguments width and height.
.bmp
The size of the plot can be specified in different units, such as inches, cm, or mm, with the argument units, and the ppi with res.
The following code saves a bmp file of size 6x4 inches at 100 ppi.
bmp(file = "saving_plot3.bmp", width = 6, height = 4, units = "in", res = 100)
hist(airquality$Temp, col = "steelblue")
dev.off()
Saved Graph
.pdf
pdf(file = "saving_plot4.pdf", width = 6, height = 4)  # pdf() sizes are in inches; it takes no units or res arguments
hist(airquality$Temp, col = "violet")
dev.off()
Saved Graph
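To confirm that a file was really written, the save can be followed by a check with file.exists(). The sketch below writes a vector PDF into R's temporary directory; the file name saving_plot_sketch.pdf is hypothetical and chosen only for this example:

```r
# Hypothetical output file inside the session's temporary directory
out_file <- file.path(tempdir(), "saving_plot_sketch.pdf")

pdf(file = out_file, width = 6, height = 4)   # open a PDF device, size in inches
hist(airquality$Temp, col = "violet")         # draw into the file, not the screen
dev.off()                                     # close the device so the file is written

print(file.exists(out_file))                  # was the file created?
print(file.info(out_file)$size)               # file size in bytes
```

The same open, plot, dev.off() sequence applies to the bitmap devices jpeg(), png(), and bmp().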
R
This presentation is not exhaustive.
Adopt a learning-by-doing approach.
Make use of Google and the R documentation.
This was about basic R plotting.
Plotting has become more exciting and easier with the ggplot2 package in R.
hist(mtcars$mpg, breaks = 10,col="red")
Histograms can be a poor method for determining the shape of a distribution because they are so strongly affected by the number of bins used.
Kernel density plots are usually a much more effective way to view the distribution of a variable.
d <- density(mtcars$mpg)  # returns the density data
plot(d)                   # plots the results
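The two views can be combined by drawing the histogram on a density scale and adding the kernel density curve on top. This is a sketch of one common way to do it in base R, not something taken from the slides:

```r
mpg <- mtcars$mpg

# freq = FALSE puts the histogram on a density scale (bar areas sum to 1),
# so it is comparable with the density curve.
hist(mpg, breaks = 10, freq = FALSE, col = "grey90",
     main = "Histogram with density curve", xlab = "Miles Per Gallon")

d <- density(mpg)               # kernel density estimate, default bandwidth
lines(d, lwd = 2, col = "red")  # add the curve to the existing histogram
rug(mpg)                        # tick marks showing the individual observations

print(d$bw)                     # bandwidth chosen by density()
```

The bandwidth plays a role similar to the bin width of a histogram: a larger value gives a smoother curve.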
dotchart(mtcars$mpg, labels = row.names(mtcars), cex = .7, main = "Gas Mileage for Car Models", xlab = "Miles Per Gallon")
Dotplot: Grouped, Sorted, and Colored
Sort by mpg, group and color by cylinder
x <- mtcars[order(mtcars$mpg), ]  # sort by mpg
x$cyl <- factor(x$cyl)            # it must be a factor
x$color[x$cyl == 4] <- "red"
x$color[x$cyl == 6] <- "blue"
x$color[x$cyl == 8] <- "darkgreen"
dotchart(x$mpg, labels = row.names(x), cex = .7, groups = x$cyl,
         main = "Gas Mileage for Car Models\ngrouped by cylinder",
         xlab = "Miles Per Gallon", gcolor = "black", color = x$color)
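The three colour assignments can also be written as a single lookup in a named vector. This is an alternative sketch that produces the same colours; the name cyl_colors is introduced here for illustration:

```r
x <- mtcars[order(mtcars$mpg), ]   # sort by mpg
x$cyl <- factor(x$cyl)             # grouping variable must be a factor

# One colour per cylinder count, looked up by the factor level as text
cyl_colors <- c("4" = "red", "6" = "blue", "8" = "darkgreen")
x$color <- unname(cyl_colors[as.character(x$cyl)])

dotchart(x$mpg, labels = row.names(x), cex = .7, groups = x$cyl,
         main = "Gas Mileage for Car Models\ngrouped by cylinder",
         xlab = "Miles Per Gallon", gcolor = "black", color = x$color)
```

Adding another group then only requires one more entry in cyl_colors.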
x <- 1:10  # Create example data
y1 <- c(3, 1, 5, 2, 3, 8, 4, 7, 6, 9)
plot(x, y1, type = "l", main = "This is my Line Plot", xlab = "My X-Values", ylab = "My Y-Values")
x <- 1:10  # Create example data
y1 <- c(3, 1, 5, 2, 3, 8, 4, 7, 6, 9)
plot(x, y1, type = "l", lwd = 5, main = "This is my Line Plot", xlab = "My X-Values", ylab = "My Y-Values")
x <- 1:10  # Create example data
y1 <- c(3, 1, 5, 2, 3, 8, 4, 7, 6, 9)
y2 <- c(5, 1, 4, 6, 2, 3, 7, 8, 2, 8)
y3 <- c(3, 3, 3, 3, 4, 4, 5, 5, 7, 7)
plot(x, y1, type = "l", lwd = 5, main = "This is my Line Plot", xlab = "My X-Values", ylab = "My Y-Values")
lines(x, y2, type = "l", col = "red")
lines(x, y3, type = "l", col = "green")
legend("topleft", legend = c("Line y1", "Line y2", "Line y3"), col = c("black", "red", "green"), lty = 1)
A box plot gives us a visual representation of the quartiles within numeric data.
A box plot shows the median (second quartile), the first and third quartiles, and the minimum and maximum (excluding any points drawn separately as outliers).
boxplot(mpg ~ cyl, data = mtcars, main = "Car Mileage Data", xlab = "Number of Cylinders", ylab = "Miles Per Gallon")
boxplot(mpg ~ cyl, data = mtcars, col = c("cyan", "blue", "steelblue"), main = "Car Mileage Data", xlab = "Number of Cylinders", ylab = "Miles Per Gallon")
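The numbers behind each box can be retrieved by calling boxplot() with plot = FALSE. The sketch below uses base R only, with bp as an illustrative name. By default the whiskers reach only to the most extreme point within 1.5 times the interquartile range of the box, so they need not equal the minimum and maximum:

```r
# Compute the box plot statistics without drawing anything
bp <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

# One column per group; rows are:
# lower whisker, lower hinge, median, upper hinge, upper whisker
print(bp$stats)

print(bp$names)   # group labels (number of cylinders)
print(bp$n)       # observations per group
print(bp$out)     # points beyond the whiskers, drawn individually

# Five-number summary of all cars together
print(fivenum(mtcars$mpg))
```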