Statistical approach of analyzing datasets and summarizing their main characteristics.
Statistics and visualization techniques used to explore data and discover patterns.
main idea is to better understand the data.
Indian sectoral Indices
The dataset comprises three sectoral indices from 2020 to 2025
# Define the list of sector symbols (from Yahoo Finance)
symbols <- c("^NSEBANK", "^CNXIT", "^CNXFMCG")
# Download data for each symbol from Yahoo Finance
# getSymbols() fetches the historical price data
# Here map() is for looping that is downloads stock data for each symbol and collects all the results into a list called data_list
# .x is just a placeholder. Typically, .x is used inside functions like purrr::map() or lapply(), where it represents each element of a list or vector you are looping over.
data_list <- map(
symbols,
~ getSymbols(.x, src = "yahoo", from = "2020-01-01",
to = Sys.Date(), auto.assign = FALSE)
)
# Assign readable names to the downloaded data
# This matches the downloaded list to sector names
names(data_list) <- c("NIFTY_BANK", "NIFTY_IT", "NIFTY_FMCG")
# Convert each stock data into a tidy format
# Extract columns, rename them, add the Date and Sector Index
sector_tbl <- map2(data_list, names(data_list), ~ {
# Convert xts object to tibble (dataframe)
df <- as_tibble(as.data.frame(.x))
# Rename the columns for clarity
colnames(df) <- c("Open", "High", "Low", "Close", "Volume", "Adjusted")
# Add the Date column (index of xts object)
df <- df %>% mutate(Date = index(.x))
# Add a new column "Index" to identify the sector
df %>%
mutate(Index = .y) %>%
select(Date, Index, Close) # Keep only relevant columns
})
# Combine all sectors into one dataframe
# Bind rows of all sector tibbles into a single dataframe
sector_df <- bind_rows(sector_tbl)
# Ensure the Date column is in Date format
sector_df$Date <- as.Date(sector_df$Date)
# View the structure of sector_df
glimpse(sector_df)
Rows: 3,912
Columns: 3
$ Date <date> 2020-01-02, 2020-01-03, 2020-01-06, 2020-01-07, 2020-01-08, 202…
$ Index <chr> "NIFTY_BANK", "NIFTY_BANK", "NIFTY_BANK", "NIFTY_BANK", "NIFTY_B…
$ Close <dbl> 32443.85, 32069.25, 31237.15, 31399.40, 31373.65, 32092.40, 3209…
Interpretation: All sectors show upward trends with some volatility, especially Banking.
Rows: 3,849
Columns: 4
Groups: Index [3]
$ Date <date> 2020-01-03, 2020-01-03, 2020-01-03, 2020-01-06, 2020-01-06, 20…
$ Index <chr> "NIFTY_BANK", "NIFTY_IT", "NIFTY_FMCG", "NIFTY_BANK", "NIFTY_IT…
$ Close <dbl> 32069.25, 15936.60, 30109.25, 31237.15, 15879.80, 29799.30, 313…
$ Return <dbl> -0.0115460901, 0.0144464844, -0.0051856270, -0.0259469619, -0.0…
Interpretation: Returns show deviation from perfect normality, common in financial data.
Df Sum Sq Mean Sq F value Pr(>F)
Index 2 0.0000 1.895e-05 0.095 0.909
Residuals 3846 0.7682 1.997e-04
Interpretation: If p-value < 0.05, we reject the null hypothesis and conclude that mean returns differ across sectors.
# Reshape returns to wide format
returns_wide <- sector_returns %>%
select(Date, Index, Return) %>%
pivot_wider(names_from = Index, values_from = Return)
# Correlation matrix
cor_matrix <- cor(returns_wide %>% select(-Date), use = "complete.obs")
# Correlation plot
corrplot(cor_matrix, method = "color", type = "upper",
addCoef.col = "black", tl.cex = 0.8, number.cex = 0.7,
title = "Correlation Between Sector Returns", mar = c(0,0,1,0))
Interpretation: Positive correlations imply that sectors often move together. Banking and IT sectors show moderate correlation.
Call:
lm(formula = NIFTY_BANK ~ NIFTY_IT + NIFTY_FMCG, data = returns_wide)
Residuals:
Min 1Q Median 3Q Max
-0.083107 -0.006657 -0.000179 0.006313 0.069545
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.336e-05 3.708e-04 -0.144 0.886
NIFTY_IT 2.890e-01 2.798e-02 10.329 <2e-16 ***
NIFTY_FMCG 5.722e-01 3.930e-02 14.559 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.0132 on 1268 degrees of freedom
(18 observations deleted due to missingness)
Multiple R-squared: 0.3215, Adjusted R-squared: 0.3205
F-statistic: 300.5 on 2 and 1268 DF, p-value: < 2.2e-16
Interpretation: - Coefficients tell us how much Bank returns change with IT and FMCG. - If p-values < 0.05, predictors are significant. - Adjusted R-squared shows goodness of fit.
Summary: - All sectors have shown an overall upward trend. - Returns are not perfectly normal. - There are significant differences in sector returns. - Sector returns are moderately correlated. - IT and FMCG sector returns are useful predictors of Bank sector returns.
Conclusion: Investors can use sector correlations and predictive models for portfolio diversification and risk management.