Stock Random Walks

Jan 15, 2018

Introduction

Recently a student in another course came to my office looking for someone “who could explain the Monte Carlo simulation” to her. I was caught a bit off-guard since (a) it was 10 minutes before my geometry class and (b) there is no single Monte Carlo simulation.

After a brief discussion, I found out she wanted to predict stock prices using Monte Carlo simulation, but she thought the Monte Carlo simulation itself provided the prediction. She couldn’t say how the actual predictions were being made, which is the crucial part.

Aside on Monte Carlo

If you are familiar with Monte Carlo simulations, skip this, but if not it may be worth reading.

A Monte Carlo simulation is a process of using the outcomes of a random process to better understand the probability distribution of the process. The method of creating the outcomes is dependent on the situation (although it should use some type of random sampling).

In my Computer Science classes, I have students use a Monte Carlo simulation to calculate $π$ (I usually do this in Intro Stats too). This involves choosing $x$ and $y$ values between $- 1$ and $1$ (uniform distribution) and seeing how many $(x, y)$ pairs are inside the unit circle. For a sufficiently large number of points, the ratio of the number inside to the total should be close to the ratio of the area of the unit circle to the area of the surrounding square (where all possible points lie). That ratio is $π / 4$, so multiplying the observed ratio by $4$ gives an estimate of $π$.
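
As a minimal sketch of the $π$ exercise described above (the function name `estimate_pi`, the seed, and the number of points are my own choices for illustration, not something from the class materials):

```r
# Monte Carlo estimate of pi: sample points uniformly in the square
# [-1, 1] x [-1, 1] and count the share that lands in the unit circle.
estimate_pi <- function(n_points) {
  x <- runif(n_points, min = -1, max = 1)
  y <- runif(n_points, min = -1, max = 1)
  inside <- x^2 + y^2 <= 1   # TRUE when the point is inside the unit circle
  4 * mean(inside)           # circle area / square area is pi / 4
}

set.seed(2018)               # arbitrary seed, only so the run is repeatable
pi_hat <- estimate_pi(100000)
pi_hat
```

With more points the estimate tends to settle closer to the true value, but any single run will still be off by a small random amount.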

In Bayesian modelling, Markov Chain Monte Carlo simulations are run to get a sufficient understanding of the posterior probability distribution. This distribution is usually multivariate and except in particular circumstances doesn’t have a nice analytic definition.

Random Walks

One way that we could use a Monte Carlo simulation to predict stock prices is to use a random walk to generate the predicted stock prices. There are many ways we could do this, some using lots of economics sophistication, but we’ll focus on the simplest case to make the general process clear.

A random walk is a random process that describes movement from a starting point over a number of steps through a space. For stocks, if we use the current price as the starting point, select normally distributed random numbers with mean $0$, cumulatively sum them, and add the sums to the base price, we form a random walk. More complex models could add (a) trends, (b) seasonality, (c) other distribution structures, or combinations of the above.

We’ll do the simple case $\text{price at step } t = \text{base price} + \sum_{k = 1}^{t} \text{rnorm}(n, μ = 0, σ = ?)$ where $n$ is the length of the forecast and we’ll use stock data from Johnson and Johnson (NYSE:JNJ).
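
Before touching real data, the formula above can be sketched on made-up numbers. The helper name `simple_walk`, the base price of 100, the standard deviation of 1.5, and the optional `drift` argument are all assumptions for illustration; `drift = 0` gives the simple case used in this post, and a nonzero value would be one crude way to add a trend.

```r
# Hypothetical helper: simulate one random walk from a base price.
# `drift` is an illustrative extra; drift = 0 is the simple zero-mean case.
simple_walk <- function(base_price, n, sigma, drift = 0) {
  steps <- rnorm(n, mean = drift, sd = sigma)  # one normal change per step
  base_price + cumsum(steps)                   # price at step t = base + first t changes
}

set.seed(1)                                    # arbitrary seed for repeatability
walk <- simple_walk(base_price = 100, n = 25, sigma = 1.5)
length(walk)
head(round(walk, 2))
```

The only real decision left is what to use for $σ$, which is where the stock data comes in.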

JNJ Prediction

The Data

I downloaded weekly data for Johnson and Johnson from Yahoo Finance. First, we’ll get rid of a couple of columns and reduce the date range to 2017 and the start of 2018.

library(readr)
jnj_all <- read_csv("../../static/files/jnj-week.csv", 
    col_types = cols(Date = col_date(format = "%Y-%m-%d")))
library(dplyr)

#Get 2017 (and early 2018) data
jnj17 <- jnj_all %>% select(Date, Close, High, Low) %>%
  filter(Date> as.Date("2017-01-01")) %>% arrange(Date)

#plot
library(ggplot2)
ggplot(jnj17, aes(x=Date,y=Close)) + geom_line() + 
  ggtitle("JNJ Stock Price since 1/1/2017")

Single Random Walk

First, we’ll build a single random walk. A Monte Carlo simulation will need lots of random walks, but if we can do one, lots should be easy.

To simplify things, I’m going to add an “index” variable instead of working explicitly with dates.

jnj17$idx <- 1:length(jnj17$Close)
jnj17$type <- "Actual"

Now, let’s make a random walk to predict the next 25 weeks of stock closing values. We’ll assume that the prices should have normally distributed changes around the most recent price and that the standard deviation will be half the average of the weekly ranges (High minus Low) over the last year(ish). This last bit is pretty arbitrary; we could use a standard deviation of $1$, or something else justified by economics.

n<- length(jnj17$Close)
rw <- jnj17$Close[n]+cumsum(rnorm(25, mean = 0, sd = 0.5*mean(jnj17$High - jnj17$Low)))

#build new data.frame
rwData <- data.frame(idx=(n+1):(n+25), Close=rw, type=rep("RW",25))

#table
library(knitr)
kable(rwData)

idx	Close	type
56	144.5784	RW
57	146.2035	RW
58	144.7451	RW
59	142.6243	RW
60	141.8303	RW
61	142.1619	RW
62	140.6649	RW
63	141.3005	RW
64	142.0169	RW
65	142.9144	RW
66	144.5575	RW
67	145.5519	RW
68	148.1797	RW
69	146.6338	RW
70	147.9975	RW
71	147.0684	RW
72	147.9050	RW
73	146.7997	RW
74	145.7820	RW
75	145.8861	RW
76	147.7343	RW
77	146.6477	RW
78	149.7615	RW
79	149.4796	RW
80	146.7478	RW

#plot
rbind.data.frame(select(jnj17, idx,Close,type), rwData) %>%
  ggplot(aes(x=idx,y=Close, col=type))+geom_line()+
  ggtitle("JNJ Actual and Predicted Price")

This is likely a bad prediction at any given index. The hope is that lots of similarly constructed predictions will give insight into the probability distribution of the future JNJ stock prices. This means we’ll need lots of random walks.

Multiple Random Walks

We just need to replicate what we did previously an arbitrary number of times. To automate this, we’ll make a function that returns a data frame with our random walk data. It will work with any similarly structured data (such as other stock data from Yahoo Finance).

randWalk <- function(typeName, len, obsData){
    n<- length(obsData$Close)
    rw <- obsData$Close[n]+cumsum(rnorm(len, mean = 0, sd = 0.5*mean(obsData$High - obsData$Low)))

    #build new data.frame
    rwData <- data.frame(idx=(n+1):(n+len), Close=rw, type = rep(typeName,len))
    return(rwData)
}

#doing 7 random walks because the colorblind palette has 8 colors (1 is used by the actual data)
rwList <- lapply(1:7, function(x) {randWalk(paste("RW",x,sep=""),25,jnj17)})

rwDF <- as.data.frame(bind_rows(rwList))
jnjPred <- rbind.data.frame(select(jnj17,idx,Close,type), rwDF)

#store colorblind palette
cbbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

ggplot(jnjPred, aes(x=idx,y=Close,col=type)) + 
  geom_line() + ggtitle("JNJ Predictions with Multiple Random Walks") + 
  scale_color_manual(values=cbbPalette)

The collection of random walks is a random sample of all the JNJ stock price predictions this model can produce for the next 25 weeks. Because of how we build our predictions, we clearly see oscillation about the most recent actual close. A more informative prediction process might give more informative predictions, but it would only alter our randWalk function. To clean up the graph a bit, we can plot the mean of the random walks and their range at each index.

rwDFreduced <- group_by(rwDF, idx) %>% 
  summarise(meanPred=mean(Close), high = max(Close), low=min(Close)) %>% 
  mutate(Close = meanPred, type="Prediction")

ggplot(jnj17, aes(x=idx,y=Close,col=type)) + geom_line() +
  geom_ribbon(data=rwDFreduced, aes(x=idx,ymin=low,ymax=high), fill="grey70", inherit.aes = FALSE) + 
  geom_line(data=rwDFreduced, aes(x=idx,y=Close, col=type)) + 
  ggtitle("JNJ 7 Random Walks Prediction Ribbon")

Due to the lack of any economic theory, I wouldn’t put much weight in this prediction, but it would be easy to incorporate that theory into the random walk, and the Monte Carlo simulation wouldn’t change. Additionally, each time this code is re-run, the above ribbon can change noticeably.
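
If a repeatable ribbon is wanted (for a write-up, say), one option is to fix the random seed before generating the walks. Here is a small sketch of the idea on a bare random walk; the seed value 42 and the variable names are arbitrary choices of mine.

```r
# Fixing the seed makes the random draws, and any walk built from them, repeatable.
set.seed(42)                                   # 42 is an arbitrary choice
first_run <- cumsum(rnorm(25, mean = 0, sd = 1))

set.seed(42)                                   # same seed, same random numbers
second_run <- cumsum(rnorm(25, mean = 0, sd = 1))

identical(first_run, second_run)               # the two walks match exactly
```

In the code above, a single `set.seed()` call placed before the `lapply()` that builds `rwList` should have the same effect on the ribbon.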

With the ribbon, there’s no need to limit ourselves to 7 random walks. Let’s do more for a real Monte Carlo simulation (and maybe a better, or at least more stable, prediction).

rwList <- lapply(1:100, function(x) {randWalk(paste("RW",x,sep=""),25,jnj17)})

rwDF <- as.data.frame(bind_rows(rwList))
rwDFreduced <- group_by(rwDF, idx) %>% 
  summarise(meanPred=mean(Close), high = max(Close), low=min(Close)) %>% 
  mutate(Close = meanPred, type="Prediction")

ggplot(jnj17, aes(x=idx,y=Close,col=type)) + geom_line() +
  geom_ribbon(data=rwDFreduced, aes(x=idx,ymin=low,ymax=high), fill="grey70", inherit.aes = FALSE) + 
  geom_line(data=rwDFreduced, aes(x=idx,y=Close, col=type)) + 
  ggtitle("JNJ 100 Random Walks Prediction Ribbon")

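With so many random walks, it’s no surprise the prediction line (the mean of the random walks) is nearly flat. Every step has mean $0$, so the expected price at each index is the most recent close, and averaging many walks pulls the sample mean toward that value. This is the Law of Large Numbers in action.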
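
To see why the mean line stays flat while the ribbon keeps widening, here is a self-contained sketch on made-up values (base price 100, standard deviation 2 per step, 2000 walks; none of these come from the JNJ data). For a zero-mean normal walk, the standard deviation of the price at step $t$ should be about $σ \sqrt{t}$, and a quantile band is one possible alternative to the min/max ribbon, which tends to get wider as more walks are added.

```r
# Simulate many zero-mean walks with hypothetical settings
set.seed(7)                                    # arbitrary seed
n_walks <- 2000
n_steps <- 25
sigma <- 2
base_price <- 100

# replicate() returns a matrix: each column is one walk, each row is a step
walks <- replicate(n_walks,
                   base_price + cumsum(rnorm(n_steps, mean = 0, sd = sigma)))

mean_by_step <- rowMeans(walks)                # stays near base_price
sd_by_step <- apply(walks, 1, sd)              # spread of the walks at each step
theory_sd <- sigma * sqrt(seq_len(n_steps))    # sigma * sqrt(t)

# central 90% band at each step (rows: 5% and 95% quantiles)
band <- apply(walks, 1, quantile, probs = c(0.05, 0.95))

# compare simulated and theoretical spread at a few steps
picks <- c(1, 10, 25)
round(cbind(step = picks,
            simulated = sd_by_step[picks],
            theory = theory_sd[picks]), 2)
```

The same summaries could be computed from `rwDF` with `group_by(idx)` and `quantile()` inside `summarise()`, which would give a ribbon that depends less on the number of walks.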
