Stock Data & AR Models

Stock Data & AR Models
One-year autoregressive stock price projection of GameStop, starting from 1/1/2025

The first step in this tutorial is to get access to wrds (Wharton Research Data Services) which can be done here if you are a Kelley Student. You can also use Polygon or other stock data APIs.
Assuming you are using wrds, you'll need to get data this way:

import wrds
db = wrds.Connection()
💡
Do not forget to pip install wrds

Now, let's get data for a specific stock. WRDS allows you to pass in SQL to their API endpoints, so we'll do that. In this case, we'll be looking at GameStop stock (ticker: GME hence the WHERE ticker = 'GME' condition).

gamestop_data = db.raw_sql("""
    SELECT
        dlycaldt,
        dlyret as daily_return,
        dlyprc as price,
        dlyvol as volume
    FROM crsp.dsf_v2
    WHERE ticker = 'GME'
    AND dlyret IS NOT NULL
    AND dlycaldt > '2000-01-01'
    ORDER BY dlycaldt
""")

import pandas as pd
df = pd.DataFrame(gamestop_data)

df['date'] = pd.to_datetime(df['dlycaldt'])
df.set_index('date', inplace=True)

Now, to make sure that there are no issues with our data by viewing the df:

df

Now, let's graph the distribution of daily returns for the security:

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# extracting the returns
returns = df['daily_return'].dropna()

plt.figure(figsize=(10, 6))
sns.histplot(data=returns, bins=50, kde=True, stat='density')
plt.title('Distribution of GameStop Daily Returns')
plt.xlabel('Daily Return')
plt.ylabel('Density')
plt.grid(True, alpha=0.3)

mean_return = returns.mean()
median_return = returns.median()
plt.axvline(mean_return, color='red', linestyle='--', label=f'Mean: {mean_return:.2%}')
plt.axvline(median_return, color='green', linestyle='--', label=f'Median: {median_return:.2%}')
plt.legend()

plt.gca().xaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.0%}'))
plt.tight_layout()
plt.show()

print("Summary Statistics of Daily Returns:")
print(f"Mean: {returns.mean():.4%}")
print(f"Median: {returns.median():.4%}")
print(f"Standard Deviation: {returns.std():.4%}")
print(f"Min: {returns.min():.4%}")
print(f"Max: {returns.max():.4%}")
print(f"Skewness: {returns.skew():.4f}")
print(f"Kurtosis: {returns.kurtosis():.4f}")

Summary Statistics of Daily Returns:
Mean: 0.1762%
Median: 0.0248%
Standard Deviation: 5.1278%
Min: -60.0000%
Max: 134.8358%
Skewness: 6.6967
Kurtosis: 149.1572

But there's a massive problem with what I just did.

We used raw return, which is NOT additive. We need to use log returns instead.
A 50% gain, and then a 33% loss are equivalent on the log scale as seen below:

log(150/100) + log(100/150) = 0

But not in raw form:

0.5 - 0.333 = .167

So if you were to forego using log returns, you'd be wildly overestimating the true returns of a stock, thus making your model worse than useless.
Changing the above code block from returns to log returns...

# Calculate log returns
df['log_return'] = np.log(df['price'] / df['price'].shift(1))
returns = df['log_return'].dropna()

Summary Statistics of Daily Log Returns:
Mean: 0.0071%
Median: 0.0000%
Standard Deviation: 5.7459%
Min: -145.6116%
Max: 85.3716%
Skewness: -2.2947
Kurtosis: 103.5788

Massive difference...
Now, let's look at returns over time:

import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))
plt.plot(df.index, df['daily_return'], label='Daily Returns')
plt.title('GameStop (GME) Daily Returns')
plt.xlabel('Date')
plt.ylabel('Daily Return')
plt.grid(True, alpha=0.3)
plt.legend()

plt.xticks(rotation=45)

plt.tight_layout()

plt.show()

Notice the volatility spikes? An option seller's biggest nightmare is to price an option for GameStop in the volatility regime of 2016, and experience a massive uptick in volatility. This massive swing in volatility in one security almost bankrupted a hedge fund in 2021.

The Black-Scholes Option Pricing Model

You've seen the above formula a million times, but what does this look like in Python, and what do we need to use the model?
First, calculate the necessary quantities from the dataset above:

returns = df['log_return'].dropna()
current_price = df['price'].iloc[-1]  # Most recent price

# Calculate annualized volatility
daily_std = returns.std()
annual_volatility = daily_std * np.sqrt(252)
print(f"Annualized Volatility: {annual_volatility:.4%}")

Now, set the appropriate parameters:

S0 = current_price  # Current stock price
K = 30  # Strike price ($30 strike price call option)
T = 20 / 252  # one month to expiry 
r = 0.0433  # Risk-free rate - EFFR as of 4/3/25
sigma = annual_volatility  # Volatility
q = 0  # Dividend yield (assumed 0)

A few key points about the parameters:

  • You can change the parameter k at your will - this is the strike price. You could even calculate it for a vector of k by using a for loop or list comprehension.
  • The parameter T is typically expressed in the number of trading days.
  • The parameter r can be either the Effective Fed Funds Rate (EFFR) or the 10-year treasury yield (US10Y)
  • sigma is set to the standard deviation of log returns, which we calculated above
  • q is set to 0 unless the security has a known dividend yield

    Running the code now becomes trivial:
import numpy as np
from scipy.stats import norm

def black_scholes_call(S0, K, T, r, sigma, q=0):
    d1 = (np.log(S0 / K) + (r - q + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    call_price = S0 * np.exp(-q * T) * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)
    return call_price

bs_price = black_scholes_call(S0, K, T, r, sigma)
print(f"Black-Scholes Call Price: ${bs_price:.2f}")

print(f"\nParameters Used:")
print(f"Current Price (S0): ${S0:.2f}")
print(f"Strike Price (K): ${K:.2f}")
print(f"Time to Expiration: {T:.4f} years (20 trading days)")
print(f"Risk-Free Rate: {r:.2%}")
print(f"Volatility: {sigma:.4%}")

Annualized Volatility: 91.2135%
Black-Scholes Call Price: $3.11

Parameters Used:
Current Price (S0): $30.00
Strike Price (K): $30.00
Time to Expiration: 0.0794 years (20 trading days)
Risk-Free Rate: 4.33%
Volatility: 91.2135%

Interpretation: A 30-day to expiry (20 trading days) call option for GameStop with a strike price of $30 is worth $3.11 today.