Stock Data & AR Models

The first step in this tutorial is to get access to wrds
(Wharton Research Data Services) which can be done here if you are a Kelley Student. You can also use Polygon or other stock data APIs.
Assuming you are using wrds
, you'll need to get data this way:
import wrds
db = wrds.Connection()
pip install wrds
Now, let's get data for a specific stock. WRDS allows you to pass in SQL to their API endpoints, so we'll do that. In this case, we'll be looking at GameStop stock (ticker: GME hence the WHERE ticker = 'GME'
condition).
gamestop_data = db.raw_sql("""
SELECT
dlycaldt,
dlyret as daily_return,
dlyprc as price,
dlyvol as volume
FROM crsp.dsf_v2
WHERE ticker = 'GME'
AND dlyret IS NOT NULL
AND dlycaldt > '2000-01-01'
ORDER BY dlycaldt
""")
import pandas as pd
df = pd.DataFrame(gamestop_data)
df['date'] = pd.to_datetime(df['dlycaldt'])
df.set_index('date', inplace=True)
Now, to make sure that there are no issues with our data by viewing the df
:
df
Now, let's graph the distribution of daily returns for the security:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
# extracting the returns
returns = df['daily_return'].dropna()
plt.figure(figsize=(10, 6))
sns.histplot(data=returns, bins=50, kde=True, stat='density')
plt.title('Distribution of GameStop Daily Returns')
plt.xlabel('Daily Return')
plt.ylabel('Density')
plt.grid(True, alpha=0.3)
mean_return = returns.mean()
median_return = returns.median()
plt.axvline(mean_return, color='red', linestyle='--', label=f'Mean: {mean_return:.2%}')
plt.axvline(median_return, color='green', linestyle='--', label=f'Median: {median_return:.2%}')
plt.legend()
plt.gca().xaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.0%}'))
plt.tight_layout()
plt.show()
print("Summary Statistics of Daily Returns:")
print(f"Mean: {returns.mean():.4%}")
print(f"Median: {returns.median():.4%}")
print(f"Standard Deviation: {returns.std():.4%}")
print(f"Min: {returns.min():.4%}")
print(f"Max: {returns.max():.4%}")
print(f"Skewness: {returns.skew():.4f}")
print(f"Kurtosis: {returns.kurtosis():.4f}")

Summary Statistics of Daily Returns:
Mean: 0.1762%
Median: 0.0248%
Standard Deviation: 5.1278%
Min: -60.0000%
Max: 134.8358%
Skewness: 6.6967
Kurtosis: 149.1572
But there's a massive problem with what I just did.
We used raw return, which is NOT additive. We need to use log returns instead.
A 50% gain, and then a 33% loss are equivalent on the log scale as seen below:
log(150/100) + log(100/150) = 0
But not in raw form:
0.5 - 0.333 = .167
So if you were to forego using log returns, you'd be wildly overestimating the true returns of a stock, thus making your model worse than useless.
Changing the above code block from returns to log returns...
# Calculate log returns
df['log_return'] = np.log(df['price'] / df['price'].shift(1))
returns = df['log_return'].dropna()

Summary Statistics of Daily Log Returns:
Mean: 0.0071%
Median: 0.0000%
Standard Deviation: 5.7459%
Min: -145.6116%
Max: 85.3716%
Skewness: -2.2947
Kurtosis: 103.5788
Massive difference...
Now, let's look at returns over time:
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 6))
plt.plot(df.index, df['daily_return'], label='Daily Returns')
plt.title('GameStop (GME) Daily Returns')
plt.xlabel('Date')
plt.ylabel('Daily Return')
plt.grid(True, alpha=0.3)
plt.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

Notice the volatility spikes? An option seller's biggest nightmare is to price an option for GameStop in the volatility regime of 2016, and experience a massive uptick in volatility. This massive swing in volatility in one security almost bankrupted a hedge fund in 2021.
The Black-Scholes Option Pricing Model
You've seen the above formula a million times, but what does this look like in Python, and what do we need to use the model?
First, calculate the necessary quantities from the dataset above:
returns = df['log_return'].dropna()
current_price = df['price'].iloc[-1] # Most recent price
# Calculate annualized volatility
daily_std = returns.std()
annual_volatility = daily_std * np.sqrt(252)
print(f"Annualized Volatility: {annual_volatility:.4%}")
Now, set the appropriate parameters:
S0 = current_price # Current stock price
K = 30 # Strike price ($30 strike price call option)
T = 20 / 252 # one month to expiry
r = 0.0433 # Risk-free rate - EFFR as of 4/3/25
sigma = annual_volatility # Volatility
q = 0 # Dividend yield (assumed 0)
A few key points about the parameters:
- You can change the parameter
k
at your will - this is the strike price. You could even calculate it for a vector ofk
by using a for loop or list comprehension. - The parameter
T
is typically expressed in the number of trading days. - The parameter
r
can be either the Effective Fed Funds Rate (EFFR) or the 10-year treasury yield (US10Y) sigma
is set to the standard deviation of log returns, which we calculated aboveq
is set to 0 unless the security has a known dividend yield
Running the code now becomes trivial:
import numpy as np
from scipy.stats import norm
def black_scholes_call(S0, K, T, r, sigma, q=0):
d1 = (np.log(S0 / K) + (r - q + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
d2 = d1 - sigma * np.sqrt(T)
call_price = S0 * np.exp(-q * T) * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)
return call_price
bs_price = black_scholes_call(S0, K, T, r, sigma)
print(f"Black-Scholes Call Price: ${bs_price:.2f}")
print(f"\nParameters Used:")
print(f"Current Price (S0): ${S0:.2f}")
print(f"Strike Price (K): ${K:.2f}")
print(f"Time to Expiration: {T:.4f} years (20 trading days)")
print(f"Risk-Free Rate: {r:.2%}")
print(f"Volatility: {sigma:.4%}")
Annualized Volatility: 91.2135%
Black-Scholes Call Price: $3.11
Parameters Used:
Current Price (S0): $30.00
Strike Price (K): $30.00
Time to Expiration: 0.0794 years (20 trading days)
Risk-Free Rate: 4.33%
Volatility: 91.2135%
Interpretation: A 30-day to expiry (20 trading days) call option for GameStop with a strike price of $30 is worth $3.11 today.