Stock Data & AR Models

The first step in this tutorial is to get access to WRDS (Wharton Research Data Services) which can be done here if you are a Kelley Student. You can also use Polygon or other stock data APIs

Assuming you are using WRDS, you'll need to get data this way:

!pip install wrds

import wrds
db = wrds.Connection()

Now, let's get data for a specific stock. WRDS allows you to pass in SQL to their API endpoints, so we'll do that. In this case, we'll be looking at GameStop stock.

gamestop_data = db.raw_sql("""
    SELECT
        dlycaldt,
        dlyret as daily_return,
        dlyprc as price,
        dlyvol as volume
    FROM crsp.dsf_v2
    WHERE ticker = 'GME'
    AND dlyret IS NOT NULL
    AND dlycaldt > '2000-01-01'
    ORDER BY dlycaldt
""")

import pandas as pd
df = pd.DataFrame(gamestop_data)

df['date'] = pd.to_datetime(df['dlycaldt'])
df.set_index('date', inplace=True)

Now, to make sure that there are no issues with our data by spot checking:

df

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# extracting the returns
returns = df['daily_return'].dropna()

plt.figure(figsize=(10, 6))
sns.histplot(data=returns, bins=50, kde=True, stat='density')
plt.title('Distribution of GameStop Daily Returns')
plt.xlabel('Daily Return')
plt.ylabel('Density')
plt.grid(True, alpha=0.3)

mean_return = returns.mean()
median_return = returns.median()
plt.axvline(mean_return, color='red', linestyle='--', label=f'Mean: {mean_return:.2%}')
plt.axvline(median_return, color='green', linestyle='--', label=f'Median: {median_return:.2%}')
plt.legend()

plt.gca().xaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.0%}'))
plt.tight_layout()
plt.show()

print("Summary Statistics of Daily Returns:")
print(f"Mean: {returns.mean():.4%}")
print(f"Median: {returns.median():.4%}")
print(f"Standard Deviation: {returns.std():.4%}")
print(f"Min: {returns.min():.4%}")
print(f"Max: {returns.max():.4%}")
print(f"Skewness: {returns.skew():.4f}")
print(f"Kurtosis: {returns.kurtosis():.4f}")

Summary Statistics of Daily Returns:
Mean: 0.1762%
Median: 0.0248%
Standard Deviation: 5.1278%
Min: -60.0000%
Max: 134.8358%
Skewness: 6.6967
Kurtosis: 149.1572

Is there a slight (actually massive) problem with this?

YES

We used raw return, which is NOT additive. We need to use log returns instead.

A 50% gain, and then a 33% loss are equivalent on the log scale as seen below:

log(150/100) + log(100/150) = 0

But not in raw form:

50% - 33% = 17%

Now, let's look at returns over time:

import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))
plt.plot(df.index, df['daily_return'], label='Daily Returns')
plt.title('GameStop (GME) Daily Returns')
plt.xlabel('Date')
plt.ylabel('Daily Return')
plt.grid(True, alpha=0.3)
plt.legend()

plt.xticks(rotation=45)

plt.tight_layout()

plt.show()

Notice where the volatility is? An option seller's biggest nightmare is to price an option for GameStop like it's 2016, and experience a massive uptick in volatility. This massive swing in volatility in one stock bankrupted several people in late 2021.