Finance Data Project¶


K.D.P. Kulugammana¶

In [2]:
pip install pandas-datareader
Collecting pandas-datareader
  Downloading pandas_datareader-0.10.0-py3-none-any.whl (109 kB)
     ---------------------------------------- 0.0/109.5 kB ? eta -:--:--
     --- ------------------------------------ 10.2/109.5 kB ? eta -:--:--
     --- ------------------------------------ 10.2/109.5 kB ? eta -:--:--
     ---------- -------------------------- 30.7/109.5 kB 262.6 kB/s eta 0:00:01
     ---------- -------------------------- 30.7/109.5 kB 262.6 kB/s eta 0:00:01
     ------------- ----------------------- 41.0/109.5 kB 163.4 kB/s eta 0:00:01
     --------------------------- --------- 81.9/109.5 kB 286.7 kB/s eta 0:00:01
     ------------------------------------ 109.5/109.5 kB 351.8 kB/s eta 0:00:00
Requirement already satisfied: lxml in c:\users\acer\anaconda3\lib\site-packages (from pandas-datareader) (4.9.3)
Requirement already satisfied: pandas>=0.23 in c:\users\acer\anaconda3\lib\site-packages (from pandas-datareader) (2.0.3)
Requirement already satisfied: requests>=2.19.0 in c:\users\acer\anaconda3\lib\site-packages (from pandas-datareader) (2.31.0)
Requirement already satisfied: python-dateutil>=2.8.2 in c:\users\acer\anaconda3\lib\site-packages (from pandas>=0.23->pandas-datareader) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in c:\users\acer\anaconda3\lib\site-packages (from pandas>=0.23->pandas-datareader) (2023.3.post1)
Requirement already satisfied: tzdata>=2022.1 in c:\users\acer\anaconda3\lib\site-packages (from pandas>=0.23->pandas-datareader) (2023.3)
Requirement already satisfied: numpy>=1.21.0 in c:\users\acer\anaconda3\lib\site-packages (from pandas>=0.23->pandas-datareader) (1.24.3)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\users\acer\anaconda3\lib\site-packages (from requests>=2.19.0->pandas-datareader) (2.0.4)
Requirement already satisfied: idna<4,>=2.5 in c:\users\acer\anaconda3\lib\site-packages (from requests>=2.19.0->pandas-datareader) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\users\acer\anaconda3\lib\site-packages (from requests>=2.19.0->pandas-datareader) (1.26.16)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\acer\anaconda3\lib\site-packages (from requests>=2.19.0->pandas-datareader) (2023.7.22)
Requirement already satisfied: six>=1.5 in c:\users\acer\anaconda3\lib\site-packages (from python-dateutil>=2.8.2->pandas>=0.23->pandas-datareader) (1.16.0)
Installing collected packages: pandas-datareader
Successfully installed pandas-datareader-0.10.0
Note: you may need to restart the kernel to use updated packages.
In [3]:
from pandas_datareader import data, wb
import pandas as pd
import numpy as np
import datetime
%matplotlib inline
Data¶

Using the pandas datareader, will retrieve stock information for the specified banks.

  • Bank of America
  • CitiGroup
  • Goldman Sachs
  • JPMorgan Chase
  • Morgan Stanley
  • Wells Fargo
In [17]:
pip install yfinance
Requirement already satisfied: yfinance in c:\users\acer\anaconda3\lib\site-packages (0.2.33)
Requirement already satisfied: pandas>=1.3.0 in c:\users\acer\anaconda3\lib\site-packages (from yfinance) (2.1.4)
Requirement already satisfied: numpy>=1.16.5 in c:\users\acer\anaconda3\lib\site-packages (from yfinance) (1.24.3)
Requirement already satisfied: requests>=2.31 in c:\users\acer\anaconda3\lib\site-packages (from yfinance) (2.31.0)
Requirement already satisfied: multitasking>=0.0.7 in c:\users\acer\anaconda3\lib\site-packages (from yfinance) (0.0.11)
Requirement already satisfied: lxml>=4.9.1 in c:\users\acer\anaconda3\lib\site-packages (from yfinance) (4.9.3)
Requirement already satisfied: appdirs>=1.4.4 in c:\users\acer\anaconda3\lib\site-packages (from yfinance) (1.4.4)
Requirement already satisfied: pytz>=2022.5 in c:\users\acer\anaconda3\lib\site-packages (from yfinance) (2023.3.post1)
Requirement already satisfied: frozendict>=2.3.4 in c:\users\acer\anaconda3\lib\site-packages (from yfinance) (2.3.10)
Requirement already satisfied: peewee>=3.16.2 in c:\users\acer\anaconda3\lib\site-packages (from yfinance) (3.17.0)
Requirement already satisfied: beautifulsoup4>=4.11.1 in c:\users\acer\anaconda3\lib\site-packages (from yfinance) (4.12.2)
Requirement already satisfied: html5lib>=1.1 in c:\users\acer\anaconda3\lib\site-packages (from yfinance) (1.1)
Requirement already satisfied: soupsieve>1.2 in c:\users\acer\anaconda3\lib\site-packages (from beautifulsoup4>=4.11.1->yfinance) (2.4)
Requirement already satisfied: six>=1.9 in c:\users\acer\anaconda3\lib\site-packages (from html5lib>=1.1->yfinance) (1.16.0)
Requirement already satisfied: webencodings in c:\users\acer\anaconda3\lib\site-packages (from html5lib>=1.1->yfinance) (0.5.1)
Requirement already satisfied: python-dateutil>=2.8.2 in c:\users\acer\anaconda3\lib\site-packages (from pandas>=1.3.0->yfinance) (2.8.2)
Requirement already satisfied: tzdata>=2022.1 in c:\users\acer\anaconda3\lib\site-packages (from pandas>=1.3.0->yfinance) (2023.3)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\users\acer\anaconda3\lib\site-packages (from requests>=2.31->yfinance) (2.0.4)
Requirement already satisfied: idna<4,>=2.5 in c:\users\acer\anaconda3\lib\site-packages (from requests>=2.31->yfinance) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\users\acer\anaconda3\lib\site-packages (from requests>=2.31->yfinance) (1.26.16)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\acer\anaconda3\lib\site-packages (from requests>=2.31->yfinance) (2023.7.22)
Note: you may need to restart the kernel to use updated packages.
In [5]:
start = datetime.datetime(2006, 1, 1)
end = datetime.datetime(2016, 1, 1)
In [22]:
import yfinance as yf

# Fetch Bank of America stock data from Yahoo Finance using yfinance
BAC = yf.download("BAC", start=start, end=end)
C = yf.download("C", start=start, end=end)
GS = yf.download("GS", start=start, end=end)
JPM = yf.download("JPM", start=start, end=end)
MS = yf.download("MS", start=start, end=end)
WFC = yf.download("WFC", start=start, end=end)
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed

List of the ticker symbols (as strings) in alphabetical order.

In [ ]:
import yfinance as yf
# Fetch data for multiple financial institutions from Yahoo Finance using yfinance
tickers = ['BAC', 'C', 'GS', 'JPM', 'MS', 'WFC']
df = yf.download(tickers, start=start, end=end)

# Display the first few rows of the DataFrame
df.tail()

Use pd.concat to concatenate the bank dataframes together to a single data frame called bank_stocks. Set the keys argument equal to the tickers list.

In [23]:
bank_stocks = pd.concat([BAC, C, GS, JPM, MS, WFC],axis=1,keys=tickers)
In [ ]:
bank_stocks.head()
In [30]:
bank_stocks.columns.names = ['Bank Ticker','Stock Info']
In [27]:
bank_stocks.head()
Out[27]:
Bank Ticker BAC C ... MS WFC
Stock Info Open High Low Close Adj Close Volume Open High Low Close ... Low Close Adj Close Volume Open High Low Close Adj Close Volume
Date
2006-01-03 46.919998 47.180000 46.150002 47.080002 32.168701 16296700 490.000000 493.799988 481.100006 492.899994 ... 56.740002 58.310001 33.585461 5377000 31.600000 31.975000 31.195000 31.900000 19.369587 11016400
2006-01-04 47.000000 47.240002 46.450001 46.580002 31.827045 17757900 488.600006 491.000000 483.500000 483.799988 ... 58.349998 58.349998 33.608505 7977800 31.799999 31.820000 31.365000 31.530001 19.144922 10870000
2006-01-05 46.580002 46.830002 46.320000 46.639999 31.868053 14970700 484.399994 487.799988 484.000000 486.200012 ... 58.020000 58.509998 33.700642 5778000 31.500000 31.555000 31.309999 31.495001 19.123680 10158000
2006-01-06 46.799999 46.910000 46.349998 46.570000 31.820234 12599800 488.799988 489.000000 482.000000 486.200012 ... 58.049999 58.570000 33.735214 6889800 31.580000 31.775000 31.385000 31.680000 19.236012 8403800
2006-01-09 46.720001 46.970001 46.360001 46.599998 31.840734 15619400 486.000000 487.399994 483.000000 483.899994 ... 58.619999 59.189999 34.092335 4144500 31.674999 31.825001 31.555000 31.674999 19.232977 5619600

5 rows × 36 columns

Exploratory Data Analysis¶


The max Close price for each bank's stock throughout the time period

In [31]:
bank_stocks.xs(key='Close',axis=1,level='Stock Info').max()
Out[31]:
Bank Ticker
BAC     54.900002
C      564.099976
GS     247.919998
JPM     70.080002
MS      89.300003
WFC     58.520000
dtype: float64

Create a new empty DataFrame called returns. This dataframe will contain the returns for each bank's stock. returns are typically defined by:

$$r_t = \frac{p_t - p_{t-1}}{p_{t-1}} = \frac{p_t}{p_{t-1}} - 1$$
In [32]:
returns = pd.DataFrame()
In [33]:
for tick in tickers:
    returns[tick+' Return'] = bank_stocks[tick]['Close'].pct_change()
returns.head()
Out[33]:
BAC Return C Return GS Return JPM Return MS Return WFC Return
Date
2006-01-03 NaN NaN NaN NaN NaN NaN
2006-01-04 -0.010620 -0.018462 -0.013812 -0.014183 0.000686 -0.011599
2006-01-05 0.001288 0.004961 -0.000393 0.003029 0.002742 -0.001110
2006-01-06 -0.001501 0.000000 0.014169 0.007046 0.001025 0.005874
2006-01-09 0.000644 -0.004731 0.012030 0.016242 0.010586 -0.000158

Pairplot using seaborn of the returns dataframe.

In [34]:
import seaborn as sns
sns.pairplot(returns[1:])
C:\Users\ACER\anaconda3\Lib\site-packages\seaborn\axisgrid.py:118: UserWarning: The figure layout has changed to tight
  self._figure.tight_layout(*args, **kwargs)
Out[34]:
<seaborn.axisgrid.PairGrid at 0x2002738ebd0>
In [35]:
# Worst Drop (4 of them on Inauguration day)
returns.idxmin()
Out[35]:
BAC Return   2009-01-20
C Return     2009-02-27
GS Return    2009-01-20
JPM Return   2009-01-20
MS Return    2008-10-09
WFC Return   2009-01-20
dtype: datetime64[ns]
In [36]:
# Best Single Day Gain
# citigroup stock split in May 2011, but also JPM day after inauguration.
returns.idxmax()
Out[36]:
BAC Return   2009-04-09
C Return     2008-11-24
GS Return    2008-11-24
JPM Return   2009-01-21
MS Return    2008-10-13
WFC Return   2008-07-16
dtype: datetime64[ns]
In [37]:
returns.std() # Citigroup riskiest
Out[37]:
BAC Return    0.036647
C Return      0.038672
GS Return     0.025390
JPM Return    0.027667
MS Return     0.037819
WFC Return    0.030238
dtype: float64
In [40]:
returns.loc['2015-01-01':'2015-12-31'].std()
Out[40]:
BAC Return    0.016163
C Return      0.015289
GS Return     0.014046
JPM Return    0.014017
MS Return     0.016249
WFC Return    0.012591
dtype: float64

Distplot using seaborn of the 2015 returns for Morgan Stanley

In [42]:
sns.distplot(returns.loc['2015-01-01':'2015-12-31']['MS Return'],color='green',bins=100)
C:\Users\ACER\AppData\Local\Temp\ipykernel_14776\1960598734.py:1: UserWarning: 

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `histplot` (an axes-level function for histograms).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

  sns.distplot(returns.loc['2015-01-01':'2015-12-31']['MS Return'],color='green',bins=100)
Out[42]:
<Axes: xlabel='MS Return', ylabel='Density'>

Distplot using seaborn of the 2008 returns for CitiGroup

In [46]:
sns.set_style('whitegrid')
In [47]:
sns.distplot(returns.loc['2008-01-01':'2008-12-31']['C Return'],color='red',bins=100)
C:\Users\ACER\AppData\Local\Temp\ipykernel_14776\1612079653.py:1: UserWarning: 

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `histplot` (an axes-level function for histograms).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

  sns.distplot(returns.loc['2008-01-01':'2008-12-31']['C Return'],color='red',bins=100)
Out[47]:
<Axes: xlabel='C Return', ylabel='Density'>

More Visualization¶


In [49]:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
%matplotlib inline

# Optional Plotly Method Imports
import plotly
import cufflinks as cf
cf.go_offline()

Line plot showing Close price for each bank for the entire index of time.

In [50]:
for tick in tickers:
    bank_stocks[tick]['Close'].plot(label = tick , figsize = (12,4))
plt.legend()    
Out[50]:
<matplotlib.legend.Legend at 0x2002f4e9cd0>
In [51]:
bank_stocks.xs(key='Close',axis=1,level='Stock Info').plot()
Out[51]:
<Axes: xlabel='Date'>
In [52]:
bank_stocks.xs(key='Close',axis=1,level='Stock Info').iplot()

Moving Averages¶


Analyze the moving averages for these stocks in the year 2008.

The rolling 30 day average against the Close Price for Bank Of America's stock for the year 2008

In [53]:
plt.figure(figsize=(12,6))
BAC['Close'].loc['2008-01-01':'2009-01-01'].rolling(window=30).mean().plot(label='30 Day Avg')
BAC['Close'].loc['2008-01-01':'2009-01-01'].plot(label='BAC CLOSE')
plt.legend()
Out[53]:
<matplotlib.legend.Legend at 0x200317e3650>

Heatmap & Clustermap¶

In [57]:
# Assuming 'Stock Info' is a level in your multi-index DataFrame
correlation_matrix = bank_stocks.xs('Close', level='Stock Info', axis=1).corr()
In [58]:
correlation_matrix
Out[58]:
Bank Ticker BAC C GS JPM MS WFC
Bank Ticker
BAC 1.000000 0.971516 0.550898 0.103874 0.944218 0.008542
C 0.971516 1.000000 0.434123 0.003515 0.933609 -0.068536
GS 0.550898 0.434123 1.000000 0.685286 0.683792 0.499897
JPM 0.103874 0.003515 0.685286 1.000000 0.250427 0.940269
MS 0.944218 0.933609 0.683792 0.250427 1.000000 0.131835
WFC 0.008542 -0.068536 0.499897 0.940269 0.131835 1.000000
In [59]:
sns.heatmap(bank_stocks.xs(key='Close',axis=1,level='Stock Info').corr(),annot=True)
Out[59]:
<Axes: xlabel='Bank Ticker', ylabel='Bank Ticker'>
In [60]:
sns.clustermap(bank_stocks.xs(key='Close',axis=1,level='Stock Info').corr(),annot=True)
Out[60]:
<seaborn.matrix.ClusterGrid at 0x20032bbdad0>

In [61]:
close_corr = bank_stocks.xs('Close',axis=1,level='Stock Info').corr()
close_corr.iplot(kind='heatmap',colorscale='rdylbu')
In [63]:
BAC[['Open', 'High', 'Low', 'Close']].loc['2015-01-01':'2016-01-01'].iplot(kind='candle')

Simple Moving Averages plot of Morgan Stanley for the year 2015.

In [65]:
MS['Close'].loc['2015-01-01':'2016-01-01'].ta_plot(study='sma',periods=[13,21,55],title='Simple Moving Averages')

Bollinger Band Plot for Bank of America for the year 2015.

In [66]:
BAC['Close'].loc['2015-01-01':'2016-01-01'].ta_plot(study='boll')