Downloading Free Price Data for All S&P500 Stocks in 50 Lines of Python Code
Or any other list of publicly traded stocks!
If you’re interested in trading or investing, then you have no doubt heard of the S&P500 index, which is made up of roughly 500 of the largest companies listed on US stock exchanges.
Fortunately, modern Python packages such as pandas and yfinance have made it incredibly easy to download the stock-price history for each constituent of the index (or any list of publicly traded stocks, for that matter!). So if you’re looking to test your trading or investing ideas against this basket of stocks, then obtaining the price history is surely the place to start!
In this post I will go through:
How to scrape the list of the up-to-date constituents of the S&P500 index.
How to use the yfinance library to download the trading history for a given list of tickers.
If that sounds good to you, then do read on!
Obtaining the List of S&P500 Stocks
Thanks to our old friend Wikipedia and the excellent pandas package, obtaining the list of S&P500 stocks is super easy. In fact, it takes a whole three lines of code!
import pandas as pd
URL = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
tickers = pd.read_html(URL)[0]['Symbol'].tolist()
The URL in the code above is the Wikipedia page which has the up-to-date list of S&P500 stocks. By using the pd.read_html() function, we can let pandas do the hard work for us (without context, saying “let pandas do the hard work for us” sounds bad). Since there are two tables of information on this webpage, we must use index position 0 to obtain the list we desire.
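One wrinkle worth knowing about: Wikipedia writes share-class tickers with a dot (for example BRK.B), whereas Yahoo Finance writes them with a hyphen (BRK-B). Below is a minimal sketch of a clean-up step. The helper name to_yahoo_symbol is my own invention for illustration and is not part of pandas or yfinance, and the list is hand-written to stand in for the scraped tickers.

```python
def to_yahoo_symbol(ticker: str) -> str:
    """Convert a Wikipedia-style ticker (BRK.B) to Yahoo's style (BRK-B)."""
    # Strip stray whitespace, normalise case, then swap the dot for a hyphen
    return ticker.strip().upper().replace('.', '-')


# Hand-written list standing in for the tickers scraped from Wikipedia
raw_tickers = ['AAPL', 'BRK.B', 'bf.b ', 'MSFT']
yahoo_tickers = [to_yahoo_symbol(t) for t in raw_tickers]
print(yahoo_tickers)
```

Tickers without a dot pass through unchanged, so it should be safe to apply this to the whole list.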
What about other lists of stocks?!
I got your back. Another fantastic free resource is the Nasdaq stock screener, where you can filter by market cap. However, you will have to manually filter and download the csv file for this (see the screen capture below):
Once the csv file is downloaded, it’s super easy to get the list of tickers by using pandas:
import pandas as pd
tickers = pd.read_csv("small_cap.csv")['Symbol'].tolist()
This assumes you named the file “small_cap.csv” and are running the code from the same directory as the csv file.
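If you would rather filter in pandas than in the browser, something like the sketch below may help. It reads from an in-memory string so that it runs on its own. The 'Market Cap' column name, the made-up tickers and the size thresholds are all assumptions for illustration, so check them against the header of your own file. It also guards against a pandas quirk: by default the text NA is read as a missing value, which would silently drop a ticker spelled that way.

```python
import io

import pandas as pd

# Stand-in for the downloaded file. The tickers are made up, and the
# 'Market Cap' column name is an assumption about the export format.
csv_text = """Symbol,Market Cap
AAA,1500000000
NA,800000000
BBB,
CCC,250000000
"""

# keep_default_na=False stops pandas treating the text 'NA' as missing;
# na_values=[''] still marks genuinely empty cells as missing.
df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=[''])

# Hypothetical thresholds: keep market caps between 300 million and 2 billion
small_caps = df[df['Market Cap'].between(300e6, 2e9)]
tickers = small_caps['Symbol'].tolist()
print(tickers)
```

With a real file you would pass the file name to pd.read_csv instead of the io.StringIO wrapper.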
How to Use yfinance to Download Stock Data
The yfinance package uses Yahoo’s publicly available APIs to pull the data from the Yahoo Finance website. You can use this to grab stock-price data, generic information about the company, its fundamentals and even recent news.
The package surely deserves an entire article to itself, but here I will limit the discussion to getting the price data. For those who are interested in the full features, I highly recommend checking out the documentation.
OK, how do we download a single ticker?
Easy:
import yfinance as yf
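df = yf.download('TSLA', period = 'max')
df.to_csv('TSLA.csv')
The code above downloads the entire daily price history for Tesla (TSLA) and stores it in a pandas dataframe. Passing period = 'max' requests the full history explicitly, since the default look-back window has differed between yfinance versions. The last line saves the dataframe as a csv file in the directory you run the code from.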
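When you come back to a saved file later, you will probably want the dates restored as a proper index. Here is a small sketch using a made-up three-row dataframe in place of a real download, so it runs without a network connection. It assumes the csv has a single header row; some yfinance versions may write a two-level header, in which case header=[0, 1] would likely be needed as well.

```python
import os
import tempfile

import pandas as pd

# Made-up prices standing in for a downloaded history
prices = pd.DataFrame(
    {'Close': [10.0, 10.5, 10.2]},
    index=pd.to_datetime(['2023-01-03', '2023-01-04', '2023-01-05']),
)
prices.index.name = 'Date'

# Save to a temporary folder, just as df.to_csv('TSLA.csv') would save locally
path = os.path.join(tempfile.mkdtemp(), 'EXAMPLE.csv')
prices.to_csv(path)

# index_col=0 makes the first column the index again,
# and parse_dates=True converts it back to datetimes
reloaded = pd.read_csv(path, index_col=0, parse_dates=True)
print(reloaded)
```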
You can modify the above code to look at intraday data too by using the interval keyword argument, for example:
df = yf.download('TSLA', interval = '5m')
The catch with intraday data is that you can only get the last 60 days of data.
Lastly, you can cap the download to consider only a certain period of time, for example, only the last year:
df = yf.download('TSLA', period = '1y')
However, I just download the entire history and then use pandas to filter to the date ranges I want when backtesting.
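For anyone unsure what that filtering looks like, here is a minimal sketch on a synthetic dataframe (the values are just a counter, not real prices). It relies on the dataframe having a DatetimeIndex, which is what a price history indexed by date gives you.

```python
import pandas as pd

# Synthetic daily data covering three calendar years
dates = pd.date_range('2020-01-01', '2022-12-31', freq='D')
df = pd.DataFrame({'Close': range(len(dates))}, index=dates)

# Label-based slicing on a DatetimeIndex includes both end points
df_2021 = df.loc['2021-01-01':'2021-12-31']

# A partial date string selects every row in that month
df_march = df.loc['2021-03']

print(len(df_2021), len(df_march))
```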
What about a list of tickers?
This is a little more complicated, but nothing too serious. In fact, as the article title claimed, you can do this all within 50 lines of code:
import os
import pandas as pd
import yfinance as yf
S_AND_P_URL = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'

def get_ticker_data(tickers: list):
    '''
    Obtain the price data for all tickers specified. The outcome is a csv file
    of price data for each ticker in the data folder.

    Parameters
    ----------
    tickers : list
        A list of the tickers to download the data for
    '''

    # period = 'max' requests the full history explicitly
    data = yf.download(
        tickers = tickers,
        period = 'max',
        interval = '1d',
        group_by = 'ticker',
        threads = True,
    )

    for ticker in tickers:

        try:
            # Select this ticker's columns and drop rows with missing values
            df = data.loc[:, ticker.upper()].dropna()
            df.to_csv(f'data/{ticker}.csv', index = True)
        except Exception:
            print(f'Ticker {ticker} failed to download.')
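
def get_s_and_p_tickers() -> list:
    '''
    Get a list of all tickers currently in the S&P500 index.
    '''
    symbols = pd.read_html(S_AND_P_URL)[0]['Symbol']
    # Yahoo writes share classes with a hyphen (BRK-B), Wikipedia with a dot (BRK.B)
    return symbols.str.replace('.', '-', regex = False).tolist()
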
if __name__ == '__main__':

    # Check if a directory exists called 'data', if not, create it
    if not os.path.isdir('data'):
        os.mkdir('data')

    # Download one csv file per ticker and place it in the data directory
    get_ticker_data(get_s_and_p_tickers())
The above code performs the following:
Checks whether a folder called “data” exists in the current directory; if not, the code creates it (all the stock-price data is then stored inside it).
The get_s_and_p_tickers() function scrapes the list of S&P500 tickers from Wikipedia and returns it as a list.
This list is fed into get_ticker_data(), which uses the yfinance package to download the entire daily price history for each ticker and saves each ticker’s data as a csv file in the data folder.
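Once the data folder is populated, a natural next step is to pull the files back into one dataframe for backtesting. The sketch below is one possible way to do that, not part of the script above. The function name load_closes is hypothetical, and it assumes each csv has a single header row containing a 'Close' column. It is demonstrated on two tiny made-up files in a temporary folder so that it runs without any downloads.

```python
import os
import tempfile

import pandas as pd


def load_closes(folder: str) -> pd.DataFrame:
    """Combine the 'Close' column of every csv in folder into one dataframe."""
    closes = {}
    for filename in sorted(os.listdir(folder)):
        if not filename.endswith('.csv'):
            continue
        ticker = filename[:-len('.csv')]
        df = pd.read_csv(os.path.join(folder, filename), index_col=0, parse_dates=True)
        closes[ticker] = df['Close']
    # Series are aligned on date; dates missing for a ticker become NaN
    return pd.DataFrame(closes)


# Two made-up files standing in for the contents of the data folder
folder = tempfile.mkdtemp()
pd.DataFrame(
    {'Close': [1.0, 2.0]},
    index=pd.to_datetime(['2023-01-03', '2023-01-04']),
).to_csv(os.path.join(folder, 'AAA.csv'))
pd.DataFrame(
    {'Close': [5.0]},
    index=pd.to_datetime(['2023-01-04']),
).to_csv(os.path.join(folder, 'BBB.csv'))

closes = load_closes(folder)
print(closes)
```

For the real script you would call load_closes('data') instead.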
Note: The above code uses the threads keyword argument for faster downloading.
There is no requirement to download the S&P tickers; you can modify this to your preference. The get_ticker_data function only requires a list of the tickers you wish to download.
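If your own list is much longer than the S&P500, one option is to split it into smaller batches and call get_ticker_data once per batch. Whether this is actually necessary depends on the size of the list and on how Yahoo treats large requests, which I am not making any claims about here. The helper name chunked and the batch size are purely illustrative.

```python
def chunked(items: list, size: int) -> list:
    """Split items into consecutive lists of at most `size` elements."""
    if size < 1:
        raise ValueError('size must be at least 1')
    return [items[i:i + size] for i in range(0, len(items), size)]


# Made-up tickers; each batch could then be passed to get_ticker_data(batch)
tickers = ['AAA', 'BBB', 'CCC', 'DDD', 'EEE']
batches = chunked(tickers, 2)
print(batches)
```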
By the way, if you’re interested in backtesting trading/investing ideas over a large sample of data, you may wish to consider my article “Supercharge Backtesting Speed Using Numba”, which covers a simple way to use numba to squeeze extra performance out of your Python backtests compared with pure Python code!
Wrapping Up
This article has been a very short guide to downloading the list of the latest constituents of the S&P500 index and then obtaining the entire daily price history for each of the tickers.
Thank you for reading, I hope you enjoyed the article! Please feel free to connect with me on LinkedIn, I’d love to hear if/how you use the code.
If you are thinking of getting a Medium account, then please consider supporting me and thousands of other writers by signing up for a membership. For full disclosure, signing up through this link grants me a portion of your membership fee, at no additional cost to you (a win-win for sure!).