As of this writing, there are 766,000 U.S. and international time series from 101 sources.
FRED API
FRED conveniently has an API, which you can use to retrieve the time series data in XML or JSON format.
Python + requests
I used Python to access the API, with the requests library.
In this post I'm going to go over how I gathered all of the time series data from the API.
import requests
api_key = 'xxxxxxx'
Anatomy of the FRED API
- Sources - data sources
- Releases - release of data from a source
- Series (time series)
- Series observation values
- Categories
- Tags
Each of these has an API endpoint.
How do we go about gathering all 766,000 time series, along with their associated data?
It took a bit of strategizing.
Gather all of the series IDs
Each time series has a unique series ID, so the first thing we need to do is to get all of these IDs.
If you look at the API docs, there is no endpoint to simply get all of the series IDs, so you have to get them in a roundabout way.
Sources and releases in FRED
FRED's sources are the organizations that provide the time series economic data. Many are government agencies; Equifax is an example of a non-government source.
Releases are collections of data made available from a source.
Equifax has one release: Equifax Credit Quality
On the release page, you can see a list of series that are associated with that release, which are the time series that we want.
Game plan to get all of the series IDs from the sources and releases
- Get all source IDs - there is an API endpoint for that!
- Get all release IDs for each source
- Get all series IDs from each release
Step 1: Get all sources
Start with this API endpoint to get all sources.
endpoint = 'https://api.stlouisfed.org/fred/sources'
params = {
'api_key': api_key,
'file_type': 'json'
}
response = requests.get(endpoint,params=params)
The result is JSON data which you can just iterate through to get the source ID for each source.
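Here is a minimal sketch of that iteration. I'm assuming the parsed JSON has a top-level sources list whose entries each have an id field, so check that against the actual response; extract_source_ids is just an illustrative helper name.

```python
def extract_source_ids(payload):
    """Return the source IDs found in a parsed /fred/sources response."""
    # Assumption: the sources are listed under the "sources" key,
    # and each entry carries its ID in an "id" field.
    return [source["id"] for source in payload.get("sources", [])]


# Usage with the response from the request above:
# source_ids = extract_source_ids(response.json())
```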
Step 2: Get all releases for each source
Now we have the list of source IDs, and can use those to go through and get each release ID.
There is an endpoint to get all of the releases for a source ID.
endpoint = 'https://api.stlouisfed.org/fred/source/releases'
source_id = 1
params = {
'source_id': source_id,
'api_key': api_key,
'file_type': 'json'
}
response = requests.get(endpoint,params=params)
In this example we're using a source ID of 1 and making the request to the release endpoint with it.
Then iterate through the response JSON, and get the release IDs from each result.
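Repeating that request for every source ID could look something like the sketch below. The fetch argument is a hypothetical callable that takes an endpoint and params and returns parsed JSON, and the releases key is my assumption about the response shape.

```python
RELEASES_ENDPOINT = 'https://api.stlouisfed.org/fred/source/releases'


def collect_release_ids(fetch, source_ids, api_key):
    """Return the set of release IDs found across all of the given sources."""
    release_ids = set()
    for source_id in source_ids:
        params = {
            'source_id': source_id,
            'api_key': api_key,
            'file_type': 'json',
        }
        payload = fetch(RELEASES_ENDPOINT, params)
        # Assumption: releases are listed under the "releases" key.
        for release in payload.get('releases', []):
            release_ids.add(release['id'])
    return release_ids
```

With requests, fetch could be as simple as lambda endpoint, params: requests.get(endpoint, params=params).json(). Using a set means a release ID that shows up more than once is only stored once.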
Step 3: Get all series for each release
Now you have a list of release IDs and can use them to get a list of series for each release.
Here is the endpoint to get all series for a release ID.
endpoint = 'https://api.stlouisfed.org/fred/release/series'
params = {
'release_id': release_id,
'api_key': api_key,
'file_type': 'json'
}
response = requests.get(endpoint,params=params)
After going through each release ID and processing the series results, you will have all of the series IDs.
The total should be close to 766,000 (or whatever number is on the FRED homepage when you access the API), though it might not match exactly.
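A release can have more series than a single call returns, so collecting every series ID means stepping through pages. Below is a rough sketch of one way to do that with the limit and offset parameters. The fetch callable and the fetch_all_pages name are hypothetical, and the seriess result key is my assumption about the response.

```python
def fetch_all_pages(fetch, endpoint, params, result_key, page_size=1000):
    """Collect every record under result_key by stepping the offset parameter."""
    records = []
    offset = 0
    while True:
        # Copy the base params and add paging values for this request.
        page_params = dict(params, limit=page_size, offset=offset)
        payload = fetch(endpoint, page_params)
        page = payload.get(result_key, [])
        records.extend(page)
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            break
        offset += page_size
    return records


# Usage (assumes "seriess" is the key holding the series list):
# series = fetch_all_pages(fetch, endpoint, params, 'seriess')
# series_ids = {s['id'] for s in series}
```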
Series observations
Now you have all of the series IDs, and can use those to get the observation values for each time series.
I'm sticking with my Equifax example from earlier, and one of the series is Equifax Subprime Credit Population for New York County, NY.
The series ID is: EQFXSUBPRIME036061.
In this series, the observation values are for the percent of the population in New York County with a credit score below 660.
Most FRED API endpoints have a limit of 1,000 records that you can retrieve with each API call, but this one has a limit of 100,000, so we can specify that with limit=100000 in the request parameters.
I'm not going to go into pagination in this post, but there is an offset API parameter that you can use to paginate through the results.
endpoint = 'https://api.stlouisfed.org/fred/series/observations'
series_id = 'EQFXSUBPRIME036061'
params = {
'series_id': series_id,
'api_key': api_key,
'file_type': 'json',
'limit': 100000
}
response = requests.get(endpoint,params=params)
And then process the results.
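As a sketch of what processing might look like, the helper below turns the parsed response into (date, value) pairs. I'm assuming the rows sit under an observations key, that each row has date and value fields, that values arrive as strings, and that a lone '.' marks a missing value. Verify all of that against a real response before relying on it.

```python
from datetime import date


def parse_observations(payload):
    """Turn a parsed /fred/series/observations response into (date, value) pairs."""
    rows = []
    for obs in payload.get('observations', []):
        raw = obs['value']
        # Assumption: values are strings, and '.' stands for a missing value.
        value = None if raw == '.' else float(raw)
        rows.append((date.fromisoformat(obs['date']), value))
    return rows


# Usage with the response from the request above:
# rows = parse_observations(response.json())
```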
Real-time periods
You might want to specify a real-time period in your API requests.
Economic indicators can be revised over time, and a real-time period is the span of time during which a particular value was the known value.
Unemployment rates
One example is unemployment rates, where an initial unemployment rate for a given date or period is calculated, but then more data comes in later from people who were unable to successfully file for unemployment during that period.
The unemployment rate is later revised for the period when there is new data.
During the pandemic a lot of people had trouble filing for unemployment and were delayed when various unemployment systems went down.
Those people might not be counted in the initial unemployment numbers for the period when they became unemployed.
Specify a real-time period in FRED API requests
You can specify a real-time start and a real-time end as request parameters; both default to the current date.
To get all of the observation values, you need to set a real-time start and/or end explicitly.
- To get all observation values from the first available, specify a real-time start of 1776-07-04.
- A real-time end of 9999-12-31 will get you all observations up until the most recent available.
- Read more about real-time periods in the docs.
params = {
'series_id': series_id,
'realtime_start': '1776-07-04',
'api_key': api_key,
'file_type': 'json',
'limit': 100000
}
If you just specify a real-time start date, then it will provide the observations available from that start date up until the current date.
Alternatively you could specify the real-time end date and it would provide observations available from the first available to your specified real-time end.
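With a wide real-time window, I'd expect the same observation date to come back more than once, one row per revision. The sketch below groups rows by date so you can see how a value changed. It assumes each observation row carries realtime_start and realtime_end fields next to date and value, and group_revisions is a hypothetical helper name.

```python
from collections import defaultdict


def group_revisions(observations):
    """Group observation rows by date, ordered by when each value became known."""
    by_date = defaultdict(list)
    for obs in observations:
        by_date[obs['date']].append(
            (obs['realtime_start'], obs['realtime_end'], obs['value'])
        )
    # ISO-formatted date strings sort chronologically, so a plain sort
    # orders each date's revisions by realtime_start.
    return {obs_date: sorted(rows) for obs_date, rows in by_date.items()}
```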
FRED vs ALFRED
If you use the default of today's date, you will get the most accurate information about the past that is known today, which is FRED.
Specifying a real-time period in the past will get you the data values that were known during that time period, which is considered to be ALFRED or Archival FRED. Read more here.
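To make the difference concrete, here is a small sketch that builds the request parameters for each case. The observation_params helper is hypothetical; the parameter names and the two boundary dates come from the section above.

```python
def observation_params(series_id, api_key, realtime_start=None, realtime_end=None):
    """Build request parameters, adding real-time bounds only when given."""
    params = {
        'series_id': series_id,
        'api_key': api_key,
        'file_type': 'json',
        'limit': 100000,
    }
    # Leaving a bound out lets the API fall back to its default (today's date).
    if realtime_start is not None:
        params['realtime_start'] = realtime_start
    if realtime_end is not None:
        params['realtime_end'] = realtime_end
    return params


# FRED: the values as known today (defaults apply).
latest = observation_params('EQFXSUBPRIME036061', 'xxxxxxx')

# ALFRED: the full window, from the first available to the most recent.
archival = observation_params(
    'EQFXSUBPRIME036061', 'xxxxxxx',
    realtime_start='1776-07-04', realtime_end='9999-12-31',
)
```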
Categories
One priority I had was to minimize the number of API calls to get all of this data.
To get all categories for each of the time series, I could have used the endpoint to get categories for a series ID, but that would require 766,000 API calls, one for each individual series ID.
Instead, I fetched all of the category IDs and then got the series in each category.
The categories in FRED are organized as a tree.
If you look at the API docs, there is an endpoint for getting child categories for each parent category.
Traverse the category tree
The root category has an ID of 0 (zero).
endpoint = 'https://api.stlouisfed.org/fred/category/children'
category_id = 0 #root category ID
params = {
'category_id': category_id,
'api_key': api_key,
'file_type': 'json',
}
response = requests.get(endpoint,params=params)
So you can traverse the tree by starting with the root category ID and getting the children, then getting their children, and so on.
endpoint = 'https://api.stlouisfed.org/fred/category/children'
params = {
'category_id': category_id,
'api_key': api_key,
'file_type': 'json'
}
response = requests.get(endpoint,params=params)
Process the results and you will have all of the category IDs.
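The traversal can be sketched as a breadth-first walk from the root. As before, fetch is a hypothetical callable that takes an endpoint and params and returns parsed JSON, and I'm assuming the children come back under a categories key with an id field on each entry.

```python
from collections import deque

CHILDREN_ENDPOINT = 'https://api.stlouisfed.org/fred/category/children'


def collect_category_ids(fetch, api_key, root_id=0):
    """Walk the category tree breadth-first and return every category ID seen."""
    seen = {root_id}
    queue = deque([root_id])
    while queue:
        category_id = queue.popleft()
        params = {
            'category_id': category_id,
            'api_key': api_key,
            'file_type': 'json',
        }
        payload = fetch(CHILDREN_ENDPOINT, params)
        # Assumption: child categories are listed under the "categories" key.
        for child in payload.get('categories', []):
            if child['id'] not in seen:
                seen.add(child['id'])
                queue.append(child['id'])
    return seen
```

Note that this makes one API call per category, and the returned set includes the root ID itself.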
Get all series for each category
Now you can iterate through the category IDs and get the series for each category ID.
endpoint = 'https://api.stlouisfed.org/fred/category/series'
params = {
'category_id': category_id,
'api_key': api_key,
'file_type': 'json'
}
response = requests.get(endpoint,params=params)
And now you have all of the categories for each series.
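What the requests give you is a list of series per category, so one more step flips that into categories per series. A minimal sketch, where invert_mapping is a hypothetical helper and the input is a dict you would build from the responses:

```python
from collections import defaultdict


def invert_mapping(series_by_group):
    """Flip {group: [series IDs]} into {series ID: {groups}}."""
    groups_by_series = defaultdict(set)
    for group, series_ids in series_by_group.items():
        for series_id in series_ids:
            groups_by_series[series_id].add(group)
    return dict(groups_by_series)


# Usage: series_by_category maps each category ID to the series IDs
# returned for it by the request above.
# categories_by_series = invert_mapping(series_by_category)
```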
There were roughly 5,000 to 6,000 categories in total.
Tags
Tags were much more straightforward!
There is an endpoint to get all tags.
endpoint = 'https://api.stlouisfed.org/fred/tags'
params = {
'api_key': api_key,
'file_type': 'json'
}
response = requests.get(endpoint,params=params)
From this you end up with a list of all of the tag names.
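Pulling the names out of the response might look like this. It's a sketch that assumes the tags are listed under a tags key, each with a name field; extract_tag_names is an illustrative helper name.

```python
def extract_tag_names(payload):
    """Return the tag names found in a parsed /fred/tags response."""
    # Assumption: tags are listed under the "tags" key, each with a "name".
    return [tag['name'] for tag in payload.get('tags', [])]


# Usage with the response from the request above:
# tag_names = extract_tag_names(response.json())
```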
Tags are identified by name rather than by an ID number.
Get all series for each tag
Now you can iterate through the tag names to get the series for each tag name.
endpoint = 'https://api.stlouisfed.org/fred/tags/series'
params = {
'tag_names': tag_name,
'api_key': api_key,
'file_type': 'json'
}
response = requests.get(endpoint,params=params)
And then you have all of the tags for each series.
Thanks for reading!
If you have any questions, reach out to me on Twitter @LVNGD or in the comments.