Deploying a Flask app on AWS Lambda with Zappa


I recently needed to deploy a Flask application on AWS Lambda, which is a "serverless" web architecture.

Serverless here is in contrast to setting up a regular HTTP web server on something like Amazon's EC2.

For those unfamiliar: AWS = Amazon Web Services.

Why serverless?

One big reason for going serverless can be cost.

There are still servers involved in a serverless infrastructure such as AWS Lambda, but you only have to pay for the time you use.

Using a regular web server like EC2 usually means setting it up to run 24/7, just waiting for HTTP requests to come in.

But your AWS Lambda function will only run when a request comes in, and you won't pay for all of the other time.

Flask + AWS Lambda + API Gateway

I used Zappa, which makes deployment a breeze!

Zappa is a Python package that packages up your application and your local virtual environment, and deploys it to AWS Lambda. It does a lot more than just that - you can read more in the docs.

After initializing Zappa and configuring a few things in a zappa_settings.json file, deployment could be as simple as:

zappa deploy dev
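For reference, here is a rough sketch of what a minimal zappa_settings.json could contain, written as a small Python script that renders the JSON. The values are placeholders made up for illustration (the app.app module path, the project name and the bucket name are assumptions, not the settings from my project), and zappa init normally generates this file for you.

```python
import json

# Placeholder values for illustration only; `zappa init` normally writes this file.
settings = {
    "dev": {  # the stage name used in `zappa deploy dev`
        "app_function": "app.app",            # assumed module path to the Flask app object
        "aws_region": "us-east-1",
        "project_name": "my-flask-app",       # hypothetical project name
        "runtime": "python3.8",
        "s3_bucket": "my-zappa-deployments",  # hypothetical bucket used for uploads
    }
}


def render_settings(settings: dict) -> str:
    """Serialize the settings the way they would appear in zappa_settings.json."""
    return json.dumps(settings, indent=4, sort_keys=True)


print(render_settings(settings))
```

Each top-level key is a stage, so a second stage such as production would be another entry next to dev.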

So why am I writing this post if it was a breeze?

Configuring AWS Lambda + API Gateway

Well, because first you need to configure several things in AWS, and that can be a major headache if you're new to AWS, which I basically was.

Setting all of this up involved several AWS components that needed to be able to work together in harmony.

  • Lambda
  • API Gateway
  • Relational Database Service, or RDS (I'm using Postgres)
  • S3
  • Virtual Private Cloud, or VPC

API Gateway

API Gateway lets you create an HTTP endpoint for your Lambda function, so that HTTP requests get routed to Lambda.

Zappa sets up most of this automatically.

For example, with the above command zappa deploy dev, the API Gateway stage is named dev, and that stage name becomes the last part of the deployed URL.
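To make that concrete, here is a small sketch of how the default API Gateway invoke URL is put together from the stage name. The API id below is a made-up placeholder; the real one is printed by Zappa when the deployment finishes.

```python
def stage_url(api_id: str, region: str, stage: str) -> str:
    """Build the default invoke URL that API Gateway assigns to a stage."""
    return f"https://{api_id}.execute-api.{region}.amazonaws.com/{stage}"


# "abc123defg" is a hypothetical API id used only for illustration.
print(stage_url("abc123defg", "us-east-1", "dev"))
# prints: https://abc123defg.execute-api.us-east-1.amazonaws.com/dev
```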

What this post is NOT

In this post I'm not going to write out every step I did, because I used a couple of great blog posts to get through parts of the process.

The first thing you'll obviously need to do is add Zappa to your project, and the docs are pretty straightforward for getting started with that.

Also there will be a Part II to this post, because I'm not going to get into setting up domains on AWS Route 53 here.

  • I needed to set up the Flask app to run on a subdomain of a Squarespace site, and I will go through the specifics of what I had to do for that in Part II, which includes transferring all of the DNS records for that site to Route 53.

Stay tuned for that!

Back to AWS Lambda

This post will mainly focus on getting everything configured in AWS to deploy your application.

The first post that got me most of the way to a functioning deployment is from this blog:

The post goes through deployment of a Hello, World! Flask app on AWS Lambda and covers the basics that will get you up and running:

  1. Creating AWS access keys.
  2. Configuring AWS users, groups and roles with permissions needed for Zappa to work.
  3. Initializing and configuring Zappa.
  4. Deploying the app.

But this basic app doesn't use a database or anything.

So I used the next post to set up AWS RDS:

I'm using Postgres, and while the above post is using MySQL, it's pretty similar to set up either one through the AWS CLI with the following command:

aws rds create-db-instance \
  --db-instance-identifier YOUR_DATABASE_INSTANCE_IDENTIFIER \
  --db-instance-class db.t2.micro \
  --engine postgres \
  --allocated-storage 5 \
  --no-publicly-accessible \
  --db-name YOUR_DATABASE_NAME \
  --master-username YOUR_DATABASE_USERNAME \
  --master-user-password YOUR_DATABASE_PASSWORD \
  --backup-retention-period 3

After running that command, you will see some output with configuration information for the database, including:

  • An endpoint, which you will need later.
  • The port, which for Postgres will likely be 5432.

The endpoint will look something like this:

zappadatabase.yvsrfvqeusf5.us-east-1.rds.amazonaws.com
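If you would rather not dig through the output by hand, the same information can be read from the JSON that aws rds describe-db-instances returns. The sketch below is a hypothetical helper (extract_endpoint is my own name, not part of any AWS tool) that pulls the address and port out of an already-parsed response. The sample response is trimmed down to just the keys it reads.

```python
from typing import Tuple


def extract_endpoint(response: dict, identifier: str) -> Tuple[str, int]:
    """Return (address, port) for one instance in a describe-db-instances response."""
    for instance in response.get("DBInstances", []):
        if instance.get("DBInstanceIdentifier") != identifier:
            continue
        endpoint = instance.get("Endpoint")
        if not endpoint or "Address" not in endpoint:
            # Assumption: the endpoint can be absent, e.g. while the instance is still being created.
            raise LookupError(f"{identifier} has no endpoint yet")
        return endpoint["Address"], int(endpoint["Port"])
    raise LookupError(f"no instance named {identifier}")


# Trimmed-down sample shaped like the real response; only the keys used above are shown.
sample_response = {
    "DBInstances": [
        {
            "DBInstanceIdentifier": "zappadatabase",
            "Endpoint": {
                "Address": "zappadatabase.yvsrfvqeusf5.us-east-1.rds.amazonaws.com",
                "Port": 5432,
            },
        }
    ]
}

host, port = extract_endpoint(sample_response, "zappadatabase")
print(host, port)
```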

Environment variables

As in the above blog post, I also used python-dotenv for environment variables, and stored them in .env.

DB_NAME=<your_db_name>
DB_USER=<your_db_username>
DB_PASS=<your_db_password>
DB_HOST=<aws_endpoint_from_earlier>
DB_PORT=<database_port_from_earlier_likely_5432_if_postgres>
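A missing or misspelled variable in .env tends to surface later as a confusing connection error, so it can help to check for all five up front. This is a minimal sketch of that idea, not something from my project: read_db_settings is a hypothetical helper, and it accepts any mapping so it can be tried without touching the real environment.

```python
import os
from typing import Dict, Mapping

REQUIRED_VARS = ("DB_NAME", "DB_USER", "DB_PASS", "DB_HOST", "DB_PORT")


def read_db_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the database settings, failing early if any variable is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


# Example with made-up values instead of the real environment.
example_env = {
    "DB_NAME": "appdb",
    "DB_USER": "appuser",
    "DB_PASS": "secret",
    "DB_HOST": "db.example.com",
    "DB_PORT": "5432",
}
print(read_db_settings(example_env)["DB_HOST"])
```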

Side notes

  1. This Flask app is structured using the Application Factory pattern.
  2. I used SQLAlchemy for the database ORM.

In config.py I loaded the environment like this:

import os
from dotenv import load_dotenv

BASEDIR = os.path.abspath(os.path.dirname(__file__))
load_dotenv(os.path.join(BASEDIR, '.env'))

Then, in __init__.py I could access the database variables like this:

import os

DB_HOST = os.environ.get("DB_HOST")
DB_PORT = int(os.environ.get("DB_PORT", 5432))  # fall back to the default Postgres port
DB_USER = os.environ.get("DB_USER")
DB_PASS = os.environ.get("DB_PASS")
DB_NAME = os.environ.get("DB_NAME")

These can now be used to initialize the Postgres database URI:

SQLALCHEMY_DATABASE_URI = 'postgresql://{}:{}@{}:{}/{}'.format(
    os.environ.get('DB_USER'),
    os.environ.get('DB_PASS'),
    os.environ.get('DB_HOST'),
    os.environ.get('DB_PORT'),
    os.environ.get('DB_NAME'),
)
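One thing to watch out for with this kind of string formatting: if the password contains characters like @ or /, the resulting URI can be parsed incorrectly. A possible variation, sketched here with a hypothetical build_postgres_uri helper and made-up credentials, is to percent-encode the username and password first.

```python
from urllib.parse import quote_plus


def build_postgres_uri(user: str, password: str, host: str, port, name: str) -> str:
    """Build a Postgres URI, percent-encoding the credentials so special characters are safe."""
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote_plus(user),
        quote_plus(password),
        host,
        int(port),  # accepts "5432" from the environment as well as 5432
        name,
    )


# Made-up credentials for illustration.
print(build_postgres_uri("zappa", "p@ss/word", "db.example.com", "5432", "appdb"))
# prints: postgresql://zappa:p%40ss%2Fword@db.example.com:5432/appdb
```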

The database tables are created automatically in __init__.py and it goes something like this:

from flask import Flask
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()

app = Flask(__name__)
app.config.from_object('config.Config')
db.init_app(app)
# create_all() needs an active application context
with app.app_context():
    db.create_all()

Remotely connecting to RDS Postgres with psql

I also wanted to be able to remotely connect to the database with psql, which required adding another VPC security group.

The blog post I linked above goes through configuring the VPC so that your Lambda function can access the database.

You just need to add another security group so that you can remotely access the database with psql.
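As a rough sketch of what that extra rule looks like, the helper below builds an inbound rule that allows the Postgres port from a single IPv4 address. psql_ingress_rule is a hypothetical name, the IP is a documentation placeholder, and this assumes the database is otherwise reachable from your network. The dictionary follows the IpPermissions shape that boto3's authorize_security_group_ingress expects, but the sketch only builds the rule and does not call AWS.

```python
import ipaddress


def psql_ingress_rule(my_ip: str, port: int = 5432) -> dict:
    """Build an inbound rule allowing TCP traffic on the database port from one IPv4 address."""
    ipaddress.IPv4Address(my_ip)  # raises ValueError if this is not a valid IPv4 address
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": f"{my_ip}/32", "Description": "psql from my machine"}],
    }


rule = psql_ingress_rule("203.0.113.10")  # placeholder address
print(rule)

# With boto3 installed and credentials configured, the rule could be applied like this
# (the security group id is a placeholder):
#   ec2 = boto3.client("ec2")
#   ec2.authorize_security_group_ingress(GroupId="sg-...", IpPermissions=[rule])
```

Using a /32 keeps the rule limited to one address instead of opening the port to everyone.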

Large Projects: slim_handler=True

Everything went mostly smoothly for me, until I needed to add the slim_handler option to my Zappa config.

As mentioned earlier, Zappa packages up your project to upload to AWS, and size matters here!

  • If your project package size is less than 50MB, it will be uploaded directly to AWS Lambda as a zip file.
  • However, if the package is greater than 50MB, you will need to set slim_handler to true in your zappa_settings.json, and it will be uploaded to S3 instead and pulled from there at run time.

So basically, if your application package is small enough, it gets uploaded directly to Lambda and runs directly from there when HTTP requests come in.

But larger packages are stored in S3, so the Lambda function has to download the package from the S3 bucket at run time.
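The size rule above boils down to a single comparison. Here is a minimal sketch of it, assuming you already have the packaged zip on disk. The function names are my own, and Zappa does not expose a check like this as far as I know.

```python
import os

# Direct-upload limit for a zipped package, per the 50MB figure above.
DIRECT_UPLOAD_LIMIT_BYTES = 50 * 1024 * 1024


def exceeds_direct_upload_limit(size_bytes: int, limit: int = DIRECT_UPLOAD_LIMIT_BYTES) -> bool:
    """True if a package of this size is too large to upload straight to Lambda."""
    return size_bytes > limit


def needs_slim_handler(package_path: str) -> bool:
    """Check a zip file on disk against the direct-upload limit."""
    return exceeds_direct_upload_limit(os.path.getsize(package_path))


print(exceeds_direct_upload_limit(10 * 1024 * 1024))  # a 10 MB package
print(exceeds_direct_upload_limit(60 * 1024 * 1024))  # a 60 MB package
```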

When I added slim_handler to my config, every request came back with a 504 timeout error, because the Lambda function could not reach the S3 bucket.

None of the logs from my Flask code were showing up, because the function couldn't even download the application from the S3 bucket to run it.

I'm far too embarrassed to estimate how long I was pulling my hair out trying to figure out the problem...

After some searching, I finally found another blog post that briefly mentioned setting up a VPC endpoint for S3. A Lambda function running inside a VPC has no internet access by default, and the endpoint gives it a route to S3 without needing one.

So that was another thing to configure.

I just needed to set up a VPC endpoint for S3, and that solved the problem.

To do that, go to the VPC dashboard in AWS and follow the docs.

The endpoint is type Gateway.
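I set mine up in the console, but for anyone scripting it, the request comes down to a handful of parameters. The sketch below only builds them. The helper name and the VPC and route table ids are placeholders I made up. The keys match what boto3's create_vpc_endpoint accepts.

```python
from typing import Iterable


def s3_gateway_endpoint_params(vpc_id: str, region: str, route_table_ids: Iterable[str]) -> dict:
    """Build the parameters for a Gateway-type VPC endpoint to S3."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        # Gateway endpoints work by adding routes to these route tables.
        "RouteTableIds": list(route_table_ids),
    }


# Placeholder ids for illustration.
params = s3_gateway_endpoint_params("vpc-0123456789abcdef0", "us-east-1", ["rtb-0123456789abcdef0"])
print(params)

# With boto3 installed and credentials configured:
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   ec2.create_vpc_endpoint(**params)
```

The route tables should be the ones associated with the subnets your Lambda function runs in, otherwise the function still won't have a path to S3.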

This is actually the main reason I wrote this post, because it was a small thing that caused endless amounts of frustration.

Pandas

After that, I could at least see my logs in zappa tail, and now the errors had to do with not being able to install NumPy and Pandas.


Can anyone else relate to how exhilarating it is when you finally at least get a different error message after hours of banging your head against the wall?

Programming involves a total roller coaster of emotions sometimes...


Anyway, this one was pretty easy to solve, thankfully.

There are package size limits for uploading code directly to AWS Lambda (50MB) versus going through an S3 bucket, and the default limit for /tmp directory storage is 512 MB, so by default your project can't be larger than that.

You can have up to 5 Lambda layers per function, so a separate layer for Pandas can be a good choice.

There are a lot of pre-built lambda layers out there, so I grabbed one for Pandas with Python 3.8 here.

  • Make sure that the layer you pick has the same AWS region as yours. My region is us-east-1.

Then I just added the layer's ARN to my Zappa config:

    "dev": {
        ...
        "aws_region": "us-east-1",
        "slim_handler": true,
        "layers": ["arn:aws:lambda:us-east-1:770693421928:layer:Klayers-python38-pandas:33"]
    }
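Since the layer has to live in the same region as the function, a quick sanity check is to read the region straight out of the ARN before deploying. A minimal sketch, with layer_region being a hypothetical helper of my own:

```python
def layer_region(layer_arn: str) -> str:
    """Extract the region from a Lambda layer version ARN."""
    parts = layer_arn.split(":")
    # Expected shape: arn:<partition>:lambda:<region>:<account>:layer:<name>:<version>
    if len(parts) != 8 or parts[0] != "arn" or parts[2] != "lambda" or parts[5] != "layer":
        raise ValueError(f"not a Lambda layer version ARN: {layer_arn}")
    return parts[3]


arn = "arn:aws:lambda:us-east-1:770693421928:layer:Klayers-python38-pandas:33"
aws_region = "us-east-1"  # the aws_region value from the Zappa config

print(layer_region(arn) == aws_region)
# prints: True
```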

Then updated the deployment...and it worked!

Thanks for reading!

I hope this was helpful to someone.

As I mentioned, the other thing I had to figure out was how to run the Flask app on a subdomain of an existing Squarespace site.

I will do a separate write-up for that, so stay tuned!

If you have questions or comments, or any tips for things I could have done better here, please leave them below, or reach out on Twitter @LVNGD.
