Posts about open-data

Using ogr2ogr to convert Shapefiles to GeoJSON

In this post we will use ogr2ogr, a command-line tool from GDAL, to convert a shapefile of NYC zip code boundaries to GeoJSON and to reproject its coordinates to latitude and longitude, all in one line of code.
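
As a rough sketch of what such a one-line conversion can look like, the snippet below assembles an ogr2ogr invocation in Python and prints it. The file names are hypothetical, and targeting EPSG:4326 (WGS 84 latitude and longitude) is an assumption; the post itself may use different options.

```python
import shlex


def build_ogr2ogr_command(src_shapefile, dst_geojson, target_srs="EPSG:4326"):
    """Return the argument list for converting a shapefile to GeoJSON.

    ogr2ogr expects the destination before the source. -f selects the
    output driver and -t_srs reprojects to the given coordinate system.
    """
    return [
        "ogr2ogr",
        "-f", "GeoJSON",
        "-t_srs", target_srs,
        dst_geojson,
        src_shapefile,
    ]


# Hypothetical file names, for illustration only.
command = build_ogr2ogr_command("zip_codes.shp", "zip_codes.geojson")
print(shlex.join(command))
```

With GDAL installed, the same list could be passed to `subprocess.run(command, check=True)` to actually run the conversion.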

Finding the nearest NYC subway station with a Voronoi map

A Voronoi diagram divides a space into regions, one per input point, so that every location in a region is closer to that region's point than to any other. In this post we will generate a Voronoi diagram from a map of NYC subway station locations, which can be used to find the closest subway station to any location.
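
To illustrate the idea, here is a minimal sketch of the nearest-site lookup that a Voronoi diagram encodes: the cell containing a query point belongs to whichever site is closest to it. This is a brute-force stand-in, not the diagram construction from the post, and the station names and coordinates are made up.

```python
import math


def nearest_site(point, sites):
    """Return the name of the site closest to point.

    The Voronoi cell of a site is exactly the set of locations for
    which this function would return that site.
    """
    px, py = point
    return min(
        sites,
        key=lambda name: math.hypot(sites[name][0] - px, sites[name][1] - py),
    )


# Made-up planar coordinates, not real station locations.
stations = {
    "Station A": (0.0, 0.0),
    "Station B": (10.0, 0.0),
    "Station C": (0.0, 10.0),
}

print(nearest_site((2.0, 1.0), stations))
print(nearest_site((9.0, 3.0), stations))
```

The sketch uses planar Euclidean distance, so real latitude and longitude values would first need to be projected or compared with a geodesic distance. A precomputed Voronoi diagram replaces the per-query scan over all stations with a lookup of the region that contains the point.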

Overview of the COVID-19 Open Research Dataset (CORD-19) + Kaggle Challenge

This post gives an overview of the COVID-19 Open Research Dataset (CORD-19), a corpus of research papers related to the coronavirus pandemic, and of the Kaggle challenge to develop tools that process those papers using natural language processing techniques.

Feature Engineering with Python + Pandas: An Introduction

Feature engineering is the process of turning raw data into features that can be used as inputs for training machine learning algorithms, and it is an important skill in data science. In this post we will look at 311 noise complaints data.

Data cleaning with Python + Pandas: An Introduction

Cleaning up dirty, corrupted data with Python and Pandas. Dirty, corrupted data leads to dirty and corrupt analysis and conclusions. Who wants that? In this post we will go through a cleaning checklist with Pandas and a dataset from NYC Open Data.

Point in Polygon search with GeoDjango

Determining if a point lies in a polygon is a pretty common task in computational geometry. In this post we will use it to answer questions like 'which NYC neighborhood is this apartment building in?' using GeoDjango and data from NYC Open Data.
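
For a feel of the underlying geometry, the following is a minimal ray-casting sketch in plain Python. It is not GeoDjango's API, and the square "neighborhood" and the test points are made-up values rather than NYC Open Data boundaries.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: count edge crossings of a ray going right from point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Made-up square "neighborhood", not real boundary data.
neighborhood = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]

print(point_in_polygon((2.0, 2.0), neighborhood))
print(point_in_polygon((5.0, 2.0), neighborhood))
```

An odd number of crossings means the point is inside. Points that fall exactly on an edge or vertex are not handled reliably by this sketch, which is one reason to lean on a geometry library for real boundary data.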
