It seems impossible at this point for anyone not to have heard about or been impacted by the global coronavirus pandemic.
Even Jared Leto is up to speed.
Everyone is trying to make sense of things, especially the medical community on the front lines of it all.
So much new research is coming out about COVID-19 that it is hard - even impossible - to sift through all of it in a meaningful way.
The White House and several research groups - listed here - have released the COVID-19 Open Research Dataset (CORD-19), which is a massive dataset of scholarly papers related to COVID-19 and other coronaviruses.
The COVID-19 Open Research Dataset Challenge has been launched on Kaggle as well.
I'm going to go over the Kaggle challenge a bit, and also mention some other hackathons that are happening for COVID-19-related projects.
CORD-19 is a corpus of over 29,000 scholarly articles, including over 13,000 with full text, about COVID-19, SARS-CoV-2 (the virus that causes it), and related coronaviruses.
The dataset will be updated weekly with new research as it is published.
The goal is for researchers to apply Natural Language Processing (NLP) techniques to develop tools for text and data mining.
The medical community needs tools that help it use all of this information to answer important scientific questions, apply the answers in the fight against the virus, and learn more about the pandemic in general.
The dataset contains all COVID-19 and coronavirus-related research (e.g. SARS and MERS) from the following sources:
This comes from Semantic Scholar's website for the dataset, where you can also download the data.
As mentioned, Kaggle is hosting a competition for this, and they have a bunch of questions that anyone can get started with.
According to the Kaggle website, they are sponsoring an award of $1,000 per task, which goes to the winner whose submission is identified as best meeting the evaluation criteria.
Right now Kaggle lists the following tasks for this challenge - you can find them here.
This is the initial list, but there will probably be more added, so keep checking back.
There are a lot of tools for working with CORD-19 data, as well as other data feeds related to COVID-19.
This is far from an exhaustive list, just some of the ones I've seen mentioned on Reddit or on the website for the CORD-19 dataset.
Along with the Kaggle competition, other hackathons related to the COVID-19 pandemic have been popping up.
Many of us have some extra time on our hands since we're supposed to be staying indoors.
Even if you're not familiar with data science or natural language processing, you could dip your toes in with this and play around with the data.
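If you want somewhere to start, here is a minimal sketch of one basic text mining idea: ranking papers against a keyword query with TF-IDF weights. It uses only the Python standard library. The `papers` records, their `title` and `abstract` fields, and the helper names `tokenize`, `build_index` and `search` are all my own assumptions for illustration. The abstracts are made-up placeholder text, not real CORD-19 entries, so you would swap in records loaded from the dataset itself.

```python
import math
import re
from collections import Counter

# Placeholder records, made up for illustration; NOT real CORD-19 entries.
papers = [
    {"title": "Paper A",
     "abstract": "Incubation period estimates for a novel coronavirus outbreak."},
    {"title": "Paper B",
     "abstract": "Transmission dynamics and incubation in household contacts."},
    {"title": "Paper C",
     "abstract": "Vaccine candidates and antiviral drug screening results."},
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Return per-document term counts and a smoothed IDF weight per term."""
    counts = [Counter(tokenize(d["abstract"])) for d in docs]
    # Number of documents each term appears in.
    doc_freq = Counter(term for c in counts for term in c)
    n = len(docs)
    idf = {term: math.log((1 + n) / (1 + df)) + 1
           for term, df in doc_freq.items()}
    return counts, idf


def search(query, docs, counts, idf):
    """Rank documents by the summed TF-IDF weight of the query terms."""
    terms = tokenize(query)
    scored = []
    for doc, c in zip(docs, counts):
        total = sum(c.values()) or 1  # avoid dividing by zero on empty abstracts
        score = sum((c[t] / total) * idf.get(t, 0.0) for t in terms)
        if score > 0:
            scored.append((score, doc["title"]))
    # Highest score first; documents with no matching terms are left out.
    return [title for score, title in sorted(scored, reverse=True)]


counts, idf = build_index(papers)
print(search("incubation period", papers, counts, idf))
```

This is nowhere near a competition entry, but it shows the general shape: tokenize the text, weight terms by how rare they are across the corpus, and score each paper against a question. Libraries like scikit-learn or spaCy would be the natural next step once you outgrow the toy version.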
Any questions or comments, write them below or reach out on Twitter @LVNGD.