We have really enjoyed working with the data.world team, and we are excited to announce a new set of integrations between Algorithmia and data.world! If you aren’t familiar with Algorithmia, we are a community of over 50,000 data scientists and developers who have published more than 3,500 ready-to-use algorithms. Each one is callable through a REST API from many different languages, including Python and R. You can also write your own algorithms, for personal use or for use within your organization. We take care of all the infrastructure plumbing needed to make your algorithms immediately ready for production on CPU- or GPU-based compute clusters.
I can’t think of a better pairing for the data.world datasets than the many algorithms available in the Algorithmia.com directory!
It’s very easy to consume and publish new datasets via Algorithmia. The data.world team has published four helper utility algorithms that you can use in your own algorithms. Because you can easily compose and chain algorithms, piping the output of one into the next, there are many possibilities for processing the datasets available from data.world. You can find all of their new algorithms on their organization page on Algorithmia.com.
First-Time Configuration
Before you start incorporating datasets or publishing to your own datasets, you must configure your Algorithmia account to store your data.world credentials. The easiest way is the “data.world configure” helper algorithm, which stores your credentials in a known location in your Hosted Data storage on Algorithmia.com. Call it once, and you’re all ready to go!
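As a rough sketch, the one-time setup call could look something like the snippet below. Two things here are assumptions rather than documented facts: the algorithm path "datadotworld/configure" (guessed by analogy with "datadotworld/query") and the "auth_token" input field. Check the helper's page on Algorithmia.com for the exact name and input format before relying on this.

```python
def configure_dataworld(client, api_token):
    """Call the data.world configure helper once to store credentials.

    `client` is an Algorithmia client, or any object that offers
    client.algo(name).pipe(payload).result.
    """
    if not api_token:
        raise ValueError("a data.world API token is required")

    # ASSUMPTION: "auth_token" is a hypothetical field name, not a
    # confirmed part of the helper's input format.
    payload = {"auth_token": api_token}

    # ASSUMPTION: the algorithm path mirrors "datadotworld/query".
    return client.algo("datadotworld/configure").pipe(payload).result
```

From your own machine you would create the client with `Algorithmia.client("YOUR_ALGORITHMIA_API_KEY")` and pass it in along with your data.world token. Taking the client as an argument also makes the function easy to try out with a stand-in object.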
Using Open Datasets in Your Algorithm Pipeline
Once you have your credentials configured, you can simply call the “data.world query” helper algorithm to pull data from data.world directly into any algorithm you create.
Let’s put together a quick NBA Annual Team Attendance History analysis function in Python that takes advantage of one of the most popular Time Series algorithms available on Algorithmia.com. There’s an awesome dataset of attendance metrics on data.world from Gabe Salzer, and we can home in on the Lakers.
import Algorithmia


def apply(unused_input):
    # Inside an algorithm running on Algorithmia, client() needs no API key.
    client = Algorithmia.client()

    # Describe which dataset to read and the SQL query to run against it.
    query_input = {
        "dataset_key": "gmoney/nba-team-annual-attendance",
        "query": "SELECT home_total_attendance FROM `nba_team_annual_attendance` WHERE team='Lakers'",
        "query_type": "sql",
        "parameters": []
    }

    # Load the dataset through the data.world query helper.
    algo = client.algo("datadotworld/query")
    dataset = algo.pipe(query_input).result["data"]

    # Pull out the attendance column and summarize it as a time series.
    all_values = [d["home_total_attendance"] for d in dataset]
    metrics = client.algo("TimeSeries/TimeSeriesSummary").pipe({"uniformData": all_values}).result
    return metrics
The “TimeSeriesSummary” algorithm then returns a summary of Time Series metrics for the annual attendance of the Lakers:
{
  "correlation": 0.33052435441847194,
  "geometricMean": 766172.0799467923,
  "intercept": 747591.4338235295,
  "kurtosis": 15.675479207885754,
  "max": 778877,
  "mean": 767146.0625000001,
  "min": 626901,
  "populationVariance": 1322295814.9335918,
  "rmse": 34319.671109063434,
  "skewness": -3.9439454970253087,
  "slope": 2607.283823529394,
  "standardDeviation": 37555.94319495252,
  "var": 1410448869.262498
}
Very nice!
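If you are curious what a few of these fields mean, here is a minimal local sketch that computes some of them with Python’s standard library. It is not the TimeSeriesSummary implementation, just the textbook formulas. It assumes the observations are evenly spaced and indexed 0, 1, 2, and so on, which is consistent with the slope, intercept, and mean reported above but is not something the algorithm’s documentation confirms here. Note that in the output above, "standardDeviation" is the square root of "var" (the sample variance), not of "populationVariance".

```python
import statistics


def summarize(values):
    """Compute a handful of summary metrics for evenly spaced data."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")

    # ASSUMPTION: observation i was taken at time step i (0, 1, 2, ...).
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = statistics.fmean(values)

    # Ordinary least-squares fit of value against time step.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    return {
        "min": min(values),
        "max": max(values),
        "mean": mean_y,
        "var": statistics.variance(values),                 # divides by n - 1
        "populationVariance": statistics.pvariance(values),  # divides by n
        "standardDeviation": statistics.stdev(values),       # sqrt of "var"
        "slope": slope,
        "intercept": intercept,
    }
```

Calling `summarize([1, 3, 5, 7])` on this made-up series gives a slope of 2 and an intercept of 1, since each step adds exactly 2 to a starting value of 1.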
There is also a really awesome interactive Time Series algorithms demo if you want to see how they work with other types of datasets.
Time Series algorithms are useful for many datasets, but for some of the more popular datasets on data.world we’ve noticed other great options, such as Text Analysis and NLP algorithms, including Word2Vec and many others. Check out the many different demos on Algorithmia.com to get a sense of the breadth of algorithms available to you!
Getting Started at Algorithmia
I’m impressed by the number of datasets already available in the data.world directory. We are excited to see how developers will pair the algorithms and microservices available from Algorithmia with the wealth of datasets at data.world! To show how much we appreciate members of the data.world community, we have a special promo code that gets you 100,000 additional credits after signing up for Algorithmia: DATADOTWORLD
We’re looking forward to seeing how you make use of the new data.world functions!