Hey data people, how many times has this happened to you? You have a hunch about something impacting your business. Maybe there’s a region with declining sales, or maybe you think there’s some nefarious activity on your website that’s skewing numbers. So you email your DB admin, analyst, data engineer, or data scientist to pull some numbers for you. You send an email detailing your hypothesis and some of the data you need to pursue it:
From: datahoncho@superbiz.com
To: analystpeep@superbiz.com
Subject: Declining Midwest sales?
Hi Analyst Peep! I’ve noticed that our Midwest sales are declining. I think it may have something to do with a lack of salesperson follow-up due to the recent snowstorm there. Could you pull our sales actuals and customer follow-ups from Salesforce for Oct, Nov and Dec?
What do you get in response to that? The analyst runs a SQL query against your transactional systems to pull your sales data and hits Salesforce for CRM records. These numbers are dutifully exported as CSVs and Analyst Peep hits reply on that email. You, Data Honcho, stoked that your analyst replied so quickly, proceed to open those CSVs in Excel and start noodling. What happens next?
Uh oh… More questions…
Why don’t the Midwest customer names in the two spreadsheets line up?
What product lines are included in those sales figures?
Why are the sales figures out of whack with every other company in the region?
More emails ensue, more people are copied on replies, and finally you get to an initial analysis. This analysis is then screenshotted and dropped into a PowerPoint deck, only to be emailed further up the chain of command.
Unfortunately, whether we admit it or not, and regardless of the fancy data science pipeline and dashboarding solutions we cobble together, the communication and transport mechanism for 99% of the analytics work we do is emailing spreadsheets. How much business knowledge is trapped in the spreadmart (a clever contraction of spreadsheet and datamart that I learned from data.world advisor and ex-Continuum Analytics CMO, Michele Chambers) that is corporate email? How much time is lost going through old email threads looking for that one nugget that will explain how Bob in Accounting (who’s long since retired) calculated revenue?
We really want to solve that problem here at data.world. We believe in data-driven cultures (we start every day at data.world with a metrics meeting where we review our progress in numbers). Don’t you want to work at a place that values data and facts over conjecture and opinion? Jim Barksdale, the storied Silicon Valley exec, once famously said: “If we have data, let’s look at data. If all we have are opinions, let’s go with mine.” In order to get data and break the shackles of opinion and ego, you have to make your data accessible and your analysis reproducible to a broad array of people in your organization. You have to operate with a level of transparency that’s not possible when all the analysis is trapped in emailed Excel spreadsheets with limited distribution or in esoteric Jupyter Notebooks locked away in GitHub.
We believe that to facilitate data-driven decision-making, you need a data catalog that enables better data teamwork. We hope that data.world can help foster teamwork by increasing inclusion, context and connectivity, thus creating more data-driven cultures and decision-making. How does it do that, though?
First, we need your help: STOP EMAILING SPREADSHEETS. Seriously. Next time you’re tempted to email a data request to an analyst or data scientist, start a data.world Data Project instead. What do you get when you do?
1. Post your data request in the project summary or objective and invite people who you think can help you collect the data you need. You’ll see a permanent record of what’s happening in your activity stream, so when new people are brought in, they can figure out where to start (building inclusion and context from the get-go).
2. When the initial pull of the data is ready, just have your analysts either drag and drop it into that data project or use our amazing Excel add-in to push the data up. The activity stream will update, you’ll get amazing previews of the data, and data.world will assess the quality of the data with our Data Inspector, saving you from round-tripping it back into Excel for basic exploration.
Notice that uploading a file updates the activity stream. In addition to publishing the activity stream, we also version what’s going on in your dataset and Project so you can track your steps and reproduce the analysis.
3. Want to fix those problems from the Data Inspector? Use that new Excel add-in I just mentioned to sync changes to the basic dataset back up.
4. Have questions? Post a discussion topic (with full data-aware embeds!). Everyone in your project will be alerted and, better yet, they all have access to the same assets for the project so context isn’t lost rifling around for lost attachments.
5. Want to go deeper? Immediately analyze those Excel, CSV and JSON files with SQL (no standing up databases or installing complicated software). Share your queries and integrate them into Tableau or Google Data Studio.
6. Complete the loop. Take analyses from Python notebooks, R Markdown, Tableau or Google Data Studio and embed them into Insights, right next to your data. Continue the discussion and keep it all in one place.
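To make step 5 a little more concrete, here is a rough local sketch of what "analyze CSV files with SQL" can look like. It does not use data.world's query engine or API; it stands in Python's built-in sqlite3 module purely for illustration. The file contents, column names (customer, region, month, amount, followups) and helper functions are all hypothetical, loosely modeled on the Midwest sales request above.

```python
import csv
import io
import sqlite3

# Hypothetical exports (made-up columns and values, for illustration only).
SALES_CSV = """customer,region,month,amount
Acme Corp,Midwest,10,1200
Acme Corp,Midwest,11,900
Globex,Midwest,10,500
Globex,Midwest,12,300
"""

FOLLOWUPS_CSV = """customer,month,followups
Acme Corp,10,3
Acme Corp,11,1
Globex,10,2
"""


def load_csv(conn, table, text):
    """Create a table from CSV text, using the header row as column names."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    cols = ", ".join(f'"{c}"' for c in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', data)


def monthly_summary(conn):
    """Return (month, total sales, total follow-ups) rows for the Midwest."""
    query = """
        SELECT s.month,
               SUM(CAST(s.amount AS INTEGER)) AS sales,
               COALESCE(SUM(CAST(f.followups AS INTEGER)), 0) AS followups
        FROM sales AS s
        LEFT JOIN followups AS f
               ON f.customer = s.customer AND f.month = s.month
        WHERE s.region = 'Midwest'
        GROUP BY s.month
        ORDER BY s.month
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # nothing to install or stand up
    load_csv(conn, "sales", SALES_CSV)
    load_csv(conn, "followups", FOLLOWUPS_CSV)
    for month, sales, followups in monthly_summary(conn):
        print(month, sales, followups)
```

The LEFT JOIN keeps months that have sales but no recorded follow-ups, which is exactly the kind of gap the original email was asking about. A shared, saved query like this also answers "what’s included in those figures?" without another round of emails.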
All this functionality (and so much more!), adding context and inclusion, is available today in data.world, but you can only get it if you don’t email that spreadsheet! Instead, take the simple step of dragging and dropping it into data.world. Better yet, integrate data.world into your workflow using our new Excel add-in. Together, we can all work toward a better data-driven future!
Read about our Excel add-in and how it can change your data analysis workflow.
Spreadsheets in emails aren’t the only culprits. Want an easy way to find out how and where your teams really work with your data? Use our Enterprise Data Stack Audit template!