Brooklyn’s qri aims to build a library of data for all the world to use
Three years in the making, the small startup launched its product and is building a community.
March 10, 2020
Since the start of modern computer processing, the value of data and the merits of using data to make decisions have skyrocketed, along with our ability to process it. It may be for that reason that our biggest companies so often make money not off the service they provide, but off the data they’re able to collect on a person. One Brooklyn startup is taking another tack. Qri (pronounced “query”) recently launched its product and is now looking for beta users. Qri wants to build a kind of GitHub or Wikipedia for data. Building a new suite of tools for data scientists and enthusiasts, qri wants to become a hub for open data, so that in the same way people can check Wikipedia to read about something, they’re able to check qri to see the data on it. What does that look like? We headed over to talk to qri’s Brendan O’Brien at their Brooklyn HQ to find out.
qri showing a “diff” function in a data set
Downtown Brooklyn Partnership: What is qri?
Brendan O’Brien: Qri, at its core, is a data set version control system. That sounds like a mouthful, but the thing that it does is it allows people to collaborate on data in a trusted and honorable way.
DBP: Could you give an example of what that means?
BOB: Totally. We think that open data, or the idea of data collaboration, can be moving orders of magnitude faster than it currently does. Today, the way open data works is that it’s basically just a number of people or organizations who collect data and then choose to publish it online to allow other people to use it. It’s a very one-directional conversation. If you go to the New York City Open Data portal, you can download the data sets New York City publishes. What you can’t do is take the work that I have made from that data set and use it for something of your own.
DBP: Couldn’t you publish that on GitHub?
BOB: Yes, totally, and that’s the problem right now. Most people working with data don’t and can’t use GitHub. The number of people who work with data is far greater than the number of people who work with software, and they’re not willing to learn Git. On top of that, how do you audit and trust a data set? There are no purpose-built tools for quickly being able to understand what has changed in a data set since we last looked at it, or who made the data set. These are features that we’re used to seeing in software, that, if you don’t come from the software world, you might not ever know existed.
DBP: What’s an example of some of the tools?
BOB: There are some key components that are missing. The biggest thing missing is versioning, which creates a new version of the data each time someone edits it. If a user journals what they’ve done, they’re now creating a history of important points in the life of that data set. The other one that really doesn’t exist is a “diff,” a program that compares files in order to determine how or whether they differ. Quickly getting a computer to highlight for you what has changed makes it much easier for software engineers to compare notes. We don’t have that at all in the world of open data right now.
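To make the idea of a data set “diff” concrete, here is a minimal sketch in Python. It is not qri’s implementation; the function names (load_rows, diff_rows), the sample bike-path data, and the choice to match rows on an “id” column are all assumptions made for illustration.

```python
import csv
import io


def load_rows(csv_text, key):
    """Parse CSV text into a dict of rows, indexed by the chosen key column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row for row in reader}


def diff_rows(old, new):
    """Compare two versions of a data set, row by row and field by field."""
    added = sorted(k for k in new if k not in old)
    removed = sorted(k for k in old if k not in new)
    changed = {}
    for k in sorted(old.keys() & new.keys()):
        fields = {}
        for f in sorted(set(old[k]) | set(new[k])):
            if old[k].get(f) != new[k].get(f):
                # Record the value before and after the edit.
                fields[f] = (old[k].get(f), new[k].get(f))
        if fields:
            changed[k] = fields
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical sample data: two versions of a small bike-path table.
VERSION_1 = "id,name,miles\n1,Ridge Trail,2.0\n2,Marsh Loop,1.5\n"
VERSION_2 = "id,name,miles\n1,Ridge Trail,2.4\n3,Pond Path,0.8\n"

result = diff_rows(load_rows(VERSION_1, "id"), load_rows(VERSION_2, "id"))
print(result)
```

In this example, row 3 is reported as added, row 2 as removed, and row 1 as changed in its “miles” field. Matching rows on a key column is one design choice among several; a real tool would also need to handle data sets with no stable key.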
The qri team shares a laugh
DBP: Would you be selling these tools to companies?
BOB: There’s a lot of the flavoring of Wikipedia in qri, certainly in the attitude of people working together and learning to trust each other’s work. That’s at the center of how we think that you achieve massive scale. As a company, the ways that we plan to make money are twofold: the first is to sell private data services to the people who are using the public site version; the other thing we want to do is sell an enterprise version of qri, so that, if you’re a company and you want this exact same system but you want it to only operate inside of your walls, we will set that up for you. Our real ultimate goal is making a giant Data Commons.
DBP: So you would sort of unlock this value and then, as people use it, sell them…
BOB: Sell services against that, totally. Ideally we’re creating a lot of value that is just circulating for everybody. You hear two things in data science all the time: first, “The data set that I want doesn’t exist”; second, “I spend more time cleaning data than I do working with data.” We think you can cut down cleaning and spend more time working with data if you can trust someone else’s data. So, we’re spending a lot of time building this “plumbing” that lets you very quickly understand what someone else has been doing.
DBP: What was your background before this?
BOB: I originally started as a graphic designer in Canada. I moved down here a number of years ago, and spent a lot of time in open source software, which is where I learned how to code. The way that I got my programming knowledge was entirely just learning from projects on GitHub. The motivation for qri began when we witnessed a phenomenon in Canada that isn’t talked about a lot, where from 2010 to 2016, the census was optional. Data not getting collected like that means that you’re now in the dark on a whole bunch of things. You lose access to information forever; it’s not a one-time thing. We had a number of other instances of muzzling or suppression of scientific evidence coming from the government, as well.
Around four years ago, with the administration change, there was concern that the same thing might happen in the United States and so, along with a number of folks, I worked to try and build an archive of the EPA data. It was thousands of volunteers just showing up on weekends, saying, “Hey, we’re going to back this up, as we see EPA data as very, very vital to climate change and understanding how climate change is progressing.” EPA data is used all over the world. It was during that eight-to-ten-month period that I realized just how difficult the task was. The hardest thing was being able to prove we didn’t change anything. That is vital if you’re going to depend on that data for research purposes. We saw there was a real need for a set of tools that allows you to quickly and easily make a new version of something, automatically creating an audit trail. So that’s what we built.
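One common way to prove that archived data has not been altered is to record a cryptographic fingerprint of each version and link every version to the one before it. The Python sketch below illustrates that general technique only; it does not describe how qri or the EPA archiving effort actually worked. The function names (fingerprint, commit, verify), the log format, and the sample data are hypothetical.

```python
import hashlib
import json


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of some bytes."""
    return hashlib.sha256(data).hexdigest()


def commit(log, data, author, note):
    """Return a new log with one more entry describing this version."""
    entry = {
        "author": author,
        "note": note,
        "data_sha256": fingerprint(data),
        # Linking to the previous entry makes the history tamper-evident.
        "parent": log[-1]["id"] if log else None,
    }
    entry["id"] = fingerprint(json.dumps(entry, sort_keys=True).encode("utf-8"))
    return log + [entry]


def verify(log, versions):
    """Check that every stored version still matches its log entry."""
    if len(log) != len(versions):
        return False
    parent = None
    for entry, data in zip(log, versions):
        body = {k: v for k, v in entry.items() if k != "id"}
        expected_id = fingerprint(json.dumps(body, sort_keys=True).encode("utf-8"))
        if entry["parent"] != parent:
            return False
        if entry["data_sha256"] != fingerprint(data):
            return False
        if entry["id"] != expected_id:
            return False
        parent = entry["id"]
    return True


# Hypothetical sample data: two versions of a tiny air-quality table.
v1 = b"site,pm25\nA,12\n"
v2 = b"site,pm25\nA,12\nB,9\n"

log = commit([], v1, "volunteer-1", "initial archive")
log = commit(log, v2, "volunteer-2", "add site B")

print(verify(log, [v1, v2]))
print(verify(log, [v1, v2 + b"C,7\n"]))
```

The first check passes because both versions match their recorded fingerprints; the second fails because one version was modified after it was logged. The author and note fields play the role of the “journal” of what was done and by whom.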
DBP: So you and some people were working on these tools and you thought there could be social and commercial value to them, and you went out and founded a company?
BOB: This felt very meaningful, and I thought, “Okay, I’m willing, I’m willing to spend at least five to 10 years working on this problem.” So I went out and talked to a number of folks and I ended up raising $3 million in capital to get this project off the ground. We’ve got backers who are very confident in open source, including board member Bob Young, the founder of Red Hat Linux.
DBP: And now you’re at a point where you’ve started looking for beta users?
BOB: Totally. We’re at this phase where we spent three years in research mode, trying to crack this problem in a way that’s easy to use, and we now have that. The fun thing about this is that it doesn’t need millions of people to use it to make it useful. You can use it for your own personal needs, and if you publish that back then other people can see your work better, and that in turn should raise your profile. We have a lot of data that is being published by a growing community of people. Ideally, we’d love to have more people join the project and tell us how they think it should be shaped.
DBP: Why did you choose to make this in Brooklyn?
BOB: Brooklyn is my backyard. It’s where I’ve been living for the last seven or eight years. I also really like working in a place where we are forced to reckon with real problems, and then to contextualize that with data. The tech sector is not the number one thing here, and I really enjoy that. I think it’s exciting that we have to prove that what we make is worth your time, and that’s something that’s a part of New York. There’s a sort of culture here to put your money where your mouth is. I think that attitude forces the kind of pragmatism that I really enjoy, and I think has really helped ground the project. We don’t get up and celebrate unless we really feel like we’ve made something that did not exist. If we’d founded this in Silicon Valley I think we could have created a lot of buzz around the concepts that we were chasing, but I think the company would have had a different character. I think we would have been encouraged to raise more money sooner and spend it really fast. We want to be around for a while and establish a community, and to do that we need to think long-term. That’s not easy to do when you’re encouraged to seek out big, splashy investments and show high growth right out the gate.
Also, coming from Toronto, it’s really similar in the speed at which people move, the density, the growth and the renovation of formerly industrial spaces. So I feel at home here.
Do you need a directory of Business Improvement Districts? How about the GPS locations of bike paths in the city’s nature preserves? Qri is your place. The team of just five people continues to build a repository of the world’s data from right here in Brooklyn.