Q: In your work with predictive analytics, what challenges do you most frequently encounter?
A: I consistently come across three types of problems faced by data science teams large and small: access, collaboration, and governance. On the access side, data scientists have an incredibly difficult time getting to the data they need, either because of IT architecture issues or because of institutional issues, where different teams “own” different data sets and have varying incentives to make their data available. The truth is, though, that there are a huge number of reasons why data scientists struggle to get access to the data they need. On the collaboration side, data scientists need to work in groups, centralizing their shared knowledge and working towards a common goal. This can be incredibly difficult as well, thanks to distributed teams and high turnover rates.
Q: And as a lawyer, how do you see the governance challenges arise?
A: Regulatory concerns limit predictive analytics in ways that data science teams and lawyers frequently don’t realize. As organizations move from a business intelligence framework, where analysts were the primary end consumers of data, to a machine-based framework, where machine learning models themselves are replacing analysts in a number of ways, new governance issues are arising that challenge the way data science gets done. I’ll cite just one example: the EU’s General Data Protection Regulation, or GDPR, which can impose fines of up to four percent of global revenue, can require that meaningful information about the logic of machine learning models be available, and consumers can have a right to access that information. Before, you could ask the business intelligence analyst what she or he was doing with the data if you needed to. Increasingly, though, we’re going to need to ask the models themselves, and that requires an entirely different framework for governing and supervising how predictive analytics are applied.
Q: How does predictive analytics deliver value at the organization you work with? What is one specific way in which it actively drives decisions or operations?
A: A host of ways, from consumer identification and retention efforts to streamlined decision making from the bottom of organizations all the way up to their C-suites. I think what’s most fascinating, and most powerful, about the current state of predictive analytics is the move towards automation. Data science is really eating entire organizations in the sense that data science teams’ products are becoming cross-vocational; you can have one data science team, for example, building models that span multiple areas of expertise, covering everything from logistics and manufacturing to medical diagnostics. There’s one fascinating example of some researchers at Mount Sinai Hospital in New York, who were able to use unsupervised deep learning to diagnose a range of patients, though no one fully knew how or why the diagnoses were accurate. But more to your question: one of our customers was using drone images to manage a large infrastructure project in a remote area, and had serious problems getting that data to the data scientists and analysts involved in that project. So they used our platform to provide proper access to, and governance of, their data. And even though the consumers of the data were dispersed all over the world, in multiple regulatory jurisdictions, they were able to perform an infrastructure monitoring and upgrading effort that they would have had to complete in person, and at great cost, only a few years ago.
Q: When it comes to specific laws and legal trends, what should data scientists be aware of?
A: I mentioned the EU’s GDPR, but what we’re really seeing is a wave of new efforts to regulate data and the way it’s used. And that last part is crucial: restrictions on how data can be used are the wave of the future from a regulatory standpoint.
It used to be that regulations on data focused on security and access. But in a world where our data is increasingly available, and where we generate so much of it, regulations are going to assume that our data is accessible as a baseline, and move to focus on regulating how it’s used. And that’s exactly what the GDPR does, as well as China’s new “cybersecurity law,” among other examples. These new purpose-based restrictions can be hard to enforce with many of today’s data science tools.
Q: Sneak preview: Please tell us a take-away that you will provide during your talk at Predictive Analytics World.
A: The key problem confronting predictive analytics is really transparency. We’re in a world where data science operations are taking on increasingly important tasks, and the only thing holding them back is going to be how well the data scientists who train the models can explain what it is their models are doing. More on that during my talk!