Why the Concept of “Citizen Data Scientist” Terrifies Me

Imagine this scenario:

You enter your dentist’s office for a follow-up to the six-month checkup you had last week. You’re nervous because the checkup revealed bleeding gums and cracked fillings, all of which your dentist needs to repair on this visit.

So, you’re there nervously fidgeting in the lobby, waiting for your name to be called and envisioning all sorts of dental implements of torture that will soon be inside your mouth. Finally, when your name is called, you’re shown to the room of horrors and slip into the chair that renders you vulnerable and helpless. The images of probes and picks and scalers track like ticker tape in your mind until the sound of a stranger’s voice breaks the monotony.

“Hello,” he says. “My name is Dr. Payne and I am your Citizen Dentist for today.”

Citizen Dentist?! You repeat the phrase out loud, wanting an answer to this looney statement: “What is a Citizen Dentist?”

Get this. He replies, “I’m a person who performs dental work, but my proficiency and expertise are outside the field of dentistry.”

Figure 1: Source: Looney Tunes

We all know you would never visit a Citizen Dentist, Citizen Heart Surgeon or Citizen Colonoscopist; you’d only consult doctors with deep expertise in their respective fields, who are constantly learning new techniques and being re-certified.

Thing is, Gartner uses this exact language for a data scientist. It defines a “Citizen Data Scientist” as a person who creates or generates models that use advanced diagnostic analytics or predictive and prescriptive capabilities, but whose primary job function is outside the field of statistics and analytics.

Wait a minute.

Doesn’t it make more sense to create “Citizens of Data Science” – a community of business stakeholders who understand how to identify, validate, value and prioritize the use cases where data and analytics can derive and drive new sources of customer, product and operational value?

Let’s take a closer look at the meaning and importance of championing Citizens of Data Science.

Data Science Skills and Expertise

The article “What IBM Looks for in a Data Scientist” outlines some key Data Scientist skills including:

  • Training as a scientist with an MS or PhD
  • Expertise in machine learning and statistics with an emphasis on decision optimization
  • Expertise in R, Python or Scala
  • Ability to transform and manage large data sets
  • Proven ability to apply the skills above to real-world business problems
  • Ability to evaluate model performance and tune it accordingly

While this list is interesting, it is insufficient if we expect our data scientists to deliver meaningful, relevant customer, product and operational value. Delivering value requires a data science team with a diverse set of skills and perspectives that understands how to uncover the sources of value (see Figure 2).

Figure 2: Data Science is a Team Sport

A data science team comprises Data Engineers, Data Scientists and Business Stakeholders. And like a baseball team that can’t function effectively with only shortstops and catchers, a data science team must clearly articulate the roles, responsibilities and expectations of the Data Engineers, Data Scientists and Business Stakeholders, as summarized in Figure 3[1].

Figure 3: Data Science Team Roles, Responsibilities and Tools

Because it might be hard to read Figure 3, let me write out the responsibilities:

Major Responsibilities of the Data Engineer:

  • Collect, manage, analyze, and visualize data
  • Manage data infrastructure and architecture
  • Develop dataset processes for modeling, mining and production
  • Improve data reliability, efficiency and quality
  • Develop data pipeline infrastructure
  • Develop scale-out and scale-up solutions

Major Responsibilities of the Data Scientist:

  • Translate business issues into analytic models / algorithms
  • Ascribe value to raw data through original interpretation and modeling
  • Interact with data using sophisticated analytic techniques
  • Prepare data for use in predictive and prescriptive modeling
  • Perform feature engineering and identify hidden patterns in data
  • Automate using prescriptive and predictive analytics at scale
  • Engage stakeholders through stories

Major Responsibilities of the Business Stakeholder:

  • Help the business make better decisions through data
  • Satisfy business queries using data
  • Deploy critical thinking when reviewing data
  • Deploy math skills when reviewing data
  • Undertake Business Intelligence activities
  • Communicate findings using understandable language
  • Perform data stewardship and metadata management

A data science team prospers by exploiting the differences in opinions and perspectives among its members, embracing the conflict between opposing and divergent goals to drive AI innovation (more on AI-driven innovation in a future blog).

Bear versus the Wolfpack: Data Science Pods

Because it is not realistic to hire a data scientist who can do it all – what I’ll call a bear – we deploy a wolfpack approach to data science projects. We leverage the diversity of experience, backgrounds and opinions of multiple team members – the wolfpack – to create a more robust, holistic data science AI-driven solution.

The Data Science wolfpack approach manifests itself in Data Science Pods with clearly defined roles, responsibilities and expectations. These Data Science Pods support the “organizational improvisation” necessary to respond to the fluid, iterative nature of a data science engagement. The Data Science Pod, and its support organization, can morph roles, responsibilities and expectations during the heat of the data science battle (see Figure 4).

Figure 4: Data Science Pods

And because we have a bias toward Design Thinking, we are creating a Data Science Pod Canvas (see Figure 5) that we will soon release (the more folks who use it, the better the canvas will get).

Figure 5: Data Science Pod Canvas (Beta)

Summary: Creating “Citizens of Data Science”

Individually strong vs. collectively strong: we can build an outstanding data scientist, but I’m more interested in building a data science culture that can drive cross-organizational collaboration (and leverage conflict) to derive and drive new sources of customer, product and operational value.

Instead of trying to turn subject matter experts into “Citizen Data Scientists,” let’s turn them into “Citizens of Data Science” and define the role they play in helping organizations leverage data and analytics to power their business models, which includes:

  • Identifying, validating, vetting, valuing and prioritizing the use cases against which the data science resources should focus
  • Identifying the metrics and KPIs against which progress and success will be measured
  • Quantifying the costs associated with False Positives and False Negatives
  • Governing the proper usage of the resulting analytics (think ethics here, baby)
  • Operationalizing the resulting analytic insights so front-line employees and customers get the insights they need to be successful
  • Monetizing the analytics by identifying where and how they can derive and drive new sources of customer, product and operational value
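To make the False Positive / False Negative bullet concrete, here is a minimal sketch of how a business stakeholder might put a price on a model's mistakes. Everything in it is a hypothetical illustration: the churn scenario, the `ErrorCosts` and `total_error_cost` names, the dollar amounts and the confusion-matrix counts are made up, not drawn from any real engagement. The only idea carried over from the text is that each error type has its own cost: total cost = FP × cost per FP + FN × cost per FN.

```python
from dataclasses import dataclass


@dataclass
class ErrorCosts:
    """Hypothetical per-error costs that the business stakeholder supplies."""
    false_positive: float  # cost of acting on a wrong "yes"
    false_negative: float  # cost of missing a real "yes"


def total_error_cost(fp_count: int, fn_count: int, costs: ErrorCosts) -> float:
    """Total cost of a model's mistakes: FP * cost_FP + FN * cost_FN."""
    return fp_count * costs.false_positive + fn_count * costs.false_negative


# Hypothetical churn model: offering a $20 retention discount to a customer who
# wasn't leaving is a false positive; missing a customer who does leave costs
# an assumed $500 in lost revenue (a false negative).
costs = ErrorCosts(false_positive=20.0, false_negative=500.0)

# Two hypothetical models scored on the same customers.
model_a = total_error_cost(fp_count=300, fn_count=40, costs=costs)  # 340 errors
model_b = total_error_cost(fp_count=100, fn_count=80, costs=costs)  # 180 errors

print(f"Model A error cost: ${model_a:,.0f}")  # Model A error cost: $26,000
print(f"Model B error cost: ${model_b:,.0f}")  # Model B error cost: $42,000
```

With these made-up numbers, Model B makes fewer total mistakes (180 vs. 340) yet costs $42,000 against Model A's $26,000. That is why the per-error costs should come from the business side before anyone declares one model "better."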

Oh, by the way, I have an entire book dedicated to creating “Citizens of Data Science.” It’s a nice, easy but highly relevant read if you want to teach your business stakeholders to “Think Like a Data Scientist” in order to transform your organization to exploit the economics of data and uncover new sources of customer, product and operational value.

Figure 6: “The Art of Thinking Like a Data Scientist”

In a world increasingly dominated by AI, it will become everyone’s responsibility to become a Citizen of Data Science, to “Think Like a Data Scientist,” in order to identify those areas where data and analytics can optimize key business and operational processes, reduce security and compliance risks, uncover new revenue opportunities, and create more compelling customer and partner engagement.

You are only a click away from learning how to “Think Like a Data Scientist”!  Whatcha waiting for?

[1] Sources for Figure 3 include job postings from major Silicon Valley companies, including Amazon, American Express, Apple, Facebook, Google and Netflix; Data Science communities, including Kaggle, Masters of Data Science and DataCamp; and the following articles: What happens when you hire a data scientist without a data engineer?; What is a Data Engineer?; What is a Data Scientist?; What is a Data Analyst?; What is a Business Analyst?; What is the difference between a Data Engineer and a Data Scientist?; Data Engineering vs Data Science Infographic.