Data Pricing and Data Asset Governance in the AI Era

KDD 2021 Tutorial


Date & Time: Saturday, Aug. 14th 09:00 - 16:00 SGT

Location: KDD 2021 (Virtual Conference)

Tutorial Background and Overview

The recent emergence of powerful machine learning techniques and applications has disruptively changed the landscape of the data and machine learning model supply chain in industry, that is, how data and machine learning models are produced, requested, deployed, shared and evolved. To echo this trend, this tutorial focuses on two major themes of recent advances in data science: data pricing (Part A) and data asset management (Part B). Each is a self-contained 3-hour component, and together they present the theme of “Data Pricing and Data Asset Governance in the AI Era”.

Part A: Data and Model Pricing in the Pipeline of Machine Learning

The first part of the tutorial focuses on data pricing in the end-to-end machine learning pipeline. Building powerful machine learning models, particularly deep learning models, requires large amounts of data, much of which may be acquired from external sources. Moreover, many parties share their machine learning models as a service (MLaaS) so that they can monetize their data assets and intellectual property in a timely manner. This tutorial systematically reviews the state-of-the-art research and development in this end-to-end process, and discusses the principles, opportunities, and challenges. We will start with a quick introduction to the end-to-end data and machine learning supply chain, and review the essential principles of data and machine learning model pricing. Then, we will focus on the practice of pricing in four important components of the machine learning process, namely raw data, data labels, revenue allocation in collaborative machine learning, and machine learning models.

Outline of Tutorial Part A

  1. Introduction

    • a) Machine learning pipeline

    • b) Data products and machine learning models as economic goods

    • c) Data and model pricing in machine learning pipeline

  2. Essentials of Data and Model Pricing

    • a) Data markets and structures

    • b) Data and model pricing desiderata

    • c) Pricing strategies

  3. Pricing raw data

    • a) Pricing general data sets

    • b) Pricing crowdsourcing/crowdsensing tasks

    • c) Pricing privacy

    • d) Pricing queries to databases

  4. Pricing data labels

    • a) Pricing data labels by golden sets

    • b) Pricing data labels by peer predictions

  5. Revenue allocation in collaborative machine learning

    • a) Pricing data by leave-one-out mechanism

    • b) Pricing data by cooperative game

  6. Pricing machine learning models

    • a) Revenue maximization pricing

    • b) Pricing (raw) data versus pricing machine learning models

  7. Summary, opportunities and challenges

Part B: Data Asset for Collaborative Intelligence

The second part of the tutorial focuses on data asset governance for decentralized collaboration. The nature of big data today entails an increasingly decentralized setting in which data from various sources is contributed to achieve data intelligence in a collaborative manner. We examine the challenges peculiar to decentralized data collaboration under the two principles of trust and incentive, including consensus, privacy, data auditing, data accounting and incentive design. Case studies of data economy ecosystems are also presented, covering both individual user data and business data settings.

Outline of Tutorial Part B

  1. Background and motivation

    • a) Challenges for current data ecosystem

    • b) Why data asset

    • c) Data asset history and definition

  2. Data asset core components

    • a) Value

    • b) Right

    • c) Control

  3. Data asset governance for decentralized collaborative intelligence

    • a) Governance principles

      • Trust

      • Incentive

    • b) Governance dimensions

      • Agreement

      • Accounting

      • Auditing

      • Reward & Penalty

    • c) Governance mechanisms

  4. "Trust" for data asset governance for decentralized collaborative intelligence

    • a) Attack models

    • b) Agreement:

      • Consensus Algorithms: Framework and Evaluation

    • c) Accounting:

      • Distributed ledger technology

    • d) Auditing:

      • Data auditing

    • e) Privacy:

      • Federated learning

  5. "Incentive" for data asset governance for decentralized collaborative intelligence

    • a) Data pricing

    • b) Value allocation model

    • c) Tokenomics design

  6. Data Economy Ecosystem

    • a) Case Study: Personal Data as Emerging Asset Class

    • b) Case Study: B-to-B Data Sharing and Exchange

  7. Summary, opportunities and challenges


Tutors

Jian Pei

School of Computing Science

Simon Fraser University

Email: jpei@cs.sfu.ca

Feida Zhu

School of Computing and Information Systems

Singapore Management University

Email: fdzhu@smu.edu.sg

Zicun Cong

School of Computing Science

Simon Fraser University

Email: zicun_cong@cs.sfu.ca

Xuan Luo

School of Computing Science

Simon Fraser University

Email: xuan_luo@cs.sfu.ca

Huiwen Liu

School of Computing and Information Systems

Singapore Management University

Email: hwliu.2018@phdcs.smu.edu.sg

Xin Mu

Peng Cheng Laboratory

Shenzhen, China

Email: mux@pcl.ac.cn