Control Meets Learning Seminar
The design of data markets has grown in importance as firms increasingly use predictions from machine learning models to make their operations more effective, yet often need to acquire the necessary training data externally. This is particularly true on the Internet, where an ever-increasing amount of user data is being collected and exchanged. The challenge in creating such a marketplace stems from the very nature of data as an asset: (i) it can be replicated at zero marginal cost; (ii) its value to a firm is inherently combinatorial (i.e., the value of a particular dataset depends on which other, potentially correlated, datasets are available); (iii) its value to a firm depends on which other firms get access to the same data; (iv) prediction tasks, and the value of an increase in prediction accuracy, vary widely across firms, so it is not obvious how to price a collection of datasets with correlated signals; and (v) the authenticity and truthfulness of data are difficult to verify a priori without first applying the data to a prediction task.
In this work, we consider the case of N competing firms and a monopolistic data seller. We demonstrate that modeling each firm's utility solely through the increase in prediction accuracy it experiences reduces the complex, combinatorial problem of allocating and pricing multiple datasets to an auction of a single digital (freely replicable) good. We address an important property of such markets that has received limited attention thus far, namely the externality a firm faces when data is allocated to its competitors. Addressing this externality is likely necessary for progress toward the practical implementation of such markets. Using this modeling abstraction, we derive the welfare-maximizing and revenue-maximizing auctions for these settings. We highlight how the form of the firms' private information (whether they know the externalities they exert on others or those that others exert on them) affects the structure of the optimal mechanisms. We find that in all cases the optimal allocation rule is a single threshold per firm: the seller allocates either all of the information or none of it to each firm. Through simple examples, we show how externalities affect both the allocation of information and the revenue generated. Finally, we discuss situations in which this linear model fails.
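The single-threshold result can be made concrete with a toy, welfare-only calculation. The sketch below is not the authors' model. It assumes a hypothetical linear utility in which firm i gains `values[i]` per unit of information it receives and loses `externalities[i][j]` per unit given to competitor j, with all parameters known to the seller (so private information and revenue are ignored). Under these assumptions welfare is linear in each firm's allocation, so the welfare-maximizing rule gives firm j all of the data exactly when its own value exceeds the total externality it imposes on the other firms, and none otherwise.

```python
from itertools import product
from typing import List, Sequence

# Hypothetical linear model (an illustrative assumption, not the authors' model):
#   values[i]           gain to firm i from receiving all of the data
#   externalities[i][j] loss to firm i when competitor j receives all of the data
# An allocation x[j] in [0, 1] is the fraction of the information given to firm j.


def welfare(x: Sequence[float], values: Sequence[float],
            externalities: Sequence[Sequence[float]]) -> float:
    """Total utility summed over firms: own gains minus externalities suffered."""
    n = len(values)
    total = 0.0
    for i in range(n):
        total += values[i] * x[i]
        total -= sum(externalities[i][j] * x[j] for j in range(n) if j != i)
    return total


def threshold_allocation(values: Sequence[float],
                         externalities: Sequence[Sequence[float]]) -> List[int]:
    """Welfare is linear in each x[j], so each firm gets all of the data or none."""
    n = len(values)
    allocation = []
    for j in range(n):
        # Total harm that allocating to firm j imposes on every other firm.
        harm = sum(externalities[i][j] for i in range(n) if i != j)
        allocation.append(1 if values[j] > harm else 0)
    return allocation


values = [5.0, 1.5, 1.5]
externalities = [[0.0, 1.0, 0.5],
                 [1.5, 0.0, 0.5],
                 [1.0, 1.0, 0.0]]

x_star = threshold_allocation(values, externalities)
# Brute-force check over all all-or-nothing allocations.
best = max(product([0, 1], repeat=len(values)),
           key=lambda x: welfare(x, values, externalities))
print(x_star, list(best), welfare(x_star, values, externalities))
```

In this example firm 1's value (1.5) falls below the harm it would impose on its competitors (1.0 + 1.0 = 2.0), so it is excluded even though giving it a copy of the data costs the seller nothing.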
This is joint work with Anish Agarwal, Thibaut Horel, and Maryann Rui.