Program
Business Administration - Business Analytics
Research project title

Budget Allocation in Crowdsourcing

Research description

Data with informative labels provides the basis for training machine learning models. However, when data is recorded without innate labels, crowdsourcing is often used to label the data with human intelligence. Despite its popularity, crowdsourced labels are usually of low quality due to unreliable workers, unless a high level of redundancy is employed during labeling at a high cost. To reduce the cost and increase the accuracy of crowdsourced data labels, we propose to develop a multi-stage crowdsourcing strategy in which, at each stage, items are assigned to workers based on each item's difficulty and each worker's reliability, both estimated from previously collected labels. The types of labels we consider include rating, ranking, and classification labels. We will formulate this problem as a Markov decision process whose optimal policy can be solved by a knowledge-gradient method. The performance of our strategies will be evaluated on simulated and real data in comparison with existing approaches. The impact of this project is that, when implemented on a real-world crowdsourcing platform, the significantly reduced cost and improved accuracy of data labels will facilitate machine learning and data mining, where most studies rely on the availability of large training datasets.
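The multi-stage idea above can be sketched in a few lines of code. The simulation below is a hypothetical, simplified illustration (not the project's actual method): it uses a greedy vote-margin rule for item selection in place of the knowledge-gradient policy, and estimates each worker's reliability by agreement with the running majority vote. All constants and function names are assumptions for the sketch.

```python
import random

random.seed(0)

# --- Simulated ground truth (hypothetical setup; real data would come from
# a crowdsourcing platform such as Amazon Mechanical Turk) ---
NUM_ITEMS, NUM_WORKERS, BUDGET = 20, 5, 60
true_labels = [random.choice([0, 1]) for _ in range(NUM_ITEMS)]
reliability = [random.uniform(0.6, 0.95) for _ in range(NUM_WORKERS)]  # hidden from the policy

def worker_answer(item, worker):
    """A worker reports the true label with probability equal to their reliability."""
    if random.random() < reliability[worker]:
        return true_labels[item]
    return 1 - true_labels[item]

# Running vote counts [negative, positive] per item
votes = [[0, 0] for _ in range(NUM_ITEMS)]
# Estimated worker accuracy: (agreements with current majority, total answers), smoothed
worker_stats = [[1, 2] for _ in range(NUM_WORKERS)]

def majority(item):
    return int(votes[item][1] >= votes[item][0])

for _ in range(BUDGET):
    # Stage policy: spend the next label on the item with the smallest vote
    # margin (most uncertain) -- a greedy stand-in for knowledge gradient.
    item = min(range(NUM_ITEMS), key=lambda i: abs(votes[i][1] - votes[i][0]))
    # Assign it to the worker with the highest estimated reliability.
    worker = max(range(NUM_WORKERS),
                 key=lambda w: worker_stats[w][0] / worker_stats[w][1])
    label = worker_answer(item, worker)
    votes[item][label] += 1
    # Update the worker's estimated reliability by agreement with the majority.
    worker_stats[worker][1] += 1
    if label == majority(item):
        worker_stats[worker][0] += 1

accuracy = sum(majority(i) == true_labels[i] for i in range(NUM_ITEMS)) / NUM_ITEMS
print(f"label accuracy after {BUDGET} queries: {accuracy:.2f}")
```

In a full treatment, the vote-margin heuristic would be replaced by the value computed by the knowledge-gradient policy for the Markov decision process, and item difficulty would enter the worker model as well.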

Undergraduate minimum qualifications

Familiarity with at least one programming language, such as C/C++, Matlab, R, or Python.

Undergraduate role

Collect and clean crowdsourcing data from various online crowdsourcing marketplaces, such as Amazon Mechanical Turk. Implement the budget-allocation algorithm for different crowdsourcing tasks. Coauthor, with the mentor, the research paper produced during this project.