Making Better Use of the Crowd

Tutorial Information for NIPS 2016, ACL 2017, and KDD 2017
Instructor: Jenn Wortman Vaughan

Overview

Over the last decade, crowdsourcing has been used to harness the power of human computation to solve tasks that are notoriously difficult to solve with computers alone, such as determining whether or not an image contains a tree, rating the relevance of a website, or verifying the phone number of a business.

The machine learning and natural language processing communities were early to embrace crowdsourcing as a tool for quickly and inexpensively obtaining the vast quantities of labeled data needed to train systems. Once this data is collected, it can be handed off to algorithms that learn to make predictions or take actions autonomously.

Usually this handoff is where interaction with the crowd ends. The crowd provides the data, but the ultimate goal is to take humans out of the loop. Are there better ways to make use of the crowd?

In this tutorial, I will showcase innovative uses of crowdsourcing that go beyond the collection of data. I will also dive into recent research aimed at understanding who crowdworkers are, how they behave, and what this should teach us about best practices for interacting with the crowd.

The innovations I'll discuss fall into three categories:

Applications to machine learning and/or natural language processing that go beyond the collection of data. For example, the crowd can be used to generate kernels by providing information about object similarity, or to debug the large and complex machine learning models used in fields like computer vision and speech recognition.
Hybrid intelligence systems. These “human in the loop” AI systems leverage the complementary strengths of humans and machines in order to achieve more than either could achieve alone. While the study of hybrid intelligence systems is relatively new, there are already compelling examples that suggest its great potential for applications like real-time on-demand closed captioning of day-to-day conversations and crowd-powered writing and editing.
Large-scale studies of human behavior online. Crowdsourcing is gaining popularity among social scientists who use platforms like Amazon Mechanical Turk to quickly and easily recruit large pools of subjects for behavioral experiments. Such experiments can benefit computer science too. With the rise of social computing, computer scientists can no longer ignore the effects of human behavior when reasoning about the performance of computer systems. Experiments allow us to better model this human behavior, which leads to better-designed algorithms and systems.

In the second half of the tutorial, I will talk about one of the most obvious and important yet often overlooked aspects of crowdsourcing: The crowd is made of people.

I'll discuss recent research—both qualitative and quantitative—that has opened up the black box of crowdsourcing to uncover that crowdworkers are not independent contractors, but rather a network with a rich communication structure.

I'll cover experiments that explored how to boost the quality of crowdwork using both well-designed monetary incentives (such as performance-based payments) and intrinsic motivation (such as gamification or a sense of doing good).

Finally, I'll discuss what this research can teach us about how to interact with the crowd most effectively. (Hint: Be respectful, be responsive, be clear.)

Despite the inclusion of best practices and tips, this tutorial should not be viewed as a prescriptive guide for applying existing techniques. The goals of the tutorial are to inspire you to find novel ways of using crowdsourcing in your own research and to provide you with the resources you need to avoid common pitfalls when you do.

Target Audience

This tutorial is open to anyone who wants to learn more about cutting-edge research in crowdsourcing. No assumptions will be made about your familiarity with either crowdsourcing or specific machine learning techniques. Anyone who is curious is welcome to attend!

Accompanying Material

Slides: from NIPS, from ACL, from KDD

Video: from NIPS

Notes: Here is my survey/position paper/best practice guide which was recently published in JMLR.

Logistics

For the NIPS tutorial...
Date and time: Monday, December 5, 2016, 8:30-10:30am
Location: Centre Convencions Internacional Barcelona (Area 3)

For the ACL tutorial...
Date and time: Sunday, July 30, 2017, 2:00-5:30pm
Location: Westin Bayshore, Vancouver, Canada (Mackenzie room)

For the KDD tutorial...
Date and time: Sunday, August 13, 2017, 1:00-5:00pm
Location: World Trade and Convention Centre/Scotiabank Centre, Halifax, Nova Scotia, Canada (Tentatively in Suite 304)
Registration: here