Instructor: Jenn Wortman Vaughan
Over the last decade, crowdsourcing has been used to harness the power of human computation to solve tasks that are notoriously difficult for computers alone, such as determining whether an image contains a tree, rating the relevance of a website, or verifying the phone number of a business.
The machine learning and natural language processing communities were early to embrace crowdsourcing as a tool for quickly and inexpensively obtaining the vast quantities of labeled data needed to train systems. Once this data is collected, it can be handed off to algorithms that learn to make autonomous predictions or actions.
Usually this handoff is where interaction with the crowd ends. The crowd provides the data, but the ultimate goal is to eventually take humans out of the loop. Are there better ways to make use of the crowd?
In this tutorial, I will showcase innovative uses of crowdsourcing that go beyond the collection of data. I will also dive into recent research aimed at understanding who crowdworkers are, how they behave, and what this should teach us about best practices for interacting with the crowd.
The innovations I'll discuss fall into three categories:
- Applications to machine learning and/or natural language processing that go beyond the collection of data. For example, the crowd can be used to generate kernels by providing information about object similarity, or to debug the large and complex machine learning models used in fields like computer vision and speech recognition.
- Hybrid intelligence systems. These “human in the loop” AI systems leverage the complementary strengths of humans and machines in order to achieve more than either could achieve alone. While the study of hybrid intelligence systems is relatively new, there are already compelling examples that suggest its great potential for applications like real-time on-demand closed captioning of day-to-day conversations and crowd-powered writing and editing.
- Large scale studies of human behavior online. Crowdsourcing is gaining popularity among social scientists who use platforms like Amazon Mechanical Turk to quickly and easily recruit large pools of subjects for behavioral experiments. Such experiments can benefit computer science too. With the rise of social computing, computer scientists can no longer ignore the effects of human behavior when reasoning about the performance of computer systems. Experiments allow us to better model this human behavior, which leads to better designed algorithms and systems.
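To make the first bullet more concrete, here is a minimal sketch (not any specific system from the tutorial) of how crowd similarity judgments might be turned into a kernel: average workers' pairwise ratings into a symmetric similarity matrix, then project it onto the positive semidefinite cone so it can serve as a valid kernel. The rating data and object count below are made up for illustration.

```python
import numpy as np

# Hypothetical crowd data: each worker rates the similarity of an object
# pair on a 0-1 scale. ratings[(i, j)] holds all ratings for pair (i, j).
ratings = {
    (0, 1): [0.9, 0.8],
    (0, 2): [0.2, 0.1, 0.3],
    (1, 2): [0.4, 0.5],
}
n = 3  # number of objects

# Build a symmetric similarity matrix by averaging the ratings per pair.
S = np.eye(n)  # each object is maximally similar to itself
for (i, j), votes in ratings.items():
    S[i, j] = S[j, i] = np.mean(votes)

# Averaged human judgments need not be positive semidefinite, so one common
# fix is to clip negative eigenvalues before using the matrix as a kernel.
eigvals, eigvecs = np.linalg.eigh(S)
K = eigvecs @ np.diag(np.clip(eigvals, 0.0, None)) @ eigvecs.T
```

The resulting `K` can be handed to any kernel method (e.g., an SVM with a precomputed kernel); eigenvalue clipping is just one of several ways to repair an indefinite similarity matrix.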
In the second half of the tutorial, I will talk about one of the most obvious and important yet often overlooked aspects of crowdsourcing: The crowd is made of people.
I'll discuss recent research—both qualitative and quantitative—that has opened up the black box of crowdsourcing to uncover that crowdworkers are not independent contractors, but rather a network with a rich communication structure.
I'll cover experiments that explored how to boost the quality of crowdwork using both well-designed monetary incentives (such as performance-based payments) and intrinsic motivation (such as piqued curiosity).
Finally, I'll discuss what this research can teach us about how to most effectively interact with the crowd. (Hint: Be respectful, be responsive, be clear.)
Despite the inclusion of best practices and tips, this tutorial should not be viewed as a prescriptive guide for applying existing techniques. The goals of the tutorial are to inspire you to find novel ways of using crowdsourcing in your own research and to provide you with the resources you need to avoid common pitfalls when you do.
Crowdsourcing has the potential for major impact on the way we design machine learning and AI systems, but to unleash this potential we need more creative minds exploring novel ways to use it. Interested? Come to the tutorial or check out the material below.
Here are my slides (minus animations) from NIPS. This is a big file. I will post the ACL slides closer to the ACL tutorial date.
Making Better Use of the Crowd is a survey/position paper/best practice guide that I wrote while preparing this tutorial for NIPS. Please feel free to share it broadly, use it in your class, or cite it as an unpublished note. Comments are welcome.
The video of my NIPS tutorial is now available here.
Frequently Asked Questions for the NIPS Tutorial
But Jenn, aren't you a theorist? Is this secretly a theory tutorial?
While I'm probably best known in the NIPS community for my research on learning theory and algorithmic economics, this is not a theory tutorial. In fact, this tutorial will contain almost no math at all. But if you're interested in my vision of how research on the mathematical foundations of human computation can both help and benefit from experimental and empirical research, check out this recent CACM review article.
Who is the target audience of this tutorial?
This tutorial is open to anyone who wants to learn more about cutting edge research in crowdsourcing. No assumptions will be made about your familiarity with either crowdsourcing or specific machine learning techniques. Anyone who is curious is welcome to attend!
Will your slides be available online?
Yes! Check back here shortly before the tutorial. If I'm on the ball, I will also post detailed lecture notes including a full list of references and other resources. I believe the tutorial will also be recorded, and I will post more info on that when I have it.
For the NIPS tutorial...
Date and time: Monday, December 5, 2016, 8:30-10:30am
Location: Centre Convencions Internacional Barcelona, Area 3
For the ACL tutorial...
Date and time: Sunday, July 30, 2017 (time TBD)
Location: Vancouver, Canada