mTurk: where having followers can be a bad thing

mTurk may have as many as 40,000 active workers, and fewer than 500 of them have completed my HITs. [1] Given this, I had assumed that I would rarely see the same worker in more than one of my studies. I was wrong.

While surveying 200 mTurk workers, I found that 14 of them (7%) were repeat participants. More recently, in a study of 160 mTurk workers, 18 of them (11.25%) had previously participated in my research. One worker, in fact, has now participated in my research three times.
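One way to spot repeat participants, assuming you keep the worker IDs from each study, is to intersect the sets of IDs. This is only a sketch: the function name and the worker IDs below are made up for illustration and are not real data.

```python
def repeat_rate(previous_ids, current_ids):
    """Return (repeat workers, share of the current sample that repeated)."""
    unique_current = set(current_ids)
    repeats = set(previous_ids) & unique_current
    if not unique_current:
        return repeats, 0.0
    return repeats, float(len(repeats)) / len(unique_current)

# Hypothetical worker IDs, not real data
earlier_studies = ["A1", "A2", "A3", "A4"]
latest_study = ["A3", "A4", "A5", "A6", "A7"]

repeats, rate = repeat_rate(earlier_studies, latest_study)
print("%d repeat workers (%.1f%% of this sample)" % (len(repeats), rate * 100))
```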

My HITs pay well, so they may be attracting workers who use Turk Alert, /r/HITsWorthTurkingFor, or some other site, subreddit, or script meant to hunt down the best-paying HITs or to follow requesters who offer the most lucrative work.

The Problem

Most of my research involves deception: I lie to people, for science. Workers who have already participated in deceptive research should not participate again, because exposure to deception might affect their future behavior. Now that the threat of repeated participation is real, I need a convenient method for excluding workers on a large scale.

The Solution

The method I chose was to assign workers a qualification and then, when creating a HIT, exclude all workers who hold it. This sounds simple enough, but because I want to apply it to hundreds (and eventually thousands) of workers, I need to automate the process with Python and mTurk’s API (via Boto). [2]
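As a rough sketch of the idea (this is not Boto code), the requirement attached to a HIT boils down to a qualification type ID plus the DoesNotExist comparator, which admits only workers who have never been assigned that qualification. The function names and the dictionary layout below are illustrative assumptions that mirror the field names of mTurk's qualification requirement structure, and the ID is a placeholder.

```python
def build_exclusion_requirement(qualification_type_id):
    """Describe a requirement that admits only workers WITHOUT the qualification."""
    return {
        "QualificationTypeId": qualification_type_id,
        # DoesNotExist: the worker must not hold this qualification at all
        "Comparator": "DoesNotExist",
    }

def worker_is_eligible(requirement, worker_qualifications):
    """Local stand-in for the eligibility check mTurk performs on its side."""
    if requirement["Comparator"] != "DoesNotExist":
        raise ValueError("this sketch only models DoesNotExist")
    return requirement["QualificationTypeId"] not in worker_qualifications

requirement = build_exclusion_requirement("PLACEHOLDER_QUAL_ID")
new_worker_ok = worker_is_eligible(requirement, set())
past_participant_ok = worker_is_eligible(requirement, set(["PLACEHOLDER_QUAL_ID"]))
print(new_worker_ok)
print(past_participant_ok)
```

With Boto, a requirement like this would presumably be built with the classes in boto.mturk.qualification and passed along when the HIT is created.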

My original hope was to create a Python class that would take in a list of HITs whose workers should be excluded. If a HIT was ever removed from this list, the workers associated with it would lose the qualification (that is to say, they would again be allowed to participate in my HITs). I wanted a script smart enough that I would only have to keep track of HIT IDs, not workers. Due to some oddities of the mTurk API, this solution was not possible.

A HIT that is 120 days old is considered expired (in the disposed state) and can no longer be retrieved through the API. [3] This means that knowing the ID of such a HIT is not enough to assign a qualification to the workers who completed it; I need the ID of each individual worker. To deal with this, my Python class takes in four lists: a list of HITs and a list of workers to exclude, plus a list of HITs and a list of workers who should no longer be excluded (and so lose the qualification). I can still exclude workers by knowing only the ID of the HIT they completed, but only if I do so before the HIT has expired.
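To illustrate why both kinds of list are needed, here is a minimal sketch that never touches the API. The lookup_workers function is a hypothetical stand-in for the API call, and all IDs are invented; a HIT missing from the fake table plays the role of one that can no longer be retrieved.

```python
# Fake "API": HIT ID -> worker IDs. HITs that are gone are simply absent.
RETRIEVABLE_HITS = {
    "HIT_RECENT": ["W1", "W2"],
}

def lookup_workers(hit_id):
    """Hypothetical stand-in for asking mTurk who completed a HIT."""
    if hit_id not in RETRIEVABLE_HITS:
        raise LookupError("HIT %s can no longer be retrieved" % hit_id)
    return RETRIEVABLE_HITS[hit_id]

def collect_workers(hit_ids, worker_ids):
    """Merge workers found through HITs with workers listed by ID."""
    found = set(worker_ids)
    for hit_id in hit_ids:
        try:
            found.update(lookup_workers(hit_id))
        except LookupError as error:
            # Nothing to recover here; rely on the explicit worker list
            print(error)
    return found

# W3 completed HIT_OLD, which is gone, so W3 has to be listed by ID.
to_exclude = collect_workers(["HIT_RECENT", "HIT_OLD"], ["W3"])
print(sorted(to_exclude))
```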

Apart from convenience for myself, I also had the goal of burdening workers as little as possible. Again, the oddities of the mTurk API made this difficult. My intention was to be able to grant and remove my qualification without alerting any workers. However, this is only partly possible. The mTurk API allows you to assign a qualification without alerting the receiver but, for whatever reason, you cannot remove this same qualification without the worker receiving an email alert. I don’t know how burdened mTurk workers are by random, meaningless emails, but I at least intend to rarely revoke my qualification (and therefore seldom contact these workers).

The code below is my current solution to the problem of repeated participation: [4]

'''

By R Gordon Rinderknecht, using Python 2.7.

Feel free to take any part of it and claim it as your own.

'''

from boto.mturk.connection import MTurkConnection

class participation_record(object):


    def __init__(self, access_id, secret_key, qualification,
                 hit_list=None, worker_list=None,
                 remove_hit_list=None, remove_worker_list=None):
        #default to None instead of [] so that instances never
        #share one mutable default list
        self.qual = qualification
        self.hit_list = hit_list or []
        self.worker_list = worker_list or []
        self.remove_hit_list = remove_hit_list or []
        self.remove_worker_list = remove_worker_list or []
        self.mturk = MTurkConnection(
            aws_access_key_id=access_id,
            aws_secret_access_key=secret_key,
            host='mechanicalturk.amazonaws.com')

    #get workers with a qualification
    def get_workers_from_qual(self):
        q = self.mturk.get_all_qualifications_for_qual_type(
            self.qual)
        workers = []
        for worker in q:
            workers.append(worker.SubjectId)
        return workers

    #get assignments from every page
    def get_all_assignments(self,hit):
        assignments = []

        page = 1

        #request each page only once; stop at the first empty page
        while True:
            results = self.mturk.get_assignments(hit,
                          page_number = str(page))
            if len(results) == 0:
                break
            assignments.extend(results)
            page += 1

        return assignments

    #get workers from a HIT
    def get_workers_from_hit(self,hit):

        '''
        Please forgive my Pokemon style ("Gotta catch 'em all!")
        error handling.  Boto does not provide extensive
        documentation on the errors I can expect to receive
        from its methods.
        '''

        try:
            assignments = self.get_all_assignments(hit)
            workers = []
            for a in assignments:
                workers.append(a.WorkerId)
            return workers
        except Exception:
            print ("Hit: %s is no longer reviewable or "
                   "was entered incorrectly") % hit
            return []

    #get workers from a list of HITs
    def get_workers_from_all_hits(self,hits):
        all_workers = []
        for hit in hits:
            workers = self.get_workers_from_hit(hit)
            all_workers.extend(workers)
        return all_workers

    #add qualification to a worker
    def add_qualification(self,worker):
        try:
            self.mturk.assign_qualification(self.qual, worker,
                 send_notification=False)
            return True
        except Exception:
            print ("Worker: %s no longer exists or was "
                   "entered incorrectly") % worker
            return False

    #remove qualification from a worker
    def remove_qualification(self,worker):
        try:
            self.mturk.revoke_qualification(worker, self.qual,
                 reason="Bulk Administration")
            return True
        except Exception:
            print ("Worker: %s does not have this "
                   "qualification or was entered "
                   "incorrectly") % worker
            return False

    #get all workers who will receive a qualification into a list
    def get_workers_list(self):
        a = self.get_workers_from_all_hits(self.hit_list)
        a.extend(self.worker_list)
        return a

    #get all workers who will lose a qualification into a list
    def remove_workers_list(self):
        a = self.get_workers_from_all_hits(
            self.remove_hit_list)
        a.extend(self.remove_worker_list)
        return a

    #update list of qualified workers
    def participation_record_update(self):

        add_counter = 0
        remove_counter = 0

        remove_workers = set(self.remove_workers_list())
        new_workers = set(self.get_workers_list())

        #revoke first, so that a worker who appears on both lists
        #is granted the qualification again below
        for worker in remove_workers:
            if self.remove_qualification(worker):
                remove_counter += 1

        current_workers = set(self.get_workers_from_qual())

        for worker in new_workers:
            if worker not in current_workers:
                if self.add_qualification(worker):
                    add_counter += 1

        print ("%s new workers were granted your "
               "qualification") %add_counter

        print ("%s workers lost your "
               "qualification") %remove_counter

        x = len(self.get_workers_from_qual())
        print ("%s workers have received your "
               "qualification") %x


ACCESS_ID = '[your AWS access id]'
SECRET_KEY = '[your AWS secret key]'
qualification = "3UKR2O9MUFZMHV1???????????????"

#HITs whose workers should receive the qualification
hit_list = ['2W151Y7QEOZJ5TUY6NKM??????????',
            '3WUVMVA7OBAA627QRTUT??????????']

#workers who should receive the qualification
worker_list = ['AQVP5?????????',
               'A1CGU?????????',
               'A2QD7?????????']

#HITs whose workers should no longer receive the qualification
remove_hit_list = []

#workers who should no longer receive the qualification
remove_worker_list = []

record = participation_record(ACCESS_ID,SECRET_KEY,
                              qualification,hit_list,
                              worker_list,remove_hit_list,
                              remove_worker_list)

#call the method that updates your list of qualified workers
record.participation_record_update()

With this code, it is possible to both grant a qualification to a worker and revoke it from that worker in the same run. This may happen if a worker participated in both a HIT whose workers are being excluded and a HIT whose workers are no longer being excluded, or if you simply made a mistake (for example, by including the same worker in both worker_list and remove_worker_list). In such a situation, this code is designed to continue excluding the worker.
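The reason is the order of operations: revocations run first, the list of current qualification holders is read afterward, and grants run last. A small simulation with plain sets shows the effect. It makes no API calls, and the function name and worker IDs are made up for illustration.

```python
def simulate_update(current_holders, to_grant, to_revoke):
    """Mimic the update order: revoke first, then grant to anyone missing."""
    holders = set(current_holders)
    # Step 1: revoke from everyone on the removal list
    holders -= set(to_revoke)
    # Step 2: grant to listed workers who do not hold the qualification now
    for worker in set(to_grant):
        if worker not in holders:
            holders.add(worker)
    return holders

# W2 appears on both lists, so it is revoked and then granted again.
result = simulate_update(
    current_holders=["W1", "W2"],
    to_grant=["W2", "W3"],
    to_revoke=["W1", "W2"],
)
print(sorted(result))
```

In the real script the worker on both lists is still revoked before being granted the qualification again, so the revocation email described earlier would presumably still go out.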

This code is just a first step. It contains minimal error handling and would benefit from added functionality. If you have any suggestions for how to improve this code or make this process of assigning qualifications simpler, your comments would be appreciated.

Also, thank you TurkerNation for helping me figure out mTurk's API.



Footnotes

[1]http://turkernation.com/archive/index.php/t-23243.html
[2]You can get started using Python and Boto by following this guide.
[3]http://mechanicalturk.typepad.com/blog/2011/04/overview-lifecycle-of-a-hit-.html
[4]This code has been updated to reflect a later post covering issues related to returning assignments from a HIT.
