Working with Batches

An AI Task Builder Batch allows you to collect human annotations on your existing data. You provide a dataset, define instructions, and participants evaluate each datapoint according to your instructions.

This guide covers the workflow for creating, configuring, and publishing a Batch.

Workflow overview

1. Create a dataset: Set up a dataset to hold your data.
2. Upload your data: Request presigned URLs and upload your files to S3.
3. Monitor dataset status: Wait for the dataset to finish processing.
4. Create a batch: Initialise a new batch and attach your dataset.
5. Create instructions: Define what participants should do with each datapoint.
6. Set up the batch: Trigger task generation.
7. Monitor batch status: Wait for tasks to be generated.
8. Create a study: Create a Prolific study that references your batch.
9. Publish the study: Make the study available to participants.
10. Retrieve responses: Download the annotated data after participants complete their tasks.

Creating a dataset

Create a dataset to hold your data.

POST /api/v1/data-collection/datasets

{
  "name": "Product reviews Q4 2024",
  "workspace_id": "6278acb09062db3b35bcbeb0"
}

Response

{
  "id": "0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d",
  "name": "Product reviews Q4 2024",
  "status": "UNINITIALISED"
}
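The create-dataset call can be assembled with Python's standard library. This is a minimal sketch: the base URL https://api.prolific.com and the `Authorization: Token ...` header are assumptions not stated in this guide, and `build_create_dataset_request` is a hypothetical helper name. It builds the request without sending it, so you can inspect it first.

```python
import json
import urllib.request

# Assumption: base URL and token-style auth header are not given in this guide.
API_BASE = "https://api.prolific.com"


def build_create_dataset_request(token: str, name: str, workspace_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates an empty dataset."""
    body = json.dumps({"name": name, "workspace_id": workspace_id}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/v1/data-collection/datasets",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_create_dataset_request(
        "YOUR_API_TOKEN", "Product reviews Q4 2024", "6278acb09062db3b35bcbeb0"
    )
    print(req.get_method(), req.full_url)
    # To send it and keep the new dataset's ID:
    # with urllib.request.urlopen(req) as resp:
    #     dataset_id = json.load(resp)["id"]  # status starts as "UNINITIALISED"
```

Keep the returned `id`; every later dataset call is addressed by it.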

Uploading your data

Upload your dataset as a CSV file using presigned URLs.

Step 1: Request a presigned URL

GET /api/v1/data-collection/datasets/{dataset_id}/upload-url/{filename}

For example:

GET /api/v1/data-collection/datasets/0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d/upload-url/reviews.csv

Step 2: Upload to S3

Use the presigned URL from the response to upload your CSV file directly to S3.

curl -X PUT \
  -H "Content-Type: text/csv" \
  --data-binary @reviews.csv \
  "{presigned_url}"
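Both upload steps can be sketched in Python as two request builders. Assumptions: the base URL and `Authorization: Token ...` header are not stated in this guide, the function names are hypothetical, and the name of the response field that carries the presigned URL is not shown here, so the sketch takes the URL as a plain argument. The S3 request deliberately sends no Prolific credentials, because a presigned URL carries its own signature.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.prolific.com"  # assumed base URL


def build_upload_url_request(token: str, dataset_id: str, filename: str) -> urllib.request.Request:
    """Step 1: ask the API for a presigned upload URL for `filename`."""
    safe_name = urllib.parse.quote(filename)  # percent-encode spaces and similar characters
    return urllib.request.Request(
        f"{API_BASE}/api/v1/data-collection/datasets/{dataset_id}/upload-url/{safe_name}",
        method="GET",
        headers={"Authorization": f"Token {token}"},  # assumed auth scheme
    )


def build_s3_put_request(presigned_url: str, csv_bytes: bytes) -> urllib.request.Request:
    """Step 2: PUT the raw CSV bytes to S3, mirroring the curl call above."""
    return urllib.request.Request(
        presigned_url,
        data=csv_bytes,
        method="PUT",
        headers={"Content-Type": "text/csv"},
    )


if __name__ == "__main__":
    print(build_upload_url_request("YOUR_API_TOKEN", "0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d", "reviews.csv").full_url)
    # with open("reviews.csv", "rb") as f:
    #     put = build_s3_put_request(presigned_url, f.read())
    # urllib.request.urlopen(put)
```

Reading the file in binary mode (`"rb"`) matches curl's `--data-binary`, so line endings and encoding are uploaded unchanged.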

CSV format

Your CSV should contain one row per datapoint. Each column is displayed to participants alongside the instructions.

id,review_text,product_name,rating
1,"Great product, exactly what I needed!",Widget Pro,5
2,"Arrived damaged, very disappointed",Widget Pro,1
3,"Works as expected, nothing special",Basic Widget,3
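If your data starts out as records in code, Python's `csv` module produces the quoting shown above: fields containing commas are wrapped in double quotes and other fields are left bare. A minimal sketch; `rows_to_csv` is a hypothetical helper, and UTF-8 is assumed as the file encoding because this guide does not specify one.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialise a list of dicts to CSV text, one row per datapoint."""
    buf = io.StringIO()
    # The default QUOTE_MINIMAL quotes only fields containing commas, quotes or newlines.
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


reviews = [
    {"id": 1, "review_text": "Great product, exactly what I needed!", "product_name": "Widget Pro", "rating": 5},
    {"id": 2, "review_text": "Arrived damaged, very disappointed", "product_name": "Widget Pro", "rating": 1},
    {"id": 3, "review_text": "Works as expected, nothing special", "product_name": "Basic Widget", "rating": 3},
]

if __name__ == "__main__":
    text = rows_to_csv(reviews, ["id", "review_text", "product_name", "rating"])
    print(text, end="")
    # Bytes ready for the S3 upload (UTF-8 assumed):
    csv_bytes = text.encode("utf-8")
```

Letting the `csv` module handle quoting avoids hand-escaping review text that contains commas or quotation marks.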

For advanced options including metadata columns and custom task grouping, see Working with Datasets.

Monitoring dataset status

Poll the dataset endpoint to check when processing is complete.

GET /api/v1/data-collection/datasets/{dataset_id}

Wait for the status to change to READY before proceeding.

Dataset status

Status | Description
--- | ---
UNINITIALISED | Dataset created but no data uploaded
PROCESSING | Dataset is being processed
READY | Dataset is ready to be attached to a batch
ERROR | Something went wrong during processing
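Polling is a loop that re-reads the status until it reaches READY, stops early on ERROR, and gives up after a timeout. A minimal sketch: `fetch_status` stands for a hypothetical function of yours that calls the GET endpoint above and returns the `status` field, and the 5-second interval and 10-minute timeout are arbitrary choices, not documented limits.

```python
import time
from typing import Callable


def wait_for_status(
    fetch_status: Callable[[], str],
    target: str = "READY",
    failure: str = "ERROR",
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch_status() until it returns `target`.

    Raises RuntimeError if the `failure` status is seen, and TimeoutError
    if `target` has not been reached after `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status == target:
            return status
        if status == failure:
            raise RuntimeError(f"processing failed: status is {status}")
        if clock() >= deadline:
            raise TimeoutError(f"status still {status} after {timeout} seconds")
        sleep(interval)
```

`sleep` and `clock` are injectable so the loop can be tested without real waiting. Because the status values are the same for batches, the same loop can wrap the batch endpoint described later.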

Creating a batch

Once your dataset is ready, create a batch and attach the dataset.

POST /api/v1/data-collection/batches

{
  "workspace_id": "0192a3b4-c5d6-7e8f-9a0b-1c2d3e4f5a6b",
  "name": "Product review sentiment analysis",
  "dataset_id": "0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d",
  "task_details": {
    "task_name": "Review Sentiment Classification",
    "task_introduction": "<p>Read each product review carefully and classify its sentiment.</p>",
    "task_steps": "<ol><li>Read the review text</li><li>Consider the overall tone</li><li>Select the appropriate sentiment</li></ol>"
  }
}

Task details

The optional task_details object provides context to participants:

Field | Type | Description
--- | --- | ---
task_name | string | Title displayed to participants
task_introduction | string | Introduction or general guidance
task_steps | string | Steps participants should follow

All three fields support basic HTML formatting.
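Because task_steps is HTML, it is convenient to keep the steps as a plain list and render them as an ordered list with each step escaped. A minimal sketch with hypothetical helper names; it assumes each task_details field may be sent on its own, which this guide does not confirm, so pass all three if the API rejects partial objects.

```python
import html
import json


def steps_to_html(steps):
    """Render plain-text steps as an HTML ordered list, escaping each step."""
    items = "".join(f"<li>{html.escape(step)}</li>" for step in steps)
    return f"<ol>{items}</ol>"


def build_batch_payload(workspace_id, name, dataset_id, task_name=None,
                        task_introduction=None, steps=None):
    """Return a JSON-ready body for POST /api/v1/data-collection/batches."""
    payload = {"workspace_id": workspace_id, "name": name, "dataset_id": dataset_id}
    details = {}
    if task_name is not None:
        details["task_name"] = task_name
    if task_introduction is not None:
        details["task_introduction"] = task_introduction  # expected to be HTML already
    if steps:
        details["task_steps"] = steps_to_html(steps)
    if details:  # task_details is optional, so omit it entirely when empty
        payload["task_details"] = details
    return payload


if __name__ == "__main__":
    body = build_batch_payload(
        "0192a3b4-c5d6-7e8f-9a0b-1c2d3e4f5a6b",
        "Product review sentiment analysis",
        "0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d",
        task_name="Review Sentiment Classification",
        steps=["Read the review text", "Consider the overall tone", "Select the appropriate sentiment"],
    )
    print(json.dumps(body, indent=2))
```

Escaping each step prevents characters such as `<` or `&` in your wording from being interpreted as markup.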

Response

{
  "id": "0192a3b4-d6e7-7f8a-0b1c-2d3e4f5a6b7c",
  "workspace_id": "0192a3b4-c5d6-7e8f-9a0b-1c2d3e4f5a6b",
  "name": "Product review sentiment analysis",
  "dataset_id": "0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d",
  "status": "UNINITIALISED",
  "total_task_count": 0
}

Batch status

A batch transitions through the following states:

Status | Description
--- | ---
UNINITIALISED | Batch created but contains no tasks
PROCESSING | Batch is being processed into tasks
READY | Batch is ready to be attached to a study
ERROR | Something went wrong during processing

Creating instructions

Instructions define what participants should do with each datapoint. Each instruction is displayed to participants sequentially alongside the datapoint.

POST /api/v1/data-collection/batches/{batch_id}/instructions

Instruction types

Type | Description
--- | ---
multiple_choice | Selection from a list of options. Use answer_limit to control how many options can be selected: 1 for single-select, -1 for unlimited, or any number up to the total options.
free_text | Open-ended text input
multiple_choice_with_free_text | Selection from options, each with a heading and an associated free text field for additional input
file_upload | File submission (images, documents, etc.)

By default, when there are 5 or more options, a dropdown is rendered instead of checkboxes or radio buttons. Set disable_dropdown: true to always use checkboxes/radio buttons. See Instructions for full details on all instruction fields.
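The answer_limit and dropdown rules above can be checked locally before posting a multiple_choice instruction. This is a hypothetical client-side pre-check that restates the rules in this guide; it is not the API's own validation, and whether the dropdown rule also applies to multiple_choice_with_free_text is not stated here.

```python
from typing import List


def check_answer_limit(instruction: dict) -> List[str]:
    """Return problems with answer_limit; an empty list means it looks valid."""
    problems = []
    options = instruction.get("options", [])
    limit = instruction.get("answer_limit")
    # Allowed: -1 (unlimited) or any value from 1 up to the number of options.
    if limit is not None and limit != -1 and not (1 <= limit <= len(options)):
        problems.append(f"answer_limit {limit} must be -1 or between 1 and {len(options)}")
    return problems


def renders_as_dropdown(instruction: dict) -> bool:
    """5 or more options render as a dropdown unless disable_dropdown is true."""
    return len(instruction.get("options", [])) >= 5 and not instruction.get("disable_dropdown", False)


if __name__ == "__main__":
    sentiment = {
        "type": "multiple_choice",
        "answer_limit": 1,
        "options": [
            {"label": "Positive", "value": "positive"},
            {"label": "Neutral", "value": "neutral"},
            {"label": "Negative", "value": "negative"},
        ],
    }
    print(check_answer_limit(sentiment), renders_as_dropdown(sentiment))
```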

Example: Sentiment classification

{
  "order": 1,
  "type": "multiple_choice",
  "description": "What is the overall sentiment of this review?",
  "answer_limit": 1,
  "options": [
    { "label": "Positive", "value": "positive" },
    { "label": "Neutral", "value": "neutral" },
    { "label": "Negative", "value": "negative" }
  ]
}

Example: Free text explanation

{
  "order": 2,
  "type": "free_text",
  "description": "Briefly explain why you chose this sentiment rating",
  "placeholder_text_input": "e.g. The reviewer uses positive language and expresses satisfaction..."
}
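The two example instructions can be sent in order of their `order` field. This guide does not say whether the instructions endpoint accepts a single object or an array, so this sketch assumes one instruction per request; if the endpoint takes an array, send the sorted list in a single body instead. The base URL, auth header and function name are also assumptions.

```python
import json
import urllib.request

API_BASE = "https://api.prolific.com"  # assumed base URL


def build_instruction_requests(token, batch_id, instructions):
    """Build one POST request per instruction, sorted by ascending 'order'."""
    orders = [item["order"] for item in instructions]
    if len(orders) != len(set(orders)):
        raise ValueError("each instruction needs a distinct 'order'")
    url = f"{API_BASE}/api/v1/data-collection/batches/{batch_id}/instructions"
    built = []
    for instruction in sorted(instructions, key=lambda item: item["order"]):
        built.append(urllib.request.Request(
            url,
            data=json.dumps(instruction).encode("utf-8"),
            method="POST",
            headers={"Authorization": f"Token {token}",  # assumed auth scheme
                     "Content-Type": "application/json"},
        ))
    return built
```

Sorting by `order` before sending means the requests go out in display order even if your list was assembled in a different sequence.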

Setting up the batch

Once your instructions are created, trigger task generation. Each datapoint in your dataset is paired with all instructions to create a task. Tasks are then organised into task groups; participants complete one task group per submission.

POST /api/v1/data-collection/batches/{batch_id}/setup

{
  "tasks_per_group": 5
}

The tasks_per_group parameter controls how many tasks are randomly assigned to each group. If omitted, each task group contains a single task.

Participants complete all tasks within their assigned group in a single submission. No participant will be assigned the same task group twice, even if they complete multiple submissions.
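To reason about how many groups a given tasks_per_group produces, random grouping can be simulated by shuffling the tasks and cutting them into fixed-size chunks. This is an illustration only, not Prolific's actual assignment logic. How a remainder is handled when the task count is not a multiple of tasks_per_group is not described here, so the sketch simply leaves a smaller final group.

```python
import random


def group_tasks(task_ids, tasks_per_group=1, seed=None):
    """Shuffle task IDs and chunk them into groups of at most tasks_per_group."""
    if tasks_per_group < 1:
        raise ValueError("tasks_per_group must be at least 1")
    shuffled = list(task_ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducible experiments
    return [shuffled[i:i + tasks_per_group] for i in range(0, len(shuffled), tasks_per_group)]


if __name__ == "__main__":
    groups = group_tasks(range(1, 151), tasks_per_group=5, seed=42)
    print(len(groups))  # 30
```

With 150 tasks and tasks_per_group set to 5 this yields 30 groups, which is also the number of submissions needed for one pass over the data.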

For custom task grouping based on your own criteria, see Working with Datasets.

Calling the setup endpoint triggers task generation. The batch status changes to PROCESSING and then to READY once generation is complete.

Monitoring batch status

Poll the batch endpoint to check when task generation is complete.

GET /api/v1/data-collection/batches/{batch_id}

Wait for the status to change to READY before creating a study.

{
  "id": "0192a3b4-d6e7-7f8a-0b1c-2d3e4f5a6b7c",
  "workspace_id": "0192a3b4-c5d6-7e8f-9a0b-1c2d3e4f5a6b",
  "name": "Product review sentiment analysis",
  "status": "READY",
  "total_task_count": 150
}

The total_task_count reflects the number of datapoints in your dataset.
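Since total_task_count should match your datapoint count, a quick sanity check compares it with the rows in the CSV you uploaded. A minimal sketch with hypothetical helper names; `csv.DictReader` counts records rather than physical lines, so quoted fields containing newlines are not over-counted.

```python
import csv
import io


def count_datapoints(csv_text: str) -> int:
    """Count data rows (excluding the header) in CSV text."""
    return sum(1 for _ in csv.DictReader(io.StringIO(csv_text)))


def check_batch_ready(batch: dict, expected_datapoints: int) -> list:
    """Return problems with a batch response; an empty list means it looks ready."""
    problems = []
    if batch.get("status") != "READY":
        problems.append(f"status is {batch.get('status')}, not READY")
    if batch.get("total_task_count") != expected_datapoints:
        problems.append(
            f"total_task_count is {batch.get('total_task_count')}, expected {expected_datapoints}"
        )
    return problems
```

For the example response above, a 150-row CSV and `{"status": "READY", "total_task_count": 150}` would produce an empty list.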

Publishing a batch

To make your batch available to participants, create a Prolific study that references it.

POST /api/v1/studies/

When creating the study, set data_collection_method to AI_TASK_BUILDER_BATCH and provide your batch ID as data_collection_id:

{
  "name": "Product Review Sentiment Analysis",
  "internal_name": "sentiment-analysis-q4-2024",
  "description": "<p>Help us understand the sentiment in product reviews by classifying each review and explaining your reasoning.</p>",
  "estimated_completion_time": 15,
  "maximum_allowed_time": 45,
  "reward": 300,
  "data_collection_method": "AI_TASK_BUILDER_BATCH",
  "data_collection_id": "0192a3b4-d6e7-7f8a-0b1c-2d3e4f5a6b7c",
  "data_collection_metadata": {
    "annotators_per_task": 3
  }
}

Use annotators_per_task in data_collection_metadata to specify how many participants should annotate each datapoint. The default is 1. After publishing, this value can only be increased.
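A rough sizing estimate helps when choosing annotators_per_task. This back-of-envelope sketch assumes each task group is completed by annotators_per_task different participants, with one group per submission; it also encodes the "can only be increased" rule as a local check. Both helper names are hypothetical.

```python
import math


def estimate_submissions(datapoints: int, tasks_per_group: int = 1, annotators_per_task: int = 1) -> int:
    """Approximate submissions needed: number of task groups times annotators per task."""
    groups = math.ceil(datapoints / tasks_per_group)
    return groups * annotators_per_task


def check_annotators_update(current: int, requested: int) -> None:
    """Reject a decrease, since annotators_per_task can only grow after publishing."""
    if requested < current:
        raise ValueError(f"annotators_per_task can only be increased ({current} -> {requested})")


if __name__ == "__main__":
    print(estimate_submissions(150, tasks_per_group=5, annotators_per_task=3))  # 90
```

For the running example, 150 datapoints in groups of 5 give 30 groups; with 3 annotators each, that is about 90 submissions and 450 individual annotations.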

Then publish the study:

POST /api/v1/studies/{study_id}/transition/

{
  "action": "PUBLISH"
}
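Creating and publishing the study are two POST requests, and the second needs the study ID returned by the first. A minimal sketch: the base URL, auth header and helper names are assumptions, and the study ID is assumed to come back in an `id` field, which this guide does not show.

```python
import json
import urllib.request

API_BASE = "https://api.prolific.com"  # assumed base URL


def _post_json(token: str, path: str, body: dict) -> urllib.request.Request:
    """Build an authenticated JSON POST request without sending it."""
    return urllib.request.Request(
        f"{API_BASE}{path}",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Token {token}",  # assumed auth scheme
                 "Content-Type": "application/json"},
    )


def build_create_study_request(token: str, batch_id: str, study_fields: dict,
                               annotators_per_task: int = 1) -> urllib.request.Request:
    """Create-study request that links the study to an AI Task Builder batch."""
    body = dict(study_fields)  # name, description, reward, timings, ...
    body["data_collection_method"] = "AI_TASK_BUILDER_BATCH"
    body["data_collection_id"] = batch_id
    body["data_collection_metadata"] = {"annotators_per_task": annotators_per_task}
    return _post_json(token, "/api/v1/studies/", body)


def build_publish_request(token: str, study_id: str) -> urllib.request.Request:
    """Transition request that publishes an existing study."""
    return _post_json(token, f"/api/v1/studies/{study_id}/transition/", {"action": "PUBLISH"})
```

In use, you would send the create request with `urllib.request.urlopen`, read the study ID from the JSON response, and then send the publish request.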

Retrieving responses

After participants have completed their tasks, download the annotated data as a CSV.

GET /api/v1/data-collection/batches/{batch_id}/report/

This returns your original CSV with additional columns containing participant responses for each instruction.
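With several annotators per datapoint, a common first step is a majority vote per datapoint. This guide does not document the report's column names or whether each response appears as its own row, so this sketch assumes one row per (datapoint, participant) response and takes the ID and answer column names as arguments; the column name `sentiment` in the usage is hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict


def majority_labels(report_csv: str, id_column: str, answer_column: str) -> dict:
    """Return {datapoint_id: most common answer} from report CSV text.

    Assumes one row per (datapoint, participant) response. Ties go to the
    answer seen first, because Counter keeps first-encountered order for equal counts.
    """
    votes = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(report_csv)):
        answer = row.get(answer_column, "")
        if answer:  # skip blank responses
            votes[row[id_column]][answer] += 1
    return {dp: counter.most_common(1)[0][0] for dp, counter in votes.items()}


if __name__ == "__main__":
    sample = "id,review_text,sentiment\n1,a,positive\n1,a,positive\n1,a,neutral\n2,b,negative\n"
    print(majority_labels(sample, "id", "sentiment"))
```

Check the actual column layout of your report before relying on this; if responses are spread across several columns per row, collect the votes from those columns instead.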


By using AI Task Builder, you agree to our AI Task Builder Terms.