# Working with Batches
An AI Task Builder Batch allows you to collect human annotations on your existing data. You provide a dataset, define instructions, and participants evaluate each datapoint according to your instructions.
This guide covers the workflow for creating, configuring, and publishing a Batch.
## Workflow overview
1. **Create a dataset**: Set up a dataset to hold your data.
2. **Upload your data**: Request presigned URLs and upload your files to S3.
3. **Monitor dataset status**: Wait for the dataset to finish processing.
4. **Create a batch**: Initialise a new batch and attach your dataset.
5. **Create instructions**: Define what participants should do with each datapoint.
6. **Set up the batch**: Trigger task generation.
7. **Monitor batch status**: Wait for tasks to be generated.
8. **Create a study**: Create a Prolific study that references your batch.
9. **Publish the study**: Make the study available to participants.
10. **Retrieve responses**: Download the annotated data after participants complete their tasks.
## Creating a dataset
Create a dataset to hold your data.
```bash
POST /api/v1/data-collection/datasets
```
```json
{
  "name": "Product reviews Q4 2024",
  "workspace_id": "6278acb09062db3b35bcbeb0"
}
```
### Response
```json
{
  "id": "0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d",
  "name": "Product reviews Q4 2024",
  "status": "UNINITIALISED"
}
```
## Uploading your data
Upload your dataset as a CSV file using presigned URLs.
### Step 1: Request a presigned URL
```bash
GET /api/v1/data-collection/datasets/{dataset_id}/upload-url/{filename}
```
For example:
```bash
GET /api/v1/data-collection/datasets/0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d/upload-url/reviews.csv
```
### Step 2: Upload to S3
Use the presigned URL from the response to upload your CSV file directly to S3.
```bash
curl -X PUT \
-H "Content-Type: text/csv" \
--data-binary @reviews.csv \
"{presigned_url}"
```
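The same upload can be prepared in Python with the standard library. The helper below only builds the request; the presigned URL shown in the test is a placeholder, and `urllib.request.urlopen` would perform the actual upload:

```python
import urllib.request


def build_upload_request(presigned_url: str, csv_path: str) -> urllib.request.Request:
    """Prepare a PUT request that sends the CSV body to the presigned S3 URL."""
    with open(csv_path, "rb") as f:
        body = f.read()
    return urllib.request.Request(
        presigned_url,
        data=body,
        method="PUT",
        headers={"Content-Type": "text/csv"},
    )

# urllib.request.urlopen(build_upload_request(url, "reviews.csv")) performs the upload.
```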
### CSV format
Your CSV should contain one row per datapoint. Each column is displayed to participants alongside the instructions.
```csv
id,review_text,product_name,rating
1,"Great product, exactly what I needed!",Widget Pro,5
2,"Arrived damaged, very disappointed",Widget Pro,1
3,"Works as expected, nothing special",Basic Widget,3
```
For advanced options including metadata columns and custom task grouping, see [Working with Datasets](/api-reference/ai-task-builder/datasets).
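The example file above can be produced with Python's built-in `csv` module, which handles the quoting of commas inside review text automatically:

```python
import csv

rows = [
    {"id": 1, "review_text": "Great product, exactly what I needed!", "product_name": "Widget Pro", "rating": 5},
    {"id": 2, "review_text": "Arrived damaged, very disappointed", "product_name": "Widget Pro", "rating": 1},
    {"id": 3, "review_text": "Works as expected, nothing special", "product_name": "Basic Widget", "rating": 3},
]

# newline="" prevents the csv module from inserting blank rows on Windows
with open("reviews.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "review_text", "product_name", "rating"])
    writer.writeheader()
    writer.writerows(rows)
```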
## Monitoring dataset status
Poll the dataset endpoint to check when processing is complete.
```bash
GET /api/v1/data-collection/datasets/{dataset_id}
```
Wait for the status to change to `READY` before proceeding.
### Dataset status
| Status | Description |
| --------------- | ------------------------------------------ |
| `UNINITIALISED` | Dataset created but no data uploaded |
| `PROCESSING` | Dataset is being processed |
| `READY` | Dataset is ready to be attached to a batch |
| `ERROR` | Something went wrong during processing |
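A polling loop can be sketched as below. `fetch_status` is a stand-in for your authenticated GET request (e.g. a wrapper that calls the dataset endpoint and returns the `status` field); it is an assumption of this sketch, not part of the API:

```python
import time


def wait_until_ready(fetch_status, poll_interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll until the resource reaches READY; raise on ERROR or timeout.

    fetch_status: callable returning the current status string, e.g. a wrapper
    around GET /api/v1/data-collection/datasets/{dataset_id}.
    """
    waited = 0.0
    while waited <= timeout:
        status = fetch_status()
        if status == "READY":
            return status
        if status == "ERROR":
            raise RuntimeError("processing failed")
        sleep(poll_interval)
        waited += poll_interval
    raise TimeoutError("resource did not become READY in time")
```

The same loop works for batches later in the workflow, since they expose the same four statuses.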
## Creating a batch
Once your dataset is ready, create a batch and attach the dataset.
```bash
POST /api/v1/data-collection/batches
```
```json
{
  "workspace_id": "0192a3b4-c5d6-7e8f-9a0b-1c2d3e4f5a6b",
  "name": "Product review sentiment analysis",
  "dataset_id": "0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d",
  "task_details": {
    "task_name": "Review Sentiment Classification",
    "task_introduction": "Read each product review carefully and classify its sentiment.",
    "task_steps": "- Read the review text\n- Consider the overall tone\n- Select the appropriate sentiment"
  }
}
```
### Task details
The optional `task_details` object provides context to participants:
| Field | Type | Description |
| ------------------- | ------ | -------------------------------- |
| `task_name` | string | Title displayed to participants |
| `task_introduction` | string | Introduction or general guidance |
| `task_steps` | string | Steps participants should follow |
All three fields support basic HTML formatting.
### Response
```json
{
  "id": "0192a3b4-d6e7-7f8a-0b1c-2d3e4f5a6b7c",
  "workspace_id": "0192a3b4-c5d6-7e8f-9a0b-1c2d3e4f5a6b",
  "name": "Product review sentiment analysis",
  "dataset_id": "0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d",
  "status": "UNINITIALISED",
  "total_task_count": 0
}
```
### Batch status
A batch transitions through the following states:
| Status | Description |
| --------------- | ---------------------------------------- |
| `UNINITIALISED` | Batch created but contains no tasks |
| `PROCESSING` | Batch is being processed into tasks |
| `READY` | Batch is ready to be attached to a study |
| `ERROR` | Something went wrong during processing |
## Creating instructions
Instructions define what participants should do with each datapoint. Each instruction is displayed to participants sequentially alongside the datapoint.
```bash
POST /api/v1/data-collection/batches/{batch_id}/instructions
```
### Instruction types
| Type | Description |
| -------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `multiple_choice` | Selection from a list of options. Use `answer_limit` to control how many options can be selected: `1` for single-select, `-1` for unlimited, or any number up to the total options. |
| `free_text` | Open-ended text input |
| `multiple_choice_with_free_text` | Selection from options, each with a heading and an associated free text field for additional input |
| `file_upload` | File submission (images, documents, etc.) |
By default, when there are 5 or more options, a dropdown is rendered instead of checkboxes or radio buttons. Set `disable_dropdown: true` to always use checkboxes/radio buttons. See [Instructions](/api-reference/ai-task-builder/instructions) for full details on all instruction fields.
### Example: Sentiment classification
```json
{
  "order": 1,
  "type": "multiple_choice",
  "description": "What is the overall sentiment of this review?",
  "answer_limit": 1,
  "options": [
    { "label": "Positive", "value": "positive" },
    { "label": "Neutral", "value": "neutral" },
    { "label": "Negative", "value": "negative" }
  ]
}
```
### Example: Free text explanation
```json
{
  "order": 2,
  "type": "free_text",
  "description": "Briefly explain why you chose this sentiment rating",
  "placeholder_text_input": "e.g. The reviewer uses positive language and expresses satisfaction..."
}
```
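Before posting, a `multiple_choice` payload can be sanity-checked locally against the `answer_limit` rules described above. This validator is a hypothetical client-side helper, not part of the API:

```python
def validate_multiple_choice(instruction: dict) -> dict:
    """Check that answer_limit is consistent with the options list."""
    options = instruction["options"]
    limit = instruction["answer_limit"]
    if not options:
        raise ValueError("multiple_choice requires at least one option")
    # -1 means unlimited; otherwise the limit may not exceed the option count
    if limit != -1 and not 1 <= limit <= len(options):
        raise ValueError(f"answer_limit {limit} invalid for {len(options)} options")
    return instruction


sentiment = validate_multiple_choice({
    "order": 1,
    "type": "multiple_choice",
    "description": "What is the overall sentiment of this review?",
    "answer_limit": 1,
    "options": [
        {"label": "Positive", "value": "positive"},
        {"label": "Neutral", "value": "neutral"},
        {"label": "Negative", "value": "negative"},
    ],
})
```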
## Setting up the batch
Once your instructions are created, trigger task generation. Each datapoint in your dataset is paired with all instructions to create a task. Tasks are then organized into **task groups** — participants complete one task group per submission.
```bash
POST /api/v1/data-collection/batches/{batch_id}/setup
```
```json
{
  "tasks_per_group": 5
}
```
The `tasks_per_group` parameter controls how many tasks are randomly assigned to each group. If omitted, each task group contains a single task.
Participants complete all tasks within their assigned group in a single submission. No participant will be assigned the same task group twice, even if they complete multiple submissions.
For custom task grouping based on your own criteria, see [Working with Datasets](/api-reference/ai-task-builder/datasets).
This triggers task generation. The batch status will change to `PROCESSING` and then to `READY` once complete.
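The grouping arithmetic works out as follows. This helper is illustrative, not an API call, and it assumes the final group may simply hold fewer tasks when the total does not divide evenly:

```python
import math


def task_group_count(total_tasks: int, tasks_per_group: int = 1) -> int:
    """Number of task groups: tasks are randomly assigned to groups of up to
    tasks_per_group; with the default of 1, every task is its own group."""
    return math.ceil(total_tasks / tasks_per_group)
```

For example, 150 tasks with `tasks_per_group: 5` yields 30 task groups, so each participant submission covers 5 tasks.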
## Monitoring batch status
Poll the batch endpoint to check when task generation is complete.
```bash
GET /api/v1/data-collection/batches/{batch_id}
```
Wait for the status to change to `READY` before creating a study.
```json
{
  "id": "0192a3b4-d6e7-7f8a-0b1c-2d3e4f5a6b7c",
  "workspace_id": "0192a3b4-c5d6-7e8f-9a0b-1c2d3e4f5a6b",
  "name": "Product review sentiment analysis",
  "status": "READY",
  "total_task_count": 150
}
```
The `total_task_count` reflects the number of datapoints in your dataset.
## Publishing a batch
To make your batch available to participants, create a Prolific study that references it.
```bash
POST /api/v1/studies/
```
When creating the study, set `data_collection_method` to `AI_TASK_BUILDER_BATCH` and provide your batch ID:
```json
{
  "name": "Product Review Sentiment Analysis",
  "internal_name": "sentiment-analysis-q4-2024",
  "description": "Help us understand the sentiment in product reviews by classifying each review and explaining your reasoning.",
  "estimated_completion_time": 15,
  "maximum_allowed_time": 45,
  "reward": 300,
  "data_collection_method": "AI_TASK_BUILDER_BATCH",
  "data_collection_id": "0192a3b4-d6e7-7f8a-0b1c-2d3e4f5a6b7c",
  "data_collection_metadata": {
    "annotators_per_task": 3
  }
}
```
Use `annotators_per_task` in `data_collection_metadata` to specify how many participants should annotate each datapoint. The default is 1. After publishing, this value can only be increased.
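The number of participant submissions a study needs follows from the grouping: each task group must be completed by `annotators_per_task` different participants. A rough sketch, assuming one submission per task group per annotator:

```python
import math


def submissions_needed(total_tasks: int, tasks_per_group: int, annotators_per_task: int = 1) -> int:
    """Estimated participant submissions: every task group is annotated by
    annotators_per_task participants. Illustrative arithmetic only."""
    groups = math.ceil(total_tasks / tasks_per_group)
    return groups * annotators_per_task
```

With the running example of 150 tasks in groups of 5 and 3 annotators per task, that is 30 × 3 = 90 submissions.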
Then publish the study:
```bash
POST /api/v1/studies/{study_id}/transition/
```
```json
{
  "action": "PUBLISH"
}
```
## Retrieving responses
After participants have completed their tasks, download the annotated data as a CSV.
```bash
GET /api/v1/data-collection/batches/{batch_id}/report/
```
This returns your original CSV with additional columns containing participant responses for each instruction.
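The report can be processed with the `csv` module. The response column names below (`sentiment`, `explanation`) are illustrative only; the actual columns depend on your instructions:

```python
import csv
import io

# A miniature report: original columns plus hypothetical response columns.
report_text = """id,review_text,product_name,rating,sentiment,explanation
1,"Great product, exactly what I needed!",Widget Pro,5,positive,Upbeat language throughout
2,"Arrived damaged, very disappointed",Widget Pro,1,negative,Complaint about condition
"""

rows = list(csv.DictReader(io.StringIO(report_text)))

# Group datapoint ids by the sentiment participants assigned
by_sentiment = {}
for row in rows:
    by_sentiment.setdefault(row["sentiment"], []).append(row["id"])
```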
***
By using AI Task Builder, you agree to our [AI Task Builder Terms](https://prolific.notion.site/Researcher-Terms-7787f102f0c541bdbe2c04b5d3285acb).