# Working with Batches
An AI Task Builder Batch allows you to collect human annotations on your existing data. You provide a dataset, define instructions, and participants evaluate each datapoint according to your instructions.
This guide covers the workflow for creating, configuring, and publishing a Batch.
## Workflow overview
1. **Create a dataset**: Set up a dataset to hold your data.
2. **Upload your data**: Request presigned URLs and upload your files to S3.
3. **Monitor dataset status**: Wait for the dataset to finish processing.
4. **Create a batch**: Initialise a new batch and attach your dataset.
5. **Create instructions**: Define what participants should do with each datapoint.
6. **Set up the batch**: Trigger task generation.
7. **Monitor batch status**: Wait for tasks to be generated.
8. **Create a study**: Create a Prolific study that references your batch.
9. **Publish the study**: Make the study available to participants.
10. **Retrieve responses**: Download the annotated data after participants complete their tasks.
## Creating a dataset
Create a dataset to hold your data.
```bash
POST /api/v1/data-collection/datasets
```
```json
{
  "name": "Product reviews Q4 2024",
  "workspace_id": "6278acb09062db3b35bcbeb0"
}
```
### Response
```json
{
  "id": "0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d",
  "name": "Product reviews Q4 2024",
  "status": "UNINITIALISED"
}
```
## Uploading your data
Upload your dataset as a CSV file using presigned URLs.
### Step 1: Request a presigned URL
```bash
GET /api/v1/data-collection/datasets/{dataset_id}/upload-url/{filename}
```
For example:
```bash
GET /api/v1/data-collection/datasets/0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d/upload-url/reviews.csv
```
### Step 2: Upload to S3
Use the presigned URL from the response to upload your CSV file directly to S3.
```bash
curl -X PUT \
-H "Content-Type: text/csv" \
--data-binary @reviews.csv \
"{presigned_url}"
```
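The same upload can be prepared in Python with the standard library. The helper below only builds the request; the presigned URL shown in the test is a placeholder, and `urllib.request.urlopen` would perform the actual upload:

```python
import urllib.request


def build_upload_request(presigned_url: str, csv_path: str) -> urllib.request.Request:
    """Prepare a PUT request that sends the CSV body to the presigned S3 URL."""
    with open(csv_path, "rb") as f:
        body = f.read()
    return urllib.request.Request(
        presigned_url,
        data=body,
        method="PUT",
        headers={"Content-Type": "text/csv"},
    )

# urllib.request.urlopen(build_upload_request(url, "reviews.csv")) performs the upload.
```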
### CSV format
Your CSV should contain one row per datapoint. Each column is displayed to participants alongside the instructions.
```csv
id,review_text,product_name,rating
1,"Great product, exactly what I needed!",Widget Pro,5
2,"Arrived damaged, very disappointed",Widget Pro,1
3,"Works as expected, nothing special",Basic Widget,3
```
For advanced options including metadata columns and custom task grouping, see [Working with Datasets](/api-reference/ai-task-builder/datasets).
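The example file above can be produced with Python's built-in `csv` module, which handles the quoting of commas inside review text automatically:

```python
import csv

rows = [
    {"id": 1, "review_text": "Great product, exactly what I needed!", "product_name": "Widget Pro", "rating": 5},
    {"id": 2, "review_text": "Arrived damaged, very disappointed", "product_name": "Widget Pro", "rating": 1},
    {"id": 3, "review_text": "Works as expected, nothing special", "product_name": "Basic Widget", "rating": 3},
]

# newline="" prevents the csv module from inserting blank rows on Windows
with open("reviews.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "review_text", "product_name", "rating"])
    writer.writeheader()
    writer.writerows(rows)
```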
## Monitoring dataset status
Poll the dataset endpoint to check when processing is complete.
```bash
GET /api/v1/data-collection/datasets/{dataset_id}
```
Wait for the status to change to `READY` before proceeding.
### Dataset status
| Status | Description |
| --------------- | ------------------------------------------ |
| `UNINITIALISED` | Dataset created but no data uploaded |
| `PROCESSING` | Dataset is being processed |
| `READY` | Dataset is ready to be attached to a batch |
| `ERROR` | Something went wrong during processing |
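A polling loop can be sketched as below. `fetch_status` is a stand-in for your authenticated GET request (e.g. a wrapper that calls the dataset endpoint and returns the `status` field); it is an assumption of this sketch, not part of the API:

```python
import time


def wait_until_ready(fetch_status, poll_interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll until the resource reaches READY; raise on ERROR or timeout.

    fetch_status: callable returning the current status string, e.g. a wrapper
    around GET /api/v1/data-collection/datasets/{dataset_id}.
    """
    waited = 0.0
    while waited <= timeout:
        status = fetch_status()
        if status == "READY":
            return status
        if status == "ERROR":
            raise RuntimeError("processing failed")
        sleep(poll_interval)
        waited += poll_interval
    raise TimeoutError("resource did not become READY in time")
```

The same loop works for batches later in the workflow, since they expose the same four statuses.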
## Creating a batch
Once your dataset is ready, create a batch and attach the dataset.
```bash
POST /api/v1/data-collection/batches
```
```json
{
  "workspace_id": "0192a3b4-c5d6-7e8f-9a0b-1c2d3e4f5a6b",
  "name": "Product review sentiment analysis",
  "dataset_id": "0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d",
  "task_details": {
    "task_name": "Review Sentiment Classification",
    "task_introduction": "Read each product review carefully and classify its sentiment.",
    "task_steps": "- Read the review text\n- Consider the overall tone\n- Select the appropriate sentiment"
  }
}
```
### Task details
The optional `task_details` object provides context to participants:
| Field | Type | Description |
| ------------------- | ------ | -------------------------------- |
| `task_name` | string | Title displayed to participants |
| `task_introduction` | string | Introduction or general guidance |
| `task_steps` | string | Steps participants should follow |
All three fields support basic HTML formatting.
### Response
```json
{
  "id": "0192a3b4-d6e7-7f8a-0b1c-2d3e4f5a6b7c",
  "workspace_id": "0192a3b4-c5d6-7e8f-9a0b-1c2d3e4f5a6b",
  "name": "Product review sentiment analysis",
  "dataset_id": "0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d",
  "status": "UNINITIALISED",
  "total_task_count": 0
}
```
### Batch status
A batch transitions through the following states:
| Status | Description |
| --------------- | ---------------------------------------- |
| `UNINITIALISED` | Batch created but contains no tasks |
| `PROCESSING` | Batch is being processed into tasks |
| `READY` | Batch is ready to be attached to a study |
| `ERROR` | Something went wrong during processing |
## Creating instructions
Instructions define what participants should do with each datapoint. Each instruction is displayed to participants sequentially alongside the datapoint.
```bash
POST /api/v1/data-collection/batches/{batch_id}/instructions
```
### Instruction types
| Type | Description |
| -------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `multiple_choice` | Selection from a list of options. Use `answer_limit` to control how many options can be selected: `1` for single-select, `-1` for unlimited, or any number up to the total options. |
| `free_text` | Open-ended text input |
| `multiple_choice_with_free_text` | Selection from options, each with a heading and an associated free text field for additional input |
| `file_upload` | File submission (images, documents, etc.) |
By default, when there are 5 or more options, a dropdown is rendered instead of checkboxes or radio buttons. Set `disable_dropdown: true` to always use checkboxes/radio buttons. See [Instructions](/api-reference/ai-task-builder/instructions) for full details on all instruction fields.
### Example: Sentiment classification
```json
{
  "order": 1,
  "type": "multiple_choice",
  "description": "What is the overall sentiment of this review?",
  "answer_limit": 1,
  "options": [
    { "label": "Positive", "value": "positive" },
    { "label": "Neutral", "value": "neutral" },
    { "label": "Negative", "value": "negative" }
  ]
}
```
### Example: Free text explanation
```json
{
  "order": 2,
  "type": "free_text",
  "description": "Briefly explain why you chose this sentiment rating",
  "placeholder_text_input": "e.g. The reviewer uses positive language and expresses satisfaction..."
}
```
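Before posting, a `multiple_choice` payload can be sanity-checked locally against the `answer_limit` rules described above. This validator is a hypothetical client-side helper, not part of the API:

```python
def validate_multiple_choice(instruction: dict) -> dict:
    """Check that answer_limit is consistent with the options list."""
    options = instruction["options"]
    limit = instruction["answer_limit"]
    if not options:
        raise ValueError("multiple_choice requires at least one option")
    # -1 means unlimited; otherwise the limit may not exceed the option count
    if limit != -1 and not 1 <= limit <= len(options):
        raise ValueError(f"answer_limit {limit} invalid for {len(options)} options")
    return instruction


sentiment = validate_multiple_choice({
    "order": 1,
    "type": "multiple_choice",
    "description": "What is the overall sentiment of this review?",
    "answer_limit": 1,
    "options": [
        {"label": "Positive", "value": "positive"},
        {"label": "Neutral", "value": "neutral"},
        {"label": "Negative", "value": "negative"},
    ],
})
```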
## Setting up the batch
Once your instructions are created, trigger task generation. Each datapoint in your dataset is paired with all instructions to create a task. Tasks are then organized into **task groups** — participants complete one task group per submission.
```bash
POST /api/v1/data-collection/batches/{batch_id}/setup
```
```json
{
  "tasks_per_group": 5
}
```
The `tasks_per_group` parameter controls how many tasks are randomly assigned to each group. If omitted, each task group contains a single task.
Participants complete all tasks within their assigned group in a single submission. No participant will be assigned the same task group twice, even if they complete multiple submissions.
For custom task grouping based on your own criteria, see [Working with Datasets](/api-reference/ai-task-builder/datasets).
This triggers task generation. The batch status will change to `PROCESSING` and then to `READY` once complete.
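The grouping arithmetic works out as follows. This helper is illustrative, not an API call, and it assumes the final group may simply hold fewer tasks when the total does not divide evenly:

```python
import math


def task_group_count(total_tasks: int, tasks_per_group: int = 1) -> int:
    """Number of task groups: tasks are randomly assigned to groups of up to
    tasks_per_group; with the default of 1, every task is its own group."""
    return math.ceil(total_tasks / tasks_per_group)
```

For example, 150 tasks with `tasks_per_group: 5` yields 30 task groups, so each participant submission covers 5 tasks.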
## Monitoring batch status
Poll the batch endpoint to check when task generation is complete.
```bash
GET /api/v1/data-collection/batches/{batch_id}
```
Wait for the status to change to `READY` before creating a study.
```json
{
  "id": "0192a3b4-d6e7-7f8a-0b1c-2d3e4f5a6b7c",
  "workspace_id": "0192a3b4-c5d6-7e8f-9a0b-1c2d3e4f5a6b",
  "name": "Product review sentiment analysis",
  "status": "READY",
  "total_task_count": 150
}
```
The `total_task_count` reflects the number of datapoints in your dataset.
## Publishing a batch
To make your batch available to participants, create a Prolific study that references it.
```bash
POST /api/v1/studies/
```
When creating the study, set `data_collection_method` to `AI_TASK_BUILDER_BATCH` and provide your batch ID:
```json
{
  "name": "Product Review Sentiment Analysis",
  "internal_name": "sentiment-analysis-q4-2024",
  "description": "Help us understand the sentiment in product reviews by classifying each review and explaining your reasoning.",
  "estimated_completion_time": 15,
  "maximum_allowed_time": 45,
  "reward": 300,
  "data_collection_method": "AI_TASK_BUILDER_BATCH",
  "data_collection_id": "0192a3b4-d6e7-7f8a-0b1c-2d3e4f5a6b7c",
  "data_collection_metadata": {
    "annotators_per_task": 3
  }
}
```
Use `annotators_per_task` in `data_collection_metadata` to specify how many participants should annotate each datapoint. The default is 1. After publishing, this value can only be increased.
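The number of participant submissions a study needs follows from the grouping: each task group must be completed by `annotators_per_task` different participants. A rough sketch, assuming one submission per task group per annotator:

```python
import math


def submissions_needed(total_tasks: int, tasks_per_group: int, annotators_per_task: int = 1) -> int:
    """Estimated participant submissions: every task group is annotated by
    annotators_per_task participants. Illustrative arithmetic only."""
    groups = math.ceil(total_tasks / tasks_per_group)
    return groups * annotators_per_task
```

With the running example of 150 tasks in groups of 5 and 3 annotators per task, that is 30 × 3 = 90 submissions.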
Then publish the study:
```bash
POST /api/v1/studies/{study_id}/transition/
```
```json
{
  "action": "PUBLISH"
}
```
## Retrieving responses
After participants have completed their tasks, download the annotated data as a CSV.
```bash
GET /api/v1/data-collection/batches/{batch_id}/report/
```
This returns your original CSV with additional columns containing participant responses for each instruction.
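The report can be processed with the `csv` module. The response column names below (`sentiment`, `explanation`) are illustrative only; the actual columns depend on your instructions:

```python
import csv
import io

# A miniature report: original columns plus hypothetical response columns.
report_text = """id,review_text,product_name,rating,sentiment,explanation
1,"Great product, exactly what I needed!",Widget Pro,5,positive,Upbeat language throughout
2,"Arrived damaged, very disappointed",Widget Pro,1,negative,Complaint about condition
"""

rows = list(csv.DictReader(io.StringIO(report_text)))

# Group datapoint ids by the sentiment participants assigned
by_sentiment = {}
for row in rows:
    by_sentiment.setdefault(row["sentiment"], []).append(row["id"])
```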
***
By using AI Task Builder, you agree to our [AI Task Builder Terms](https://prolific.notion.site/Researcher-Terms-7787f102f0c541bdbe2c04b5d3285acb).