Working with Datasets
A dataset contains the data that participants will annotate in an AI Task Builder Batch. This page covers dataset creation, upload, and advanced configuration options.
For the complete batch workflow, see Working with Batches.
Creating a dataset
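If you are scripting the workflow, a minimal sketch of the creation request might look like the following, using Python's requests library. The base URL, the /datasets/ path, and the name field are assumptions for illustration; check your API reference for the exact endpoint and payload.

```python
import os

import requests

API_TOKEN = os.environ["API_TOKEN"]      # your API token
BASE_URL = "https://api.example.com/v1"  # illustrative base URL; replace with your API's host

# Create an empty dataset record; the CSV file itself is uploaded in a later step.
# The payload field shown here is an assumption, not the definitive schema.
response = requests.post(
    f"{BASE_URL}/datasets/",
    headers={"Authorization": f"Token {API_TOKEN}"},
    json={"name": "product-reviews"},
)
response.raise_for_status()
dataset = response.json()
print(dataset["id"])  # keep the dataset ID for the upload and batch-setup steps
```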
Response
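Only the id and status fields are relied on later in this page; the response shape below is an illustrative sketch, not the definitive schema, and the initial status value may differ.

```json
{
  "id": "dataset_123",
  "name": "product-reviews",
  "status": "PROCESSING"
}
```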
Uploading data
Upload your dataset as a CSV file in two steps: request a presigned URL, then upload the file directly to S3.
Step 1: Request a presigned URL
For example:
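The sketch below continues from the dataset created above and uses Python's requests library; the upload endpoint path and the upload_url response field are assumptions rather than a documented contract.

```python
import requests

# Ask the API for a presigned URL to upload the dataset CSV to.
response = requests.post(
    f"{BASE_URL}/datasets/{dataset['id']}/upload/",
    headers={"Authorization": f"Token {API_TOKEN}"},
)
response.raise_for_status()
upload_url = response.json()["upload_url"]  # assumed field name for the presigned S3 URL
```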
Step 2: Upload to S3
Use the presigned URL from the response to upload your CSV file directly to S3.
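Continuing the sketch, a plain HTTP PUT of the file body is enough; the text/csv content type is a common choice rather than a documented requirement.

```python
import requests

# Upload the CSV bytes to the presigned URL obtained in Step 1.
with open("reviews.csv", "rb") as f:
    upload_response = requests.put(
        upload_url,
        data=f,
        headers={"Content-Type": "text/csv"},
    )
upload_response.raise_for_status()
```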
CSV format
Your CSV should contain one row per datapoint. Each column is displayed to participants alongside the instructions.
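For example, a dataset of product reviews might look like this (column names and values are illustrative):

```csv
id,review_text
1,"Great screen, easy setup."
2,"Battery drains too fast."
```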
Metadata columns
Columns prefixed with META_ are not displayed to participants. Use these for internal data you need in your results but don’t want participants to see.
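For example (the META_ column values shown are illustrative):

```csv
id,review_text,META_source,META_timestamp
1,"Great screen, easy setup.",app_store,2024-05-01T09:30:00Z
2,"Battery drains too fast.",web,2024-05-02T14:10:00Z
```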
In this example, participants see only the id and review_text columns. The META_source and META_timestamp columns are included in your results but hidden during annotation.
Custom task grouping
By default, tasks are grouped randomly when you set up a batch (using the tasks_per_group parameter). To define your own groupings, include a META_TASK_GROUP_ID column in your CSV.
Rows with the same META_TASK_GROUP_ID value will be grouped together into a single task group. Participants complete all tasks within a group in one submission.
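For example (the second group name and the review text are illustrative):

```csv
id,review_text,META_TASK_GROUP_ID
1,"Great screen, easy setup.",widget_pro_reviews
2,"Battery drains too fast.",widget_pro_reviews
3,"Compact and lightweight.",widget_mini_reviews
4,"Buttons feel flimsy.",widget_mini_reviews
```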
In this example, tasks 1 and 2 are grouped together, as are tasks 3 and 4. A participant assigned to the widget_pro_reviews group will annotate both reviews in a single submission.
If your dataset includes META_TASK_GROUP_ID, these groupings take precedence over the tasks_per_group parameter during batch setup.
Dataset status
Poll the dataset endpoint to check processing status.
Wait for the status to reach READY before creating a batch with this dataset.
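A simple polling loop in the same Python sketch style; the GET path and five-second interval are assumptions, and the snippet reuses BASE_URL, API_TOKEN, and dataset from the earlier sketches.

```python
import time

import requests

def wait_until_ready(dataset_id: str, poll_interval: float = 5.0) -> None:
    """Poll the dataset endpoint until its status reaches READY."""
    while True:
        response = requests.get(
            f"{BASE_URL}/datasets/{dataset_id}/",
            headers={"Authorization": f"Token {API_TOKEN}"},
        )
        response.raise_for_status()
        if response.json()["status"] == "READY":
            return
        time.sleep(poll_interval)

wait_until_ready(dataset["id"])
```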