Datasets built in public

Browse community datasets. Add a clip, image, recording, or snippet. Fork any of them into your own workspace.

01:32
CC-BY 4.0
Video

Street food, around the world

by @wanda.k · 2h ago

Short clips of street food vendors with on-screen descriptions and transcripts.

“…this is jollof rice from a stall in Lagos. The vendor says it cooks in a single iron pot…”

184 612
Botany · CC0
Image

Roadside plants, East Africa

by @kiprop.j · 5h ago

Phone photos of plants along East African roadsides, species-tagged by contributors.

acacia
sisal
frangipani
jacaranda
+1
41 2,840
Linguistics
Audio

Hausa proverbs, spoken

by @aminata · 1d ago

Recorded proverbs in Hausa with English glosses and pronunciation notes.

Kowa ya yi haƙuri… → Whoever is patient eats the ripe fruit.

27 318

Sentiment · MIT
Text

Customer reviews, French + English

by @lila.m · 1d ago

Bilingual customer reviews labeled with sentiment, aspect, and topic tags.

“Livraison rapide mais l'emballage…” → positive · packaging concern

96 4,120
Wildlife · CC-BY
Audio

Morning birdsong, by region

by @okoth · 2d ago

Dawn chorus recordings tagged by region and species.

Common bulbul · 06:14 · Kisumu

12 204
OCR · CC-BY
Image

Market signage, multilingual

by @priya.r · 3d ago

Handwritten and printed signs from informal markets; transcribed and translated.

handwritten
hindi
tamil
english
+1
64 1,180
03:18
CC-BY 4.0
Video

Sourdough walkthroughs

by @jonas.s · 4d ago

Step-by-step bread-making clips with verbal instructions and timing labels.

“Now you stretch and fold every 30 minutes for the next two hours…”

38 92

NLU · CC0
Text

Transit arrival announcements

by @meera.v · 5d ago

Public-transit announcements labeled for intent, location, and time.

“The next train to Andheri departs from platform 3.” → arrival · Andheri · platform 3

18 1,640

Don't see what you'd like to build?

Start a new public dataset. Define what you want labeled, write the instructions, and the platform routes contributions to it.