Overview

The Transcripts API enables automatic generation of transcripts and embeddings for video or audio content in the platform. These processes are queued and handled asynchronously using BullMQ (backed by Valkey/Redis), ensuring scalability and reliability even for large workloads.

When a content item is enqueued, a background worker extracts its audio, generates a transcript using OpenAI, and stores embeddings and related data in the vector database (Qdrant) for search and context retrieval.

Workflow

Enqueue Job:
The API enqueues a transcript generation job for a single content item, or for all video and audio items in a group.
Job Processing:
A BullMQ worker picks up new jobs from the Valkey/Redis queue and handles:
- Audio extraction from the source media
- Transcript generation using OpenAI
- Embedding creation and storage in Qdrant
Status Tracking:
Each transcript record is stored in `ContentTranscript` with a status (e.g. `pending`, `completed`, or `failed`).
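
The three processing steps and the status update can be sketched as a single function. This is a minimal illustration, not the platform's actual worker: the `TranscriptSteps` interface, its method names, and `processTranscriptJob` are hypothetical, and the status values are only the examples listed above.

```typescript
// Status values taken from the examples on this page; the real model may define more.
type TranscriptStatus = "pending" | "completed" | "failed";

// Hypothetical step signatures: the real worker's helpers are not shown on this page.
interface TranscriptSteps {
  extractAudio(contentId: string): Promise<Uint8Array>;
  transcribe(audio: Uint8Array): Promise<string>;
  embedAndStore(contentId: string, transcript: string): Promise<void>;
  setStatus(contentId: string, status: TranscriptStatus): Promise<void>;
}

// Runs the steps in order and records the outcome as the final status.
async function processTranscriptJob(
  contentId: string,
  steps: TranscriptSteps,
): Promise<TranscriptStatus> {
  await steps.setStatus(contentId, "pending");
  try {
    const audio = await steps.extractAudio(contentId);
    const transcript = await steps.transcribe(audio);
    await steps.embedAndStore(contentId, transcript);
    await steps.setStatus(contentId, "completed");
    return "completed";
  } catch {
    // Any failing step marks the record as failed.
    await steps.setStatus(contentId, "failed");
    return "failed";
  }
}
```

In a BullMQ setup, a function like this would typically be called from the processor passed to the worker, which would likely rethrow the error after recording `failed` so that the queue also marks the job as failed.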

Environment Requirements

Transcription requires that the following environment variables be correctly set:

Variable	Description
`REACT_APP_ENABLE_TRANSCRIPTS_GENERATION`	Must be `"true"` to enable transcript generation
`OPENAI_API_KEY`	API key used to generate transcripts and embeddings
`VALKEY_URL`	Redis/Valkey connection string used by BullMQ
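
A startup check along the following lines can catch a missing variable before any job is enqueued. This is a sketch under assumptions: `checkTranscriptEnv` is a hypothetical helper, and the strict comparison with `"true"` is inferred from the table above rather than taken from the server code.

```typescript
type Env = Record<string, string | undefined>;

// Returns a list of human-readable problems; an empty list means the
// three variables from the table look usable.
function checkTranscriptEnv(env: Env): string[] {
  const problems: string[] = [];

  // Assumption: the flag must be exactly the string "true".
  if (env["REACT_APP_ENABLE_TRANSCRIPTS_GENERATION"] !== "true") {
    problems.push('REACT_APP_ENABLE_TRANSCRIPTS_GENERATION must be "true"');
  }

  // These two only need to be present and non-empty.
  for (const name of ["OPENAI_API_KEY", "VALKEY_URL"]) {
    if (!env[name]) {
      problems.push(`${name} is not set`);
    }
  }
  return problems;
}
```

In a Node.js server this could be called with `process.env` at boot, logging each returned problem and refusing to start the worker if the list is not empty.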

Endpoints Summary

Endpoint	Method	Description
`/api/transcript/:contentId/enqueue`	`POST`	Enqueue transcript generation for a single content item
`/api/transcript/enqueue-group/:groupId`	`GET`	Enqueue transcript generation for all videos and audio in a group
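
As a rough client-side illustration, the two routes above can be turned into request descriptors like this. The helper names and the `baseUrl` parameter are hypothetical, and authentication, request-body flags, and response shapes are not covered on this page, so they are left out.

```typescript
interface EnqueueRequest {
  method: "POST" | "GET";
  url: string;
}

// Removes trailing slashes so the path can be appended safely.
function trimBase(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "");
}

// POST /api/transcript/:contentId/enqueue
function enqueueContentRequest(baseUrl: string, contentId: string): EnqueueRequest {
  return {
    method: "POST",
    url: `${trimBase(baseUrl)}/api/transcript/${encodeURIComponent(contentId)}/enqueue`,
  };
}

// GET /api/transcript/enqueue-group/:groupId
function enqueueGroupRequest(baseUrl: string, groupId: string): EnqueueRequest {
  return {
    method: "GET",
    url: `${trimBase(baseUrl)}/api/transcript/enqueue-group/${encodeURIComponent(groupId)}`,
  };
}
```

A descriptor can then be passed to any HTTP client, for example `fetch(req.url, { method: req.method })`, with whatever credentials the deployment requires.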

Notes

The system automatically prevents duplicate jobs from running concurrently for the same content.
Jobs can be forced to regenerate transcripts or embeddings by passing flags in the request body.
Logs are available in the server output for tracking job processing and errors.
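
This page does not say how duplicate jobs are prevented. Purely as an illustration of the idea, a guard keyed by content ID could look like the sketch below. `InFlightContent` is a hypothetical, process-local class, so a deployment with several workers would need shared state (for example in Valkey/Redis) instead.

```typescript
// Tracks which content IDs currently have a job in progress.
class InFlightContent {
  private readonly active = new Set<string>();

  // Returns true if the caller may start a job for this content,
  // false if one is already running.
  tryStart(contentId: string): boolean {
    if (this.active.has(contentId)) {
      return false;
    }
    this.active.add(contentId);
    return true;
  }

  // Must be called when the job ends, whether it succeeded or failed.
  finish(contentId: string): void {
    this.active.delete(contentId);
  }
}
```

With this guard, a second enqueue request for the same content is rejected until `finish` is called for the first job, while jobs for other content items proceed independently.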
