
The TranscribeSpeech node transcribes speech from audio or video files. Supported input types include:
- Base-64 encoded data strings (if your media is small enough to fit in a request payload). Be sure to include the data: prefix with a MIME type.
- Hosted media URLs (with a wide range of supported formats)
- YouTube URLs
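As a minimal sketch of the base-64 option, the helper below builds a data URI with a MIME type prefix using only the Python standard library. The function name to_data_uri and the sample file name are hypothetical, chosen for illustration; the MIME type is guessed from the file extension and may differ between platforms.

```python
import base64
import mimetypes


def to_data_uri(data: bytes, filename: str) -> str:
    """Encode raw media bytes as a base-64 data URI with a MIME type prefix."""
    # Guess the MIME type from the file extension; fall back to a generic type.
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        mime_type = "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Hypothetical input: in practice, read the bytes from a small media file.
uri = to_data_uri(b"RIFF", "clip.wav")
print(uri)
```

Base-64 encoding grows the payload by roughly a third, so this route is only practical for short clips.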
TranscribeSpeech also includes these built-in capabilities:
- segmentation by sentence
- diarization (speaker identification)
- alignment to word-level timestamps
- automatic chapter detection
To transcribe input without further processing, provide an audio_uri. This can be a publicly hosted audio or video file, base-64 encoded audio or video data, or a privately hosted external file. For best results, you may also provide a prompt that describes the content of the audio or video.
Output
{ "text": "language like that, the wounded inner child, the inner pain, is part of a kind of pop psychological movement in the United States that is a sort of popular Freudianism that ..."}
To enable additional capabilities, set:
- segment: True to return a list of sentence segments with start and end timestamps.
- align: True to return a list of aligned words within sentence segments.
- diarize: True to include speaker IDs within segments and words.
- suggest_chapters: True to return a list of suggested chapters with titles and start timestamps.
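The options above can be collected into a single set of node arguments. The sketch below only assembles a plain dictionary using the parameter names from this page; the helper name build_transcribe_args and the example URL are hypothetical, and how the arguments are passed to the node (SDK constructor or HTTP request) is deliberately left out.

```python
def build_transcribe_args(
    audio_uri: str,
    prompt=None,
    *,
    segment: bool = False,
    align: bool = False,
    diarize: bool = False,
    suggest_chapters: bool = False,
) -> dict:
    """Assemble TranscribeSpeech arguments, including only the enabled options."""
    args = {"audio_uri": audio_uri}
    if prompt is not None:
        args["prompt"] = prompt
    flags = {
        "segment": segment,
        "align": align,
        "diarize": diarize,
        "suggest_chapters": suggest_chapters,
    }
    # Leave disabled options out so the request stays minimal.
    args.update({name: True for name, enabled in flags.items() if enabled})
    return args


# Hypothetical media URL, for illustration only.
args = build_transcribe_args(
    "https://example.com/talk.mp3",
    prompt="A lecture on popular psychology",
    segment=True,
    diarize=True,
)
print(args)
```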
Output
{
  "text": "language like that, the wounded inner child, the inner pain, is part of a kind of pop psychological movement in the United States that is a sort of popular Freudianism that ...",
  "segments": [
    {
      "start": 0.874,
      "end": 15.353,
      "speaker": "SPEAKER_00",
      "text": "language like that, the wounded inner child, the inner pain, is part of a kind of pop psychological movement in the United States that is a sort of popular Freudianism that",
      "words": [
        { "word": "language", "start": 0.874, "end": 1.275, "speaker": "SPEAKER_00" },
        { "word": "like", "start": 1.295, "end": 1.455, "speaker": "SPEAKER_00" }
      ]
    }
  ],
  "chapters": [
    { "title": "Introduction to the Wounded Inner Child and Popular Psychology in US", "start": 0.794 },
    { "title": "The Paradox of Popular Psychology and Anger in America", "start": 16.186 }
  ]
}
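One common use of the segments list is caption generation. The sketch below converts segments shaped like the output above into SRT caption text, prefixing each caption with the speaker ID when one is present. The function names are hypothetical, and the sample segment is shortened from the output shown here.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list) -> str:
    """Render a list of segment dicts as numbered SRT caption blocks."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        speaker = seg.get("speaker")  # only present when diarize is enabled
        text = f"{speaker}: {seg['text']}" if speaker else seg["text"]
        time_range = f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{time_range}\n{text}")
    return "\n\n".join(blocks) + "\n"


# Sample segment with the text shortened for readability.
sample_segments = [
    {"start": 0.874, "end": 15.353, "speaker": "SPEAKER_00", "text": "language like that"}
]
srt = segments_to_srt(sample_segments)
print(srt)
```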
You can customize the chapter summarization feature by implementing your own pipeline. To learn how to do this, and to see an example of how to use text segments to create an animated captions experience, check out our runnable example on val.town. You can also find this example in the examples/descript directory of the substrate-python and substrate-typescript SDK repositories.
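For a rough idea of what animated captions involve, the sketch below finds the word being spoken at a given playback time, using the word-level timestamps returned when align is enabled. It is an independent illustration under those assumptions, not the code from the linked example, and the function name active_word is hypothetical.

```python
from bisect import bisect_right


def active_word(words: list, t: float):
    """Return the word dict being spoken at time t (seconds), or None in a gap."""
    # Words are assumed to be sorted by start time, as in the output above.
    starts = [w["start"] for w in words]
    i = bisect_right(starts, t) - 1
    if i >= 0 and t <= words[i]["end"]:
        return words[i]
    return None


words = [
    {"word": "language", "start": 0.874, "end": 1.275, "speaker": "SPEAKER_00"},
    {"word": "like", "start": 1.295, "end": 1.455, "speaker": "SPEAKER_00"},
]
current = active_word(words, 1.0)
print(current["word"] if current else None)
```

A player would call a lookup like this on each frame or time update and highlight the returned word within its segment.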
