Diarize

Connect to Diarize to transcribe and diarize audio and video content from YouTube, X, Instagram, and TikTok. Submit transcription jobs and retrieve results in JSON, TXT, SRT, or VTT format with speaker labels and word-level timestamps.

Supports authentication: API Key

Register your Diarize API key with Scalekit so it can authenticate and proxy transcription requests on behalf of your users. Unlike OAuth connectors, Diarize uses API key authentication — there is no redirect URI or OAuth flow.

  1. Get a Diarize API key

    • Sign in to diarize.io and go to Settings → API Keys.

    • Click + Create New Key, give it a name (e.g., Agent Auth), and confirm.

    • Copy the key value — store it securely, as you will not be able to view it again.

  2. Create a connection in Scalekit

    • In the Scalekit dashboard, go to Agent Auth → Create Connection. Find Diarize and click Create.

    • Note the Connection name — you will use this as connection_name in your code (e.g., diarize).

    • Click Save.

  3. Add a connected account

    Connected accounts link a specific user identifier in your system to a Diarize API key. Add accounts via the dashboard for testing, or via the Scalekit API in production.

    Via dashboard (for testing)

    • Open the connection you created and click the Connected Accounts tab → Add account.

    • Fill in:

      • Your User’s ID — a unique identifier for this user in your system (e.g., user_123)
      • API Key — the Diarize API key you copied in step 1
    • Click Create Account.

    Via API (for production)

    // Never hard-code API keys; read them from secure storage or user input
    const diarizeApiKey = getUserDiarizeKey(); // retrieve from your secure store
    await scalekit.actions.upsertConnectedAccount({
      connectionName: 'diarize',
      identifier: 'user_123', // your user's unique ID
      credentials: { token: diarizeApiKey },
    });

Connect a user’s Diarize account and transcribe audio and video content through Scalekit tools. Scalekit handles API key storage and tool execution automatically — you never handle credentials in your application code.

Diarize is primarily used through Scalekit tools. Use execute_tool to submit transcription jobs, poll for completion, and download results in any supported format.

Tool calling

Use this connector when you want an agent to transcribe and diarize audio or video from YouTube, X, Instagram, or TikTok.

  • Use diarize_create_transcription_job to submit a URL for transcription. Returns an id (job ID) and an estimatedTime (in seconds) for how long processing will take.
  • Use diarize_get_job_status to poll until status is COMPLETED or FAILED. Use estimatedTime to set a sensible timeout — do not give up before that time has elapsed.
  • Use diarize_download_transcript to retrieve the result once complete. Choose json for structured speaker diarization data, txt for plain text, or srt or vtt for subtitle formats.
examples/diarize_transcribe.py
import os
import time

from scalekit.client import ScalekitClient

scalekit_client = ScalekitClient(
    client_id=os.environ["SCALEKIT_CLIENT_ID"],
    client_secret=os.environ["SCALEKIT_CLIENT_SECRET"],
    env_url=os.environ["SCALEKIT_ENV_URL"],
)

connected_account = scalekit_client.actions.get_or_create_connected_account(
    connection_name="diarize",
    identifier="user_123",
).connected_account

# Step 1: Submit a transcription job
create_result = scalekit_client.actions.execute_tool(
    tool_name="diarize_create_transcription_job",
    connected_account_id=connected_account.id,
    tool_input={
        "url": "https://www.youtube.com/watch?v=example",
        "language": "en",  # optional; omit for auto-detection
        "num_speakers": 2,  # optional; improves speaker diarization
    },
)
job_id = create_result.result["id"]
estimated_seconds = create_result.result.get("estimatedTime", 120)
deadline = time.time() + estimated_seconds * 2
print(f"Job {job_id} submitted. Estimated: {estimated_seconds}s")

# Step 2: Poll until complete
while True:
    if time.time() > deadline:
        raise TimeoutError(f"Job {job_id} timed out after {estimated_seconds * 2}s")
    time.sleep(15)
    status_result = scalekit_client.actions.execute_tool(
        tool_name="diarize_get_job_status",
        connected_account_id=connected_account.id,
        tool_input={"job_id": job_id},
    )
    status = status_result.result["status"]
    print("Status:", status)
    if status == "COMPLETED":
        break
    if status == "FAILED":
        raise RuntimeError(f"Job {job_id} failed")

# Step 3: Download the diarized transcript
transcript_result = scalekit_client.actions.execute_tool(
    tool_name="diarize_download_transcript",
    connected_account_id=connected_account.id,
    tool_input={"job_id": job_id, "format": "json"},
)
# handle the transcript_result
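
The example above stops at "handle the transcript_result". As one possible next step, the sketch below collapses consecutive segments from the same speaker into readable turns. The JSON transcript schema is not shown on this page, so the segment field names used here (speaker, start, end, text) and the sample data are assumptions for illustration only; inspect the actual payload and adjust the keys before relying on it.

```python
from itertools import groupby


def merge_speaker_turns(segments):
    """Collapse consecutive segments from the same speaker into one turn.

    Assumes each segment is a dict with "speaker", "start", "end", and
    "text" keys. These names are hypothetical, not a documented schema.
    """
    turns = []
    for speaker, group in groupby(segments, key=lambda seg: seg["speaker"]):
        group = list(group)
        turns.append(
            {
                "speaker": speaker,
                "start": group[0]["start"],
                "end": group[-1]["end"],
                "text": " ".join(seg["text"].strip() for seg in group),
            }
        )
    return turns


# Hypothetical sample shaped like the assumed schema (not real Diarize output)
sample_segments = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 2.5, "text": "Hello and welcome."},
    {"speaker": "SPEAKER_00", "start": 2.5, "end": 4.0, "text": "Today we talk about APIs."},
    {"speaker": "SPEAKER_01", "start": 4.2, "end": 6.0, "text": "Thanks for having me."},
]

for turn in merge_speaker_turns(sample_segments):
    print(f"[{turn['start']:.1f}-{turn['end']:.1f}] {turn['speaker']}: {turn['text']}")
```

Grouping only adjacent segments preserves the order of the conversation, so a speaker who talks, is interrupted, and talks again produces two separate turns.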

Scalekit tools

diarize_create_transcription_job

Submit a new transcription and diarization job for an audio or video URL (YouTube, X, Instagram, TikTok). Returns a job ID that can be used to check status and download results.

Name          Type     Required  Description
url           string   Yes       The URL of the audio or video content to transcribe (e.g. YouTube, X, Instagram, TikTok link)
language      string   No        Language code for transcription (e.g. en, es, fr). Defaults to auto-detection if not provided.
num_speakers  integer  No        Expected number of speakers in the audio. Helps improve diarization accuracy.
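
The parameters above can be assembled and checked locally before a job is submitted. The following is a minimal sketch, not part of the Scalekit SDK: build_job_input is a hypothetical helper, and the hostname list is an assumption about how links from the four named platforms usually look. Diarize's own validation remains authoritative and may accept or reject other URL forms.

```python
from urllib.parse import urlparse

# Assumed hostnames for the platforms named in the tool description.
SUPPORTED_HOSTS = (
    "youtube.com",
    "youtu.be",
    "x.com",
    "twitter.com",
    "instagram.com",
    "tiktok.com",
)


def build_job_input(url, language=None, num_speakers=None):
    """Build the tool_input dict, omitting optional fields that are not set."""
    host = (urlparse(url).hostname or "").lower()
    # Accept the bare domain or any subdomain of it (www., m., and so on).
    if not any(host == h or host.endswith("." + h) for h in SUPPORTED_HOSTS):
        raise ValueError(f"URL host {host!r} is not a recognised platform")

    tool_input = {"url": url}
    if language is not None:
        tool_input["language"] = language
    if num_speakers is not None:
        if num_speakers < 1:
            raise ValueError("num_speakers must be a positive integer")
        tool_input["num_speakers"] = num_speakers
    return tool_input


print(build_job_input("https://www.youtube.com/watch?v=example", language="en"))
```

Leaving language out of the dict, instead of sending an empty value, lets the documented auto-detection default apply.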

diarize_get_job_status

Retrieve the current status of a transcription job by its job ID. Returns the job state (PENDING, PROCESSING, COMPLETED, or FAILED), metadata, and an estimatedTime field (in seconds) indicating how long processing is expected to take. Use estimatedTime to determine polling frequency and maximum wait duration. For example, a 49-minute episode may have an estimatedTime of ~891 s (~15 minutes), so the agent should wait at least that long before giving up.

Name    Type    Required  Description
job_id  string  Yes       The unique ID of the transcription job to check
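
One way to turn estimatedTime into a polling interval and a maximum wait is sketched below. The specific numbers are assumptions chosen for illustration, not Diarize or Scalekit recommendations: the wait budget is twice the estimate (mirroring the deadline used in examples/diarize_transcribe.py), and the interval is one twentieth of the estimate, clamped between 5 and 60 seconds.

```python
def polling_plan(estimated_seconds, safety_factor=2.0,
                 min_interval=5.0, max_interval=60.0):
    """Return (poll_interval, max_wait), both in seconds.

    All defaults are illustrative heuristics; tune them for your workload.
    """
    if estimated_seconds <= 0:
        raise ValueError("estimated_seconds must be positive")
    # Never give up before the estimate has elapsed; allow extra headroom.
    max_wait = estimated_seconds * safety_factor
    # Aim for roughly 20 polls over the estimated duration, within bounds.
    interval = min(max(estimated_seconds / 20, min_interval), max_interval)
    return interval, max_wait


interval, max_wait = polling_plan(891)  # the 49-minute episode example
print(f"poll every {interval:.0f}s, give up after {max_wait:.0f}s")
```

Scaling the interval with the estimate keeps short jobs responsive while avoiding dozens of needless status calls on long ones.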

diarize_download_transcript

Download the transcript output for a completed transcription job in JSON, TXT, SRT, or VTT format, including speaker diarization, segments, and word-level timestamps.

Name    Type    Required  Description
job_id  string  Yes       The unique ID of the completed transcription job
format  string  No        Output format: json (default), txt, srt, or vtt
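
Because format accepts only four values, it can be worth validating the choice before calling the tool, for instance when the format comes from user input. The sketch below is one way to do that; build_download_input and output_filename are hypothetical helper names, not part of the Scalekit SDK, and using the format as a file extension is a convenience assumed here.

```python
ALLOWED_FORMATS = ("json", "txt", "srt", "vtt")


def build_download_input(job_id, fmt="json"):
    """Build tool_input for diarize_download_transcript with a checked format."""
    fmt = fmt.lower()
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}, got {fmt!r}")
    return {"job_id": job_id, "format": fmt}


def output_filename(job_id, fmt="json"):
    """Derive a local filename whose extension matches the chosen format."""
    return f"{job_id}.{build_download_input(job_id, fmt)['format']}"


print(build_download_input("job_abc"))
print(output_filename("job_abc", "SRT"))
```

The returned dict can be passed as tool_input to execute_tool in the same way as the download step in examples/diarize_transcribe.py.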