Exa connector

API Key, Analytics, AI, Automation, Search

Connect to Exa to perform AI-powered semantic web search, crawl websites for structured content, get natural language answers from the web, run in-depth...


Install the SDK
- Node.js: npm install @scalekit-sdk/node
- Python: pip install scalekit
Full SDK reference: Node.js | Python
Set your credentials

Add your Scalekit credentials to your .env file. Find values in app.scalekit.com > Developers > API Credentials.
.env
```
SCALEKIT_ENVIRONMENT_URL=<your-environment-url>
SCALEKIT_CLIENT_ID=<your-client-id>
SCALEKIT_CLIENT_SECRET=<your-client-secret>
```
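Before constructing the client, it can help to fail fast when one of these variables is missing. The sketch below is a hypothetical helper (`load_scalekit_settings` is not part of the Scalekit SDK) that reads the three variables named in the .env example and raises a clear error listing any that are unset.

```python
import os
from typing import Mapping

# Variable names taken from the .env example above.
REQUIRED_VARS = (
    "SCALEKIT_ENVIRONMENT_URL",
    "SCALEKIT_CLIENT_ID",
    "SCALEKIT_CLIENT_SECRET",
)


def load_scalekit_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Return the required settings, or raise if any are missing or empty.

    Hypothetical helper for illustration; not part of the Scalekit SDK.
    """
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Call it once at startup, after loading the .env file, and pass the returned values to the client constructor.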
Set up the connector

Register your Exa API key with Scalekit so it can authenticate and proxy requests on behalf of your users. You do this once per environment. Unlike OAuth connectors, Exa uses API key authentication, so there is no redirect URI or OAuth flow.
1. Generate an Exa API key
  - Sign in to dashboard.exa.ai/api-keys. Under Management, click API Keys.
  - Click + Create Key, enter a name (e.g., Agent Auth), and confirm.
  - In the Secret Key column, click the eye icon to reveal the key and copy it. Store it somewhere safe — you will not be able to view it again.
2. Create a connection in Scalekit
  - In Scalekit dashboard, go to AgentKit > Connections > Create Connection. Find Exa and click Create.
  - Note the Connection name — you will use this as connection_name in your code (e.g., exa).
3. Add a connected account
  
  Connected accounts link a specific user identifier in your system to an Exa API key. Add accounts via the dashboard for testing, or via the Scalekit API in production.
  
  Via dashboard (for testing)
  - Open the connection you created and click the Connected Accounts tab → Add account.
  - Fill in:
    - Your User’s ID — a unique identifier for this user in your system (e.g., user_123)
    - API Key — the Exa API key you copied in step 1
  - Click Save.
  Via API (for production)
  Node.js
  await scalekit.actions.upsertConnectedAccount({
    connectionName: 'exa',
    identifier: 'user_123',
    credentials: { api_key: 'your-exa-api-key' },
  });
  Python
  scalekit_client.actions.upsert_connected_account(
      connection_name="exa",
      identifier="user_123",
      credentials={"api_key": "your-exa-api-key"}
  )
  In production, call upsertConnectedAccount when a user connects their Exa account — for example, after they paste their API key into a settings page in your app.
Each Exa API key has a default limit of 10 QPS. Search, find-similar, and get-contents cost 1 credit per request, plus additional credits per content item (text, highlights, or summary) returned. exa_research and exa_websets run multiple sub-queries internally and consume significantly more credits. Monitor usage at dashboard.exa.ai → Usage.
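To stay under the 10 QPS default, a client-side throttle is one option. The following is a minimal single-threaded sketch, not a Scalekit or Exa feature: `QpsThrottle` is a hypothetical class, and the injectable `clock` and `sleep` arguments exist only to make it testable.

```python
import time
from typing import Callable


class QpsThrottle:
    """Spaces calls at least 1/qps seconds apart (single-threaded sketch)."""

    def __init__(
        self,
        qps: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._min_interval = 1.0 / qps
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0  # earliest time the next call may start

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Reserve the next slot one interval after this call's start time.
        self._next_allowed = max(now, self._next_allowed) + self._min_interval
        return delay
```

Call `throttle.wait()` immediately before each tool call. A multi-threaded or multi-process agent would need a shared limiter instead.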

Make your first call

Node.js

import { ScalekitClient } from '@scalekit-sdk/node'
import 'dotenv/config'

// Variable names match the .env file shown earlier
const scalekit = new ScalekitClient(
  process.env.SCALEKIT_ENVIRONMENT_URL,
  process.env.SCALEKIT_CLIENT_ID,
  process.env.SCALEKIT_CLIENT_SECRET,
)
const actions = scalekit.actions

const connector = 'exa'
const identifier = 'user_123'

// Make your first call
const result = await actions.executeTool({
  connector,
  identifier,
  toolName: 'exa_list_websets',
  toolInput: {},
})
console.log(result)

Python

import os
from scalekit.client import ScalekitClient
from dotenv import load_dotenv
load_dotenv()

# Variable names match the .env file shown earlier
scalekit_client = ScalekitClient(
    env_url=os.getenv("SCALEKIT_ENVIRONMENT_URL"),
    client_id=os.getenv("SCALEKIT_CLIENT_ID"),
    client_secret=os.getenv("SCALEKIT_CLIENT_SECRET"),
)
actions = scalekit_client.actions

connection_name = "exa"
identifier = "user_123"

# Make your first call
result = actions.execute_tool(
    tool_input={},
    tool_name="exa_list_websets",
    connection_name=connection_name,
    identifier=identifier,
)
print(result)

What you can do

Connect this agent connector to let your agent:

- Find similar pages: Find web pages similar to a given URL using Exa’s neural similarity search
- Search the web: Search the web using Exa’s AI-powered semantic or keyword search engine
- Research a topic: Run in-depth research on a topic using Exa’s neural search
- Crawl pages: Crawl one or more web pages by URL and extract their content including full text, highlights, and AI-generated summaries
- List websets and webset items: List all Exa Websets in your account with optional pagination
- Create websets: Execute a complex web query designed to discover and return large sets of URLs (up to thousands) matching specific criteria

Common workflows

Proxy API call

Node.js

const result = await actions.request({
  connectionName: 'exa',
  identifier: 'user_123',
  path: '/search',
  method: 'POST',
  // The proxied request goes to Exa's REST API, which uses camelCase field names
  body: { query: 'LLM observability tools 2025', numResults: 5 },
});
console.log(result.data);

Python

result = actions.request(
    connection_name='exa',
    identifier='user_123',
    path="/search",
    method="POST",
    # The proxied request goes to Exa's REST API, which uses camelCase field names
    json={"query": "LLM observability tools 2025", "numResults": 5},
)
print(result)

Semantic search

Search the web by meaning, not just keywords. This example searches for companies in the AI infrastructure space and returns AI-generated summaries for each result.

result = actions.execute_tool(
    connection_name='exa',
    identifier='user_123',
    tool_name="exa_search",
    tool_input={
        "query": "AI infrastructure companies building GPU cloud platforms",
        "num_results": 10,
        "type": "neural",
        "category": "company",
        "include_summary": True,  # AI-generated summary for each result
    }
)

for item in result.result.get("results", []):
    print(f"{item['title']}: {item['url']}")
    print(f"  → {item.get('summary', 'No summary')}\n")

Search with full content enrichment

Retrieve the full page text and highlighted snippets alongside search results — useful when you want to pass source material directly into an LLM context window.

result = actions.execute_tool(
    connection_name='exa',
    identifier='user_123',
    tool_name="exa_search",
    tool_input={
        "query": "OpenAI API rate limits and pricing 2025",
        "num_results": 5,
        "type": "keyword",             # keyword mode for precise terms
        "include_domains": ["openai.com", "platform.openai.com"],
        "include_text": True,          # return page text for each result
        "max_characters": 2000,        # cap text to save tokens
        "include_highlights": True,    # return relevant snippets from each page
    }
)

for item in result.result.get("results", []):
    print(f"## {item['title']}")
    print(f"URL: {item['url']}")
    if item.get("highlights"):
        print("Highlights:")
        for h in item["highlights"]:
            print(f"  - {h}")
    print()
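When the goal is to pass search results into an LLM context window, the results usually need to be flattened into one string with a size cap. The helper below is a hypothetical sketch: it assumes each result is a dict with `title`, `url`, and optional `highlights` or `text` keys, as the loop above does, and it budgets by characters rather than tokens.

```python
from typing import Iterable, Mapping


def build_context(results: Iterable[Mapping], max_chars: int = 4000) -> str:
    """Join result snippets into one prompt-ready string within a character budget."""
    blocks = []
    used = 0
    for i, item in enumerate(results, start=1):
        # Prefer highlights when present; fall back to the page text.
        body = "\n".join(item.get("highlights") or []) or item.get("text", "")
        block = f"[{i}] {item.get('title', 'Untitled')} ({item.get('url', '')})\n{body}"
        if used + len(block) > max_chars:
            break  # stop before exceeding the budget
        blocks.append(block)
        used += len(block) + 2  # account for the blank-line separator
    return "\n\n".join(blocks)
```

The numbered `[1]`, `[2]` prefixes give the model a stable handle for citing sources in its answer.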

Find similar pages

Discover pages that are semantically similar to a known URL — useful for competitive research, finding alternative data sources, or discovering similar products.

# Find companies similar to a known competitor
result = actions.execute_tool(
    connection_name='exa',
    identifier='user_123',
    tool_name="exa_find_similar",
    tool_input={
        "url": "https://www.linear.app",
        "num_results": 10,
        "exclude_domains": ["linear.app"],                   # exclude the source site itself
        "start_published_date": "2024-01-01T00:00:00.000Z",  # only pages published since 2024
        "include_text": True,                                # return page text for each result
        "max_characters": 500,                               # short excerpt per result
    }
)

print("Similar companies to Linear:")
for item in result.result.get("results", []):
    print(f"  {item['title']} - {item['url']}")
    if item.get("text"):
        print(f"    {item['text'][:200]}")
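Similarity results often include several pages from the same site. For competitive research it can be useful to keep one result per domain. This is a small sketch using only the standard library; `one_per_domain` is a hypothetical helper and assumes each result dict has a `url` key.

```python
from urllib.parse import urlparse


def one_per_domain(results):
    """Keep the first result for each hostname, ignoring a leading 'www.'."""
    seen = set()
    unique = []
    for item in results:
        host = (urlparse(item["url"]).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if host not in seen:
            seen.add(host)
            unique.append(item)
    return unique
```

Results keep their original order, so the highest-ranked page from each site is the one that survives.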

Get content for known URLs

Extract structured content from a list of URLs you already have — from a CRM export, a prior search, or a manually curated list. No search query required.

# Enrich a list of company URLs from your CRM
company_urls = [
    "https://www.anthropic.com",
    "https://mistral.ai",
    "https://cohere.com",
]

result = actions.execute_tool(
    connection_name='exa',
    identifier='user_123',
    tool_name="exa_crawl",  # extracts content from known URLs
    tool_input={
        "urls": company_urls,
        "include_summary": True,  # AI-generated summary for each page
        "summary_query": "What AI models or products does this company offer, and who are their target customers?",
    }
)

for item in result.result.get("results", []):
    print(f"{item['url']}: {item.get('summary', 'No summary')}")
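For long URL lists, sending the URLs in batches keeps each request small and makes credit usage easier to track. The sketch below is a generic chunking helper; the batch size is an assumption, since this page does not state a per-request URL limit.

```python
from typing import Iterator, List, Sequence


def chunked(urls: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` URLs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])
```

Loop over `chunked(company_urls, 10)` and pass each batch as the `urls` input of one tool call.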

Get a direct answer

Ask a question and get a synthesized natural language answer grounded in live web sources. Returns the answer and the source URLs used — ready to display or inject into a citation-aware LLM prompt.

result = actions.execute_tool(
    connection_name='exa',
    identifier='user_123',
    tool_name="exa_answer",
    tool_input={
        "query": "What are the context window sizes and pricing for Claude Sonnet and GPT-4o as of 2025?",
        "num_results": 8,
        "include_text": True,  # also return the source page text
        "include_domains": ["anthropic.com", "openai.com", "platform.openai.com"]
    }
)

print("Answer:", result.result.get("answer"))
print("\nSources:")
for source in result.result.get("sources", []):
    print(f"  - {source['title']}: {source['url']}")
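To display the answer or inject it into a citation-aware prompt, the answer and its sources can be combined into one numbered block. This is a hypothetical formatter; it assumes the same `answer` string and `sources` list (dicts with `url` and an optional `title`) that the example above reads.

```python
from typing import Iterable, Mapping


def format_answer(answer: str, sources: Iterable[Mapping]) -> str:
    """Render an answer followed by a numbered source list."""
    lines = [answer.strip(), "", "Sources:"]
    for i, source in enumerate(sources, start=1):
        # Fall back to the URL when a source has no title.
        label = source.get("title") or source["url"]
        lines.append(f"[{i}] {label} - {source['url']}")
    return "\n".join(lines)
```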

Deep research on a topic

Gather multiple sources on a topic in one call. exa_research runs a semantic search and returns each source with its page text and an AI-generated summary. Use summary_query to focus every summary on the question you care about, which makes the output easier to compare across sources.

result = actions.execute_tool(
    connection_name='exa',
    identifier='user_123',
    tool_name="exa_research",
    tool_input={
        "query": "Competitive landscape of AI coding assistants in 2025: key players, pricing, and differentiators",
        "num_results": 10,       # number of sources to gather (1-20)
        "max_characters": 3000,  # cap text per source to save tokens
        "summary_query": "What does this product cost, and what is its key differentiator?",
    }
)

for item in result.result.get("results", []):
    print(f"{item['title']}: {item['url']}")
    print(f"  {item.get('summary', 'No summary')}")

LangChain integration

Let an LLM decide which Exa tool to call based on natural language. This example builds an agent that can search, retrieve content, and answer research questions on demand.

from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_core.prompts import (
    ChatPromptTemplate, SystemMessagePromptTemplate,
    HumanMessagePromptTemplate, MessagesPlaceholder, PromptTemplate
)

# Load all Exa tools in LangChain format. Use page_size=100 so connector tool lists are not truncated.
tools = actions.langchain.get_tools(
    identifier='user_123',
    providers=["EXA"],
    page_size=100
)

prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate(prompt=PromptTemplate(
        input_variables=[],
        template=(
            "You are a research assistant with access to Exa web search tools. "
            "Use exa_search for general queries, exa_answer for direct questions, "
            "exa_find_similar for competitive analysis, and exa_research for deep multi-source topics. "
            "Always cite your sources."
        )
    )),
    MessagesPlaceholder(variable_name="chat_history", optional=True),
    HumanMessagePromptTemplate(prompt=PromptTemplate(
        input_variables=["input"], template="{input}"
    )),
    MessagesPlaceholder(variable_name="agent_scratchpad")
])

llm = ChatOpenAI(model="gpt-4o")
agent = create_openai_tools_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

result = agent_executor.invoke({
    "input": "Who are the top 5 competitors to Notion for team knowledge management? Summarize each and compare their pricing."
})
print(result["output"])

Tool list

Use the exact tool names from the Tool list below when you call execute_tool. If you’re not sure which name to use, list the tools available for the current user first.

exa_answer (5 params)

Get a natural language answer to a question by searching the web with Exa and synthesizing results. Returns a direct answer with citations to the source pages. Ideal for factual questions, current events, and research queries. Rate limit: 60 requests/minute.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | required | The question or query to answer from web sources. |
| exclude_domains | array | optional | JSON array of domains to exclude from answer sources. |
| include_domains | array | optional | JSON array of domains to restrict source search to. Example: ["reuters.com","bbc.com"] |
| include_text | boolean | optional | When true, also returns the source page text alongside the synthesized answer. |
| num_results | integer | optional | Number of web sources to use when generating the answer (1–20). More sources improve accuracy but cost more credits. |

exa_crawl (7 params)

Crawl one or more web pages by URL and extract their content including full text, highlights, and AI-generated summaries. Useful for reading specific pages discovered via search. Rate limit: 60 requests/minute. Credit consumption depends on number of URLs.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| urls | array | required | JSON array of URLs to crawl and extract content from. |
| highlights_per_url | integer | optional | Number of highlight sentences to return per URL when include_highlights is true. Defaults to 3. |
| include_highlights | boolean | optional | When true, returns the most relevant sentence-level highlights from each page. |
| include_html_tags | boolean | optional | When true, retains HTML tags in the extracted text. Defaults to false (plain text only). |
| include_summary | boolean | optional | When true, returns an AI-generated summary for each crawled page. |
| max_characters | integer | optional | Maximum characters of text to extract per page. Defaults to 5000. |
| summary_query | string | optional | Optional query to focus the AI summary on a specific aspect of the page. |

exa_delete_webset (1 param)

Delete an Exa Webset by its ID. This permanently removes the webset and all its collected items. This action cannot be undone.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| webset_id | string | required | The ID of the webset to delete. |

exa_find_similar (8 params)

Find web pages similar to a given URL using Exa's neural similarity search. Useful for competitor research, finding related articles, or discovering similar companies. Optionally returns page text, highlights, or summaries. Rate limit: 60 requests/minute.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | required | The URL to find similar pages for. |
| end_published_date | string | optional | Only return pages published before this date. ISO 8601 format: YYYY-MM-DDTHH:MM:SS.000Z |
| exclude_domains | array | optional | Array of domains to exclude from results. |
| include_domains | array | optional | Array of domains to restrict results to. |
| include_text | boolean | optional | When true, returns the full text content of each result page. |
| max_characters | integer | optional | Maximum characters of page text to return per result when include_text is true. Defaults to 3000. |
| num_results | integer | optional | Number of similar results to return (1–100). Defaults to 10. |
| start_published_date | string | optional | Only return pages published after this date. ISO 8601 format: YYYY-MM-DDTHH:MM:SS.000Z |
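The date filters expect timestamps in the form YYYY-MM-DDTHH:MM:SS.000Z. A small standard-library helper can produce that string from a `datetime`. This is a sketch; `to_exa_timestamp` is a hypothetical name, and it assumes naive datetimes are already in UTC.

```python
from datetime import datetime, timezone


def to_exa_timestamp(dt: datetime) -> str:
    """Format a datetime as YYYY-MM-DDTHH:MM:SS.mmmZ in UTC."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive values are already UTC
    dt = dt.astimezone(timezone.utc)
    millis = dt.microsecond // 1000
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis:03d}Z"
```

For example, `to_exa_timestamp(datetime(2024, 1, 1))` returns `2024-01-01T00:00:00.000Z`.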

exa_get_webset (1 param)

Get the status and details of an existing Exa Webset by its ID. Use this to poll the status of an async webset created with Create Webset. Returns metadata including status (created, running, completed, cancelled), progress, and configuration.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| webset_id | string | required | The ID of the webset to retrieve. |
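Polling until a webset finishes can be sketched as a loop with a fixed interval and a timeout. In the sketch below, `fetch` is a caller-supplied function that would wrap an exa_get_webset call and return the webset as a dict with a `status` key. That response shape is an assumption; only the status names come from the description above.

```python
import time
from typing import Callable, Mapping

# Statuses after which polling should stop, per the description above.
TERMINAL_STATUSES = {"completed", "cancelled"}


def wait_for_webset(
    fetch: Callable[[], Mapping],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping:
    """Call fetch() until the webset reaches a terminal status or the timeout elapses."""
    waited = 0.0
    while True:
        webset = fetch()
        if webset.get("status") in TERMINAL_STATUSES:
            return webset
        if waited >= timeout:
            raise TimeoutError(f"webset still {webset.get('status')!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Check the returned status before listing items, since a cancelled webset also ends the loop.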

exa_list_webset_items (3 params)

List the collected URLs and items from a completed Exa Webset. Use this after polling Get Webset until its status is 'completed' to retrieve the discovered results.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| webset_id | string | required | The ID of the webset to retrieve items from. |
| count | integer | optional | Number of items to return per page. Defaults to 10. |
| cursor | string | optional | Pagination cursor from a previous response to fetch the next page of items. |
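Cursor pagination can be wrapped in a generator so callers iterate over items without tracking cursors. This is a sketch under stated assumptions: `fetch_page` is a caller-supplied function that would call exa_list_webset_items with the given cursor, and the response keys `items` and `next_cursor` are hypothetical, since this page does not document the response shape.

```python
from typing import Callable, Iterator, Mapping, Optional


def iter_items(fetch_page: Callable[[Optional[str]], Mapping]) -> Iterator[Mapping]:
    """Yield every item across pages, following the cursor until it is empty."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("items", [])   # hypothetical key
        cursor = page.get("next_cursor")   # hypothetical key
        if not cursor:
            break
```

Adjust the two key names to match the actual response before using this pattern.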

exa_list_websets (2 params)

List all Exa Websets in your account with optional pagination. Returns a list of websets with their IDs, statuses, and configurations.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| count | integer | optional | Number of websets to return per page. Defaults to 10. |
| cursor | string | optional | Pagination cursor from a previous response to fetch the next page. |

exa_research (8 params)

Run in-depth research on a topic using Exa's neural search. Performs a semantic search and returns results with full page text and AI-generated summaries, providing structured multi-source research output. Best for comprehensive topic analysis. Rate limit: 60 requests/minute.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | required | The research topic or question to investigate across the web. |
| category | string | optional | Restrict research to a specific content category for more targeted results. |
| exclude_domains | array | optional | JSON array of domains to exclude from research results. |
| include_domains | array | optional | JSON array of domains to restrict research sources to. Useful to focus on authoritative sources. |
| max_characters | integer | optional | Maximum characters of text to extract per source page. Defaults to 5000. |
| num_results | integer | optional | Number of sources to gather for the research (1–20). More sources provide broader coverage. |
| start_published_date | string | optional | Only include sources published after this date. ISO 8601 format. |
| summary_query | string | optional | Optional focused question to guide the AI page summaries. Defaults to the main research query. |

exa_search (19 params)

Search the web using Exa's AI-powered semantic or keyword search engine. Supports filtering by domain, date range, content category, and result type. Optionally returns page text, highlights, or summaries alongside search results. Rate limit: 60 requests/minute.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | required | The search query. For neural/auto type, natural language works best. For keyword type, use specific terms. |
| category | string | optional | Restrict results to a specific content category. |
| end_crawl_date | string | optional | Only return pages crawled (discovered) before this date. ISO 8601 format. |
| end_published_date | string | optional | Only return pages published before this date. ISO 8601 format: YYYY-MM-DDTHH:MM:SS.000Z |
| exclude_domains | array | optional | JSON array of domains to exclude from results. Example: ["reddit.com","quora.com"] |
| include_domains | array | optional | JSON array of domains to restrict results to. Example: ["techcrunch.com","wired.com"] |
| include_highlights | boolean | optional | When true, returns relevant text snippets from each result page. |
| include_summary | boolean | optional | When true, returns an LLM-generated summary for each result page. |
| include_text | boolean | optional | When true, returns the full text content of each result page (up to max_characters). |
| max_age_hours | integer | optional | Maximum age of cached content in hours. 0 fetches fresh content; -1 always uses cache; omit for fallback. Max 720. |
| max_characters | integer | optional | Maximum characters of page text to return per result when include_text is true. Defaults to 3000. |
| moderation | boolean | optional | When true, enables content moderation to filter unsafe content from results. |
| num_results | integer | optional | Number of results to return (1–100). Defaults to 10. |
| start_crawl_date | string | optional | Only return pages crawled (discovered) after this date. ISO 8601 format. |
| start_published_date | string | optional | Only return pages published after this date. ISO 8601 format: YYYY-MM-DDTHH:MM:SS.000Z |
| system_prompt | string | optional | Additional instructions that guide generated output, source preferences, or agent behavior. |
| type | string | optional | Search type: 'neural' for semantic AI search (best for natural language), 'keyword' for exact-match keyword search, 'auto' to let Exa decide. |
| use_autoprompt | boolean | optional | When true, Exa automatically rewrites the query to be more semantically effective. |
| user_location | string | optional | Two-letter ISO country code of the user, used to localize results. e.g. US, GB, DE. |
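The max_age_hours parameter has three special cases (omitted, -1, and 0) plus an upper bound of 720, which is easy to get wrong when the value comes from user input. A hypothetical validator, written only from the rules in the table above, might look like this:

```python
from typing import Optional


def check_max_age_hours(value: Optional[int]) -> Optional[int]:
    """Validate max_age_hours against the rules in the parameter table.

    None  -> omit the parameter (fallback behaviour)
    -1    -> always use cached content
    0     -> always fetch fresh content
    1-720 -> accept cached content up to that many hours old
    """
    if value is None:
        return None
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("max_age_hours must be an integer")
    if value < -1 or value > 720:
        raise ValueError("max_age_hours must be -1, 0, or between 1 and 720")
    return value
```

When the function returns None, leave the key out of the tool input instead of sending a null value.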

exa_websets (6 params)

Execute a complex web query designed to discover and return large sets of URLs (up to thousands) matching specific criteria. Websets are ideal for lead generation, market research, competitor analysis, and large-scale data collection. Returns a webset ID; poll status with GET /websets/v0/websets/{id}. High credit consumption.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | required | The search query describing what kinds of pages or entities to find. Be specific and descriptive for best results. |
| count | integer | optional | Target number of URLs to collect. Can range from hundreds to thousands. Higher counts take longer and consume more credits. |
| entity_type | string | optional | The type of entity to search for. Helps Exa understand what constitutes a valid result match. |
| exclude_domains | array | optional | JSON array of domains to exclude from webset results. |
| external_id | string | optional | Optional external identifier to tag this webset for reference in your system. |
| include_domains | array | optional | JSON array of domains to restrict webset sources to. |
