Applied AI Engineer

Arcade • Full-time • San Francisco, California, United States • $179k - $240k / year

Everyone's talking about AI. But here's the truth: ChatGPT can't send your emails. It can't book your flights. It can't even order you lunch.

Why? Because AI is trapped in a chat box. It can't take real actions in the real world.

We are changing that forever. We're not just building another AI company - we're creating the infrastructure that will power every AI application you'll use in the future.

The Revolution Needs You

Every AI app needs agentic "tools" - special functions that let AI models take real actions. Without tools, AI can only chat. With tools, AI can actually do things. We're building the actions runtime that allows AI agents to safely take real-world actions at enterprise scale. As an Applied AI Engineer on the Tools team, you'll push the boundary of what "a tool" even means at Arcade — designing agentic tools that go beyond deterministic API wrappers, building agents that build new tools, and composing tools into workflows that solve higher-level problems.

Why This Is The Opportunity of a Lifetime

Founder-Market Fit: Our CEO previously founded Stormpath (acquired by Okta), where he created the first Authentication API for developers. He's done this before - and this time the market is 10x bigger. Our CTO led the vector database team at Redis, shipped 100+ LLM applications, and is a contributor to LangChain and LlamaIndex. He knows this space better than anyone.
Dream Team: We've assembled authentication, integrations, distributed systems, and AI experts from Okta, Redis, Microsoft, Splunk, ngrok, Google, Airbyte, Disney, and HPE who've built and founded multiple successful developer platforms.
Perfect Timing: We're at the inflection point of AI adoption. The biggest problem isn't better models - it's connecting AI to real-world actions. That's us.
Massive Market: We're building critical infrastructure for the biggest technological shift of our generation. Every AI app will need what we're building.
Backed By The Best: Our investors have backed Databricks, ClickHouse, MongoDB, Perplexity, Cohere, Scale AI, Confluent, Elastic, and Firebase. They see what we see - this is going to be huge.

The Challenge

You'll report to the Engineering Manager for Tools and Growth. The Tools team owns Arcade's tool catalog — thousands of tools across many services, growing faster than any human can review by hand. The next leap in agent quality lives inside this team's work, and you'll be the applied-AI seat that pushes it forward.

Three real problems define the role.

Agentic tools vs. deterministic tools. Most tools today are deterministic: call X API with Y arguments, get Z result. That model breaks down for entire classes of agent work: research a topic, summarize a thread, decide which of three accounts to act on. Agentic tools, the ones that internally reason, plan, or call models, are the answer, but the design space is wide open. When is agentic better than deterministic? How do you make an agentic tool fast, reliable, and debuggable? You'll set the bar for what these look like at Arcade.

Agents that build tools. The toolkit catalog is too big for hand-crafting to scale. We need agent harnesses that can take a vendor's API and produce a high-quality toolkit (design, code, evals, docs) with a human in the loop only where the human is actually needed. There's early work on this already. You'll take it from a prototype into the production pipeline that produces the next thousand tools.

Workflows that compose tools. Individual tools solve narrow problems. Real customer outcomes ("close the quarter," "triage the inbox," "stand up the integration") need many tools, chained, with the right control flow. We need to figure out what the right primitive looks like above the tool layer, and you'll lead that design.

The most honest thing we can say about this work: most of the problems you'll be solving didn't exist three months ago. There's no prior art. There's no known solution. If that's the part of the job that makes you nervous, this isn't the right role. If that's the part that makes you lean in, it is.

We do real experiments. We form hypotheses. We publish learnings. Research is part of the job. But the role is built around shipping. If you want to spend six months proving an idea in a notebook before anything reaches a customer, this isn't the right role. If you want to ship the experiment and the writeup in the same quarter, it is.

What You'll Do

Design and ship agentic tools that go beyond deterministic API wrappers — and define the patterns the rest of the Tools team will use to build more.
Build the agent harness that automates tool creation — take a vendor's API, produce a high-quality toolkit end-to-end, keep humans in the loop only where humans add real value.
Design workflows that compose tools into higher-level abstractions that customers can point at outcomes ("triage this inbox," "close out this account") rather than at individual API calls.
Bring applied-ML rigor to tool design — evals, model-aware iteration, retrieval, tool description tuning, response shaping. Make decisions defensible with data.
Run model-aware experiments across Claude, GPT, Gemini — agentic tool behavior diverges across models in ways nobody else is studying, and we should.
Set the technical bar for what "good tool-building" looks like as the team scales — your patterns get inherited by every toolkit author after you.
Contribute back to the MCP and agent ecosystem where the conversation about agentic tools is forming.

Required Skills

5+ years of software engineering experience, with at least 2 years shipping production ML or applied-AI systems. Formal title matters less than the work.
Strong Python.
LLM application depth — prompting, retrieval, tool use, agent design. You've built non-trivial agent systems and know where the rough edges are.
Experience designing or composing multi-tool / multi-agent workflows that produced real outcomes.
You've built evals at scale — not "I ran a benchmark once," but a measurement system real engineering decisions were made against.
Statistics fluency — significance, confidence intervals, A/B test design. You can defend whether a small delta is real or noise.
Comfort across multiple frontier models and reasoning about their behavioral differences.
A doer, not a researcher-in-residence. You'd rather ship a working v0.5 next week than a polished v2.0 next quarter.
Comfort with ambiguity — early team, narrow charter that will expand. You make good decisions with incomplete data.
An insatiable desire to ship.

Bonus Points

You've built agents that build software (codegen agents, harness-style systems, meta-agents).
Prior work on tool-use specifically — BFCL, τ-bench, ToolBench, MCP eval work, or equivalent.
MCP ecosystem familiarity — extra bonus if you've filed an issue against the spec.
You've worked on agent frameworks (LangChain, CrewAI, AutoGen, Mastra) and have opinions about where they get tool use and workflow composition wrong.
Prior experience at an API platform, integrations-heavy product, or developer tools company.

Join The Movement

We're not just building a product - we're leading a movement to transform AI from just chatbots to agents that can take actions against real systems. This is your chance to be at the forefront of that revolution.

If you want to look back in 5 years and say, "I helped build that," then we want to talk to you. Ready to make AI actually useful?

Compensation and Benefits

This role offers a competitive salary, equity, and benefits. Compensation is aligned with the range below and determined based on a candidate's background, experience, and performance.

Salary Range

$179,000-$240,000 USD per year