Company Description
Surf is an AI-powered intelligence platform dedicated to providing accurate and reliable insights for digital assets through an intuitive chat interface. By leveraging proprietary data and domain-specific models, Surf enables institutions and individual investors to analyze projects, assess market conditions, and make confident, data-driven investment decisions. Surf is committed to simplifying complex information and empowering users to navigate the digital asset landscape with ease and clarity.
Role Description
You’ll build and scale the core AI stack behind our product: training and fine-tuning models, building evaluation and data engines, and deploying reliable inference systems. This is a hands-on role for someone who likes owning the full loop: data → training → eval → deployment → monitoring → iteration.
Qualifications
- Hands-on experience fine-tuning LLMs, including SFT and preference tuning (e.g., DPO/ORPO), plus at least one of: distillation or adapter-based methods (e.g., LoRA/QLoRA).
- Experience scaling training with distributed setups (e.g., DDP/FSDP/DeepSpeed), using mixed precision, and doing GPU performance profiling/optimization (throughput, memory, utilization, bottlenecks).
- Familiarity with modern model serving and optimization stacks such as Triton, vLLM, TGI, or Ray Serve, including quantization workflows (e.g., AWQ/GPTQ/bitsandbytes) and latency/cost tuning.
- Experience building and shipping agentic systems using frameworks like LangGraph/LangChain, LlamaIndex (agents), Semantic Kernel, OpenAI Assistants-style tool calling, or similar — designing tool interfaces, state/plan management, retries, and guardrails for production reliability.