Job Title: Senior AI Engineer (AI COE)
Location: San Mateo, CA, USA (Onsite)
Hire Type: W2 Only (No C2C)
Why should you choose us?
Rakuten Symphony is a Rakuten Group company that provides global B2B services for the mobile telco industry and enables next-generation, cloud-based, international mobile services. Building on the technology Rakuten used to launch Japan’s newest mobile network, we are taking our mobile offering global. To support our ambition to provide an innovative cloud-native telco platform for our customers, Rakuten Symphony is looking to recruit and develop top talent from around the globe. We are looking for individuals to join our team across all functional areas of our business – from sales to engineering, support functions to product development. Let’s build the future of mobile telecommunications together!
Are you interested in working for a Global Leader in Telecom and Cloud Innovation? Are you excited about going beyond prompt engineering and building domain-specialized Large Language Models? Do you want to work on continued pre-training (CPT), fine-tuning, and evaluation of LLMs, and deploy them into large-scale, production telecom platforms?
As part of our Generative AI engineering team, you will work on adapting, training, and operationalizing LLMs for telecom-specific reasoning, analytics, automation, and content generation. This role is ideal for engineers who want to work closer to the model layer, not just orchestration and APIs.
Join us and help build AI-native telecom platforms powered by custom, domain-tuned language models and help build the future of connected intelligence in telecommunications.
What Do We Expect From You:
As a Senior AI Engineer – Generative AI, you will design, train and deploy customized LLMs optimized for telecom domain knowledge, reasoning and structured content generation.
You will work across the full model lifecycle — data preparation, continued pre-training, fine-tuning, evaluation and production deployment — and collaborate with platform teams to integrate these models into real systems.
Roles & Responsibilities:
- Cultivate a culture of engineering excellence, innovation, and continuous improvement.
- Design and implement LLM training and adaptation pipelines, including:
  - Continued Pre-Training (CPT) on domain-specific corpora
  - Supervised Fine-Tuning (SFT) and instruction tuning
- Prepare and curate large-scale telecom-specific datasets for model training:
  - Technical documents, logs, tickets, procedures, and specifications
- Build domain-specialized “writing and reasoning models” for:
  - Technical documentation generation
  - Incident summaries and root-cause narratives
  - Network insights and operational explanations
- Evaluate and benchmark LLMs for:
  - Domain accuracy and reasoning quality
  - Factuality, consistency, and hallucination reduction
- Optimize training and inference for cost, latency, and scalability.
- Collaborate with GenAI platform teams to deploy fine-tuned models as secure, scalable services.
- Integrate fine-tuned models with RAG and tool-augmented workflows where appropriate (not RAG-only).
- Contribute to internal best practices for model training, evaluation, and governance.
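As one concrete flavor of the evaluation work listed above, intrinsic metrics such as perplexity on held-out telecom text can be computed directly from per-token log-probabilities. A minimal sketch (the helper name and toy values are illustrative, not part of any internal tooling):

```python
import math

def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities.

    PPL = exp(-(1/N) * sum(log p_i)); lower is better on
    held-out domain text (e.g., telecom documents).
    """
    if not token_logprobs:
        raise ValueError("need at least one token")
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

# Toy example: a model assigning every token probability 0.25
logprobs = [math.log(0.25)] * 8
print(perplexity(logprobs))  # close to 4.0 for uniform p = 0.25
```

Tracking this metric before and after CPT on a domain corpus is one simple way to quantify how much telecom-specific adaptation helped.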
Skills and Qualifications:
- Bachelor’s or Master’s degree in Computer Science, Engineering, Mathematics, or a related field.
- 4–7 years of experience in AI/ML engineering, with hands-on experience in LLM fine-tuning or adaptation.
- Strong proficiency in Python and deep learning frameworks such as PyTorch.
- Experience working with transformer-based language models.
- Practical experience with SFT, instruction tuning, or CPT workflows.
- Experience in building and training Large Language Models (LLMs) from scratch, including tokenizer design, data curation, pretraining (CPT), and scaling strategies.
- Proven experience experimenting with LLM architectures and training strategies (e.g., hyperparameter tuning, curriculum learning, alignment techniques) and evaluating model performance across diverse benchmarks.
- Understanding of distributed training concepts and GPU-based workloads.
- Experience deploying trained models into production environments.
- Strong foundation in data modeling, microservices and event-driven architectures.
- Demonstrated experience integrating customer and network analytics systems for business insights.
Knowledge, Skills, Abilities, Competencies:
- Deep understanding of LLM training dynamics and token-level behavior.
- Experience with data preprocessing, tokenization, and curriculum design.
- Familiarity with LLM evaluation methodologies (BLEU, ROUGE, perplexity, task-based evals, human-in-the-loop).
- Knowledge of parameter-efficient tuning methods (LoRA, adapters, QLoRA) and their trade-offs versus full fine-tuning.
- Understanding of model safety, bias, alignment techniques (RLHF/DPO) and domain alignment challenges.
- Strong problem-solving skills and attention to model quality.
- Ability to collaborate effectively across research, engineering, and product teams.
- Ability to translate technical outcomes into measurable business and customer experience impact.
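As a rough illustration of the parameter-efficient tuning trade-off mentioned above (shapes and hyperparameters are hypothetical, not role requirements), LoRA replaces a full weight update with a low-rank one, W' = W + (alpha/r)·B·A, training only the small factors A and B:

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 2048, 2048, 8, 16  # toy shapes; rank r << d

W = rng.standard_normal((d_out, d_in))     # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, r))                   # B starts at zero, so W' == W at init

W_adapted = W + (alpha / r) * (B @ A)      # effective weight after the LoRA update

full_params = W.size
lora_params = A.size + B.size
print(lora_params / full_params)  # 0.0078125 → under 1% of full fine-tuning parameters
```

Initializing B to zero keeps the adapted model identical to the base model at the start of training, which is one reason LoRA-style methods are a stable, low-cost alternative when full fine-tuning is too expensive.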
What We Offer:
- Opportunity to work on cutting-edge AI/ML and GenAI technologies.
- Ownership of impactful projects from end to end.
- Collaborative and intellectually stimulating environment.
Rakuten Shugi Principles:
- Our worldwide practices describe specific behaviours that make Rakuten unique and united across the world. We expect Rakuten employees to model these 5 Shugi Principles of Success.
- Always improve, always advance. Only be satisfied with complete success - Kaizen.
- Be passionately professional. Take an uncompromising approach to your work and be determined to be the best.
- Hypothesize - Practice - Validate - Shikumika. Use the Rakuten Cycle to succeed in unknown territory.
- Maximize Customer Satisfaction. The greatest satisfaction for workers in a service industry is to see their customers smile.
- Speed!! Speed!! Speed!! Always be conscious of time. Take charge, set clear goals, and engage your team.