We’re seeking a skilled Data Operations Tech Lead to join our team!
We are a data-driven organization building and operating a growing analytics and data platform that supports critical business decisions. Our Data Operations team plays a key role in ensuring data pipelines are reliable, observable, and scalable as we evolve from legacy, SQL-centric systems to a modern cloud-based data stack. We value operational excellence, continuous improvement, and collaboration across engineering, platform, and product teams.
In this role, you will help guide the evolution of our Data Operations team as it expands from supporting legacy SQL-based pipelines to also operating and supporting modern data pipelines built on Airflow and Spark.
This role is ideal for a technically strong, operationally minded leader who has experience running data pipelines at scale and enjoys improving reliability, observability, and automation. You will serve as both a hands-on technical leader and a mentor, helping a team with strong SQL experience successfully transition to a modern data stack.
You will play a key role in defining how we monitor, triage, and resolve Tier 1 / Tier 2 data pipeline issues, while continuously improving the stability and operability of our platform.
Does this sound like the right opportunity to explore? Let's connect!
What You’ll Do
Technical Leadership & Operations
- Act as the technical lead for Data Operations, owning the operational readiness of modern data pipelines built on Airflow, Spark, and cloud data infrastructure
- Lead incident triage and resolution for Tier 1 and Tier 2 pipeline issues (data delays, job failures, data quality alerts, SLA breaches)
- Establish clear runbooks, escalation paths, and operational best practices for pipeline support
- Partner with Data Engineering to influence pipeline design with operability, observability, and supportability in mind
Monitoring, Reliability & Automation
- Design and improve monitoring, alerting, and dashboards for data pipelines and workflows
- Implement automation to reduce manual intervention (auto-retries, self-healing workflows, standardized alerts)
- Drive root cause analysis (RCA) and post-incident reviews to prevent recurring issues
- Continuously improve pipeline reliability, performance, and cost efficiency
Team Enablement & Mentorship
- Mentor and upskill team members transitioning from SQL Server–centric workflows to Airflow, Spark, the ELK stack, and distributed data systems
- Create learning paths, documentation, and hands-on guidance for modern data tooling
- Lead by example with hands-on troubleshooting, debugging, and operational support
- Help establish a culture of ownership, quality, and continuous improvement within Data Operations
Cross-Functional Collaboration
- Work closely with Data Engineering, Platform, and Product teams to align on priorities and operational expectations
- Serve as a bridge between legacy data systems and the modern data platform during the transition period
- Provide feedback on operational gaps, tooling needs, and process improvements
What You Bring
- 8+ years of experience in data engineering, data platform operations, or data reliability roles
- Hands-on experience operating data pipelines built on Airflow (or similar orchestrators) and Spark
- Strong understanding of distributed data systems, batch processing, and failure modes at scale
- Solid SQL skills and experience working with relational databases (e.g., SQL Server, Postgres)
- Proven experience supporting production data pipelines with SLAs and on-call responsibilities
- Experience with at least one major cloud platform (AWS, GCP, or Azure)
Operational & Automation Mindset
- Experience building monitoring, alerting, and incident response processes for data systems
- Strong troubleshooting skills across orchestration, compute, and data layers
- Passion for automation and reducing toil through tooling and process improvements
- Ability to prioritize operational stability while enabling team velocity
Leadership & Communication
- Experience leading or mentoring engineers in an operational or support-focused environment
- Ability to explain complex distributed system issues in clear, practical terms
- Comfortable working with teams of varying technical backgrounds
- Strong documentation and knowledge-sharing habits
Nice to Have
- Familiarity with data quality frameworks, lineage, or observability tools
- Experience in healthcare, fintech, or other regulated environments
- Exposure to data reliability engineering (DRE) or SRE practices applied to data platforms
What We Offer
- Work from anywhere in the US! Machinify is digital-first.
- Full medical, dental, and vision coverage for employees and their families
- Flexible and trusting environment where you’ll feel empowered to do your best work
- Competitive salary, equity, and a 401(k) with employer match
The salary for this position is based on a range of factors unique to each candidate, such as years and depth of experience, skill set, and certifications. The base salary range for this role is $200k-$250k. We are hiring for different levels, and our Recruiting team will let you know if you qualify for a different role or range. Salary is one component of the total compensation package, which includes meaningful equity, excellent healthcare, flexible time off, and other benefits and perks.