Data Operations Tech Lead

Machinify • Remote - US • 2w ago

We’re seeking a skilled Data Operations Tech Lead to join our team!

We are a data-driven organization building and operating a growing analytics and data platform that supports critical business decisions. Our Data Operations team plays a key role in ensuring data pipelines are reliable, observable, and scalable as we evolve from legacy, SQL-centric systems to a modern cloud-based data stack. We value operational excellence, continuous improvement, and collaboration across engineering, platform, and product teams.

We are seeking a Data Operations Tech Lead to guide the evolution of our Data Operations team as it expands from supporting legacy SQL-based pipelines to operating and supporting modern data pipelines built on Airflow and Spark.

This role is ideal for a technically strong, operationally minded leader who has experience running data pipelines at scale and enjoys building reliability, observability, and automation. You will serve as both a hands-on technical leader and a mentor, helping a team with strong SQL experience successfully transition to a modern data stack.

You will play a key role in defining how we monitor, triage, and resolve Tier 1 / Tier 2 data pipeline issues, while continuously improving the stability and operability of our platform.

Does this sound like the right opportunity to explore? Let’s connect!

What You’ll Do

Technical Leadership & Operations

Act as the technical lead for Data Operations, owning the operational readiness of modern data pipelines built on Airflow, Spark, and cloud data infrastructure

Lead incident triage and resolution for Tier 1 and Tier 2 pipeline issues (data delays, job failures, data quality alerts, SLA breaches)

Establish clear runbooks, escalation paths, and operational best practices for pipeline support

Partner with Data Engineering to influence pipeline design with operability, observability, and supportability in mind

Monitoring, Reliability & Automation

Design and improve monitoring, alerting, and dashboards for data pipelines and workflows

Implement automation to reduce manual intervention (auto-retries, self-healing workflows, standardized alerts)

Drive root cause analysis (RCA) and post-incident reviews to prevent recurring issues

Continuously improve pipeline reliability, performance, and cost efficiency

Team Enablement & Mentorship

Mentor and upskill team members transitioning from SQL Server–centric workflows to Airflow, Spark, the ELK stack, and distributed data systems

Create learning paths, documentation, and hands-on guidance for modern data tooling

Lead by example with hands-on troubleshooting, debugging, and operational support

Help establish a culture of ownership, quality, and continuous improvement within Data Operations

Cross-Functional Collaboration

Work closely with Data Engineering, Platform, and Product teams to align on priorities and operational expectations

Serve as a bridge between legacy data systems and the modern data platform during the transition period

Provide feedback on operational gaps, tooling needs, and process improvements

What You Bring

8+ years of experience in data engineering, data platform operations, or data reliability roles

Hands-on experience operating data pipelines built on Airflow (or similar orchestrators) and Spark

Strong understanding of distributed data systems, batch processing, and failure modes at scale

Solid SQL skills and experience working with relational databases (e.g., SQL Server, Postgres)

Proven experience supporting production data pipelines with SLAs and on-call responsibilities

Experience with one of the major cloud platforms (AWS, GCP, or Azure)

Operational & Automation Mindset

Experience building monitoring, alerting, and incident response processes for data systems

Strong troubleshooting skills across orchestration, compute, and data layers

Passion for automation and reducing toil through tooling and process improvements

Ability to prioritize operational stability while enabling team velocity

Leadership & Communication

Experience leading or mentoring engineers in an operational or support-focused environment

Ability to explain complex distributed system issues in clear, practical terms

Comfortable working with teams of varying technical backgrounds

Strong documentation and knowledge-sharing habits

Familiarity with data quality frameworks, lineage, or observability tools

Experience in healthcare, fintech, or other regulated environments

Exposure to data reliability engineering (DRE) or SRE practices applied to data platforms

What We Offer

Work from anywhere in the US! Machinify is digital-first.

Full Medical/Dental/Vision for employees & their families

Flexible and trusting environment where you’ll feel empowered to do your best work

Unlimited FTO

Competitive salary, equity, and 401(k) with employer match

The salary for this position is based on an array of factors unique to each candidate, such as years and depth of experience, skill set, certifications, etc. The base salary range for this role is $200k-$250k. We are hiring for different levels, and our Recruiting team will let you know if you qualify for a different role/range. Salary is one component of the total compensation package, which includes meaningful equity, excellent healthcare, flexible time off, and other benefits and perks.