San Francisco Metro Area, CA
If you are passionate about data, passionate about biology, and passionate about their intersection, this is the job for you.
Responsibilities:
- Work with computational and research scientists to understand common analysis use cases and data access needs.
- Design strategies for data storage and integration across different data sources (both internal and external) for multiple use cases.
- Implement, document, and maintain processing pipelines, databases, and data warehouse infrastructure.
- Work closely with full-stack engineers to develop APIs and GUIs for accessing and visualizing scientific data.
- Set data engineering vision and drive both independent and collaborative software development projects end-to-end.
- Contribute to a range of projects, from one-off solutions to long-term, complex systems.
- Build out core infrastructure, tooling, and software development processes.
Requirements:
- 5+ years of experience with contemporary ETL tools and frameworks.
- 3+ years building Python-based back-end systems.
- Fluency in SQL.
- Experience implementing RESTful APIs, GraphQL endpoints, and other programmatic interfaces to complex multidimensional data.
- Experience deploying high-performance data back-ends in the cloud with Amazon Web Services, Heroku, Google Cloud Platform, or a similar service.
- Firm grasp of software testing and test-driven development.
- Demonstrated success in owning projects end-to-end, including working with non-technical stakeholders to define requirements and gather feedback.
Nice to have:
- Experience with machine learning tools and infrastructure, e.g., TensorFlow and PyTorch.
- Experience building back-ends for high-dimensional graph or network data.
- Experience in biology or the life sciences, including familiarity with the databases and data types used by computational biologists.
- Experience building software with technologies like ElasticSearch, GraphQL, and Google Cloud Platform.