The Cornell Wildlife Health Lab (CWHL) at Cornell University’s College of Veterinary Medicine (Ithaca, NY) aims to protect wildlife populations by developing science-based solutions to wildlife health and wildlife management challenges. Working at the intersection of wildlife ecology, veterinary medicine, and data science, our multidisciplinary team conducts disease surveillance, engages in collaborative research, develops new diagnostic tools, and creates training and educational materials for the public and wildlife professionals.
The CWHL is currently leading the Surveillance Optimization Project for Chronic Wasting Disease (SOP4CWD), a regional effort to develop a suite of tools to guide chronic wasting disease (CWD) surveillance and management actions. The project uses traditional and novel techniques from data and statistical science to generate data-driven and scientifically sound recommendations to wildlife management agencies in states and provinces across North America.
The Data Engineer will join the SOP4CWD team and work in collaboration with web developers, data scientists, ecologists, and wildlife professionals to lead the development of a data warehouse and data pipelines that integrate state wildlife agency data resources, statistical models, and web applications for participating wildlife agencies. The Data Engineer will prepare data for and run large analysis processes written in R, Python, and other languages; transition current local processes to cloud services; develop and maintain system metadata and help documentation; and develop, code, test, and maintain frequently updated data resources provided by wildlife agencies. The goal of SOP4CWD is to develop software and data products that ensure surveillance data and decision support tools are available and accessible to wildlife agencies, and the Data Engineer is expected to play a significant role in achieving that goal. Software will be written in languages including, but not limited to, R and Python.
The Data Engineer is expected to:
- Serve as lead on the design of a CWD surveillance data warehouse
- Design, build, and run efficient data engineering pipelines to move data between wildlife agency data systems, the data warehouse, analysis processes, and web applications
- Edit existing R code to reduce runtime through parallelization, modularization, or other modifications as appropriate.
- Work closely and collaboratively with Principal Investigators (PIs), CWHL staff, and wildlife agency staff to define, map, and engineer data transfer streams from current platforms.
- Be highly self-motivated, flexible, and able to develop innovative solutions to technically and logistically complex problems in a highly collaborative working environment.
- Take ownership of assigned projects and drive them to completion in a timely manner.
- Ensure that projects meet technical requirements and comply with Data Use Agreements (DUAs) and security standards.
Required Qualifications
- Bachelor’s degree in Computer Science, Data Science, Statistics, or other relevant technical field
- Experience with database software (such as Postgres, Oracle, SQL Server, and/or MySQL) and cloud computing services (such as Google Cloud, Azure, and/or AWS)
- Experience developing, managing, and automating software in a data science or scientific project setting
- Experience building distributed, reliable data pipelines that ingest and process data
- Experience with scientific computing languages R and/or Python
- Proof of professional liability insurance is required
Preferred Qualifications
- Master’s degree in Computer Science, Data Science, Statistics, or other relevant technical field
- An interdisciplinary background in ecology, natural resources, epidemiology, mathematics, computer science, statistics, and/or data management
- Background in wildlife disease ecology, wildlife management, environmental science, natural resource management, or other related fields
- Experience collaborating with data science and/or programming teams on software projects
- Proven experience writing application requirements and architecting applications.
- Experience building distributed, reliable data pipelines that ingest and process data at scale.
- Prior experience designing/deploying relational database schemas.
- Knowledge of the development/maintenance of databases, data storage, processing, and APIs.
- Knowledge of data security, including the appropriate handling of sensitive data, and the maintenance of data privacy throughout all computations and workflows.
- Understanding of ecological mathematical modeling, including but not limited to agent-based or population-based ecological models.
- Experience developing, automating, and running R and/or Python statistical or scientific applications
- Proficiency in R Shiny
- Proficiency in NetLogo and/or in running computational simulations as an experimental methodology.
- Experience with web application development and user interface design
- Experience working with or for state or federal wildlife agencies
- Dedication to a career track in wildlife and/or natural resources management.
Knowledge, Skills and Abilities
- Ability to write well-abstracted, reusable code components.
- Outstanding communication, collaboration and project management skills.
- Exceptional multitasking ability, accountability, and follow-through.
- Demonstrable skills in problem solving, critical thinking, and written and verbal communication.