Senior Data Engineer - Center of Data Science


Date: Aug 10, 2019

Location: New York, NY, US

Company: New York Life Insurance Co


A career at New York Life offers many opportunities. To be part of a growing and successful business. To reach your full potential, whatever your specialty. Above all, to make a difference in the world by helping people achieve financial security. It’s a career journey you can be proud of, and you’ll find plenty of support along the way. Our development programs range from skill-building to management training, and we value our diverse and inclusive workplace where all voices can be heard. Recognized as one of Fortune’s World’s Most Admired Companies, New York Life is committed to improving local communities through a culture of employee giving and service, supported by our Foundation. It all adds up to a rewarding career at a company where doing right by our customers is part of who we are, as a mutual company without outside shareholders. We invite you to bring your talents to New York Life, so we can continue to help families and businesses “Be Good At Life.” To learn more, please visit LinkedIn, our Newsroom and the Careers page of


New York Life, the largest writer of retail life insurance in the U.S. and a top player in annuities, long-term care and mutual funds, is seeking a Senior Data Engineer in its Center for Data Science and Artificial Intelligence.

The company has over 150 years of history, and while usable data does not go back quite that far, we have a wealth of internal information on consumers, policies and their performance, as well as applicants, prospects and our 10,000 agents. We also have a multitude of external data from a great variety of sources. New York Life is likely the most data-rich company in the life insurance industry. Analytical challenges range from mortality risk (with a number of both medical and non-medical components) to agent recruiting decisions, consumer analytics (segmentation, response, conversion, retention, up-sell), fraud detection and digital advertising placement.


The Center for Data Science and Artificial Intelligence is the innovative corporate Analytics group within New York Life. We are a rapidly growing entrepreneurial department, which aims to design, create and offer innovative data-driven solutions for many parts of the enterprise. We are aided by New York Life’s existing business with a large market share in individual life insurance. We have the freedom to explore external data sources and new statistical techniques, and are excited about delivering a whole new generation of Analytical solutions.


In fact, we are designing and will build one of the first multivariate model-based continuous risk differentiations in the industry. This model will incorporate current underwriting best practices (including medical rules) as features and add other data sources, patterns/ideas and variables to essentially create a rating plan to support the next generation underwriting process at New York Life. This is just one of several projects with large business value. Geographic analytics on agents and customers, application fraud detection, agent success prediction and client prospecting analytics (off-line and on-line) are other exciting examples of enormous incremental value from analytics. Our products will be implemented into real-time core business processes and decisions that drive the company (e.g. underwriting, pricing, agent recruiting, prospecting, new product development).


We work with data ranging from demographics, credit and geo data to detailed medical data (medical test results, diagnosis, prescriptions) and social media information. We have a modern computing environment with a solid suite of data science/modeling tools and packages, and a large (but manageable) group of well-trained professionals at various levels to support you. Life insurance is on the verge of huge change. This is a chance to be part of, actually to drive, the transformation of an industry.


You will be part of the Data & Platform sub-function team under the Center for Data Science and Artificial Intelligence. The Data & Platform team provides internal services to Data Scientists who focus on statistical analysis.


You will be part of a fast-paced, high-impact team that works with an entrepreneurial mindset, using best-of-breed tools such as R, Spark and Python on our enterprise data lake (Hadoop).


You will apply your data engineering skills to design, develop and enhance data strategy across and within the data science domain. This role provides strategic support to internal teams and leads the design, build and implementation of model-ready data pipelines. It calls for experience in a fast-paced data engineering role with multiple complex data sets.



  • Provide strategy and guidance for the architecture and build of data ingestion jobs that process disparate and unique data sources to form a high-integrity, model-ready dataset.
  • Be a strong communicator. Whether it’s explaining code to a peer, documenting a system for the team, presenting an idea to stakeholders, or speaking clearly and persuasively in both positive and negative situations, you are articulate and engaging.
  • Mentor a team that designs, develops, troubleshoots and debugs programs for databases, applications, tools, networks, etc.
  • Interface with technology and other data teams to manage data across a wide variety of in-house and 3rd party data sources.
  • Be action-oriented and comfortable working with complex data.
  • Function as a data expert and contribute to analytics/solutions design and productizing decisions.
  • Proactively address and resolve technical issues to support key business initiatives.
  • Perform ad-hoc analysis and respond to data/analytical requests.
  • Be a constant learner who considers how advancements in big data tooling and patterns can improve and transform our operations.


Required qualifications

  • Graduate-level degree in computer science, engineering, or relevant experience in the field of Business Intelligence, Data Mining, Database Engineering, Programming
  • 8-10 years of overall experience working in the field of data wrangling and programming with a minimum of 2 years’ experience with ingesting, cleaning, merging and applying necessary data wrangling logic in Hadoop
  • Excellent command of SQL – best practices, optimization, troubleshooting and debugging
  • Fluency using Python for data-related work (e.g. NumPy, pandas, PySpark). Exposure to Git/GitLab
  • Strong knowledge of enterprise platforms, cloud technology and high-performance computing
  • Experience working in a Linux environment
  • Experience building Exploratory Data Analysis reports with histograms, box plots, Pareto charts and scatter plots, using R, Python or a data visualization tool or package (e.g. Tableau, Spotfire)
  • Understanding of statistical modeling concepts, designs and analytics-based products
  • Any experience in using ETL tools such as Informatica, Pentaho, Ab Initio, Talend
  • Any experience working with Data Warehouses and/or Data Marts
  • Any experience working with Salesforce
  • Any experience in Life Insurance business


Other Notes:

Our technology stack is Enterprise Data Lake (using Hortonworks Hadoop Data Platform), Hive, Oracle, Python, Spark, RStudio Pro, IDQ, Trifacta, PySpark, SparkR, Linux, SAS



