Lead Data Scientist - Model Validation


Date: Apr 13, 2019

Location: New York, NY, US

Company: New York Life Insurance Co


A career at New York Life offers many opportunities. To be part of a growing and successful business. To reach your full potential, whatever your specialty. Above all, to make a difference in the world by helping people achieve financial security. It’s a career journey you can be proud of, and you’ll find plenty of support along the way. Our development programs range from skill-building to management training, and we value our diverse and inclusive workplace where all voices can be heard. Recognized as one of Fortune’s World’s Most Admired Companies, New York Life is committed to improving local communities through a culture of employee giving and service, supported by our Foundation. It all adds up to a rewarding career at a company where doing right by our customers is part of who we are, as a mutual company without outside shareholders. We invite you to bring your talents to New York Life, so we can continue to help families and businesses “Be Good At Life.” To learn more, please visit LinkedIn, our Newsroom and the Careers page of www.NewYorkLife.com.


The Center for Data Science and Artificial Intelligence is the innovative corporate Analytics group within New York Life. We are a rapidly growing entrepreneurial department which aims to design, create and offer innovative data-driven solutions for many parts of the enterprise. We are aided by New York Life’s existing business with a large market share in individual life insurance. We have the freedom to explore external data sources and new statistical techniques, and are excited about delivering a whole new generation of Analytical solutions.


In fact, we are designing and will build one of the first multivariate, model-based continuous risk differentiations in the industry. This model will incorporate current underwriting best practices (including medical rules) as features and add other data sources, patterns/ideas and variables to essentially create a rating plan to support the next-generation underwriting process at New York Life. This is just one of several projects with large business value. Geographic analytics on agents and customers, application fraud detection, agent success prediction and client prospecting analytics (offline and online) are other exciting examples of enormous incremental value from analytics. Our products will be implemented into real-time core business processes and decisions that drive the company (e.g., underwriting, pricing, agent recruiting, prospecting, new product development).


We work with data ranging from demographics, credit and geo data to detailed medical data (medical test results, diagnoses, prescriptions) and social media information. We have a modern computing environment with a solid suite of data science/modeling tools and packages, and a large (but manageable) group of well-trained professionals at various levels to support you. Life insurance is on the verge of huge change. This is a chance to be part of, actually to drive, the transformation of an industry. Is this not why we became data scientists?

These models must be validated to ensure they effectively address the business needs and that their assumptions are thoroughly understood. This position sits on the first line. The Model Validator should be able to understand a wide range of models in order to effectively challenge the model development process by assessing: the suitability of the chosen methodology, whether proper testing has been performed, alternative modeling approaches to the one proposed, the most adequate metrics for model accuracy and robustness, whether the model has been properly implemented, and whether the model is properly monitored. All these activities are performed and delivered in an environment that promotes the creation of value and a cycle of constant improvement.

You will apply your highly developed analytical skills to validate models that touch on all aspects of the life insurance value chain, ranging from risk models, fraud detection, process triaging, and marketing predictions to a variety of other analytics solutions. You will apply your high energy level, communication skills and business sense to communicate with model developers and internal stakeholders.


  • Validate statistical/machine learning models. The validation activities include assessment of: data sources, data quality, robustness of methodology, alternative modeling approaches, model testing, model implementation, and model monitoring. 
  • Prepare the validation report; this document details the validation work and provides recommendations to address issues encountered during the model validation activities.
  • Communicate with model developers and relevant stakeholders to gain adequate understanding of all aspects of the modeling process.
  • Communicate with business partners and stakeholders to understand the business needs, data limitations, etc.
  • Determine the most appropriate testing for a given modeling approach.
  • Utilize data wrangling/data matching/ETL techniques while programming in several scripting languages to explore a variety of data sources, gain data expertise, replicate the modeling process and develop challenger models.
  • Perform code review.
  • Ensure that the validation projects are completed on time.
  • Ensure that validated models comply with regulatory and privacy requirements.
  • Keep up-to-date with regulatory and legal requirements.
  • Train and coach model developers on data governance and model validation activities.
  • Present findings and recommendations to different stakeholders like model developers, business partners, Enterprise Risk Management, and Audit.
  • Travel to events as needed (< 10%).


Required qualifications

  • Graduate-level degree with a concentration in a quantitative discipline such as statistics, computer science, mathematics, economics, or operations research, OR Fellowship in one of the Actuarial Societies (CAS or SOA).
  • 5+ years of experience with predictive analytics in an insurance context using large and complex datasets.
  • Experience in model development, with a focus on predictive models. Expertise in modeling techniques such as linear regression, logistic regression, survival analysis (Cox proportional hazards models), Generalized Linear Models (GLM), Robust GLM, regularization techniques (Ridge, Lasso, ElasticNet), decision tree-based models (Random Forests and GBM), cluster analysis, and Principal Component Analysis (PCA).
  • Experience in model validation, with a focus on predictive models.
  • Expertise in performing data wrangling, data matching, and ETL techniques while programming in several languages (R, Python, SAS, SQL and Spark) to extract and transform data from a variety of data sources (Oracle, SQL, Hadoop).
  • Expertise in performing variable selection, feature creation (transformation, binning, high level categorical reduction, etc.) and model validation and testing (hold-outs, CV, bootstrap).
  • Expertise in performing outlier detection, robust statistical modeling (e.g., rank-based regression), design and analysis of experiments, hypothesis testing, convex and non-convex optimization, and partial least squares regression.
  • Expertise in performing data visualization using R Shiny, Spotfire or Tableau.
  • Programming in R, Python, Spark, SAS, and SQL.
  • Familiarity with GitHub/GitLab code sharing/collaboration tools.
  • Experience interfacing with business partners: data governance, regulators, audit, etc.


Location: Manhattan (midtown, walking distance from Penn Station and Grand Central). Relocation is available but remote work is not possible.



