Scaling decisions & improving the quality of decisions -- a deep dive on data science & automation in insurance
With Luca Baldassarre, PhD & Lead Data Scientist at Swiss Re
Key Takeaways
Historically, much of the automation/augmentation in insurance has occurred in consistent, structured types of risk (e.g. auto). At a reinsurer like Swiss Re, risk is complex, inconsistent, and global in scale; LLMs are a major unlock. And this is just one of many learnings from Luca.
Topics Covered
- Luca's background: PhD and two post-docs, with a focus on the core math of machine learning
- Transitioned into a startup for 3.5 years, building systems to analyze aerial imagery
- Then, four years ago, joined Swiss Re, a large global reinsurance company, and has since been working to augment its products with data & AI
- Background on Swiss Re's business
- Difference between insurance (e.g. home, dental, auto; what we call "personal lines") and reinsurance
- Some of these insurers need to offset or hedge their risks by "ceding" risk to a reinsurer, which has a much broader and more diverse global portfolio; so if an earthquake strikes California, the losses can be absorbed because earthquakes are very unlikely to strike Japan at exactly the same time
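The diversification argument can be made concrete with a little probability. The sketch below assumes, purely for illustration, that each region independently suffers a major event with the same hypothetical annual probability; real catastrophe risks are neither identical nor fully independent. It shows that two regions being hit in the same year is far rarer than either one being hit, and that the relative volatility of a pooled portfolio shrinks as more independent regions are added.

```python
import math


def prob_at_least_k_events(n_regions: int, p: float, k: int) -> float:
    """Probability that at least k of n independent regions suffer a major
    event in the same year, each with annual probability p (binomial tail)."""
    return sum(
        math.comb(n_regions, j) * p**j * (1 - p) ** (n_regions - j)
        for j in range(k, n_regions + 1)
    )


def coefficient_of_variation(n_regions: int, p: float) -> float:
    """Std/mean of the annual event count across n independent, identical
    regions. It shrinks like 1/sqrt(n): pooling makes the total more predictable."""
    mean = n_regions * p
    std = math.sqrt(n_regions * p * (1 - p))
    return std / mean


# Hypothetical 1% annual probability of a major quake in each region.
p = 0.01
print(prob_at_least_k_events(1, p, 1))  # one given region hit in a year
print(prob_at_least_k_events(2, p, 2))  # e.g. California AND Japan in the same year
print(coefficient_of_variation(1, p), coefficient_of_variation(100, p))
```

Under these assumptions the joint probability is p squared, and going from 1 to 100 independent regions cuts the coefficient of variation by a factor of 10.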
- Swiss Re is structured along 5 major business units
- Life & Health Reinsurance
- P&C (property & casualty) Reinsurance, which includes commercial properties, natural disasters, etc.
- Global Clienting Solutions, which caters to our largest global clients, through which we provide services/solutions for fees
- Public Sector Solutions, which is a mix of solutions, but also risk transfer for public entities, so e.g. the Government of California wants to get insurance to cover the losses of earthquakes that are not covered by private insurance
- Corporate Solutions, which is a primary insurance business that caters to bespoke products for large companies
- Finally, we have a smaller homegrown company that builds white-labeled insurance technology platforms and caters to B2B2C
- Sitting on top of the business units are the "group functions" -- I sit in digital & tech, then you have HR, legal & compliance, risk management, etc.
- What are the primary functions in a given business line?
- Product development, figuring out new products or adjustments to existing products in e.g. flight delay insurance, pet insurance, mobile phone insurance
- Distribution is another major area, but it's less important for us because we are B2B; the primary insurers who are our clients are focused on the distribution/marketing of products -- that's the bulk of their activities
- Underwriting -- there needs to be an underwriter or underwriting system that assesses the risks and determines if it's a risk we want to insure and at what price
- Underwriting is linked tightly to costing & pricing -- this is where the actuaries are doing a lot of work in making cost estimates of different types of risk & coming up with a market competitive price
- Then claims... you need to look at all claims in a manual or automated manner, and compensate/adjust for each of them
- Then there are auxiliary functions around contracts management, and advisory roles... this is becoming a key differentiator for reinsurers, guiding companies into more healthy behavior, or otherwise better understanding risk to get coverage where you need it
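The step from an actuarial cost estimate to a price can be sketched with one textbook relationship: expected loss equals claim frequency times average severity, grossed up so that expenses and a profit margin are covered as fractions of the premium. This is a minimal sketch, not Swiss Re's pricing method; the class, function names, and all numbers are hypothetical, and the "market competitive" part of pricing is not modeled.

```python
from dataclasses import dataclass


@dataclass
class RiskEstimate:
    annual_frequency: float  # expected number of claims per year
    average_severity: float  # expected cost per claim


def expected_loss(risk: RiskEstimate) -> float:
    """Expected annual loss cost = frequency x severity."""
    return risk.annual_frequency * risk.average_severity


def technical_premium(risk: RiskEstimate, expense_ratio: float, profit_margin: float) -> float:
    """Gross up the expected loss so expenses and profit are covered as
    fractions of the premium: P = E[L] / (1 - expense_ratio - profit_margin)."""
    loading = expense_ratio + profit_margin
    if not 0 <= loading < 1:
        raise ValueError("expense_ratio + profit_margin must be in [0, 1)")
    return expected_loss(risk) / (1 - loading)


# Hypothetical risk: a 5% chance of a claim per year, averaging 20,000 per claim.
risk = RiskEstimate(annual_frequency=0.05, average_severity=20_000.0)
print(expected_loss(risk))                   # cost estimate
print(technical_premium(risk, 0.15, 0.05))   # price before market adjustments
```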
- Where's the bulk of the headcount in the business?
- For primary insurance, there's a lot of effort into distribution & sales -- it's very competitive, particularly for standardized product lines
- It depends on size... some insurance companies are just carriers, distributing pre-defined products and applying underwriting rules; maybe they leverage reinsurance to do underwriting and claims settlement
- The largest primary insurers have their own R&D, underwriting, and claims teams, so there the headcount is much more distributed across different functions
- What does a claim between an insurer and reinsurer look like?
- There are two types of reinsurance
- Treaty -- where your primary insurer cedes a whole portfolio of risk, for example all their auto policies in a given region, which may include thousands, or even millions of policies... if there is a natural catastrophe event, e.g. a hail storm, the reinsurer will step in if the insurer has a cash deficit, and in that case we will assess the event
- Facultative, or "single risks" -- e.g. a primary insurer wants to insure a large manufacturing plant and it's too big of an individual risk, so they want to split it apart; often in this case the insurer needs the reinsurer's expertise to underwrite the risk and evaluate the claims
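The interview does not go into contract terms, so the following is only a sketch of two standard ways a loss can be split between insurer and reinsurer, assuming a proportional (quota share) cession for splitting a single large risk and a per-event excess-of-loss layer for a catastrophe treaty. Function names, shares, retentions, and limits are all hypothetical.

```python
def quota_share(loss: float, ceded_share: float) -> tuple[float, float]:
    """Proportional cession: the reinsurer pays `ceded_share` of every loss.
    Returns (retained_by_insurer, paid_by_reinsurer)."""
    if not 0.0 <= ceded_share <= 1.0:
        raise ValueError("ceded_share must be between 0 and 1")
    ceded = loss * ceded_share
    return loss - ceded, ceded


def excess_of_loss(event_loss: float, retention: float, limit: float) -> tuple[float, float]:
    """Non-proportional layer: the reinsurer pays the part of a single event's
    loss above `retention`, capped at `limit`. Returns (retained, ceded)."""
    ceded = min(max(event_loss - retention, 0.0), limit)
    return event_loss - ceded, ceded


# Facultative-style split of a large plant: 75% of any loss is ceded.
print(quota_share(200.0, 0.75))            # (50.0, 150.0)
# Hail storm hitting a treaty portfolio: layer of 80 excess of 50.
print(excess_of_loss(150.0, 50.0, 80.0))   # (70.0, 80.0)
print(excess_of_loss(30.0, 50.0, 80.0))    # (30.0, 0.0): below retention
```

The design difference matters for the workflows described: a proportional share ties the reinsurer to every claim on the risk, while an excess-of-loss layer only responds once an event is large enough to breach the retention.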
- Mailboxes through which we receive submissions from brokers via email, with attached documents that describe the property or individual life; that's still a big part of the software we use
- Specific underwriting software, which is effectively a database but helps with underwriting decisions, and for more standard risks will make the pricing decisions
- Similarly for claims, there's a system that ingests them, and ultimately puts them into data lakes & warehouses
- When did we start transitioning from paper pushing into the first wave of digitization?
- There are different timelines depending on the product; very off-the-shelf/standard products were digitized first, really once the internet enabled people to purchase them
- Fair to say that Swiss Re is involved primarily in less standard types of risk given its focus on reinsurance and complex primary insurance? And therefore less automated today?
- Fair statement, but that's also what we've been addressing... speed matters & we'd like to move more quickly without compromising on underwriting and pricing
- More complex lines are where digitization is struggling; there's so much heterogeneity in the data sources and types of risk
- What's the role of your team?
- Two categories -- scaling decisions & improving quality of decisions -- both depend on the quality of the data we get in, much of which is unstructured in documents
- That's really the biggest challenge for primary and reinsurance; it's a document & information based system
- Things come through normal documents, transcriptions, phone calls, and often in these complex risks, you don't have claims coming in every day, but you do have a very large portfolio over a very long time horizon you can leverage
- That said, we can get data from primary insurers, so we can e.g. get information on auto policies, but this is an easy problem
- Let's think more about insuring flights/airplanes -- there are thousands of flights every day, but there are very few plane crashes -- how do you insure these tail risks, very large, but very rare events?
- How do you address this problem of underwriting/pricing tail risk?
- Alternative datasets can help, but you still need to correlate them & the small number of events still limits you; this is where we're trying to really integrate/blend human judgement and expertise with data-driven approaches
- Purely data driven approaches for rare events tend to fail; underwriting at some point for this very complex risk becomes a bit of an art
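Luca does not name a specific technique for blending judgement with data, but one standard actuarial way to do it for rare events is Bayesian credibility. The sketch below assumes a Gamma prior on the annual event rate (encoding the expert view and how many years of data it is "worth") and Poisson-distributed event counts; the posterior mean is then a weighted average of the expert rate and the observed rate. All parameter names and numbers are hypothetical.

```python
def posterior_event_rate(prior_rate: float, prior_weight_years: float,
                         observed_events: int, exposure_years: float) -> float:
    """Gamma-Poisson update of an annual event rate.

    Expert judgement is a Gamma(alpha, beta) prior with mean `prior_rate`,
    where beta = `prior_weight_years` and alpha = prior_rate * beta.
    With Poisson event counts, the posterior mean is
    (alpha + observed_events) / (beta + exposure_years).
    """
    alpha = prior_rate * prior_weight_years
    beta = prior_weight_years
    return (alpha + observed_events) / (beta + exposure_years)


def credibility_weight(prior_weight_years: float, exposure_years: float) -> float:
    """Share Z of the posterior driven by observed data:
    posterior = Z * (events / exposure) + (1 - Z) * prior_rate."""
    return exposure_years / (prior_weight_years + exposure_years)


# Hypothetical expert view: 2 events per 100 years, worth 50 years of data.
# Observed: no events in 20 years of exposure.
print(posterior_event_rate(0.02, 50, 0, 20))
print(credibility_weight(50, 20))
```

Unlike a purely empirical estimate (0 events / 20 years = 0), the blended rate never collapses to zero just because the tail event has not yet been observed.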
- Breaking down data extraction into 1) extracting known data fields into structured systems 2) extracting information on a bespoke/one-off basis
- The first one is not perfectly solved, but it's much better solved in these structured, well-understood products
- Though in providing consumer flexibility e.g. "take a picture of your car to submit your auto claim", you need to deal with some unstructured data
- Identifying fraud here is also a huge issue; it's very possible now that consumers will use e.g. Stable Diffusion to create deepfaked images of their car, to add damage, and then submit that as a claim
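For the first extraction category (known fields into a structured system), a rule-based baseline is easy to sketch, and it also shows why heterogeneous documents are hard: any change in how a field is labeled defeats the pattern. The field names, label wording, and formats below are hypothetical; missing fields come back as `None` so a human can fill them in rather than the system guessing.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClaimRecord:
    policy_number: Optional[str]
    date_of_loss: Optional[str]
    claimed_amount: Optional[float]


# Hypothetical label patterns; real submissions vary far more than this.
PATTERNS = {
    "policy_number": re.compile(r"Policy\s*(?:No\.?|Number)\s*[:#]?\s*([A-Z0-9-]+)", re.I),
    "date_of_loss": re.compile(r"Date of loss\s*:?\s*(\d{4}-\d{2}-\d{2})", re.I),
    "claimed_amount": re.compile(
        r"Amount claimed\s*:?\s*(?:USD|CHF|EUR)?\s*([\d,]+(?:\.\d+)?)", re.I
    ),
}


def extract_claim(text: str) -> ClaimRecord:
    """Pull known fields out of free text; unmatched fields become None."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        found[name] = match.group(1) if match else None
    amount = found["claimed_amount"]
    return ClaimRecord(
        policy_number=found["policy_number"],
        date_of_loss=found["date_of_loss"],
        claimed_amount=float(amount.replace(",", "")) if amount else None,
    )


sample = "Policy No: AB-12345\nDate of loss: 2023-05-14\nAmount claimed: CHF 12,500.00"
print(extract_claim(sample))
```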
- Is there still work required on the document OCR / ingestion piece?
- Yes, because these documents can be crazily heterogeneous and complicated; information might be spread across multiple pages, tables may be split across pages, and in e.g. medical reports you may get a bundle where you only care about the one page noting the cause of death
- On category 2, extracting signal from unstructured data, what opportunities do you see there?
- This is where recent advances with LLMs and genAI seem to be making a breakthrough
- We get submissions for very complex risks describing the manufacturing plant or different sites; these documents can run to hundreds of pages and the underwriter may be looking for something very specific
- There's a real risk of fatigue reading through 200 pages
- Since ChatGPT, have you observed your colleagues starting to use these tools organically? Or have you tried to disallow this behavior?
- People are experimenting, but within a very conservative boundary
- Some of the things we've tried within these boundaries are quite exciting
- How exactly have you seen this tech changing workflows?
- Technology should be there to augment decision-making, not replace it; humans will be in the loop; GPT should not decide if you should receive health insurance or life insurance
- We just want the models to help humans find information they need, with citations
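A common pattern for "find information, with citations" is to retrieve the few relevant pages of a long submission first, then ask the model to answer only from them, quoting page numbers. The sketch below shows the retrieval and citation-labeling half with a naive term-overlap score standing in for the embeddings or LLM a real system would likely use; it is not Swiss Re's implementation, the function names are hypothetical, and the model call itself is left out.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def top_pages(pages: list[str], query: str, k: int = 3) -> list[tuple[int, float]]:
    """Rank pages by how often query terms occur on them and return
    (page_number, score) pairs, 1-indexed so they can be cited."""
    query_terms = set(tokenize(query))
    scored = []
    for page_no, page in enumerate(pages, start=1):
        counts = Counter(tokenize(page))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((page_no, float(score)))
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


def cited_context(pages: list[str], hits: list[tuple[int, float]]) -> str:
    """Label each retrieved page so the model (and the human) can cite it."""
    return "\n\n".join(f"[p. {n}] {pages[n - 1]}" for n, _ in hits)


bundle = [
    "Cover letter from the broker.",
    "Medical history: hypertension.",
    "Death certificate. Cause of death: myocardial infarction.",
]
hits = top_pages(bundle, "cause of death")
print(hits)
print(cited_context(bundle, hits))
```

Keeping the page labels in the context is what lets a human reviewer verify every answer against the source rather than trusting the model.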
- How have you thought about implementing these models? What challenges have you run into?
- One reason the models are so powerful is that they can address several tasks; you don't need to maintain multiple models for e.g. contracts, engineering reports, etc.; this reduces the "MLOps" effort
- Fine-tuning requires you to maintain multiple fine-tuned versions, and at this point the return on doing that is not so clear
- You get similar advantages just by writing good prompts
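The "one model, many tasks" point can be sketched as a table of prompt templates: supporting a new document type means adding a template, not training, deploying, and monitoring another fine-tuned model. Task names and prompt wording below are hypothetical, and the call to an actual model (e.g. a hypothetical `call_model(prompt)`) is deliberately not implemented.

```python
from string import Template

# One general-purpose model, many tasks: each task is just a different prompt.
# Task names and wording are hypothetical examples.
PROMPTS = {
    "contract": Template(
        "You are reviewing a reinsurance contract.\n"
        "Answer using only the excerpt below and cite page numbers.\n\n"
        "Excerpt:\n$document\n\nQuestion: $question"
    ),
    "engineering_report": Template(
        "You are reviewing an engineering survey of an insured site.\n"
        "List findings relevant to the question, citing page numbers.\n\n"
        "Excerpt:\n$document\n\nQuestion: $question"
    ),
}


def build_prompt(task: str, document: str, question: str) -> str:
    """Select the template for `task` and fill it in.
    Raises KeyError for tasks with no template yet."""
    return PROMPTS[task].substitute(document=document, question=question)


prompt = build_prompt("contract", "[p. 4] Limit: CHF 10m", "What is the limit?")
print(prompt)
```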
- Are these tools something you'd like to buy externally or build in-house?
- It's a challenging question; there's not a general recipe
- Building comes with risk because you can't just build, you need to think about security aspects, and broader workflow -- you need the broader system, not just the data science piece
- For smaller companies, it's easier to just buy; oftentimes insurers will go through us to find these capabilities
- Important to think about what our advantage is, particularly when you have so much available in the open source community
- You also want to be careful about creating too much dependency with a single vendor
- DocQA players today really just look at extracted text sequentially; have you seen anything with more of a visual or spatial understanding?
- The LLMs we've used so far just look at a very large sequence of words, but we've spoken to a couple of emerging vendors whose tools have a visual understanding of the layout of a document, which is very important
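Why layout matters can be seen with a table whose text was extracted column by column: read sequentially, the site names and values lose their pairing. The vendors are not named and their methods are not described, so this is only a sketch of one ingredient of layout awareness, assuming word-level coordinates of the kind OCR engines typically output. The `Word` type, coordinates, and tolerance are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge on the page
    y: float  # top edge on the page


def rows_from_layout(words: list[Word], y_tolerance: float = 3.0) -> list[list[str]]:
    """Group words whose vertical positions are within `y_tolerance` of a
    row's first word into the same visual row, then order each row left to right."""
    rows: list[list[Word]] = []
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        if rows and abs(rows[-1][0].y - word.y) <= y_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    return [[w.text for w in sorted(row, key=lambda w: w.x)] for row in rows]


# Extraction order (column by column) loses which value belongs to which site.
extracted = [
    Word("Site", 10, 100), Word("Zurich", 10, 120), Word("Basel", 10, 140),
    Word("Value", 200, 100), Word("CHF 5m", 200, 121), Word("CHF 2m", 200, 139),
]
print(" ".join(w.text for w in extracted))
print(rows_from_layout(extracted))
```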
- Do you think that the document-based Q&A format will hold as the long-term form factor?
- It's both this and the structured data extraction; both will exist
- There are always going to be these complex cases that require human reasoning & in these cases the doc Q&A format is very helpful
- Zooming out from doc data extraction in underwriting/claims, are there insights you can add in other areas e.g. new product development?
- Two underlying questions... 1) how do we use AI/ML to do product development and 2) how do we feed back insights from claims/underwriting into the products?
- On #2, you go through this review process on an annual or biannual basis to fine-tune parameters in your costing models. This is where you factor in changes in e.g. population longevity, safety standards in cars, etc.
- Oftentimes you are using some sort of data science on claims or additional datasets to better understand trends; data science is helpful here
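One simple way to quantify a trend such as claim frequency falling as car safety improves is a log-linear regression over past years, whose slope gives an annual rate of change to carry into the costing model. This is a minimal sketch on hypothetical data, not the review process described; a real review would also weigh exposure, data credibility, and expert adjustments.

```python
import math


def fit_log_linear_trend(years: list[int], frequencies: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of log(frequency) = a + b * year.
    exp(b) - 1 is the estimated annual rate of change."""
    n = len(years)
    if n < 2 or n != len(frequencies):
        raise ValueError("need at least two matching (year, frequency) pairs")
    logs = [math.log(f) for f in frequencies]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def projected_frequency(a: float, b: float, year: int) -> float:
    """Trend-adjusted claim frequency for a future year."""
    return math.exp(a + b * year)


# Hypothetical claim frequencies falling about 5% per year.
years = [2019, 2020, 2021, 2022, 2023]
freqs = [0.100, 0.095, 0.090, 0.086, 0.081]
a, b = fit_log_linear_trend(years, freqs)
print(f"annual change: {math.exp(b) - 1:.1%}")
print(f"2024 projection: {projected_frequency(a, b, 2024):.4f}")
```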
- On #1, where you don't have prior information / it's a new product, it's often just more standard market sizing / marketing research
- The question is ultimately "does this new data source provide an edge, relative to its cost"
- Oftentimes we identify new data sources that are easier to tap into, cheaper, or allow an easier customer experience (e.g. reducing questions from 20 to 10 when submitting an application)
- Are there other major categories of excitement or time investment for your team that we should be talking about?
- Governance is very important; not everything you can do should be done
- It's key that we understand the regulatory landscape and civil society sentiment, and that these models have a positive impact for the end clients (the policyholders)