VeracityID: Stopping fraud before it starts: August 2018

Monday, August 27, 2018

Insurers can't stop fraud and misrepresentation with only data - Part 1

Auto insurance carriers use third-party data to validate the information customers submit to get a quote. They do this because getting the details of rateable factors wrong almost guarantees that a policy will be a loser, and purchased data offers a way to test what the customer provides.

While useful, this approach has several pitfalls that can lure carriers into a false sense of security:

The Problem of Data Errors. All data sets have errors: mis-keys, lagging data, missing data, and so on. In the real world, a data set with 95% accuracy is almost unheard of. Third-party data sometimes combines multiple data sources, which makes things even worse. For example, a head-of-household data set may be merged with a college marketing data set to identify young drivers living in a household. When data vendors create this type of 'synthetic data', the errors multiply. Imagine two nearly perfect (95% accurate) data sets being combined. If a merged record is correct only when both inputs are correct, and their errors are independent, the new 'synthetic' data's accuracy drops to about 90% (0.95 * 0.95). Combine three such data sets and it falls to about 86% (0.95 * 0.95 * 0.95).
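The compounding arithmetic above can be sketched in a few lines. This is a simplified model, not how any particular vendor builds its products: it assumes a merged record is correct only when every contributing source is correct and that source errors are independent. Real vendor errors may be correlated, which would change the numbers.

```python
def synthetic_accuracy(source_accuracies):
    """Expected accuracy of a record built by merging several data sources.

    Simplifying assumptions (hypothetical model): the merged record is correct
    only if every source is correct, and source errors are independent, so the
    accuracies multiply.
    """
    accuracy = 1.0
    for a in source_accuracies:
        if not 0.0 <= a <= 1.0:
            raise ValueError("accuracy must be between 0 and 1")
        accuracy *= a
    return accuracy


# Merging one, two and three sources that are each 95% accurate.
for n in (1, 2, 3):
    print(n, round(synthetic_accuracy([0.95] * n), 4))
```

Under these assumptions the results are roughly 0.95, 0.90 and 0.86: every source added to the merge takes another bite out of accuracy.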

Consumer errors. Most consumers shop for insurance online, even if they eventually use an agent to complete the transaction. This means that consumer errors from mis-keys, misunderstandings and carelessness are frequently introduced into the process, and those errors magnify the error problems of third-party data.

Fraud and rate manipulation are low-probability events. There are many ways that consumers can manipulate data to reduce their premiums or get payments they don't deserve, but each of these, taken in isolation, is a low-probability event: a few percent at most. With purchased data having error rates of at least 5% and typically 10%, and with consumers making mistakes, the combined number of false positive and false negative errors on any given data diagnostic can easily equal five or ten times the number of true positives.
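To see how the errors can outnumber the true positives, here is a rough model of a single mismatch check on a yes/no field (for example, "is there a young driver in the household?"). Every input is an illustrative assumption rather than measured data: 2% of quotes involve manipulation, the purchased data is wrong 10% of the time, and honest customers mis-key the field 3% of the time, with all three independent.

```python
def diagnostic_outcomes(manipulation_rate, data_error_rate, consumer_error_rate):
    """Expected share of quotes in each outcome for one mismatch check.

    Hypothetical model for a yes/no field, flagging a quote whenever purchased
    data disagrees with what the customer entered:
    - A manipulator always enters the wrong value, so the quote is flagged
      only when the purchased data is right.
    - An honest customer mis-keys with probability consumer_error_rate; a
      mismatch occurs when exactly one of (data, customer entry) is wrong.
    All error sources are assumed independent.
    """
    honest = 1.0 - manipulation_rate
    honest_mismatch = (data_error_rate * (1 - consumer_error_rate)
                       + consumer_error_rate * (1 - data_error_rate))
    return {
        "true_positive": manipulation_rate * (1 - data_error_rate),
        "false_negative": manipulation_rate * data_error_rate,
        "false_positive": honest * honest_mismatch,
    }


# Illustrative inputs only: 2% manipulation, 10% data error, 3% consumer error.
out = diagnostic_outcomes(0.02, 0.10, 0.03)
errors = out["false_positive"] + out["false_negative"]
print(f"errors per true positive: {errors / out['true_positive']:.1f}")
```

With these assumed rates the check produces about 6.9 wrong answers for every manipulator it correctly catches, squarely in the "five or ten times" range. Most of those wrong answers are honest customers being flagged.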

So you can see how impractical it is to use purchased data alone to judge whether a specific customer has submitted accurate data. More work must be done to validate the initial data diagnosis. And critically, most of this work must be done immediately, during the quote session.

It is important to note that this problem isn't particular to insurance. Indeed, if you follow medical research, you'll constantly hear that "studies show" this or that food or activity is healthy or deadly. If you live long enough, you'll hear it described both ways. Using large, complex data sets to diagnose low-probability events is inherently difficult, so difficult that government regulators won't allow drugs, diagnostics or medical devices to be marketed based solely on this form of 'epidemiological' analysis. They require double-blind controlled experiments.

That isn't really possible when quoting insurance online. So what can insurance carriers do to reduce data diagnostic error rates enough to detect, deter and defeat fraud without also rejecting a large proportion of good customers? There are three keys to diagnosing and eliminating misrepresentation and errors in the insurance quote and application process:

Leverage all sources of data, not just purchased ones.
Triangulate between different data types.
When in doubt, ask the customer.

This approach is described in Part 2.

VeracityID solutions detect, deter and defeat the most frequent and costly auto insurance frauds, during quote, binding, endorsement and at claim.

Learn more at VeracityID.com

Insurers can't stop fraud and misrepresentation with only data - Part 2

In Part 1, we explained that there is a fundamental limit to the ability of data alone to identify low-probability events: the error rates inherent in the data and in the acquisition process end up dwarfing the targeted events. The result is far more false positive and false negative errors than true positives.

What can insurance carriers do to reduce their data diagnostic error rates so that purchased data becomes useful? We believe there are three keys to diagnosing and eliminating misrepresentation and errors in the insurance quote and application process, and, critically, they must be applied during the few minutes of the quote session.

Leverage all sources of data/information.
There are five different types of data derived from very different sources.

Synthetic data is built by merging data sets that were usually gathered by other parties for other purposes.
Harvested data is data that the vendor (or carrier) collects itself with the intention of using it to evaluate customers. As a result, it should be more accurate than synthetic data.
Customer data is simply what the customer shares in the quote process.
Customer behavior is what actions the customer takes during the quote or series of quote sessions. People often say one thing but tell a different story with their actions.
Finally, customer interaction is information derived from a direct intervention with the customer.

Triangulate between different sources.

The way to reduce the probability of a false result is to use different types of data from different sources to get a better perspective. For example, if synthetic data says that there is a teenage driver but the customer's data says there isn't, a data-capable carrier would triangulate by first looking at the customer's quote behavior: did the customer get a first quote with the young driver on the policy and then remove that driver on a second quote? Other analytics could test for anomalous relationships between different types of customer data. For example, is there an unusual number of vehicles for the number of listed drivers? Failing that, a carrier could gather customer interaction data by asking questions triggered by the identification of a possible missing driver.
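The triangulation order described above (quote behavior first, then customer-data anomalies, then asking) can be sketched as a simple decision function. This is a hypothetical illustration, not VeracityID's implementation: the `QuoteSession` fields, the age cutoff of 21 and the returned labels are all assumptions made for the example.

```python
from dataclasses import dataclass, field

YOUNG_DRIVER_AGE = 21  # hypothetical cutoff for a "young driver"


@dataclass
class QuoteSession:
    # All fields are hypothetical, for illustration only.
    declared_drivers: list[int]        # ages of drivers listed in this quote
    vehicle_count: int                 # vehicles listed in this quote
    prior_quote_drivers: list[int] = field(default_factory=list)  # ages in earlier quotes
    synthetic_young_driver: bool = False  # third-party data suggests a young driver


def triangulate_missing_young_driver(s: QuoteSession) -> str:
    """Check a possible hidden young driver against other data types."""
    listed_young = any(age < YOUNG_DRIVER_AGE for age in s.declared_drivers)
    if not s.synthetic_young_driver or listed_young:
        return "no_conflict"  # synthetic data and customer data agree
    # Customer behavior: a young driver was quoted earlier, then removed.
    if any(age < YOUNG_DRIVER_AGE for age in s.prior_quote_drivers):
        return "strong_signal"
    # Customer-data anomaly: more vehicles than listed drivers.
    if s.vehicle_count > len(s.declared_drivers):
        return "moderate_signal"
    # Only the synthetic data disagrees: gather interaction data by asking.
    return "ask_customer"


session = QuoteSession(declared_drivers=[45, 43], vehicle_count=3,
                       prior_quote_drivers=[45, 43, 17],
                       synthetic_young_driver=True)
print(triangulate_missing_young_driver(session))
```

Here the earlier quote that included a 17-year-old returns `strong_signal`. The point of the ordering is that each step brings in a different type of data rather than a second opinion from the same source.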

One important caveat: there are often multiple providers of the same data, but one provider's data cannot validate a second provider's if, as is usually the case, both data products are derived from the same underlying data sets. This is why getting customer data, behavior and interaction is so important: it's new information.
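One way to apply this caveat is to group providers by data lineage before counting them as corroboration. In the sketch below, any two providers that share an underlying data set land in the same group, and only separate groups count as independent evidence. The provider names and source labels are hypothetical, and in practice a carrier may not know a vendor's full lineage.

```python
def lineage_groups(providers: dict[str, set[str]]) -> list[list[str]]:
    """Group providers whose products share any underlying data set.

    `providers` maps a (hypothetical) provider name to the underlying data
    sets it is believed to be built from. Providers in the same group cannot
    validate each other; only different groups add independent evidence.
    """
    groups: list[tuple[list[str], set[str]]] = []  # pairwise-disjoint lineages
    for name, sources in providers.items():
        merged_names, merged_sources = [name], set(sources)
        remaining = []
        for names, srcs in groups:
            if srcs & merged_sources:  # shared lineage: merge into one group
                merged_names += names
                merged_sources |= srcs
            else:
                remaining.append((names, srcs))
        groups = remaining + [(merged_names, merged_sources)]
    return [sorted(names) for names, _ in groups]


providers = {
    "VendorA": {"dmv_extract", "credit_header"},
    "VendorB": {"credit_header"},
    "VendorC": {"telematics"},
}
print(lineage_groups(providers))
```

With these made-up inputs, VendorA and VendorB collapse into one group because both draw on the same credit-header data, so agreement between them adds little. Customer data, behavior and interaction would each sit in a lineage of their own.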

When in doubt, ask the customer.
This raises two questions. First, how can you ask customers questions in an automated online quote process? The questioning itself must be automated. Carriers need the capability to trigger specific automated question and data-gathering cascades based upon a specific diagnostic failure. That way the questions are integrated seamlessly into the quote process. It can be done quickly and cost-effectively; we're doing it for carriers today.
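Mapping specific diagnostic failures to question cascades can be sketched as a lookup table. The diagnostic names and question wording below are hypothetical, drawn from the missing-driver and vehicle-count examples in this post. Shared follow-ups are asked only once, so a customer who trips two related checks does not see the same question twice.

```python
# Hypothetical mapping from a failed diagnostic to an ordered question cascade.
QUESTION_CASCADES: dict[str, list[str]] = {
    "possible_missing_driver": [
        "Does anyone else who drives your vehicles live in your household?",
        "Please add that driver to the policy or explicitly exclude them.",
    ],
    "vehicles_exceed_drivers": [
        "Who is the primary driver of each listed vehicle?",
        "Please add that driver to the policy or explicitly exclude them.",
    ],
}


def questions_for(failed_diagnostics: list[str]) -> list[str]:
    """Return the follow-up questions to insert into the quote flow, in order,
    skipping any question already triggered by an earlier diagnostic."""
    seen: set[str] = set()
    questions: list[str] = []
    for diagnostic in failed_diagnostics:
        for question in QUESTION_CASCADES.get(diagnostic, []):
            if question not in seen:
                seen.add(question)
                questions.append(question)
    return questions


for q in questions_for(["possible_missing_driver", "vehicles_exceed_drivers"]):
    print(q)
```

Keeping the cascades as data rather than code is one reasonable design choice, since it lets a carrier tune the wording without redeploying the quote flow.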

Second, how does asking the customer about their data improve the decision, given that it's the customer's veracity that is in question? The answer lies in human nature: people who provide deceptive data know that they are being dishonest. It makes them very uncomfortable when a carrier immediately and specifically asks them about the data they were manipulating. As a result, true manipulators will often abandon the quote. If they don't, the carrier can often take steps to limit liability. For example, in the case of the hidden young driver, the carrier simply insists that the driver either be placed on the policy or be explicitly excluded.

The other reason to talk to the customer is the large number of false positives. It's quite likely that many of them are customer misunderstandings or mistakes. When these people are asked about the data, they don't abandon the quote; they welcome the help in getting the data right so they can get a valid quote. By intervening with these customers, carriers guide them back onto the path to coverage, improving conversion rates while reducing fraud.

Up-front fraud and rate manipulation can be managed.
Historically, the auto insurance industry has been fatalistic about up-front fraud: the dollar value per event was too small and the time frame too short to do much. This is no longer true. Data-capable carriers are using data and real-time interventions to reduce their premium leakage and pre-existing damage claims fraud losses by amounts that, depending on the channel, equal ten to thirty percent of net premiums written.


Thursday, August 2, 2018

The Definition of a Good P&C Insurance Customer

What makes a buyer of personal auto insurance a 'Good' customer? If we define 'good' as 'profitable' over the lifetime of the relationship, then there are really two behaviors that characterize such customers:

Loyalty - the willingness to stick with the same carrier and not churn to the cheapest alternative every year.

Honesty - telling the truth about their risk characteristics up front and filing honest claims thereafter.

Of the two, carriers focus on Loyalty far more than Honesty. They routinely track loyalty and have programs to recognize and reward faithful customers. Yet the value of Honesty is potentially far greater. More specifically, dishonesty costs much more per case than disloyalty does.

This prompts some interesting research questions:

What is the value of Honesty?
How much of the value of Loyalty is really Honesty? In other words, do dishonest customers jump in and then jump out while the honest persist?
How can carriers legally reward honest customers? (More carrot, less stick)
How good are carriers in finding the dishonest in the first place?
If Honesty and Loyalty are highly correlated, should carriers find ways to reward them more?

Let me know if you have any thoughts or reactions. At VeracityID we're focused on finding the answers to these and related questions, for auto insurance and the P&C industry overall.