Identification accuracy and confidence

Understand how Fingerprint defines and tests accuracy and how to assess the reliability of visitor IDs.

Fingerprint Identification analyzes browser and device attributes and combines them with machine learning to generate a unique visitor ID. Each attribute adds to the uniqueness of the visitor ID, making it easier to tell devices apart.

There are two primary methods of identification:

  • Deterministic identification: This method uses exact matches between fixed data points (for example, cookies). It identifies a returning visitor with complete certainty.
  • Probabilistic identification: This method analyzes patterns across more than 100 attributes to estimate the likelihood that the data belongs to the same visitor. It produces high-confidence matches, but not with absolute certainty.

Using both methods together allows reliable identification, even when user behavior or device settings change. Accuracy is measured by how often returning visitors are correctly recognized.
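One way to picture how the two methods work together is a deterministic-first lookup that falls back to probabilistic matching. The following is a conceptual sketch only. Every name in it (`Observation`, `storedId`, `identify`, the `probabilisticMatch` callback) is hypothetical and does not describe Fingerprint's internal implementation.

```typescript
// Conceptual sketch only: all names are hypothetical, not Fingerprint internals.

interface Observation {
  storedId?: string;                  // deterministic identifier, e.g. a cookie value
  attributes: Record<string, string>; // signals available for probabilistic matching
}

type Method = "deterministic" | "probabilistic" | "new";

interface Match {
  visitorId: string;
  method: Method;
}

function identify(
  obs: Observation,
  knownIds: Map<string, string>, // stored identifier -> visitor ID
  probabilisticMatch: (attrs: Record<string, string>) => string | undefined,
  newVisitorId: () => string,
): Match {
  // 1. Exact match on a stored identifier: no ambiguity.
  if (obs.storedId !== undefined) {
    const id = knownIds.get(obs.storedId);
    if (id !== undefined) return { visitorId: id, method: "deterministic" };
  }
  // 2. No usable deterministic data (e.g. incognito mode, cleared storage):
  //    fall back to a probabilistic match, which can be wrong.
  const candidate = probabilisticMatch(obs.attributes);
  if (candidate !== undefined) return { visitorId: candidate, method: "probabilistic" };
  // 3. Nothing matched: treat the browser as a first-time visitor.
  return { visitorId: newVisitorId(), method: "new" };
}
```

In this model, only the second branch can produce the identification errors discussed in the next section. The first branch relies on an exact match, and the third branch assigns a fresh ID.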

False positives and negatives

Fingerprint aims for high accuracy, but identification isn't always perfect. A false negative occurs when a returning visitor is not recognized and is assigned a new visitor ID instead of their previous one. A false positive occurs when two different visitors have similar details and are mistakenly assigned the same visitor ID.

These errors occur only with probabilistic identification; deterministic identification recognizes returning visitors without ambiguity. Probabilistic matching is used only when deterministic matching isn’t an option.
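To make the two error types concrete, the following sketch classifies a sequence of identification events against ground truth, such as a labeled test set in which the true identity of each browser is known. It is a hypothetical evaluation helper. `LabeledEvent`, `classifyEvents`, and the outcome labels are illustrative names and not part of any Fingerprint API.

```typescript
// Hypothetical evaluation helper; requires ground-truth identities.

type Outcome = "correct" | "false_negative" | "false_positive";

interface LabeledEvent {
  trueVisitor: string; // ground-truth identity of the browser or device
  assignedId: string;  // visitor ID produced by identification
}

function classifyEvents(events: LabeledEvent[]): Outcome[] {
  const ownerOfId = new Map<string, string>();         // visitor ID -> first true visitor seen with it
  const idsOfVisitor = new Map<string, Set<string>>(); // true visitor -> IDs assigned so far
  const outcomes: Outcome[] = [];
  for (const { trueVisitor, assignedId } of events) {
    const owner = ownerOfId.get(assignedId);
    const previousIds = idsOfVisitor.get(trueVisitor);
    if (owner !== undefined && owner !== trueVisitor) {
      outcomes.push("false_positive"); // another visitor's existing ID was reused
    } else if (previousIds !== undefined && !previousIds.has(assignedId)) {
      outcomes.push("false_negative"); // returning visitor received a different ID
    } else {
      outcomes.push("correct");
    }
    if (owner === undefined) ownerOfId.set(assignedId, trueVisitor);
    if (previousIds === undefined) idsOfVisitor.set(trueVisitor, new Set([assignedId]));
    else previousIds.add(assignedId);
  }
  return outcomes;
}
```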

Fingerprint Identification accuracy

Fingerprint Identification uses deterministic features to identify each visitor's browser. When these features are available, it achieves perfect accuracy.

Sometimes deterministic features aren’t available, for example in incognito mode or after a user deletes their browser data. In these cases, Fingerprint can still identify visitors using advanced probabilistic matching, which reliably re-identifies visitors over time, even if device settings or software change.

To estimate the accuracy of probabilistic matching, we:

  1. Take a dataset of deterministically identified visitors.
  2. Remove the deterministic data and re-identify them using probabilistic methods.
  3. Compare the probabilistic results with the deterministic ones to see how often they match.
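The three steps can be sketched as a simple agreement rate. This is a minimal illustration under stated assumptions. The input shape and the `reidentify` callback are hypothetical placeholders, and the sketch assumes that the probabilistic matcher returns IDs from the same ID space as the deterministic labels, so that a direct comparison is meaningful.

```typescript
// Minimal sketch of the accuracy estimate; names and shapes are hypothetical.

interface DeterministicEvent {
  deterministicVisitorId: string;     // step 1: ID established deterministically
  attributes: Record<string, string>; // step 2: data left after removing deterministic identifiers
}

function estimateProbabilisticAccuracy(
  dataset: DeterministicEvent[],
  reidentify: (attributes: Record<string, string>) => string,
): number {
  if (dataset.length === 0) throw new Error("dataset is empty");
  let matches = 0;
  for (const event of dataset) {
    // Step 2: re-identify using only the probabilistic attributes.
    const probabilisticId = reidentify(event.attributes);
    // Step 3: count how often the result agrees with the deterministic one.
    if (probabilisticId === event.deterministicVisitorId) matches++;
  }
  return matches / dataset.length; // fraction of events re-identified correctly
}
```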

This process yields an accuracy estimate for probabilistic identification. We also monitor accuracy in detail across multiple platforms to maintain our industry-leading accuracy.


What about mobile?

Identification is even more accurate on mobile devices identified with our native mobile SDKs, because they can use signals that are more accurate and longer-lasting than those available for identifying browsers.

Factors affecting accuracy

Accuracy can vary by environment, and 100% accuracy isn’t always possible. Frequent identification is key to maintaining high accuracy, because it lets small changes in a browser or device be picked up as they happen. When too much time passes between identifications, more changes accumulate, which can make probabilistic identification less accurate.

New browser releases can cause changes that briefly lower accuracy. We continuously adjust the matching algorithm to keep up and maintain high accuracy.

Confidence score

When you make an identification request, the response contains a confidence score.

{
  "requestId": "8nbmT18x79m54PQ0GvPq",
  "visitorId": "2JGu1Z4d2J4IqiyzO3i4",
  "confidence": {
    "score": 0.995,
    "revision": 1.1
  }
}

The confidence score shows how sure Fingerprint is that a browser or device hasn’t been wrongly identified as another. It’s a floating-point number between 0 and 1; higher values indicate greater confidence that the identification is correct.
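If you consume this response in TypeScript, you might type it and check the documented 0 to 1 range before relying on the score. The following is a minimal sketch. The field names come from the example response, while `IdentificationResponse` and `parseIdentificationResponse` are hypothetical names. `revision` is typed loosely because this page does not describe its format.

```typescript
// Minimal sketch: type the example response and validate the score range.

interface IdentificationResponse {
  requestId: string;
  visitorId: string;
  confidence: {
    score: number;              // between 0 and 1; higher means more confidence
    revision?: number | string; // present in the example; not interpreted here
  };
}

function parseIdentificationResponse(json: string): IdentificationResponse {
  const data = JSON.parse(json) as IdentificationResponse;
  const score = data?.confidence?.score;
  // Reject missing, non-numeric, or out-of-range scores.
  if (typeof score !== "number" || !(score >= 0 && score <= 1)) {
    throw new Error(`confidence.score missing or out of range: ${String(score)}`);
  }
  return data;
}
```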

How is the confidence score calculated?

In simple terms, the confidence score is calculated as

confidenceScore = 1 - falsePositiveProbability

Here, falsePositiveProbability is the probability, according to our statistical models, that a visitor is mistakenly identified as another existing visitor.
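The simplified formula translates directly into code. This sketch covers only the stated relationship. How falsePositiveProbability is estimated is not shown, so the function takes it as an input supplied by the caller. The function name is hypothetical.

```typescript
// Illustrates only: confidenceScore = 1 - falsePositiveProbability.
function confidenceFromFalsePositiveProbability(falsePositiveProbability: number): number {
  if (!(falsePositiveProbability >= 0 && falsePositiveProbability <= 1)) {
    throw new RangeError("falsePositiveProbability must be between 0 and 1");
  }
  return 1 - falsePositiveProbability;
}
```

For example, a false positive probability of 0 yields a score of 1.0. A false positive probability of 0.005 yields approximately 0.995, the score in the example response.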

When a returning visitor is identified deterministically, the score is 1.0, because the uniqueness of the deterministic properties makes it impossible to misidentify the browser. For events identified probabilistically, the confidence score is always less than 1.0: two browsers with identical configurations may not differ in any observed property, which leaves a chance of mistaking one for the other.

Additionally, a visitor ID’s historical identification confidence affects the current confidence score. Any visitor previously identified with a score below 1.0 will not reach a score of 1.0 again, even for deterministically matched events.
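These rules can be expressed as a consistency check over the event history of a single visitor ID. The following is a hypothetical sketch. It does not compute confidence scores; it only flags events that would contradict the rules described here, including the rule for first-time visitors. `ScoredEvent` and `findRuleViolations` are illustrative names.

```typescript
// Hypothetical consistency check for the documented confidence rules.

type IdMethod = "new" | "deterministic" | "probabilistic";

interface ScoredEvent {
  method: IdMethod;
  score: number;
}

// `history` holds the events for one visitor ID in chronological order.
// Returns the indexes of events that break the documented rules.
function findRuleViolations(history: ScoredEvent[]): number[] {
  const violations: number[] = [];
  let hadScoreBelowOne = false; // has this visitor ID ever scored below 1.0?
  history.forEach((event, i) => {
    // New visitors, and deterministic matches with a clean history, score 1.0.
    // Probabilistic matches, and any event after a sub-1.0 score, stay below 1.0.
    const expectOne =
      event.method === "new" ||
      (event.method === "deterministic" && !hadScoreBelowOne);
    const ok = expectOne ? event.score === 1 : event.score < 1;
    if (!ok) violations.push(i);
    if (event.score < 1) hadScoreBelowOne = true;
  });
  return violations;
}
```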


How is confidence score different from accuracy?

Fingerprint Identification accuracy reflects the probability of correct identification across all identification events. A confidence score, by contrast, is issued for each individual identification event and reflects our estimate of the probability that this particular event was identified correctly.

Using the confidence score in your application

You can use the confidence score to improve security and user experience, for example:

  • Request extra verification: To prevent account takeover, request additional authentication, such as a one-time code, for browsers or devices with a confidence score below 0.95.
  • Personalize the experience: If anonymous users return with a browser identified with a high confidence score, offer them a customized experience.

These thresholds are only examples. The right threshold for your application depends on your requirements and may take some testing to find.
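The two example policies can be expressed as small, configurable checks. This is a minimal sketch. The 0.95 step-up threshold comes from the example above. The 0.99 personalization threshold is a hypothetical value, because this page only says "high confidence score". The names `Thresholds`, `requiresStepUp`, and `canPersonalize` are illustrative.

```typescript
// Minimal sketch of threshold-based policies; tune the values for your application.

interface Thresholds {
  stepUpBelow: number;        // request extra verification below this score
  personalizeAtLeast: number; // hypothetical: personalize at or above this score
}

const exampleThresholds: Thresholds = { stepUpBelow: 0.95, personalizeAtLeast: 0.99 };

function requiresStepUp(score: number, t: Thresholds = exampleThresholds): boolean {
  return score < t.stepUpBelow;
}

function canPersonalize(score: number, t: Thresholds = exampleThresholds): boolean {
  return score >= t.personalizeAtLeast;
}
```

Keeping the thresholds in a single object makes it easier to adjust them while you test which values fit your traffic.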

First-time visitors

Fingerprint identifies a browser or device as new only when it cannot find a match for it. Because no match exists for a first-time visitor, falsePositiveProbability is 0, so the confidence score for a new visitor is 1.0. To distinguish first-time visitors from returning visitors, use the visitorFound property. For visitors identified as new, a false negative error may add slight friction but won't compromise security, since the visitor cannot be mistaken for, and so cannot impersonate, an existing visitor.

If browser properties change significantly, or if the confidence in any potential match is very low, an existing visitor may be identified as a first-time visitor. In that case, the confidence score is still 1.0, because the confidence score does not account for the probability of a false negative.
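Because a first-time visitor always has a score of 1.0, the score alone cannot distinguish new visitors from returning ones. The following is a minimal sketch that checks visitorFound first and only then applies a confidence threshold. It assumes a result object with `visitorFound` and `confidence.score` fields as described on this page. The `Treatment` labels, the `classifyVisitor` function, and the default 0.95 threshold are illustrative.

```typescript
// Minimal sketch; field names follow this page, other names are hypothetical.

interface IdentificationResult {
  visitorId: string;
  visitorFound: boolean;
  confidence: { score: number };
}

type Treatment = "first_time" | "returning_trusted" | "returning_verify";

function classifyVisitor(result: IdentificationResult, stepUpBelow = 0.95): Treatment {
  // New visitors score 1.0, so check visitorFound before looking at the score.
  if (!result.visitorFound) return "first_time";
  // Returning visitor: apply the confidence threshold.
  return result.confidence.score < stepUpBelow ? "returning_verify" : "returning_trusted";
}
```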

Example confidence score calculations