CRM data quality improvement through automated duplicate detection

Challenge

In Customer Relationship Management (CRM), the reliability and quality of client registration data is critical to the success of targeted retention campaigns. The problem worsens when a decentralized registration process draws data from diverse sources, which makes the database more prone to duplicated records and data-entry errors.

Existing tools can detect duplicate entries, but they often rely on an exact match on a particular key built from unique personal data such as phone numbers and email addresses. This approach overlooks other essential signals: invalid information or characters in certain fields, and near matches (not exact matches) that are almost certainly typing errors. As a result, inconsistencies remain in the database.

Correcting duplicated contacts manually, one by one, is tedious and time-consuming.

To address this issue, we partnered with a car retailing group on a project whose primary goal was to establish a unified client database while improving the overall accuracy and usability of the CRM. To streamline direct marketing and sales efforts, the project set out to implement a more efficient, automated approach to resolving duplicates, given the central role data quality plays in targeted retention campaigns.

Solution

The strategy followed can be summarized in four steps:

1. Selecting potential duplicate contact pairs: Identify pairs of contacts that may be duplicates based on shared personal information such as phone numbers, emails, or ownership of the same vehicle.
2. Classifying duplicates using similarity scores: Measure the similarity between the contacts in each pair with metrics such as character and phonetic similarity. Use these scores to train a classification model that estimates the probability that each pair is a duplicate.
3. Creating duplicate groups: Group the identified duplicate pairs. Each group should ultimately be represented by a single contact entry in the database.
4. Consolidating client information: For the contact chosen to remain in the database for each group, consolidate the client information by adopting the most reliable and up-to-date data available.
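
The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the record fields (id, name, phone, email, updated), the helper names, and the 0.85 threshold are all hypothetical, and a plain character-similarity ratio from the standard library stands in for the trained classification model.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample records; the field names are assumptions for illustration.
contacts = [
    {"id": 1, "name": "Maria Sliva", "phone": "912345678", "email": "", "updated": "2021-03-01"},
    {"id": 2, "name": "Maria Silva", "phone": "912345678", "email": "maria@example.com", "updated": "2023-06-15"},
    {"id": 3, "name": "Joao Costa", "phone": "934567890", "email": "joao@example.com", "updated": "2022-01-10"},
    {"id": 4, "name": "Maria Silva", "phone": "", "email": "maria@example.com", "updated": "2020-11-20"},
]

def candidate_pairs(records, keys=("phone", "email")):
    """Step 1: pair contacts that share a non-empty value on any key."""
    buckets = defaultdict(list)
    for record in records:
        for key in keys:
            if record[key]:
                buckets[(key, record[key])].append(record["id"])
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs

def name_similarity(a, b):
    """Step 2 (stand-in): character similarity of the names, in [0, 1]."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()

def group_duplicates(pairs):
    """Step 3: merge duplicate pairs into groups with a union-find structure."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = defaultdict(list)
    for x in list(parent):
        groups[find(x)].append(x)
    return [sorted(group) for group in groups.values()]

def consolidate(group, by_id):
    """Step 4: keep the newest record, filling each field with the most recent non-empty value."""
    members = sorted((by_id[i] for i in group), key=lambda r: r["updated"], reverse=True)
    merged = dict(members[0])
    for field in ("name", "phone", "email"):
        merged[field] = next((m[field] for m in members if m[field]), "")
    return merged

by_id = {record["id"]: record for record in contacts}
THRESHOLD = 0.85  # assumed cut-off; a trained classifier would output a probability instead
duplicates = {
    (a, b) for a, b in candidate_pairs(contacts)
    if name_similarity(by_id[a], by_id[b]) >= THRESHOLD
}
groups = group_duplicates(duplicates)
merged = [consolidate(group, by_id) for group in groups]
```

In this sample, contacts 1 and 2 share a phone number and contacts 2 and 4 share an email, so grouping links all three even though contacts 1 and 4 have no key in common. Restricting comparisons to candidate pairs also avoids scoring every contact against every other one.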

Result

We addressed duplicated records by using machine learning models to detect duplicates. Detailed discussions with stakeholders helped us move from theoretical assumptions to a practical implementation that works on real-world data.

As a result of these efforts, we detected and removed duplicate contacts amounting to 12% of the CRM database.

The remaining contacts in the database were consolidated to keep the most trustworthy data for each field. We also proposed a detailed process for converting invalid phone numbers and emails into a standardized format, strengthening the database against potential inconsistencies. Finally, we recommended adopting a more robust unique key system to minimize the risk of duplicates and protect long-term database integrity.
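
As an illustration of what such a standardization step might look like, the sketch below normalizes phone numbers and emails in Python. The rules are assumptions and not the process proposed in the project: the function names, the default country code "351", the nine-digit national length, and the simple email pattern are hypothetical choices. Lowercasing the whole email address is also an assumption that suits matching but is not required by the email standards.

```python
import re

# Deliberately simple pattern: one "@", no whitespace, and a dot in the domain.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def normalize_phone(raw, default_country_code="351", national_length=9):
    """Return '+<country code><number>', or None if the value cannot be repaired."""
    if not raw:
        return None
    text = raw.strip()
    digits = re.sub(r"\D", "", text)  # drop spaces, dashes, dots, parentheses
    if text.startswith("+"):
        pass  # already carries a country code
    elif digits.startswith("00"):
        digits = digits[2:]  # international prefix written as 00
    elif len(digits) == national_length:
        digits = default_country_code + digits  # assume a national number
    else:
        return None
    # E.164 allows at most 15 digits; the lower bound is a loose sanity check.
    if not 8 <= len(digits) <= 15:
        return None
    return "+" + digits

def normalize_email(raw):
    """Return a trimmed, lowercased email, or None if it does not look valid."""
    if not raw:
        return None
    email = raw.strip().lower()
    return email if EMAIL_RE.match(email) else None
```

Values that cannot be repaired come back as None rather than being guessed, so they can be flagged for review instead of silently entering the matching keys.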

Service Applied

Marketing & Customer Analytics

Industry Applied

Transport & logistics, Consumer Goods, Retail & e-commerce

Delivery Mode Applied

Consultancy
