Incomplete data is a common headache in marketing. Accurate segmentation of customer and prospect data is a cornerstone of successful campaigns, but what do you do when data is missing? Leaving gaps isn’t an option for me: I like to ensure the data is as complete as possible, even though filling it in can be a slow and painstaking process. Missing data in a CRM system is bad enough, but inaccurate data is even worse for the CRM’s integrity.
Recently, we migrated to a new CRM system. We also started augmenting our data from external sources, including intent data and additional properties to enhance our marketing and sales automation. This left me with numerous gaps in fields like ‘job role’ and ‘seniority’, and inconsistencies in data structures and labels. Two fields in particular were significant for future campaigns:
Job Role: This field describes the area of work, such as Sales, HR, IT, Finance, etc.
Seniority: This field denotes the level within the organisation, like C-Suite, Head of Function, Director, Mid-Senior, etc.
Around 30% of the records in our CRM were either missing some of these labels or needed translating to the new ‘Job_Role’ categories. Reviewing and updating them manually would have been time-consuming and arduous.
So a little experiment with AI was needed: could it help fill in the blanks by studying our complete data and making inferences? If successful, this approach could also be used for future data imports and for evaluating the accuracy and integrity of new data entries.
You might ask, why not just use a VLOOKUP in Excel? The issue is the vast variation in job titles and how they’re entered by users. A simple lookup wouldn’t work because many queries would not find an exact match. In addition, because the categories are changing, some job titles might not be a straight swap under the new categories. For example, a job title categorised as ‘IT’ for Job_Role might be better categorised as ‘Business Improvement & Innovation’ in the new system.
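To illustrate the exact-match problem, here is a minimal sketch in Python. The lookup table and the job titles are invented for illustration and are not taken from our CRM; the lookup simply behaves like an exact-match VLOOKUP.

```python
# Hypothetical lookup table: exact job title -> job role.
ROLE_LOOKUP = {
    "Sales Manager": "Sales",
    "HR Director": "HR",
    "IT Manager": "IT",
    "Finance Director": "Finance",
}

# Hypothetical titles, written the way users might type them into a form.
entered_titles = [
    "Sales Manager",
    "sales mgr",
    "Head of People & Culture",
    "IT Manager ",  # note the trailing space
    "Dir. of Finance",
]


def exact_lookup(title):
    """Mimic an exact-match VLOOKUP: return None when the title is not found."""
    return ROLE_LOOKUP.get(title)


matches = {title: exact_lookup(title) for title in entered_titles}
unmatched = [title for title, role in matches.items() if role is None]

print(f"{len(unmatched)} of {len(entered_titles)} titles found no match")
```

In this toy example only one of the five titles matches, even though a person would categorise all of them without much trouble.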
My Hypothesis
If an AI model were trained on existing data (job titles, job roles, and seniority), it could infer how to update records and fill in any missing properties.
The Process
- Data Preparation: I took a section of our database (with personal information removed) and ran it through a custom GPT model. This was done on a paid team account, ensuring our data wouldn’t be used to train their models.
- Model Training: I provided the GPT model with tables showing how to convert old job role values to new ones and descriptions for each of the ‘seniority’ values.
- AI Prompting: I prompted the custom GPT to:
– Understand how existing complete data had been categorised into job roles and seniority.
– Act like an API, where I would provide a comma-separated list of properties, and it would return an updated set with the new ‘job_role’ and any missing ‘job_role’ or ‘seniority’ values filled in.
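The ‘act like an API’ exchange can be sketched roughly as follows. This is not the exact format I used: the column order, the field names and the sample reply are assumptions for illustration, and the call to the model itself is left out. The sketch only shows how one record becomes a comma-separated line and how a comma-separated reply is read back.

```python
import csv
import io

# Column order and field names are assumptions for this sketch.
INPUT_FIELDS = ["job_title", "job_role_old", "seniority"]
OUTPUT_FIELDS = ["job_title", "job_role", "seniority"]


def to_csv_line(record, fields):
    """Serialise one record as a single comma-separated line, quoting where needed."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow([record.get(field, "") for field in fields])
    return buffer.getvalue().rstrip("\r\n")


def parse_csv_line(line, fields):
    """Parse one comma-separated reply line into a dict keyed by field name."""
    values = next(csv.reader([line]))
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} values, got {len(values)}: {line!r}")
    return dict(zip(fields, (value.strip() for value in values)))


# A hypothetical record with a missing seniority value.
record = {"job_title": "Director, People Operations", "job_role_old": "HR", "seniority": ""}
request_line = to_csv_line(record, INPUT_FIELDS)

# A hand-written example of what a reply line might look like.
reply_line = '"Director, People Operations",HR,Director'
updated = parse_csv_line(reply_line, OUTPUT_FIELDS)

print(request_line)
print(updated)
```

Using the `csv` module rather than a plain `split(",")` matters here because job titles often contain commas themselves.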
The results showed promise. In a random sample of 100 records, manually checked against LinkedIn, only 3 or 4 were clearly wrong. I suspected these errors were due to the model lacking some additional context, such as variations in job titles across industries and countries (e.g., ‘Head of…’ could be senior or mid-senior, and ‘VP’ is common in the US, where a UK organisation might use ‘Director’).
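A spot check like this can be sketched in a few lines of Python. The function names and the placeholder labels are hypothetical; in practice the ‘checked’ values came from looking people up by hand, not from code.

```python
import random


def draw_sample(records, size, seed=0):
    """Pick a reproducible random sample of records to check by hand."""
    rng = random.Random(seed)
    return rng.sample(records, min(size, len(records)))


def error_rate(predicted, checked):
    """Share of records where the model's value differs from the manually checked one."""
    if len(predicted) != len(checked):
        raise ValueError("predicted and checked must be the same length")
    if not predicted:
        return 0.0
    wrong = sum(1 for p, c in zip(predicted, checked) if p != c)
    return wrong / len(predicted)


# Placeholder data: 100 model outputs, 3 of which disagree with the manual check.
predicted = ["Director"] * 100
checked = ["Director"] * 97 + ["Mid-Senior"] * 3

sample = draw_sample(list(range(1000)), 100)
rate = error_rate(predicted, checked)
print(f"{len(sample)} records sampled, error rate {rate:.0%}")
```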
Refining the Model
I reworked the training data to include industry and country properties:
Job_Title, Industry, Country, Job_Role(new), Seniority
I ran another sample through the refined model, this time scripting the custom GPT to behave as an API and using Google Apps Script in Google Sheets to automate data handling. Of 100 records, only about three had results I might have done differently if updated manually, but none were clearly wrong.
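The automated data handling can be pictured with the sketch below. It is written in Python rather than Apps Script and is not the script I used. The allowed value lists are only the examples mentioned in this post, and `fake_classify` is a hypothetical stand-in for the model call. The idea shown is to accept a result only when both values belong to the known categories and to set everything else aside for manual review.

```python
# Partial value lists based on the examples in this post; a real CRM would have more.
ALLOWED_ROLES = {"Sales", "HR", "IT", "Finance", "Business Improvement & Innovation"}
ALLOWED_SENIORITY = {"C-Suite", "Head of Function", "Director", "Mid-Senior"}


def process_batch(records, classify):
    """Run each record through classify() and split results into accepted and needs-review."""
    accepted, needs_review = [], []
    for record in records:
        result = classify(record)
        role_ok = result.get("job_role") in ALLOWED_ROLES
        seniority_ok = result.get("seniority") in ALLOWED_SENIORITY
        if role_ok and seniority_ok:
            accepted.append({**record, **result})
        else:
            needs_review.append(record)
    return accepted, needs_review


def fake_classify(record):
    """Hypothetical stand-in for the model call, with hard-coded answers for the demo."""
    canned = {
        "Chief Financial Officer": {"job_role": "Finance", "seniority": "C-Suite"},
        "Head of IT": {"job_role": "IT", "seniority": "Head of Function"},
    }
    return canned.get(record["job_title"], {"job_role": "Unknown", "seniority": ""})


# Hypothetical rows using the Job_Title, Industry, Country layout.
rows = [
    {"job_title": "Chief Financial Officer", "industry": "Retail", "country": "UK"},
    {"job_title": "Head of IT", "industry": "Healthcare", "country": "US"},
    {"job_title": "Ninja", "industry": "Software", "country": "UK"},
]

accepted, needs_review = process_batch(rows, fake_classify)
print(len(accepted), "accepted,", len(needs_review), "flagged for manual review")
```

Checking the returned values against a fixed list is a cheap guard against the model inventing a category that does not exist in the CRM.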
Real-World Application
The ultimate test was to use the refined data in a targeted campaign. The results were impressive, with high click-through rates from CTAs and emails and decent landing page completions.
This experiment demonstrated that AI could efficiently fill in data gaps and maintain data integrity in our CRM system, increasing accuracy and ultimately helping us achieve our marketing goals more effectively and efficiently.
Finally, I’ll drop the video below that shows how ChatGPT could look at a set of relational data tables and infer conclusions from the data: