Our collaboration with the Ifop Group was presented at ESOMAR’s annual Congress in Athens, Greece, in the white paper titled “Synthetic Data in Marketing Studies: Exploring the promise of generative AI and synthetic data.” To address a key industry challenge in data collection, we worked with Thomas Duhard, Head of Data Projects at Ifop, to push the boundaries of AI and understand the potential of synthetic samples in our industry’s search for insights.
Below is a brief overview of the content. You can also download the full paper and watch our presentation from the ESOMAR Congress event earlier this month.
What are we looking to achieve with synthetic samples?
Standard data collection practices often struggle to balance fundamental economic and technical factors, such as assuring representativeness, achieving sufficient sample sizes, and maintaining data quality. Augmented respondents offer a straightforward solution to this problem: we narrow the scope and supplement real data with AI-generated synthetic sample boosters.
Understanding boost factors with the most extensive industry benchmarks
In the paper, the authors demonstrate the effectiveness of synthetic sample boosters through over 7,000 parallel tests on datasets from the Pew Research Center. The tests compare real boosts with AI-generated boosts and illustrate how synthetic boosters can improve samples of low-incidence populations that are often hard to analyze.
The paper then explains the methodology behind the calculation of Effective Sample Sizes (ESS) and boost factors, concluding that, on average, a sub-segment boosted by Fairgen is as reliable as three times the amount of real data.
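As a rough illustration of the boost-factor idea, the sketch below assumes a boost factor is the effective sample size divided by the number of real respondents, and that the margin of error shrinks with the square root of the sample size, as it does under simple random sampling. The function names and the sub-segment of 100 respondents are hypothetical, and the paper's own ESS calculation may differ.

```python
import math


def boost_factor(effective_sample_size: float, real_sample_size: int) -> float:
    """Ratio of effective sample size to real respondents (assumed definition)."""
    if real_sample_size <= 0:
        raise ValueError("real_sample_size must be positive")
    return effective_sample_size / real_sample_size


def margin_of_error(n: float, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of a 95% normal-approximation interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)


real_n = 100        # hypothetical sub-segment size, not a figure from the paper
ess = 3 * real_n    # ESS implied by the reported average boost factor of 3

factor = boost_factor(ess, real_n)
moe_real = margin_of_error(real_n)      # margin of error with real data only
moe_boosted = margin_of_error(ess)      # margin of error at the boosted ESS

print(f"boost factor: {factor:.1f}")
print(f"margin of error, real only: {moe_real:.3f}")
print(f"margin of error, boosted:   {moe_boosted:.3f}")
```

Under these assumptions, a boost factor of 3 narrows the margin of error by a factor of the square root of 3, not by a factor of 3.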
Qualitative benchmarking while boosting swing groups in the European election of 2024
The paper then showcases a study on the European elections, in which Ifop augmented a key swing group of secondary school teachers using our synthetic boosts. The political poll included a representative sample of 8,000 French adults, with only 116 respondents from the teacher demographic. By employing augmented synthetic respondents, Ifop boosted this group to 580 respondents, correcting inconsistencies and aligning the sample with sociological plausibility, ultimately providing a better read on this rare demographic’s influence on the election’s outcome. Moreover, the results showed that AI can reliably mimic human responses, enhancing the representation of niche groups.
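To see why a subgroup of 116 respondents is hard to read on its own, the sketch below computes the textbook 95% margin of error for a proportion under simple random sampling. That formula is an assumption made here for illustration and is not the paper's method. The sketch also shows the expansion ratio from 116 to 580. It deliberately computes no margin of error for the 580 figure, because a count of synthetic rows is not the same thing as an effective sample size.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of a 95% normal-approximation interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)


total_sample = 8000      # French adults in the poll (from the study)
teachers_real = 116      # real respondents in the teacher subgroup
teachers_boosted = 580   # subgroup size after synthetic boosting

incidence = teachers_real / total_sample       # share of teachers in the sample
expansion = teachers_boosted / teachers_real   # how much the subgroup grew
moe_full = margin_of_error(total_sample)       # full-sample margin of error
moe_teachers = margin_of_error(teachers_real)  # real-only subgroup margin of error

print(f"incidence: {incidence:.2%}")
print(f"expansion: {expansion:.1f}x")
print(f"margin of error, full sample:   {moe_full:.3f}")
print(f"margin of error, 116 teachers:  {moe_teachers:.3f}")
```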
What’s next for synthetic samples?
While the industry benefits from the economic gain and flexibility offered by augmented synthetic respondents, the paper highlights several key concerns surrounding synthetic data:
- What are the limitations of synthetic data?
- Does synthetic data pose a reliability risk?
- Is synthetic data the latest breakthrough in data collection?
Samuel and Thomas address these challenges and propose responsible deployment strategies to set a standard for ethical and effective use of augmented synthetic samples.
In conclusion, while augmenting real data promises significant benefits in terms of delivering unprecedented granular insights, it is essential to operate within the technology’s limitations. Careful deployment is vital for maintaining data quality and preventing misuse.
Through this collaboration, Fairgen and Ifop demonstrate that synthetic data is a powerful and viable tool for modern quantitative research. When its limitations are acknowledged and its potential is put to full use, synthetic data can drive granular recommendations and propel the industry forward.
Access the full paper and watch our talk from the ESOMAR Congress event here.