In life, we often preach quality over quantity. In the market research world, we like to preach quality and quantity.

Quantity speaks to the nature of quantitative studies: larger sample sizes reduce the influence of random variance. Thus, researchers must probe into vast samples to give informed business recommendations. These typically cover a wide range of sub-segments, each requiring a statistically significant pool of respondents.
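
To make the sample-size point concrete, here is a minimal sketch. It assumes a simple random sample and uses the standard normal-approximation margin of error for a proportion. The function name, the 95% z-value of 1.96, and the worst-case proportion of 0.5 are illustrative choices on our part, not figures from any study mentioned here.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion p estimated from n respondents."""
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1 - p) / n)


# Print the margin of error (in percentage points) for a few sample sizes.
for n in (100, 400, 2500):
    print(f"n={n}: +/- {margin_of_error(n) * 100:.1f} points")
```

Under these assumptions, quadrupling the sample halves the margin of error, which is why each sub-segment needs its own sizable pool of respondents.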

But what about quality? What happens when large samples are infected with fraudulent respondents?

Research providers have long prioritized quality, but as the digital landscape becomes increasingly complex, the hunt for authentic data has grown more challenging. Whether the cause is respondents’ lack of genuine engagement or hackers exploiting online systems for rewards, low-quality respondents have become a severe issue.

According to a recent estimate by Kantar Profiles, international hackers completed 15 million surveys in 2023, and the averages are still rising in 2024. To put this in perspective, the issue has grown 780x since 2018.

It’s a widespread concern in our industry. Melanie Courtright, CEO of the Insights Association, has stated, “These issues are global and persistent and must be addressed in a concerted industry-wide manner, we can more effectively and efficiently combat data quality threats by coordinating our efforts.” Alongside the Insights Association’s Data Integrity Initiative, industry giants such as the Market Research Society (MRS), SampleCon, and ESOMAR have joined together to guard against fraud and ensure quality, showcasing the increasing need to be unified in this fight.

It’s time to truly understand what we are fighting against and establish better defense systems. Let’s look at what counts as fraud in market research and which systematic processes we can put in place to build a productive, yet secure and trustworthy, respondent pool.

Who are we on the lookout for?

As technology has advanced, dishonest panelists have adapted, learning to manipulate and exploit digital systems on a broad scale. Taking advantage of the evolving technology, they have become adept at disguising their true identities in various ways. Understanding how these individuals bypass existing safeguards and implementing effective countermeasures has become the backbone of digital sampling.

Lazy panelists

One of the most common types of panelist is the disengaged or lazy respondent. These respondents complete online surveys with little care, interest, or time invested in the process. They often rush through surveys, skip questions, provide minimal content, or give straight-line responses, all of which distort the accuracy of the data and misrepresent the target sample. Their intent may not be malicious, often stemming from poor timing or a lack of connection with the survey, but their responses still undermine the quality of the data collected.

Fraudulent respondents

The most damaging are those who work individually or collaborate in groups, called survey farms, to hack surveys. These are the true fraudsters: they are driven by monetary incentives and often make their livelihood through these exploits. Operating with a high level of sophistication, they use technology to mimic human responses, keep their answers consistent with similar samples, and refine their schemes until they are almost imperceptible. Hiding behind various IPs, ISPs, ghost personas, invalid emails, devices, or bots, these hackers have mastered infiltrating surveys in bulk and collecting rewards at scale.

How do we combat survey fraud?

As scammers take advantage of economic opportunities and leverage AI to both enhance and obscure their activities, it becomes mandatory to implement measures that identify and flag these respondents.

Pre-survey screening

Before launching a survey, it's essential to implement a comprehensive respondent screening and cleaning process. This involves setting a timeline that allows for meticulous vetting to ensure the quality and reliability of the data collected.

One effective approach is to incorporate initial screening questions designed to qualify or disqualify respondents based on their answers. These questions serve a dual purpose:

  • Ensure that only individuals from the specific target audience proceed to the full survey
  • Filter out respondents who may be randomly answering questions solely to meet participation quotas or earn financial incentives

To enhance the effectiveness of this process, screening questions should be carefully crafted to avoid directing respondents toward specific answers. Although there are many best practices for screening and engaging panelists in surveys, let's concentrate on the key pre-survey indicators. The questions should encourage natural responses by:

  • Offering multiple response options
  • Avoiding binary (yes/no) questions
  • Avoiding overly leading prompts

This reduces the risk of introducing bias and helps identify more genuine, engaged participants. By fostering a more authentic respondent pool, the survey results will be more accurate and reflective of the target population's true opinions.

Integrating verification protocols during survey-taking

Innovative approaches in the fraud detection literature have emerged to improve the identification of fraudulent activities in surveys.

The email address score method assigns points based on the structural characteristics of an email address, such as the presence of suspicious patterns, domain irregularities, or anomalies in format. This approach provides a swift and efficient way to flag potentially fraudulent email addresses, allowing for early intervention before data collection is compromised.
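
As a rough illustration of how such a score could work, here is a minimal sketch. The rules, the weights, and the placeholder domain list are hypothetical assumptions on our part, since the method's actual scoring scheme is not detailed here. A higher score means a more suspicious address.

```python
import re

# Hypothetical placeholder list; a real deployment would maintain its own.
DISPOSABLE_DOMAINS = {"throwaway.example", "tempbox.example"}

# Loose structural check: one "@", no spaces, and a dotted domain.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")


def email_risk_score(address: str) -> int:
    """Return an illustrative risk score based only on the address structure."""
    address = address.strip().lower()
    if not EMAIL_RE.match(address):
        return 10  # malformed address: maximum score in this sketch
    local, domain = address.rsplit("@", 1)
    score = 0
    if domain in DISPOSABLE_DOMAINS:
        score += 5  # domain irregularity
    if re.search(r"\d{5,}", local):
        score += 2  # long run of digits in the local part
    if len(local) >= 8 and not re.search(r"[aeiou]", local):
        score += 2  # long local part with no vowels, often machine-generated
    if sum(local.count(ch) for ch in "._+") >= 3:
        score += 1  # unusually many separators
    return score


for candidate in ("jane.doe@company.example", "xkqzt8839201@throwaway.example"):
    print(candidate, email_risk_score(candidate))
```

Addresses above a chosen threshold would be routed to manual review rather than rejected outright, since structural rules alone can misjudge legitimate addresses.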

The post-submission response verification protocol takes a different route by sending follow-up emails to respondents, requesting them to verify specific survey responses they previously submitted. This additional step not only confirms the authenticity of the responses but also deters fraudulent participation by introducing an accountability layer, ensuring that the data collected is coming from reliable sources.

Additional best practices to incorporate are:

  • Mandatory follow-up, such as a short qualitative interview at the end of the survey that revisits a random selection of questions the respondent previously answered, allowing for cross-checking and verification of their responses.
  • Video or audio responses to open-ended questions, which add another layer of verification.
  • Real-time detection systems that monitor and identify unwanted behaviors, such as speeding through questions or providing nonsensical answers.

These methods not only help confirm the authenticity of the data but also discourage fraudulent behavior by introducing an element of personal accountability and deterring those who are just in it for a quick reward.
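
The real-time checks for speeding and nonsensical answers can be sketched as simple heuristics. This is a minimal example under stated assumptions: the function names and every threshold (two seconds per question, a 50% share of fast answers, a 20% vowel share) are hypothetical, and the gibberish check only makes sense for English free text.

```python
import re
from typing import Sequence


def is_speeding(seconds_per_question: Sequence[float],
                min_seconds: float = 2.0,
                max_fast_share: float = 0.5) -> bool:
    """Flag a respondent when too many questions were answered too quickly."""
    if not seconds_per_question:
        return False
    fast = sum(1 for s in seconds_per_question if s < min_seconds)
    return fast / len(seconds_per_question) > max_fast_share


def looks_nonsensical(text: str) -> bool:
    """Crude gibberish heuristic for an English open-ended answer."""
    lowered = text.lower()
    letters = re.sub(r"[^a-z]", "", lowered)
    if len(letters) < 3:
        return True  # too little content to be a meaningful answer
    vowel_share = sum(ch in "aeiou" for ch in letters) / len(letters)
    has_long_repeat = re.search(r"(.)\1{3,}", lowered) is not None
    return vowel_share < 0.2 or has_long_repeat


print(is_speeding([0.8, 1.1, 0.9, 6.0]))
print(looks_nonsensical("asdfgh jkl"))
```

Heuristics like these are best used to trigger a follow-up check, such as a captcha or an attention question, instead of removing the respondent immediately.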

Avoid mixing panels

A critical issue that frequently compromises the integrity of survey data is the mixing of panels. Many organizations encounter situations where the collected data is insufficient to draw reliable conclusions, leading them to supplement it with external sources. However, by relying on third-party platforms for additional market research panels, companies often resort to sources with varying quality standards. This approach to online sampling is often problematic because not all platforms maintain consistent research practices and quality checks or ensure a diverse and representative demographic. As a result, combining panels can lead to inaccurate and biased data, which undermines the validity of the research and can result in misleading conclusions being accepted as fact.

In cases where there are not enough respondents in niche segments, we recommend using AI to generate augmented synthetic respondents with tools like Fairboost. By adhering to the “gold in, gold out” approach, researchers who train AI algorithms with high-quality respondents can confidently gain reliable results in underrepresented groups.

Post-survey quality checks

Bolstering survey integrity can be achieved through a combination of advanced security measures and rigorous quality checks. Implementing VPN blocks, adaptive captchas, and triple-opt-in methods helps to prevent fraudulent participation by ensuring that only genuine respondents can access and complete the survey. Responses should be scrutinized for validity through:

  • open-ended response checks
  • plagiarism detection
  • straight-lining identification
  • outlier analysis
  • consistency checks across answers

Respondents who fail these quality tests may face account suspension, serving as a deterrent for those attempting to manipulate the survey. Post-survey analysis should also include detecting any lingering patterns of undesirable behavior, ensuring that only valid and reliable data is retained.
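
A few of the checks listed above can be sketched in a handful of lines. This is a minimal example, not a definitive implementation: the function names, the field names (age, birth year), and the thresholds are all hypothetical, and the duplicate check only catches verbatim copies after normalizing case and whitespace.

```python
from collections import Counter


def is_straight_lining(grid_answers, min_items=5):
    """True if every answer on a rating grid of at least min_items is identical."""
    return len(grid_answers) >= min_items and len(set(grid_answers)) == 1


def duplicate_open_ends(responses, min_words=4):
    """Return IDs of respondents whose open-ended answer matches another respondent's.

    responses maps respondent ID -> answer text. Very short answers are ignored
    because identical short replies are common among genuine respondents.
    """
    normalized = {rid: " ".join(text.lower().split()) for rid, text in responses.items()}
    counts = Counter(t for t in normalized.values() if len(t.split()) >= min_words)
    return {rid for rid, t in normalized.items() if counts[t] > 1}


def is_inconsistent(age, birth_year, survey_year, tolerance=1):
    """Consistency check: stated age should agree with stated birth year."""
    return abs((survey_year - birth_year) - age) > tolerance


answers = {
    "r1": "Great value for the money",
    "r2": "great value  for the money",
    "r3": "Too expensive for what it offers",
}
print(duplicate_open_ends(answers))
print(is_straight_lining([3, 3, 3, 3, 3]))
print(is_inconsistent(age=25, birth_year=1990, survey_year=2024))
```

Each check returns a flag, not a verdict. Combining several flags per respondent before excluding anyone reduces the risk of discarding genuine answers.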

Top fraud detection solutions in market research

Service providers in the market insight fields understand the demand for quality data checks to ensure the integrity and reliability of collected information. By employing advanced technologies and methodologies, these providers help organizations identify and rectify data inaccuracies, inconsistencies, and anomalies.

Faircheck by Fairgen

Our solution, Faircheck, is a data check platform that enables users to easily upload a dataset and receive a refined, downloadable subset with outliers flagged and explanations for each flag provided. Researchers can upload their surveys to Faircheck, where an outlier detection algorithm filters out respondents based on specific criteria. Outliers may be identified through factors such as response time, open-ended question analysis, local outlier factors, duplicate answers, or custom conditions. This process ensures that only high-quality responses are retained. Users have the flexibility to add or customize these detection criteria to better suit their specific needs, ensuring the final dataset is thoroughly cleaned and reliable.
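
To give a feel for what "flagged with an explanation" can mean for a factor like response time, here is a minimal sketch. It is not Faircheck's algorithm: it swaps in a simple median-based rule (the modified z-score) in place of techniques such as local outlier factor, and the function name and the 3.5 threshold are assumptions made for illustration.

```python
from statistics import median


def flag_duration_outliers(durations, threshold=3.5):
    """Map respondent ID -> explanation for completion times far from the median.

    durations maps respondent ID -> completion time in seconds.
    """
    values = list(durations.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])  # median absolute deviation
    if mad == 0:
        return {}  # no spread to compare against
    flags = {}
    for rid, seconds in durations.items():
        # 0.6745 rescales the MAD so the score is comparable to a z-score
        # when the data are roughly normal.
        score = 0.6745 * (seconds - med) / mad
        if abs(score) > threshold:
            direction = "faster" if score < 0 else "slower"
            flags[rid] = (f"completion time {seconds:.0f}s is much {direction} "
                          f"than the median of {med:.0f}s")
    return flags


times = {"r1": 300, "r2": 320, "r3": 310, "r4": 290, "r5": 45, "r6": 305}
for rid, reason in flag_duration_outliers(times).items():
    print(rid, "->", reason)
```

Pairing each flag with a plain-language reason lets a researcher review the decision before a respondent is removed.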

Data Quality Controls by Potloc

At Potloc, data quality control platforms have been developed using a three-pronged approach. Their tool utilizes specialized survey sources, optimizes respondent experiences, and engages 14 quality checks to ensure the cleanest and most accurate insights. These data quality tests, driven by both human and AI resources, are integrated before, during, and after survey completion to create a secure and reliable process that filters out bots, fraudulent panelists, and duplicate or lazy responses. The final step involves an in-house data cleaning team that meticulously reviews responses to ensure that only relevant, high-quality data is passed on for final evaluation.

Qubed by Kantar

Qubed is another platform for combating survey fraud. This advanced solution uses three Deep Neural AI Networks to analyze over 300 features in real time, delivering fraud detection results within 100 milliseconds. Qubed identifies and removes Lazy, Dishonest, and Fraudulent panelists, ensuring only quality respondents remain. It uniquely detects "out of country" fraud, which accounted for 41% of bad data in 2022, protecting data integrity. Unlike other systems, Qubed evaluates a panelist’s full history, avoiding harsh judgments based on a single session and ensuring fair treatment of good panelists.

ReDem

ReDem is an AI-powered platform that ensures the quality of your data. Functioning both in real time and post-survey, ReDem thoroughly evaluates survey data against comprehensive criteria to eliminate unwanted behaviors. You can adjust quota settings for specific demographics, clean field phase data, apply respondent data weighting, and generate reports for stakeholders. Additionally, ReDem can be integrated directly into survey software, allowing for immediate verification of respondents. This means fraudulent behavior can be detected and flagged before survey completion.

Taking back control

In today’s technology-driven landscape, we stand at an unprecedented crossroads of opportunity and innovation. While the allure of rapid delivery and immediate access to information is compelling, we must not lose sight of the importance of maintaining high quality in our endeavors. In the realm of market research, the advent of generative AI has significantly enhanced our ability to gain deeper insights from surveys. However, with these advancements comes the critical responsibility of safeguarding the integrity of our data. By proactively implementing robust fraud detection and mitigation processes, we can ensure that our findings remain accurate and reliable, even as we navigate the complexities and potential pitfalls of this evolving technological frontier.