Each year, companies spend billions of dollars gathering survey data to guide product decisions. However, a growing share of this data is AI-generated, and decisions built on it can be misguided and costly.
Unfortunately, a large body of research has shown that detecting AI-generated content from the text alone isn't reliable. At Roundtable, we found a different approach: keystroke tracking. After collecting millions of responses for our survey data-cleaning API, we noticed that humans and AI produce text in fundamentally different ways. Here's what we found.
Building an AI keystroke dataset
Consider this question:
Describe your morning commute - how do you get to work and how long does it take?
And two responses:
- I drive and leave home around 8. It takes about 20 minutes.
- My morning commute covers about 5.2 miles (8.4 kilometers) total. I walk half a mile (0.8 km) from my apartment to Ashmont Station, then take the Red Line subway for about 4.3 miles (6.9 km), followed by another 0.4 mile (0.6 km) walk from the downtown station to my office building. The whole trip typically takes 35 minutes door-to-door.
The second response is obviously AI-generated - it includes unit conversions, unnecessary details, and perfect paragraph structure. Humans could write this way, but they rarely do in a rushed online survey.
AI keystroke patterns
By focusing on questions where human and AI responses clearly diverge, we built a large dataset of labeled responses and their corresponding typing patterns. While the content was often similar, the typing patterns revealed consistent differences.
Human typing has a natural variance. The time between keystrokes follows a characteristic distribution with random pauses and bursts, with pauses most likely at the end of words and phrases. For example:
Humans also make typos and use backspaces and other editing techniques to fix them. By contrast, AI responses typically show unnaturally consistent intervals between keystrokes with minimal variance, and almost never include corrections or backspaces:
This level of keystroke consistency is virtually impossible for humans to replicate.
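To make the contrast concrete, here is a minimal sketch of how interval variance and corrections could be measured from a keystroke log. It is an illustration only, not Roundtable's actual model: the event format (timestamp in milliseconds plus key name), the function names, and the thresholds are all assumptions chosen for the example.

```python
from statistics import mean, pstdev

def keystroke_features(events):
    """Summarize a keystroke log.

    events: list of (timestamp_ms, key) tuples in typing order
    (a hypothetical format chosen for this sketch).
    """
    times = [t for t, _ in events]
    # Time between consecutive keystrokes.
    intervals = [b - a for a, b in zip(times, times[1:])]
    if not intervals:
        return {"n_keys": len(events), "cv": 0.0, "backspace_rate": 0.0}
    avg = mean(intervals)
    # Coefficient of variation: spread of the intervals relative to their mean.
    cv = pstdev(intervals) / avg if avg > 0 else 0.0
    backspaces = sum(1 for _, key in events if key == "Backspace")
    return {
        "n_keys": len(events),
        "cv": cv,
        "backspace_rate": backspaces / len(events),
    }

def looks_scripted(events, min_keys=20, cv_threshold=0.1):
    """Flag logs with near-constant timing and no corrections.

    The thresholds are illustrative assumptions, not tuned values.
    """
    f = keystroke_features(events)
    return (
        f["n_keys"] >= min_keys
        and f["cv"] < cv_threshold
        and f["backspace_rate"] == 0.0
    )
```

A log with one key every 50 ms and no backspaces would be flagged, while a log with irregular pauses and a correction would not. A real classifier would likely combine many more signals than these two.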
Other AI responses look like human-AI hybrids, where AI pipes in text programmatically and humans then edit the responses:
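One rough way to picture the hybrid pattern is to track how much the text grows with each input event. A jump of many characters at once suggests text arriving programmatically or by paste, and small changes afterwards suggest manual editing. The sketch below assumes a hypothetical log of field lengths after each event and an arbitrary threshold. It is not a description of how Roundtable detects hybrids.

```python
def classify_input(lengths, bulk_threshold=20):
    """Label a response as "typed", "bulk", or "hybrid".

    lengths: length of the text field after each input event, in order
    (a hypothetical log format). bulk_threshold is an assumed cutoff for
    how many characters can plausibly arrive in a single event.
    """
    # Change in text length caused by each event, starting from an empty field.
    deltas = [b - a for a, b in zip([0] + lengths, lengths)]
    bulk_chars = sum(d for d in deltas if d >= bulk_threshold)
    small_edits = sum(1 for d in deltas if d != 0 and abs(d) < bulk_threshold)
    if bulk_chars == 0:
        return "typed"
    return "hybrid" if small_edits > 0 else "bulk"
```

For instance, a field that jumps to 250 characters in one event and then changes by one character at a time would be labeled "hybrid" under these assumptions.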
Real-world impacts
To demonstrate how bad actors corrupt survey data, we ran a test study about an intentionally bad product - a solar-powered refrigerated hat. By classifying responses as “Human” or “AI”, we found that the AI responses created two types of problems in downstream analysis.
First, AI responses showed consistently higher willingness to pay ($135 for AI vs. $40 for humans), replicating a general pattern of AI optimism compared to human responses.
Second, AI responses added noise. Whereas human responses formed logical customer segments (motivated, indifferent, etc.), the responses flagged as AI formed nonsensical ones - for example, a segment that rated the hat as highly useful but was unwilling to pay for it.
Conclusion
Identifying AI content through text analysis is extremely difficult and usually unreliable. We suggest that analyzing typing patterns can be more robust. Keystroke data provides clear signals that are much harder to fake (and often easier to interpret) than content alone. Of course, this is a cat-and-mouse game. As detection methods evolve, so do evasion techniques, and we're constantly updating our models.
Mathew Hardy is the CTO and co-founder of Roundtable, an AI fraud detection platform.