Cluster Faster
10 February 2025
Pairing human sensemaking & AI to transform large, unstructured datasets into clustered insights
Problem
Researchers and operations specialists often spend a lot of time analysing large, unstructured datasets. For example, one government agency we spoke to reviews over 100K open-ended responses during peak periods. Another agency processes around 7K customer support tickets per month in a similar way.
Recent advances in Large Language Model (LLM) technology have made processing these datasets much easier, but two key challenges remain:
Verifying against hallucinations – Researchers must verify the output, but this requires manually reviewing data before or after processing, which is slow and error-prone.
Numerical errors – LLMs sometimes miscount data (e.g., a 1.2K-item dataset might be reported as having fewer items), so researchers end up counting manually instead.
Solution
Cluster Faster pairs human sensemaking with LLM analysis, allowing users to review and refine the analysis in-flight while building their own sense of the data. It processes each datapoint individually and logs each result, which eliminates numerical errors and keeps counts quick and accurate.
In early testing with a dataset of ~600 unstructured feedback items, Cluster Faster showed a ~90% match rate with a previous hand-sorted analysis (vs. 68% without human input). After correcting for human error in the hand-sorted analysis, the same results suggest that Cluster Faster’s paired human + LLM analysis process accurately categorised ~95% of the test dataset. Each run took only about 8 minutes.
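To make the match-rate figures concrete, here is a minimal sketch of how such a rate could be computed. It assumes the match rate is the share of datapoints where the tool's category equals the hand-sorted category; the function name and the toy labels are hypothetical and are not the test data.

```python
def match_rate(tool_labels: list[str], hand_labels: list[str]) -> float:
    """Share of datapoints where the tool's label equals the hand-sorted label."""
    if len(tool_labels) != len(hand_labels):
        raise ValueError("both label lists must cover the same datapoints")
    if not hand_labels:
        raise ValueError("cannot compute a match rate for an empty dataset")
    matches = sum(t == h for t, h in zip(tool_labels, hand_labels))
    return matches / len(hand_labels)


# Toy labels for illustration only (not results from the write-up).
hand = ["Staff", "Waiting time", "Other", "Staff", "Waiting time"]
tool = ["Staff", "Waiting time", "Staff", "Staff", "Waiting time"]
rate = match_rate(tool, hand)
```

A disagreement does not always mean the tool was wrong: as noted above, some mismatches were traced to errors in the hand-sorted labels.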
How it works
Upon uploading a dataset, users define tags and categories for analysis and can add short descriptions. In early testing, adding these descriptions improved consistency, reducing variance across runs from 6% to 1%.

Adding categories and descriptions
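As an illustration of the setup step, the sketch below shows one way user-defined categories and their optional descriptions could be folded into a classification prompt for a single datapoint. The `Category` type, `build_prompt` function, and prompt wording are all assumptions made for this example, not Cluster Faster's actual prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    name: str
    description: str = ""  # optional short description supplied by the user


def build_prompt(categories: list[Category], datapoint: str) -> str:
    """Assemble a single-datapoint classification prompt (hypothetical format)."""
    lines = ["Assign the response below to exactly one category.", "", "Categories:"]
    for cat in categories:
        # Append the description only when the user provided one.
        suffix = f": {cat.description}" if cat.description else ""
        lines.append(f"- {cat.name}{suffix}")
    lines += ["", f"Response: {datapoint}", "Answer with the category name only."]
    return "\n".join(lines)


# Hypothetical categories for illustration.
categories = [
    Category("Waiting time", "Complaints about queues or slow replies"),
    Category("Staff", "Comments about staff behaviour"),
    Category("Other"),
]
prompt = build_prompt(categories, "I waited two hours at the counter.")
```

Spelling out what belongs in each category gives the model less room to interpret a bare label differently from run to run, which is one plausible reason descriptions improved consistency in testing.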
Cluster Faster then analyses a random sample and presents it for review. Users can make changes that feed back to the tool. This key step:
Helps users understand the data without a separate review.
Allows fine-tuning without reprocessing the full dataset, saving time.
Automates sampling to reduce effort and errors.
The generative (“learning”) nature of LLMs makes this especially powerful: in early testing, correcting individual datapoints improved accuracy for entire groups of similar datapoints.

Editing sample results
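The sample-and-review step can be sketched roughly as follows. This assumes that user corrections are fed back to the LLM as worked examples in later prompts (few-shot prompting); the write-up does not say how the feedback is actually applied, and all function names here are hypothetical.

```python
import random


def draw_sample(datapoints: list[str], k: int, seed: int = 0) -> list[str]:
    """Pick up to k datapoints at random; a fixed seed keeps the sample reproducible."""
    rng = random.Random(seed)
    return rng.sample(datapoints, min(k, len(datapoints)))


def apply_corrections(proposed: dict[str, str], corrections: dict[str, str]) -> dict[str, str]:
    """Overlay the user's corrected labels on the LLM's proposed labels."""
    reviewed = dict(proposed)
    reviewed.update(corrections)
    return reviewed


def few_shot_block(reviewed: dict[str, str]) -> str:
    """Format reviewed labels as examples to include in later prompts (assumed mechanism)."""
    return "\n".join(f'Response: "{text}" -> {label}' for text, label in reviewed.items())


datapoints = [f"response {i}" for i in range(100)]
sample = draw_sample(datapoints, k=10)
proposed = {text: "Other" for text in sample}  # stand-in for the LLM's first-pass labels
corrections = {sample[0]: "Waiting time"}      # the user fixes one label during review
reviewed = apply_corrections(proposed, corrections)
examples = few_shot_block(reviewed)
```

Because only the sample is reviewed, the user can adjust the analysis without the full dataset being reprocessed after every change.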
Finally, Cluster Faster analyses the full dataset and provides a distribution chart as well as CSV output. It does so by processing each datapoint individually, and hence has no risk of numerical hallucination.

Results are visualised in a bar chart and can be downloaded as a CSV file
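A minimal sketch of the final step, under stated assumptions: each datapoint is classified on its own, every result is logged as a row, and the distribution is tallied from those rows in ordinary code rather than reported by the LLM. The `keyword_classifier` function is a hypothetical stand-in for the per-datapoint LLM call, and the CSV column names are assumed.

```python
import csv
import io
from collections import Counter
from typing import Callable


def classify_all(datapoints: list[str], classify: Callable[[str], str]) -> list[tuple[str, str]]:
    """Classify one datapoint per call and log each result as a (text, category) row."""
    return [(text, classify(text)) for text in datapoints]


def count_labels(rows: list[tuple[str, str]]) -> Counter:
    """Tally categories from the logged rows, so the counts are exact."""
    return Counter(label for _, label in rows)


def to_csv(rows: list[tuple[str, str]]) -> str:
    """Serialise the logged rows as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["datapoint", "category"])
    writer.writerows(rows)
    return buf.getvalue()


def keyword_classifier(text: str) -> str:
    """Hypothetical stand-in for the per-datapoint LLM call."""
    return "Waiting time" if "wait" in text.lower() else "Other"


datapoints = ["Long wait at the counter", "Friendly staff", "Waited 2 hours", "No comment"]
rows = classify_all(datapoints, keyword_classifier)
counts = count_labels(rows)
csv_text = to_csv(rows)
```

Since every datapoint produces exactly one logged row, the category counts always add up to the size of the dataset, which is the property that rules out miscounting.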
User feedback and future opportunity
In the conversations we had with government teams, researchers and operations specialists found value in easily categorising and counting large datasets. Even with existing LLM tools, counting was still a manual task, often done using spreadsheets. They saw a tool like Cluster Faster as a way to save hours per study while improving accuracy. Many guests at Demo Day were similarly interested and wanted to use Cluster Faster in their work.
Testers also wanted an option to generate categories from the data instead of defining them upfront. This is one of their main uses of other LLM tools today, and it carries the same hallucination risk. We have started exploring how Cluster Faster’s human + LLM analysis method can extend to category identification and mitigate hallucination risk in the same way.
Beyond covering more of the end-to-end analysis process, a workflow solution also enables process improvements such as more sophisticated and more robust sampling, subcategory analysis, and better output visualisation and filtering.
Check us out

Explore the prototype here and reach out to us with this form if you want to use Cluster Faster for your own research!
Team members
Tan Jie Yin | Darren Ng