Research and Methodology
Last updated:
1. Why We Published This
Recolect is not a black box. Hiring decisions affect careers, and assessment tools that cannot be interrogated should not be trusted. This page sets out, in plain terms, how assessments are designed, how responses are scored, and where human oversight sits in the process. It is written for three audiences: expert reviewers who need to evaluate the platform's scientific basis; legal and compliance teams assessing regulatory alignment; and candidates who deserve to understand how they are being evaluated. Transparency is a design principle at Recolect, not a disclosure obligation.
2. The Assessment Framework
Every assessment is built against competencies derived from the client brief, not drawn from a fixed question bank. This means each candidate is evaluated against the specific role they applied for, using the criteria the hiring organisation considers essential.
The scoring approach follows Behaviourally Anchored Rating Scales (BARS) principles. Rather than assigning scores based on subjective impression, each rating point on the scale is anchored to observable, role-relevant behaviours derived from the brief and the competency framework.
Seniority tier determines question format, depth of response expected, and the degree of AI involvement in generation and scoring. Entry-level assessments prioritise potential and learning indicators. Executive assessments prioritise strategic judgement, accountability, and proven leadership behaviours, with increased human oversight at every stage.
Technical detail: BARS methodology and validity evidence
Behaviourally Anchored Rating Scales were first formalised by Smith and Kendall (1963) and refined extensively since. The core principle is that rating points are defined by descriptions of actual observable behaviour at each level, rather than adjectives or vague descriptors. This substantially reduces rater subjectivity and increases inter-rater reliability.
Schmidt and Hunter's landmark 1998 meta-analysis, covering 85 years of selection research, established that structured assessment methods consistently outperform unstructured interviews as predictors of job performance. Structured behavioural assessment achieves validity coefficients in the range of r = 0.51 to 0.63 when properly anchored to role-relevant criteria. Recolect's approach is grounded in this evidence base and updated to reflect subsequent validation guidance from the Society for Industrial and Organizational Psychology (SIOP).
The seniority gradient in assessment design reflects findings from multiple benchmark studies, including Criteria Corp's analysis of 35,000 firms, which shows that the cost of a poor executive hire exceeds 200% of annual salary, making rigorous assessment at senior levels commercially as well as ethically warranted.
3. Question Design Principles
Questions are generated fresh for each assessment from the client brief. There is no static bank of questions to memorise, share, or game. This is a deliberate design choice: a fixed bank degrades in validity the moment it becomes known.
Recolect uses three question formats, selected based on seniority tier and competency type:
- Situational judgement. Realistic scenarios presenting a work-relevant dilemma, used primarily at mid to senior levels where judgement quality is the key discriminator. Situational judgement tests achieve high predictive validity for role-specific competencies.
- Open scenario response. Free-text narrative questions that ask candidates to describe how they have handled relevant situations. Used at Senior Manager level and above, where depth of prior experience can be evidenced directly.
- Behavioural questions. Structured around the critical incident technique, asking for specific past examples linked to defined competencies. Anchored to observable behaviours rather than intentions or hypothetical preferences.
Questions adapt in depth, format, and expected response complexity by seniority level. Entry-level assessments target potential and learning agility. Executive assessments target strategic judgement and proven accountability. These tiers are never conflated.
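The tier-to-format mapping described above can be sketched as a simple lookup. This is an illustrative assumption, not Recolect's actual configuration: the tier names and the exact format assignments are invented for the example, based on the usage notes given for each format.

```python
# Hypothetical sketch of question-format selection by seniority tier.
# Tier names and format assignments are illustrative assumptions drawn
# from the descriptions above, not the platform's real configuration.

FORMATS_BY_TIER = {
    "entry": ["behavioural"],                        # potential, learning agility
    "mid": ["situational_judgement", "behavioural"],
    "senior_manager": ["situational_judgement", "open_scenario", "behavioural"],
    "executive": ["open_scenario", "behavioural"],   # strategic judgement, accountability
}

def question_formats(tier: str) -> list[str]:
    """Return the question formats used at a given seniority tier."""
    try:
        return FORMATS_BY_TIER[tier]
    except KeyError:
        raise ValueError(f"Unknown seniority tier: {tier!r}")
```

Keeping the mapping explicit, rather than deriving formats ad hoc per assessment, is one way to ensure the tiers are, as the text puts it, never conflated.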
Technical detail: psychometric design principles
Good psychometric items share three properties: construct validity (they measure what they claim to measure), content validity (they adequately sample the competency domain), and criterion-related validity (scores predict job performance outcomes). Recolect's AI generation process is designed to satisfy all three by anchoring question generation to the brief, competency framework, and seniority-appropriate behavioural anchors.
The British Psychological Society (BPS) guidelines on psychological testing require that assessments be administered under standardised conditions, that instructions be clear and consistent, and that scoring criteria be defined in advance. Recolect satisfies these requirements through automated delivery (standardised by design), plain-language instructions reviewed for readability, and BARS anchors generated from the brief before any candidate responses are reviewed.
The platform does not score grammar, writing style, vocabulary range, or sentence structure. These dimensions correlate with educational background and native language proficiency, not with the competencies being assessed. This is an explicit fairness control, not a technical limitation.
4. Scoring Model
Responses are scored on a 1 to 5 BARS-anchored scale. Each score point corresponds to a behavioural description generated from the brief, not to a general label applied post-hoc. Every score is linked to specific evidence drawn directly from the candidate's response; inferences from tone, writing style, or phrasing are explicitly excluded.
Must-have criteria designated in the brief trigger hard-fail logic. A candidate who does not evidence a must-have criterion receives a fail outcome regardless of performance on other competencies. This prevents high scores on secondary criteria masking a fundamental gap.
Confidence weighting is applied where the evidence is thin. Where a candidate's response is ambiguous or underspecified, the score is flagged rather than inferred, and the recruiter is alerted to probe further in interview.
Score scale reference
| Score | Label | Description |
|---|---|---|
| 5 | Distinguished | Consistently exceeds expectations with clear, specific evidence of advanced capability |
| 4 | Advanced | Exceeds expectations in most areas with well-evidenced competency |
| 3 | Proficient | Meets the expected standard with adequate evidence across the competency |
| 2 | Developing | Below expected standard; evidence is partial or insufficiently specific |
| 1 | Insufficient | Significantly below standard; little or no relevant evidence provided |
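The interaction of the 1 to 5 scale, hard-fail logic, and confidence weighting can be sketched as follows. This is a minimal illustration, not the production scoring model: the data structure names are invented, and the assumption that a must-have is "not evidenced" when scored below 3 (Proficient) is the author of this sketch's reading, not a documented threshold.

```python
# Illustrative sketch (not the production model) combining the 1-5
# BARS scale, hard-fail logic for must-have criteria, and confidence
# flagging for thin evidence. Names and the <3 threshold are assumptions.
from dataclasses import dataclass

LABELS = {5: "Distinguished", 4: "Advanced", 3: "Proficient",
          2: "Developing", 1: "Insufficient"}

@dataclass
class CompetencyScore:
    competency: str
    score: int            # 1-5, anchored to behavioural descriptions
    must_have: bool       # designated as must-have in the client brief
    thin_evidence: bool   # response was ambiguous or underspecified

def assessment_outcome(scores: list[CompetencyScore]) -> dict:
    # A must-have below the expected standard triggers a hard fail,
    # regardless of performance on other competencies.
    hard_fail = any(s.must_have and s.score < 3 for s in scores)
    # Thin evidence is flagged for interview probing, never inferred away.
    probes = [s.competency for s in scores if s.thin_evidence]
    return {
        "outcome": "fail" if hard_fail else "pass",
        "labels": {s.competency: LABELS[s.score] for s in scores},
        "interview_probes": probes,
    }
```

Note how the hard-fail check runs independently of the per-competency labels: a candidate can score Distinguished on every secondary criterion and still fail on a single unmet must-have, which is exactly the masking behaviour the design rules out.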
5. AI Role and Human Oversight
Recolect uses AI at two points in the assessment process. Human oversight is mandatory at every stage where a decision could affect a candidate outcome.
1. Question generation. AI reads the brief and generates competency-anchored questions for the seniority tier.
2. AI scoring. AI scores candidate responses against BARS anchors, flagging thin evidence and hard-fails.
3. Human review. A human reviewer approves the report before it is shared with any client or hiring manager.
Human review before report release is not optional and cannot be bypassed by platform configuration. Every report carries a label identifying which elements were AI-generated, so that any person reading the report can understand the provenance of each finding.
Candidates are informed at the start of their assessment that their responses will be processed by artificial intelligence and reviewed by a human professional before any report is produced.
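The mandatory review gate and provenance labelling described above can be sketched as a release check that simply refuses to run without human approval. The class and function names here are illustrative, not Recolect's actual interfaces.

```python
# Minimal sketch of the mandatory human-review gate: a report cannot
# be released without human approval, and each finding carries an
# AI-provenance label. All names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Finding:
    text: str
    ai_generated: bool  # provenance label shown on the report

@dataclass
class Report:
    findings: list[Finding] = field(default_factory=list)
    human_approved: bool = False

def release(report: Report) -> Report:
    """Refuse release unless a human reviewer has approved the report."""
    if not report.human_approved:
        raise PermissionError("Human review is mandatory before release")
    return report
```

Making the gate a hard precondition of the release path, rather than a configurable flag, mirrors the statement above that review cannot be bypassed by platform configuration.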
6. Fairness, Adverse Impact, and Bias Controls
Recolect monitors for adverse impact, defined as a statistically significant difference in selection rates between protected characteristic groups that cannot be justified by job-relevance evidence. Adverse impact analysis will be conducted as the platform scales and sufficient assessment volume is available for statistically meaningful analysis.
- No shared question banks. Assessment questions are generated fresh per assessment and are never reused across different seniority tiers. This prevents lower-tier questions appearing in senior assessments and vice versa, which would distort both scoring validity and adverse impact analysis.
- Grammar is not scored. Writing quality, vocabulary, and grammatical accuracy are explicitly excluded from all scoring criteria. These dimensions correlate with educational background and native language, not with job-relevant competency, so non-native English speakers are not disadvantaged on those dimensions.
- EU AI Act awareness. Hiring AI systems are classified as high-risk under Annex III of the EU AI Act. Recolect is building toward full compliance with the technical documentation, conformity assessment, and human oversight requirements that classification requires, ahead of the August 2026 obligations date.
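Alongside the significance testing mentioned above, a widely used screening heuristic for adverse impact is the four-fifths (80%) rule: if any group's selection rate falls below 80% of the highest group's rate, the difference warrants investigation. A minimal sketch, with group names and counts invented for the example:

```python
# Illustrative adverse-impact screen using the four-fifths (80%) rule.
# This is a screening heuristic, not the statistical-significance test
# described above; group names and counts are invented for the example.

def selection_rates(groups: dict[str, tuple[int, int]]) -> dict[str, float]:
    """groups maps name -> (selected, applied); returns selection rates."""
    return {name: sel / applied for name, (sel, applied) in groups.items()}

def four_fifths_flags(groups: dict[str, tuple[int, int]]) -> list[str]:
    rates = selection_rates(groups)
    top = max(rates.values())
    # Flag any group whose rate falls below 80% of the highest rate.
    return [name for name, r in rates.items() if r < 0.8 * top]

flags = four_fifths_flags({"group_a": (40, 100), "group_b": (25, 100)})
# group_b's rate of 0.25 is below 0.8 * 0.40 = 0.32, so it is flagged
```

A flag from a screen like this is a trigger for investigation and job-relevance justification, not a verdict in itself, which is consistent with the definition of adverse impact given above.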
7. Regulatory Alignment
Recolect is designed from the ground up to operate within the regulatory frameworks that apply to AI-assisted hiring in the UK and EU:
- UK GDPR and DPA 2018. Candidate data is isolated per client. Lawful basis for processing is documented. Rights including access, rectification, and erasure are supported. See our Privacy Policy for full data handling detail.
- ICO AI in recruitment guidance. The ICO's 2024 guidance on AI in hiring requires transparency, meaningful human involvement, and the ability to explain automated decisions. Each of these requirements is addressed by Recolect's design.
- EU AI Act. Hiring AI is classified as high-risk under Annex III. Recolect is working toward full compliance with the technical documentation, human oversight, and conformity assessment requirements ahead of the August 2026 deadline.
8. Research and Evidence Base
Recolect's methodology is grounded in published research in occupational psychology, psychometrics, and AI fairness. The core evidence base is listed below.
- Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262-274.
- Bartram, D. (2005). The Great Eight competencies: A criterion-centric approach to validation. Journal of Applied Psychology, 90(6), 1185-1203.
- British Psychological Society (BPS). Guidelines on psychological testing and assessment in occupational contexts. Available at: bps.org.uk
- Information Commissioner's Office (2024). Guidance on AI and automated decision-making in recruitment and HR. ICO, United Kingdom.
- Society for Industrial and Organizational Psychology (SIOP) (2023). Guidelines for validation and use of AI-based assessment tools in personnel selection.
- European Parliament and Council of the European Union (2024). Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (EU AI Act). Annex III: High-risk AI systems.
9. Limitations
An honest methodology page must acknowledge what the platform cannot do.
Not a complete assessment
No single-mode assessment can evaluate all competencies relevant to a role. Recolect is designed to complement, not replace, structured interviewing. SIOP guidance explicitly recommends combining assessment methods for final hiring decisions.
Depends on honest engagement
The platform cannot detect whether a candidate engaged with their assessment seriously, had assistance from a third party, or submitted responses that do not reflect their actual experience. Recruiters should treat scored reports as structured evidence to probe, not as conclusive verdicts.
AI scoring is not infallible
AI scoring models can misread nuance, miss context, or apply anchors inconsistently on unusual responses. This is why human review before report release is mandatory. Recruiters are expected to override AI scores where their professional judgement identifies an error.
Not a substitute for structured interviewing
Assessment output generates interview probes and flags evidence gaps precisely because the assessment itself is not the final word. Hiring decisions should always incorporate at least one further structured evaluation stage.
10. Contact
For methodology enquiries, including requests for technical documentation, validation evidence, or regulatory compliance information, please contact methodology@trinnovo.com.