METHODOLOGY

Using a leaked report and NIST data to audit facial age estimation AI for efficacy and bias.



As political leaders across Europe promise tougher measures to crack down on migration, governments have increasingly turned to artificial intelligence as a tool for managing borders, screening asylum seekers and assessing claims. Despite the growing reach of these technologies, journalists and civil society actors have often struggled to scrutinise them. Governments release little information about how they work and rebuff attempts to find out more, citing national security concerns or commercial confidentiality.

Migration has long been a testing ground for facial recognition technology. Governments have deployed facial recognition systems at airports, border crossings and visa-processing centres. In Greece, authorities have deployed AI facial recognition in refugee camps, while the European Union’s new Entry/Exit System will rely on facial images collected from millions of non-EU travellers.

Against this backdrop, a closely related technology has quietly begun to surface in asylum systems: facial age estimation. Rather than identifying an individual from an image, these AI systems use a photograph of a person’s face to predict how old they are.

The technology has its roots in systems designed to prevent minors from accessing adult content or products, like cigarettes, alcohol or internet pornography. More recently, however, facial age estimation (FAE) systems have begun to be repurposed for asylum procedures, where they are used to inform decisions that can have life-changing consequences for minors, often at their most vulnerable moments.

European officials already carry out age assessments when asylum seekers lack official identity documents and their age is disputed. These age determinations have profound implications for asylum seekers’ legal protections, housing, the asylum procedure they go through, and access to services like social care. Governments have floated the idea of using AI facial age estimation technology to conduct these age assessments. A European Parliament briefing noted that AI could be “used to provide analytics for assessing elements of asylum applications” such as to “help determine the age of an applicant.”

In November 2025, the UK Home Office announced that it would be piloting the use of FAE to predict the age of asylum seekers and launched a procurement process for the system. In an accompanying press release, the agency wrote that AI would “root out illegal migrants gaming the system by posing as children.” Six months later, in May 2026, it released further details, including the identity of the vendor selected to provide the FAE technology, Cognitec Systems, a Dresden-based facial recognition company.

An investigation by Lighthouse Reports, in collaboration with WIRED and The Independent, examined the system further. This methodology explains how we analysed a leaked internal Home Office report and publicly available data from the National Institute of Standards and Technology (NIST) in the United States, which conducts regular assessments of FAE vendors. We reviewed the Home Office’s testing methodology and findings, and conducted our own analysis of NIST’s benchmark data for Cognitec’s facial age estimation system. Our analysis found evidence of substantial error rates and demographic disparities, particularly affecting groups that make up a large share of asylum seekers subjected to age assessments. The data and code used in this analysis can be found on GitHub.

The Home Office's Own Tests Found Bias Against Sub-Saharan Africans

Lighthouse obtained a leaked internal Home Office report, dated April 2025, that tests seven FAE algorithms. The Home Office tested the algorithms against 2.5 million images, primarily visa and residence permit photographs. A “smaller subset” of the images was of asylum seekers. The testing data also contains images of around 500 people who were photographed at first encounter, which is when age assessments are usually raised. The report notes that the capture quality of those photos was “routinely worse” than that of photos taken of the same people months later.

The Home Office report primarily discusses results from the “best performing supplier.” It is unclear if this is Cognitec, or a different company. The report found that this vendor’s FAE technology tends to overpredict the age of minors. On average, a 17-year-old is predicted to be over 18. It also found that it performs worse on females.

The system described in the report shows significant bias against Sub-Saharan Africans. The average error for Sub-Saharan African males is double that of other groups. According to figures in the report, the average error for Sub-Saharan African females below the age of 18 is even higher, at 4.6 years, meaning that a 14-year-old could potentially be predicted to be an adult.
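Because the report expresses its average error in years, a natural reading is a mean absolute error: the average distance between true and predicted age, regardless of direction. That reading is our assumption; the report's exact metric is not confirmed here. The minimal Python sketch below computes such a figure on invented ages and checks the arithmetic behind the 14-year-old example (14 + 4.6 = 18.6, which is above 18). The function names are hypothetical.

```python
from statistics import mean


def mean_absolute_error(true_ages, predicted_ages):
    """Average size of the age-estimation error in years, ignoring its direction."""
    if len(true_ages) != len(predicted_ages) or not true_ages:
        raise ValueError("need two non-empty lists of equal length")
    return mean(abs(pred - true) for true, pred in zip(true_ages, predicted_ages))


def average_error_reaches_adulthood(true_age, average_error, adult_age=18):
    """Would an overestimate the size of the average error push this person to adulthood?"""
    return true_age + average_error >= adult_age


# Hypothetical ages, for illustration only: errors of 5, 1 and 3 years.
print(mean_absolute_error([14, 16, 17], [19, 15, 20]))
# A 14-year-old overestimated by 4.6 years is predicted to be 18.6.
print(average_error_reaches_adulthood(14, 4.6))
```

An average error only describes typical behaviour: some individuals will be overestimated by much more than 4.6 years and others by much less.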

This is particularly significant because in 2025, according to the Home Office’s own data, the largest group of asylum seekers subjected to age assessments came from Sub-Saharan Africa. In other words, the system performs worst on the group that it is most likely to be used against.

How We Obtained Benchmark Data

The National Institute of Standards and Technology (NIST) in the United States carries out regular assessments of FAE vendors as part of its Face Analysis Technology Evaluation (FATE) program. NIST publishes the aggregated results of these assessments online, including those for Cognitec. We analysed the report card for Cognitec’s most recent FATE submission, made to NIST in April 2026.

NIST’s report card pages contain various charts built with Plotly.js. We extracted the data underlying these visualisations, which is embedded in the pages’ <script> tags. In some cases, we directly analysed metrics reported by NIST. In others, we used the underlying data to calculate additional metrics not explicitly reported (see Appendix I for more information).
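As a rough illustration of this kind of extraction, the Python sketch below collects the text of every <script> element with the standard library's HTMLParser and then parses the chart data passed to Plotly.newPlot. It assumes, without having confirmed it for NIST's pages, that each chart is created by a call of the form Plotly.newPlot("<div id>", <JSON array of traces>, ...) and that the trace array is valid JSON. The regular expression and function names are hypothetical and would need adjusting to the real markup.

```python
import json
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the raw text of every <script> element on a page."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._inside_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._inside_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside_script = False

    def handle_data(self, data):
        if self._inside_script:
            self.scripts[-1] += data


# Assumed call shape: Plotly.newPlot("<div id>", <JSON array of traces>, ...)
NEWPLOT_CALL = re.compile(r"""Plotly\.newPlot\(\s*["'][^"']*["']\s*,\s*""")


def extract_plotly_traces(page_html):
    """Return one list of trace dicts per Plotly.newPlot call found on the page."""
    collector = ScriptCollector()
    collector.feed(page_html)
    collector.close()
    decoder = json.JSONDecoder()
    charts = []
    for script in collector.scripts:
        for match in NEWPLOT_CALL.finditer(script):
            # raw_decode parses one JSON value and ignores whatever follows it.
            traces, _ = decoder.raw_decode(script[match.end():])
            charts.append(traces)
    return charts
```

To run it against a real page, fetch the HTML first (for example with urllib.request.urlopen), decode it to text and pass it to extract_plotly_traces. Each trace typically carries its x and y arrays, which can then be loaded into a table for analysis.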

Photo quality has a large impact on the accuracy of FAE systems. Relevant factors include capture equipment, subject positioning, lighting and background. NIST performs multiple tests, using several datasets of varying photo quality, to measure performance. These include an “Application” dataset consisting of images taken during applicant interviews at an immigration office. These images were “collected using dedicated capture equipment and lighting,” have uniform backgrounds, and are in “good conformance” with the ISO/IEC 19794-5 Full Frontal image standard.

Excerpt from NIST’s FATE report showing example images from the Application dataset.

NIST also tests FAE algorithms using a “Border” dataset, made up of photos that immigration officers took with a webcam “towards a cooperating subject.” NIST says that because the images are captured under time constraints, the faces are not always consistently posed and can sometimes be off-angle or under-exposed, and the backgrounds are not always uniform.

Excerpt from NIST’s FATE report showing example images from the Border dataset.

Based on what the Home Office has said in both internal and external communications, it is possible that the image quality of the “Border” dataset is more representative of the photos its FAE tool will process. In a May publication about its use of FAE technology, the Home Office said that immigration officers make age assessments at the point of first encounter, and are often making those decisions while “under pressure” to quickly process new arrivals. Furthermore, the Home Office writes in its internal report that many asylum seekers may undergo stress-induced ageing from trauma or travel that “appear to impact FAE accuracy,” although it was unable to systematically test this.

The internal report states that the image quality of photos taken at initial encounters is “routinely worse” than that of photos taken of the same people months later. According to the report, the difference was so pronounced that the authors were unable to determine whether poor age estimation results were driven primarily by asylum seekers’ health conditions or by the poor quality of the images taken at first encounter.

This distinction is important because NIST’s evaluation data reflects testing under relatively controlled conditions. It may not fully capture the operational environment in which the Home Office intends to deploy the technology. As a result, real-world performance could differ from the results reported in NIST’s evaluations.

Cognitec's System Misclassifies Many Minors — and Shows Stark Regional Bias

We focused our analysis on minors whom Cognitec’s system classified as adults, in both the Border and Application datasets. NIST’s report cards only publish results for two ages: 16-year-olds and 25-year-olds. We therefore did not focus on adults classified as minors: because few 25-year-olds are predicted to be under 18, the published data reveals little about that type of error.

Amongst the lower-quality Border dataset photos, Cognitec’s system misclassified more than two thirds of 16-year-olds as adults. Even amongst the higher-quality Application dataset photos, more than a third of 16-year-olds were misclassified.

The large difference between the two datasets demonstrates the extent to which image quality and capture conditions affect the accuracy of FAE systems. But it also shows how, even under optimal conditions, the system misclassifies a significant number of minors as adults.

Like the system described in the Home Office report, Cognitec’s FAE system shows significant regional biases, particularly against individuals born in Africa. Across all datasets, the system classifies more than half of 16-year-old West Africans as adults, compared with fewer than a quarter of 16-year-old Eastern Europeans. In 2025, Europeans made up less than two percent of asylum seekers subjected to age assessments.

We also found gender disparities, but these were not consistent across datasets. When assessing images from the Border dataset, Cognitec’s system is more likely to misclassify a 16-year-old boy as an adult than a 16-year-old girl. This disparity flips for the Application dataset, where female minors are more likely to be misclassified than male minors.

A Discussed Mitigation Comes With Its Own Tradeoff

According to both NIST and the Home Office’s internal report, one common way to reduce errors in facial age estimation systems, particularly for individuals close to the age of 18, is to use a higher “challenge age.” A challenge age is essentially a threshold set above 18. For example, if the challenge age is set at 20, anyone predicted to be under 20 is classified as a minor. In practice, the challenge age acts as a cutoff below which the system gives the “benefit of the doubt.”
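The rule can be written as a one-line decision function. This is a minimal sketch, not the Home Office's or Cognitec's implementation: it assumes, following the convention in Appendix I, that a prediction at or above the challenge age is treated as adult and anything below it as a minor. The function name is hypothetical.

```python
def treated_as(predicted_age, challenge_age=18):
    """Hypothetical decision rule: predictions at or above the challenge age are
    treated as adult; anything below it gets the benefit of the doubt."""
    return "adult" if predicted_age >= challenge_age else "minor"


# A hypothetical 16-year-old whom the model estimates to be 19.2 years old:
print(treated_as(19.2, challenge_age=18))  # adult
print(treated_as(19.2, challenge_age=20))  # minor
```

The same prediction leads to opposite outcomes depending only on where the threshold is set.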

The chosen challenge age, if any is chosen at all, has large implications for the share of minors predicted as adults. The higher the challenge age, the fewer minors wrongly predicted as adults.

A higher challenge age also introduces an inevitable tradeoff. Increasing the challenge age threshold reduces the risk of children being treated as adults, but it also increases the number of adults who are treated as children. This is an issue clearly acknowledged in the Home Office’s report, where the authors write: “It is therefore important to consider what threshold is set. A higher threshold reduces the number of children falsely classified as adults (False Reject Rate) but at the expense of increasing the False Accept Rate where adults are misclassified as children.”
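The tradeoff can be illustrated with a small Python sketch. The predicted ages below are invented and are not drawn from NIST or the Home Office. The decision rule (a prediction at or above the challenge age is treated as adult) is an assumption. In the Home Office report's terms, the first share returned corresponds to the False Reject Rate and the second to the False Accept Rate.

```python
def error_rates(minor_predictions, adult_predictions, challenge_age):
    """Return (share of minors treated as adults, share of adults treated as minors),
    assuming predicted age >= challenge_age means 'treated as adult'."""
    minors_as_adults = sum(p >= challenge_age for p in minor_predictions) / len(minor_predictions)
    adults_as_minors = sum(p < challenge_age for p in adult_predictions) / len(adult_predictions)
    return minors_as_adults, adults_as_minors


# Hypothetical predicted ages, for illustration only.
minors = [15.1, 16.8, 17.5, 18.4, 19.2, 20.6, 21.3, 23.0]
adults = [17.9, 19.5, 20.8, 22.1, 23.4, 24.7, 26.0, 28.2]

for threshold in (18, 20, 22, 24):
    minors_as_adults, adults_as_minors = error_rates(minors, adults, threshold)
    print(f"challenge age {threshold}: {minors_as_adults:.1%} of minors treated as adults, "
          f"{adults_as_minors:.1%} of adults treated as minors")
```

With these toy numbers, raising the challenge age from 18 to 24 cuts the share of minors treated as adults from 62.5% to 0%, while the share of adults treated as minors rises from 12.5% to 62.5%. One error can only be reduced at the cost of the other.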

NIST’s Cognitec report card contains detailed data on misclassification rates across different challenge age thresholds, making it possible to quantify the tradeoff.

While increasing the challenge age reduces the overall number of minors classified as adults, it actually increases the relative disparities between gender and ethnic groups. With a challenge age of 18, a West African 16-year-old is more than twice as likely to be misclassified as an adult as an Eastern European 16-year-old. With a challenge age of 24, a West African 16-year-old is more than 10 times as likely to be misclassified as an adult as an Eastern European minor.
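Relative disparities can grow while absolute error rates fall because a ratio compares two rates that are both shrinking. If the reference group's rate shrinks faster, the ratio rises even as the absolute gap narrows. The sketch below uses invented rates chosen only to show this arithmetic. They are not the NIST figures for West African or Eastern European 16-year-olds, and the function name is hypothetical.

```python
def compare_groups(rate_group, rate_reference):
    """Return the absolute gap and the ratio between two misclassification rates."""
    return rate_group - rate_reference, rate_group / rate_reference


# Hypothetical shares of 16-year-olds classified as adults (NOT NIST figures):
# (challenge age, rate for the more affected group, rate for the reference group)
scenarios = [(18, 0.60, 0.25), (24, 0.10, 0.008)]

for challenge_age, group_rate, reference_rate in scenarios:
    gap, ratio = compare_groups(group_rate, reference_rate)
    print(f"challenge age {challenge_age}: gap {gap:.3f}, ratio {ratio:.1f}x")
```

In this toy case the absolute gap falls from 0.35 to 0.092, while the ratio rises from 2.4 to 12.5. Both measures are needed to describe the disparity fully.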

The Home Office’s Response

Lighthouse and its partners sent the Home Office a detailed set of findings and questions about its use of FAE. In response, the Home Office provided the following statement.

“Robust age assessments are a vital tool in maintaining border security and safeguarding children. We have rigorous processes in place to verify an individual’s age, and are working to modernise these through the testing of fast and effective Facial Age Estimation technology.

This groundbreaking assistive tool is designed as an additional source of information for immigration officers, and does not replace or overrule human judgement. In cases of uncertainty, individuals will always be treated as children until a further assessment is conducted.

The world-leading National Physical Laboratory has been commissioned to carry out the independent review of testing and trial reports.”

Cognitec’s Response

Lighthouse and partners similarly sent a detailed set of findings and questions to Cognitec. In an email response, Cognitec wrote that it is “not allowed to comment on the Home Office project, its use case, and their testing methods.”

Cognitec wrote that it could not comment on some of our specific findings because it did not understand where we had obtained the data. We pointed Cognitec to the corresponding NIST report card page, which we had already linked in our initial email, but did not receive further comment. The company commented generally that the demographic biases raised in our analysis “apply to all algorithms on the NIST FATE AEV” and that the “reasons for bias are extremely complex and often related to image quality issues.”

Cognitec wrote that the bias of its algorithms is “low compared to other algorithms of similar accuracy” and that it was working to reduce bias by “developing specific testing methodologies, designing loss functions in our network training, and by diversifying the training and testing data.”

Appendix I. Calculating predicted age distribution

NIST’s report card publishes False Positive Rate (FPR) curves for 16-year-olds across a continuous series of challenge ages. At each challenge age, the FPR is the share of 16-year-olds whose predicted age was at or above that threshold. This means the FPR curve is effectively a survival curve for predicted age.
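Written out in our own notation (not NIST’s), with A the true age, Â the predicted age and c the challenge age, the relationship is:

```latex
\mathrm{FPR}(c) \;=\; \Pr\!\left(\hat{A} \ge c \,\middle|\, A = 16\right) \;=\; S_{16}(c),
\qquad
c_1 < c_2 \;\Longrightarrow\; \mathrm{FPR}(c_1) \;\ge\; \mathrm{FPR}(c_2)
```

Here S_16 is the survival function of predicted age for 16-year-olds. Like any survival function, it can only fall or stay flat as c rises.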

This makes it possible to reverse-engineer the full distribution of predicted ages for 16-year-olds. The share predicted to be a given age is the drop in FPR between that age and the next one up. For example, if 70 percent of 16-year-olds are predicted to be 17 or older, and 60 percent are predicted to be 18 or older, then 10 percent were predicted to be 17 (that is, at least 17 but under 18).
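The differencing step can be sketched in a few lines of Python. This is an illustrative version, not necessarily the exact code in our repository. It takes a mapping from challenge age to FPR and returns the share of 16-year-olds whose predicted age falls in each band between consecutive thresholds. It adds the two open-ended bands below the lowest and at or above the highest threshold, so the shares sum to one. The function name and the (lower, upper) band format are our own choices.

```python
def predicted_age_distribution(fpr_by_challenge_age):
    """Turn an FPR (survival) curve into shares of predicted ages per band.

    fpr_by_challenge_age maps a challenge age c to the share of 16-year-olds
    whose predicted age is at or above c. Returns a dict keyed by
    (lower, upper) bands, where None marks an open end.
    """
    ages = sorted(fpr_by_challenge_age)
    if not ages:
        raise ValueError("need at least one challenge age")
    rates = [fpr_by_challenge_age[age] for age in ages]
    if any(later > earlier for earlier, later in zip(rates, rates[1:])):
        raise ValueError("an FPR curve should not rise as the challenge age rises")

    bands = {(None, ages[0]): 1 - rates[0]}  # predicted below the lowest threshold
    for lower, upper, rate_lower, rate_upper in zip(ages, ages[1:], rates, rates[1:]):
        # Share predicted in [lower, upper): the drop in FPR between the two thresholds.
        bands[(lower, upper)] = rate_lower - rate_upper
    bands[(ages[-1], None)] = rates[-1]  # predicted at or above the highest threshold
    return bands


# The worked example from the text: 70% predicted 17 or older, 60% predicted 18 or older.
example = predicted_age_distribution({17: 0.70, 18: 0.60})
print({band: round(share, 3) for band, share in example.items()})
# {(None, 17): 0.3, (17, 18): 0.1, (18, None): 0.6}
```

Because NIST's curves are sampled across a continuous series of challenge ages, the same function works for thresholds that are not one year apart; the bands are then simply narrower or wider.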