Vaccinating Europe’s Undocumented: A Policy Scorecard

We developed the methodology in collaboration with experts on access to health for undocumented people, as well as data journalists and data scientists.

Below is a short explanation of how we went about developing the Scorecards, from the data collection process, to data validation and analysis. If you would like to read a more in-depth explanation, and access the materials we used for the Scorecards, you can click here.

Data Collection

Lighthouse Reports, in consultation with PICUM (an umbrella group for organisations providing assistance to and advocating for the rights of undocumented migrants in Europe), developed a Scorecard to assess the transparency and accessibility of the coronavirus vaccine to undocumented people in Europe and the United Kingdom, according to official national policies.

To complete the Scorecard, Lighthouse Reports recruited volunteer researchers for each of the countries in the study. Most of the volunteers came from Birmingham University’s Master’s in Data Journalism. Volunteers were selected based on their availability, language proficiency, and previous experience of and interest in collaborative data journalism projects. In a few cases, one volunteer was responsible for two countries.

Researchers attended a virtual orientation session in which they were introduced to the project and scorecard methodology. The volunteers were responsible for collecting material for analysis, following the points below on the type of material needed:

– Official vaccine policies, at the national level
– National vaccination implementation plans
– National vaccination registration website

This is a non-exhaustive list of acceptable sources:

– Official national document: official vaccine policy, national implementation plan
– Other government communication: government press release, speech by authorities, parliamentary records, social media post from official government account
– Media: news items (print, online, TV, radio, etc.)
– NGO: civil society organisation press release, statement, NGO information on government policy
– Registration website: vaccine registration website
– Academic source: academic analysis of government policy or implementation

This is a non-exhaustive list of materials to exclude:

– Research reports
– Advocacy statements, press releases
– Unverified/Unofficial statements
– Statements that cannot be found online (e.g. a TV statement which is untraceable)

Volunteers collected all relevant material and registered it in a Google Sheet, which has been cleaned and is stored here. This spreadsheet served as the basis for the data collection and inputting. All documents were given a country and number code to identify them. All materials were also collected in separate country Google Drive folders.

Following document collection, volunteers were provided with the questions included in the questionnaire for analysis, and were asked to identify whether the documents they found were relevant for answering scorecard questions. They marked a document with Yes or No respectively.

During a joint sprint session, the volunteers convened online with Eva Constantaras, Lighthouse Reports Data Editor, and Francesca Pierigh, Project Coordinator, for the data inputting sessions. Volunteers were asked to answer all questions to the best of their abilities, and doubts were addressed by the organisers. Upon completion of the questionnaires, volunteers were asked to double-check that all materials used to answer questions were appropriately marked as such in the Data Collection spreadsheet.

Following the data inputting session, the data cleaning process took approximately two months and included a round of questions specifically targeted to each researcher. Cleaning steps included:

– Verifying that the answer, source document, document citation and research notes were consistent
– Reformulating questions for clarity
– Refining and recoding question responses for clarity
– Supplementing responses with additional source material when available

Because three researchers became unavailable and there was no in-house knowledge of the relevant languages, three countries were dropped during the data cleaning process: Sweden, Hungary and Croatia.

The data cleaning continued for the remaining 21 countries. This process led to the elimination of a number of questions that were deemed either so vague that responses were inconsistent or so specific that the majority of researchers could not answer them.

The first data collection session took place in June 2021. Additional data collection took place over the following months, through September 2021. The data collected covers the period from November 2020 to July 2021. Data cleaning took place from July to September 2021, and data processing from September to October 2021.

Data Definitions and Properties

Question Categories

Each question is grouped into exactly one of the following five categories, which are used for further analysis:

1. Policy Transparency
2. Undocumented Access
3. Identification and Residency Requirements
4. Marginalized Access
5. Privacy Guarantees

Openness Questions

In addition to these five categories, 12 questions are marked as indicators of a country’s openness about its vaccine rollout policies. These openness questions are used to identify countries that provide too little information for analysis and reporting.

Question ID

Questions are given a unique identifier of the form “T1” or “A5”. These identifiers do not bear any meaning for this project.

Response Options

There are a total of 4 possible responses: {Yes, No, Unknown, NA}. However, depending on the question, certain responses are not allowed. For example:

– “T1” can be answered with one of these options: {Yes, No}
– “A18” with: {Yes, No, Unknown}
– “A5” with: {Yes, No, Unknown, NA}
– And so on
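The per-question restriction on responses can be checked mechanically. The sketch below is illustrative only: it covers just the three question IDs given as examples above, and the names `ALLOWED_RESPONSES` and `is_valid_response` are hypothetical, not part of the project's materials.

```python
# Allowed responses per question ID, limited to the three examples in the text.
# The full mapping for all questions is not reproduced here.
ALLOWED_RESPONSES = {
    "T1": {"Yes", "No"},
    "A18": {"Yes", "No", "Unknown"},
    "A5": {"Yes", "No", "Unknown", "NA"},
}


def is_valid_response(question_id: str, response: str) -> bool:
    """Return True if `response` is permitted for `question_id`."""
    allowed = ALLOWED_RESPONSES.get(question_id)
    if allowed is None:
        raise KeyError(f"Unknown question ID: {question_id}")
    return response in allowed


print(is_valid_response("T1", "Yes"))      # True
print(is_valid_response("T1", "Unknown"))  # False
```

A check of this kind is one way to catch coding mistakes early, for example an “Unknown” entered for a question that only permits Yes or No.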

Data Preprocessing

Point System for Responses

Each possible response to each question is assigned a point value that converts the qualitative code {Yes, No, Unknown, NA} into a number. These points represent how good or bad the response is in relation to providing vaccines to the undocumented population. The following is a snapshot of the point system:

Question Importance Scale

In addition to the point system, each question is assigned an importance level from {not so important, important, very important}, based on how much the question should influence the overall score of each country.

The importance value is mapped so that {not so important, important, very important} are weighted at {0.5, 1.0, 1.5}, respectively.
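To illustrate how points and importance weights could combine, here is a minimal sketch. Only the weights {0.5, 1.0, 1.5} come from the text. The point values, the importance labels assigned to “T1” and “A5”, and the choice to aggregate by a simple weighted sum are all assumptions made for the example, not the project's actual scoring table or formula.

```python
# Importance weights as stated in the text.
IMPORTANCE_WEIGHTS = {
    "not so important": 0.5,
    "important": 1.0,
    "very important": 1.5,
}

# HYPOTHETICAL point values per response; the real table is not reproduced here.
EXAMPLE_POINTS = {
    "T1": {"Yes": 1.0, "No": -1.0},
    "A5": {"Yes": 1.0, "No": -1.0, "Unknown": 0.0, "NA": 0.0},
}

# HYPOTHETICAL importance labels for the two example questions.
EXAMPLE_IMPORTANCE = {"T1": "important", "A5": "very important"}


def weighted_score(responses: dict) -> float:
    """Sum points multiplied by importance weight (assumed aggregation rule)."""
    total = 0.0
    for question_id, response in responses.items():
        points = EXAMPLE_POINTS[question_id][response]
        weight = IMPORTANCE_WEIGHTS[EXAMPLE_IMPORTANCE[question_id]]
        total += points * weight
    return total


# 1.0 * 1.0 + (-1.0) * 1.5 = -0.5
print(weighted_score({"T1": "Yes", "A5": "No"}))  # -0.5
```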

Data Validity Tests

A few tests are run on the collected and processed data to assess the reliability, validity and internal consistency of the dataset.

Inter-item correlation is tested on all categories. As expected, semantically related questions are correlated, while unrelated questions do not show any correlation. Interrelated questions show high reliability, with correlations between 0.6 and 0.99.

Cronbach’s Alpha is also calculated to test the internal consistency of the dataset. The Cronbach’s Alpha value of 0.729 indicates that the internal consistency of the dataset is good for exploratory research.
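For readers unfamiliar with the statistic, the standard formula is alpha = k / (k - 1) × (1 - sum of item variances / variance of the total scores), where k is the number of items. The sketch below implements that textbook formula with the Python standard library. The data layout (one row per country, one column per question) and the function name `cronbach_alpha` are assumptions; the project's own calculation may differ in detail.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for `rows`: one list of item scores per country.

    Uses sample variances; needs at least 2 items and 2 rows.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 rows")
    columns = list(zip(*rows))  # transpose: one tuple per item
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Two perfectly consistent items give alpha = 1.0.
print(cronbach_alpha([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]))  # 1.0
```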

As a validity test, the trends in the dataset are also roughly compared with the results shown in The COVID-19 Vaccines and Undocumented Migrants: What Are European Countries Doing?. After converting the PICUM ratings to a numeric scale, PICUM scores and our country scores show a strong, statistically significant correlation, which provides additional validity to our dataset.
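The comparison described above amounts to mapping ordinal ratings to numbers and correlating them with the country scores. The text does not say which coefficient or which numeric mapping was used, so the sketch below assumes a Pearson correlation and uses a made-up three-level mapping (`RATING_TO_NUMBER`) and made-up scores purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# HYPOTHETICAL rating labels and numeric mapping, not PICUM's actual scale.
RATING_TO_NUMBER = {"low": 1, "medium": 2, "high": 3}

# HYPOTHETICAL example data: external ratings and our scores for four countries.
external_ratings = ["low", "medium", "medium", "high"]
our_scores = [-2.0, 0.5, 1.0, 3.5]

external_numeric = [RATING_TO_NUMBER[r] for r in external_ratings]
print(round(pearson(external_numeric, our_scores), 3))
```

A rank-based coefficient such as Spearman’s would be a reasonable alternative for ordinal ratings; the choice here is only for brevity.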

Data Analysis

Several types of analysis are run on the dataset: cluster analysis, similarity between country pairs, statistical distributions, and correlation analysis with scatter plots.

Appendix

List of Sources Referenced and the Descriptions (for description, please see the section “list_of_documents_used_for_national_scorecards.csv”)

Processed Data and the Descriptions

One key step in this analysis is dividing countries into confused, low score and high score groups. These groupings take precedence in the order given below:

– Confused (for total score): A country is classified as “confused” overall if its overall confidence score is below the 50th percentile.
– Confused: A country is classified as “confused” in a given category (one of the five categories or the total score) if its confidence score in that category is less than 0.5, regardless of the aggregate score for that category.
– Low score: A country is classified as “low score” in a given category if its score in that category is below the 50th percentile.
– High score: A country is classified as “high score” in a given category if its score in that category is above the 50th percentile.

The confused classification uses two approaches because the category “Identification and Residency Requirements” contains only 2 questions. With so few questions, it is technically impossible to use the percentile approach to classify countries as confused in that category. On the other hand, the percentile approach is preferable when there are enough data points. Therefore the overall confidence score is grouped using the percentile approach.
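The precedence rules above can be sketched as a small function. This is an illustration under stated assumptions: confidence scores are taken to lie between 0 and 1, the 50th percentile is computed as the median, and a score exactly equal to the median is treated as “high score” (the text does not say how ties are handled). The name `classify` and its parameters are hypothetical.

```python
from statistics import median


def classify(score, confidence, all_scores, all_confidences=None):
    """Classify one country within one category, or for the total score.

    If `all_confidences` is given, the percentile rule is used for "confused"
    (the total-score case); otherwise the fixed 0.5 threshold is used.
    "Confused" takes precedence over the low/high split.
    """
    if all_confidences is not None:
        confused = confidence < median(all_confidences)
    else:
        confused = confidence < 0.5
    if confused:
        return "confused"
    # ASSUMPTION: ties with the median count as "high score".
    return "low score" if score < median(all_scores) else "high score"


scores = [1.0, 2.0, 3.0, 4.0]
print(classify(1.0, 0.9, scores))  # low score
print(classify(4.0, 0.3, scores))  # confused
print(classify(4.0, 0.6, scores, all_confidences=[0.6, 0.7, 0.8, 0.9]))  # confused
```

The last call shows why the two rules can disagree: a confidence of 0.6 clears the fixed 0.5 threshold but falls below the median of the example confidence values.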

These groupings help us define which countries are unclear about their policies, which are clear but exclusionary and which are clear and inclusive of the undocumented population.