A quick guide to uploading data to RIDL

A short guide to using the Raw Internal Data Library

v1.4



What do I need to upload data to RIDL?

1. About 15 minutes (to upload the data and provide information about it)

2. Your data files (in Excel, CSV, Stata or R format)

3. The questionnaire used (if available)

4. The metadata files (if available)



1. What is RIDL?

UNHCR recently launched the Raw Internal Data Library (RIDL), a globally supported, centralized and secure internal data repository. Its purpose is to enable UNHCR to use its valuable raw data to its full potential and to preserve it for future analysis and use.

In addition to RIDL, UNHCR is also launching a Microdata Library (MDL), which can serve as an externally available data repository for partners and researchers. Before being published in MDL, data will go through a curation process and will be anonymized. This process will be overseen by the curation team at UNHCR HQ, in cooperation with the regional and field offices.


How to access RIDL?

The UNHCR RIDL platform can be found at: ridl.unhcr.org




Figure 1.1. Login page of RIDL




Logging in to RIDL is easy: use your UNHCR email address without the @unhcr.org suffix, and the password you use to log in to your computer, email, MSRP or other internal UNHCR applications.


CKAN

RIDL is based on a technology called CKAN, a web-based open-source management system for the storage and distribution of open data. Think of a content management system like WordPress, but for data instead of pages and blogs.
The datasets published on RIDL consist of two parts:

1. the metadata, which contains information about the data, and

2. the resources: the data files themselves, plus supporting documents such as questionnaires and the final report based on the data.





2. Where to upload your data?

After you log in at ridl.unhcr.org, the Dashboard appears. It shows RIDL’s News feed in the first tab, My Datasets in the second and My Data Containers in the third.




Figure 2.1. The Dashboard


To upload datasets to RIDL, you need Editor or Admin access to at least one data container. To request access to an existing data container, please send an email to microdata@unhcr.org.


Figure 2.2. Editor access to the 'Cameroon' container


The data containers are ordered geographically. You can request a new data container under an existing parent container, which can be useful when several datasets cover the same topic. To request a new data container, go to Data Containers and click the Request Data Container button (see Figure 2.3).


Figure 2.3. Request a new data container


The My Datasets tab lists all the datasets you have access to and lets you upload new data. After logging in, go to the My Datasets tab.




Figure 2.4. Adding a dataset


This tab provides two ways to upload data:

1. Add dataset: you create the dataset and its linked resources by filling in the metadata elements one by one.

2. Import dataset from DDI/XML: use this option if you have XML and RDF files previously generated in Nesstar Publisher.

Option 1: Add dataset


At this stage you are asked to add information about the dataset: the metadata. This metadata will be visible to all colleagues who log in to RIDL. The data and resources themselves are not automatically accessible to everyone: the Raw Data Access level is Private by default, but you can switch it to Internally visible.


The following fields are mandatory:

Title: This title should be unique across RIDL. Preferably, the title should follow the convention: survey-name, year.

Description: A longer description with the information users of the data need to know.

Data container: Containers are ordered geographically; select the country and the sub-folder within it.

Data collector: Select the organization(s) responsible for the data collection.

Topic classifications: You can select multiple topics to describe the dataset. This makes the dataset easier to find.

Unit of measurement: What does each observation represent? For example a household, an individual, a child or a business.

Data collection technique: Select the data collection technique from the drop-down menu.

Archived yes/no: Is the data archived, or still active?

Besides the required fields, there is room to provide additional information, such as the Sampling Procedure. It is recommended to add all relevant information. The next RIDL page lets you upload the actual data (see Figure 3.1). Data can be in any format, such as Excel, CSV or Stata. You can also upload additional documents; it is recommended to upload the questionnaire, as well as the fieldwork report and the final report.



Option 2: Import Dataset from DDI


! IMPORTANT ! To make DDI files importable, create them using the RIDL Nesstar template. Download the template from here.


If you choose the second option, Import dataset from DDI/XML, you will be asked to:

1. Upload or link the XML and RDF files (see Figure 2.5).

2. Select the data container: the country or geographical area, and the sub-folder to which the data belongs.

3. Set the visibility: Private or Internally visible.

Private: The data and other added resources are accessible only to colleagues who have access to the same data container. Note that the metadata is visible to all colleagues using RIDL.

Internally visible: The data and resources added are accessible to all colleagues within UNHCR.


Once all fields are filled in, click Import to upload the files. Once the files are uploaded, you can add data (see Figure 3.1) and other resources.



Figure 2.5. Uploading a dataset using XML and RDF files




3. What to upload?

After uploading the DDI files, or providing the information using the RIDL form as discussed above, you will be asked to add resources. Data files as well as the questionnaire and report should be added.


Data files




Figure 3.1. Adding data files


To upload data, click Upload, select the relevant file from your computer and provide additional information about the data (see Figure 3.1). Each data file needs to be uploaded separately. The following information is required:


  • Name: should follow the naming convention: UNHCR, ISO country code, data series (SEA, SENS, KAP, etc.), year (followed by a, b, c if there are several per year), the observation level if relevant (e.g. child, household), and the data version. Example: UNHCR_CMR_SENS_2018_women_v1_3.
  • Date: the start and end dates of the data collection, in the format year/month/day, e.g. 2018/01/20.
  • Version: the version (number) of the data.
  • File process status: is the data raw, cleaned, or cleaned and anonymized?
  • Identifiability: does the file contain personally identifiable information, or has it been anonymized?
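If you prepare many files, the naming convention and date format above can be applied consistently with a small shell helper. This is only a sketch: the function names ridl_resource_name and ridl_date are hypothetical (not part of RIDL), and it assumes the version is given with dots (1.3) and written with underscores (v1_3), as in the example name. A year with a suffix (e.g. 2018a) can simply be passed as the year argument.

```shell
# Hypothetical helpers (not part of RIDL) for the resource naming convention
# and the year/month/day date format described in this guide.

# Usage: ridl_resource_name ISO SERIES YEAR LEVEL VERSION
# Pass '' as LEVEL when no observation level applies. The body runs in a
# subshell "( )" so its variables do not leak into the caller's shell.
ridl_resource_name() (
  iso=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  series=$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')
  name="UNHCR_${iso}_${series}_$3"
  # Optional observation level, e.g. women or household.
  if [ -n "$4" ]; then
    name="${name}_$4"
  fi
  # Version 1.3 is written as v1_3, as in the example name.
  version=$(printf '%s' "$5" | tr '.' '_')
  printf '%s_v%s\n' "$name" "$version"
)

# Convert an ISO date such as 2018-01-20 to the year/month/day format 2018/01/20.
ridl_date() {
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9])
      printf '%s\n' "$1" | sed 's|-|/|g' ;;
    *)
      echo "expected YYYY-MM-DD, got: $1" >&2
      return 1 ;;
  esac
}

# ridl_resource_name CMR SENS 2018 women 1.3   -> UNHCR_CMR_SENS_2018_women_v1_3
# ridl_date 2018-01-20                         -> 2018/01/20
```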

Questionnaire and other supporting documents

Please include the questionnaire, preferably the final version, with variable names that correspond to the variable names in the data. Other resources, such as the final report or fieldwork reports, can also be added. To add these, choose the file attachment option, upload the file, and enter the name of the resource and its format (PDF, Word, etc.). You may also add a short description (e.g. the questionnaire version number).



4. RIDL’s Data Deposit

RIDL includes a data deposit option to share data that is not yet ready for publication. The Curation Team can assist with the data documentation and, if relevant, take care of the data anonymization which is required before sharing the data on UNHCR’s Microdata Library.


The Data Deposit user guide can be downloaded from here.




5. API Documentation

As with all CKAN instances, all the functionality available in the RIDL web interface can also be accessed programmatically via the API. For an introduction to working with the CKAN API and a comprehensive list of all available methods, please refer to the official CKAN API documentation.

Authentication and Authorization

The API uses the same authorization as the web interface, so through the API users can perform exactly the actions they are permitted to perform in the UI, and no others.

To authenticate as a particular user, send the Authorization header with each request, containing the API key shown on your user details page.


For instance:

curl -X GET -H "Authorization: API-KEY" \
  https://ridl.unhcr.org/api/3/action/package_search

For simplicity we won’t include the Authorization header in the rest of the examples in this section, but it is required on all API requests to the site.
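To avoid repeating the header on every call, the key can be kept in an environment variable and added by a small wrapper around curl, like the Authorization example shown above. A sketch only: RIDL_API_KEY, RIDL_URL and ridl_api are names chosen for this example, not something RIDL or CKAN defines.

```shell
# Hypothetical wrapper around curl for RIDL API calls (not part of RIDL or CKAN).
# Reads the API key from RIDL_API_KEY and the site from RIDL_URL (both names
# chosen for this example); RIDL_URL defaults to https://ridl.unhcr.org.
ridl_api() (
  if [ -z "$1" ]; then
    echo "usage: ridl_api ACTION [curl options...]" >&2
    exit 2
  fi
  if [ -z "${RIDL_API_KEY:-}" ]; then
    echo "RIDL_API_KEY is not set" >&2
    exit 1
  fi
  action="$1"
  shift
  # Remaining arguments (e.g. -G --data-urlencode 'q=markets') are passed to curl.
  curl -sS -H "Authorization: ${RIDL_API_KEY}" "$@" \
    "${RIDL_URL:-https://ridl.unhcr.org}/api/3/action/${action}"
)

# Usage (requires network access and a valid key):
# export RIDL_API_KEY='your-api-key'
# ridl_api package_search -G --data-urlencode 'q=markets'
```

Using curl's -G with --data-urlencode appends the search term to the URL as an encoded query string, so terms containing spaces need no manual escaping.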


Important notes on naming:

  • For historical reasons the CKAN API refers to datasets as packages (e.g. package_show, package_search, etc.)
  • Data Containers are internally implemented using CKAN’s Organizations feature, so when interacting with them via the API the organization_* methods need to be used (e.g. organization_create, organization_list, etc.)
  • Examples


    Loading data containers

    curl -X GET "https://ridl.unhcr.org/api/3/action/organization_list?type=data-container"

    Note the type parameter, which restricts the list to data containers.


    Creating or Requesting a Data Container

    curl -X POST https://ridl.unhcr.org/api/3/action/organization_create \
    -H "Content-Type: application/json" \
    -d '{"type": "data-container", "name": "test-data-container", "title": "Test Data Container", "country": "ANG", "geographic_area": "West Africa", "groups": [{"name": "africa"}]}'

    Note the type key, and the groups object that defines the parent Data Container. As in the UI, if the request is made by a user who is not a Sysadmin, the Data Container is created in the “approval_pending” state.
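    In these examples the name field (used in URLs) is simply the title lowercased and hyphenated, e.g. "Test Data Container" becomes test-data-container. CKAN only accepts names made of lowercase ASCII letters, digits, "-" and "_", 2 to 100 characters long. A minimal sketch for deriving such a name from a title, assuming you want the same hyphenated style; the function name ckan_name is hypothetical.

```shell
# Hypothetical helper (not part of RIDL or CKAN): derive a CKAN "name" from a title.
# CKAN names may only contain lowercase ASCII letters, digits, "-" and "_",
# and must be 2 to 100 characters long.
ckan_name() (
  LC_ALL=C
  export LC_ALL   # byte-wise matching, so a-z means only ASCII letters
  # Lowercase, turn each run of disallowed characters into one hyphen,
  # trim leading/trailing hyphens, then cut to the 100-character limit.
  slug=$(printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9_-][^a-z0-9_-]*/-/g' -e 's/^-*//' -e 's/-*$//' \
    | cut -c1-100)
  if [ ${#slug} -lt 2 ]; then
    echo "cannot build a valid name from: $1" >&2
    exit 1
  fi
  printf '%s\n' "$slug"
)

# ckan_name 'Test Data Container'   -> test-data-container
# ckan_name 'Test Dataset PEN'      -> test-dataset-pen
```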


    Searching Datasets

    curl -X GET "https://ridl.unhcr.org/api/3/action/package_search?q=markets"


    Creating a Dataset

    curl -X POST https://ridl.unhcr.org/api/3/action/package_create \
    -H "Content-Type: application/json" \
    -d '{"name": "test-dataset-pen", "title": "Test Dataset PEN", "notes": "Some description", "owner_org": "africa", "data_collector": ["unhcr"], "keywords": ["food", "shelter"], "unit_of_measurement": "kg", "data_collection_technique": "f2f", "archived": "False"}'


    Note that the values used are the ones internally expected by the validators, not the labels displayed in the UI. Data files (type=data) and file attachments (type=attachment) are created separately, after the dataset itself has been created, using the resource_create method.
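    Once the dataset exists, files are attached with resource_create. In standard CKAN a file is uploaded by sending the request as multipart/form-data with the file in an upload field; the sketch below assumes RIDL keeps that behaviour. It sets only package_id, name, type and upload. The other required fields (collection dates, version, process status, identifiability) are deliberately left out because their internal names are not given in this guide, so check them before use, for example by calling package_show on an existing dataset. The function name ridl_upload_resource and the RIDL_API_KEY and RIDL_URL variables are hypothetical.

```shell
# Hypothetical helper (not part of RIDL): upload a local file as a resource of an
# existing dataset via resource_create, using CKAN's multipart "upload" field.
# RIDL-specific required fields (collection dates, version, process status,
# identifiability) are NOT set here; look up their internal names before use.
ridl_upload_resource() (
  dataset="$1"   # dataset name or id, e.g. test-dataset-pen
  file="$2"      # path to the local file
  kind="$3"      # "data" for data files, "attachment" for other documents
  if [ ! -f "$file" ]; then
    echo "file not found: $file" >&2
    exit 1
  fi
  case "$kind" in
    data|attachment) ;;
    *) echo "type must be data or attachment, got: $kind" >&2; exit 1 ;;
  esac
  if [ -z "${RIDL_API_KEY:-}" ]; then
    echo "RIDL_API_KEY is not set" >&2
    exit 1
  fi
  name="${4:-$(basename "$file")}"   # resource name, defaults to the file name
  # --form-string sends text fields literally; -F with @ attaches the file.
  curl -sS "${RIDL_URL:-https://ridl.unhcr.org}/api/3/action/resource_create" \
    -H "Authorization: ${RIDL_API_KEY}" \
    --form-string "package_id=${dataset}" \
    --form-string "name=${name}" \
    --form-string "type=${kind}" \
    -F "upload=@${file}"
)

# Usage (requires network access and a valid key):
# export RIDL_API_KEY='your-api-key'
# ridl_upload_resource test-dataset-pen ./survey.csv data UNHCR_CMR_SENS_2018_women_v1_3
```

    The checks run before curl is called, so a missing file, a wrong type or a missing key fails locally instead of producing a partial request.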