Raw Internal Data Library

A short guideline to facilitate using RIDL

v1.5



What do I need to upload data to RIDL?

1. About 15 minutes of time (to upload and provide information)

2. Your data files (in Excel, CSV, Stata or R format)

3. The questionnaire, concept note, final report, scripts or other technical documentation (if available)

4. The metadata files (if available)



1. What is RIDL?

UNHCR recently launched the Raw Internal Data Library (RIDL), a globally-supported, centralized, and secure internal data repository. Its purpose is to enable UNHCR to use its valuable raw data to its full potential and preserve it for future analysis and use.

In addition to RIDL, UNHCR is also launching a Microdata Library (MDL) which can serve as an externally available data repository for partners and researchers. Before being published in MDL, data will go through a curation process and will be anonymized. In cooperation with the regional and field offices, this process will be overseen by the Curation Team at UNHCR HQ.


How to access RIDL?

The UNHCR RIDL platform can be found at: ridl.unhcr.org




Figure 1.1. Login page of RIDL




Logging in to the RIDL website is straightforward: use your UNHCR email address without @unhcr.org, and the password that you use to log in to your computer, email, MSRP or other internal UNHCR applications.


CKAN

RIDL is based on a technology called CKAN, a web-based open-source management system for the storage and distribution of open data. Think of a content management system like WordPress, but for data instead of pages and blogs.
Each dataset published on RIDL consists of two parts:

1. the metadata which contains information about the data, and

2. the resources, including the data files, but also questionnaires, the final report based on the data, etc.





2. Where to upload your data?

After you log in at ridl.unhcr.org, the Dashboard appears. It shows RIDL’s News feed in the first tab, My Datasets in the second, and My Data Containers in the third.




Figure 2.1 The Dashboard


To be able to upload datasets to RIDL, the user needs to have access to at least one container as an ‘Editor’ or as ‘Admin’. The Dashboard's My Data Containers tab will list all containers the user has access to.


Figure 2.2 Editor access to the 'Cameroon' container


To request access to an existing data container, navigate to the container by clicking on its name and locate the 'Request Access' button.


Figure 2.3 Request access to container


The data containers are organized geographically. You can request a new data container under an existing parent container, which is useful when several datasets cover the same topic. To request a new data container, go to Data Containers and click the Request Data Container button (see figure 2.3). The name of the new container should follow the naming convention: the 3-letter code of the country, followed by a colon and the name of the container. For example, Iraq : WASH KAP.




Figure 2.3. Request a new data container


The tab My Datasets includes all the datasets the user has access to and allows the upload of new data. After logging in, go to the tab My Datasets.




Figure 2.4 Adding a dataset


This provides two options to upload data:

1. Add dataset – the user will need to create the dataset and linked resources by filling out the metadata elements one by one.

2. Import dataset from DDI/XML – use this option if you have XML and RDF files previously generated in Nesstar Publisher.



Option 1: Add dataset


At this stage you are asked to add information about the dataset: the metadata. This metadata will be visible to all colleagues who log in to RIDL.


The following fields are mandatory:

Title: This title will be unique across RIDL. Preferably, the title should follow the convention: survey title, year.

Description: You can add a longer description with information users of the data need to know.

Data container: Containers are organized geographically; select the country and then the sub-folder within the country.

Internal Access Level: Who can download the data?

  • Private – Data and its associated resources (e.g., questionnaire, reports, etc.) are only accessible by those who are Members, Editors or Admins of the data container where the data are located. Others can see the metadata and can request access to the data from the Admin of the data container.
  • Internally visible – Data and its associated resources are available to all with access to RIDL (i.e., all UNHCR colleagues with UNHCR credentials).

Data collector: Select the organization(s) responsible for the data collection.

Topic classifications: You can select multiple topics to describe the dataset. This will ensure the dataset is easy to find.

Unit of measurement: What is the unit of measurement, i.e. what is each observation? A household, individual, child, business, etc.?

Data collection technique: Select the data collection technique from the drop-down menu.

Archived yes/no: Is the data archived, or still active?
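
The two Internal Access Levels described above amount to a simple download rule. The following is only a sketch of that rule as this guide states it, not RIDL's actual implementation; the identifiers "private" and "internally_visible" and the lowercase role names are labels chosen for this example.

```python
# Roles that grant access to private data inside a data container.
CONTAINER_ROLES = {"member", "editor", "admin"}


def can_download(access_level, container_role=None):
    """Return True if a logged-in RIDL user may download a dataset's resources.

    access_level: "private" or "internally_visible" (labels assumed for this sketch).
    container_role: the user's role in the dataset's data container, or None.
    """
    if access_level == "internally_visible":
        # Available to everyone with access to RIDL.
        return True
    if access_level == "private":
        # Only Members, Editors and Admins of the container may download.
        return container_role in CONTAINER_ROLES
    raise ValueError(f"unknown access level: {access_level!r}")


print(can_download("private", "editor"))   # True
print(can_download("private"))             # False
print(can_download("internally_visible"))  # True
```

In both cases the metadata stays visible to all RIDL users; a user without a role in the container can still request access from its Admin.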


Besides the required fields, you can provide additional information, such as Sampling Procedure. It is recommended to add all relevant information. The next page lets you upload the actual data (see Figure 3.1). Data can be in any format, such as xlsx, csv or Stata. You can also upload additional documents; it is recommended to upload the questionnaire, the field-work report and the final report.



Option 2: Import Dataset from DDI


! IMPORTANT ! To enable the importation of DDI files, the RIDL Nesstar template should be used to create the DDI files. Download the template from here.


If you choose the second option, Import dataset from DDI/XML, you will be asked to:

1. Upload or link an XML and RDF file (see figure 2.3).

2. Select the data container: the country or geographical area and the sub-folder to which the data belongs.

3. Set the visibility: private or internally visible.

Private: The data and other added resources are only accessible to colleagues who have access to the same data container. Note that the metadata is always visible to all colleagues using RIDL.

Internally visible: The data and resources added are accessible to all colleagues within UNHCR.


Once all fields are filled, click the button Import to upload the files. Once the files are uploaded, you can add data (see Figure 3.1) and other resources.



Figure 2.3. Uploading a dataset using xml and rdf




3. What to upload?

After uploading the DDI files, or providing the information using the RIDL form as discussed above, you will be asked to add resources. Data files as well as the questionnaire and report should be added.


Data files




Figure 3.1. Adding data files


To upload data, click on Upload and select the relevant file from your computer, and provide additional information about the data (see Figure 3.1). Each data file needs to be uploaded separately. The following information is required:


  • Name: follow the naming convention: UNHCR, ISO country code, data series (SEA, SENS, KAP, etc.), year (followed by a, b, c if there are several per year), observation level if relevant (e.g., child, household), and version of the data. Example: UNHCR_CMR_SENS_2018_women_v1_3.
  • Date: the start and end date of the data collection, in the format year/month/day, e.g. 2018/01/20.
  • Version: the version (number) of the data.
  • File process status: is the data raw, cleaned, or cleaned and anonymized?
  • Identifiability: does the data contain personally identifiable information, or has it been anonymized?
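
The Name and Date conventions above can be applied mechanically. The helper below is a minimal sketch, not part of RIDL: the function names are hypothetical, and two details are inferred rather than stated in this guide, namely that a version such as 1.3 is written as v1_3 (taken from the example name) and that the a, b, c suffix is attached directly to the year.

```python
from datetime import date


def build_resource_name(country_code, series, year, version,
                        observation_level=None, round_suffix=""):
    """Assemble a data file name following the naming convention above."""
    # round_suffix is "a", "b", "c" when there are several rounds in one year
    # (assumed here to be appended directly to the year).
    parts = ["UNHCR", country_code.upper(), series.upper(), f"{year}{round_suffix}"]
    if observation_level:
        parts.append(observation_level.lower())
    # Assumption: dots in the version become underscores, e.g. "1.3" -> "v1_3".
    parts.append("v" + version.replace(".", "_"))
    return "_".join(parts)


def format_collection_date(day):
    """Format a date as year/month/day, as asked for in the Date field."""
    return day.strftime("%Y/%m/%d")


print(build_resource_name("CMR", "SENS", 2018, "1.3", observation_level="women"))
# UNHCR_CMR_SENS_2018_women_v1_3
print(format_collection_date(date(2018, 1, 20)))
# 2018/01/20
```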

Questionnaire and other supporting documents

Please include the questionnaire, preferably the final version with variable names that correspond to the variable names in the data. Other resources, such as the concept note, the final report or field-work reports, and scripts written to process and analyze the data, should also be added. To add these, choose the file attachment option, upload the file, and add the name of the resource and its format (pdf, word, etc.). A short description may also be added (e.g. questionnaire version number).



4. Request Data

RIDL provides an option for users to request access to data in private datasets. To do this, navigate to the dataset and locate the 'Request access' button or the link in the notification. After you click either option, a pop-up window appears in which you provide additional information about the requestor and the intended use of the dataset.



Figure 4.1. Request access to dataset

Once the request is submitted, a notification email will be sent to the administrators of the data container, who will then approve or reject the request.



5. RIDL’s Data Deposit

RIDL includes a data deposit option to share data that is not yet fully documented. The Curation Team can assist with the data documentation and, if relevant, take care of the data anonymization which is required before sharing the data on UNHCR’s Microdata Library.


The Data Deposit user guide can be downloaded from here.




6. API Documentation

As with all CKAN instances, all the functionality available in the RIDL browser interface can be accessed programmatically via the API. For an introduction to working with the CKAN API and a comprehensive list of all available methods, please refer to the official CKAN API documentation.

Authentication and Authorization

The API uses the same authorization as the web interface, so through the API users can perform exactly the actions they are permitted to perform in the UI.

To authenticate as a particular user, send the Authorization header as part of the request, set to the API key provided to each user on the user details page. For instance:

curl -X GET -H "Authorization: API-KEY" \
https://ridl.unhcr.org/api/3/action/package_search

For simplicity we won’t include the Authorization header in the rest of the examples in this section, but it is required on all API requests to the site.
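
For scripted use, the same authenticated call can be prepared in Python with only the standard library. This is a minimal sketch: build_request is a hypothetical helper, not part of CKAN or RIDL, and "API-KEY" is a placeholder for the key from your user details page. The request is built but not sent, so the example runs without network access.

```python
import json
import urllib.request

RIDL_API = "https://ridl.unhcr.org/api/3/action"


def build_request(action, api_key, payload=None):
    """Build (but do not send) a request to a CKAN action endpoint.

    With a payload the request is a JSON POST; without one it is a plain GET.
    """
    headers = {"Authorization": api_key}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(f"{RIDL_API}/{action}", data=data, headers=headers)


# "API-KEY" is a placeholder, not a working key.
request = build_request("package_search", "API-KEY")
print(request.get_method(), request.full_url)

# Sending the request needs network access and a valid key:
# with urllib.request.urlopen(request) as response:
#     body = json.load(response)
```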


Important notes on naming:

  • For historical reasons the CKAN API refers to datasets as packages (e.g. package_show, package_search, etc.)
  • Data Containers are internally implemented using CKAN’s Organizations feature, so the organization_* methods must be used when interacting with them via the API (e.g. organization_create, organization_list, etc.)

Examples


    Loading data containers

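    curl -X GET "https://ridl.unhcr.org/api/3/action/organization_list?type=data-container"

    Note the type parameter, which selects organizations of the data-container type. The URL is quoted so that the shell does not interpret the ? character.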


    Creating or Requesting a Data Container

    curl -X POST https://ridl.unhcr.org/api/3/action/organization_create \
    -H "Content-Type: application/json" \
    -d '{"type": "data-container", "name": "test-data-container", "title": "Test Data Container", "country": "ANG", "geographic_area": "West Africa", "groups": [{"name": "africa"}]}'

    Note the type key provided, and the groups object used to define the parent Data Container. As in the UI, if a non-Sysadmin user performs the API request, the Data Container will be created with an “approval_pending” state.


    Searching Datasets

    curl -X GET "https://ridl.unhcr.org/api/3/action/package_search?q=markets"
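
A search can also be scripted. The sketch below builds the package_search URL and reads dataset titles out of a decoded response. It assumes the standard CKAN response envelope (a success flag and a result object containing count and results); the helper names and the sample response are made up for illustration and do not come from a real RIDL call.

```python
from urllib.parse import urlencode

RIDL_API = "https://ridl.unhcr.org/api/3/action"


def search_url(query, rows=10):
    """Return the package_search URL for a free-text query."""
    # q is the search text; rows limits how many datasets are returned.
    return f"{RIDL_API}/package_search?" + urlencode({"q": query, "rows": rows})


def dataset_titles(response):
    """Extract dataset titles from a decoded package_search response."""
    if not response.get("success"):
        raise RuntimeError(f"CKAN reported an error: {response.get('error')}")
    return [dataset["title"] for dataset in response["result"]["results"]]


# Made-up response in the standard CKAN envelope, for illustration only.
sample = {
    "success": True,
    "result": {
        "count": 1,
        "results": [{"name": "test-dataset-pen", "title": "Test Dataset PEN"}],
    },
}
print(search_url("markets"))
print(dataset_titles(sample))
```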


    Creating a Dataset

    curl -X POST https://ridl.unhcr.org/api/3/action/package_create \
    -H "Content-Type: application/json" \
    -d '{"name": "test-dataset-pen", "title": "Test Dataset PEN", "notes": "Some description", "owner_org": "africa", "data_collector": ["unhcr"], "keywords": ["food", "shelter"], "unit_of_measurement": "kg", "data_collection_technique": "f2f", "archived": "False"}'


    Note that the values used are the ones internally expected by the validators, not the ones displayed in the UI. Also note that data files (type=data) and file attachments (type=attachment) are created separately with the resource_create method, once the dataset exists.
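
As a rough sketch of that second step, the body of a resource_create call for a linked file attachment could be assembled as follows. The package_id, name, url and format keys are standard CKAN resource fields and type comes from the note above; the helper name and the example URL are hypothetical. RIDL's validators may require further fields, such as the date range, version, file process status and identifiability described in section 3, whose internal names are not given in this guide.

```python
import json


def resource_payload(dataset, name, resource_type, url, file_format):
    """Build the JSON body for a resource_create call (sketch only)."""
    if resource_type not in ("data", "attachment"):
        raise ValueError("resource_type must be 'data' or 'attachment'")
    return {
        "package_id": dataset,  # name or id of the dataset created earlier
        "name": name,
        "type": resource_type,  # "data" for data files, "attachment" for documents
        "url": url,
        "format": file_format,
    }


payload = resource_payload(
    dataset="test-dataset-pen",
    name="Questionnaire",
    resource_type="attachment",
    url="https://example.org/questionnaire.pdf",  # hypothetical link
    file_format="pdf",
)
print(json.dumps(payload, indent=2))
```

The resulting JSON would be sent as a POST to the resource_create action with the Authorization header, in the same way as the package_create example.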