1. About 15 minutes of time (to upload and provide information)
2. Your data files (in Excel, CSV, Stata, or R format)
3. The questionnaire, concept note, final report, scripts or other technical documentation (if available)
4. The metadata files (if available)
UNHCR recently launched the Raw Internal Data Library (RIDL), a globally-supported, centralized, and secure internal data repository. Its purpose is to enable UNHCR to use its valuable raw data to its full potential and preserve it for future analysis and use.
In addition to RIDL, UNHCR is also launching a Microdata Library (MDL) which can serve as an externally available data repository for partners and researchers. Before being published in MDL, data will go through a curation process and will be anonymized. In cooperation with the regional and field offices, this process will be overseen by the Curation Team at UNHCR HQ.
The UNHCR RIDL platform can be found at: ridl.unhcr.org
Logging in to the RIDL website is easy: as the username, use the part of your UNHCR email address before @unhcr.org, and as the password, use the one you use to log in to your computer, email, MSRP or other internal UNHCR applications.
RIDL is based on a technology called CKAN, a web-based open-source management system for the storage and distribution of open data. Think of a content management system like WordPress, but for data instead of pages and blogs.
The datasets published on RIDL consist of two parts:
1. the metadata which contains information about the data, and
2. the resources, including the data files, but also questionnaires, the final report based on the data, etc.
After logging in at ridl.unhcr.org, the Dashboard appears. It shows RIDL’s News feed in the first tab, My Datasets in the second, and My Data Containers in the third.
To be able to upload datasets to RIDL, the user needs to have access to at least one container as an ‘Editor’ or as ‘Admin’. The Dashboard's My Data Containers tab will list all containers the user has access to.
To request access to an existing data container, navigate to the container by clicking on its name and locate the 'Request Access' button.
The data containers are geographically ordered. There is an option to request a new data container under an existing parent container, which can be useful when several datasets cover the same topic. To request a new data container, go to Data Containers and click the Request Data Container button (see figure 2.3). The name of the new container should follow the naming convention: start with the 3-letter code of the country, followed by a colon and the name of the container. For example, Iraq : WASH KAP.
The tab My Datasets includes all the datasets the user has access to and allows the upload of new data. After logging in, go to the tab My Datasets.
This provides two options to upload data:
1. Add dataset – the user will need to create the dataset and linked resources by filling out the metadata elements one by one.
2. Import dataset from DDI/XML – to be used if XML and RDF files were previously generated in Nesstar Publisher.
At this stage you are asked to add information about the dataset: the metadata. This metadata will be visible to all colleagues who log in to RIDL.
The following fields are mandatory:
Title: This title will be unique across RIDL. Preferably, the title should follow the convention: survey title, year.
Description: You can add a longer description with information users of the data need to know.
Data container: Containers are geographically ordered. Select the country and the sub-folder within the country.
Internal Access Level: Who can download the data?
Data collector: Select the organization(s) responsible for the data collection.
Topic classifications: You can select multiple topics to describe the dataset. This will ensure the dataset is easy to find.
Unit of measurement: What is the unit of measurement, i.e. what is each observation? A household, individual, child, business, etc.?
Data collection technique: Select the data collection technique from the drop-down menu.
Archived yes/no: Is the data archived, or still active?
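For users who prepare metadata in a script before filling in the form, the mandatory fields above can be checked with a small helper. This is only a sketch: the API field names are taken from the package_create example in the API part of this guide, pairing "Topic classifications" with the keywords field is an assumption, and Internal Access Level is left out because this guide does not show its API name.

```python
# Mapping from API field names to the form labels listed above.
# Field names come from the package_create example in this guide;
# "keywords" for "Topic classifications" is an assumption.
REQUIRED_FIELDS = {
    "title": "Title",
    "notes": "Description",
    "owner_org": "Data container",
    "data_collector": "Data collector",
    "keywords": "Topic classifications",
    "unit_of_measurement": "Unit of measurement",
    "data_collection_technique": "Data collection technique",
    "archived": "Archived yes/no",
}


def missing_fields(metadata):
    """Return the form labels of mandatory fields that are absent or empty."""
    return [
        label
        for key, label in REQUIRED_FIELDS.items()
        if metadata.get(key) in (None, "", [])
    ]


if __name__ == "__main__":
    draft = {"title": "WASH KAP survey, 2019", "notes": "Household survey"}
    for label in missing_fields(draft):
        print("Still missing:", label)
```

The helper only checks that a value is present. It does not check that the value is one RIDL accepts.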
Besides the required fields, there is room to provide additional information, such as Sampling Procedure. It is recommended to add all relevant information. The next page provides the opportunity to upload the actual data (see Figure 3.2). Data can be in any format, such as xlsx, csv or Stata. Additional documents can also be uploaded on this page. It is recommended to upload the questionnaire, as well as the field-work report and the final report.
! IMPORTANT ! To enable the importation of DDI files, the RIDL Nesstar template should be used to create the DDI files. Download the template from here.
If you choose the second option, Import dataset from DDI/XML, you will be asked to:
1. Upload or link an XML and RDF file (see figure 2.3).
2. Select the data container: the country or geographical area and the sub-folder to which the data belongs.
3. Set the visibility: private or internally visible.
Private: The data and other added resources are only accessible to colleagues who have access to the same data container. Note that the metadata is always visible to all colleagues using RIDL.
Internally visible: The data and resources added are accessible to all colleagues within UNHCR.
Once all fields are filled, click the button Import to upload the files. Once the files are uploaded, you can add data (see Figure 3.1) and other resources.
After uploading the DDI files, or providing the information using the RIDL form as discussed above, you will be asked to add resources. Data files as well as the questionnaire and report should be added.
To upload data, click on Upload and select the relevant file from your computer, and provide additional information about the data (see Figure 3.1). Each data file needs to be uploaded separately. The following information is required:
Please include the questionnaire, preferably the final version with variable names that correspond to the variable names in the data. Other resources, such as the concept note, the final report or field-work reports, and scripts written to process and analyze the data should also be added. To add these, click the file attachment, upload the file, and add the name of the resource and its format (pdf, word, etc.). A short description may also be added (e.g. questionnaire version number).
RIDL provides an option for users to request access to data in private datasets. To do this, navigate to the dataset and locate the 'Request access' button or the link in the notification. After you click either option, a pop-up window will appear, where additional information should be supplied about the requestor and the intended use of the dataset.
Once the request is submitted, a notification email will be sent to the administrators of the data container, who will then approve or reject the request.
RIDL includes a data deposit option to share data that is not yet fully documented. The Curation Team can assist with the data documentation and, if relevant, take care of the data anonymization which is required before sharing the data on UNHCR’s Microdata Library.
The Data Deposit user guide can be downloaded from here.
As with all CKAN instances, all the functionality available in the RIDL browser interface can be accessed programmatically via the API. For an introduction to working with the CKAN API and a comprehensive list of all methods available, please refer to the official CKAN API documentation.
The API uses the same authorization as the web interface, so through the API users can perform exactly the actions they have permission to perform in the UI.
To authenticate as a particular user, the Authorization header must be sent as part of the request, containing the API key provided to each user on the user details page. For instance:
curl -X GET -H "Authorization: API-KEY" \
https://ridl.unhcr.org/api/3/action/package_search
For simplicity we won’t include the Authorization header in the rest of the examples in this section, but it is required on all API requests to the site.
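The same authenticated call can be prepared from a script. The following is a minimal sketch using only the Python standard library; the helper name build_request and the API-KEY placeholder are illustrative and not part of RIDL. The request is built but not sent, so the sketch can be tried without touching the site.

```python
import json
import urllib.request

BASE_URL = "https://ridl.unhcr.org/api/3/action"


def build_request(action, api_key, payload=None):
    """Build (but do not send) a request for a CKAN action.

    Without a payload the request is a GET; with a payload it becomes
    a POST carrying the parameters as a JSON body.
    """
    headers = {"Authorization": api_key}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        f"{BASE_URL}/{action}", data=data, headers=headers
    )


if __name__ == "__main__":
    request = build_request("package_search", "API-KEY")
    print(request.get_method(), request.full_url)
```

To actually send the request, pass it to urllib.request.urlopen and read the JSON body from the response.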
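Important notes on naming: in the API, Data Containers are handled by the organization_* methods and datasets by the package_* methods, as the calls below show.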
curl -X GET "https://ridl.unhcr.org/api/3/action/organization_list?type=data-container"
Note the type parameter, which limits the list to Data Containers.
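CKAN action calls such as the one above return a JSON object with a success flag, a result value and, on failure, an error object. The sketch below shows one way to unwrap that envelope in a script. The function and exception names are illustrative, and the sample response is made up for the example rather than copied from RIDL.

```python
import json


class CkanActionError(Exception):
    """Raised when a CKAN action response reports success = false."""


def unwrap(response_text):
    """Return the "result" value of a CKAN action response, or raise."""
    body = json.loads(response_text)
    if not body.get("success"):
        raise CkanActionError(body.get("error"))
    return body["result"]


if __name__ == "__main__":
    # Hypothetical response body for organization_list.
    sample = '{"success": true, "result": ["africa", "test-data-container"]}'
    for name in unwrap(sample):
        print(name)
```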
curl -X POST https://ridl.unhcr.org/api/3/action/organization_create \
-H "Content-Type: application/json" \
-d '{"type": "data-container", "name": "test-data-container", "title": "Test Data Container", "country": "ANG", "geographic_area": "West Africa", "groups": [{"name": "africa"}]}'
Note the type key provided, and the groups object used to define the parent Data Container. As in the UI, if a non-Sysadmin user performs the API request, the Data Container will be created with an “approval_pending” state.
curl -X GET "https://ridl.unhcr.org/api/3/action/package_search?q=markets"
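When the search term contains spaces or special characters, it has to be URL-encoded. A minimal sketch of building the search URL in Python follows; search_url is an illustrative helper, and rows and start are the standard CKAN package_search paging parameters.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://ridl.unhcr.org/api/3/action/package_search"


def search_url(query, rows=10, start=0):
    """Return a package_search URL with the query string safely encoded."""
    params = {"q": query, "rows": rows, "start": start}
    return f"{SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(search_url("markets"))
    print(search_url("cash assistance", rows=5))
```

As with every other call, the Authorization header still has to be added when the URL is requested.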
curl -X POST https://ridl.unhcr.org/api/3/action/package_create \
-H "Content-Type: application/json" \
-d '{"name": "test-dataset-pen", "title": "Test Dataset PEN", "notes": "Some description", "owner_org": "africa", "data_collector": ["unhcr"], "keywords": ["food", "shelter"], "unit_of_measurement": "kg", "data_collection_technique": "f2f", "archived": "False"}'
Note that the values used are the ones internally expected by the validators, not the ones displayed in the UI. Also note that data files (type=data) or file attachments (type=attachment) are created separately once the dataset is created using the resource_create method.
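As a rough illustration of that second step, the sketch below builds a JSON body for resource_create that links a resource to an existing dataset. The keys package_id, name and url are standard CKAN resource fields, and type comes from the note above. RIDL may require further resource fields that this guide does not list, and uploading a file from disk would need a multipart form request instead of a JSON body. The example values are placeholders.

```python
RESOURCE_TYPES = ("data", "attachment")


def resource_payload(package_id, resource_type, name, url):
    """Build the JSON body for a resource_create call (linked resource).

    package_id identifies the dataset the resource is added to;
    resource_type must be "data" or "attachment".
    """
    if resource_type not in RESOURCE_TYPES:
        raise ValueError(f"type must be one of {RESOURCE_TYPES}")
    return {
        "package_id": package_id,
        "type": resource_type,
        "name": name,
        "url": url,
    }


if __name__ == "__main__":
    # Placeholder values; the URL is not a real file.
    body = resource_payload(
        "test-dataset-pen",
        "attachment",
        "Questionnaire",
        "https://example.org/questionnaire.pdf",
    )
    print(body)
```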