Raw Internal Data Library

A short guideline to facilitate using RIDL

v1.5



What do I need to upload data to RIDL?

1. About 15 minutes of time (to upload and provide information)

2. Your data files (in Excel, CSV, Stata or R format)

3. The questionnaire, concept note, final report, scripts or other technical documentation (if available)

4. The metadata files (if available)



1. What is RIDL?

UNHCR recently launched the Raw Internal Data Library (RIDL), a globally-supported, centralized, and secure internal data repository. Its purpose is to enable UNHCR to use its valuable raw data to its full potential and preserve it for future analysis and use.

In addition to RIDL, UNHCR is also launching a Microdata Library (MDL) which can serve as an externally available data repository for partners and researchers. Before being published in MDL, data will go through a curation process and will be anonymized. In cooperation with the regional and field offices, this process will be overseen by the Curation Team at UNHCR HQ.


How to access RIDL?

The UNHCR RIDL platform can be found at: ridl.unhcr.org




Figure 1.1. Login page of RIDL




Logging in to the RIDL website is easy: use your UNHCR email address without @unhcr.org as the username, and the password that you use to log in to your computer, email, MSRP or other internal UNHCR applications.


CKAN

RIDL is based on a technology called CKAN, a web-based open-source management system for the storage and distribution of open data. Think of a content management system like WordPress, but for data instead of pages and blogs.

The datasets published on RIDL consist of two parts:

1. the metadata which contains information about the data, and

2. the resources, including the data files, but also questionnaires, the final report based on the data, etc.
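
For readers who later work with the API, this two-part split is also visible in how CKAN represents a dataset: metadata fields sit at the top level and the resources form a nested list. The sketch below is only an illustration. The keys `name`, `title`, `notes` and `resources` are standard CKAN dataset keys, but the sample record and the helper `split_dataset` are hypothetical.

```python
# Hypothetical dataset record, shaped like a CKAN dataset dictionary.
sample_dataset = {
    "name": "example-survey-2018",
    "title": "Example Survey, 2018",
    "notes": "Illustrative description only.",
    "resources": [
        {"name": "household data", "format": "csv"},
        {"name": "questionnaire", "format": "pdf"},
    ],
}


def split_dataset(dataset):
    """Return (metadata, resources) for a CKAN-style dataset dictionary."""
    # Everything except the nested resource list counts as metadata.
    metadata = {key: value for key, value in dataset.items() if key != "resources"}
    resources = list(dataset.get("resources", []))
    return metadata, resources


metadata, resources = split_dataset(sample_dataset)
print(sorted(metadata))
print([resource["name"] for resource in resources])
```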





2. How to upload your data?

This section covers where to upload your data and what to upload.

After you log in at ridl.unhcr.org, the Dashboard appears. It shows RIDL’s News feed in the first tab, My Datasets in the second, and My Data Containers in the third.




Figure 2.1 The Dashboard


To be able to upload datasets to RIDL, the user needs to have access to at least one container as an ‘Editor’ or as ‘Admin’. The Dashboard's My Data Containers tab will list all containers the user has access to.


Figure 2.2 Editor access to the 'Cameroon' container


To request access to an existing data container, navigate to the container by clicking on its name and locate the 'Request Access' button.


Figure 2.3 Request access to container


The data containers are ordered geographically. You can request a new data container under an existing parent container, which is useful when several datasets cover the same topic. To request a new data container, go to Data Containers and click the Request Data Container button (see figure 2.3). The name of the new container should follow the naming convention: the three-letter code of the country, followed by a colon and the name of the container. For example, Iraq : WASH KAP.




Figure 2.3. Request a new data container


The tab My Datasets includes all the datasets the user has access to and allows the upload of new data. After logging in, go to the tab My Datasets.




Figure 2.4 Adding a dataset


This provides two options to upload data:

1. Add dataset – the user will need to create the dataset and linked resources by filling out the metadata elements one by one.

2. Import dataset from DDI/XML – this option should be used if XML and RDF files were previously generated in Nesstar Publisher.



Option 1: Add dataset


At this stage you are asked to add information about the dataset, i.e. the metadata. This metadata will be visible to all colleagues who log in to RIDL.


The following fields are mandatory:

Title: The title must be unique across RIDL. Preferably, it should follow the convention: survey title, year.

Description: You can add a longer description with information users of the data need to know.

Data container: These are geographically ordered. Select the country and the sub-folder within the country.

Internal Access Level: Who can download the data?

  • Private – Data and its associated resources (e.g., questionnaire, reports, etc.) are only accessible by those who are Members, Editors or Admins of the data container where the data are located. Others can see the metadata and can request access to the data from the Admin of the data container.
  • Internally visible – Data and its associated resources are available to all with access to RIDL (i.e., all UNHCR colleagues with UNHCR credentials).

Data collector: Select the organization(s) responsible for the data collection.

Topic classifications: You can select multiple topics to describe the dataset. This will ensure the dataset is easy to find.

Unit of measurement: What is the unit of measurement, i.e. what is each observation? A household, individual, child, business, etc.?

Data collection technique: Select the data collection technique from the drop-down menu.

Archived yes/no: Is the data archived, or still active?


Besides the required fields, there is room to provide additional information, such as Sampling Procedure. It is recommended to add all relevant information. The next page provides the opportunity to upload the actual data (see Figure 3.2). Data can be in any format, such as xlsx, csv or Stata. Additional documents can also be uploaded on this page. It is recommended to upload the questionnaire, as well as the field-work report and the final report.



Option 2: Import Dataset from DDI


! IMPORTANT ! To enable the importation of DDI files, the RIDL Nesstar template should be used to create the DDI files. Download the template from here.


If you choose the second option, Import dataset from DDI/XML, you will be asked to:

1. Upload or link an XML and RDF file (see figure 2.3).

2. Select the data container: the country or geographical area and the sub-folder to which the data belongs.

3. Set the visibility: private or internally visible.

Private: The data and other added resources are only accessible for colleagues who have access to the same data container. Note that the metadata is always visible for all colleagues using RIDL.

Internally visible: The data and resources added are accessible for all colleagues within UNHCR.


Once all fields are filled, click the button Import to upload the files. Once the files are uploaded, you can add data (see Figure 3.1) and other resources.



Figure 2.3. Uploading a dataset using xml and rdf




3. What to upload?

After uploading the DDI files, or providing the information using the RIDL form as discussed above, you will be asked to add resources. Data files as well as the questionnaire and report should be added.


Data files




Figure 3.1. Adding data files


To upload data, click on Upload and select the relevant file from your computer, and provide additional information about the data (see Figure 3.1). Each data file needs to be uploaded separately. The following information is required:


  • Name: which should follow the naming convention: UNHCR, ISO country code, data series (SEA, SENS, KAP, etc.), year (if multiple per year, followed by a, b, c), (if relevant, the observation level- e.g., child, household), version of data. Example: UNHCR_CMR_SENS_2018_women_v1_3.
  • Date: the start and end date of the data collection, in the format year/month/day, e.g. 2018/01/20.
  • Version: the version (number) of the data.
  • File process status: is the data raw, cleaned, or cleaned and anonymized?
  • Identifiability: does the data contain personally identifiable information, or has it been anonymized?
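
The naming convention and date format above can be sketched as two small helpers. This is an illustration, not an official RIDL tool. The function names are hypothetical, and reading `v1_3` in the example as version 1.3 (dots written as underscores) is an assumption.

```python
from datetime import date


def build_resource_name(country_code, series, year, level=None, version="1", suffix=""):
    """Assemble a data file name following the convention described above.

    suffix is the optional letter (a, b, c) for several rounds within one year.
    """
    parts = ["UNHCR", country_code.upper(), series.upper(), f"{year}{suffix}"]
    if level:
        # Observation level, e.g. child or household, only when relevant.
        parts.append(level)
    # Assumption: dots in the version are written as underscores (1.3 -> v1_3).
    parts.append("v" + version.replace(".", "_"))
    return "_".join(parts)


def format_collection_date(day):
    """Format a date as year/month/day, e.g. 2018/01/20."""
    return day.strftime("%Y/%m/%d")


print(build_resource_name("CMR", "SENS", 2018, level="women", version="1.3"))
print(format_collection_date(date(2018, 1, 20)))
```

Building names with a helper like this keeps multiple data files from the same survey consistent with one another.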

Questionnaire and other supporting documents

Please include the questionnaire, preferably the final version with variable names that correspond to the variable names in the data. Other resources, such as the concept note, the final report or field-work reports, and scripts written to process and analyze the data, should also be added. To add these, click the file attachment, upload the file, and add the name of the resource and its format (PDF, Word, etc.). A short description may also be added (e.g. questionnaire version number).



4. How to import data to RIDL from KoBoToolbox

If you are a user with Editor or Admin permission for (at least) one Data Container you will be able to create new datasets by importing your survey data directly from Kobo.






... or to your user page, where you will find a new tab called ‘KoBoToolbox’.




Authorization

You will need to add your Kobo token. You can get it by following this KoBo tutorial.





Once you set up your Kobo token you will see a list of all the Kobo surveys you have access to.





Some surveys may not have been deployed or have already been imported (and you cannot import a survey twice).

The Import button will show you which surveys are ready to import.


Once you start the import process, you will see the regular New dataset form, which is now called Create dataset from Kobo. Some fields will be pre-filled with metadata from KoBo, such as the Original ID field, which will contain the KoBo ID of this survey.





Upon creating your dataset from the import, you can configure the import using the same settings that are available directly in Kobo when exporting your data (e.g., labels vs xml values, group separator, etc.).

Click on ‘Survey import settings’ to open the panel to adjust the import settings, select which fields to include in the data files or even execute a Mongo query to filter submissions to import.
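
A Mongo query is written as a JSON document. The sketch below builds one that would keep only submissions received on or after a given time. It assumes the submissions carry KoBoToolbox's `_submission_time` field and that the settings panel accepts a standard MongoDB comparison such as `$gte`; the helper name `submissions_since` is hypothetical.

```python
import json


def submissions_since(iso_timestamp):
    """Build a Mongo-style query selecting submissions at or after a timestamp."""
    # Assumption: "_submission_time" holds the submission timestamp.
    return json.dumps({"_submission_time": {"$gte": iso_timestamp}})


print(submissions_since("2021-01-01T00:00:00"))
```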



After describing your data by providing as many details as possible, continue to the next step. You will automatically see some resources:

  • the form definition in XLS format
  • the data in JSON, CSV and XLS format

The actual data download process happens in the background. Until it is completed, you will see the message "The resource is pending download" for each resource.





You can edit metadata for any of these resources, but you can't change the data files imported from Kobo.







Once a dataset and its resources are synced from Kobo, you are able to update existing resources by pulling the latest changes from Kobo. At this time, it is not possible to carry out this update by creating new resources or a completely new dataset.

The KoboToolbox tab in your Dashboard will detect when new submissions are available to an already imported project. The update can be initiated directly from here by clicking on the Update Kobo data button.





The same process can also be triggered from the dataset itself using the same button.





The resources will be in a ‘sync pending’ state until the update of all resources is finished.





If there are no new submissions, the update can still be forced to fetch changes to existing records (edits, validations, etc.).









5. Request Data

RIDL provides an option for users to request access to data in private datasets. To do this, navigate to the dataset and locate the 'Request access' button or the link in the notification. After you click either option, a pop-up window will appear, where additional information should be supplied about the requestor and the intended use of the dataset.



Figure 5.1. Request access to dataset

Once the request is submitted, a notification email will be sent to the administrators of the data container, who will then approve or reject the request.



6. RIDL’s Data Deposit

RIDL includes a data deposit option with which UNHCR colleagues and partner organizations can share data (note that this is the only sharing option for partners).


Partners sharing data via RIDL will need to request an account mentioning their UNHCR focal point, which will be reviewed by the country/regional RIDL focal points. Once data is uploaded, regional or country-level RIDL focal points will evaluate it for completeness and if needed, contact the Curation Team which can assist with the data anonymization required before sharing the data on UNHCR’s Microdata Library.


Please download the Data Deposit user guide applicable to you below:

UNHCR colleagues: Internal Data Deposit

Partner organizations: External Data Deposit




7. API Documentation

As with all CKAN instances, all the functionality available in the RIDL browser interface can be accessed programmatically via the API. For an introduction to working with the CKAN API and a comprehensive list of all available methods, please refer to the official CKAN API documentation.

Authentication and Authorization

The API uses the same authorization as the web interface, so when using the API users can perform exactly the actions they have permission to perform in the UI. For API operations, an API token is used for authentication.

How to get your user token?

You can get your user token by logging in to RIDL and going to: https://ridl.unhcr.org/user/yourusername, or by clicking on the user settings.






You will be able to see your tokens under the dedicated tab.




RIDL allows for the creation of multiple tokens for different purposes.




To authenticate as a particular user the Authorization header must be sent as part of the request, including the API key provided to each user on the user details page:

For instance:

curl -X GET -H "Authorization: API-KEY" \
https://ridl.unhcr.org/api/3/action/package_search

For simplicity we won’t include the Authorization header in the rest of the examples in this section, but it is required on all API requests to the site.
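
The same authenticated request can be prepared from a script. The sketch below uses only the Python standard library and builds the request without sending it; `build_request` is a hypothetical helper, and `API-KEY` is a placeholder for your own token.

```python
import urllib.request

RIDL_API = "https://ridl.unhcr.org/api/3/action"


def build_request(action, token):
    """Create a GET request for a CKAN action with the Authorization header set."""
    request = urllib.request.Request(f"{RIDL_API}/{action}", method="GET")
    request.add_header("Authorization", token)
    return request


request = build_request("package_search", "API-KEY")  # placeholder token
print(request.full_url)
# To actually send it: urllib.request.urlopen(request)
```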


Important notes on naming:

  • For historical reasons the CKAN API refers to datasets as packages (e.g. package_show, package_search, etc.)
  • Data Containers are internally implemented using CKAN’s Organizations feature, so when interacting with them via the API the organization_* methods need to be used (e.g. organization_create, organization_list, etc.)

Examples


    Loading data containers

    curl -X GET "https://ridl.unhcr.org/api/3/action/organization_list?type=data-container"

    Note the type parameter
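
When building such URLs in a script, it is safer to let a library encode the query string than to concatenate it by hand. A minimal sketch, where `action_url` is a hypothetical helper:

```python
from urllib.parse import urlencode

RIDL_API = "https://ridl.unhcr.org/api/3/action"


def action_url(action, **params):
    """Return the URL for a CKAN action, with optional query parameters."""
    query = urlencode(params)
    return f"{RIDL_API}/{action}?{query}" if query else f"{RIDL_API}/{action}"


print(action_url("organization_list", type="data-container"))
```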


    Creating or Requesting a Data Container

    curl -X POST https://ridl.unhcr.org/api/3/action/organization_create \
    -H "Content-Type: application/json" \
    -d '{"type": "data-container", "name": "test-data-container", "title": "Test Data Container", "country": "ANG", "geographic_area": "West Africa", "groups": [{"name": "africa"}]}'

    Note the type key provided, and the groups object used to define the parent Data Container. As in the UI, if a non-Sysadmin user performs the API request, the Data Container will be created with an “approval_pending” state.
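
The same request can be assembled in Python. This sketch mirrors the curl example field by field and does not send anything; `build_container_request` is a hypothetical helper and `API-KEY` is a placeholder token.

```python
import json
import urllib.request


def build_container_request(name, title, country, geographic_area, parent, token):
    """Prepare a POST request asking for a new Data Container under a parent."""
    payload = {
        "type": "data-container",  # marks the organization as a Data Container
        "name": name,
        "title": title,
        "country": country,
        "geographic_area": geographic_area,
        "groups": [{"name": parent}],  # the parent Data Container
    }
    return urllib.request.Request(
        "https://ridl.unhcr.org/api/3/action/organization_create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": token},
        method="POST",
    )


request = build_container_request(
    "test-data-container", "Test Data Container", "ANG", "West Africa", "africa", "API-KEY"
)
print(json.loads(request.data)["groups"])
```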


    Searching Datasets

    curl -X GET "https://ridl.unhcr.org/api/3/action/package_search?q=markets"
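
A script will usually page through results and read the titles from the response. The sketch below relies on CKAN's standard `rows` and `start` parameters and on the standard response layout (`success`, `result.count`, `result.results`). The helper names and the sample response are hypothetical.

```python
from urllib.parse import urlencode


def search_url(query, rows=10, start=0):
    """Build a package_search URL with paging parameters."""
    params = urlencode({"q": query, "rows": rows, "start": start})
    return f"https://ridl.unhcr.org/api/3/action/package_search?{params}"


def dataset_titles(response):
    """Extract dataset titles from a decoded package_search response."""
    if not response.get("success"):
        raise RuntimeError("package_search did not succeed")
    return [dataset["title"] for dataset in response["result"]["results"]]


# Hypothetical, trimmed response used only to exercise the parser.
sample_response = {
    "success": True,
    "result": {
        "count": 1,
        "results": [{"name": "example-markets-2019", "title": "Example Markets Survey, 2019"}],
    },
}

print(search_url("markets"))
print(dataset_titles(sample_response))
```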


    Creating a Dataset

    curl -X POST https://ridl.unhcr.org/api/3/action/package_create \
    -H "Content-Type: application/json" \
    -d '{"name": "test-dataset-pen", "title": "Test Dataset PEN", "notes": "Some description", "owner_org": "africa", "data_collector": ["unhcr"], "keywords": ["food", "shelter"], "unit_of_measurement": "kg", "data_collection_technique": "f2f", "archived": "False"}'


    Note that the values used are the ones internally expected by the validators, not the ones displayed in the UI. Also note that data files (type=data) or file attachments (type=attachment) are created separately once the dataset is created using the resource_create method.
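
As a sketch of that second step, the helper below builds a JSON body for resource_create that links a resource to an existing dataset. `package_id`, `name` and `url` are standard CKAN resource_create parameters and `type` comes from the note above. The helper itself and the example values are hypothetical, and RIDL may require further fields (such as date, version or identifiability) that are not shown here.

```python
import json


def resource_payload(dataset_name, resource_name, kind, url):
    """Build the JSON body for a resource_create call on an existing dataset."""
    if kind not in ("data", "attachment"):
        raise ValueError("kind must be 'data' or 'attachment'")
    return json.dumps(
        {
            "package_id": dataset_name,  # the dataset created with package_create
            "name": resource_name,
            "type": kind,
            "url": url,
        }
    )


print(resource_payload(
    "test-dataset-pen",
    "UNHCR_CMR_SENS_2018_women_v1_3",
    "data",
    "https://example.org/data.csv",  # placeholder link
))
```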