Data set import file format

Context

Data sets allow you to synchronize any type of data that you would like to re-use in HR Service Delivery. We call such data a data set. Data sets help you manage in one place long lists of items. The lists can be hierarchical, and support several languages.

For example, let’s say that you have a list of facilities across different regions. The format of a data set allows you to specify the facilities for each region. You could also go further and specify the countries for each region, and the cities in which the facilities are located.

APIs

The data sets can be created using our APIs. See Data sets API documentation for more information.

Import file

The synchronization of a data set occurs through a regular import of a CSV file (separator: comma, encoding: UTF-8). The ingestion of the file may be done manually through the administration web interface for Document Manager, or automatically by transfer of the CSV file via SFTP, in the in/dse directory.

Note

The last 25 data set imports are kept in our databases. This includes, for each import, the data set import source file and the matching error report, when applicable.

Data set import

Warning

The synchronization of a data set is complete meaning that if the file is in the correct format, all the values before the synchronization are deleted and replaced by the new ones. This only applies for values and not dimensions (see below for more information on the differences between values and dimensions).

Data set synchronization should be done regularly to keep the list of values up to date.

Import file structure

Warning

  • CSV files must be UTF-8 encoded

  • The order of the column is not meaningful, only their names matter

Note

Text values can be surrounded by double quotes " , which is very useful if you need to use the delimiter char , in your text. If you need to use the double quote in your text value, you have to escape this character by doubling it. For example, for a final label today is "the" day, you have to type: "today is ""the"" day".

The structure of the CSV file is always made of at least four columns:

  • dimension_id

  • parent_code

  • code

  • and the translated label in the platform default language

If several languages are needed, several language columns can be added.

dimension_id

Type: VARCHAR(255)

Presence: mandatory

Restrictions: Alpha numerical characters only. Case sensitive.

This column represents the code of a dimension. A dimension could be seen as a level.
Referring to the data set of the different facilities, a dimension could be “region”, “country” or “city”.
When the file is processed, if the dimension identified by this id does not exist, it is created.

parent_code

Type: VARCHAR(255)

Presence: optional

Restrictions: Same restrictions that apply on code column. Alpha numerical characters only. Case sensitive.

Code of the parent item, i.e. code of the value of the parent dimension of the current value.
In our example “emea” is the parent code of “france”.
For a value to refer to a parent code, the parent code has to be created before (as a value). Meaning that it should be higher up in the file.
Once a data set has more than two dimensions, a value of the deepest dimension necessarily has a parent that is the child of an other value. To synchronize this information and to guarantee its unicity, you have to pass the full path of the parent codes separated by a period “.”.
In our example, the parent code for a facility in Paris is “emea.france”.

code

Type: VARCHAR(255)

Presence: mandatory for value rows, empty for dimension rows

Restrictions: Alpha numerical characters only. Case sensitive.

Code of the value. A code is required for each value for its unicity and to be able to refer to it as a parent.
This column must remain empty for dimensions.

language_code

Type: VARCHAR(255)

Presence: mandatory at least one language column

Restrictions: The name of the column should be one of the following: en-us, en-gb, fr-fr, fr-ca, de-de, pl-pl, zh-hans, zh-hant, cs-cz, da-dk, nl-nl, fi-fi, el-gr, hu-hu, it-it, ja-jp, nb-no, pt-pt, pt-br, ro-ro, ru-ru, sr-rs, sk-sk, sl-si, es-es, es-419, sv-se, tr-tr, vi-vn (IETF BCP 47 language code specifications <https://tools.ietf.org/html/bcp47>)_.

Only a subset of the languages supported by HR Service Delivery products might be available on your platform depending on your configuration.
The column matching the language set as default on your platform is mandatory.
You can then add as many columns as you want for the other languages, the column name being the IETF BCP 47 language code.
The content of the column is the label displayed for this the value in the corresponding language.

Example of a data set file

This table shows the sample data that can be found in a data set file.

Download the sample data set import file.

On the right, a comment column explains how the data in each row is interpreted in the data set, based on our facility locations example → DO NOT ADD THIS LAST COLUMN IN YOUR IMPORT FILE

dimension_id

parent_code

code

en-us

fr-fr

The first row is the header row. It is really important as the order of columns in the data set import file is not important, but the column headers are.

region

Region

Région

This row defines the region dimension, defined by the dimension_id column. This is the root dimension, so it has no parent. This row also contains the 2 translated labels to display depending on the user language.

country

region

Country

Pays

This row defines the country dimension, defined by the dimension_id column. This dimension is child to the previous region dimension, so its parent code is region. This row also contains the 2 translated labels to display depending on the user language.

city

region.country

City

Ville

This row defines the city dimension, defined by the dimension_id column. This dimension is child to the previous country dimension, so its parent code is region.country, showing all the hierarchy. This row also contains the 2 translated labels to display for this dimension depending on the user language.

region

NORAM

Canada, the United States of America and Mexico

Canada, Etats Unis d’Amérique et Mexique

This row defines a value for the region dimension. This value has a code NORAM, defined by the code column. This row also contains the 2 translated labels to display for this value depending on the user language.

region

LATAM

Latin America and the Caribbean

Amérique du Sud et Caraibes

This row defines a value for the region dimension. This value has a code LATAM, defined by the code column. This row also contains the 2 translated labels to display for this value depending on the user language.

region

EMEA

Europe, the Middle East and Africa

Europe, Moyen Orient et Afrique

This row defines a value for the region dimension. This value has a code EMEA, defined by the code column. This row also contains the 2 translated labels to display for this value depending on the user language.

region

APAC

Asia Pacific and Japan

Asie Pacifique et Japon

This row defines a value for the region dimension. This value has a code APAC, defined by the code column. This row also contains the 2 translated labels to display for this value depending on the user language.

country

EMEA

france

France

France

This row defines a value for the country dimension. This value has a code france, defined by the code column. As the country dimension is child to region dimension, this value has to be defined as child to a region value. In the parent_code column, the EMEA region code is filled. This row also contains the 2 translated labels to display for this value depending on the user language.

city

EMEA.france

paris

Paris

Paris

This row defines a value for the city dimension. This value has a code paris, defined by the code column. As the city dimension is child to region.country dimensions, this value has to be defined as child to a region.country value path. In the parent_code column, the EMEA.france code path is filled. This row also contains the 2 translated labels to display for this value depending on the user language.

country

NORAM

usa

United States of America

Etats Unis d’Amérique

This row defines a value for the country dimension. This value has a code usa, defined by the code column. As the country dimension is child to region dimension, this value has to be defined as child to a region value. In the parent_code column, the NORAM region code is filled. This row also contains the 2 translated labels to display for this value depending on the user language.

city

NORAM.usa

nyc

New York City

New York

This row defines a value for the city dimension. This value has a code nyc, defined by the code column. As the city dimension is child to region.country dimensions, this value has to be defined as child to a region.country value path. In the parent_code column, the NORAM.usa code path is filled. This row also contains the 2 translated labels to display for this value depending on the user language.

country

LATAM

argentina

Argentina

Argentine

This row defines a value for the country dimension. This value has a code argentina, defined by the code column. As the country dimension is child to region dimension, this value has to be defined as child to a region value. In the parent_code column, the LATAM region code is filled. This row also contains the 2 translated labels to display for this value depending on the user language.

city

LATAM.argentina

buenosaires

Buenos Aires

Buenos Aires

This row defines a value for the city dimension. This value has a code buenosaires, defined by the code column. As the city dimension is child to region.country dimensions, this value has to be defined as child to a region.country value path. In the parent_code column, the LATAM.argentina code path is filled. This row also contains the 2 translated labels to display for this value depending on the user language.

country

APAC

japan

Japan

Japon

This row defines a value for the country dimension. This value has a code japan, defined by the code column. As the country dimension is child to region dimension, this value has to be defined as child to a region value. In the parent_code column, the APAC region code is filled. This row also contains the 2 translated labels to display for this value depending on the user language.

city

APAC.japan

tokyo

Tokyo

Tokyo

This row defines a value for the city dimension. This value has a code tokyo, defined by the code column. As the city dimension is child to region.country dimensions, this value has to be defined as child to a region.country value path. In the parent_code column, the APAC.japan code path is filled. This row also contains the 2 translated labels to display for this value depending on the user language.

What are the limits of data sets?

Some limits on data sets are checked to prevent some failures.

Warning

Data sets cannot have more than:

  • 10 dimensions

  • 400 000 values

Note

Using a data set can improve the performance of a form, when creating a custom_form. For example, instead of creating a form using hundreds of values, we advise to create a data set and use this to add selectors to the form.

Reports

We do not create a file to provide a report after a synchronization is run.

However, an API is available to give you the full status of an import (See API documentation) as well as the different errors reported during the synchronization.