Data set import file format¶

Context¶

Data sets allow you to synchronize any type of data that you would like to re-use in HR Service Delivery. We call such data a data set. Data sets help you manage in one place long lists of items. The lists can be hierarchical, and support several languages.

For example, let’s say that you have a list of facilities across different regions. The format of a data set allows you to specify the facilities for each region. You could also go further and specify the countries for each region, and the cities in which the facilities are located.

APIs¶

The data sets can be created using our APIs. See Data sets API documentation for more information.

Import file¶

The synchronization of a data set occurs through a regular import of a CSV file (separator: comma, encoding: UTF-8). The ingestion of the file may be done manually through the administration web interface for Document Manager, or automatically by transfer of the CSV file via SFTP, in the in/dse directory.

Note

The last 25 data set imports are kept in our databases. This includes, for each import, the data set import source file and the matching error report, when applicable.

Data set import¶

Warning

The synchronization of a data set is complete meaning that if the file is in the correct format, all the values before the synchronization are deleted and replaced by the new ones. This only applies for values and not dimensions (see below for more information on the differences between values and dimensions).

Data set synchronization should be done regularly to keep the list of values up to date.

Import file structure¶

Warning

CSV files must be UTF-8 encoded
The order of the column is not meaningful, only their names matter

Note

Text values can be surrounded by double quotes " , which is very useful if you need to use the delimiter char , in your text. If you need to use the double quote in your text value, you have to escape this character by doubling it. For example, for a final label today is "the" day, you have to type: "today is ""the"" day".

The structure of the CSV file is always made of at least four columns:

dimension_id

parent_code

code

and the translated label in the platform default language

If several languages are needed, several language columns can be added.

dimension_id¶

Type: VARCHAR(255)

Presence: mandatory

Restrictions: Alpha numerical characters only. Case sensitive.

This column represents the code of a dimension. A dimension could be seen as a level.

Referring to the data set of the different facilities, a dimension could be “region”, “country” or “city”.

When the file is processed, if the dimension identified by this id does not exist, it is created.

parent_code¶

Type: VARCHAR(255)

Presence: optional

Restrictions: Same restrictions that apply on code column. Alpha numerical characters only. Case sensitive.

Code of the parent item, i.e. code of the value of the parent dimension of the current value.

In our example “emea” is the parent code of “france”.

For a value to refer to a parent code, the parent code has to be created before (as a value). Meaning that it should be higher up in the file.

Once a data set has more than two dimensions, a value of the deepest dimension necessarily has a parent that is the child of an other value. To synchronize this information and to guarantee its unicity, you have to pass the full path of the parent codes separated by a period “.”.

In our example, the parent code for a facility in Paris is “emea.france”.

code¶

Type: VARCHAR(255)

Presence: mandatory for value rows, empty for dimension rows

Restrictions: Alpha numerical characters only. Case sensitive.

Code of the value. A code is required for each value for its unicity and to be able to refer to it as a parent.

This column must remain empty for dimensions.

language_code¶

Type: VARCHAR(255)

Presence: mandatory at least one language column

Restrictions: The name of the column should be one of the following: en-us, en-gb, fr-fr, fr-ca, de-de, pl-pl, zh-hans, zh-hant, cs-cz, da-dk, nl-nl, fi-fi, el-gr, hu-hu, it-it, ja-jp, nb-no, pt-pt, pt-br, ro-ro, ru-ru, sr-rs, sk-sk, sl-si, es-es, es-419, sv-se, tr-tr, vi-vn (IETF BCP 47 language code specifications <https://tools.ietf.org/html/bcp47>)_.

Only a subset of the languages supported by HR Service Delivery products might be available on your platform depending on your configuration.

The column matching the language set as default on your platform is mandatory.

You can then add as many columns as you want for the other languages, the column name being the IETF BCP 47 language code.

The content of the column is the label displayed for this the value in the corresponding language.

Example of a data set file¶

This table shows the sample data that can be found in a data set file.

Download the sample data set import file.

On the right, a comment column explains how the data in each row is interpreted in the data set, based on our facility locations example → DO NOT ADD THIS LAST COLUMN IN YOUR IMPORT FILE

dimension_id	parent_code	code	en-us	fr-fr	The first row is the header row. It is really important as the order of columns in the data set import file is not important, but the column headers are.
region			Region	Région	This row defines the region dimension, defined by the dimension_id column. This is the root dimension, so it has no parent. This row also contains the 2 translated labels to display depending on the user language.
country	region		Country	Pays	This row defines the country dimension, defined by the dimension_id column. This dimension is child to the previous region dimension, so its parent code is region. This row also contains the 2 translated labels to display depending on the user language.
city	region.country		City	Ville	This row defines the city dimension, defined by the dimension_id column. This dimension is child to the previous country dimension, so its parent code is region.country, showing all the hierarchy. This row also contains the 2 translated labels to display for this dimension depending on the user language.
region		NORAM	Canada, the United States of America and Mexico	Canada, Etats Unis d’Amérique et Mexique	This row defines a value for the region dimension. This value has a code NORAM, defined by the code column. This row also contains the 2 translated labels to display for this value depending on the user language.
region		LATAM	Latin America and the Caribbean	Amérique du Sud et Caraibes	This row defines a value for the region dimension. This value has a code LATAM, defined by the code column. This row also contains the 2 translated labels to display for this value depending on the user language.
region		EMEA	Europe, the Middle East and Africa	Europe, Moyen Orient et Afrique	This row defines a value for the region dimension. This value has a code EMEA, defined by the code column. This row also contains the 2 translated labels to display for this value depending on the user language.
region		APAC	Asia Pacific and Japan	Asie Pacifique et Japon	This row defines a value for the region dimension. This value has a code APAC, defined by the code column. This row also contains the 2 translated labels to display for this value depending on the user language.
country	EMEA	france	France	France	This row defines a value for the country dimension. This value has a code france, defined by the code column. As the country dimension is child to region dimension, this value has to be defined as child to a region value. In the parent_code column, the EMEA region code is filled. This row also contains the 2 translated labels to display for this value depending on the user language.
city	EMEA.france	paris	Paris	Paris	This row defines a value for the city dimension. This value has a code paris, defined by the code column. As the city dimension is child to region.country dimensions, this value has to be defined as child to a region.country value path. In the parent_code column, the EMEA.france code path is filled. This row also contains the 2 translated labels to display for this value depending on the user language.
country	NORAM	usa	United States of America	Etats Unis d’Amérique	This row defines a value for the country dimension. This value has a code usa, defined by the code column. As the country dimension is child to region dimension, this value has to be defined as child to a region value. In the parent_code column, the NORAM region code is filled. This row also contains the 2 translated labels to display for this value depending on the user language.
city	NORAM.usa	nyc	New York City	New York	This row defines a value for the city dimension. This value has a code nyc, defined by the code column. As the city dimension is child to region.country dimensions, this value has to be defined as child to a region.country value path. In the parent_code column, the NORAM.usa code path is filled. This row also contains the 2 translated labels to display for this value depending on the user language.
country	LATAM	argentina	Argentina	Argentine	This row defines a value for the country dimension. This value has a code argentina, defined by the code column. As the country dimension is child to region dimension, this value has to be defined as child to a region value. In the parent_code column, the LATAM region code is filled. This row also contains the 2 translated labels to display for this value depending on the user language.
city	LATAM.argentina	buenosaires	Buenos Aires	Buenos Aires	This row defines a value for the city dimension. This value has a code buenosaires, defined by the code column. As the city dimension is child to region.country dimensions, this value has to be defined as child to a region.country value path. In the parent_code column, the LATAM.argentina code path is filled. This row also contains the 2 translated labels to display for this value depending on the user language.
country	APAC	japan	Japan	Japon	This row defines a value for the country dimension. This value has a code japan, defined by the code column. As the country dimension is child to region dimension, this value has to be defined as child to a region value. In the parent_code column, the APAC region code is filled. This row also contains the 2 translated labels to display for this value depending on the user language.
city	APAC.japan	tokyo	Tokyo	Tokyo	This row defines a value for the city dimension. This value has a code tokyo, defined by the code column. As the city dimension is child to region.country dimensions, this value has to be defined as child to a region.country value path. In the parent_code column, the APAC.japan code path is filled. This row also contains the 2 translated labels to display for this value depending on the user language.

What are the limits of data sets?¶

Some limits on data sets are checked to prevent some failures.

Warning

Data sets cannot have more than:

10 dimensions
400 000 values

Note

Using a data set can improve the performance of a form, when creating a custom_form. For example, instead of creating a form using hundreds of values, we advise to create a data set and use this to add selectors to the form.

Reports¶

We do not create a file to provide a report after a synchronization is run.

However, an API is available to give you the full status of an import (See API documentation) as well as the different errors reported during the synchronization.