Info

This page is still being edited and reviewed.

Overview of the Converter functions

The purpose of the Chemotion Converter (in short: ChemConverter) is to convert different file formats and types from integrated devices to a common and well known data type, namely e.g. jcamp .jdx or .json. It is also possible to extract columns for 2D data visualization and Metadata. Currently, the ChemConverter can handle most text readable files like txt/csv/dta if they are properly structured, but developers can create their own readers if a rule is notable for the newly added file type.

Most of the time, the ChemConverter runs as a background process when the user or a measurement device uploads a analytic file to the ELN. The prerequisite is that an admin has created a suitable profile. There is also a standalone version of the converter which you can find and install from Github.

Technical aspect

Technically the converter is built based on three layers, namely the reader, the admin GUI and last but not least the upload & processing. The creator of reader requires mostly knowledge in (e.g python) coding while the admin of the second layer has to know the measurement technique and device very well, because it decides which info the use will get and see at the end of the third layer. The last layer is a simple Webapp executing the code or an automated process during the file upload. How the files are converted will be decided by specific rules which are listening on predefined identifiers.

First layer: The reader

The reader is one part of programming code responsible for the software to import and handle the given data format. It is the most powerful layer due to flexibility (e.g. calculation of custom Metadata or special and complex column rules and calculations).

Source codes for the readers and examples can also be found on Github. There are readers for common file types and specific ones which are heirs of the basic ones. The following list shows some (but not all) of the current readers available:

ASCII (aka simple TXT)
CSV
- NOVA (Specific reader for an instrument without Metadata, also an example for custom Metadata calculation)
DTA

Second layer: The admin Profile creation GUI

The final rules to convert a data file into a standardized Bagit-it-zip-Container containing .jdx/.json files will be determined by profiles. Which profile is responsible is determined via identifiers. Like in case of the readers almost everything the input file contains can be used as an identifier, as long as the sum of identifiers for one profile is unique. The most common identifiers are:

The file extension (.txt, .csv, .dta, ...)
The first human readable line(s) of the header
The data column titles

Purpose of the Profile GUI vs. Reader coding

The aim of the profiles is to give admin users, which are less or non-experience with Python Coding, the possibility to define rule sets for conversion. It's also possible to define different profiles using the same Reader but slightly different data files (e.g., if only one column is varying or two columns are mixed).

Third layer: Upload, processing and output

The normal user will only see the upload GUI, shown by the picture as standalone version (left) or inside the ELN (right).

Two possible upload GUIs

After the file is uploaded, it will be converted as long as there exists a suitable reader and profile. It's possible (via deep coding) to define which file types are redirected to the converter and which ones are only displayed as they are (e.g. in case of binary files like pictures) to evade errors from the Converter app.

Managing Profiles

The following section is important for privileged users and admins to help them manage their own adjustments and profiles.

Find existing profiles

If using the standalone version just write your "converter-URL"/admin/ into your browser's address bar. And if you are using the ELN integrated version you have to be logged in as an admin and click on the "Converter Profiles"-Tab on the left side. In both cases you will see the following GUI where all your profiles are listed (if there are already some of course):

Admin list of all current existing profiles

After clicking on the "Create new new profile"-Button you can finally start to ...

Create your own profile

To create a profile a first example file has to be uploaded. This file should be a maxima example with as much information as possible, even if later data files sometimes have fewer. The file type and it contents must fit to an existing reader or an error will occur. After the upload, the following default GUI will be shown:

Default screen after upload a file for profile creation - Screen 1

After all the adjustments and settings are done, the "Create profile" button will finish the job and the screen will be back to the existing profiles page. Now a non-admin user or an automatic process can upload and convert files.

Specific settings

Data Class

The "Data Class" drop down menu defines the kind of jdx output. Currently there are four used classes:

Data Class	description	requirements
XY POINTS	typical "modern data curves" where each data point is described by two coordinates (x and y). Difference between each x need not to be equal	exactly one column with x values & one column with y values
XY DATA	Classical jcamp files with the intension to save hard drive space. Difference between each x is always equal.	exactly one column with y values and defined values for FIRSTX, LASTX and DELTAX
PEAK TABLE
NTUBLES

Data Type

A drop-down list of ontologies (mostly chemical analytic techniques) defining the kind of layout for displaying and plotting curves and spectra in ChemSpectra. Not all ontologies known in the ELN are findable here because it's only for layout purposes. If you think that something is missing, feel free to tell us.

X & Y Units

A list of physical units containing the most common units with and without prefixes. If you need additional units, please tell us or use the "scalar operation" feature to convert the column values.

Table columns

A drop-down menu for defining the x and y column values while a preview can be seen on the end of the left side as "Input table data". Currently only 2D data extraction is possible so if you need more, you can create additional Output tables and merge them together with external tools. Alternative you can use the "column operation" feature to add, subtract, divide or multiply columns together.

In a future version it will become possible to create profiles without Output tables if you intend to use the converter only for Metadata extraction.

Metadata and Identifiers

Both share some similarities, even in the backend (both are called "identifier" there), with the exception that Metadata have an assigned "optional = true" flag. Every info and content the file provides and which is not explicit declared as (measurement) data by the reader can been seen as Metadata. Metadata can often be further broken into "key & value". Sometimes the key is provided together with the value, which makes it easier for the reader to recognize the string as Metadata, and in other cases, it has to be defined or assigned by a user with background and RegEx knowledge. While creating a profile you have to assign one of three match parameters to the desired key.

Match	behavior	default for
Exact Value	The value of the key will only be extracted or valid for profile choosing if it's the same as the one of the example file during profile creation.	Identifiers
Any Value	The value will always be extracted as long as the key is found in the given file.	Metadata
RegEx	The value from the input field is translated into a regular expression and compared to the string value from the input file. If the RegEx is valid, the first group of the result will be extracted as the value for the wanted key.	n.a. but often used for searching table headers with/-out line number

The extraction and assignment is now done by text fields and drop-down menus. You just have to follow this steps:

Press "Add Metadata/Identifier"
Choose a key or table (+ optional line number) as input
Choose on of three matches and if it is not "any", write a value or RegEx into the text field
Choose an output layer (comparable with a category)
Choose an output key
Optional: add scalar Operators (+,-,*,:) to the calculation of your output

In the end, it could look like this:

GUI during Metadata extraction

If the converter is integrated in the ELN and an already designed Dataset is chosen, bullet points 4 and 5 are chosen via drop-down menus otherwise typing a value is necessary.

Save

To finish your creation or save changes click on the "Create/Update profile" button or else everything will be lost.

Summary

The following picture shows the course of the three layers.

Flow diagram of the three Layers

Technical aspect​

First layer: The reader​

Second layer: The admin Profile creation GUI​

Purpose of the Profile GUI vs. Reader coding​

Third layer: Upload, processing and output​

Managing Profiles​

Find existing profiles​

Create your own profile​

Specific settings​

Data Class​

Data Type​

X & Y Units​

Table columns​

Metadata and Identifiers​

Save​

Summary​