In Holistics, you can import data from other sources (non-SQL sources, CSVs, Google Sheets, etc.) with Import Models. The general workflow is:
- Go to the Data Modeling page, click Create -> Add Data Model from Other Sources, or click the (+) next to the folder that you want to put your Import Model in.
- Select a data source type. If there is no available data source of that type, you will be prompted to connect a relevant source (visit Data Sources for more details).
- Select the data you want to import.
- Change Destination Settings and Sync Configuration to your liking.
- Click Create to finish the process.
When creating an Import Model, this is what happens behind the scenes:
- Holistics connects to your third-party source via its API and downloads the requested data.
- A new empty table is created in a schema of your choice.
- Downloaded data is then inserted into the empty table.
- The temporary downloaded copy of the data is then cleared.
- A Data Model is created on top of the persisted table.
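The steps above can be sketched as follows. This is only an illustration of the flow; the actual mechanism is internal to Holistics, and this sketch uses an in-memory SQLite database as a stand-in for the destination warehouse.

```python
import sqlite3

def run_import(downloaded_rows, conn, table_name):
    """Illustrative sketch of the Import Model flow (not Holistics' API)."""
    cur = conn.cursor()
    # 2. Create a new empty table in the destination schema.
    cur.execute(f'CREATE TABLE "{table_name}" (id INTEGER, name TEXT)')
    # 3. Insert the downloaded data into the empty table.
    cur.executemany(f'INSERT INTO "{table_name}" VALUES (?, ?)', downloaded_rows)
    conn.commit()
    # 4. The temporary downloaded copy is then discarded, and
    # 5. a Data Model is created on top of the persisted table.
    return table_name

# 1. Pretend this is the data downloaded from the third-party source's API.
downloaded = [(1, "Alice"), (2, "Bob")]
conn = sqlite3.connect(":memory:")
table = run_import(downloaded, conn, "persisted_users")
count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
print(count)  # 2 rows persisted
```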
The Advanced Settings section gives you more granular control over your data import:
Here you can specify the schema and the table name to write your data to.
- The Schema Name will default to the Default Schema that you selected when you first connected your data source.
- The Table Name by default will be prefixed with persisted_. If a table with the same name already exists in the schema, the new name will be suffixed with a random number to differentiate it. However, you can choose to overwrite the existing table.
For example, if your Source Table name is Users and the Default Schema in your Destination is public, then your corresponding table in the destination will be public.persisted_users.
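The naming rules above can be sketched as a small function. This is an illustration of the described behavior only; the exact suffixing scheme inside Holistics may differ.

```python
import random

def destination_table_name(source_table, existing_tables, overwrite=False):
    """Sketch of the persisted-table naming rules described above."""
    name = "persisted_" + source_table.lower()
    if name in existing_tables and not overwrite:
        # A table with the same name already exists:
        # suffix a random number to differentiate the new table.
        name = f"{name}_{random.randint(1000, 9999)}"
    return name

print(destination_table_name("Users", existing_tables=set()))
# -> persisted_users (written to the public schema as public.persisted_users)
```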
By setting the Refresh Schedule, the persisted table can be automatically updated with new data. The default option is Daily at 7:00.
Holistics provides you with four loading modes to accommodate your different needs:
- Full mode: Your old data will be dropped entirely and replaced with new data.
- Append mode: Append new data to the table while retaining old data.
- Incremental: Rely on the Incremental Column to load only new data from Source to Destination while keeping existing data in the Destination.
- Upsert: Combination of Update and Insert.
Incremental Column: this column is checked to select and load only new records from Source to Destination.
Primary Key: new records with the same primary key value will overwrite existing ones.
The Full mode is selected by default. For more information on when you should use which mode, please visit Storage settings.
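The semantics of the four loading modes can be illustrated with a short sketch, using lists of dicts as a stand-in for the source and destination tables (an illustration of the described behavior, not Holistics' implementation):

```python
def apply_load(mode, existing, incoming, key=None, inc_col=None):
    """Illustrative semantics of the four loading modes."""
    if mode == "full":
        # Drop old data entirely and replace it with the new data.
        return list(incoming)
    if mode == "append":
        # Append new data while retaining old data.
        return existing + list(incoming)
    if mode == "incremental":
        # Load only rows whose Incremental Column value is newer than
        # anything already in the Destination.
        high_water = max((r[inc_col] for r in existing), default=None)
        new = [r for r in incoming
               if high_water is None or r[inc_col] > high_water]
        return existing + new
    if mode == "upsert":
        # Update rows with a matching Primary Key, insert the rest.
        merged = {r[key]: r for r in existing}
        for r in incoming:
            merged[r[key]] = r
        return list(merged.values())
    raise ValueError(f"unknown mode: {mode}")

old = [{"id": 1, "updated_at": 10}]
new = [{"id": 1, "updated_at": 20}, {"id": 2, "updated_at": 30}]
print(len(apply_load("upsert", old, new, key="id")))                   # 2
print(len(apply_load("incremental", old, new, inc_col="updated_at")))  # 3
```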
Source Column names could be arbitrary if the data source is not a standardized one (for example CSV files, or Google Sheets). By default, Holistics will normalize the source column names (use all lowercase alpha-numeric characters and underscores). However, users should still pay attention to the different naming conventions supported by databases and make changes accordingly.
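A normalization like the one described above could be sketched as follows (an assumption about the general approach; Holistics' exact rules may differ):

```python
import re

def normalize_column_name(name):
    """Lowercase the name and keep only alphanumeric characters
    and underscores, collapsing everything else to underscores."""
    name = name.strip().lower()
    return re.sub(r"[^a-z0-9]+", "_", name).strip("_")

print(normalize_column_name("Order Date (UTC)"))  # order_date_utc
```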
To best assist users when importing data, Holistics will:
- Map your data to one of the Generic Data Types first (Whole Number, Decimal, True/False, Date, DateTime, Text)
- Then select the suitable data type in your destination database.
For example, suppose you want to import data from SQL Server and your destination Data Warehouse is Google BigQuery. The source table in SQL Server has columns of type TINYINT. This is what happens in Holistics:
- The integer columns in SQL Server are mapped to our Whole Number data type.
- Next, the Whole Number data type is mapped to the INT64 type in BigQuery.
Please refer to the Data Type Mappings section for more details.
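The two-step mapping can be sketched as a pair of lookup tables. The entries below are illustrative only; the authoritative lists live in the Data Type Mappings section.

```python
# Step 1: source database type -> Generic Data Type (illustrative entries).
SOURCE_TO_GENERIC = {
    ("sqlserver", "TINYINT"): "Whole Number",
    ("sqlserver", "VARCHAR"): "Text",
}
# Step 2: Generic Data Type -> destination database type (illustrative entries).
GENERIC_TO_DEST = {
    ("bigquery", "Whole Number"): "INT64",
    ("bigquery", "Text"): "STRING",
}

def map_type(source_db, source_type, dest_db):
    generic = SOURCE_TO_GENERIC[(source_db, source_type)]
    return GENERIC_TO_DEST[(dest_db, generic)]

print(map_type("sqlserver", "TINYINT", "bigquery"))  # INT64
```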
In most cases, Holistics can interpret the data being loaded in and map your fields to data types supported by the destination database. However, in more complicated cases you can manually map data types by using the Custom type selection:
By default, all the columns in Sync Configuration will be Nullable. If Nullable is checked, the column is allowed to contain NULL values. If unchecked, the loading operation will fail if any row has no value in that column. This is particularly useful when you want to validate your data logic.
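The non-nullable check behaves roughly like the sketch below (an illustration of the described behavior, not Holistics' implementation):

```python
def validate_not_null(rows, required_columns):
    """Fail the load if any row is missing a value in a column
    that is marked as not nullable."""
    for i, row in enumerate(rows):
        for col in required_columns:
            if row.get(col) is None:
                raise ValueError(
                    f"row {i}: column {col!r} has no value but is not nullable")
    return True

rows = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": None}]
validate_not_null(rows, ["id"])       # passes: every row has an id
# validate_not_null(rows, ["email"])  # would raise ValueError (row 1)
```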
If you want to exclude any columns from being loaded to the Destination, you can remove them here.
Currently, this option is not available for non-SQL Data Sources (Spreadsheet, CSV, MongoDB, etc.).
(The Data Type Mappings tables did not survive extraction here. They list, for each supported source database, the native column types that map to each Generic Data Type: whole-number types such as smallint, int, tinyint, serial; decimal types such as decimal, numeric, float, double precision; date/time types such as datetime, datetime2, smalldatetime, timestamp without timezone; and text types such as varchar, char, text, enum, blob.)
- The suggested data type is based on a sample of your data, so in some cases it could fail if there are unexpected values (for example, in a Google Sheet the first few rows may contain numeric values while a later row mixes in a string).
- If the data type cannot be interpreted, it will be mapped to Text type.