CSV import via COPY SQL
Caution
For partitioned tables, the best COPY performance can be achieved only on a
machine with a local, physically attached SSD. It is possible to use network
block storage, such as an AWS EBS volume, to perform the operation, with the
following impacts:
- Users need to configure the maximum IOPS and throughput setting values for the volume.
- The required import time is likely to be 5-10x longer.
The COPY SQL command is the preferred way to import large CSV files into partitioned tables. It should be used to migrate data from another database into QuestDB. This guide describes the method of migrating data to QuestDB via CSV files. For the time being this is the only way to migrate data from other databases into QuestDB.
This guide is applicable for QuestDB version 6.5 and higher.
Prepare the import
Preparation is key. Import is a multi-step process that consists of:
- Exporting the existing database as CSV files
- Enabling and configuring the COPY command to be optimal for the system
- Preparing the target schema in QuestDB
Export the existing database
Export data using one CSV file per table. Make sure to export a column that can be used as the timestamp. Data in the CSV file is not expected to be in any particular order. If it is not possible to export a table as one CSV file, export multiple files and concatenate them before importing into QuestDB.
Concatenate multiple CSV files
The way to concatenate files depends on whether the CSV files have headers.
For CSV files without headers, concatenation is straightforward:
- Linux and macOS: `cat *.csv > ../singleFile.csv`
- Windows PowerShell: `type *.csv > ..\singleFile.csv`
Writing the result outside the source directory keeps the wildcard from matching the output file.
For CSV files with headers, concatenation can be tricky. You could manually remove the first line of each file before concatenating, or use a command-line one-liner that concatenates the files and drops the repeated headers. A good alternative is the open source tool csvstack, part of csvkit.
This is how you can concatenate multiple CSV files using csvstack: `csvstack *.csv > ../singleFile.csv`
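If csvkit is not available, the same header handling can be sketched in Python. This is a minimal illustration, not part of QuestDB: `stack_csv` and `stack_csv_files` are hypothetical helper names, and every input file is assumed to share the same header row (a mismatch raises an error instead of silently mixing columns).

```python
import csv


def stack_csv(sources):
    """Yield the header of the first source once, then every data row.

    `sources` is an iterable of line iterables (for example open text files).
    All sources are expected to share the same header; a mismatch raises
    ValueError instead of silently mixing columns.
    """
    header = None
    for src in sources:
        reader = csv.reader(src)
        file_header = next(reader, None)
        if file_header is None:
            continue  # empty source, nothing to copy
        if header is None:
            header = file_header
            yield header
        elif file_header != header:
            raise ValueError(f"header mismatch: {file_header} != {header}")
        yield from reader


def stack_csv_files(input_paths, output_path):
    """Concatenate CSV files on disk into `output_path`, similar to csvstack."""

    def open_all():
        # Open the inputs lazily, so only one input file is open at a time.
        for path in input_paths:
            with open(path, newline="") as f:
                yield f

    with open(output_path, "w", newline="") as out:
        csv.writer(out, lineterminator="\n").writerows(stack_csv(open_all()))
```

For example, `stack_csv_files(["part1.csv", "part2.csv"], "singleFile.csv")` writes one header followed by the rows of both files. Because the rows are re-written by `csv.writer`, field quoting may differ from the input while the values stay the same.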
Things to know about COPY
- COPY is disabled by default as a security precaution; configuration is required.
- COPY is more efficient when the source and target disks are different.
- COPY is parallel when the target table is partitioned.
- COPY is serial when the target table is non-partitioned; out-of-order timestamps will be rejected.
- COPY cannot import data into a non-empty table.
- COPY indexes the CSV file; reading the indexed CSV file benefits hugely from disk IOPS. We recommend using NVMe.
- COPY imports one file at a time; there is no internal queuing system yet.
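Because serial imports reject out-of-order timestamps, it may be worth checking whether a file is ordered before importing it into a non-partitioned table. The Python sketch below assumes a header row and ISO-8601 timestamps (a trailing Z is read as UTC). It uses Python's own parser rather than QuestDB's FORMAT option, and `first_out_of_order` is a hypothetical helper name.

```python
import csv
from datetime import datetime


def parse_iso_utc(value):
    """Parse an ISO-8601 timestamp, treating a trailing 'Z' as UTC."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def first_out_of_order(lines, ts_column, parse=parse_iso_utc):
    """Return the 1-based data row number of the first out-of-order row.

    `lines` is an iterable of CSV text lines starting with a header row and
    `ts_column` is the header name of the timestamp column. Returns None when
    every timestamp is greater than or equal to the one before it.
    """
    previous = None
    for row_number, row in enumerate(csv.DictReader(lines), start=1):
        current = parse(row[ts_column])
        if previous is not None and current < previous:
            return row_number
        previous = current
    return None
```

Pass an open file object (opened with `newline=""`) and the timestamp column name; a result of None means the file is ordered.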
Configure COPY
- Enable COPY and configure the COPY directories to suit your server.
- cairo.sql.copy.root must be set for COPY to work.
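As a sketch, assuming a simple key=value server.conf, the directory settings mentioned in this guide can be checked on the QuestDB host before the first import. The parser below is deliberately simplified: it ignores escapes, line continuations and environment variable overrides such as QDB_CAIRO_SQL_COPY_ROOT. `read_server_conf` and `copy_config_problems` are hypothetical helper names.

```python
import os


def read_server_conf(lines):
    """Parse `key=value` lines; blank lines and '#' comments are skipped.

    Simplified properties parsing: no escapes or line continuations.
    """
    settings = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def copy_config_problems(settings, exists=os.path.isdir):
    """Return human-readable problems with the COPY directory settings."""
    problems = []
    root = settings.get("cairo.sql.copy.root")
    if not root:
        problems.append("cairo.sql.copy.root is not set; COPY is disabled")
    elif not exists(root):
        problems.append(f"cairo.sql.copy.root does not exist: {root}")
    work_root = settings.get("cairo.sql.copy.work.root")
    if work_root and not exists(work_root):
        problems.append(f"cairo.sql.copy.work.root does not exist: {work_root}")
    return problems
```

The `exists` parameter defaults to `os.path.isdir`; it is injectable only so the check can be exercised without real directories.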
Create the target table schema
If you know the target table schema already, you can skip this section.
QuestDB can analyze the input file and "guess" the schema. This logic is activated when the target table does not exist.
To have QuestDB help with determining the file schema, it is best to work with a subset of the CSV. A smaller file allows faster iteration if several attempts are required.
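A Python equivalent of extracting the first lines of a file might look like the following sketch (`head_lines` and `write_sample` are hypothetical helper names). It copies lines verbatim, header included. Like `head`, it is line-based, so a quoted field that contains a newline could be cut in the middle.

```python
from itertools import islice


def head_lines(lines, n=1000):
    """Return the first `n` lines of `lines`, header included (like `head -n`)."""
    return list(islice(lines, n))


def write_sample(source_path, sample_path, n=1000):
    """Copy the first `n` lines of a large CSV file into a small sample file."""
    # newline="" keeps the original line endings untouched on read and write.
    with open(source_path, newline="") as src, open(sample_path, "w", newline="") as dst:
        dst.writelines(head_lines(src, n))
```

Run from the cairo.sql.copy.root directory, `write_sample("weather.csv", "test_file.csv")` produces a 1000-line sample.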
Let's assume we have a large CSV file, weather.csv, in the cairo.sql.copy.root directory.
- Extract the first 1000 lines to test_file.csv (assuming both files are in the cairo.sql.copy.root directory), for example with `head -n 1000 weather.csv > test_file.csv`.
- Use a simple COPY command to import test_file.csv and define the table name: `COPY weather FROM 'test_file.csv' WITH HEADER true;`
Table weather is created, and the command quickly returns the id of the asynchronous import
process running in the background:
| id | 
|---|
| 5179978a6d7a1772 | 
- In the Web Console, right-click the table and select "Copy Schema to Clipboard"; this copies the schema generated by the input file analysis.
- Paste the table schema into the code editor.
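- Identify the correct schema:
5.1. The generated schema may not be completely correct. Using the import id, check the log table and the log file to resolve common errors (see also Track import progress and FAQ), for example with `SELECT * FROM 'sys.text_import_log' WHERE id = '5179978a6d7a1772' ORDER BY ts DESC;`: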
| ts | id | table | file | phase | status | message | rows_handled | rows_imported | errors | 
|---|---|---|---|---|---|---|---|---|---|
| 2022-08-08T16:38:06.262706Z | 5179978a6d7a1772 | weather | test_file.csv | null | finished | null | 999 | 999 | 0 |
| 2022-08-08T16:38:06.226162Z | 5179978a6d7a1772 | weather | test_file.csv | null | started | null | null | null | 0 |
Check rows_handled, rows_imported, and message for any errors and amend
the schema as required.
5.2. Drop the table and re-import test_file.csv using the updated schema.
- Repeat the steps to narrow down to a correct schema. The process may require either truncating the target table with `TRUNCATE TABLE weather;` or dropping it with `DROP TABLE weather;`.
- Clean up: once all the errors are resolved, copy the final schema and drop the small table.
- Make sure the table is correctly partitioned. The final schema in our example should look like this:
- Ready for import: Create an empty table using the final schema. 
Import CSV
Once an empty table with the correct schema has been created in QuestDB, the import can be initiated, for example with `COPY weather FROM 'weather.csv' WITH HEADER true;`.
It quickly returns the id of the asynchronous import process running in the background:
| id | 
|---|
| 55020329020b446a | 
Track import progress
COPY returns an id for querying the log table (sys.text_import_log) to
monitor the progress of an ongoing import, for example with `SELECT * FROM 'sys.text_import_log' WHERE id = '55020329020b446a';`:
| ts | id | table | file | phase | status | message | rows_handled | rows_imported | errors | 
|---|---|---|---|---|---|---|---|---|---|
| 2022-08-03T14:00:40.907224Z | 55020329020b446a | weather | weather.csv | null | started | null | null | null | 0 | 
| 2022-08-03T14:00:40.910709Z | 55020329020b446a | weather | weather.csv | analyze_file_structure | started | null | null | null | 0 | 
| 2022-08-03T14:00:42.370563Z | 55020329020b446a | weather | weather.csv | analyze_file_structure | finished | null | null | null | 0 | 
| 2022-08-03T14:00:42.370793Z | 55020329020b446a | weather | weather.csv | boundary_check | started | null | null | null | 0 | 
Looking at the log from the newest to the oldest might be more convenient: `SELECT * FROM 'sys.text_import_log' WHERE id = '55020329020b446a' ORDER BY ts DESC;`
Once the import ends successfully, the log table should contain a row with a null phase and a finished status:
| ts | id | table | file | phase | status | message | rows_handled | rows_imported | errors | 
|---|---|---|---|---|---|---|---|---|---|
| 2022-08-03T14:10:59.198672Z | 55020329020b446a | weather | weather.csv | null | finished | null | 300000000 | 300000000 | 0 |
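If you poll the log table from a script, the success condition above can be expressed in a few lines. This sketch assumes the rows have already been fetched (for example through a PostgreSQL-wire client) as dicts keyed by the log table's column names, with SQL nulls mapped to None. `import_finished` and `latest_entry` are hypothetical helper names.

```python
def import_finished(rows):
    """True if the log holds the final row of a successful import.

    A successful import ends with a row whose phase is null (None here)
    and whose status is 'finished'.
    """
    return any(r["phase"] is None and r["status"] == "finished" for r in rows)


def latest_entry(rows):
    """Return the newest row by `ts`, or None for an empty log.

    Assumes `ts` values are ISO-8601 strings in one fixed-width format,
    so that string order matches chronological order.
    """
    return max(rows, key=lambda r: r["ts"], default=None)
```

The same check works for serial imports, whose only rows are the null-phase started and finished records.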
Import into non-partitioned tables uses a single-threaded implementation (serial
import) that reports only start and finish records in the log table. Given an
ordered CSV file, weather1mil.csv, the log table shows the following during import:
| ts | id | table | file | phase | status | message | rows_handled | rows_imported | errors | 
|---|---|---|---|---|---|---|---|---|---|
| 2022-08-03T15:00:40.907224Z | 42d31603842f771a | weather | weather1mil.csv | null | started | null | null | null | 0 | 
| 2022-08-03T15:01:20.000709Z | 42d31603842f771a | weather | weather1mil.csv | null | finished | null | 999999 | 999999 | 0 | 
The log table contains only coarse-grained, top-level data. Import phase run
times vary a lot (e.g. partition_import often takes 80% of the whole import
execution time), so the server log is an alternative way to follow the import
in more detail.
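To see which phase dominates, per-phase durations can also be derived from the log table's started and finished rows. A minimal sketch, assuming rows fetched as dicts keyed by column name, with `ts` as an ISO-8601 string ending in Z and SQL nulls as None; `phase_durations` is a hypothetical helper name.

```python
from datetime import datetime


def parse_ts(value):
    """Parse a log timestamp such as '2022-08-03T14:00:40.907224Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def phase_durations(rows):
    """Map each phase to the seconds between its 'started' and 'finished' rows.

    Rows for the import as a whole have phase None and are reported under
    the None key. Phases without a 'finished' row yet are left out.
    """
    started = {}
    durations = {}
    for row in sorted(rows, key=lambda r: parse_ts(r["ts"])):
        ts = parse_ts(row["ts"])
        if row["status"] == "started":
            started[row["phase"]] = ts
        elif row["status"] == "finished" and row["phase"] in started:
            durations[row["phase"]] = (ts - started[row["phase"]]).total_seconds()
    return durations
```

This only sees the phases that reach the log table; the server log remains the place for finer detail.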
If the ON ERROR option is set to ABORT,
import stops on the first error and the error is logged. Otherwise, all errors
are listed in the log.
The reference to an error varies depending on the import phase:
- In the indexing phase, the log references the absolute line number in the input file.
- In the data import phase, the log references the offset relative to the start of the file.
The errored rows can then be extracted for further investigation.
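Both kinds of references can be resolved with a small Python sketch. It assumes that line numbers are 1-based and count the header line, and that the offset is a byte offset pointing at the start of the rejected row; neither assumption is confirmed here, so compare the result with the log message. `lines_at`, `line_at_offset` and `read_row_at_offset` are hypothetical names.

```python
import mmap


def lines_at(lines, line_numbers):
    """Return {line_number: text} for the requested 1-based line numbers.

    Intended for indexing-phase errors, which reference the absolute input
    file line. Assumes the header counts as line 1.
    """
    wanted = set(line_numbers)
    found = {}
    for number, text in enumerate(lines, start=1):
        if number in wanted:
            found[number] = text
            if len(found) == len(wanted):
                break  # stop reading once every requested line was found
    return found


def line_at_offset(buf, offset):
    """Return the row starting at `offset` in `buf` (bytes or an mmap).

    Intended for data-import-phase errors, which reference an offset from
    the start of the file. Assumes a byte offset at the start of a row.
    """
    end = buf.find(b"\n", offset)
    return buf[offset:] if end == -1 else buf[offset:end + 1]


def read_row_at_offset(path, offset):
    """Read one row from a (possibly huge) file without loading it fully."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        return line_at_offset(m, offset)
```

Memory-mapping the file lets the offset lookup work on files far larger than available RAM.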
FAQ
What happens in a database crash or OS reboot?
If a reboot or power loss happens while partitions are being attached, the table might be left with incomplete data. Please truncate the table before re-importing, with `TRUNCATE TABLE tableName;`.
If a reboot or power loss happens before any partitions are attached, the import should not be affected.
I'm getting "COPY is disabled ['cairo.sql.copy.root' is not set?]" error message
Please set the cairo.sql.copy.root setting, restart the instance, and try again.
I'm getting "could not create temporary import work directory [path='somepath', errno=-1]" error message
Please make sure that the cairo.sql.copy.root and cairo.sql.copy.work.root
are valid paths pointing to existing directories.
I'm getting "[2] could not open read-only [file=somepath]" error message
Please check that the import file path is valid and accessible to QuestDB instance users.
If you are running QuestDB using Docker, please check that the directory mounted
for storing source CSV files is the same directory that the cairo.sql.copy.root
property or the QDB_CAIRO_SQL_COPY_ROOT environment variable points to.
For example, if the container is started with QDB_CAIRO_SQL_COPY_ROOT set to /tmp/questdb_wrong
while the source CSV files are mounted into a different directory, running COPY on weather_example.csv
results in the "[2] could not open read-only [file=/tmp/questdb_wrong/weather_example.csv]" error message.
I'm getting "column count mismatch [textColumnCount=4, tableColumnCount=3, table=someTable]" error message
There are more columns in the input file than in the existing target table. Please remove the extra column(s) from the input file or add them to the target table schema.
I'm getting "timestamp column 'ts2' not found in file header" error message
Either the input file is missing a header, or the timestamp column name given in the COPY
command is invalid. Please add a file header or fix the TIMESTAMP option.
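Both errors above can be caught before running COPY by comparing the file header with the target table's columns. A sketch under stated assumptions: the target column names are supplied by hand (e.g. copied from the table schema), and `header_problems` is a hypothetical helper that only flags the two situations described in these FAQ entries.

```python
import csv


def header_problems(lines, table_columns, ts_column=None):
    """List header problems that would make COPY fail, per this FAQ.

    `lines` is an iterable of CSV text lines starting with the header row,
    `table_columns` lists the target table's column names, and `ts_column`
    is the timestamp column name passed to COPY (if any).
    """
    header = next(csv.reader(lines), [])
    problems = []
    if len(header) > len(table_columns):
        problems.append(
            f"column count mismatch: file has {len(header)}, table has {len(table_columns)}"
        )
    if ts_column is not None and ts_column not in header:
        problems.append(f"timestamp column '{ts_column}' not found in file header")
    return problems
```

Only the header line is read, so the check is cheap even for very large files.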
I'm getting "column is not a timestamp [no=0, name='ts']" error message
The timestamp column given by the user, or (if the header is missing) assumed based on the
target table schema, is of a different type.
Please check the timestamp column name in the input file header, or make sure the input file
column order matches that of the target table.
I'm getting "target table must be empty [table=t]" error message
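COPY doesn't yet support importing into a partitioned table with existing data.
Please truncate the table before re-importing, with `TRUNCATE TABLE t;`,
or import into another empty table (for example, a hypothetical t_import) and then use INSERT INTO SELECT,
e.g. `INSERT INTO t SELECT * FROM t_import;`, to copy the data into the original target table.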
I'm getting "io_uring error" error message
It's possible that you've hit an IO_URING-related kernel error.
Please set the cairo.iouring.enabled setting to false, restart the QuestDB instance,
and try again.
I'm getting "name is reserved" error message
The table you're trying to import into is in a bad state (its metadata is incomplete).
Please either drop the table with `DROP TABLE tableName;`
and recreate it, or change the table name in the COPY command.
I'm getting "Unable to process the import request. Another import request may be in progress." error message
Only one import can run at a time.
Either cancel the running import with `COPY 'importId' CANCEL;`, where importId is the id returned by COPY,
or wait until the current import is finished.
Import finished but table is (almost) empty
Please check the latest entries in the log table, e.g. with `SELECT * FROM 'sys.text_import_log' ORDER BY ts DESC LIMIT 10;`.
If the "errors" column is close to the number of records in the input file, then it may mean that:
- The FORMAT option of the COPY command, or the auto-detected format, doesn't match the timestamp column data in the file
- Other column(s) can't be parsed and the ON ERROR SKIP_ROW option was used
- Input file is unordered and target table has designated timestamp but is not partitioned
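To check whether "errors" is close to the number of records, the records in the input file can be counted and compared with the value from the log. In this sketch the 0.9 threshold is an arbitrary assumption, not a QuestDB setting, and blank lines are not counted as records; `count_records` and `mostly_rejected` are hypothetical helper names.

```python
import csv


def count_records(lines, has_header=True):
    """Count non-blank CSV records; quoted newlines are handled by csv.reader."""
    count = sum(1 for row in csv.reader(lines) if row)
    return count - 1 if has_header and count else count


def mostly_rejected(errors, total_records, threshold=0.9):
    """True if the log's errors count is at least `threshold` of all records."""
    return total_records > 0 and errors / total_records >= threshold
```

Counting with `csv.reader` rather than counting newlines keeps multi-line quoted fields from inflating the total.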
If none of the above explains the problem, please check the server log file for messages explaining why rows were rejected. Note that messages from the indexing phase mention the absolute input file line, while messages from the data import phase reference the offset relative to the start of the file.
Import finished but table column names are `f0`, `f1`, ...
The input file lacks a header and the target table does not exist, so the columns received
synthetic names. You can rename them with the ALTER TABLE command, e.g. `ALTER TABLE weather RENAME COLUMN f0 TO ts;`.
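When several columns need new names, the statements can be generated. A sketch, assuming the intended names are listed in file order and column i was named f{i}. `rename_statements` is a hypothetical helper, and it does not quote or escape identifiers, so use it only with trusted names.

```python
def rename_statements(table, new_names):
    """Build ALTER TABLE ... RENAME COLUMN statements for f0, f1, ... columns.

    `new_names` are the intended column names in file order; column i of a
    header-less import is assumed to be named f{i}. Columns whose intended
    name is already f{i} are skipped.
    """
    return [
        f"ALTER TABLE {table} RENAME COLUMN f{i} TO {name};"
        for i, name in enumerate(new_names)
        if name != f"f{i}"
    ]
```

The generated statements can then be run one by one, for example in the Web Console.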