Create my first dataset

The goal of this guide is to explore QuestDB's features and to interact with time series data and assumes you have an instance running. You can find guides to setup QuestDB on the introduction page. In this tutorial, you will learn how to:

  1. Create tables
  2. Populate tables with sample data
  3. Run simple and advanced queries
  4. Delete tables

As an example, we will look at hypothetical temperature readings from a variety of sensors.

info

All commands are run through the Web Console accessible at http://localhost:9000.

You can also run the same SQL via the Postgres endpoint or the REST API.

Creating a table#

The first step is to create tables. One table will contain the metadata of our sensors, and the other will contain the actual readings (payload data) from these sensors.

Let's start by creating the sensors table:

CREATE TABLE sensors (ID LONG, make STRING, city STRING);

For more information about this statement, please refer to the CREATE TABLE reference documentation.

Inserting data#

Let's populate our sensors table with procedurally-generated data:

Insert as SELECT
INSERT INTO sensors
SELECT
x ID, --increasing integer
rnd_str('Eberle', 'Honeywell', 'Omron', 'United Automation', 'RS Pro') make,
rnd_str('New York', 'Miami', 'Boston', 'Chicago', 'San Francisco') city
FROM long_sequence(10000) x
;

For more information about insert statements, refer to the INSERT reference documentation. To learn more about the functions used here, see the random generator and row generator pages.

Our sensors table now contains 10,000 randomly-generated sensor values of different makes and in various cities. Use this command to view the table:

'sensors';

It should look like the table below:

IDmakecity
1HoneywellChicago
2United AutomationMiami
3HoneywellChicago
4OmronMiami
.........

Let's now create some sensor readings. In this case, we will create the table and generate the data at the same time:

Create table as
CREATE TABLE readings
AS(
SELECT
x ID,
timestamp_sequence(to_timestamp('2019-10-17T00:00:00', 'yyyy-MM-ddTHH:mm:ss'), rnd_long(1,10,0) * 100000L) ts,
rnd_double(0)*8 + 15 temp,
rnd_long(0, 10000, 0) sensorId
FROM long_sequence(10000000) x)
TIMESTAMP(ts)
PARTITION BY MONTH;

The query above demonstrates how to use the following features:

  • TIMESTAMP(ts) elects the ts column as a designated timestamp. This enables partitioning tables by time.
  • PARTITION BY MONTH creates a monthly partitioning strategy where the stored data is effectively sharded by month.

The generated data will look like the following:

IDtstempsensorId
12019-10-17T00:00:00.000000Z19.373739119160
22019-10-17T00:00:00.600000Z21.911846179671
32019-10-17T00:00:01.400000Z16.583678348731
42019-10-17T00:00:01.500000Z16.693088153447
52019-10-17T00:00:01.600000Z19.679915697985
............

Running queries#

Let's select all records from the readings table (note that SELECT * FROM is optional in QuestDB):

readings;

Let's also select the count of records from readings:

SELECT count() FROM readings;
count
10,000,000

and the average reading:

SELECT avg(temp) FROM readings;
average
18.999217780895

We can now use the sensors table alongside the readings table to get more interesting results using a JOIN:

SELECT *
FROM readings
JOIN(
SELECT ID sensId, make, city
FROM sensors)
ON readings.sensorId = sensId;

The results should look like the table below:

IDtstempsensorIdsensIdmakecity
12019-10-17T00:00:00.000000Z16.47220046098232113211OmronNew York
22019-10-17T00:00:00.100000Z16.59843203359923192319HoneywellSan Francisco
32019-10-17T00:00:00.100000Z20.29368174700987238723HoneywellNew York
42019-10-17T00:00:00.100000Z20.939263119843885885RS ProSan Francisco
52019-10-17T00:00:00.200000Z19.33666005902932003200HoneywellSan Francisco
62019-10-17T00:00:01.100000Z20.94664357695440534053HoneywellMiami
Aggregation keyed by city
SELECT city, max(temp)
FROM readings
JOIN(
SELECT ID sensId, city
FROM sensors) a
ON readings.sensorId = a.sensId;

The results should look like the table below:

citymax
New York22.999998786398
San Francisco22.999998138348
Miami22.99999994818
Chicago22.999991705861
Boston22.999999233377
Aggregation by hourly time buckets
SELECT ts, city, make, avg(temp)
FROM readings timestamp(ts)
JOIN
(SELECT ID sensId, city, make
FROM sensors
WHERE city='Miami' AND make='Omron') a
ON readings.sensorId = a.sensId
WHERE ts IN '2019-10-21;1d' -- this is an interval between 21-10 and 1 day later

The results should look like the table below:

tscitymakeaverage
2019-10-21T00:00:44.600000ZMiamiOmron20.004285872098
2019-10-21T00:00:52.400000ZMiamiOmron16.68436714013
2019-10-21T00:01:05.400000ZMiamiOmron15.243684089291
2019-10-21T00:01:06.100000ZMiamiOmron17.193984104315
2019-10-21T00:01:07.100000ZMiamiOmron20.778686822666
............

For more information about these statements, please refer to the SELECT and JOIN pages.

Deleting tables#

We can now clean up the demo data by using DROP TABLE SQL. Be careful using this statement as QuestDB cannot recover data that is deleted in this way:

DROP TABLE readings;
DROP TABLE sensors;