Introducing DuckPlot: the missing link between DuckDB and visualization

Motivation

Existing data visualization workflows make you repeat yourself. In one environment you write a SQL query to format your data, and in another you express how you want to display it. Add the headache of translating your SQL results (arrow tables? parquet files?) into a format that your charting library understands, and now you have three problems to solve to make a single chart.

We wanted a charting library that interacted directly with our DuckDB databases, so we built DuckPlot. DuckPlot uses a single configuration to generate both the underlying SQL and chart settings needed to materialize your desired visualization. We’re able to accomplish this by elevating the computational efficiency and schematic awareness of DuckDB directly into the charting library, instead of treating it as an external accessory.

DuckPlot uses your chart configuration to infer what SQL needs to be written, so you can focus on what you want to see rather than how to get it.

Built on top of Observable Plot and D3, DuckPlot is a flexible, feature rich, and open-source library for generating charts directly from your DuckDB instance.

Digging in

Imagine you have this table of Olympic athletes in a DuckDB instance:

Loading data...

The first question we’ll answer is

Which countries have won the most olympic medals?

DuckPlot leverages it’s connection to DuckDB to generate both the SQL required to answer the question and the Plot configuration needed to visualize the result:

// Create a DuckPlot instance
const duckPlot = new DuckPlot(ddbInstance)

// Plot medals by nationality
duckPlot
  .table('athletes') // Use the athletes table
  .x('nationality')
  .y(['gold', 'silver', 'bronze']) // Stack and sum these!
  .mark('barY')
  .options({
    color: { range: ['gold', 'silver', '#CD7F32'] }
  })

💡Hint: click on the legend to toggle series visiblity.

With just a few lines of configuration, DuckPlot handles a lot of overhead, including:

Data transformations that convert your data into a format suitable for plotting
Efficient aggregation at the database layer
Margin and label adjustments so your labels just fit
Interactive legends for quick filtering

DuckPlot writes SQL so you don’t have to

Under the hood, DuckPlot generates and executes SQL queries that perform the necessary pivots and aggregations to create your charts. Charting libraries like Observable Plot expect a particular format (e.g., different columns for y and color) so DuckPlot creates them for you.

Aggregation

Let’s start with a simple chart - the total number of gold medals by sex, which we can describe with this configuration:

duckPlot
  .table('athletes')
  .x('sex') // aggregate to one mark per category
  .y('gold')
  .mark('barY')

DuckPlot aggregates values at the Database layer based on the provided configuration to generate this chart.

Here’s the SQL that’s generated to create that chart (which you can see if you call the .queries() method)

// Create a DuckPlot instance
const duckPlot = new DuckPlot(ddbInstance).table('athletes')
// ... any additional configuration

// Return the SQL query
duckPlot.queries()

Transformations

Let’s look at a more complex example where we want to plot each medal type for each sex. Because there are separate columns for each medal type, we need to transform the data to get it into a format for plotting.

Pivoting multiple columns into a pair of key-value columns is a common operation, so we built DuckPlot to support specifying an array of columns. DuckPlot automatically pivots arrays of columns to create separate columns for y and color.

duckPlot
  .table('athletes')
  .x('sex')
  .y(['bronze', 'silver', 'gold']) // Stack and sum these!
  .mark('barY')

Wow, that’s a lot of SQL just to make a chart! Nice that you didn’t have to write it…

If you just want a description that you can actually read, call the duckPlot.describe() method to get a more human-readable description of what DuckPlot is doing.

Additional aggregation methods

In addition to the implicit summing that’s happening above, you can also specify other aggregation methods like mean, min, max, etc. Here’s the average height of athletes by sex:

duckPlot.table('athletes').x('sex').y('gold').mark('barY').config({
  aggregate: 'mean' // use mean instead of sum
})

Computing percentages

Given the common use case of displaying percentages, DuckPlot supports computing percentages at the database level. Rather than having a JavaScript library do these calculations, we construct a query with partitions to get the percentages.

For example, here’s the percentage of each medal type won by five different countries:

duckPlot
  .query(`SELECT * FROM athletes WHERE nationality in ('USA', 'CHN', 'ESP', 'CAN', 'MEX')`)
  .table('athletes')
  .x('nationality')
  .y(['gold', 'silver', 'bronze'])
  .mark('barY')
  .config({ percent: true }) // compute percentages

The underlying SQL that powers that chart is:

But wait, I want to write SQL!

That’s great, we love writing SQL too. If you want to do your own data transformations, you can use the .query() method to provide a SQL string that will be run before the automatic aggregations.

// Athletes whose names start with 'O'
duckPlot
  .query(
    `
  SELECT * FROM athletes WHERE LOWER(name) LIKE 'o%';
  `
  )
  .table('athletes')
  .x('nationality')
  .y(['gold', 'silver', 'bronze'])
  .mark('barY', { sort: { x: 'y', limit: 20, reverse: true } })
  .options({
    color: { range: ['gold', 'silver', '#CD7F32'] }
  })

Advanced chart types, out of the box

While DuckPlot largely uses Observable Plot’s API, it also provides additional chart types.

Pie charts

Treemaps

Circle Pack

Grouped bar charts

Faceted charts

Working on the server

DuckPlot is designed to work in both client and server environments. While you can use DuckDB WASM or a popular wrapper like DuckDB async, it’s worth noting that the row order is not consistent across these different database implementations without an explicit ORDER BY clause.

For consistency, we now use DuckDB WASM in both the client and server environments in our tests and examples (see details here).

Conclusion

DuckPlot puts DuckDB at the center of the visualization development experience, abstracting away common data transformations while still allowing you to write your own SQL. We’ve found that interacting directly with a DuckDB instance is a faster and clearer way to understand our data.

Want to see what DuckPlot can do in a full application environment? Sign up for free now, and join our Slack community.

FAQ

Does it work without a DuckDB instance?

Sure thing! If you just have an array of objects (e.g., if you're just fetch()ing a CSV file), you can use the .rawData() method to pass in your data directly. DuckPlot won't do any automatic SQL transformations to your raw data, but you can still use the .query() method to provide your own SQL transformations.

What if I don't want to make a whole website just to make a chart?

No worries, DuckPlot is supported in both client and server environments! Just npm i and work in whatever environment is best for you.

I found a bug - will you fix it?

We love bug reports - please file an issue, and we'll do our best to prioritize it.