Introducing DuckPlot: the missing link between DuckDB and visualization
We made DuckDB the heart of our charting suite and it turned out great. Now open sourced for all to enjoy.
by Mike Freeman
Motivation
Existing data visualization workflows make you repeat yourself. In one environment you write a SQL query to format your data, and in another you express how you want to display it. Add the headache of translating your SQL results (arrow tables? parquet files?) into a format that your charting library understands, and now you have three problems to solve to make a single chart.
We wanted a charting library that interacted directly with our DuckDB databases, so we built DuckPlot. DuckPlot uses a single configuration to generate both the underlying SQL and chart settings needed to materialize your desired visualization. We’re able to accomplish this by elevating the computational efficiency and schematic awareness of DuckDB directly into the charting library, instead of treating it as an external accessory.
Built on top of Observable Plot and D3, DuckPlot is a flexible, feature rich, and open-source library for generating charts directly from your DuckDB instance.
Digging in
Imagine you have this table of Olympic athletes in a DuckDB instance:
Loading data...
The first question we’ll answer is
Which countries have won the most olympic medals?
DuckPlot leverages it’s connection to DuckDB to generate both the SQL required to answer the question and the Plot configuration needed to visualize the result:
// Create a DuckPlot instance
const duckPlot = new DuckPlot(ddbInstance)
// Plot medals by nationality
duckPlot
.table('athletes') // Use the athletes table
.x('nationality')
.y(['gold', 'silver', 'bronze']) // Stack and sum these!
.mark('barY')
.options({
color: { range: ['gold', 'silver', '#CD7F32'] }
})
With just a few lines of configuration, DuckPlot handles a lot of overhead, including:
- Data transformations that convert your data into a format suitable for plotting
- Efficient aggregation at the database layer
- Margin and label adjustments so your labels just fit
- Interactive legends for quick filtering
DuckPlot writes SQL so you don’t have to
Under the hood, DuckPlot generates and executes SQL queries that perform the
necessary pivots and aggregations to create your charts. Charting libraries like
Observable Plot expect a particular format (e.g., different columns for y
and
color
) so DuckPlot creates them for you.
Aggregation
Let’s start with a simple chart - the total number of gold medals by sex, which we can describe with this configuration:
duckPlot
.table('athletes')
.x('sex') // aggregate to one mark per category
.y('gold')
.mark('barY')
DuckPlot aggregates values at the Database layer based on the provided configuration to generate this chart.
Here’s the SQL that’s generated to create that chart (which you
can see if you call the .queries()
method)
// Create a DuckPlot instance
const duckPlot = new DuckPlot(ddbInstance).table('athletes')
// ... any additional configuration
// Return the SQL query
duckPlot.queries()
Transformations
Let’s look at a more complex example where we want to plot each medal type for each sex. Because there are separate columns for each medal type, we need to transform the data to get it into a format for plotting.
Pivoting multiple columns into a pair of key
-value
columns is a common
operation, so we built DuckPlot to support specifying an
array of columns. DuckPlot automatically pivots arrays of columns to create separate
columns for y
and color
.
duckPlot
.table('athletes')
.x('sex')
.y(['bronze', 'silver', 'gold']) // Stack and sum these!
.mark('barY')
Wow, that’s a lot of SQL just to make a chart! Nice that you didn’t have to write it…
If you just want a description that you can
actually read, call the duckPlot.describe()
method to get a more
human-readable description of what DuckPlot is doing.
Additional aggregation methods
In addition to the implicit summing that’s happening above, you can also
specify other aggregation methods like mean
, min
, max
, etc. Here’s
the average height of athletes by sex:
duckPlot.table('athletes').x('sex').y('gold').mark('barY').config({
aggregate: 'mean' // use mean instead of sum
})
Computing percentages
Given the common use case of displaying percentages, DuckPlot supports computing
percentages at the database level. Rather than having a JavaScript library do these
calculations, we construct a query with partitions
to get the percentages.
For example, here’s the percentage of each medal type won by five different countries:
duckPlot
.query(`SELECT * FROM athletes WHERE nationality in ('USA', 'CHN', 'ESP', 'CAN', 'MEX')`)
.table('athletes')
.x('nationality')
.y(['gold', 'silver', 'bronze'])
.mark('barY')
.config({ percent: true }) // compute percentages
The underlying SQL that powers that chart is:
But wait, I want to write SQL!
That’s great, we love writing SQL too. If you want to do your own data
transformations, you can use the .query()
method to provide a SQL string that
will be run before the automatic aggregations.
// Athletes whose names start with 'O'
duckPlot
.query(
`
SELECT * FROM athletes WHERE LOWER(name) LIKE 'o%';
`
)
.table('athletes')
.x('nationality')
.y(['gold', 'silver', 'bronze'])
.mark('barY', { sort: { x: 'y', limit: 20, reverse: true } })
.options({
color: { range: ['gold', 'silver', '#CD7F32'] }
})
Advanced chart types, out of the box
While DuckPlot largely uses Observable Plot’s API, it also provides additional chart types.
Pie charts
Treemaps
Circle Pack
Grouped bar charts
Faceted charts
Working on the server
DuckPlot is designed to work in both client and server environments. While you
can use DuckDB WASM or a popular wrapper like DuckDB
async, it’s worth noting that the
row order is not consistent across these different database implementations
without an explicit ORDER BY
clause.
For consistency, we now use DuckDB WASM in both the client and server environments in our tests and examples (see details here).
Conclusion
DuckPlot puts DuckDB at the center of the visualization development experience, abstracting away common data transformations while still allowing you to write your own SQL. We’ve found that interacting directly with a DuckDB instance is a faster and clearer way to understand our data.
Want to see what DuckPlot can do in a full application environment? Sign up for free now, and join our Slack community.
FAQ
Does it work without a DuckDB instance?
fetch()
ing a CSV file), you can use the .rawData()
method to pass in your
data directly. DuckPlot won't do any automatic SQL transformations to your raw
data, but you can still use the .query()
method to provide your own SQL transformations.What if I don't want to make a whole website just to make a chart?
npm i
and work in whatever environment is best for you.