Design and implementation of DuckDB internals

mrtimo · 2026-04-14T04:11:12 1776139872

If you are a data scientist or do anything with data, DuckDB is like a Swiss Army knife. There are so many great ways it can help your workflow. The original video from CMU in 2020 [1] is a classic. Minutes 3-8 present a good argument for adding DuckDB to your data cleaning/processing workflow.

And if you want to add a semantic layer on top of your data, Malloy [2] is my favorite so far (it has DuckDB built in).

[1]: https://www.youtube.com/watch?v=PFUZlNQIndo
[2]: https://docs.malloydata.dev/documentation/

anitil · 2026-04-15T02:52:53 1776221573

Thank you for recommending that video! I've already adopted DuckDB for my ad-hoc analytics work, but I didn't know the background.

owlstuffing · 2026-04-14T05:33:06 1776144786

Analytics with type-safe raw SQL (including DuckDB's awesome extensions) is pure gold:

https://github.com/manifold-systems/manifold/blob/master/doc...

password4321 · 2026-04-14T11:31:23 1776166283

Over the years I've seen anecdotes here on HN that DuckDB crashes often for several people. Is this still an issue for anyone?

wenc · 2026-04-14T16:32:57 1776184377

I use DuckDB daily.

In short: it doesn't crash often at all.

What you may be remembering are reports of exceptional cases where it didn't handle out-of-memory errors well. I was one of the people affected. I was running complex analytic queries on 400 GB of Parquet files with only 128 GB of memory. It used jemalloc, which didn't degrade gracefully. They have since fixed a lot of the OOM issues, so it's more robust now. I haven't had a crash in a long time.

On normal-sized datasets it never crashes.

xtracto · 2026-04-14T14:07:43 1776175663

We use it heavily at my workplace. It doesn't crash at all if you use it for OLAP workloads. But if you use it incorrectly, it will crash.

It's pretty solid.

jazzpush2 · 2026-04-14T17:47:06 1776188826

Never seen this and have several products that use it...

mpweiher · 2026-04-14T05:45:01 1776145501

The actual slides are linked from the intro text:

https://github.com/DBatUTuebingen/DiDi

chkrishnatej · 2026-04-14T16:03:18 1776182598

That was a good start for understanding DuckDB internals!!!

fg137 · 2026-04-14T03:38:59 1776137939

Unfortunately it does not seem that there are lecture videos.

viccis · 2026-04-14T01:31:13 1776130273

Am I missing something or is the content empty?

esafak · 2026-04-14T01:33:14 1776130394

https://github.com/DBatUTuebingen/DiDi

viccis · 2026-04-14T01:35:01 1776130501

Thank you, I didn't realize all of the course counted as "slides and auxiliary material" haha

edit: Really great stuff in here. Every day at work I think about how much I love DuckDB

esafak · 2026-04-14T14:03:46 1776175426

What do you use it for? What's the best part for you?

viccis · 2026-04-17T19:28:00 1776454080

Sorry I missed this reply.

I'm currently using it to provide a SQL interface for data analytics for customers whose data isn't large enough to justify a big platform like Snowflake or Databricks. Its syntax is close enough to the other two that, especially since the logic is already normalized through abstracted query definitions, it's a drop-in replacement.

Given that it's so lightweight, I can use it to run searches in an AWS Lambda function, which is crazy useful.

goerch · 2026-04-14T16:34:10 1776184450

Computing PageRank on par with NetworkX: https://github.com/idesis-gmbh/WikiExperiments

Educational local DW from GitHub Archive events: https://github.com/idesis-gmbh/GitHubExperiments

It is quite fast for OLAP applications, and it works on low-cost hardware.

buryat · 2026-04-14T04:45:09 1776141909

Thank you! I learned why DuckDB is named this way.