Watch Neon Developer Days #3 🚀
Community

Change Data Capture with Neon and Materialize

Replicate Neon data in real time to build better, faster products

Post image

Now that Neon supports logical replication, your head might be filled with questions about how to turn this new feature into interactive data apps, better customer experiences, and other use cases ripe for fresh data. Do you need to rethink your architecture? Does your team need to catch up with all the streaming jargon? In this blog post, we’ll walk through how you can use Materialize to replicate data from Neon in real-time without needing to rethink much.

Why Materialize?

Logical replication allows capturing INSERTs, UPDATEs, and DELETEs as these operations happen in your Neon database, making this data available to an external system for processing — a pattern known as Change Data Capture (CDC). If your use case requires more than simple mirroring, Materialize can help you transform the replicated data into actionable results that are always fresh and consistent.

Materialize shines at operational use cases, where an analytical data warehouse would be too slow or too costly, and a stream processor would introduce too much complexity. It can handle arbitrarily complex transformations on changing data using — you’ve guessed it — just SQL. For CDC from a PostgreSQL database like Neon, it has the added benefit of requiring no additional infrastructure (e.g. Debezium), while guaranteeing transactional consistency.

Enough talk!

Let’s see what it takes to build a CDC pipeline to replicate data from Neon to Materialize in real time. If you want to follow along, sign up for a Materialize free trial!

Get started with Neon + Materialize

Configure logical replication in Neon

1. First, enable logical replication for your Neon project:

  1. Select your project in the Neon Console.
  2. On the Neon Dashboard, select Settings.
  3. Select Beta.
  4. Click Enable to enable logical replication.

🖐️ It’s important to note that enabling logical replication in Neon cannot be reverted, and triggers a restart of all active compute endpoints in your Neon project. Any active connections will be dropped and have to reconnect.

2. Double check that logical replication is enabled by running the following query:

Once logical replication is enabled, the next step is to create a publication with the tables that you want to replicate to Materialize. It’s also recommended to create a dedicated user for replication.

4. Grant the replication user the required permissions on the tables you want to replicate:

For each table that you want to replicate to Materialize, set the replica identity to FULL:

REPLICA IDENTITY FULL ensures that the replication stream includes the previous data of changed rows, in the case of UPDATE and DELETE operations, and enables Materialize to ingest Neon data with minimal in-memory state.

5. Create a publication with the tables you want to replicate:

The mz_demo publication will contain the set of changes generated for the specified table, and will be used to ingest the replication stream into Materialize next!

Create a source in Materialize

Logical replication in Neon follows PostgreSQL’s native replication protocol, so you can jump straight into action using Materialize’s direct PostgreSQL source.

1. Log in to Materialize. You’ll be dropped into the SQL shell, which you can use to run the commands in the next steps.

2. Use the connection details of your Neon endpoint to create a connection:

2. Create a PostgreSQL source using the connection you just created, as well as the publication you created upstream:

Materialize will create a subsource for each table in the publication, which will first get hydrated with a snapshot of the existing data, and then carry on receiving new data as it arrives:

Transform data in Materialize

To get a quick taste of what Materialize can do, let’s keep track of the row count in playing_with_lr, and see how adding and deleting data in Neon immediately affects the results in Materialize.

1. In Materialize, create a view with a count aggregate function:

2. Switch back to Neon, and insert some new data into the playing_with_lr table. Then, query the lr_cnt view in Materialize one more time:

Post image

This isn’t a very complex task, but…it’s still somewhat magic to see the count instantly update, no? Hopefully, your use case will be strictly more complex than this, and Materialize will make you Ooh! and Aah! at how efficiently it can handle CDC pipelines (which can be…surprisingly costly).

What’s next?

If you’re as excited as we are about logical replication support in Neon, we encourage you to learn more about Materialize and how it can help you build better products, faster. Check out the Materialize customer stories to see operational use cases in the wild, and reach out if you’d like to get a hands-on walkthrough with our team.