Martin Kleppmann (ept)
University of Cambridge, Cambridge, UK
http://martin.kleppmann.com
Distributed systems researcher at University of Cambridge; author of Designing Data-Intensive Applications; formerly Rapportive/LinkedIn

automerge/automerge 9194

A JSON-like data structure (a CRDT) that can be modified concurrently by different users, and merged again automatically.

ept/ddia-references 2203

Literature references for “Designing Data-Intensive Applications”

automerge/mpl 237

a p2p document synchronization system for automerge

ept/avrodoc 131

Documentation tool for Avro schemas

ept/crdt-website 111

Source of the crdt.tech website

ept/dist-sys 29

Distributed systems lecture notes

ept/cap-critique 16

Source of paper “A critique of the CAP theorem”

ept/blog 9

Source of my personal blog, using Markdown, Jekyll and Heroku

ept/bespin-on-rails 6

Bespin on Rails is a simple Ruby on Rails plugin that allows you to embed the Mozilla Bespin code editor component in your Rails views using simple helper tags.

ept/compsci 6

Computer Science exercises and teaching materials

started ept/ddia-references

started time in 10 hours

started artsy/cohesion

started time in 13 hours

started ept/ddia-references

started time in 13 hours

fork alloy/eigen

The Art World in Your Pocket or Your Trendy Tech Company's Tote, Artsy's iOS app.

fork in 15 hours

pull request comment automerge/automerge

Allow hooking into patch application

Thanks @ept that is very useful for my use case. It will simplify the code receiving remote changes and applying them both to the Automerge document and the user facing representation of the document.

ept

comment created time in 18 hours

pull request comment automerge/automerge

add Automerge.getClock and Automerge.getMissingChanges

@HerbCaudill By the way, I just realised that Cevitxe isn't listed as one of the data sync layers in the README. I've put a brief one-sentence description in here: 6679330 — does this look ok? Feel free to change it to be better.

I'd like to tweak the wording a bit - why don't I just make a separate PR, since it's not really related to this one.

josharian

comment created time in 19 hours

started ept/hermitage

started time in 20 hours

Pull request review comment automerge/automerge

Allow hooking into patch application

 function getAllChanges(doc) {
   return getChanges(init(), doc)
 }
-function applyChanges(doc, changes) {
+function applyChanges(doc, changes, options = {}) {

I agree that readability would improve on the call side. But I think that's caused by missing labeled arguments in JavaScript which we should not try to fix.

When just reading function applyChanges(doc, changes, options = {}), I would expect to pass a document, some changes, and a map containing only key-value pairs that help configure the change method. I would not expect to pass references into options which provide me with the resulting patches.

On the other hand, when reading function applyChanges(doc, changes, patchCallback), it is clear to me that I can pass a document, changes, and a callback which provides some kind of patches. (It is also future-proof, since we can just add another parameter.)

This is not really a strong opinion, so I am happy to go with options map as well :)

ept

comment created time in 20 hours
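
To make the trade-off in the review comment above concrete, here is a minimal sketch comparing the two call styles. The functions `applyAll`, `applyChangesWithOptions` and `applyChangesWithCallback` are hypothetical stand-ins written for this illustration, not Automerge functions, and the "patch" is a simplified object rather than Automerge's real patch format.

```javascript
// Hypothetical stand-ins for illustration only; not the Automerge API.
// A "document" is a plain object and a "change" is a { key, value } pair.
function applyAll(doc, changes) {
  const newDoc = Object.assign({}, doc)
  const patch = { diffs: [] }
  for (const { key, value } of changes) {
    newDoc[key] = value
    patch.diffs.push({ action: 'set', key, value })
  }
  return [newDoc, patch]
}

// Style 1: options map, as in the diff under review.
function applyChangesWithOptions(doc, changes, options = {}) {
  const [newDoc, patch] = applyAll(doc, changes)
  if (options.patchCallback) {
    options.patchCallback(patch)
  }
  return newDoc
}

// Style 2: positional callback, as the reviewer suggests.
function applyChangesWithCallback(doc, changes, patchCallback) {
  const [newDoc, patch] = applyAll(doc, changes)
  if (patchCallback) {
    patchCallback(patch)
  }
  return newDoc
}

const changes = [{ key: 'foo', value: 42 }]
const seen = []
const docA = applyChangesWithOptions({}, changes, { patchCallback: p => seen.push(p) })
const docB = applyChangesWithCallback({}, changes, p => seen.push(p))
```

Both variants produce the same document and invoke the callback once per call. The difference is only at the call site: the options map names the argument, while the positional form is shorter but relies on argument order.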

pull request comment automerge/automerge

Allow hooking into patch application

On second thought I do see that this is a little spooky since you're getting a new Automerge document every time.

const state = Automerge.init({}, { patchCallback }) 
const state1 = Automerge.change(state, s => s.foo = 42)
const state2 = Automerge.change(state, s => s.boo = 'pizza') // maybe it is weird for patchCallback to fire here
ept

comment created time in 21 hours

pull request comment automerge/automerge

Allow hooking into patch application

Overall this seems like a step in the right direction. It always seemed odd to me that you could watch for changes on a DocSet but not on an individual document.

An alternative API would be to pass in a callback function once when initialising the document, and have it called automatically on all subsequent updates. But I thought that had a bit of a "spooky action at a distance" feel, while the callback on every change/applyChanges call was more explicit. Though it could also be annoying for app developers to have to add the option to every single Automerge.change call, which might be scattered around an application. Any thoughts on this?

I'd imagine most developers would find the idea of a change listener to be totally natural, and not spooky at all.

Is there a reason why we can't have both?

const patchCallback = patch => alert(patch)
const changeFn = doc => { doc.foo = 42 }

// this works
const state = Automerge.init({}, { patchCallback }) // patchCallback will fire on every change
const newState = Automerge.change(state, changeFn)

// this works too
const state = Automerge.init({})
const newState = Automerge.change(state, changeFn, { patchCallback }) // patchCallback only fires on this change
ept

comment created time in 21 hours
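
The "have both" idea from the comment above can be tried out without Automerge. What follows is a toy sketch under stated assumptions: `init`, `change` and the `OPTIONS` symbol are hypothetical stand-ins that mimic immutable documents, the "patch" is a simplified list of changed keys, and letting a per-call callback take precedence over the init-time one is a design choice made here, not something the thread settles.

```javascript
// Toy immutable "document" for illustration only; not the Automerge API.
// Each document remembers the options it was initialised with.
const OPTIONS = Symbol('options')

function init(initialState = {}, options = {}) {
  const doc = Object.assign({}, initialState)
  Object.defineProperty(doc, OPTIONS, { value: options }) // non-enumerable
  return Object.freeze(doc)
}

function change(doc, changeFn, options = {}) {
  const draft = Object.assign({}, doc) // copies data keys, not OPTIONS
  changeFn(draft)
  // Simplified "patch": the keys whose values differ from the old document.
  const patch = Object.keys(draft)
    .filter(key => draft[key] !== doc[key])
    .map(key => ({ key, value: draft[key] }))
  // Assumption: a per-call callback takes precedence over the init-time one.
  const callback = options.patchCallback || doc[OPTIONS].patchCallback
  if (callback) {
    callback(patch)
  }
  return init(draft, doc[OPTIONS])
}

const initLevel = []
const perCall = []
const changeFn = doc => { doc.foo = 42 }

// A callback registered once at init time fires on every change...
const state = init({}, { patchCallback: patch => initLevel.push(patch) })
const state1 = change(state, changeFn)
// ...including a second change that branches off the same starting state.
const state2 = change(state, doc => { doc.boo = 'pizza' })

// A callback passed to a single call fires only for that call.
const plain = init({})
const plain1 = change(plain, changeFn, { patchCallback: patch => perCall.push(patch) })
const plain2 = change(plain1, doc => { doc.bar = 1 })
```

In this sketch the init-time callback fires for both `state1` and `state2`, even though they are divergent branches of `state`. That is the behaviour the earlier comment describes as a little spooky.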

Pull request review comment automerge/automerge

Allow hooking into patch application

 function getAllChanges(doc) {
   return getChanges(init(), doc)
 }
-function applyChanges(doc, changes) {
+function applyChanges(doc, changes, options = {}) {

I prefer the interface @ept suggests here - the options map makes it easier to change the API further in the future. It also makes the code calling this function more readable.

ept

comment created time in 21 hours

issue comment automerge/automerge

Python Implementation of the Automerge Server?

Is it completely automerge agnostic? I would expect the interest in a topic to reside in the content of the automerge CRDT messages?

No, the "topic" in this sense generally represents one Automerge document or a set of Automerge documents (e.g. a DocSet, or what Hypermerge and Cevitxe both refer to as a repository).

echarles

comment created time in 21 hours

pull request comment automerge/automerge

add Automerge.getClock and Automerge.getMissingChanges

Thanks for doing clean-up. :)

josharian

comment created time in a day

started ept/ddia-references

started time in a day

Pull request review comment automerge/automerge

Allow hooking into patch application

 function getAllChanges(doc) {
   return getChanges(init(), doc)
 }
-function applyChanges(doc, changes) {
+function applyChanges(doc, changes, options = {}) {

Why is an options map needed? Could we just pass the callback directly?

ept

comment created time in 2 days

Pull request review comment automerge/automerge

Allow hooking into patch application

 function docFromChanges(options, changes) {
   const doc = init(options)
   const [state, _] = Backend.applyChanges(Backend.init(), changes)
   const patch = Backend.getPatch(state)
+  if (options && options.patchCallback) options.patchCallback(Object.assign({}, patch))

Same as above

ept

comment created time in 2 days

Pull request review comment automerge/automerge

Allow hooking into patch application

 function getAllChanges(doc) {
   return getChanges(init(), doc)
 }
-function applyChanges(doc, changes) {
+function applyChanges(doc, changes, options = {}) {
   const oldState = Frontend.getBackendState(doc)
   const [newState, patch] = Backend.applyChanges(oldState, changes)
+  if (options.patchCallback) options.patchCallback(Object.assign({}, patch))

I would prefer a separate line + indentation, which improves readability, safety and debuggability.

ept

comment created time in 2 days

Pull request review comment automerge/automerge

Allow hooking into patch application

 function makeChange(doc, requestType, context, options) {
   if (doc[OPTIONS].backend) {
     const [backendState, patch] = doc[OPTIONS].backend.applyLocalChange(state.backendState, request)
+    if (options && options.patchCallback) options.patchCallback(patch)

I would prefer a separate line + indentation, which improves readability, safety and debuggability.

ept

comment created time in 2 days

started petyosi/react-virtuoso

started time in 2 days

issue comment automerge/automerge

Python Implementation of the Automerge Server?

Relaying

Agree. We have that notion in place in our experiments https://github.com/jupyterlab/rtc/tree/main/packages/relay

All it needs to do is establish that Alice and Bob are interested in the same topic,

Is it completely automerge agnostic? I would expect the interest in a topic to reside in the content of the automerge CRDT messages?

Availability

I also agree with your explanations. On top of availability, a server also makes it possible to ensure persistence (in the Jupyter case, persistence of the notebooks).

Authentication

+1

echarles

comment created time in 2 days

pull request comment automerge/automerge

add Automerge.getClock and Automerge.getMissingChanges

Beat me to it! 😆 I've had this TODO in my code for over a year:

https://github.com/DevResults/cevitxe/blob/master/packages/cevitxe/src/clocks.ts#L28-L41

josharian

comment created time in 2 days

started ept/ddia-references

started time in 2 days

started ept/ddia-references

started time in 2 days

fork pchalcol/crdt-website

Source of the crdt.tech website

http://crdt.tech/

fork in 3 days

issue comment automerge/automerge

Python Implementation of the Automerge Server?

@echarles I think it's helpful to break down the services that a server traditionally provides and rethink how you might meet those needs in a peer-to-peer context.

  • Relaying: One big advantage of a server is simply that it has a stable public IP address, so you never have any trouble making a connection with it from anywhere on the internet. It's theoretically possible for Alice's laptop to communicate directly over the internet with Bob's phone, but with the standards we have today (WebRTC etc.) it's really hard to pull off reliably in practice. It's a lot easier for Alice and Bob both to connect if there's a known, stable, public endpoint that they can each connect to, which can then act as an intermediary. As @ept's example above shows, this kind of "server" can be exceedingly simple - it's not really a server, and it needn't know anything about Automerge. All it needs to do is establish that Alice and Bob are interested in the same topic, and then pipe their sockets together and let them chat away. In Cevitxe we call this a "signal server" - perhaps "relay server" would be more descriptive.

  • Availability: Another advantage of a server is that it's always turned on and always online. Alice's laptop and Bob's phone may very well be switched off or offline at any given time, and if they can only synchronize when they're both online, it can be hard to work asynchronously. Rather than create a whole new server codebase to address this need, I like the idea of creating an always-on version of your client that you can deploy somewhere, and that can reliably persist its state using a traditional database or something. I haven't actually done this - it's on the roadmap for Cevitxe. I'd be curious to know if anyone else has actually done this.

  • Authentication: The relay server approach described above can provide a very basic sort of security, in the sense that it will only connect two peers if they both request the same topic (a.k.a. channel or document key). This can be good enough for many purposes, especially if that key is long and randomly generated. But it doesn't give the kind of fine-grained permissions control that a lot of applications require, and there's not an obvious remedy if it's compromised. One solution would be to add basic OAuth authentication to the signal server; I sketched out a possible design for that here. I eventually decided against that approach in favor of a completely decentralized solution. I've been fleshing this out for the past few months in a separate project called taco-js.

echarles

comment created time in 4 days
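
The relaying idea in the comment above (match peers by topic, then pipe their messages through without looking at them) can be sketched in memory. This is an illustration under stated assumptions, not Cevitxe's signal server: the `Relay` class, the `makePeer` helper and the topic strings are hypothetical, and real sockets (for example WebSockets) are left out so the example stays self-contained.

```javascript
// In-memory sketch of a relay ("signal") server; all names are hypothetical.
// A peer is any object with a `receive(message)` method.
class Relay {
  constructor() {
    this.waiting = new Map() // topic -> link of the peer waiting for a partner
  }

  // Register interest in a topic. Returns a `send` function that forwards
  // messages to the partner once another peer has joined the same topic.
  join(topic, peer) {
    const link = { peer, partner: null }
    const other = this.waiting.get(topic)
    if (other) {
      // Two peers asked for the same topic: pipe them together.
      this.waiting.delete(topic)
      link.partner = other
      other.partner = link
    } else {
      this.waiting.set(topic, link)
    }
    return message => {
      if (!link.partner) return false // nobody to relay to yet
      link.partner.peer.receive(message) // forwarded as-is, never inspected
      return true
    }
  }
}

const makePeer = () => {
  const inbox = []
  return { inbox, receive: message => inbox.push(message) }
}

const relay = new Relay()
const alice = makePeer()
const bob = makePeer()
const carol = makePeer()

const aliceSend = relay.join('doc-key-123', alice)
const sentBeforeBob = aliceSend('hello?') // Bob has not joined yet
const bobSend = relay.join('doc-key-123', bob)
const carolSend = relay.join('other-doc', carol)

aliceSend('change from Alice')
bobSend('change from Bob')
```

The relay only compares topic strings and passes messages along untouched, which is why it needs no knowledge of Automerge. Carol, who joined a different topic, receives nothing. This sketch pairs exactly two peers per topic and drops messages sent before a partner arrives; a real deployment would need to handle more peers, disconnects, and buffering.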

issue closed automerge/automerge

Looking for a minimal implementation of Automerge/CRDTs

Hello!

Sorry for the essay - TL;DR: Do you know of any small CRDTs like the Micromerge you wrote?

Thanks for all your work on Automerge. I've been researching CRDTs as a way of building git-like local-first software and feel like I've come to the right place. It's been very interesting to learn from watching conference talks (including yours!) and reading papers that are referenced in CRDT codebases.

Along with the local-first mindset, I also like to build apps/libraries that are small and more easily understandable, hackable, and interoperable. There's some overlap with those who care about bundle size too; devs who choose lighter library options like Preact/NanoID/Svelte/Sinuous may be doing so not only because it helps push against web bloat, but also because there's a better chance they (and future devs to come) will be able to wrap their heads around the codebase.

I've been trying to find a small basic CRDT implementation but haven't had much luck. I know CRDTs have a certain level of essential complexity (especially when focused on efficient text editing), but I know many projects would be happy to have basic list syncing (e.g. todo lists, tictactoe, or pixelpusher-like apps that don't focus on text). Similarly, I've been surprised to not yet stumble on something that vibes well with vanilla JS either.

Projects like Automerge, Y.js, and Logux all have a lot of features. That's great for devs who want them, but adds a lot of complexity to apps that are trying to be small. For instance, if I have a 3kb (min+gzip) app that is an orderable todo list with websockets, then Automerge is ~20x larger than the entire app...

I know this isn't true for a lot of React+Redux apps that are already very large and buying into concepts like immutable data. I think it's fair to assume developers might choose those technologies; even Logux is built around the assumption that you'll use Redux to implement undo/redo.

Still, I think having small CRDTs will help a lot with their adoption! If I can tell other devs "Paste these 300 lines of code and you'll have basic collaborative syncing" that's huge. Especially if there's a chance they can understand and hack on it. I have actually found this! In an article, https://josephg.com/blog/crdts-are-the-future/

Martin managed to make a tiny, slow implementation of automerge in only about 100 lines of code.

Leading me to your Micromerge CRDT https://github.com/automerge/automerge/blob/performance/test/fuzz_test.js. It looks great. I'm going to move forward with this and try writing some code to help in the manual construction of changes; https://github.com/heyheyhello/etch/blob/work/packages/shared/crdt.ts

I'm writing to you to ask if you've considered more formally publishing something like Micromerge, since there are developers (and article writers) who are looking for examples of small CRDTs. If not, are there small CRDTs you have in mind that you could link to which are more suitable for micro-frameworks?

Thanks!

closed time in 4 days

heyheyhello

issue comment automerge/automerge

Looking for a minimal implementation of Automerge/CRDTs

Update: Found this gem https://github.com/jlongster/crdt-example-app ✨ from James Long, which implements a todo list CRDT in 600 lines using the HLC and merkle tree methods. There are also annotated notes (!) to explain how it all works. https://github.com/clintharris/crdt-example-app_annotated/blob/master/NOTES.md I'll close this, thanks all!

heyheyhello

comment created time in 4 days

issue comment automerge/automerge

Looking for a minimal implementation of Automerge/CRDTs

Wonderful. Thx all and sorry for hijacking this PR.

heyheyhello

comment created time in 4 days

issue comment automerge/automerge

Looking for a minimal implementation of Automerge/CRDTs

@heyheyhello @echarles That was kind of a throwaway proof of concept - not totally sure what state it's in.

If you're looking for a basic working example, the todo demo in Cevitxe is a better place to start: clone http://github.com/DevResults/cevitxe , then run yarn dev:todo:start.

heyheyhello

comment created time in 4 days