Hacker News new | comments | show | ask | jobs | submitlogin
OT and CRDT trade-offs for Real-Time collaboration (www.tiny.cloud)
178 points by andfrob 5 months ago | hide | past | web | 59 comments | favorite

I’ve been doing a pretty deep dive on CRDTs and OTs to get the right user experience for our collaborative website builder[1]. I’m still not done with the implementation and subsequent testing (i.e., splitting text nodes, as mentioned in the post). But the core algorithm behind Automerge[2], formalized in the OpSets paper[3] is extremely promising. A good example of its power is the ability to do an atomic tree “move”, whilst preserving strong eventual consistency.

I'm so glad to see that the HN community cares about this problem space. Maybe I should finally write that blog post on CRDTs and how to implement them yourself following the OpSets spec.

[1] https://www.makeswift.com

[2] https://www.github.com/automerge/automerge

[3] https://arxiv.org/abs/1805.04263

I think that blog post would be a great idea, would love to read it

Really? I’ll get something out this week. My Twitter handle is in my profile. I’ll probably tweet about it when I write it.

Wanderlog (https://wanderlog.com) is a Google Docs for planning travel, and naturally, we had to figure this out early in the process.

If you're on a Node.js/React stack, we highly recommend using the combination of OT-JSON0 [1] and ShareDB [2], two excellent libraries. OT-JSON0 lets you perform operational transforms on any JSON-serializable structure pretty intuitively, and ShareDB handles synchronizing it between multiple clients and resolving conflicts. Building your own is going to be a pain, so unless your core business is a text editor like TinyMCE, it's probably best to stand on the shoulders of giants.

Another approach that we didn't end up using is Figma's: especially if you're working on anything other than text editing, ask if you actually want to use operational transform. Something simpler might be good enough -- in Figma's case, they use "last write wins" because they were able to break the document into small enough, atomic elements. [3]

[1] https://www.npmjs.com/package/ot-json0

[2] https://www.npmjs.com/package/sharedb

[3] https://www.figma.com/blog/how-figmas-multiplayer-technology...

We’re actually using ShareDB at Makeswift right now and for us it has become a horrible burden. There are very subtle bugs on their data persistence layer with concurrency and the API is a bit of a mess to work with in my opinion. I’m glad to see other people are having success with it, though.

I've been given advice that ShareDB is "pretty antiquated and not well supported". I'm told it is not difficult to rewrite in modern NodeJS, replacing things like callbacks and event emitters with async/await produces a much simpler and more reliable server.

Ah that's unfortunate; we did find it a bit weird that you have to think of all changes as `ops`, but otherwise, I'm curious as to what you've had issues with.

We also have a custom back-end that's MySQL that may have changed how it work a bit

I came to the same "LWW on fragments" solution for peraspera.io. In a lot of cases, all OT gets you is being able to watch someone else's cursor move around. The new wore off on that a long time ago.

Could you elaborate a bit on that?

I'll cover this in the second post but we will be using ShareDB for our initial launch (although probably building our own once we hit production release).

atomic elements doesn't really extend to arbitrary HTML, which is fine for the simpler editing solutions but not TinyMCE.

To do something like that in arbitrary HTML you wind up locking at the top-level-block boundaries. Locking sounds great as an engineer but it leads to a garbage user experience.

There's one thing I'd add here, as advice to potential implementers of collaborative editing algorithms. Test the system rigorously, autogenerating concurrent workloads of all the operations (including undo), and also various delays. If this had been done early on, all the broken algorithms (some published with formal proofs of correctness!) would have been caught. A lot of the broken algorithms work under light loads - TP2 requires at least three concurrent edits, two inserts and one delete, to trigger.

We're using property-based testing where we can, which I hope to cover in a future post

At Mixcut.com (Building real time collaborative video editing) we use a combination of OTs and a variant/simpler version of CRDTs that provide the most coverage of the various features needed. The usage of low latency websockets to shuttle OTs and building completely around OTs is key.

It’s working great for us in beta, we’ve recently shipped multiplayer to our users and overall it’s very stable. With a lot more in store soon, please check it out!

Thanks for linking to my blog post :)

"This is the exact "split node" scenario I described earlier; applying bold near the start of a text node necessarily has to split the text node into three parts (before, bold, and after)."

Not sure why you split it up into sections. In my CRDT implementation, I would add meta-data to each character, with the boolean property which is either bold or not. It's certainly cumbersome to keep the cursor at the right place when inserts are being made, but it's doable. https://pierrehedkvist.com/posts/1-creating-a-collaborative-...

I personally never understood how OT actually works, clearly, Google Docs and others find it useful. But to me, CRDT has more solid proof and reasoning behind it, and it is easier to comprehend. https://medium.com/@pierrehedkvist/creating-a-collaborative-...

I definitely read a few of your posts during my research, thanks for your hard work :)

The best CRDTs do treat each character separately (as I think I mentioned at one point?). But not all of them - the video and my example is an example of one that doesn't.

It's also very difficult to support arbitrary nested trees with a per-character data type, which leads to the compromises I talked about.

Is the article suggesting to apply a string CRDT to an entire JSON structure? Are people doing that?

At the very least, I would expect different JSON data types to use different CRDTs. For a collaborative model, I would expect the semantics of the data model to be reflected in the choice of CRDTs for different substructures in the JSON.

No, the state of the art CRDT solution for json is something like Automerge (https://github.com/automerge/automerge), that treats the document like a collection/combination of CRDTs of different types. (the author doesn't mention it explicitly, but it's linked at the end)

I have been using Automerge recently for a project, and I have found it to be very, very user-friendly.

Our use-case is offline-editing of documents with an eventual sync-with-yourself-online. It's mostly there as a sync tool, not as a p2p colalb editing. Unless you count yourself as a peer, I guess!

We store a document in local storage which is the result of `Automerge.save(automergedoc) => serializable string`. That can be loaded with `Automerge.load(s) => doc`.

"Changes" are a first-class concept in automerge (makes sense, I guess) so you can do

   before = Automerge.from({..some doc..})
   after = Automerge.change(before, "optional commit msg", (doc) => doc.hello = "world);
With that done, you can get a diff, `Automerge.getChanges(before, after) => [changes]`.

Those `[changes]` are what we try and send to a server and keep them in a `Set()` for each document (background sync, service worker, etc make a nice experience.

This is all wrapped up in Redux, and a middleware that captures the changes, and a reduxStore subscriber that puts the document back into local storage after any changes.

It is for the application programmer virtually transparent, and so far we've yet to find a fault with it.

I'm sure I'll come to regret making "changes" my principle data type, as the server implementation just sees serialized blobs of data, and has no concept of the structure of the document. That's solveable if I need it ever, because there's a protcol compatible implementation in Rust, and I could always use the JavaScript one on the server too, just haven't needed yet.

The serialization is also quite large, in the order of kilobytes for what are actually quite small docs, but they do contain the original commit messages too, and the Automerge team is planning to release an optimized serialization format early this year, which they say will be especially useful for fields with lots of changes, such as text fields, which currently pay a heavy toll in terms of tracking `[changes]`. Fortunately that also doesn't affect me.

I can also +1 some of the advice to keep your whole document in a CRDT. We use React+Redux, but since React Hooks made stateless functional components a reality, we felt it was appropriate to use React exclusively for the UI state, and track only document states in Redux reducers. It feels light-weight and sustainable, although we're only about a month and 5k LOC into the project. Time will tell I guess.

Yeah, personally the feature I'm looking forward to is materializing "changes" into a document without losing editing history.

At some point your set of changes grows too large or you want to trim it to, say, a few weeks or months. Afaik that isn't possible yet without losing the ability to merging later on.

I'm farmilliar with Automerge, but not its specific features on state compression. In general change set compression is possible with CRDTs but there are some "gotchas" you need to look out for. Most notably, all peers interested in a changeset, have to be completely in sync at the moment of compression, otherwise, depending on which underlying CRDT type you use, there can end up with duplicate entries in for example a Set type. Depending on the use case, 100% peer sync is possible, in others its more challenging.

I'm a big fan of Automerge, I look forward to where the research takes it!

So.. we’re left with no real explanation to why OT is better than CRDT other than a 5s video showing a poorly managed text node update that moves the cursor. I feel tricked.


> In OT, every user action is broken down into one or more operations. These operations are transmitted between clients along with their baseline reference; if two users perform actions at the same time, incoming operations must be transformed to include the local operations that have happened since that baseline. They are then applied locally and form the new baseline.

> This constant transformation of operations turned out to have too many edge cases where clients were found to not produce the same baseline (the "wrong" papers above). When that happens, the clients will never converge on the same result and break the fundamental assumption of collaboration.

And that’s exactly right. It’s fairly easy to have an OT system that is very vulnerable, because clients can cheaply generate change sets that are extremely computationally expensive on the server side. I’ve seen a system where a single mobile phone could, in a few seconds, lock up the synchronisation server for days.

But the article is arguing in favor of OT, not CRDT. A couple examples of those edge cases would go a long way, the one given is not very helpful to make a point.

What's wrong with the one given? If my cursor moves because some other user makes some other text bold, that doesn't sound like a desirable experience?

Because it doesn’t highlight a particularly relevant deficiency of the CRDT protocol. Preserving cursor position is not hard in that case. The post even mentions stronger downsides of OT, the example does not support the conclusion. Why is OT better?

Huh, the article gave me the impression that preserving the position was a crucial problem there.

As for the last question, I think the later posts will get to that.

I wasn't really trying to say OT is better than CRDT. Nor was I intending to provide a complete detailed explanation (there's enough research already).

I was driving at CRDT not being capable enough for arbitrary HTML structures, and giving a quick summary of what was an extended multi-week research process.

> the prospect of peer-to-peer editing with end-to-end encryption is an exciting one

Have you heard of CryptPad ? Not exactly peer-to-peer but has encryption so that the server sees nothing.

I haven't! This is what our server team is looking at as an option for encrypting our RTC traffic.

https://tweetnacl.js.org/ https://github.com/dchest/tweetnacl-js/wiki/Examples

this is a wonderful writeup! thank you!

I'm researching the same thing.

Any tutorial on OT implementation? Especially, I need to handle structured data, like json.

This article popped up on HN recently and is a great intro to CRDT implementation. It says OT is much more difficult.


That's included in my research links, yes :D

Here is a 4+ hour class recording on the topic. I started going through it and it has helped me understand ot better.


The second post will cover which editor model we chose, and talk a little bit about the OT implementation details ;)

Looking forward to it! Thanks again!

The only thing I remember from the CKEditor post from last year was that it took "42 man-years" to implement: https://news.ycombinator.com/item?id=18220020

I wonder if it's better understood now and can cost less R&D time.

It is, as I'll cover in the second post ;)

w00h00, this is my area of expertise.

After 8 years of working on this, I have changed my thoughts:

- The correct algorithm is not always the correct user experience.

- End-to-end encryption is too important to not have.

- Offline support is great, but it behaving consistently is more important than it behaving "intently".

- Biggest pain points can most easily be solved at the editing layer, not data layer.

As a result, OT gets thrown out the door immediately.

I built the most popular CRDT powered database on the market (https://github.com/amark/gun, 11K+ stars, 11M+ monthly installs), but have some harsh words for CRDT obsessed people:

CRDTs are great, but if you create an infinitely complex one to handle "every word editing operation" it'll actually result in an inferior user experience.

As such, for example, GUN supports lock-free concurrent operations on any depth of data in a graph, but keeping text/strings behavior atomic is way more important than having built-in automatic string merges.

Why? It is much simpler to build a correct text/string CRDT on top of a predictable, stable, graph CRDT that is understandable.

Another example is, in our blog editing tools, despite having spent years researching & implementing (+ others in our community) precise character-by-character CRDT resolution schemes, I found I personally had a much better offline & local first user experience with a predictable per-paragraph sync scheme.

This is a stupid simple approach, but what it means is that I as a user, can easily predict whats going to happen, so if I'm offline I know I should just copy a new paragraph and fiddle with things there, cause it isn't gonna get "auto delete regrammar merged".

Generally speaking, me and colleagues don't "fight" over the same paragraph, we're usually concurrently writing different "sections" at the same time, it is pretty rare to grammar fix their edit same paragraph as they're typing it - that is also just kinda, rude looking.

Anyways, finally, is that the majority of text styling and DOM edge cases can be handled by having a deterministic canonical DOM hierarchy that always gets applied at the editing layer, BEFORE any CRDT operations even occur.

So for instance, re-arrange the DOM tree such that <i> is always inside <u> inside <b>, or something like that.

We built this into a library called normalize, and it instantly eliminated so much complexity, especially at the CRDT level. Ping me if you want a demo of this library.

Finally, for everyone else, we also built an INTERACTIVE CARTOON EXPLAINER of an extremely basic text CRDT to help others understand the more detailed concepts:


User experience is definitely why we didn't go for an existing CRDT solution.

The model we've chosen is fairly agnostic to whether OT or CRDT is used to collaborate, as I'll expand on in the second post, so if we see an opportunity to switch to a CRDT-based solution we'll certainly jump on it!

You only need to normalise things like bold/italic/underline when using a browser DOM, every collaborative rich text model I've seen represents formatting as unordered attributes which is much easier to collaborate on.

You are a charming and energized speaker and perhaps that is why Tim Draper thinks you a good investment as with others he has made. Acronyms like 'PTSD' and 'PARTY' in the project is likely good at attracting people who think they sound fun and do not question many details.

It is difficult to get brief clear details about how gun.js works. There are distracting or simple documents, discussions about high level ideas and tangentially-related topics. In some cases 'worse is better' yes but it sets a tone that the project is hiding internal problems of implementation or can not easy explain its own design.

The transparency of gun.js enterprise organizations also raises questions. Are people on your team related to you? Do you pay yourself and team and open source contributors?

Thank you!

They also like the math & science behind it, some investors do DD, some don't.

You're tone is pretty assuming, smug, and insulting.

It is running in production with millions of people, yes there are some hiccups, but surviving & we're improving it to scale even more.

I rebuild GUN from scratch in 30min on stage, that is how little/simple GUN is:


Do I pay myself? Lol. Yes. No, I do not pay OSS, but yes people contribute regardless, strangers help me, my kids help me, investors help me, family & friends help me.

With end-to-end encryption in GUN, what role does the server play? I assume the merging type activity happens in the clients?

BTW, the name makes it almost impossible to search for info about GUN.

Yes, merging is per peer, not server transformed.

"Servers" act as WebRTC bootstrapping signals, storage backups, relay/routing, etc., and you can have as many of them as you want (no need to be centralized).

True. `gunDB` tag should help a little.

Do you have any document that explain what resolution algorithm uses in what cases?

For example, one peer change a property value and the other peer deletes it.

Same algorithm for everything.

This cartoon explains it:


It is a vector + timestamp + lexical sort.

A delete is changing a value to `null`, it would lose (I assume you are asking if/when these 2 changes happen at the exact same microsecond time, conflicting?) as is it has a lower lexical rank.

Can you refer me to a document where you explain the "lexical rank" that you are referring to?

The cartoon explanation do not tackle those terms.

JSON.stringify('a') < JSON.stringify('z')


Cool! Make sense.

Let's say our state have an array like ['a','b'].

What about if Peer-A splice the first element (0) which results in ['b'] and Peer-B wants to mutate 'a' to 'A' which results in ['A','b'] and they both make the mutation at the same millisecond?

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact