Nhekos Event Store V2

Dienstag, 25. August 2020, um 22:14 UTC

I'm currently implementing a new event store for Nheko. This blog post should explain, why this is necessary, what is required of the new event store and how this actually turns out in practice. This post will be a bit more technical in nature, but I hope I can make it easy enough to understand.

What is an event store?

The life of a client dev basically consists of the following parts (simplified):

Reading the spec.
Implementing handling for different event types in the client.
Answering questions from users and looking at their bug reports.
Cursing at Synapse, the Spec and other Clients and complaining about their brokenness.

I may have left out most of the things you actually do and have blown some parts way out of proportion, but the important take away is: Matrix is a giant distributed event database and most features require you to parse, send, store, relate or otherwise handle events.

You have a few ways to receive events. The most important ones are:

Regular event updates by long polling /sync
Paginating backwards in history by calling /messages
Fetching missing events via /event

All persistent events can be addressed by their event_id and for each pagination token stream there is basically one order the events can be in. By calling /sync in a loop, you will receive events usually in the order your server receives them in. For example consider the flow:

Your sync is up to date
Someone sends the message A
You call /sync
You send the message B
You call /sync

If you ignore all the effects of network latency, other rooms, etc, your first sync should contain the message A and the second message should contain the message B, both in the timeline section for that specific room.

A message may be a reply or a reaction, edit or otherwise related to some event. In some of those cases you may decide to fetch a related event by event id via the /event endpoint.

If your user scrolls up, they may want to see messages you do not have available at the moment. To load messages before a certain point, you would call /messages with the prev_batch token you got from /sync. Be careful, while synapse allows you to use your sync token here, this is not required by the spec and other servers may not be as lenient! Messages you fetch by paginating backwards may not neccessarily be ordered by arrival order on the server. They may be ordered by their logical order you get from the federation DAG, which I don't want to go into here. It's just something to be aware of (for now).

Now that is all well and good, but you need to do something with those events. While you can render them to the screen and then throw them away, that means your memory usage will grow proportionally to the amount of events you received. You may alleviate that a bit, by throwing away older rendered messages, but in any case you will need to fetch events from the server again, if the client gets restarted.

Alternatively you may want to store (some) events on disk, so that after a restart you can show some events until /sync returns with the up to date state. This is what this blog post is about. How do you store them efficiently and how do you access those stored/cached events?

Requirements for the event store

Note: We are talking about an event store form the "Timeline", which shows the messages. This post will not go into how you can store the room state for fast and easy access. Usually a key-value store of state events works well enough for that.

Events need to be easily accessible by their id. If you need a related event, you just want to store the id for the relation, not the full event.
You need to store the order of events. If you naively sort them by origin_server_ts, they will not be ordered by arrival time. If you couple that with lazily rendering events from the event store, the events will not be appearing at the bottom of the timeline, if an old server comes online, leading to weird notifications and a timeline, that not reflects /sync
If you want to have a 1:1 mapping of a visible event to an event in the database, you need to somehow store the order of visible events as well as the order of all events relative to each other.
If you receive or send an event, that relates to another event, this event will have a key to which event it relates to (usually called m.relates_to). The event, that is the target of this relationship, doesn't know about there stalkers (the related events) though, since you received it, before the relations were even sent (usually). On the other hand, in many cases you want to render the event, that is being related to, differently, depending on its related events (think reactions, edits, etc). For that it makes sense to store a relation mapping with all the events relating to a certain event_id.
To paginate further backwards, you need to persist the prev_batch token. While only the most recent token is neccessary, it may be useful to persist all of them, if you ever want to shrink your event store and drop old events.

Overview of Nhekos implementation

This implementation is specific to Nheko. Specifically, Nheko uses lmdb (a key value store/B-Tree database). While the ideas can probably also be mapped to relational databases, it may not be efficient and there may be a better implementation for those databases.

Database `events`

This database is very simple. There is one of those per room and it simply stores the mapping event_id -> event, where the first one is a string and the second one is the full json of the event as it was received.

Databases `event_order` and `order_event`

These are actually two databases (per room). The order of an event is a simple, 64-bit integer. Since you usually don't receive the first event in a room first, you need to be able to insert events before and after an already stored event (but not in between). This means an unsigned integer should start with half of the maximum, that can be represented. (While that limits the maximum number of messages, that can be stored, it is very unlikely you will ever reach that, without receiving a limited response in between, since you would need to receive many messages per nanosecond.)

You then store a mapping of that order integer to the event_id, so that you can iterate events in order and you also store the reverse mapping, so that you can easily figure out, where an event is positioned.

Additionally you store next to the event_id the last batch_token of the currently handled response, so that you can paginate backwards if needed.

Databases `message_order` and `order_message`

This database is very Nheko-specific. Basically the idea is as follows: You don't need to render all messages at once. You can just render them, when the message at a certain position is shown. You could calculate the index to event_id mapping on demand from the event_order store, but this may be expensive, if you don't want to render reactions, edits and other related events at the "time" where they arrived, but rather at the location of the event they relate to. So you need to store an additional order key for "visible" messages instead of all messages. This is the message_order database and order_message is the inverse mapping.

Database `related`

Matrix has the concept of related events. These can be replies, reactions, redactions, edits, in chat verifications, call events and probably more in the future. Most of those have the event_id they relate to in m.relates_to. To easily access related events, we will store the related event ids as a value under the event_id they relate to as the key.

This makes it very easy to show for example reactions. You just have to iterate over the related events of each event and if they are a reaction, show them in a little bubble under that event. Similarly you could show edits that way, verification history as a single event or even implement basic threading on demand, i.e. have a button that focuses the timeline on all events that are a reply to the current event or where the current event is a reply to (transitively until a certain depth is reached).

Handling `/sync`

/sync is how you usually receive events. A client usually should call /sync in a loop. This request will block until a new event arrives at the server. This is usually called long polling. Once /sync returns, you will want to persist the new events. Usually you do that in a loop over every room and then loop over each message. You can distinguish the following kinds of messages:

Redactions: When a message gets redacted, a redaction is sent to every client. Clients generally should then remove the content key of the redacted event, so that the event is not displayed to the user anymore. With our event store this is fairly straightforward. Just read the event with the id specified unter the redacts field, strip out the content and write it again. You also want to inform the UI to rerender that event, if it is currently displayed to the user.
"Normal" messages, that should be displayed to the user. Those will need to be added to the events database. As those events should be shown to the user, you will want to add their event_id to the message <-> order databases as well as the event <-> order databases.
"Hidden" events: Some events should not be shown to the user. Those should generally still be added to then events and event <-> order databases, but you will not want to add them to the message <-> order database.
Related events: These can be messages from 2. or 3.. These events may not be displayed or they are, but they also affect the rendering of other events. You will, in addition to handling 2. or 3., you will also want to add those events or rather their event ids to the related database. That way you can render for example reactions or edits. Replies work the other way around. Threads may work both ways, depending on how you want to display them. Don't forget to notify the UI, that the related events changed, so edits can update the message text or new reactions do get added and similar.

Additionally to storing the events, you will also want to update room state, sync token, presence and such. This is out of scope of this document though. Make sure to not accidentally store the same sync response twice, i.e. after reconnecting because of a network issue. You can easily dedup /sync responses by verifying the sync token matches what you currently are expecting.

If your sync response contains a limited field and this is set to true, you will have a gap in your event store. There are multiple ways to handle that: You can throw away all other messages or you store the gap and fill it, when the user views the room/gap. The former is rather easy to implement, the latter is rather complicated, but may lead to nicer results.

Paginating via `/messages`

Paginating is pretty much the reverse operation of long polling. This is what happens, when a user requests older history than currently stored, or when you need to fill a gap (which can happen in forward and backward direction). When paginating backwards, you can handle events pretty similar to /sync, with a few, small differences. You will not really need to handle redactions. Servers should not return you a redaction for an event you didn't receive yet or at least they should never return you the full event, which was redacted, and you will have to handle the redaction then. Rather you may receive the event with content redacted or you don't receive it at all. Also you want to add the event ids to the top of the timeline of course, when you are paginating backwards instead of to the bottom. Furthermore you may receive related events, where the event it relates to was not received yet. Make sure to handle that correctly.

One thing to remember: You are not allowed to use a /sync token to paginate. While some homeservers, like synapse, tend to allow this, others don't. Always use the prev_batch token from /sync or from /messages.

Filling in missing events

Sometimes you may want to have an additional event available to render another event, but you have neither received it via /sync nor via pagination. In those cases you can use the /<room_id>/event endpoint to fetch specific events. This can be useful to render replies or membership events nicely and similar. Usually you don't know, where those events are positioned, so you only want to add them to the events database, until you actually receive them while syncing or paginating.

Pending messages and local echo

Usually you want to show pending messages to the user, even though they haven't arrived over /sync yet. There are 2 ways to do that: Either you keep them in memory, rendering them after all other messages or you store them in the database as a normal message under their transaction id instead of event id and then replace it, once it arrives over /sync (or you use the event id returned by /send). How to keep track of the pending messages, queue them properly, etc, is left as an exercise to the reader.

Limitations

This is a rather simple implementation of an event store. For this reason it does not handle gappy timelines that well. Relations are also not stored that optimized, you may want to store the actual reactions to a message instead of only related events and iterating. This also limits /sync responses to the disk or database speed. As such this may have more CPU and IO overhead, than an event store, which is mostly kept in RAM. This can be somewhat mitigated via caches, but not completely. You also lose some of the power SQL databases can give you. You will need to implement those higher level features yourself.

Benefits

You basically don't need to keep any events in RAM. You can store everything on disk and only load the necessary events from disk, maybe with some caches. This means you basically should only need RAM to keep the UI and connection logic around. Everything else should just be caches, that can be dropped at anytime. Especially with a memory mapped database, this should have a nice effect on systems with lower amounts of RAM. This requires you to be able the message indices to some lazy models for your views though, which may have some CPU overhead.
The event store is fairly simple and should be easy to debug, optimize and extend. You can implement it on most kinds of databases, although key-value stores are recommended.
It is fairly easy to hide events by applying a filter on what gets stored in the message <-> order databases, although changes to those filters may require you to rebuild those databases.

Closing thoughts

I hope you found this small introduction to Nhekos event store interesting. If you have any questions or ideas, come discuss them in #nheko:matrix.org. Maybe you have ideas on how to improve it or you want to compare or introduce your own event store? I'm interested in how other clients handle their events and store them for rendering to users. Thank you for your attention!