Mirai Bot 5.0 - Mirai + Megane

Tue, 06 Oct 2020 05:00:00 UTC

mirai, bot, discord, changelog

Mirai + Megane

Over the past several months I've been working on a rewrite of Mirai, moving to a clustered architecture. Previously Mirai ran entirely in one process, which could not keep up with the amount of data received from Discord. To solve this I created megane, a Discord bot framework that splits a bot up into multiple clusters so it can make use of multiple CPU cores (something a single NodeJS process doesn't do on its own). Here you can see the results of the first live test of megane:

Mirai Bot CPU usage with mirai-bot-core. Notice how often the CPU core reaches 100% utilization, blocking things like commands and causing disconnects.

Mirai Bot CPU usage with megane. Notice that each cluster has its own core, spreading computation out so one task can't block the entire bot.

Removal of Presence

Presence data makes up the large majority of all events from Discord. If you don't know what presence data is, it's basically your current state in Discord: your activities (game status), online status, username, and avatar, for every guild (server). Every time one of these things changes, a presence update is sent to every client in every guild. So imagine if Rythm, which is in 12M+ guilds, updates its game/status. That means for each of those 12M+ guilds every member that's connected will get a presence update. You can imagine what this means for bots that are in a large number of guilds.

As of now, Mirai receives somewhere around 200-250 million presence events per day. If you do the math, that's roughly 2,300 to 2,900 presence updates per second.
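For anyone who wants to check that figure, here is a quick sketch of the arithmetic. It only assumes the events are spread evenly across the day, which they won't be in practice, so treat the result as an average rather than a peak.

```typescript
// Convert a daily event count into an average per-second rate.
const SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

function eventsPerSecond(eventsPerDay: number): number {
  return eventsPerDay / SECONDS_PER_DAY;
}

// The 200-250 million per day range quoted above.
const low = Math.round(eventsPerSecond(200_000_000));
const high = Math.round(eventsPerSecond(250_000_000));

console.log(`${low} to ${high} presence updates per second`);
// prints "2315 to 2894 presence updates per second"
```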

With the addition of intents to the Discord API, presence updates are largely disabled for bots. Getting access to them requires manual whitelisting, which is not easy to get. Luckily Mirai hardly makes use of them after recent changes, so the only thing lost is a few pieces of info in the m.info command. Without needing to process these events Mirai would probably run without issue even on the current design. Here is the most recent CPU usage graph from the second megane test without presence:

Notice how it doesn't even seem like anything is running.

Unfortunately with intents also comes a huge change to the way guild member chunks work. If you don't know, guild member chunks are a resource that bots request to receive every member in a guild (if they need it). Before intents, guild member chunks could be requested all at once on connecting. However, without the guild presence intent this is limited to one guild per request. With a total shard websocket hard limit of 120 requests per minute this obviously causes issues. Eris, the API library megane uses, does not even support this.

I did the math, and found out that for a shard with 2,000 guilds (the max is 2,500) it would take 20 minutes to cache every guild's members at 100 guilds per minute. I obviously didn't want to further delay the ready event by 20 minutes, so my only option was to write my own caching logic. I added a custom member caching system to megane that caches guild members after the cluster is ready. This allows most of the bot to operate while the parts that need a complete cache are disabled until everything is ready for them. You'll notice this indicated in Mirai by the "starting..." status.
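To make the 20 minute figure concrete, here is a rough sketch of the pacing math. It assumes one member request per guild and a pace of 100 requests per minute, which is what the 20 minute figure works out to and leaves some headroom under the 120 per minute limit. The names `planBatches` and `minutesToCache` are made up for this example and are not megane's actual API.

```typescript
// Hypothetical sketch: these names are illustrative, not megane's API.
const GATEWAY_LIMIT_PER_MINUTE = 120; // per-shard websocket limit quoted above

// Minutes needed to request members for every guild, one guild per request.
function minutesToCache(guildCount: number, requestsPerMinute: number): number {
  return Math.ceil(guildCount / requestsPerMinute);
}

// Split guild IDs into per-minute batches so requests stay under the limit.
function planBatches(guildIds: string[], requestsPerMinute: number): string[][] {
  if (
    !Number.isInteger(requestsPerMinute) ||
    requestsPerMinute <= 0 ||
    requestsPerMinute > GATEWAY_LIMIT_PER_MINUTE
  ) {
    throw new RangeError("requestsPerMinute must be an integer from 1 to 120");
  }
  const batches: string[][] = [];
  for (let i = 0; i < guildIds.length; i += requestsPerMinute) {
    batches.push(guildIds.slice(i, i + requestsPerMinute));
  }
  return batches;
}

// A shard with 2,000 guilds, paced at an assumed 100 requests per minute.
const guildIds = Array.from({ length: 2000 }, (_, i) => `guild-${i}`);
const batches = planBatches(guildIds, 100);

console.log(batches.length); // 20 batches, i.e. 20 minutes
console.log(minutesToCache(2000, GATEWAY_LIMIT_PER_MINUTE)); // 17 even at the hard limit
```

Even running right at the hard limit only brings it down to about 17 minutes, so pacing alone doesn't fix the delay. That's why the caching has to happen after the cluster is ready rather than before.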

How Megane Works

Megane works by splitting Discord shards up into clusters and running those clusters in separate processes. By default the number of clusters is the number of CPU cores on the server. This allows it to make use of every core, while keeping overhead down by not creating too many unneeded processes. A simple way to think of this is that it's assigning each cluster to a CPU core.
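As a rough illustration of the idea, here is one way shards could be divided between clusters. This is a sketch, not megane's actual code: `assignShards` is a hypothetical helper, and the contiguous-range layout is an assumption. The cluster count is passed in as a parameter here; in NodeJS it could come from `os.cpus().length`.

```typescript
// Hypothetical helper: split shard IDs 0..shardCount-1 into contiguous
// ranges, one per cluster, with sizes differing by at most one.
function assignShards(shardCount: number, clusterCount: number): number[][] {
  if (!Number.isInteger(shardCount) || shardCount < 0) {
    throw new RangeError("shardCount must be a non-negative integer");
  }
  if (!Number.isInteger(clusterCount) || clusterCount <= 0) {
    throw new RangeError("clusterCount must be a positive integer");
  }
  const base = Math.floor(shardCount / clusterCount);
  const extra = shardCount % clusterCount; // the first `extra` clusters get one more shard
  const clusters: number[][] = [];
  let next = 0;
  for (let c = 0; c < clusterCount; c++) {
    const size = base + (c < extra ? 1 : 0);
    const shards: number[] = [];
    for (let s = 0; s < size; s++) {
      shards.push(next++);
    }
    clusters.push(shards);
  }
  return clusters;
}

// Example: 10 shards on a 4-core server.
const layout = assignShards(10, 4);
console.log(JSON.stringify(layout)); // prints [[0,1,2],[3,4,5],[6,7],[8,9]]
```

Each inner array would then be handed to its own process, so a busy shard only slows down the cluster it lives in.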

Services are also a new concept in megane. Services are programs in megane that run in their own process. Some examples in Mirai are the Twitch, anime, and Patreon services. Each of these services is used by all of Mirai, but only one instance exists, which every client can communicate with. Running these in their own process prevents things like Twitch data updates from blocking a thread and causing issues with the bot.

For a visual illustration of how this is all laid out you can use this poorly-made diagram I made:

Notable Changes in v5

Here you can find a reduced changelog with less important changes removed. For the full v5 changelog scroll to the next section.

Full Changelog

Core Changes

Service Changes

Command Changes