Multithreading in Photon
What this article is about
In this article, we will talk about multithreading in the backend:
- how it is implemented
- how it is used
- what can be done
- what we invented ourselves
All these questions are relevant only if you develop something for the server side – modify the Server SDK code, write your own plugin, or even start some server application from scratch.
How does Photon solve the issue of multithreading?
The Photon server application accepts requests from multiple client connections at the same time. I will call such connections peers. These requests form queues, one for each peer. If the peers are connected to the same room, their queues are merged into one: the room queue.
There are up to several thousand such rooms, and their request queues are processed in parallel.
The task queues in Photon are built on the Retlang library, which in turn was developed on the basis of the Jetlang library.
Why don’t we use Task and async/await?
It’s because of the following considerations:
- Photon Server development started before the appearance of these features
- The number of tasks that are performed by fibers is huge: tens of thousands per second. Therefore, there was no point in adding another abstraction which, as it seems to me, also puts extra pressure on the GC (garbage collector). The fiber abstraction is much more lightweight, so to speak.
- There is surely a TaskScheduler that does the same thing as fibers, and I will probably learn about it in the comments, but in general, I did not want to reinvent the wheel.
What is a Fiber?
A fiber is a class that implements a command queue. The commands are queued and executed one after the other, in FIFO order. We can say that the "multiple writers, single reader" pattern is implemented here. Once again, I want to draw attention to the fact that the commands are executed in the order in which they were received, one after the other. This is the basis for safe data access in a multithreaded environment.
Although in Photon we use only one fiber type, namely PoolFiber, the library provides five types. All of them implement the IFiber interface. Here is a short description of each.
- ThreadFiber – an IFiber backed by a dedicated thread. Use for frequent or performance-sensitive operations.
- PoolFiber – an IFiber backed by the .NET thread pool. Note: execution is still sequential and only executes on one pool thread at a time. Use for infrequent, less performance-sensitive executions, or when one desires to not raise the thread count.
- FormFiber/DispatchFiber – an IFiber backed by a WinForms/WPF message pump. The FormFiber/DispatchFiber entirely removes the need to call Invoke or BeginInvoke to communicate with a window from a different thread.
- StubFiber – useful for deterministic testing. Fine-grained control is given over execution to make testing races simple. Executes all actions on the caller thread.
About PoolFiber
Let’s talk about task execution in PoolFiber. Even though it uses a thread pool, its tasks are still executed sequentially, and only one thread is used at a time. It works like this:
- We enqueue a task in the fiber, and the fiber sends it for execution by calling ThreadPool.QueueUserWorkItem. At some point, a thread is selected from the pool and performs the task.
- If we enqueue several more tasks while the first one is running, then at the end of the first task all the new ones are taken from the queue, and ThreadPool.QueueUserWorkItem is called again so that all these tasks are sent for execution as a batch. A thread from the pool, possibly a different one, will be selected for them. When it finishes, if there are tasks in the queue, everything repeats from the beginning.
That is, each batch of tasks may be executed by a different thread from the pool, but only ONE thread is used at a time. Therefore, if all the tasks that work with a game room are placed in its fiber, they can safely access the room data. If an object is accessed from tasks running in different fibers, synchronization is required.
The idea is easier to see in the following picture. Suppose we enqueue tasks A, B, and C to the fiber fairly rarely. Then execution may look like this:
Task A is executed in one thread (the line in the middle), and Task B is executed in another thread. For Task C, the system can select a third thread. With more active work, when tasks are enqueued more often, we may get something like this:
The group of tasks A uses one thread, the group of tasks B uses a second thread, and the group of tasks C uses a third thread. However, we should understand that none of those groups or tasks overlap on the timeline. All tasks are executed strictly sequentially.
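The batching behaviour described in this section can be sketched in a few dozen lines. This is only an illustration under stated assumptions: MiniPoolFiber, Flush and MiniPoolFiberDemo are hypothetical names, not the Retlang or Photon source, and exception handling is left out because in the real library that is the executor's job.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical, simplified stand-in for PoolFiber (not the Retlang or Photon source).
public sealed class MiniPoolFiber
{
    private readonly object _lock = new object();
    private List<Action> _queue = new List<Action>();
    private bool _flushPending; // true while a batch is scheduled or running

    // Safe to call from any thread, including from a task running on this fiber.
    public void Enqueue(Action action)
    {
        lock (_lock)
        {
            _queue.Add(action);
            if (_flushPending) return; // the active batch run will pick it up
            _flushPending = true;
        }
        ThreadPool.QueueUserWorkItem(_ => Flush());
    }

    private void Flush()
    {
        List<Action> batch;
        lock (_lock)
        {
            batch = _queue;            // take everything queued so far
            _queue = new List<Action>();
        }

        foreach (var action in batch)
        {
            action();                  // strictly one after the other, FIFO
        }

        lock (_lock)
        {
            if (_queue.Count == 0)
            {
                _flushPending = false; // idle until the next Enqueue
                return;
            }
        }
        // More tasks arrived meanwhile: hand the next batch to the pool again.
        ThreadPool.QueueUserWorkItem(_ => Flush());
    }
}

public static class MiniPoolFiberDemo
{
    // Enqueues `count` tasks and returns the order in which they ran.
    public static List<int> Run(int count)
    {
        var fiber = new MiniPoolFiber();
        var seen = new List<int>();    // touched only by fiber tasks, so no lock
        using (var done = new ManualResetEventSlim(false))
        {
            for (int i = 0; i < count; i++)
            {
                int n = i;
                fiber.Enqueue(() => seen.Add(n));
            }
            fiber.Enqueue(done.Set);
            done.Wait();
        }
        return seen;
    }
}
```

Calling MiniPoolFiberDemo.Run(100) should return the numbers in the order they were enqueued, even though several pool threads may have taken part in running the batches.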
Why PoolFiber
Photon uses PoolFiber everywhere, first of all because it does not create additional threads, so anyone who needs a fiber can have their own. By the way, we modified it a little, and now it can’t be stopped: PoolFiber.Stop will not stop the execution of the current tasks. This was important for us.
You can enqueue tasks in the fiber from any thread; this is thread-safe. A task that is currently being executed can also enqueue new tasks in the fiber in which it is running.
There are three ways to put a task in a fiber:
- put the task in the queue
- put a task in the queue that will be executed after a certain interval
- put a task in the queue that will be executed regularly.
It looks something like this:
// enqueue a task
fiber.Enqueue(() => { /* some action code */ });
// schedule a task to be executed in 10 seconds
var scheduledAction = fiber.Schedule(() => { /* some action code */ }, 10_000);
...
// stop the timer
scheduledAction.Dispose();
// schedule a task to be executed in 10 seconds and repeat every 5 seconds
// (in Retlang, the repeating variant is called ScheduleOnInterval)
var repeatingAction = fiber.ScheduleOnInterval(() => { /* some action code */ }, 10_000, 5_000);
...
// stop the timer
repeatingAction.Dispose();
For tasks that run at some interval, it is important to keep the reference to the object returned by fiber.Schedule. This is the only way to stop the execution of such a task.
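To show why the returned object matters, here is a minimal sketch of how scheduling could be layered on top of a fiber with System.Threading.Timer. It is an assumption for illustration, not the library's actual scheduler: FiberSchedulerSketch is a hypothetical name, and the enqueue parameter stands in for fiber.Enqueue.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not the Retlang or Photon scheduler).
public static class FiberSchedulerSketch
{
    // enqueue: any delegate that puts an action into a fiber's queue,
    // for example fiber.Enqueue. Disposing the returned timer stops the task.
    public static IDisposable Schedule(
        Action<Action> enqueue, Action action, long firstInMs)
    {
        // Timeout.Infinite as the period means "fire once".
        return new Timer(_ => enqueue(action), null, firstInMs, Timeout.Infinite);
    }

    public static IDisposable ScheduleOnInterval(
        Action<Action> enqueue, Action action, long firstInMs, long regularInMs)
    {
        // The timer thread never runs the action itself; it only enqueues it,
        // so the action still executes in order with the fiber's other tasks.
        return new Timer(_ => enqueue(action), null, firstInMs, regularInMs);
    }
}
```

In this sketch the returned IDisposable is the timer itself, so disposing it is the only handle the caller has for stopping a repeating task.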
Executors
Now about the executors. These are the classes that actually execute the tasks. They implement the Execute(Action a) and Execute(List<Action> a) methods. PoolFiber uses the second one. That is, the tasks fall into the executor in a batch. What happens to them next depends on the executor. At first, we used the DefaultExecutor class. All it does is:
public void Execute(List<Action> toExecute)
{
    foreach (var action in toExecute)
    {
        Execute(action);
    }
}

public void Execute(Action toExecute)
{
    if (_running)
    {
        toExecute();
    }
}
In real life, this was not enough, because if one of the actions threw an exception, all the remaining actions in the toExecute list were skipped. Therefore, by default, FailSafeBatchExecutor is now used, which adds try/catch to the loop. We recommend using this particular executor if you don’t need anything special. We added this executor ourselves, so it is not available in the versions that can be found on GitHub, for example.
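A fail-safe executor of this kind might look roughly like the sketch below. Photon's real FailSafeBatchExecutor is not shown in this article, so the class name, the constructor and the onError hook are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; not Photon's FailSafeBatchExecutor source.
public sealed class FailSafeBatchExecutorSketch
{
    // Assumed error hook, for example a call into the logging system.
    private readonly Action<Exception> _onError;

    public FailSafeBatchExecutorSketch(Action<Exception> onError)
    {
        _onError = onError;
    }

    public void Execute(List<Action> toExecute)
    {
        foreach (var action in toExecute)
        {
            Execute(action);
        }
    }

    public void Execute(Action toExecute)
    {
        try
        {
            toExecute();
        }
        catch (Exception ex)
        {
            // Report and continue, so the rest of the batch still runs.
            _onError(ex);
        }
    }
}
```

With this shape, a batch of three actions in which the second one throws still runs the first and third, and the error hook is called once.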
What else did we invent ourselves
BeforeAfterExecutor
Later, we added another executor to solve our logging problems. It is called BeforeAfterExecutor. It “wraps” the executor passed to it. If nothing is passed, FailSafeBatchExecutor is created. A special feature of BeforeAfterExecutor is the ability to perform an action before executing the task list and another action after executing the task list. The constructor looks like this:
public BeforeAfterExecutor(Action beforeExecute, Action afterExecute, IExecutor executor = null)
What is it used for? The fiber and the executor have the same owner. When creating an executor, two actions are passed to it. The first one adds key/value pairs to the thread context, and the second one removes them, thereby cleaning up. The logging system adds the pairs from the thread context to each message, so we can see some metadata of the object that logged the message.
Example:
Action beforeAction = () =>
{
    log4net.ThreadContext.Properties["Meta1"] = "value";
};
Action afterAction = () => log4net.ThreadContext.Properties.Clear();
// create an executor
var e = new BeforeAfterExecutor(beforeAction, afterAction);
// create a PoolFiber
var fiber = new PoolFiber(e);
Now, if something is logged from a task that runs in the fiber, log4net will add the Meta1 tag with the value "value".
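The wrapping itself can be sketched as follows. This is a guess at the shape, not the Photon implementation: IExecutorSketch, PlainExecutorSketch and BeforeAfterExecutorSketch are hypothetical names, and where the real class falls back to FailSafeBatchExecutor, the sketch falls back to a plain executor to stay self-contained.

```csharp
using System;
using System.Collections.Generic;

// Assumed minimal executor contract, modelled on the two Execute methods.
public interface IExecutorSketch
{
    void Execute(List<Action> toExecute);
    void Execute(Action toExecute);
}

// Simplest possible inner executor: runs everything as-is.
public sealed class PlainExecutorSketch : IExecutorSketch
{
    public void Execute(List<Action> toExecute)
    {
        foreach (var action in toExecute) Execute(action);
    }

    public void Execute(Action toExecute)
    {
        toExecute();
    }
}

// Hypothetical sketch; not Photon's BeforeAfterExecutor source.
public sealed class BeforeAfterExecutorSketch : IExecutorSketch
{
    private readonly Action _before;
    private readonly Action _after;
    private readonly IExecutorSketch _inner;

    public BeforeAfterExecutorSketch(
        Action beforeExecute, Action afterExecute, IExecutorSketch executor = null)
    {
        _before = beforeExecute;
        _after = afterExecute;
        _inner = executor ?? new PlainExecutorSketch();
    }

    public void Execute(List<Action> toExecute)
    {
        _before();                    // e.g. fill the logging thread context
        try
        {
            _inner.Execute(toExecute);
        }
        finally
        {
            _after();                 // e.g. clear it, even if the batch failed
        }
    }

    public void Execute(Action toExecute)
    {
        _before();
        try
        {
            _inner.Execute(toExecute);
        }
        finally
        {
            _after();
        }
    }
}
```

Because the before and after actions run on the same pool thread as the batch, anything they put into thread-local state is visible to every task in that batch and is removed before the thread goes back to the pool.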
ExtendedPoolFiber and ExtendedFailSafeExecutor
There is another thing that was not in the original version of Retlang and that we developed later. This was preceded by the following story. There is PoolFiber (the one that runs on top of the .NET thread pool). In a task that this fiber executes, we needed to execute an HTTP request synchronously.
We did it in a simple way like this:
- before executing the request, we create a sync event;
- the task that executes the request is sent to another fiber and, upon completion, puts the sync event in the signaled state;
- after that, we start to wait for the sync event.
It was not the best solution in terms of scalability, and it began to fail unexpectedly. It turned out that the task that we put in another fiber in step two can land in the queue of the very thread that started to wait for the sync event. Thus, we get a deadlock. Not always, but often enough to worry about it.
The solution was implemented in ExtendedPoolFiber and ExtendedFailSafeExecutor. We came up with the idea of putting the entire fiber on pause. In this state, it can accumulate new tasks in the queue, but does not execute them. In order to pause the fiber, the Pause method is called. As soon as it is called, the fiber (namely, the fiber executor) waits until the current task is completed and freezes. All other tasks will wait for the first of the two events:
- a call to the Resume method
- a timeout (specified when calling the Pause method).
In the Resume method, you can also set a task that will be executed before all the queued tasks.
We use this trick when the plugin needs to load the room state using an HTTP request. In order for players to see the updated state of the room immediately, the room’s fiber is paused. When calling the Resume method, we pass it a task that applies the loaded state, and all other tasks then work with the updated state.
By the way, the need to put the fiber on pause completely killed the ability to use ThreadFiber for the task queue of game rooms.
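The pause and resume idea can be illustrated with the following sketch. It is not the ExtendedPoolFiber source: PausableFiberSketch and PausableFiberDemo are hypothetical names, the signatures of Pause and Resume are assumed, and for brevity the sketch takes tasks from the queue one at a time instead of in batches.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical sketch of a pausable fiber; not Photon's ExtendedPoolFiber.
public sealed class PausableFiberSketch
{
    private readonly object _lock = new object();
    private readonly LinkedList<Action> _queue = new LinkedList<Action>();
    private bool _running; // a pool thread is currently draining the queue
    private bool _paused;
    private int _pauseId;  // tells a stale timeout apart from the current pause
    private Timer _timeout;

    public void Enqueue(Action action)
    {
        lock (_lock) { _queue.AddLast(action); StartIfNeeded(); }
    }

    // The current task (if any) finishes; later tasks only accumulate.
    public void Pause(int timeoutMs)
    {
        lock (_lock)
        {
            _paused = true;
            int id = ++_pauseId;
            _timeout?.Dispose();
            _timeout = new Timer(_ => Resume(null, id), null, timeoutMs, Timeout.Infinite);
        }
    }

    // runFirst (optional) is executed before everything already queued.
    public void Resume(Action runFirst)
    {
        lock (_lock) { Resume(runFirst, _pauseId); }
    }

    private void Resume(Action runFirst, int pauseId)
    {
        lock (_lock)
        {
            if (!_paused || pauseId != _pauseId) return; // not paused, or stale timeout
            _paused = false;
            _timeout?.Dispose();
            _timeout = null;
            if (runFirst != null) _queue.AddFirst(runFirst);
            StartIfNeeded();
        }
    }

    private void StartIfNeeded() // caller must hold _lock
    {
        if (_running || _paused || _queue.Count == 0) return;
        _running = true;
        ThreadPool.QueueUserWorkItem(_ => Drain());
    }

    private void Drain()
    {
        while (true)
        {
            Action next;
            lock (_lock)
            {
                if (_paused || _queue.Count == 0) { _running = false; return; }
                next = _queue.First.Value;
                _queue.RemoveFirst();
            }
            next(); // runs outside the lock, one task at a time
        }
    }
}

public static class PausableFiberDemo
{
    public static List<string> Run()
    {
        var fiber = new PausableFiberSketch();
        var log = new List<string>(); // touched only by fiber tasks
        using (var done = new ManualResetEventSlim(false))
        {
            fiber.Enqueue(() =>
            {
                log.Add("start load");
                fiber.Pause(5000);
                // Stand-in for an HTTP request completing on another thread.
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    Thread.Sleep(50);
                    fiber.Resume(() => log.Add("apply state"));
                });
            });
            fiber.Enqueue(() => log.Add("player request"));
            fiber.Enqueue(done.Set);
            done.Wait();
        }
        return log;
    }
}
```

In the demo, the task that starts the load pauses the fiber instead of blocking a pool thread on a sync event, and the "player request" task runs only after the task passed to Resume has applied the loaded state.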
IFiberAction
IFiberAction is an experiment to reduce the load on the GC. We can’t control the process of creating actions in .NET. Therefore, it was decided to replace the standard actions with instances of the class that implements the IFiberAction interface. It is assumed that instances of such classes are taken from the object pool and returned there immediately after completion. This reduces the load on the GC.
The IFiberAction interface looks like this:
public interface IFiberAction
{
    void Execute();
    void Return();
}
The Execute method contains exactly what needs to be executed. The Return method is called after Execute when it is time to return the object to the pool.
Example:
public class PeerHandleRequestAction : IFiberAction
{
    public static readonly ObjectPool<PeerHandleRequestAction> Pool = initialization;
    public OperationRequest Request { get; set; }
    public PhotonPeer Peer { get; set; }

    public void Execute()
    {
        this.Peer.HandleRequest(this.Request);
    }

    public void Return()
    {
        // drop the references before the object goes back to the pool
        this.Peer = null;
        this.Request = null;
        Pool.Return(this);
    }
}
// now we use it like this
var action = PeerHandleRequestAction.Pool.Get();
action.Peer = peer;
action.Request = request;
peer.Fiber.Enqueue(action);
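For readers who want to try the pattern outside of Photon, here is a self-contained sketch. Everything in it is an assumption made for illustration: SimplePool stands in for the ObjectPool type that the article does not show, IFiberActionSketch mirrors the interface above, and FiberActionRunner shows what an executor could do with a pooled action.

```csharp
using System;
using System.Collections.Concurrent;

// Mirrors the IFiberAction interface above under a different (hypothetical) name.
public interface IFiberActionSketch
{
    void Execute();
    void Return();
}

// Hypothetical pool; the article does not show Photon's ObjectPool<T>.
public sealed class SimplePool<T> where T : class, new()
{
    private readonly ConcurrentBag<T> _items = new ConcurrentBag<T>();

    // Reuses a returned instance if one is available, otherwise allocates.
    public T Get()
    {
        return _items.TryTake(out var item) ? item : new T();
    }

    public void Return(T item)
    {
        _items.Add(item);
    }
}

public sealed class LogMessageAction : IFiberActionSketch
{
    public static readonly SimplePool<LogMessageAction> Pool =
        new SimplePool<LogMessageAction>();

    public string Message { get; set; }
    public Action<string> Sink { get; set; }

    public void Execute()
    {
        Sink(Message);
    }

    public void Return()
    {
        // Clear the references so the pooled object keeps nothing alive.
        Message = null;
        Sink = null;
        Pool.Return(this);
    }
}

public static class FiberActionRunner
{
    // Runs the action, then always hands it back to its pool.
    public static void Run(IFiberActionSketch action)
    {
        try
        {
            action.Execute();
        }
        finally
        {
            action.Return();
        }
    }
}
```

After the first action has been run and returned, the next Pool.Get() call in this sketch hands out the same instance again, so no new object is allocated for the second request.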
Conclusion
In conclusion, I will briefly summarize: To ensure thread-safety in Photon, we use task queues, which in our case are represented by fibers. The main type of fiber that we use is PoolFiber and classes that extend it. PoolFiber implements a task queue on top of the standard .NET thread pool. Due to the small performance footprint of PoolFiber, everyone who needs it can have their own fiber. If you need to pause the task queue, use ExtendedPoolFiber.
The executors that implement the IExecutor interface are what actually execute the tasks in fibers. DefaultExecutor works well, but in case of an exception, it loses the entire remainder of the tasks that were passed to it for execution. FailSafeBatchExecutor is a reasonable choice in this regard. If you need to perform some action before the executor executes a batch of tasks and another one after it, BeforeAfterExecutor can be useful.