Remote file access system Cage


3. Principles of building servers

Cageserver file servers can run with an arbitrary number of ports: one ("primary") port is used only to authorize all clients, and the others are used for data exchange. The Cage server program requires only Python. The computer hosting a file server can perform any other work in parallel.

The server starts as a set of two main processes:

  1. "Connections" - the process that performs communication operations with clients and terminates connections at the server's initiative;
  2. "Operations" - the process that executes client tasks (operations) on files and closes communication sessions at the clients' command.

The two processes run asynchronously and are organized as endless loops that receive and send messages using multi-process queues, proxy objects, locks, and sockets. The "Connections" process allocates each client a port for receiving and transmitting data. The number of ports is set when the server starts. The mapping between ports and clients is stored in proxy memory shared between the processes.
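The port bookkeeping described above can be illustrated with a minimal sketch. It is not taken from the Cageserver source: the class name PortPool, the port numbers, and the rule that each client gets an exclusive port are all assumptions, and an ordinary dictionary guarded by a thread lock stands in for the shared proxy object.

```python
import threading


class PortPool:
    """Hypothetical allocator: hands each client one data port from a fixed pool."""

    def __init__(self, ports):
        self._free = list(ports)   # data ports not yet assigned
        self._by_client = {}       # client id -> port (stand-in for the shared proxy)
        self._lock = threading.Lock()

    def allocate(self, client_id):
        """Return the client's port, assigning a free one if needed; None if exhausted."""
        with self._lock:
            if client_id in self._by_client:
                return self._by_client[client_id]
            if not self._free:
                return None
            port = self._free.pop(0)
            self._by_client[client_id] = port
            return port

    def release(self, client_id):
        """Return the client's port to the pool when its session ends."""
        with self._lock:
            port = self._by_client.pop(client_id, None)
            if port is not None:
                self._free.append(port)


# Illustrative port numbers only; the real values are chosen at server start.
pool = PortPool([3571, 3572])
first = pool.allocate("alice")
second = pool.allocate("bob")
third = pool.allocate("carol")    # pool exhausted, nothing to hand out
pool.release("alice")
fourth = pool.allocate("carol")   # reuses the port released by alice
```

In a real two-process server the dictionary would have to live in memory visible to both processes, for example a proxy obtained from multiprocessing.Manager.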

The "Operations" process supports sharing of file resources: several different clients can read data from a single file together (quasi-parallel, since access is controlled by locks), provided this was allowed when the "first" client initially opened the file.
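The sharing rule can be sketched as a small table of open files. This is only an illustration under assumptions: the names FileTable, open, close, and the shared flag are hypothetical and do not come from the Cageserver code, and the locks that serialize the actual reads are left out.

```python
class FileTable:
    """Hypothetical table of open files; the first opener decides whether sharing is allowed."""

    def __init__(self):
        self._open = {}   # path -> {"shared": bool, "clients": set of client ids}

    def open(self, client_id, path, shared=False):
        """Return True if the client may use the file, False if access is refused."""
        entry = self._open.get(path)
        if entry is None:
            # First opener: its 'shared' choice governs all later clients.
            self._open[path] = {"shared": shared, "clients": {client_id}}
            return True
        if entry["shared"]:
            entry["clients"].add(client_id)
            return True
        # Exclusive file: only the client that already holds it may proceed.
        return client_id in entry["clients"]

    def close(self, client_id, path):
        """Drop the client from the file; forget the file when nobody holds it."""
        entry = self._open.get(path)
        if entry is None:
            return
        entry["clients"].discard(client_id)
        if not entry["clients"]:
            del self._open[path]


table = FileTable()
shared_first = table.open("alice", "data.bin", shared=True)
shared_second = table.open("bob", "data.bin")        # allowed by the first opener
private_first = table.open("alice", "private.bin")   # opened without sharing
private_second = table.open("bob", "private.bin")    # refused
```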

Commands to create/delete/open/close files on the server are processed strictly sequentially in the "Operations" process, using the file subsystem of the server OS.

To speed up reads and writes in general, these operations are performed in threads spawned by the "Operations" process. The number of threads is usually equal to the number of open files. Read/write jobs from clients are fed into a shared queue, and the first thread to become free takes the job at its head. Special logic makes it possible to avoid overwriting data in the server's memory.
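The shared-queue scheme can be sketched with the standard queue and threading modules. The function run_jobs and the job names are hypothetical, and the sketch covers only the dispatch pattern (whichever thread is free takes the job at the head of the queue), not Cageserver's actual job format or its overwrite-avoidance logic.

```python
import queue
import threading


def run_jobs(jobs, n_threads):
    """Run (job_id, callable) pairs from one shared FIFO queue on n_threads workers."""
    todo = queue.Queue()
    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            item = todo.get()   # blocks until a job (or the sentinel) is available
            if item is None:    # sentinel: no more work for this thread
                break
            job_id, func = item
            value = func()
            with results_lock:
                results[job_id] = value

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for job in jobs:
        todo.put(job)
    for _ in threads:
        todo.put(None)          # one sentinel per worker
    for t in threads:
        t.join()
    return results


# Two read jobs against an in-memory buffer standing in for a file.
data = bytearray(b"0123456789")
jobs = [
    ("read_head", lambda: bytes(data[0:4])),
    ("read_tail", lambda: bytes(data[6:10])),
]
results = run_jobs(jobs, n_threads=2)
```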

The "Operations" process monitors client activity and stops servicing a client either at its command or when the inactivity timeout is exceeded.
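A minimal sketch of such activity tracking follows. The class ActivityMonitor, its method names, and the timeout value are assumptions made for illustration; the clock is injectable so that the behaviour can be checked without waiting.

```python
import time


class ActivityMonitor:
    """Hypothetical tracker of the last time each client was heard from."""

    def __init__(self, timeout, clock=time.monotonic):
        self._timeout = timeout   # allowed idle time, in seconds
        self._clock = clock
        self._last_seen = {}      # client id -> time of last activity

    def touch(self, client_id):
        """Record activity (any message received from the client)."""
        self._last_seen[client_id] = self._clock()

    def disconnect(self, client_id):
        """Stop servicing a client at its own command."""
        self._last_seen.pop(client_id, None)

    def expire(self):
        """Drop and return the clients whose idle time exceeds the timeout."""
        now = self._clock()
        expired = [c for c, t in self._last_seen.items() if now - t > self._timeout]
        for c in expired:
            del self._last_seen[c]
        return expired


# A fake clock makes the example deterministic.
now = [0.0]
monitor = ActivityMonitor(timeout=10, clock=lambda: now[0])
monitor.touch("alice")
now[0] = 5.0
monitor.touch("bob")
now[0] = 12.0
dropped = monitor.expire()   # alice has been idle for 12 s, bob for 7 s
```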

To ensure reliability, Cageserver keeps logs of all transactions. One shared log contains copies of client messages with tasks to create/open/rename/delete files. A separate log is created for each working file; it records copies of messages with tasks to read and write data in that file, as well as arrays of written (new) data and arrays of data destroyed by overwriting (writing new data "on top" of old).

These logs make it possible both to apply new changes to backups and to "roll back" the current content to a desired point in the past.
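Why a log needs both the new and the overwritten data can be shown with a small in-memory sketch. The names JournaledFile, roll_back, and roll_forward, as well as the (offset, old, new) entry layout, are hypothetical; Cageserver's real log format is not described here. The sketch assumes each write starts within the current content.

```python
class JournaledFile:
    """Hypothetical in-memory file that logs old and new bytes for every write."""

    def __init__(self, content=b""):
        self.data = bytearray(content)
        self.log = []   # entries: (offset, old_bytes, new_bytes)

    def write(self, offset, new):
        end = offset + len(new)
        old = bytes(self.data[offset:end])   # data about to be destroyed
        self.log.append((offset, old, bytes(new)))
        self.data[offset:end] = new

    def roll_back(self, keep):
        """Undo the most recent writes until only the first 'keep' entries remain."""
        while len(self.log) > keep:
            offset, old, new = self.log.pop()
            self.data[offset:offset + len(new)] = old


def roll_forward(backup, log):
    """Replay the logged new data on top of a backup copy."""
    data = bytearray(backup)
    for offset, _old, new in log:
        data[offset:offset + len(new)] = new
    return bytes(data)


f = JournaledFile(b"hello world")
backup = bytes(f.data)                      # snapshot taken before any write
f.write(0, b"HELLO")
f.write(6, b"WORLD")
restored = roll_forward(backup, f.log)      # backup brought up to date
f.roll_back(keep=1)                         # undo only the second write
```

The new data is enough to bring a backup forward, while the saved old data is what makes stepping backward from the current content possible.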

The Cageserver program is launched as follows: in the startup dialog, you define the main port for authorization and the number of ports for exchanging transactions with authorized clients (one or more; the pool starts at the port number immediately after the main port).
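The numbering rule for the data ports can be written out as a short helper. The function name data_ports and the port number 3570 are illustrative assumptions, not values from the Cageserver dialog.

```python
def data_ports(main_port, count):
    """Return the data-exchange ports: 'count' consecutive numbers after main_port."""
    if count < 1:
        raise ValueError("at least one data port is required")
    return list(range(main_port + 1, main_port + 1 + count))


ports = data_ports(3570, 3)   # main port 3570 is used only for authorization
```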

Cageserver is about 3100 lines of code.