Asyncio Proxy Herd

Overview

Period: March 2019
Languages: Python
Description:
Background:
Wikipedia and its related sites run on the Wikimedia architecture, a LAMP platform (GNU/Linux, Apache, MySQL, and PHP) with multiple redundant web servers behind a load-balancing virtual router for reliability and performance. LAMP works well for Wikipedia, but suppose we are building a new Wikipedia-style service for news, where (1) articles are updated far more often than on Wikipedia, (2) access happens over various protocols, not just HTTP, and (3) clients tend to be more mobile. From a software point of view, adding new servers to such a service would be difficult (e.g. for access via cell phones that frequently broadcast their GPS locations). From a systems point of view, response time would likely be too slow because the Wikimedia application server is a central bottleneck.

Application Server Herd:
Given this new service and its requirements, this project explored a new architecture called an "application server herd", in which multiple application servers communicate directly with each other as well as via the core database and caches. The interserver communication is designed for rapidly evolving data, ranging from small items such as GPS-based locations to larger items such as ephemeral video. For example, you might have three application servers A, B, and C such that A talks with B and C, but B and C don't talk to each other. The idea is that if a user's cell phone posts its GPS location to any one of the application servers, the other servers learn of the location through interserver transmissions, without having to talk to the database (which would be a bottleneck). The prototype is written in Python and uses asyncio, an asynchronous networking library suited to quickly handling a high volume of connections to other servers being added and dropped.
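To make the A, B, C example concrete, here is a minimal in-memory sketch of how one location update might spread through such a herd. It is not the project's code: the `NEIGHBORS` table, the `flood` function, and the rule of dropping updates that are not newer than the stored one are all assumptions made for illustration, and no real networking is involved.

```python
from collections import deque

# Hypothetical topology from the example above:
# A talks with B and C; B and C do not talk to each other.
NEIGHBORS = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}


def flood(origin, client_id, location, timestamp, neighbors=NEIGHBORS):
    """Simulate one update entering at `origin` and spreading to the herd.

    Returns (stores, sent): each server's local view of client locations,
    and the number of interserver transmissions that were needed.
    """
    stores = {name: {} for name in neighbors}
    stores[origin][client_id] = (timestamp, location)

    # Each queue entry is one pending transmission: (sender, receiver).
    queue = deque((origin, peer) for peer in neighbors[origin])
    sent = 0
    while queue:
        sender, receiver = queue.popleft()
        sent += 1
        known = stores[receiver].get(client_id)
        if known is not None and known[0] >= timestamp:
            continue  # already has this update (or a newer one): stop here
        stores[receiver][client_id] = (timestamp, location)
        # Forward to every neighbor except the one the update came from.
        queue.extend((receiver, peer) for peer in neighbors[receiver]
                     if peer != sender)
    return stores, sent
```

Calling `flood("B", "phone1", "+34.07-118.45", 100.0)` leaves all three servers with the same record after two transmissions (B to A, then A to C), and the database is never consulted.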

Framework:
The prototype for this new architecture consists of five servers that communicate with each other bidirectionally. Each server accepts TCP connections from clients that emulate mobile devices with IP addresses and DNS names. Clients report their locations to a server by sending IAMAT messages. Servers respond to IAMAT messages with AT messages that tell clients the server's location. Clients can also send WHATSAT messages to query for information about places near other clients' locations (i.e. within a given radius of a client). Interserver communication takes place via AT messages, which propagate from server to server using a simple flooding algorithm. The server herd is fault-tolerant: it continues operating when servers in the herd go down (dropping their TCP connections) and reconnects once those servers are back online. Lastly, each server logs every new or dropped connection with other servers as well as with clients.
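The client-facing side of one server could be sketched roughly as follows with asyncio streams. This is an illustration under stated assumptions, not the project's implementation: the field layouts (`IAMAT <client> <location> <timestamp>`, `WHATSAT <client> <radius>`), the AT reply that names the responding server and echoes the report, the `? ` reply for unrecognized input, and the names `handle_line` and `serve` are all hypothetical. The places lookup and interserver flooding are left out.

```python
import asyncio


def handle_line(line, store, server_id):
    """Build the reply for one client line (message layouts are assumed)."""
    parts = line.split()
    if len(parts) == 4 and parts[0] == "IAMAT":
        _, client_id, location, timestamp = parts
        at = f"AT {server_id} {client_id} {location} {timestamp}"
        store[client_id] = at  # remember the latest report for this client
        return at
    if len(parts) == 3 and parts[0] == "WHATSAT" and parts[1] in store:
        # A full server would also look up places within the radius here.
        return store[parts[1]]
    return f"? {line}"  # assumed convention for invalid commands


async def serve(server_id, host="127.0.0.1", port=8888):
    """Accept TCP clients and answer each line they send."""
    store = {}

    async def on_client(reader, writer):
        while True:
            data = await reader.readline()
            if not data:  # empty bytes means the client disconnected
                break
            text = data.decode(errors="replace").strip()
            writer.write((handle_line(text, store, server_id) + "\n").encode())
            await writer.drain()
        writer.close()
        await writer.wait_closed()

    server = await asyncio.start_server(on_client, host, port)
    async with server:
        await server.serve_forever()
```

Running `asyncio.run(serve("A"))` would start one such server on a local port. Keeping `handle_line` free of I/O makes the message handling easy to check without opening any sockets.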


GitHub Source Code