TelegraphCQ: Continuous Dataflow Processing for an Uncertain World (2003)
March 22, 2024
Stonebraker and Çetintemel presented “One Size Fits All” in 2005. The phrase refers to the practice of using traditional DBMS architectures for all kinds of data-centric applications, regardless of the characteristics and requirements of their data. In the paper, they argued that this approach was no longer applicable, using stream processing as an example. TelegraphCQ, presented in 2003, belongs to the first generation of streaming databases from the early 2000s.
Streaming data is emitted at high volume in a continuous and unpredictable manner. Some dataflow processing applications, such as sensor monitoring systems, can be real-time systems with continuously active queries. In such systems, when new data arrives, it should be routed to the active queries. A traditional RDBMS cannot adaptively handle streaming data because its queries process data that is already in storage and rely on statistics about that stored data for optimization.
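The idea of routing each arriving tuple to the queries that are active at that moment can be sketched as follows. This is only an illustration of the concept, not TelegraphCQ's implementation: the names ContinuousQuery, StreamRouter, and on_arrival, as well as the sensor readings, are made up for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ContinuousQuery:
    """A long-lived query: a name, a filter, and the matches seen so far."""
    name: str
    predicate: Callable[[dict], bool]
    results: List[dict] = field(default_factory=list)


class StreamRouter:
    """Hypothetical router that pushes each new tuple to every active query."""

    def __init__(self) -> None:
        self.active: Dict[str, ContinuousQuery] = {}

    def register(self, query: ContinuousQuery) -> None:
        self.active[query.name] = query

    def on_arrival(self, tup: dict) -> None:
        # Data is pushed to queries as it arrives, instead of queries
        # pulling from data that already sits in storage.
        for query in self.active.values():
            if query.predicate(tup):
                query.results.append(tup)


router = StreamRouter()
hot = ContinuousQuery("hot", lambda t: t["temp"] > 30.0)
router.register(hot)

router.on_arrival({"sensor": 1, "temp": 15.0})
router.on_arrival({"sensor": 2, "temp": 31.5})

# A query registered mid-stream only sees tuples that arrive afterwards.
cold = ContinuousQuery("cold", lambda t: t["temp"] < 20.0)
router.register(cold)

router.on_arrival({"sensor": 3, "temp": 18.0})
router.on_arrival({"sensor": 1, "temp": 35.0})
```

Note that the "cold" query never sees the first reading, even though it would have matched: in this sketch nothing is stored, so a query only observes data that arrives while it is active.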
TelegraphCQ was developed on top of the PostgreSQL code base to address problems in the earlier Telegraph system.
The following figure shows the basic process structure of PostgreSQL.
The components shaded in gray were changed for TelegraphCQ.
The Postmaster forks new server processes in response to new client connections.
A Listener accepts requests on a connection and returns processed data to the client.
When a new query arrives, it is parsed, optimized, and compiled into an access plan.
The query Executor processes the access plan.
Three processes, the FrontEnd, the Executor, and the Wrapper, make up TelegraphCQ, as shown in the following figure.
These processes are connected using a shared memory infrastructure.
The Postmaster forks a FrontEnd process for each new connection.
Since each connection can have multiple open cursors, it depends on a proxy service to collect individual requests from clients.
The FrontEnd and the Executor exchange query plans and query results through queues in the shared memory infrastructure.
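The queue-based exchange can be sketched roughly as below. This is a simplified stand-in, not the actual mechanism: threads and Python's queue.Queue take the place of separate processes and shared memory queues, a "plan" is just a callable, and the names plan_queue, result_queue, front_end, and executor are assumptions made for the example.

```python
import queue
import threading

# Stand-ins for the two shared memory queues (assumption: one per direction).
plan_queue: queue.Queue = queue.Queue()
result_queue: queue.Queue = queue.Queue()
STOP = object()  # sentinel that tells the executor loop to exit


def executor() -> None:
    """Take plans off the plan queue, run them, and post the results."""
    while True:
        item = plan_queue.get()
        if item is STOP:
            break
        query_id, plan = item
        result_queue.put((query_id, plan()))


def front_end(query_id: int, plan) -> None:
    """Hand a compiled plan to the executor instead of running it locally."""
    plan_queue.put((query_id, plan))


worker = threading.Thread(target=executor)
worker.start()

front_end(1, lambda: sum([1, 2, 3]))
front_end(2, lambda: max([4, 9, 2]))

# The front end collects the results and would return them to its client.
outputs = dict(result_queue.get(timeout=5) for _ in range(2))

plan_queue.put(STOP)
worker.join()
```

The point of the sketch is the decoupling: the front end never executes a plan itself, so a single executor can serve plans coming from many connections.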
The Wrapper receives streaming data with fewer blocking operations and less disk space.
It allows the Executor to access newly arriving streamed data using mechanisms similar to those used for previously arrived or even static data.
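One way to picture this uniform access is an operator that consumes any iterator of tuples, whether it comes from a stored table or from a buffer of newly arrived data. The sketch below is an assumption-laden illustration rather than TelegraphCQ code: static_source, stream_source, and select are hypothetical names, and an in-memory queue.Queue stands in for the buffer filled by the Wrapper.

```python
import queue
from typing import Callable, Iterable, Iterator, List


def static_source(rows: List[dict]) -> Iterator[dict]:
    """Scan over data that is already stored."""
    yield from rows


def stream_source(buffer: queue.Queue) -> Iterator[dict]:
    """Drain whatever has arrived so far, without blocking when empty."""
    while True:
        try:
            yield buffer.get_nowait()
        except queue.Empty:
            return


def select(source: Iterable[dict], predicate: Callable[[dict], bool]) -> Iterator[dict]:
    """A filter operator that does not care where its input comes from."""
    for row in source:
        if predicate(row):
            yield row


table = [{"id": 1, "v": 5}, {"id": 2, "v": 50}]

arrivals: queue.Queue = queue.Queue()
arrivals.put({"id": 3, "v": 70})
arrivals.put({"id": 4, "v": 1})


def is_big(row: dict) -> bool:
    return row["v"] > 10


# The same operator runs unchanged over static and streamed input.
from_table = list(select(static_source(table), is_big))
from_stream = list(select(stream_source(arrivals), is_big))
```

Because both sources expose the same iterator interface, the filter needs no special case for streams; only the source decides how to behave when no data is available.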
The above images are cited from TelegraphCQ: Continuous Dataflow Processing for an Uncertain World.