Apache Traffic Server – High Performance Web Proxy Cache

What is Apache Traffic Server?

Apache Traffic Server is a high-performance web proxy cache that improves network efficiency and performance by caching frequently retrieved content at the edge of the network. This brings content physically closer to end users, enabling faster delivery and reducing bandwidth use.

Apache Traffic Server’s history

Traffic Server had a successful history as a commercial web proxy server. It was first developed and commercialized by Inktomi Corporation, which Yahoo acquired in 2003. Yahoo kept it as a commercial product until August 2009 and then donated its source code to the Apache Software Foundation. In Q2 2010, Apache Traffic Server became a top-level project of the Apache Software Foundation.

Understanding Apache Traffic Server Caching Mechanism

When Apache Traffic Server receives a client request for a web object/content, it tries to locate the requested object/content in its object database (cache).

If the object is available in the cache, Apache Traffic Server serves it to the client and marks the transaction status as CACHE-HIT.

If the object/content is not in the cache (CACHE-MISS), or the cached copy is no longer valid, Apache Traffic Server obtains the object from the origin server, streams it to the client, and caches it in its object database.
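
The hit/miss flow described above can be sketched in a few lines of Python. This is a simplified illustration, not Traffic Server's implementation: the `EdgeCache` class and the `fetch_from_origin` callback are hypothetical names, the object database is an in-memory dict, and streaming and validity checks are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class EdgeCache:
    # Hypothetical callback that retrieves an object from the origin server.
    fetch_from_origin: Callable[[str], bytes]
    # In-memory stand-in for the object database, keyed by URL.
    store: Dict[str, bytes] = field(default_factory=dict)

    def get(self, url: str) -> Tuple[str, bytes]:
        """Return (transaction status, object body) for a client request."""
        if url in self.store:
            # Object found in the cache: serve it without contacting the origin.
            return "CACHE-HIT", self.store[url]
        # Not cached: fetch from the origin and keep a copy for later requests.
        body = self.fetch_from_origin(url)
        self.store[url] = body
        return "CACHE-MISS", body


if __name__ == "__main__":
    origin_calls = []

    def fake_origin(url: str) -> bytes:
        origin_calls.append(url)
        return b"content of " + url.encode()

    cache = EdgeCache(fake_origin)
    print(cache.get("http://example.com/logo.png")[0])  # CACHE-MISS
    print(cache.get("http://example.com/logo.png")[0])  # CACHE-HIT
    print(len(origin_calls))  # 1
```

Only the first request for a URL reaches the origin; every later request for the same URL is answered from the local store.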

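Before serving an object from its cache (object database), Apache Traffic Server also checks whether the cached copy is still fresh. Freshness is determined by HTTP headers, chiefly the Cache-Control directives returned by the origin server, which the client's request headers can further restrict. If the cached copy is stale, Apache Traffic Server issues a revalidation query to the origin server. If the object/content is out of date, Apache Traffic Server obtains the fresh object from the origin server, streams it to the client, and overwrites the cached object in its object database; if it is still valid, the cached copy is served.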
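
A minimal sketch of the revalidation step, assuming the origin supports ETag validators (conditional requests with If-None-Match) and using a fixed, hypothetical `ttl` instead of a lifetime derived from response headers. The `serve` function, the `Origin` callable type, and the status labels (`HIT`, `REVALIDATED`, `REFRESHED`, `MISS`) are illustrative only; they are not Traffic Server APIs or log codes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical origin interface: (url, If-None-Match value or None) -> (status, body, etag).
Origin = Callable[[str, Optional[str]], Tuple[int, bytes, str]]


@dataclass
class CachedObject:
    body: bytes
    etag: str          # validator sent back to the origin as If-None-Match
    expires_at: float  # after this time the cached copy is considered stale


def serve(cache: Dict[str, CachedObject], url: str, origin: Origin,
          now: float, ttl: float = 60.0) -> Tuple[str, bytes]:
    obj = cache.get(url)
    if obj is not None and now < obj.expires_at:
        return "HIT", obj.body  # fresh copy: no origin contact needed
    # Missing or stale: ask the origin, conditionally if we hold a validator.
    status, body, etag = origin(url, obj.etag if obj is not None else None)
    if status == 304 and obj is not None:
        # 304 Not Modified: the cached copy is still valid, so extend its lifetime.
        obj.expires_at = now + ttl
        return "REVALIDATED", obj.body
    # New or changed object: store it, overwriting any stale copy.
    cache[url] = CachedObject(body, etag, now + ttl)
    return ("REFRESHED" if obj is not None else "MISS"), body
```

With a fake origin that answers 304 while the ETag matches, a request at time 0 is a `MISS`, one at time 10 is a `HIT`, one at time 100 is `REVALIDATED` without re-downloading the body, and once the origin's content changes the next stale request returns `REFRESHED` with the new body.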

HTTP Cache-Control headers are crucial for enforcing object retrieval constraints and for keeping cached objects fresh. Based on the Cache-Control headers in the client request and in the origin server's response, Apache Traffic Server decides whether the requested object can be served from the cache or must be fetched from the origin server.
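
The sketch below shows how Cache-Control directives might feed such a decision, loosely following the HTTP caching rules of RFC 9111 (formerly RFC 7234). It is a deliberate simplification: only `no-store`, `private`, `no-cache`, `s-maxage`, and `max-age` are considered, `Expires` and heuristic lifetimes are ignored, and the function names are hypothetical rather than part of Traffic Server.

```python
from typing import Dict, Optional


def parse_cache_control(header: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, has_arg, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if has_arg else None
    return directives


def freshness_lifetime(response: Dict[str, Optional[str]]) -> int:
    """Seconds the response may be served without revalidation (0 if unknown)."""
    # A shared cache such as a proxy prefers s-maxage over max-age.
    for key in ("s-maxage", "max-age"):
        value = response.get(key)
        if value is not None and value.isdigit():
            return int(value)
    return 0


def cache_decision(response_cc: str, request_cc: str, age_seconds: int) -> str:
    response = parse_cache_control(response_cc)
    request = parse_cache_control(request_cc)
    # A shared cache must not store no-store or private responses.
    if "no-store" in response or "private" in response:
        return "do-not-cache"
    # Either side can demand that the origin confirm the copy first.
    if "no-cache" in response or "no-cache" in request:
        return "revalidate"
    if age_seconds < freshness_lifetime(response):
        return "serve-from-cache"
    return "revalidate"
```

For example, a response with `max-age=60` that has been cached for 10 seconds yields `serve-from-cache`, the same response requested with `Cache-Control: no-cache` yields `revalidate`, and a `no-store` response yields `do-not-cache`.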

High Level Architecture

It’s important to understand Traffic Server’s architecture to work with it. Traffic Server has three essential processes which work together to serve requests.

  • traffic_server accepts incoming connections and handles caching, pass-through fetching from origin servers, and related request processing
  • traffic_manager launches and manages the traffic_server process, and communicates with other nodes when running in cluster mode
  • traffic_cop is a watchdog daemon that monitors traffic_server and traffic_manager and restarts them if necessary, so that both stay up and running
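
The watchdog role of traffic_cop can be illustrated with a generic supervise-and-restart loop. This Python sketch is not traffic_cop's actual logic; the `supervise` function and its parameters are hypothetical. It starts a command, polls whether the child is still alive, and starts a replacement whenever it exits, up to a restart limit.

```python
import subprocess
import sys
import time
from typing import List


def supervise(cmd: List[str], max_restarts: int = 3,
              poll_interval: float = 0.05, max_polls: int = 200) -> int:
    """Run cmd, restarting it each time it exits; return the number of restarts.

    Gives up when the child exits after max_restarts restarts, or after
    max_polls checks, terminating the child if it is still running then.
    """
    proc = subprocess.Popen(cmd)
    restarts = 0
    for _ in range(max_polls):
        time.sleep(poll_interval)
        if proc.poll() is None:
            continue  # child is still alive: nothing to do
        if restarts == max_restarts:
            break  # give up instead of restarting forever
        restarts += 1
        proc = subprocess.Popen(cmd)  # child died: start a replacement
    if proc.poll() is None:
        proc.terminate()
        proc.wait()
    return restarts


if __name__ == "__main__":
    # A child that exits immediately is restarted until the limit is reached.
    print(supervise([sys.executable, "-c", "pass"]))
```

A child that keeps running is never restarted, while one that exits immediately should be restarted three times before the loop gives up; the restart limit keeps a persistently crashing process from being relaunched forever.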

Apache Traffic Server in action:

In one of my telecom projects, network throughput in the production environment was measured at ~8GB, with no bandwidth savings.

At the beginning of 2014, Apache Traffic Server was deployed as a web proxy cache in cluster mode with two nodes. Bandwidth savings of ~23% were then achieved on each node, which adds up to 46% when the two per-node figures are summed; because each node saved ~23% of its own traffic, the saving relative to the cluster's total traffic is also ~23%. Given the complexity of the project and the savings achieved, the customer appreciated the effort.
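
Because per-node savings are percentages of each node's own traffic, a cluster-wide figure is the traffic-weighted average of the per-node values rather than their sum. The sketch below illustrates this arithmetic; the `cluster_saving` function and the traffic volumes in it are hypothetical, not measurements from the project.

```python
from typing import List, Tuple


def cluster_saving(nodes: List[Tuple[float, float]]) -> float:
    """Combine per-node savings into one cluster-wide saving fraction.

    nodes: (traffic handled by the node, fraction of that traffic saved).
    The result is a traffic-weighted average, not a sum of percentages.
    """
    total = sum(traffic for traffic, _ in nodes)
    saved = sum(traffic * fraction for traffic, fraction in nodes)
    return saved / total


if __name__ == "__main__":
    # Hypothetical even split of 8 units of traffic across two nodes.
    print(round(cluster_saving([(4.0, 0.23), (4.0, 0.23)]), 4))  # 0.23
    # Hypothetical uneven split: the cluster-wide saving is still 23%.
    print(round(cluster_saving([(6.0, 0.23), (2.0, 0.23)]), 4))  # 0.23
```

Whatever the load split, two nodes that each save 23% of their own traffic save 23% of the cluster's total.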
