Web Site Performance
by Alex Shah
Each week brings reports of another major site experiencing outages due to overwhelming traffic. Slow or unresponsive web sites already account for hundreds of millions of dollars in lost revenue and increased cost, with estimates running into the billions for the coming year. Despite these facts, performance and capacity testing and tuning are largely ignored by web masters and developers. We will demonstrate the need for web developers to focus on the web application itself, in addition to connection issues between the web site and the end user. Guidelines for testing web sites will be proposed, along with a detailed explanation of the techniques used at Binary Evolution to guarantee reliability and quick response times under heavy traffic.
On February 14, 1998, Valentine's Day, America's largest reseller of flowers, FTD, with the help of its new online web site, FTD.com, was on its way to a record-breaking day of flower sales. By day's end, however, millions of dollars of revenue had been lost to a slow and unresponsive web site. In the early morning hours the site was doing fine, but as the day progressed, more and more traffic congested it. By 10:00am, the site had slowed to an unusable state: transactions were taking more than 30 seconds to respond, and often not responding at all. Within an hour, the web site was completely unreachable, even by FTD's own development staff.
Since FTD's developers were located in Toronto, Canada, and its web server was co-located at an Internet Service Provider (ISP) in New Jersey, it took several frantic hours for the rescue team to get organized and travel to FTD headquarters to begin diagnosing the problem. By 3:00pm, the web site was again able to take orders, and with careful monitoring and multiple restarts of the web server, FTD's online presence was partially restored. Millions of dollars of revenue were lost that Valentine's Day in orders that were never placed. The indirect loss from damage to the FTD brand and from losing online consumers to other flower vendors is more difficult to calculate, but undoubtedly exceeds any direct loss.
The irony of the FTD story is that detecting and resolving performance issues is relatively inexpensive compared to the cost of downtime and lost business. Yet many web sites fail to follow the necessary steps when testing their own web application. A common oversight of web site management is to prepare for failures at the ISP, or in the connection between the web server and the end user, while neglecting the web application itself. Part of this failure can be attributed to overconfidence among the in-house development team: relying on the development team to check its own work for defects is not ideal.
Using a separate Quality Assurance (QA) department is a better way to track down bugs, but a QA lab tends to find usability and "look and feel" mistakes rather than capacity problems. Having a few QA people review a web site does not compare to having thousands of simultaneous users banging away, and most QA labs lack the expertise and tools needed to emulate how the live web site will actually be used. The result: the web site crashes during peak usage, or on launch day.
A set of guidelines that Binary Evolution uses for effectively testing web sites will be discussed. In addition, we will demonstrate several common web site performance issues and our solutions. Even with the best preparation, web site disasters can happen. In fact, web site failure is nearly unavoidable and precautions must be taken to properly monitor for the inevitable crash. A plan of action will be proposed for worst case scenarios.
A recent study by Zona Research, Inc. (Nov., 1999) found that 20% of Internet users spend on average $200 each month. Given the size of the Internet (67 million active users), this equals 13.5 million online buyers, each spending $200 a month, or approximately $32 billion spent each year on online commerce.
While the consumer e-commerce market is considerable in size, the business-to-business (B2B) e-commerce market is larger and is predicted to grow dramatically. Web sites are beginning to accept B2B transactions over the Internet, either as part of their existing consumer site or as a key component of their business model. According to Forrester Research, the B2B e-commerce market is estimated to be $250 billion for 2000 and will grow to over $1.4 trillion by 2003.
Ideally, e-commerce sites should capture all of this trillion dollar potential. Pages should load quickly, be easily understood, guide users to the information that they need with a minimal number of clicks, and not suffer from connection issues between the web server and the consumer. B2B transactions should be secure, reliable and instantaneous.

In reality, the web master has little control over many connection issues that a client user faces (Figure 1). As concluded in the Zona Research study, 8.7% of the time client users are unable to connect to their local ISP due to difficulties with the connection: modem configuration, phone line, busy signals, or unreliable service. 2.2% of the time, the client user's ISP loses its connection to the Internet backbone. By carefully choosing a well-connected ISP, a web site can reduce the likelihood of a dropped connection between the web server and the Internet backbone (typically 2.2%).
Unfortunately, the connection between the consumer and Internet backbone is less controllable. Thus, even with an optimally connected web solution, e-commerce sites cannot avoid losing 10.9% (8.7% modem + 2.2% ISP) of potential revenue. B2B sites can expect closer to 2.2% loss, since the business on the other end will likely have a dedicated connection.
Most site managers understand that the longer a person has to wait for a web page to load, the more likely they are to "bail out" to another web site. As a result, web masters choose their hosting service carefully to ensure that their site has good connectivity to the consumer. When the consumer is outside the United States, efforts are made to mirror the web site in key geographic regions so that the delay in accessing a web site is minimized.
Despite these considerations, many web sites overlook the fact that the majority of consumers connect to the Internet using either a 28.8 or 56kbps modem. Usability testing is done across a high-speed connection instead of simulating how the end user will actually access the site. Given that web sites are typically accessed via 56kbps modems, which transfer at about 5 kilobytes (K) per second, how long will a consumer wait for a page to load? [Note: kbps represents kilobits per second, not kilobytes per second; there are 8 bits to a byte. Most 56kbps modems actually connect at 48kbps, which results in a transfer rate of roughly 5K/s.]
Several studies have been conducted to determine why surfers abandon a particular web site. The result of one such study is shown in Figure 2. Interestingly, about 6% of users bail out regardless of how long a page takes to load. When a page exceeds 35K, the number of users who exit increases dramatically, until 70K is reached, at which point roughly half have quit the site. Users who have waited for more than 70K to transfer tend not to bail out, but by then the majority of consumers have already been lost.
Why do users abandon pages larger than 35K? Given that the transfer rate for most end users is 5K/s, a 35K page takes 7-8 seconds to load. Many web studies have labeled this observation the "8 second rule": users will not wait more than 8 seconds for a web site to respond. Web sites with large front pages can expect to lose a good percentage of their visitors, as shown in Figure 2.
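To make the arithmetic concrete, the back-of-the-envelope calculation can be written as a few lines of Perl (the 5K/s rate and the page sizes are simply the figures discussed above):

#!/usr/bin/perl
# Rough load-time estimate for a dial-up visitor, using the
# ~5 kilobytes/second effective modem throughput cited above.
use strict;
use warnings;

my $rate_kb_per_sec = 5;             # effective 56kbps modem throughput
my @page_sizes_kb   = (25, 35, 70);  # total page weight: HTML plus images

for my $size (@page_sizes_kb) {
    printf "A %dK page takes roughly %.0f seconds to transfer\n",
           $size, $size / $rate_kb_per_sec;
}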

Developers should keep in mind that the "8 second rule" applies to the entire round trip. The clock starts when the browser sends the request to the web server, and stops when the resulting page and all embedded images have been transferred, processed, and rendered on the screen. One should also account for the time web applications need to generate dynamic pages, for example pages that access a database or process forms. Such pages should process within 1 or 2 seconds, even under peak load conditions, to ensure that the overall response time stays well within the "8 second rule".
Since web consumers will not tolerate a delay of more than 8 seconds, several guidelines can be established:
Understand the route from web server to ISP to customer. By reducing the distance between the web server and customer, consistent response times can be obtained. Internet distance is not measured geographically, but rather by the number of "hops". A hop is a machine on the Internet that a request must pass through to get to its destination. The more hops that are passed through to get information to the client browser, the longer the end user must wait. Work with the ISP to lower hops between the customer and the web site. If the customer is located in another country, mirror the web site to minimize the hop count.
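One quick way to gauge the hop count is to run traceroute from the web server toward a representative customer address and count the numbered lines it prints. The following sketch assumes a standard traceroute binary is installed; the host name is only a placeholder:

#!/usr/bin/perl
# Rough hop count: run traceroute and count the numbered hop lines.
# Assumes a standard traceroute binary; www.example.com is a placeholder.
use strict;
use warnings;

my $target = shift @ARGV || 'www.example.com';
my @output = `traceroute -n $target 2>/dev/null`;

# traceroute prints one numbered line per hop after its header line.
my $hops = grep { /^\s*\d+\s/ } @output;
print "Approximately $hops hops to $target\n";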
Design pages that are less than 40K. Smaller pages mean lower "bail out" rates. If a page is dynamically generated, the size of the overall page must be reduced by 5K for each second it takes to process the page. For example, if a search request takes 3 seconds to process, the total page size (including images) cannot exceed 25K, 40K - (5K * 3).
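The same budget can be expressed as a tiny helper function; the 40K ceiling and 5K-per-second rate are the figures used above, and the function name is purely illustrative:

# Illustrative helper: the maximum page weight (in K) a dynamic page can
# afford, given how many seconds the server needs to generate it.
sub max_page_size_kb {
    my ($processing_seconds) = @_;
    my $budget_kb = 40 - 5 * $processing_seconds;   # 40K ceiling, 5K/s modem
    return $budget_kb > 0 ? $budget_kb : 0;
}

print max_page_size_kb(3), "K\n";   # prints 25K for a 3-second search request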
Minimize the number of components on each web page. The time a page takes to load is the transfer time of the HTML page, plus the transfer time of each image, plus the time to render in the browser. Complicated pages with many tables and multiple images will quickly blow past the "8 second rule".
Create several shorter pages instead of one large page. Studies have shown that web surfers prefer to click through several smaller pages with less content than wait for a single lengthy page.
Maintain a persistent, high performance connection between web server and database.
Common Gateway Interface (CGI) is often used to generate dynamic pages that access database content. Several issues with CGI make it unusable for even low-traffic web sites:
Each CGI request starts a new process. Starting a new process introduces unnecessary operating system overhead when handling requests. Under load, the number of CGI processes started will equal the number of incoming CGI requests, which can result in thrashing: a condition in which resources are devoted to operating system housekeeping and no user processing gets done. At FTD (see Introduction), thrashing brought on by too many simultaneous CGI requests resulted in a deadlock condition; only after the machine was power-cycled could the web site recover.
Each CGI request requires a new database connection. Databases based on the structured query language (SQL), Oracle in particular, perform several lengthy processing steps when a new connection is created. The payoff for that initial overhead is that subsequent requests over the same connection are nearly instantaneous. Unfortunately, CGI cannot take advantage of connection caching, since the process is terminated after each request.

Figure 3. Scattered CGI Response Times
CGI processing is inconsistent. Usability studies have shown that, after the "8 second rule", the second major factor driving web surfers away from a site is inconsistent response time. CGI is unable to maintain consistent response times even under moderate load, which makes it a nonviable solution for web applications. As shown in Figure 3, CGI requests are not handled on a first in, first out (FIFO) basis: some requests are handled within a couple of seconds, while others take half a minute. The reason for this phenomenon is that the operating system scheduler decides which request is handled next.
For example, during the processing of an initial request (request 1), a second request (request 2) may arrive and take CPU cycles away from request 1. Request 2 may finish before request 1, and another request (request 3) might arrive before the process scheduler resumes request 1. Request 3 completes, then request 4 starts its processing. By now, request 1 has been swapped out to disk and will not be loaded again for some time.
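The per-request cost is easy to see in a typical database-backed CGI script: every hit pays for a new process, a fresh compile, and a brand-new database login. A simplified example follows; the data source name, credentials, and table are placeholders:

#!/usr/bin/perl
# Typical CGI script: the new process, the compile, and the database
# connection below are all repeated from scratch on every single request.
use strict;
use warnings;
use CGI;
use DBI;

my $q = CGI->new;

# A brand-new database login per hit (usually the most expensive step).
# The DSN, user, password, and table are placeholders.
my $dbh = DBI->connect('dbi:Oracle:orcl', 'webuser', 'secret',
                       { RaiseError => 1, AutoCommit => 1 });

my $sth = $dbh->prepare('SELECT name, price FROM products WHERE id = ?');
$sth->execute(scalar $q->param('id'));

print $q->header('text/html');
while (my ($name, $price) = $sth->fetchrow_array) {
    print "<p>$name: \$$price</p>\n";
}

$dbh->disconnect;   # the connection, and the process, die with the request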
Separate the web application from the web server.
In response to the performance problems associated with CGI, several vendors have developed high performance, proprietary application programming interfaces (APIs) for their servers. The most notable are Netscape's NSAPI, Microsoft's ISAPI and Apache's server API. Applications compiled using one of these proprietary server APIs are faster than CGI programs. By running an application within the server process, the overhead of starting up and initializing a new process for each request is removed. Requests are handled in the order received and are not subject to rearrangement by the operating system's process scheduler. Database connections can be opened once, and reused.
Although server APIs perform better than CGI programs, they suffer from reliability and scalability issues. Since the web application is loaded into the web server process, a bug in the application can bring down the entire server. Code placed in the web server must be thoroughly tested under various load and failure conditions to ensure reliability. In addition, server APIs are difficult to learn and force developers to write their applications in C or C++. Server APIs do not scale well on their own and require specialized software and hardware to spread processing across multiple machines.
Ideally, the web application should run outside the web server process. With an out-of-process solution, a bug in an application results in a single failed request rather than a web site crash.
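The difference is easy to sketch. In a persistent engine, whether a server API module or an out-of-process application server, the script stays compiled in memory and the database handle is opened once and reused. The handler below is only an illustration of that pattern, with a placeholder DSN and table:

#!/usr/bin/perl
# Sketch of a persistent engine: the code stays loaded between requests,
# so the database connection can be opened once and reused.
use strict;
use warnings;
use DBI;

my $dbh;    # lives as long as the engine process does

sub get_dbh {
    # Reconnect only if we have never connected or the link has dropped.
    unless ($dbh && $dbh->ping) {
        $dbh = DBI->connect('dbi:Oracle:orcl', 'webuser', 'secret',
                            { RaiseError => 1, AutoCommit => 1 });
    }
    return $dbh;
}

sub handle_request {
    my ($product_id) = @_;
    my $sth = get_dbh()->prepare_cached(
        'SELECT name, price FROM products WHERE id = ?');
    $sth->execute($product_id);
    my ($name, $price) = $sth->fetchrow_array;
    $sth->finish;
    return "<p>$name: \$$price</p>\n";
}

# The surrounding engine (a server API module, FastCGI, or an application
# server) calls handle_request() for each hit: no new process, no new login.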
Binary Evolution's own application server software, VelociGen, is an ideal platform for rapidly building fast, reliable web sites. Our web server plug-in lets developers build web applications with Perl scripting. Unlike CGI, VelociGen is fast: our own testing has shown it to be 2x faster than Java Servlets and 4x faster than Cold Fusion. Thanks to a few key performance techniques, compiling and caching each script and keeping database connections persistent, VelociGen can process large volumes of requests, as much as 30 times faster than CGI. Processing can also be scaled across machines to meet the demand of any web site. VelociGen simplifies web programming by providing functions and templates for common web tasks such as connecting to a database, parsing query arguments, and uploading files.
Unlike most web tools on the market, VelociGen does not lock you into a single vendor for product extensions nor force you to learn a new proprietary language. Backed by a solid community of developers, Perl has an advantage that no other web development tool can boast: over ten years of real world usage. With Perl, a few lines of readable, maintainable code accomplish far more than equivalent solutions written in Java or C++.

VelociGen's architecture is designed to give the web server maximum crash protection (Figure 4). The Perl engines are removed from the web server API and run as separate processes. Each Perl engine handles a single request at a time, allowing the web server to survive the inevitable bad code. Misbehaving scripts are forcibly terminated once the timeout value expires, and each engine is restarted after a user-defined number of hits, keeping its size in check and its behavior stable.
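The general pattern behind those safeguards can be sketched in a few lines: enforce a per-request timeout with alarm() and retire the worker after a fixed number of hits so that a fresh engine takes its place. This is only an illustration of the idea, not VelociGen's actual code, and the limits and stub routines are made up:

#!/usr/bin/perl
# Sketch of the engine-lifecycle pattern: stop runaway requests with a
# timeout and retire the worker after N hits. Limits and stubs are examples.
use strict;
use warnings;

my $REQUEST_TIMEOUT = 30;    # seconds before a runaway script is stopped
my $MAX_HITS        = 500;   # retire this engine after so many requests

# Stand-ins so the sketch runs on its own; a real engine would receive
# requests from the web server plug-in instead.
my @demo_queue = ('/search?q=roses', '/cart/add?id=7');
sub next_request    { shift @demo_queue }
sub run_application { my ($req) = @_; return "<p>Handled $req</p>\n" }

sub serve_one_request {
    my ($request) = @_;
    my $page = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $REQUEST_TIMEOUT;
        my $out = run_application($request);
        alarm 0;
        $out;
    };
    alarm 0;                 # make sure no alarm is left pending
    return defined $page ? $page : "<p>Request timed out.</p>\n";
}

my $hits = 0;
while (defined(my $request = next_request())) {
    print serve_one_request($request);
    exit 0 if ++$hits >= $MAX_HITS;   # the parent starts a fresh engine
}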
XML, session management, and template support have been added in release 2.0. VelociGen can now be configured to accept and process XML documents. Session management means personalization features and shopping cart applications can be added to a web site with ease. The new template system allows you to create Perl Server Pages (PSP). With PSP, small snippets of Perl code can be embedded directly in your HTML page to handle common tasks such as retrieving database content or processing forms.
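To give a flavor of the embedded-scripting approach, here is a small template-style page. The tag syntax is only illustrative and may not match VelociGen's exact PSP markup, and the table and column names are made up:

<!-- Illustrative embedded-Perl page. The <perl>...</perl> tags are a
     hypothetical syntax, and the "products" table is a placeholder. -->
<html>
  <head><title>Today's Specials</title></head>
  <body>
    <h1>Today's Specials</h1>
    <perl>
      my $dbh = DBI->connect('dbi:Oracle:orcl', 'webuser', 'secret');
      my $sth = $dbh->prepare(
          'SELECT name, price FROM products WHERE on_sale = 1');
      $sth->execute;
      while (my ($name, $price) = $sth->fetchrow_array) {
          print "<p>$name: \$$price</p>\n";
      }
    </perl>
  </body>
</html>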
Next month we'll talk about testing practices, how Binary Evolution does its own testing, and how to prepare for the worst when dealing with web traffic.