Performance Optimization Web Application
Database optimization techniques.
Create index:-
Always create index for columns which are used to join tables. For eg, you have one SQL which joins multiple tables. Creating index for columns which are used to join tables provides tremendous improvement in SQL execution.
SQL Query optimization:-
You might be following ANSI SQL standard, but you should realize that some SQL statements are slow in some RDBMs. For eg, in MySQL using OR in WHERE clause is slow. You may use AND or UNION to achieve the same result. For eg,
SELECT * FROM employee WHERE designation='manager' OR designation='programmer'
The above query can be rewritten as,
SELECT * FROM employee WHERE designation='manager'
UNION
SELECT * FROM employee WHERE designation='programmer'
This is faster than the first query. Similarly try to avoid subqueries in MySQL and use them sparingly. For eg,
SELECT address FROM employeeaddress WHERE empid IN
(SELECT employeeid FROM employee WHERE designation='programmer')
The above query can be rewritten as,
SELECT a.address FROM employeeaddress a, employee b WHERE
a.empid=b.employeeid AND b.designation='programmer'
Always return the columns which are actually required. If you want to return just employee name, do not use SELECT * statement. Try to avoid cursors.
Refer the documentation of your RDBMS for SQL query optimization techniques.
Schema Design:-
When you design tables, you just follow one rule of thumb. Keep the data in your table as small as possible. The number of columns in a table should be limited to a small number. The number of rows should not grow exponentially for a table which is queried frequently. For eg, you have an employee table where you want to keep records for every revision. Rather than creating record in the same table, you may create employee history table where you may keep all previous records. The employee table should keep only the latest updated records.
Database Server caching:-
Caching is an important step for server optimization. This should be done according to your available memory. For shared hosting environment you don’t have any control, but this can be configured if you have dedicated or VPS hosting. We may allocate memory for query cache, table cache, sort buffer etc. For eg, if you increase the allocated memory for query cache, recent queries are cached in memory. When a user executes same SQL statement, the result is retrieved from memory rather than disk which is much faster. In MySQL this is configured in my.cnf configuration file. For other RDBMS refer the documentation.
Master Slave architecture:-
When the load on your database server is too heavy where a single server instance could not manage all concurrent requests, you may think of implementing multiple instances. Master Slave architecture is a popular one where all update/delete operations are carried out in a single instance called master. All read operations are done in multiple slave instances. Replication is one way where changes in master database are synchronised to slave databases.
Server 1—> location 1 (master used for SQL UPDATE/INSERT/DELETE statements)
Server 2—> location 2 (slave used for SQL SELECT statements)
Server 3—> location 3 (slave used for SQL SELECT statements)
Here the replication is set in such a way that Server 2 and Server 3 get updates from Server 1.
There are many other optimization techniques left, but I am not a database expert so I am jumping into web server side. Most probably you might be using popular web servers like Apache, IIS or application servers like Weblogic, Websphere etc.
Use web server cache:-
Always deploy application by remembering caching in mind. In Apache you could control cache using httpd.conf file. By default caching is enabled in all web/application servers. You may specify in the content header of your html/dynamic server resource files whether caching should be enabled for that page. For eg, you may use meta tag <meta http-equiv=”Expires”…> for your html page to specify how long your page should be cached.
Enable GZIP compression:-
Compression always saves your bandwidth and time. In a typical web application user makes request and web server sends response. GZIP compression is the way of compressing the content before sending to browser. But all browsers do not support compression. The browser who is compression aware should know how to decompress the content. Fortunately, you need not to worry about it. Many web/application servers support GZIP compression. You have to just enable it. Server sends compressed content to browsers supporting it and normal response to other browsers. Study server documentation to know how to enable it.
Load balancing:-
When a single web server instance could not withstand all incoming requests, you may think of creating multiple instances. Use a load balancer like Pound in Linux. Pound acts as a proxy, load balancer etc.
Pound—> location 1
Apache server 1—> location 2
Apache server 2—> location 3
Here all the requests are coming to location 1 where load balancer (in our case Pound) is installed. It redirects the request to other Apache servers, get response and serve back to user. You may look at other load balancers also.
There are many application server specific tuning techniques but it is beyond the scope of this tutorial.
Let us get into application development side.
Use W3C standard:-
All your HTML/XHTML, CSS code snippets should be W3C standard compliant. You may use validators like http://validator.w3.org/ to check for any markup errors. If your site is W3C compliant, it works in all browsers without much issues.
Compress CSS/JavaScript files:-
All static resources like CSS files, JavaScript files etc should be compressed before deploying into a web server. Compression means removing all white spaces, new lines etc. There are many free tools available to compress content for CSS/JavaScript files.
Reduce HTTP requests:-
In a typical HTML page so many requests are made to web server. For eg, every page may consists reference to images, CSS files etc. Try to reduce the number of requests. For eg, if you have 10 CSS files, you may combine them to single file. Similarly use CSS sprites where single master image is used rather than serving individual images.
Reduce db calls/Generate static content:-
Suppose your site is a cricket fans site where you publish live cricket score card. All users require same information, latest score status. Rather than calling database for every request, you create an XML file with score details which is updated in regular intervals of time, say in every 10 minute. The scorecard is served from static XML file residing in web/application server. In this way we reduce database load coz database is called only in creating static XML file.
Use distributed caching:-
Static content always do not work in the case of sites with heavy traffic because disk reading always affect performance. Distributed caching is a way of replicating or allocating cache across multiple servers. Memcached is a popular distributed caching tool often implemented in a LAMP environment. In memcached objects are stored in RAM using key value pairs. In your application you first check whether the object is available in memcached and call database only if the data is not available in cache. You may look at other caching tools also.
Create index:-
Always create index for columns which are used to join tables. For eg, you have one SQL which joins multiple tables. Creating index for columns which are used to join tables provides tremendous improvement in SQL execution.
SQL Query optimization:-
You might be following ANSI SQL standard, but you should realize that some SQL statements are slow in some RDBMs. For eg, in MySQL using OR in WHERE clause is slow. You may use AND or UNION to achieve the same result. For eg,
SELECT * FROM employee WHERE designation='manager' OR designation='programmer'
The above query can be rewritten as,
SELECT * FROM employee WHERE designation='manager'
UNION
SELECT * FROM employee WHERE designation='programmer'
This is faster than the first query. Similarly try to avoid subqueries in MySQL and use them sparingly. For eg,
SELECT address FROM employeeaddress WHERE empid IN
(SELECT employeeid FROM employee WHERE designation='programmer')
The above query can be rewritten as,
SELECT a.address FROM employeeaddress a, employee b WHERE
a.empid=b.employeeid AND b.designation='programmer'
Always return the columns which are actually required. If you want to return just employee name, do not use SELECT * statement. Try to avoid cursors.
Refer the documentation of your RDBMS for SQL query optimization techniques.
Schema Design:-
When you design tables, you just follow one rule of thumb. Keep the data in your table as small as possible. The number of columns in a table should be limited to a small number. The number of rows should not grow exponentially for a table which is queried frequently. For eg, you have an employee table where you want to keep records for every revision. Rather than creating record in the same table, you may create employee history table where you may keep all previous records. The employee table should keep only the latest updated records.
Database Server caching:-
Caching is an important step for server optimization. This should be done according to your available memory. For shared hosting environment you don’t have any control, but this can be configured if you have dedicated or VPS hosting. We may allocate memory for query cache, table cache, sort buffer etc. For eg, if you increase the allocated memory for query cache, recent queries are cached in memory. When a user executes same SQL statement, the result is retrieved from memory rather than disk which is much faster. In MySQL this is configured in my.cnf configuration file. For other RDBMS refer the documentation.
Master Slave architecture:-
When the load on your database server is too heavy where a single server instance could not manage all concurrent requests, you may think of implementing multiple instances. Master Slave architecture is a popular one where all update/delete operations are carried out in a single instance called master. All read operations are done in multiple slave instances. Replication is one way where changes in master database are synchronised to slave databases.
Server 1—> location 1 (master used for SQL UPDATE/INSERT/DELETE statements)
Server 2—> location 2 (slave used for SQL SELECT statements)
Server 3—> location 3 (slave used for SQL SELECT statements)
Here the replication is set in such a way that Server 2 and Server 3 get updates from Server 1.
There are many other optimization techniques left, but I am not a database expert so I am jumping into web server side. Most probably you might be using popular web servers like Apache, IIS or application servers like Weblogic, Websphere etc.
Use web server cache:-
Always deploy application by remembering caching in mind. In Apache you could control cache using httpd.conf file. By default caching is enabled in all web/application servers. You may specify in the content header of your html/dynamic server resource files whether caching should be enabled for that page. For eg, you may use meta tag <meta http-equiv=”Expires”…> for your html page to specify how long your page should be cached.
Enable GZIP compression:-
Compression always saves your bandwidth and time. In a typical web application user makes request and web server sends response. GZIP compression is the way of compressing the content before sending to browser. But all browsers do not support compression. The browser who is compression aware should know how to decompress the content. Fortunately, you need not to worry about it. Many web/application servers support GZIP compression. You have to just enable it. Server sends compressed content to browsers supporting it and normal response to other browsers. Study server documentation to know how to enable it.
Load balancing:-
When a single web server instance could not withstand all incoming requests, you may think of creating multiple instances. Use a load balancer like Pound in Linux. Pound acts as a proxy, load balancer etc.
Pound—> location 1
Apache server 1—> location 2
Apache server 2—> location 3
Here all the requests are coming to location 1 where load balancer (in our case Pound) is installed. It redirects the request to other Apache servers, get response and serve back to user. You may look at other load balancers also.
There are many application server specific tuning techniques but it is beyond the scope of this tutorial.
Let us get into application development side.
Use W3C standard:-
All your HTML/XHTML, CSS code snippets should be W3C standard compliant. You may use validators like http://validator.w3.org/ to check for any markup errors. If your site is W3C compliant, it works in all browsers without much issues.
Compress CSS/JavaScript files:-
All static resources like CSS files, JavaScript files etc should be compressed before deploying into a web server. Compression means removing all white spaces, new lines etc. There are many free tools available to compress content for CSS/JavaScript files.
Reduce HTTP requests:-
In a typical HTML page so many requests are made to web server. For eg, every page may consists reference to images, CSS files etc. Try to reduce the number of requests. For eg, if you have 10 CSS files, you may combine them to single file. Similarly use CSS sprites where single master image is used rather than serving individual images.
Reduce db calls/Generate static content:-
Suppose your site is a cricket fans site where you publish live cricket score card. All users require same information, latest score status. Rather than calling database for every request, you create an XML file with score details which is updated in regular intervals of time, say in every 10 minute. The scorecard is served from static XML file residing in web/application server. In this way we reduce database load coz database is called only in creating static XML file.
Use distributed caching:-
Static content always do not work in the case of sites with heavy traffic because disk reading always affect performance. Distributed caching is a way of replicating or allocating cache across multiple servers. Memcached is a popular distributed caching tool often implemented in a LAMP environment. In memcached objects are stored in RAM using key value pairs. In your application you first check whether the object is available in memcached and call database only if the data is not available in cache. You may look at other caching tools also.