Sunday, May 29, 2011

HTTP Transport in Axis2: How to fix “Timeout waiting for connection”

If you are fed up with jumping between JIRA issues trying to understand why you are getting “Timeout waiting for connection” (everything was fine up to Axis2 1.5, and you have just upgraded to a later release, say 1.5.4), you are at the right place.

First, let's understand when this happens:

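for (int i = 1; i <= 3; i++) {
    ServiceClient sc = new ServiceClient();
    // ---- (set Options here: target EPR, action, etc.) ----
    OMElement response = sc.sendReceive(payload);
    response.build(); // the response is read, but the connection is never released
}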

This works fine in Axis2 1.5 but not from Axis2 1.5.1 onwards: the third call fails with a “Timeout waiting for connection” exception.

So what changed, and why are we getting the exception?

The cause is a recent change in Axis2's AbstractHTTPSender#getHttpClient, which started caching the MultiThreadedHttpConnectionManager (MTHCM).

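protected HttpClient getHttpClient(MessageContext msgContext) {
    // ... (simplified excerpt; surrounding code omitted)
    connManager = new MultiThreadedHttpConnectionManager();
    // the new manager is stored on the ConfigurationContext, so later calls reuse it
    configContext.setProperty(HTTPConstants.MULTITHREAD_HTTP_CONNECTION_MANAGER, connManager);
    // ...
}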

Here the connection manager is cached on the configuration context once it has been created, which was not the case before Axis2 1.5.1. Before 1.5.1, unless caching was enabled through the Axis2 Options API, a new manager was created for every call.
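A toy model in plain Java may make the difference clearer. It uses no Axis2 or HttpClient classes: `ConfigContext`, `ConnectionManager` and the key string are hypothetical stand-ins for the configuration context, MTHCM and HTTPConstants.MULTITHREAD_HTTP_CONNECTION_MANAGER, and the real getHttpClient logic does more than this. It only contrasts "create once, then reuse" with "a fresh manager per call".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for MultiThreadedHttpConnectionManager (and its pool).
class ConnectionManager { }

// Hypothetical stand-in for the Axis2 configuration context's property bag.
class ConfigContext {
    private final Map<String, Object> properties = new HashMap<>();

    Object getProperty(String key) { return properties.get(key); }

    void setProperty(String key, Object value) { properties.put(key, value); }
}

class ManagerCaching {
    // Assumed key name, modeled on HTTPConstants.MULTITHREAD_HTTP_CONNECTION_MANAGER.
    static final String MANAGER_KEY = "MULTITHREAD_HTTP_CONNECTION_MANAGER";

    // Before 1.5.1 (no caching requested): a new manager, hence a new pool, for every call.
    static ConnectionManager perCallManager() {
        return new ConnectionManager();
    }

    // From 1.5.1: create the manager once, store it on the context, and reuse it afterwards.
    static ConnectionManager cachedManager(ConfigContext ctx) {
        synchronized (ctx) {
            ConnectionManager manager = (ConnectionManager) ctx.getProperty(MANAGER_KEY);
            if (manager == null) {
                manager = new ConnectionManager();
                ctx.setProperty(MANAGER_KEY, manager);
            }
            return manager;
        }
    }

    public static void main(String[] args) {
        ConfigContext ctx = new ConfigContext();
        System.out.println(cachedManager(ctx) == cachedManager(ctx)); // true: one shared pool
        System.out.println(perCallManager() == perCallManager());     // false: a new pool per call
    }
}
```

With the cached variant, every call in the loop shares one manager and therefore one pool with a fixed number of connections per host.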

Note that Axis2 1.5.4 still creates a new HttpClient object for each call unless it is explicitly cached using the “CACHED_HTTP_CLIENT” option.
The reason for caching the MTHCM is to improve performance (http://hc.apache.org/httpclient-legacy/performance.html).

An HTTP connection manager serves as a connection factory and manages a connection pool. Connection managers come in different flavors; MTHCM is the thread-safe flavor, which allows multiple connections to be used concurrently by different threads.

MTHCM manages the HTTP connection pool and makes sure a new connection is not created every time a new call is made. Establishing a new connection is quite time-consuming because it involves several packet exchanges between client and server (see “Connection persistence” in http://hc.apache.org/httpcomponents-client-ga/tutorial/html/connmgmt.html).
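The reuse idea can be sketched with a deliberately simplified pool. This is plain Java, not the MTHCM implementation, and `ToyConnectionPool` and `PooledConnection` are hypothetical names: a released connection goes back on an idle list, and the next call takes it from there instead of opening a new one.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for an open HTTP connection (handshake already done).
class PooledConnection {
    final int id;
    PooledConnection(int id) { this.id = id; }
}

// Toy pool: hands out an idle connection if one exists, otherwise opens a new one.
class ToyConnectionPool {
    private final Deque<PooledConnection> idle = new ArrayDeque<>();
    private int created = 0;

    synchronized PooledConnection acquire() {
        PooledConnection c = idle.poll();
        if (c == null) {
            c = new PooledConnection(++created); // the expensive path: open a new connection
        }
        return c;
    }

    synchronized void release(PooledConnection c) {
        idle.push(c); // back to the pool for the next call
    }

    synchronized int createdCount() { return created; }

    public static void main(String[] args) {
        ToyConnectionPool pool = new ToyConnectionPool();
        for (int call = 1; call <= 3; call++) {
            PooledConnection c = pool.acquire();
            // ... send request, read response ...
            pool.release(c);
        }
        System.out.println(pool.createdCount()); // 1: one connection served all three calls
    }
}
```

Three sequential calls that each release their connection end up sharing a single connection; only calls that overlap, or that never release, force the pool to open more.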



So the idea behind the change is to use MTHCM's connection pool to avoid opening a new HTTP connection every time you call a web service. Previously, every web service call created a new MTHCM instance, and therefore a new connection pool and a new connection (unless you cached the MTHCM through the Axis2 Options API). Axis2 has now made caching the MTHCM the default behaviour.

However, using a single MTHCM comes with a couple of constraints, and these lead to the connection timeout problem.

By default, MTHCM creates at most 2 connections per host and 20 connections in total in the pool (http://hc.apache.org/httpclient-legacy/threading.html). Once you are done with a connection, you must return it to the pool by releasing it. If you have already taken two connections to talk to a host, a request for a third connection to the same host will wait, and it will eventually throw “Timeout waiting for connection” if no connection has become available in the pool by the end of the timeout period. Releasing the connection has to be done at the application level, after the response stream has been read, because HttpClient has no way to determine when the response has been fully read.
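The failing loop can be modeled with a `java.util.concurrent.Semaphore` holding one permit per allowed connection to a host. This is a sketch under stated assumptions, not how HttpClient implements its pool: `PerHostLimit` is a hypothetical class, the limit of 2 mirrors the MTHCM default, and the 100 ms timeout is arbitrary.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Toy model of MTHCM's per-host limit; NOT how HttpClient implements its pool.
class PerHostLimit {
    static final int MAX_PER_HOST = 2; // mirrors the MTHCM default of 2 connections per host

    private final Semaphore permits = new Semaphore(MAX_PER_HOST);

    // true if a connection was obtained in time, false on "Timeout waiting for connection".
    boolean acquire(long timeoutMillis) {
        try {
            return permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    void release() {
        permits.release();
    }

    // Replays the failing loop: three calls to one host, none of which releases its connection.
    // Returns the number of the call that timed out, or -1 if every call got a connection.
    static int firstTimedOutCall() {
        PerHostLimit host = new PerHostLimit();
        for (int call = 1; call <= 3; call++) {
            if (!host.acquire(100)) {
                return call;
            }
            // the response is read here, but host.release() is never called
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("Call " + firstTimedOutCall() + " timed out"); // Call 3 timed out
    }
}
```

The first two calls each take a permit and never give it back, so the third call waits out its timeout, which matches the symptom described at the start of this post.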

So from Axis2 1.5.1 onwards we need to release connections back to the pool: since only one MTHCM is used, only two connections per host are available. (The default number of connections can be increased through the parameters MTHCM provides [http://hc.apache.org/httpclient-legacy/threading.html].)

Let's rewrite the sample pseudocode with the suggested fix:

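for (int i = 1; i <= 3; i++) {
    ServiceClient sc = new ServiceClient(); // declared outside try so finally can see it
    try {
        // ---- (set Options here: target EPR, action, etc.) ----
        OMElement response = sc.sendReceive(payload);
        response.build(); // read the whole response before releasing the connection
    } finally {
        sc.cleanupTransport(); // release the HTTP connection back to the pool
    }
}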

Transport cleanup in turn calls releaseConnection(), which returns the connection to the pool.
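Continuing the same kind of toy model (a `Semaphore` standing in for the 2-per-host limit; `ReleaseInFinally` is a hypothetical name, not Axis2 code), releasing in a `finally` block means every call finds a free connection, however many calls are made.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Toy model (not Axis2 code): each call returns its connection in a finally block,
// which is the role transport cleanup plays in the fixed loop.
class ReleaseInFinally {
    // Runs `calls` sequential calls against a host limited to `maxPerHost` connections
    // and returns how many calls obtained a connection.
    static int successfulCalls(int calls, int maxPerHost) {
        Semaphore perHost = new Semaphore(maxPerHost);
        int ok = 0;
        for (int call = 1; call <= calls; call++) {
            boolean acquired;
            try {
                acquired = perHost.tryAcquire(100, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            if (!acquired) {
                continue; // would be "Timeout waiting for connection"
            }
            try {
                ok++; // stand-in for sendReceive(...) followed by response.build()
            } finally {
                perHost.release(); // stand-in for releasing the connection back to the pool
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        System.out.println(successfulCalls(3, 2)); // 3: each connection goes back before the next call
    }
}
```

Because the permit is always returned, even a limit of one connection per host would be enough for a sequential loop like this one.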

Please note that cleanup works only if the HTTP response status code is one of the following: 200, 202, 400 or 500.

Coming soon: other ways to fix the timeout issue (using the Options API, though the approach in this post is the best one), and some related discussion of Axis2 and transport cleanup.