This article describes an issue with the Caché WebLink and CSP technologies. The problem has manifested itself in the IIS/ISAPI (Internet Information Services/Internet Server application programming interface) connectivity option in both products. The result of the analysis is described together with the putative diagnosis. Finally, remedial action taken within the WebLink and CSP software is discussed.
The article contains the following sections:
Technology Overview
WebLink and CSP are software products that provide network-based connectivity between most commercial Web servers and the Caché application development and database environment.
WebLink and CSP implement two methods for connecting Web servers to Caché. The choice of method depends on whether or not the hosting Web server provides a multithreaded application programming interface (API):
The connectivity options for WebLink (WL) and CSP are illustrated in the following diagram:
Connectivity Options for CSP and WebLink
Problem Description
In all reported cases the classic symptoms are as follows:
Reports from the field suggest that though it is sometimes possible to recover the Web server environment by restarting IIS, it is often necessary to completely reboot the hosting server. Clearly, both these remedies result in severe disruption to operations.
Frustratingly, and despite the intense efforts of the InterSystems support staff, it has not been possible to reproduce this problem under test conditions. The only precipitating factor appears to be the posting of large amounts of data from the Web browser to Caché. Web server or application load does not appear to be a factor; failures occur at times when the Web server is virtually idle.
Investigation and Analysis
The first approach in identifying a cause involved trying to identify the source of the errors recorded in the operating system event log. If these failures are caused by a mechanical defect in the Caché ISAPI DLLs, it is straightforward to quickly supply a solution.
Both WebLink and CSP have built-in exception handling that is largely focused on dealing with, and responding to, bogus and malformed Web (HTTP) requests. Indeed, it is not impossible that the Caché ISAPI DLL could potentially fail to cope with an unusual, though otherwise valid, HTTP request.
To analyze this failure, rigorous exception handling was added to every function in the DLL regardless of the level of complexity or the nature of the operation performed by individual functions. Exceptions that occurred within the context of the DLL were successfully trapped and recorded, though there were still occasions where exceptions appeared in the operating system event log. This finding indicated that the failures had to be related to a common resource shared between IIS and its ISAPI extensions.
More detailed investigation revealed that the exceptions were occurring as a result of claiming memory from the processes’ primary heap. The operating system functions responsible for allocating memory (specifically HeapAlloc) were failing internally. There are three main possibilities:
Considering these possibilities, the most likely explanation for the failures appear to be an obscure problem deep within the operating system memory management facility that is somehow provoked or exacerbated by fragmentation of the heap.
Remedies
As a result of the preliminary analysis, the core of the Caché ISAPI DLLs were reworked to take pressure off the memory management facility of the operating system. The following schemes have been implemented:
The preceding changes have been made in a way that does not result in a loss of performance in either WebLink or CSP. Indeed, early tests with WebLink indicate a performance boost. CSP, having been reworked the same way, enjoys similar improvements.
A Special Note for WebLink Users
To significantly reduce the amount of memory used per request, WebLink attempts to stream the output from the Web server directly to the appropriate Caché system with minimal intermediary buffering. To do this, WebLink needs to know where to send the request as soon as possible without having to read large volumes of request data. Traditionally, the reserved form/URI variables prefixed with “MGW” (particularly MGWLPN and MGWCHD) are used to specify the target Caché server.
For HTTP GET methods (hyperlinks), these variables are present in the URI query string. The complete query string is ideally always available within the hosting Web server environment. For example:
<A HREF="/scripts/mgwms32.dll?MGWLPN=LOCAL&formID=3">
For HTTP POST methods (complete form submissions), these variables are transmitted in the URI specified within the form’s “ACTION” attribute. For example:
<FORM METHOD=POST ACTION="/scripts/mgwms32.dll?MGWLPN=LOCAL&formID=3">
Again, this is ideal because the whole query string is always available within the Web server environment. The target Caché system is determined before the incoming form data stream is read from the Web server. However, it should be noted that this technique does not work for the Opera Web browser. This particular browser fails to transmit query strings appended to URIs in the “ACTION” attribute.
An alternative method for HTTP POST is to include the MGW* variables as hidden fields within the form’s data. For example:
<HTML>
<HEAD><TITLE>My Form</TITLE></HEAD>
<BODY>
<FORM METHOD=POST ACTION="/scripts/mgwms32.dll">
<INPUT TYPE=HIDDEN NAME="MGWLPN" VALUE="LOCAL">
.
.
.
</FORM>
</BODY>
</HTML>
This works well for cases where the variables are placed at the top of the form. WebLink can determine the target Caché system after reading only a small amount of incoming form data.
However, for cases where these variables are placed at the end of the form (as shown in the following example) WebLink must read and buffer the entire data stream to identify the target Caché system.
<HTML>
<HEAD><TITLE>My Form</TITLE></HEAD>
<BODY>
<FORM METHOD=POST ACTION="/scripts/mgwms32.dll">
.
.
.
<INPUT TYPE=HIDDEN NAME="MGWLPN" VALUE="LOCAL">
</FORM>
</BODY>
</HTML>
While steps have been taken to use as little memory as possible in these latter cases, it is clearly much better to always include the MGW* variables at the top of the form.
In practice, WebLink expects to see the MGW* variables arrive first in the data stream if it is to use the optimal method of servicing the request. It starts streaming the incoming form data directly to Caché as soon as it reads the first non-MGW* variable. However, this exposes an ambiguity within the protocol.
When WebLink has not read any MGW* variables in the first section of data, it has no way of knowing whether no MGW* variables are specified (that is, the default Caché server is indicated) or whether the MGW* variables are, in fact, at the end of the transmission. Many WebLink applications do not specify any MGW* variables and rely on the default settings held within the configuration. For these applications to benefit from the optimal memory/request management scheme, a new configuration parameter has been introduced: Optimise_Memory_Usage (TRUE or FALSE).
If this parameter is set to FALSE (the default), WebLink buffers the entire request data if it cannot read the MGW* variables either in the query string or as the first fields in the posted data stream. This allows MGW* variables to be specified at the end of the form.
If this parameter is set to TRUE and WebLink cannot read the MGW* variables, either in the query string or as the first fields in the posted data stream, it assumes use of the default Caché server. This mode of operation is highly recommended but should only be used if you are sure that your application complies with the requirements regarding the placement of MGW* variables. Applications developed using WebLink Developer always place the MGW* variables at the top of each form; therefore, these applications can safely take advantage of this mode of operation.
A warning is written to the WebLink event log if an MGW* variable is not processed as a result of arriving at the end of the incoming data stream. For example:
>>> Fri Sep 07 15:46:13 2001; Thread ID: 182

WARNING: Reserved Variable 'MGWLPN' was not processed because it was not
 found amongst the first fields in the submitted data.

/scripts/mgwms32.dll?FormID=1
Note:
The information in this section also applies to WebLink operating in PDQWeb compatibility mode. In PDQWeb mode, the critical reserved variable is EP as opposed to the WebLink MGW* variables.
Conclusion
Tests in the field have been encouraging. Early adopters of the new DLLs have experienced no failures.
The decision to make the software changes described in this document has been made as a result of analyzing the symptoms and the event logs generated by our DLLs and the operating system. It would be ideal to be able to categorically put forward a definitive diagnosis; however, this has not been possible because of the inability to reproduce the failures under test conditions. For the same reason, it has not been possible to open a dialog with Microsoft to obtain a fix or a recommended workaround from them; historically, they only work with problems that can be demonstrated.
Long-term, InterSystems plans to offer alternative connectivity options, namely the NSD-based options shown in the diagram in the Technology Overview section. The central feature of these options is that the core WebLink/CSP functions are separated from the Web server environment. This, in itself, lowers the risk associated with open interfaces like ISAPI and provides a more stable and manageable platform for high-end Web operations. However, the all-in-one ISAPI connectivity solution continues to play an important role in small to medium-sized Web operations and InterSystems is committed to providing a robust implementation.