At the heart of Caché lies the Caché Database Engine. The database
engine is highly optimized for performance, concurrency, scalability, and reliability.
There is a high degree of platform-specific optimization to attain maximum performance
on each supported platform.
Caché is a full-featured database system; it includes all the features
needed for running mission-critical applications (including journaling, backup and
recovery, and system administration tools). To help reduce operating costs, Caché
is designed to require significantly less database administration than other database
products. The majority of deployed Caché systems have no database administrators.
The major features of the database engine are described in the following sections.
Transactional Multidimensional Storage
All data within Caché is stored within sparse, multidimensional arrays.
Unlike the multidimensional arrays used by typical OLAP (online analytical processing)
products, Caché supports transaction processing operations (inserts, updates,
locking, transactions) within its multidimensional structures. Also, unlike most OLAP
engines, these multidimensional structures are not limited in size to available memory.
Instead, Caché includes a sophisticated, efficient data cache.
Because Caché data is of inherently variable length and is stored in
sparse arrays, Caché often requires less than half of the space needed by a
relational database. In addition to reducing disk requirements, compact data storage
enhances performance because more data can be read or written with a single I/O operation,
and data can be cached more efficiently.
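As a rough illustration of the idea (not Caché's actual storage format), a sparse multidimensional array can be modeled as a mapping from subscript tuples to values, where only the nodes that have been set occupy any space. The `SparseArray` class and its method names are hypothetical.

```python
# Hypothetical sketch: a sparse, multidimensional array modeled as a dict
# keyed by subscript tuples. Nodes that were never set take up no space.
class SparseArray:
    def __init__(self):
        self._nodes = {}

    def set(self, *subscripts_and_value):
        # The last positional argument is the value; the rest are subscripts.
        *subscripts, value = subscripts_and_value
        self._nodes[tuple(subscripts)] = value

    def get(self, *subscripts, default=None):
        return self._nodes.get(tuple(subscripts), default)

    def children(self, *prefix):
        """Return the sorted next-level subscripts below the given prefix."""
        n = len(prefix)
        return sorted({k[n] for k in self._nodes if len(k) > n and k[:n] == prefix})

    def node_count(self):
        return len(self._nodes)


sales = SparseArray()
sales.set("2003", "Boston", 1200)
sales.set("2003", "London", 950)
sales.set("2004", "Boston", 1375)
```

Here `sales.get("2004", "London")` returns the default because that node was never stored, which is the sense in which the array is sparse.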
Objects and Multidimensional Storage
Caché Objects uses multidimensional storage as the foundation of its
object persistence technology. For example, suppose you have a simple Employee class
that represents employee data, say an employee's name, ID number, and location.
Instances of the Employee class could be stored in a multidimensional
array like the one pictured here.
Storage of Persistent Objects
In this case, the array is subscripted by an object identifier value and the
data for each instance is stored compactly within the nodes of the array. Caché
Objects automatically creates the optimal storage structure for persistent classes.
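The layout described above can be sketched as follows. This is a simplified model, assuming one node per object that holds the property values as a compact list; the `EmployeeStore` class, its field names, and the sample values are all hypothetical.

```python
# Hypothetical model of persistent storage for an Employee class: one node per
# object, subscripted by object ID, holding the property values compactly.
from dataclasses import dataclass


@dataclass
class Employee:
    name: str
    id_number: str
    location: str


class EmployeeStore:
    def __init__(self):
        self.data = {}      # object ID -> [name, id_number, location]
        self.last_id = 0    # last object ID handed out

    def save(self, emp):
        self.last_id += 1
        self.data[self.last_id] = [emp.name, emp.id_number, emp.location]
        return self.last_id

    def open(self, oid):
        # Rebuild an in-memory object from the stored node.
        return Employee(*self.data[oid])


store = EmployeeStore()
oid1 = store.save(Employee("Smith,John", "E100", "Boston"))
oid2 = store.save(Employee("Jones,Mary", "E101", "London"))
```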
If a cross-index, say on the Location property, is needed,
the persistence engine would use a different multidimensional array, similar to the
one pictured below, to associate location values with corresponding object identifiers
(typically a two-dimensional structure is used for this purpose).
Storage of Indices
Such indices are automatically maintained as changes are made to the database.
The Caché SQL engine automatically uses such an index to satisfy queries for
finding Employees by location.
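A minimal sketch of such an index, assuming a two-level structure subscripted by location value and then object ID. The `IndexedStore` class and its methods are hypothetical; the point is that every write to the data also updates the index, so a lookup by location never scans the data array.

```python
# Hypothetical sketch: a data array plus a two-level index array
# (location -> object ID) that is maintained on every write.
class IndexedStore:
    def __init__(self):
        self.data = {}             # oid -> (name, location)
        self.location_index = {}   # location -> {oid: ""}
        self.last_id = 0

    def save(self, name, location):
        self.last_id += 1
        oid = self.last_id
        self.data[oid] = (name, location)
        self.location_index.setdefault(location, {})[oid] = ""
        return oid

    def set_location(self, oid, new_location):
        name, old_location = self.data[oid]
        # Remove the old index entry, dropping the branch if it is now empty.
        del self.location_index[old_location][oid]
        if not self.location_index[old_location]:
            del self.location_index[old_location]
        self.location_index.setdefault(new_location, {})[oid] = ""
        self.data[oid] = (name, new_location)

    def find_by_location(self, location):
        # Answered from the index alone.
        return sorted(self.location_index.get(location, {}))


staff = IndexedStore()
a = staff.save("Smith,John", "Boston")
b = staff.save("Jones,Mary", "London")
c = staff.save("Lee,Ann", "Boston")
staff.set_location(a, "London")
```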
Multidimensional arrays give applications a great degree of flexibility in how
they store their data. For example, a set of closely related objects, say an
Invoice object and its corresponding LineItem objects, can easily be configured
so that the LineItem objects are physically clustered with
an Invoice object for highly efficient access.
Using a unique feature known as subscript mapping, you can specify
how the data within one or more arrays is mapped to a physical database file. Such
mapping is a database administration task and requires no change to class/table definitions
or application logic. Moreover, mapping can be done within a specific sparse array:
you can map one range of subscript values to one physical location and another range
to a different file, disk drive, or even database server. This makes it possible
to reconfigure Caché applications (such as for scaling) with little effort.
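The idea can be sketched as a routing table consulted on every access. This is only an illustration under simplifying assumptions: the `MappedArray` class is hypothetical, plain dictionaries stand in for physical database files or remote servers, and ranges are matched on the first subscript only.

```python
# Hypothetical sketch of subscript mapping: ranges of the first subscript of
# one array are routed to different physical stores (dicts stand in for
# database files or remote servers).
class MappedArray:
    def __init__(self, default_store):
        self.default_store = default_store
        self.rules = []   # (low, high, store) with low <= subscript < high

    def map_range(self, low, high, store):
        self.rules.append((low, high, store))

    def _store_for(self, first_subscript):
        for low, high, store in self.rules:
            if low <= first_subscript < high:
                return store
        return self.default_store

    def set(self, subscripts, value):
        self._store_for(subscripts[0])[subscripts] = value

    def get(self, subscripts):
        return self._store_for(subscripts[0]).get(subscripts)


local_db, archive_db, remote_db = {}, {}, {}
orders = MappedArray(default_store=local_db)
orders.map_range(1, 1000, archive_db)      # old orders go to an archive file
orders.map_range(1000, 5000, remote_db)    # this range lives on another server

orders.set((42, "total"), 19.99)
orders.set((1500, "total"), 5.00)
orders.set((9000, "total"), 7.25)
```

The calling code uses the same `set` and `get` calls regardless of the mapping rules, which mirrors the point that remapping requires no change to application logic.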
The flexibility of transactional multidimensional storage gives Caché
a significant advantage over the two-dimensional structure used by traditional
relational databases: it is this flexibility that allows Caché to be a high-performance
SQL, object, and XML database without compromise. It also means that Caché
applications are better prepared for future changes in technology.
Caché provides the ability to execute database operations as well as
any degree of business logic within Caché processes.
Process Management
A process is an instance of a Caché virtual machine running on a Caché
server. A typical Caché server can run thousands of simultaneous processes
depending on hardware and operating system. Each process has direct, efficient access
to the multidimensional storage system.
The Caché virtual machine executes instructions, referred to as P-code,
that are highly optimized for the database, I/O, and logic operations typically required
by transaction processing and data warehousing applications. Virtual machine code
can be created in the following ways:
- SQL: SQL queries submitted to Caché are processed by the
Caché SQL optimizer which, in turn, converts them to efficient, executable
P-code (making use of any indices that may be present).
- Object Behavior: The Caché Objects technology provides
a high degree of server-side object behavior (such as object persistence) by automatically
generating executable P-code (using method generators, that is, object methods that can
generate code according to predetermined rules).
- Caché ObjectScript: Applications
can include any logic that they wish to execute within a Caché data or application
server using the Caché ObjectScript scripting language. Such code can take
the form of object methods (analogous to, but much more powerful than, stored procedures
within the relational world, as they can make full use of object-oriented features)
or complete routines (small programs that run within a Caché server).
- Caché Basic: Object methods
can also be implemented using the Basic programming language. Caché includes
a powerful, object-based version of the popular Basic programming language that is
familiar to a large portion of the world's software developers. Caché's Basic
runs on every supported platform and is completely interoperable with Caché
ObjectScript.
Distributed Data Management
One of the most powerful features of Caché is its ability to link servers
together to form a distributed data network. In such a network, machines that primarily
serve data are known as Data Servers while those that mainly host processes, but little
to no data, are known as Application Servers.
Enterprise Cache Protocol
Servers can share data (as well as locks) using the Caché Enterprise
Cache Protocol (ECP). ECP is effective because data is transported in packages. When
information is requested across the network, the reply data package includes the desired
data, and additional, related data as well. The natural data relationships inherent
to objects and the Caché multidimensional data model make it possible to identify
and include information that is related to the originally requested data. This
associated information
is cached locally either at the client or on the application server. Usually, subsequent
requests for data can be satisfied from a local cache, thus avoiding additional trips
across the network. If the client changes any data, only the updates are propagated
back to the database server.
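The caching behavior described above can be modeled roughly as follows. This is not the ECP wire protocol: the `DataServer` and `AppServer` classes are hypothetical, and "related data" is assumed, purely for illustration, to mean all nodes that share the first subscript of the requested node.

```python
# Hypothetical model of the behavior described above: a reply carries the
# requested node plus related nodes, later reads are served from the local
# cache, and writes send only the changed nodes back.
class DataServer:
    def __init__(self, nodes):
        self.nodes = dict(nodes)   # subscript tuple -> value
        self.requests = 0          # network round trips received

    def fetch_with_related(self, key):
        # Assumption: "related" means sharing the first subscript.
        self.requests += 1
        return {k: v for k, v in self.nodes.items() if k[0] == key[0]}

    def apply_updates(self, updates):
        self.requests += 1
        self.nodes.update(updates)


class AppServer:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def get(self, key):
        if key not in self.cache:
            self.cache.update(self.server.fetch_with_related(key))
        return self.cache.get(key)

    def set(self, key, value):
        self.cache[key] = value
        self.server.apply_updates({key: value})   # only the update travels


server = DataServer({
    (1, "customer"): "Acme",
    (1, "total"): 100,
    (2, "customer"): "Zenith",
})
app = AppServer(server)
first = app.get((1, "customer"))    # one round trip, also caches (1, "total")
second = app.get((1, "total"))      # served from the local cache
trips_after_reads = server.requests
app.set((1, "total"), 120)
```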
ECP makes it possible for applications to support a wide variety of runtime
configurations including multi-tier and peer-to-peer.
To provide database integrity and reliability, Caché includes a number
of journaling subsystems that keep track of physical and logical database updates.
The journal management technology is also used to provide transaction support (a journal
is used to perform transaction rollback operations) as well as database shadowing
(a journal is used to synchronize a shadow server with a primary data server). As
with the rest of the system, Caché lets you configure its journaling system
to meet your application's needs.
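To show how a journal can drive transaction rollback, here is a minimal sketch. It assumes a journal that records the old and new value of every update; the `JournaledStore` class is hypothetical and simplifies real journaling (for instance, it discards undone records instead of keeping them).

```python
# Hypothetical sketch: every update is journaled with its old and new value,
# so an open transaction can be rolled back by undoing records in reverse.
class JournaledStore:
    _MISSING = object()   # marks "key did not exist before this update"

    def __init__(self):
        self.data = {}
        self.journal = []       # (key, old_value, new_value) records
        self._tx_start = None   # journal position where the transaction began

    def begin(self):
        self._tx_start = len(self.journal)

    def set(self, key, value):
        old = self.data.get(key, self._MISSING)
        self.journal.append((key, old, value))
        self.data[key] = value

    def commit(self):
        self._tx_start = None

    def rollback(self):
        if self._tx_start is None:
            raise RuntimeError("no transaction in progress")
        while len(self.journal) > self._tx_start:
            key, old, _new = self.journal.pop()
            if old is self._MISSING:
                del self.data[key]
            else:
                self.data[key] = old
        self._tx_start = None


accounts = JournaledStore()
accounts.set("balance", 100)
accounts.begin()
accounts.set("balance", 40)
accounts.set("audit", "withdrawal")
accounts.rollback()
```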
To support concurrent database access, Caché includes a powerful Lock
Management System.
In systems with thousands of users, reducing conflicts between competing processes
is critical to providing high performance. One of the biggest conflicts is between
transactions wishing to access the same data. Caché lock management offers
the following features to alleviate such conflicts:
- Atomic Operations: To eliminate typical performance hot spots,
Caché supports a number of atomic operations, that is, operations that need no
application-level locks. An example of this is the ability to atomically allocate unique values
for object/row identity (a common bottleneck in relational applications).
- Logical Locks: Caché does not lock entire pages of data
while performing updates. Because most transactions require frequent access or changes
to small quantities of data, Caché supports granular logical locks that can
be taken out on a per-object (row) basis.
- Distributed Locks: In distributed database configurations, Caché
automatically supports distributed locks.
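The first two features can be sketched as follows. Both classes are hypothetical: `AtomicCounter` stands in for an engine-level atomic increment (its internal guard models the engine's own atomicity, and the calling code takes no lock), and `LockTable` models logical locks keyed by an object's identity rather than by the page that holds it.

```python
# Hypothetical sketch of atomic ID allocation and per-object logical locks.
import threading


class AtomicCounter:
    """Allocates unique, increasing IDs; callers never take a lock themselves."""

    def __init__(self):
        self._value = 0
        self._guard = threading.Lock()   # stands in for engine-level atomicity

    def increment(self):
        with self._guard:
            self._value += 1
            return self._value


class LockTable:
    """Logical locks keyed by (class name, object ID), not by data page."""

    def __init__(self):
        self._guard = threading.Lock()
        self._owners = {}

    def acquire(self, name, owner):
        with self._guard:
            if self._owners.get(name, owner) != owner:
                return False   # held by another process
            self._owners[name] = owner
            return True

    def release(self, name, owner):
        with self._guard:
            if self._owners.get(name) == owner:
                del self._owners[name]


counter = AtomicCounter()
ids = []

def worker():
    for _ in range(100):
        ids.append(counter.increment())

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

locks = LockTable()
got_a = locks.acquire(("Employee", 1), "process A")
got_b = locks.acquire(("Employee", 1), "process B")        # same object: refused
got_b_other = locks.acquire(("Employee", 2), "process B")  # other object: granted
```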
Caché provides portable support for a myriad of devices (such as files,
TCP/IP, printers) making it possible for Caché applications to interoperate
with a host of other technologies. The interconnectivity options available with Caché
(including CSP, ODBC, SOAP, and Java) are built on top of this underlying support.
Caché runs on, and is optimized for, a variety of hardware platforms
and operating systems including Windows (98, NT, 2000, XP, etc.), OpenVMS, Linux,
and every major version of UNIX.
You can easily port applications developed with Caché, as well as their data,
from one platform to another. This can be as easy as installing Caché on the
new platform and moving the database files to the new system. When moving between some
systems, you may need to run an in-place data conversion utility (to convert one endian
representation to another).
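As a generic illustration of what an endian conversion involves (this is not the Caché utility, and the on-disk format is not described here), the sketch below re-encodes a block of 32-bit unsigned integers from big-endian to little-endian byte order. The function name and the choice of 32-bit integers are assumptions.

```python
# Hypothetical illustration of endian conversion on a block of 32-bit
# unsigned integers; real database files have a far richer layout.
import struct


def convert_endianness(block, count):
    """Re-encode `count` 32-bit unsigned ints from big- to little-endian."""
    values = struct.unpack(f">{count}I", block)
    return struct.pack(f"<{count}I", *values)


big = struct.pack(">3I", 1, 256, 65536)
little = convert_endianness(big, 3)
```

The values are unchanged by the conversion; only the order of the bytes that represent them differs.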
Caché supports a variety of different runtime configurations giving
you maximum flexibility when you deploy your applications. You can switch between
different deployment options by changing Caché system settings; typically there
is no need to change your application logic.
Some basic deployment options are listed below.
Basic Client/Server Configuration
In the simplest client/server configuration, a single Caché data server
services a number of clients (from one to many thousands, depending on the application
and platform).
Client/Server Configuration
The client systems can be any of the following:
- Stand-alone desktop systems running a client application that connects
via a client/server protocol (such as ODBC, ActiveX, JDBC, Java).
- Web server processes talking to Caché via CSP (Caché Server Pages),
SOAP, or some other connectivity option
(such as ODBC, JDBC). Each Web server process may then service a number of browser-based
or machine-to-machine sessions.
- Middleware processes (such as an Enterprise Java Bean application
server) that connect to Caché via ODBC, JDBC, etc.
- Devices, such as terminals or lab equipment, that connect to Caché
using one of many supported protocols (including TELNET and TCP/IP).
- Some combination of the above.
Shadow Server Configuration
The Shadow Server configuration builds upon the basic client/server setup by
adding one or more shadow servers. Each shadow server synchronizes itself with the
data within the main data server by connecting to and monitoring its transaction journal.
Shadow Server Configuration
Shadow servers are typically used to service ad hoc queries, large reports,
and batch processes to limit their impact on the main transaction system. They can
also be used to provide fail-over systems.
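The synchronization described above can be sketched as a shadow that remembers its position in the primary's journal and replays any records it has not yet seen. The `PrimaryServer` and `ShadowServer` classes and the record format are hypothetical, and the shadow reads the journal directly rather than over a network connection.

```python
# Hypothetical sketch: a shadow server stays in step with a primary by
# replaying the journal records it has not yet applied.
class PrimaryServer:
    def __init__(self):
        self.data = {}
        self.journal = []   # ("SET" | "KILL", key, value) records

    def set(self, key, value):
        self.journal.append(("SET", key, value))
        self.data[key] = value

    def kill(self, key):
        self.journal.append(("KILL", key, None))
        self.data.pop(key, None)


class ShadowServer:
    def __init__(self, primary):
        self.primary = primary
        self.data = {}
        self.position = 0   # number of journal records already applied

    def sync(self):
        new_records = self.primary.journal[self.position:]
        for op, key, value in new_records:
            if op == "SET":
                self.data[key] = value
            else:
                self.data.pop(key, None)
        self.position += len(new_records)
        return len(new_records)


primary = PrimaryServer()
shadow = ShadowServer(primary)
primary.set("patient:1", "admitted")
primary.set("patient:2", "admitted")
applied_first = shadow.sync()
primary.kill("patient:1")
applied_second = shadow.sync()
```

Reports and ad hoc queries can then read `shadow.data` without touching the primary.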
The multi-tier configuration uses the Caché distributed database technology,
the Enterprise Cache Protocol (ECP), to make it possible for a greater number of
clients to connect to the system.
Multi-tier Configuration
In the simplest multi-tier setup, one or more Caché systems, acting as
application servers, are placed between the central data server and the various client
systems. In this case the application servers do not store any data; instead they
host processes that perform work on behalf of the clients, off-loading the CPU of
the data server. This type of configuration scales best for applications that exhibit
good locality of reference, that is, applications in which most transactions involve
closely related data, so that locking across application servers is limited. Such applications,
as well as those with a fair amount of read access (like most typical Web applications),
work extremely well in this model.
More complex configurations, with multiple data servers as well as data stored
on application server machines, are also possible.
Typically, applications use the multi-tier configuration for scaling as well
as for providing high availability (with application servers serving as hot standby
systems).