- Sakai CLE Courseware Management
- Alan Berg, Ian Dolphin
How Sakai is deployed at scale
Sakai relies on keeping persistent information in a database. It stores content in the database or on the file system, and the application itself resides on the local file system. For small deployments, it is possible to have the database, file storage, and application on the same computer. However, the impact of a disk failure, the lack of stability under high load, and the limits to long-term scalability make that structure less attractive for medium and large-scale deployments. By placing the file system and database on separate servers, you not only gain more capacity, but it also becomes easier to diagnose performance bottlenecks. Scaling Sakai to more than one application server allows load balancers to distribute sessions and serve a larger number of users. The load can be divided among the available hardware according to its capability, which allows sites to scale as new equipment becomes available. A load-balanced infrastructure also enables higher overall availability and ensures stability under much higher loads than an individual server can achieve.
Of course, there are many different configurations in the wild, and the examples in this chapter represent only the most generic. Suffice it to say, Sakai scales well and there are a number of reliable configurations.
Tip
The System Administrator's Handbook
Tony Atkins wrote the majority of the System Administrator's Handbook (http://bugs.sakaiproject.org/confluence/display/DOC/Sys+Admin+Guide), which is an excellent reference for those of you who are directly involved in large-scale deployments.
Within many organizations, such as the University of Amsterdam (UvA), the online learning environment is classed as core business, and decision makers have therefore designated the service mission-critical with a related service level agreement (SLA). At the hardware level, this implies that the power supplies are redundant and large batteries sit charged in case of power failure, hard disks are hot-swappable, and extra servers are available for quick replacement. In practice, however, the stability of the system also depends heavily on highly knowledgeable application administrators. System administrators proactively monitor the logs and deal with daily problems before the end user notices. In my opinion, one of the classic mistakes that organizations make is spreading application administration tasks across a department so that too many individuals are involved and no one fully understands the overall infrastructure. Ideally, you want a couple of application administrators focused on the task. The long-term reliability of your infrastructure may well depend on thorough Sakai-specific training for those administrators. In a successful project, management needs to factor in resources for training during the initial phases of deployment.
The following figure shows a generic scalable structure. It is a redundant infrastructure, the gold-plated solution in which the accountants have calculated that the cost of the extra hardware is cheaper than the loss of service or reputation from a failure. Two load balancers sit in front of the application servers. One deals with incoming traffic, while the second passively monitors whether the first balancer is still functioning by checking for a heartbeat. If a frontend application server fails, the active load balancer stops sending requests to the failed server.

Load balancing
Load balancers are highly reliable and constitute the least likely point of failure in the whole infrastructure. In the most expensive and sumptuous environments, the infrastructure and data are mirrored to a second, geographically distant data center. In most situations, however, the cost-benefit ratio is much too high to consider doing this. That said, if you have the luxury of a data service center shared with other organizations, economies of scale have the potential to push the per-organization costs down to the point where mirroring becomes worth considering.
Load balancers know which application server a particular user needs to go to. The cheapest solution, known as IP spraying, is to divide incoming requests by IP address. However, that makes assumptions about how your users are distributed. The University of Amsterdam has study centers scattered citywide, unevenly distributed in the IP address space. Furthermore, there are proxy servers and firewalls that hide the IP addresses of client web browsers, further skewing the IP space. This lack of homogeneous distribution makes IP spraying unsuitable for Sakai-specific deployments.
A poor man's load balancer is round-robin Domain Name Service (DNS). DNS translates the readable hostname of a server into an IP address and vice versa. If the DNS server returns a different server's IP address for each request, it divides the web browser traffic evenly. This technique does not scale up, however, because users need to stay on one Sakai instance for the duration of a login session, and if a server fails, the DNS will still hand the failed server's address back to clients.
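To make the idea concrete, here is a minimal sketch of what round-robin DNS looks like in a BIND-style zone file; the hostname and addresses are invented for illustration:

; three A records for the same name; resolvers rotate through them
sakai    IN  A  192.0.2.11
sakai    IN  A  192.0.2.12
sakai    IN  A  192.0.2.13

Nothing in the zone file keeps a logged-in user on the same Sakai instance or withdraws a failed server from the rotation, which is exactly the limitation described above.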
Load balancers used to be expensive, but their cost has gradually diminished and their capacities have rapidly increased in accordance with Moore's Law. A modern balancer can also keep track of how quickly a particular frontend server responds and distribute the load accordingly. It keeps track of sessions by adding its own cookie to the incoming request. Requests with a particular cookie are stuck to a particular frontend. The manufacturers aptly name this a "sticky session".
Frontend servers
The frontend servers normally contain a Sakai instance, one per server, with very standard specifications. The specifications change as hardware becomes cheaper. At the time of writing, our local admin considers a dual core CPU and 8GB of RAM reasonable. Another widely-used approach is to use virtualization, where there is one Sakai instance per virtual machine.
It is also possible to place a single Apache web server in front of a number of Tomcat instances on the same machine. That allows better use of large amounts of memory and the offloading of SSL, as well as responsibility for static files such as web page images, to the Apache server. Furthermore, Apache has many modules that extend its functionality, turning it into a software load balancer or a caching reverse proxy, for instance. Apache enables you to offload authentication away from the built-in Sakai login, and you can compress responses to speed up connections. All this comes at a price: complexity. In the end, the decision to use or not use Apache may simply depend on whether your local administrators have previous experience with it and are confident with the underlying technologies.
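As an illustration of the software load balancer role, the following fragment sketches how Apache's mod_proxy_balancer might spread requests over two local Tomcat instances with sticky sessions. The ports and route names are invented, a real setup would also need matching jvmRoute values in each Tomcat's server.xml, and the mod_proxy, mod_proxy_http, and mod_proxy_balancer modules must be loaded:

# hypothetical httpd.conf fragment: two Tomcat instances behind one Apache
<Proxy balancer://sakaicluster>
    BalancerMember http://127.0.0.1:8081 route=tomcat1
    BalancerMember http://127.0.0.1:8082 route=tomcat2
    ProxySet stickysession=JSESSIONID|jsessionid
</Proxy>
ProxyPass /portal balancer://sakaicluster/portal
ProxyPassReverse /portal balancer://sakaicluster/portal

Whether this is simpler than a hardware balancer depends, as noted above, on how comfortable your administrators already are with Apache.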
It is also possible — and cost effective — to have your SSL offloaded onto the load balancer itself via hardware accelerators. This has the advantage of decreasing the level of configuration required to achieve safe logins and secure e-mails.
Because Java-based Sakai is not OS specific, the University of Amsterdam can run its critical services in both acceptance and production environments on Sun Solaris; the development team can choose its own OS and some members choose to develop on Linux, some on Solaris, and one lonely figure on Windows. (I will not discuss the pros and cons of each OS; it is sufficient to say that I am comfortable with the current situation though I'm slightly jealous of the cool Mac notebook and telephones one of the ponytailed webmasters parades around.)
Each frontend has the same specifications, and some organizations have a "hot" spare ready to plug in, in case of emergency. Another approach is to use the same hardware elsewhere. At UvA, we test patches and newly developed code in an acceptance environment. This is best practice for all large-scale deployments. The acceptance environment is a cut-down version of the production one, but uses the same OS, software versions, and hardware. For example, the acceptance environment for our current LMS has one load balancer instead of two and fewer frontend servers. The database hardware and file storage structure are identical. At first glance, this may look costly, but over the years, it has enabled local administrators to catch problems early, before they affect the users. The secondary advantage is that the infrastructure sits physically within the same data center as production. If a failure occurs, a system admin can switch a machine from acceptance to production with a few typed commands, some router rule modifications, and a change of a couple of cables.
The private net (shown in the previous figure) is reserved for network traffic between the application servers and the file and database servers. This increases security and potentially removes a bottleneck under high load.
The backup net is for traffic from backups. However, if you are using Storage Area Networks (SANs) or other modern storage devices, they can also potentially sit on the backup network.
Database preferences
Different organizations have different database preferences. The demo version of Sakai uses an in-memory database. However, the centrally organized Quality Assurance effort has also thoroughly tested Sakai with Oracle and MySQL databases.
Traditionally, web users unwittingly rely on MySQL as part of an open source LAMP environment (Linux, Apache, MySQL, and PHP) used for Internet applications by a large number of service providers. Oracle, on the other hand, has a significant reputation as an enterprise-level, mission-critical database. The Sakai Foundation recommends both MySQL and Oracle for large-scale deployments. The Performance Work Group, drawn mostly from the University of Michigan, has stress-tested and tweaked the various aspects until you can see your face in the polished surfaces that remain. Sakai has been X-rayed and interrogated until the vast majority of performance glitches have been eroded into history.
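For orientation, pointing a Sakai instance at an external MySQL database is largely a matter of a few entries in sakai.properties. The fragment below is only a sketch: the hostname, schema name, credentials, and file path are placeholders, and the exact property names should be checked against the default sakai.properties shipped with your release:

# hypothetical sakai.properties fragment for an external MySQL database
vendor@org.sakaiproject.db.api.SqlService=mysql
driverClassName@javax.sql.BaseDataSource=com.mysql.jdbc.Driver
url@javax.sql.BaseDataSource=jdbc:mysql://db.example.edu:3306/sakai?useUnicode=true&characterEncoding=UTF-8
username@javax.sql.BaseDataSource=sakaiuser
password@javax.sql.BaseDataSource=examplepassword
# keep uploaded content on the file system rather than in the database
bodyPath@org.sakaiproject.content.api.ContentHostingService=/var/sakai/content

Keeping the database and the content store on separate, dedicated servers is what makes the separation described earlier possible.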
The Java Virtual Machine
A Sakai instance runs in a Java Virtual Machine (JVM). Among other things, the virtual machine is responsible for managing memory by removing old, unused Java objects. In general, the JVM does a very good job, but it needs its environment to tell it how much memory it can use and the best way to clean up old garbage. The system administrator configures the relevant settings by modifying the JAVA_OPTS environment variable.
There are two instruction sizes for the JVM: 32-bit and 64-bit. The 32-bit version can allocate up to 4GB of memory to any given instance. However, the OS may limit the real ceiling to a lower value. By default, the 32-bit version of a Windows server limits the real memory available to less than 3GB and the 32-bit Linux kernel to less than 2GB (http://goobsoft.homeip.net/JavaDebianTuning). It is, therefore, a good idea to operate with a 64-bit Linux, Windows, or Solaris kernel and, if you want to use more than 4GB of memory, to set JAVA_OPTS to include the corresponding -d64 option.
To make life somewhat more confusing, you may not want to use more than 4GB of memory even if you can. Java divides its memory into areas. One or more parts of the memory are for short-lived objects and others for long-lived objects. If the long-lived objects fill their allocated space, the JVM performs a full garbage collection, stopping the application until it has traced its way through all the long-lived objects' relationships. This stop-the-world garbage collection normally runs in milliseconds, but under load, the wait can become noticeable to the end user. The more memory you allocate, the longer the wait may become.
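As a sketch only, assuming Tomcat picks up options from a bin/setenv.sh file and a 64-bit Sun/Oracle JVM of the era this chapter describes, the memory ceiling and some basic garbage-collection logging could be set as follows; the path and sizes are illustrative:

# hypothetical $CATALINA_HOME/bin/setenv.sh
JAVA_OPTS="-server -d64 -Xms4g -Xmx4g \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Xloggc:/var/log/sakai/gc.log"
export JAVA_OPTS

Watching the resulting log makes the stop-the-world pauses visible, which gives you evidence rather than guesswork when deciding how much memory to allocate.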
JVM configuration is something of an art, so trust the defaults until you have the evidence and expertise to suggest that you should do otherwise. If this approach does not prove adequate, then look at the configurations from other production servers and reuse them when appropriate. Deployment configuration information from production servers is stored in the Sakai Foundation's centrally maintained bug-tracking database, known as Jira (http://bugs.sakaiproject.org/jira/browse/PROD). For example, the current settings for the University of Delaware are:
-server -d64 -Xms8g -Xmx16g -XX:PermSize=512m -XX:MaxPermSize=512m -XX:NewSize=2000m -XX:MaxNewSize=2000m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+UseMembar -XX:-UseThreadPriorities -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled -Djava.awt.headless=true
In English, the JVM is set to start using 8GB of memory and can take up as much as 16GB. The most efficient settings are when -Xms equals -Xmx, so you can argue that -Xms should equal 16g.
The -server option is interesting. Java is self-tuning; specialists call this ergonomics. JVM ergonomics improve from version to version. The JVM can choose from a number of algorithms and can keep statistics internally. Setting the -server option tells the JVM to treat the system as if it is running on a server and not on a client. The other options mentioned manipulate how the JVM will clean up its memory and tell the JVM not to load certain Java classes at startup because we are not using a GUI. In the Delaware case, the system administrators certainly know what they are doing, but if you do not, I advise you to keep the configuration settings to a minimum unless you have time to experiment through stress testing.
As new tools bubble up to production, tweaking database indexes and configuration values may make the difference between a quiet and a busy day. Luckily, due to the scale of their use, early adopters such as the University of Cape Town in South Africa (https://vula.uct.ac.za/portal) get to do the potentially tricky early optimizations, and they are rather good at their work. This implies that a not-so-secret vector to success is to have a system administrator keep track of the incoming patches on the ever-changing maintenance branch and apply them early in your own acceptance environment.