blogs.sun.com

Welcome to Blogs.sun.com! This space is accessible to any Sun employee to write about anything.
Web address: http://blogs.sun.com
Updated: 1 hour 55 minutes ago

Java.Net.Next Requirements - More than One Account on Java.Net?

4 hours 44 minutes ago

Do you have more than one account at Java.Net? If so, please read on.

As announced by Ted Farrell back in February, Java.Net.next will use the Kenai infrastructure.  For the most part we are trying to make the transition as painless as possible but there will be some changes here and there.  Sometimes new features will be added; sometimes some may be dropped.  One specific point is about usernames and email addresses.

Currently, Java.Net allows for multiple usernames with the same email address (login is done only via the username), but Kenai does not (you can login with username or email).  We are considering using the Kenai approach for the new implementation of Java.Net and want your feedback on how that would affect you.
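
The conflict between the two models comes down to one rule: if users can log in with an email address, that address must map to exactly one account. The following is a minimal sketch of that rule, not Kenai or Java.Net code. The AccountRegistry class and its methods are hypothetical, and comparing email addresses case-insensitively is an assumption.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the Kenai-style rule: one account per email address,
// login by either username or email. Not actual Kenai or Java.Net code.
class AccountRegistry {
    private final Map<String, String> emailByUsername = new HashMap<>();
    private final Map<String, String> usernameByEmail = new HashMap<>();

    // Assumption: email addresses are compared case-insensitively.
    private static String normalize(String email) {
        return email.trim().toLowerCase(Locale.ROOT);
    }

    /** Registers an account; returns false if the username or the email is already taken. */
    boolean register(String username, String email) {
        String key = normalize(email);
        if (emailByUsername.containsKey(username) || usernameByEmail.containsKey(key)) {
            return false;
        }
        emailByUsername.put(username, key);
        usernameByEmail.put(key, username);
        return true;
    }

    /** Resolves a login identifier (username or email) to a username, or null if unknown. */
    String resolveLogin(String identifier) {
        if (emailByUsername.containsKey(identifier)) {
            return identifier;
        }
        // Only unambiguous because register() refuses duplicate emails.
        return usernameByEmail.get(normalize(identifier));
    }

    public static void main(String[] args) {
        AccountRegistry registry = new AccountRegistry();
        System.out.println(registry.register("duke", "duke@example.com"));  // true
        System.out.println(registry.register("duke2", "Duke@example.com")); // false, email already used
        System.out.println(registry.resolveLogin("duke@example.com"));      // duke
    }
}
```

Under the current Java.Net model, the second registration would succeed, and a login by email could no longer tell which account was meant.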

I asked Kevin if he could run a poll on this, and he kindly did. If you are interested, please go vote.

Here are the poll's answer options:

  • I only have one account at Java.Net
  • I have multiple accounts at Java.Net and each has a different email address
  • I have multiple accounts at Java.Net with the same email address but I can change that
  • I absolutely need to have multiple accounts at Java.Net with the same email address
  • I don't have a java.net account
  • I don't know
  • Other

Database Migration

5 hours 49 minutes ago
Comments are closed at the moment due to the website migration.

PSARC 2010/306 - read-only zpools

7 hours 33 minutes ago
You may think that this is already available, but so far you could only set a single dataset (a ZFS filesystem or an emulated ZFS volume) to read-only. This PSARC case proposes such a mode of operation for an entire zpool. According to the case documentation, the pool is made read-only in the following steps:
1) the pool is loaded, but transaction processing is disabled
2) all filesystems and zvols are mounted in read-only mode
3) any intent-log replays are deferred (pending synchronous writes will be replayed once the pool is imported read-write)

Illumos

8 hours 49 minutes ago
Something is going on. The following announcement of an announcement appeared on osol-discuss:

A number of the community leaders from the OpenSolaris community have been working quietly together on a new effort called Illumos, and we're just about ready to fully disclose our work to, and invite the general participation of, the general public.

We believe that everyone who is interested in OpenSolaris should be interested in what we have to say, and so we invite the entire OpenSolaris community to join us for a presentation on at 1PM EDT on August 3, 2010.

You can find out the full details of how to listen in to our conference, or attend in person (we will be announcing from New York City) by visiting http://www.illumos.org/announce (The final details shall be posted there not later than 1PM EDT Aug 1, 2010.)

We look forward to seeing you there!

- Garrett D'Amore & the rest of the Illumos Cast

Really looking forward to this announcement.

Bryan Cantrill about his move to Joyent

9 hours 38 minutes ago
Bryan wrote an interesting article about his move to Joyent:

Add it all up — the history in the cloud space, the disposition to solving tough cloud problems that I want to solve like instrumentation and observability, and the exciting development of node.js — and you have a company in Joyent that I believe could be the next great systems company and I’m deeply honored (and incredibly excited) to be a part of it!

DGC IV: Confluence Upgrades

17 hours 16 seconds ago
This blog post is part of the DevOps Guide to Confluence series. In this chapter of the guide, we’ll have a look at Confluence upgrades.

Confluence Release History and Track Record
I started using Confluence at around version 2.4.4 (released March 2007). A lot has changed since then, mostly for the better. In my early days, Atlassian was spitting out one release after another (typically 3 weeks or less apart), followed by a major release every 3 months. You can check out the full release history on their wiki.

This changed later on: recently there have been fewer minor releases, and bigger major releases delivered every 3.5-4 months. Depending on your point of view, this is good or bad. It now takes longer to get awaited features and fixes, but on the other hand the releases are more solid and better tested.

For major releases, Atlassian now usually offers an Early Access Program, which gives you access to milestone builds so that you can see and mold the new stuff before it ships.

Unlike in the past, the minor versions have been very stable lately and have contained only bugfixes, so it is generally safe to upgrade to them without much hesitation.

The same can't be said about major releases. Even though the stability of x.y.0 releases has been dramatically improving lately, I still consider it risky for a big site to upgrade soon after a major release is announced. Wait for the first bugfix release (x.y.1), monitor the bug tracker, knowledge base and forums, and then consider the upgrade.
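
The rule of thumb above can be written down as a tiny check. This is a hypothetical helper (UpgradePolicy is an invented name) that only encodes "skip x.y.0"; checking the bug tracker, knowledge base and forums is still up to you. It assumes version strings of the form x.y or x.y.z and ignores any non-numeric suffix on the bugfix number.

```java
// Hypothetical helper encoding "wait for x.y.1": x.y.0 releases are "wait",
// x.y.1 and later are upgrade candidates. Not an Atlassian tool.
class UpgradePolicy {
    static boolean isUpgradeCandidate(String version) {
        String[] parts = version.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected x.y or x.y.z, got: " + version);
        }
        if (parts.length == 2) {
            return false; // "3.3" is treated as the x.y.0 release
        }
        // Keep only the leading digits of the third component ("1_01" -> "1").
        String digits = parts[2].replaceAll("\\D.*$", "");
        if (digits.isEmpty()) {
            throw new IllegalArgumentException("no bugfix number in: " + version);
        }
        return Integer.parseInt(digits) >= 1;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"3.3", "3.3.0", "3.3.1", "3.2.1"}) {
            System.out.println(v + " -> " + (isUpgradeCandidate(v) ? "candidate" : "wait"));
        }
    }
}
```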

Having gone through many upgrades myself, I think it is good practice to keep your Confluence site up to date. We have usually been at most one major version behind and frequently on the latest version, but, as I mentioned, we avoid the x.y.0 releases. This has been working well for us.

Staying in Touch and Getting Support
In order to know what's going on with Confluence releases, it is a good idea to subscribe to the Confluence Announcements mailing list. This is a very low traffic mailing list used for release and security announcements only.

Atlassian's tech writers usually do a good job at creating informative release notes, upgrade notes and security advisories, so be sure to read those for each release (even if you are skipping some).

There are several other channels through which people working on Confluence (plugin) development can communicate and support each other. These include:

Despite Atlassian's claims about their legendary support, I found the official support channel rarely useful. Being a DIY guy with reasonable knowledge of Confluence internals, I usually needed more qualified support than the support channel was designed to provide. For this reason, my occasional support tickets usually ended up being escalated to the development team instead of being handled by the support team.

On the other hand, the public issue tracker has been an invaluable source of information and a great communication tool. I wish that more of my bug reports had been addressed, but for the most part I have received a reasonable amount of attention, even though I sometimes had to request escalation to get someone to look at and fix issues that were critical for us.

The biggest hurdle I've experienced with bug fixes and support is that sites of our size are not Atlassian's main focus, and they don't hesitate to say so. I often shake my head when I see features of little value (for us, that is, because they target small deployments and have little to do with core wiki functionality) being implemented and promoted, while major architectural issues, bugs and highly anticipated features go without attention for years. Just browse the issue tracker and you'll get the idea.

Confluence Upgrades
The core of the upgrade procedure depends on the build distribution type you use (standalone, war, or building from source), but fundamentally, in all cases you need to shut down Confluence, replace your app (standalone or war) with the new version and then start it again. An automated upgrade process takes care of updating the database schema, rebuilding the search index and other tasks required for a successful upgrade.

That was the good news. The bad news is that there is a lot more work to be done in order to upgrade a site successfully with as little downtime as possible.

Dev and Test Deployments and Testing
Before you upgrade the real thing, you should first get familiar with the release by upgrading your dev and test environments.

It's often handy to invite your users to do a brief UAT (user acceptance testing) on your test instance as they might catch something that you or your automated tests haven't.

Picking the Outage Window
Based on your users' usage patterns (easily identified with web analytics tools such as Google Analytics), you should pick a time when usage is low. For our global site this has been early morning, at around 4:30 or 5am PT.

When it comes to picking a day, we usually stick with Tuesdays, Wednesdays or Thursdays. Nobody wants to be dealing with an issue during a weekend, when internal (infrastructure) or external (Atlassian) support is harder to get hold of.
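
Picking the window from analytics data boils down to a minimum search. The sketch below is hypothetical: it assumes you have exported average requests per hour for each weekday into a 7 x 24 array (Monday first, hours in whatever timezone your analytics tool reports) and, following the text, it only considers Tuesday through Thursday.

```java
import java.time.DayOfWeek;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: find the quietest hour on an allowed weekday.
// The 7 x 24 data layout is an assumption, not an analytics export format.
class OutageWindowPicker {
    static final Set<DayOfWeek> ALLOWED_DAYS =
            EnumSet.of(DayOfWeek.TUESDAY, DayOfWeek.WEDNESDAY, DayOfWeek.THURSDAY);

    /**
     * @param hourlyRequests hourlyRequests[d][h] = average requests on day d
     *                       (0 = Monday ... 6 = Sunday) during hour h (0-23)
     * @return {dayIndex, hour} of the quietest allowed slot
     */
    static int[] quietestSlot(long[][] hourlyRequests) {
        int bestDay = -1;
        int bestHour = -1;
        long bestCount = Long.MAX_VALUE;
        for (int d = 0; d < hourlyRequests.length; d++) {
            if (!ALLOWED_DAYS.contains(DayOfWeek.of(d + 1))) {
                continue; // skip Monday, Friday and the weekend
            }
            for (int h = 0; h < hourlyRequests[d].length; h++) {
                if (hourlyRequests[d][h] < bestCount) {
                    bestCount = hourlyRequests[d][h];
                    bestDay = d;
                    bestHour = h;
                }
            }
        }
        return new int[] {bestDay, bestHour};
    }
}
```

A single quiet hour is only a starting point; you would still want the hours around it to be quiet enough to hold the whole planned window.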

You also want to communicate the planned outage to your users, so that they are not caught by surprise when you announce an outage on a day when they are releasing important documents on the wiki.

As far as outage duration goes, we usually plan for a 30-minute outage within a 1-hour window, and most of the time we have been able to bring the site back online within 30 minutes or less.

Ready, Set, Go!
The actual deployment consists of several steps, which in our case are:
  • disabling load balancing for both nodes (which automatically redirects all requests to a maintenance page hosted elsewhere)
  • shutting down both nodes
  • disabling MySQL replication between the master and slave db
  • taking a ZFS snapshot of the Confluence Home directory
  • taking a ZFS snapshot of the MySQL db filesystem on the master
  • deploying the new war file
  • starting one node (while the load balancer still ignores it)
  • watching container and Confluence logs for any signs of problems

At this point, we have one of our nodes up and running (hopefully :-)). We can log in with an admin account and check if everything works as expected. The next tasks include:
  • upgrading installed plugins
  • upgrading custom theme (if there is one)
  • running a bunch of automated or manual tests, just to verify that everything is ok

If things are looking good, we can allow the load balancer to start sending requests to our upgraded node. Continue watching logs and eventually deploy the war on the second node and re-enable the MySQL replication.

If any issues occur during the deployment, we can simply:
  • shut down the upgraded node
  • revert to the latest Confluence Home snapshot
  • revert to the latest MySQL db snapshot
  • redeploy the older version of war file
  • either retry the deployment, or re-enable the load balancer and work on resolving the issues outside of the production environment
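
The deploy-or-roll-back flow above can be captured as a small runbook object, so that the rollback path is written down before you need it. UpgradeRunbook, deploy, rollback and Action are hypothetical names. Each Action would wrap one of your own scripts (disabling the load balancer, taking ZFS snapshots, deploying the war file, and so on).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical runbook: run deploy steps in order; on the first failure,
// run every rollback step in order and rethrow the original failure.
class UpgradeRunbook {
    /** One unit of work; an implementation would call your own scripts. */
    interface Action { void run() throws Exception; }

    private static final class Step {
        final String name;
        final Action action;
        Step(String name, Action action) { this.name = name; this.action = action; }
    }

    private final List<Step> deploySteps = new ArrayList<>();
    private final List<Step> rollbackSteps = new ArrayList<>();
    final List<String> log = new ArrayList<>();

    UpgradeRunbook deploy(String name, Action action) {
        deploySteps.add(new Step(name, action));
        return this;
    }

    UpgradeRunbook rollback(String name, Action action) {
        rollbackSteps.add(new Step(name, action));
        return this;
    }

    void execute() throws Exception {
        for (Step step : deploySteps) {
            log.add("deploy: " + step.name);
            try {
                step.action.run();
            } catch (Exception failure) {
                log.add("failed: " + step.name);
                for (Step undo : rollbackSteps) {
                    log.add("rollback: " + undo.name);
                    undo.action.run(); // a failing rollback step propagates its own exception
                }
                throw failure;
            }
        }
    }
}
```

For example, new UpgradeRunbook().deploy("snapshot Confluence Home", ...).rollback("revert to Confluence Home snapshot", ...) mirrors the two lists above. If a rollback step itself fails, its exception replaces the original one; a production version would want to record both.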

In my experience from all the dev, test and prod deployments, we've had to roll back and redo an upgrade from scratch only once or twice. It's very unlikely that you'll have to do it, but better safe than sorry.

If you are building Confluence from patched sources and deploy your own builds frequently, then you might want to consider automating your deployments with tools like Capistrano. This will save you a lot of time and make the deployments more reliable and consistent.

Conclusion
If you do your homework, Confluence is quite easy to upgrade. It's unfortunate that the entire cluster must be shut down for an upgrade even between minor releases, but if you plan your deployment well, you will be able to minimize the downtime to just a few minutes outside of peak hours.

In the next chapter of this guide, we'll take a look at patching and customizing Confluence.

Multi-Core or Hyper-Threaded? Or Both?

22 hours 45 minutes ago

Recently a question was posed to the Sun Ray User Community:  Intel or AMD for Linux Sun Ray Server?

Ford vs Chevy!  Coke vs Pepsi!  What a great blog topic for a Friday.

You could choose from dual Intel 6-core X5670 2.93 GHz ("Nehalem") or dual AMD Opteron 12-core 6168 1.9 GHz ("Magny-Cours").

Of course the "I love my job and I'd really like to keep it" answer would be Intel, since Oracle does not offer any servers based on the 12-core Opteron (just 8-core models). But let's throw caution to the wind and think about this in the context of what a Sun Ray Server in "traditional" mode (i.e. not kiosk mode) really is. It's a desktop. Unlike kiosk mode, where applications normally execute "somewhere else" (e.g. a terminal server, a VM, etc.), in traditional mode the applications execute on the Sun Ray Server. Unlike a desktop, it's multi-user.

So while you definitely want something "server class", you also want something that is going to run *your* applications at the best price/performance ratio.

At the end of the day, both options offer 24 threads. The Intel solution does so with 6 Hyper-Threading Technology (HTT) cores per socket (2 sockets × 6 cores × 2 threads), and the AMD solution with 12 single-threaded cores per socket (2 sockets × 12 cores). There's a roughly 1 GHz clock speed difference (2.93 vs. 1.9 GHz) favoring the Intel solution, but let's not fall prey to the "megahertz myth". Not just yet, anyway.

While you can go out there and find all kinds of "Bench This" and "Spec That" reviews, those tests are generally written to get the most out of any platform. Most of the end-user applications we all use, however, aren't.

So, which design is better for "desktop applications", Intel with HTT or AMD with all those glorious physical cores? 

Here's where I get to use the most popular catch-all answer of all time when it comes to any Server Based Computing or VDI question.

It depends. 

It depends on the applications.  Doesn't everything?

Recent history would indicate that desktop applications prefer the multiple cores over HTT.  Or perhaps better stated, the developers of those applications may prefer multi-core development (or at least find it easier).  

Remember that the Hyper-Threaded Pentium 4 ("Northwood") was actually replaced on the desktop by multi-core processors (see Core Duo). In a traditional Sun Ray environment, where a variety of "desktop applications" execute on the Sun Ray Server, understanding some of the possible reasons HTT was replaced by multi-core is interesting, if not important.

When HTT was introduced, most desktop applications simply weren't able to take advantage of it. On top of that, the HTT chips consumed a lot more power. The end result was a system that increased your energy costs while decreasing your application's performance. Explain that one to your boss, Mr. Technology Influencer, especially with "all those CPUs" showing up in mpstat or perfmon.

None of that, of course, was the fault of the technology. Well, the power consumption was, but not the poor performance or the misconception that threads are physical processors sitting in a socket. Truthfully, our traditional performance monitoring tools still promote that misconception. The poor performance was due to applications not taking advantage of HTT, and to HTT running on a single core. Didn't it seem like, around 2004-05, the catch-all response to every desktop application performance question was: "Pentium 4, you say? Did you try disabling Hyper-Threading in the BIOS?"

With Nehalem, Intel put all that bad PR behind them and brought HTT back to the desktop, but with a twist: it's also multi-core.

This is different, but is it better?  Maybe.  Maybe not.  Probably, but...it depends.  (Ha!)

We know that the OSes are better equipped for HTT (e.g. Solaris is now optimized for it, along with a million other things), and the chips actually don't consume that much more power, so they are "greener". Goodness for the data center.

From my experience, I'd say both Sun Ray Software and the Oracle VDI stack perform better with HTT (based on sizing numbers in kiosk mode and per-core VM sizing data under Solaris) than they did on the non-HTT models of those chips. Considerably better, all other things being equal (clock speed, number of cores, etc.). But those aren't typically considered "desktop applications"; they are more in the realm of pseudo-operating systems, or at least "server systems". Both have been HTT-aware for a long time, but that doesn't exactly help *your* application. Which leads us to the million-dollar question:

How many of the applications that you use today are parallelized so they can execute across multiple threads simultaneously (i.e. are HTT-aware)? If the answer is "very few", then you're not taking advantage of the Intel design, and the physical cores of the AMD solution may actually perform better for your apps, even with the "lower clock speed".

Making applications multi-core aware is fairly easy (says the non-programming "developer"), and most existing applications already support this. However, adding HTT capabilities to existing applications is considerably more work. And sure, some will say that HTT can help certain multi-core aware applications, depending on what they are doing. Though I think a lot of these arguments mistake multi-threading for Hyper-Threading, which is in fact simultaneous multi-threading.
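
One way to see the "all those CPUs" effect from Java: Runtime.getRuntime().availableProcessors() reports the logical processors (hardware threads) available to the JVM, so both dual-socket configurations above should report 24 with Hyper-Threading enabled. Only code that splits its work into threads, as in this sketch, can keep more than one of them busy at a time. ParallelSum and its summing workload are illustrative assumptions; whether the second hardware thread on each core actually speeds up a given workload depends on that workload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: split a CPU-bound job (summing squares of 0..n-1) into one chunk per
// thread. A single-threaded version of the same loop could use only one
// logical processor, however many the OS reports.
class ParallelSum {
    static long sumOfSquares(int n, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = (n + threads - 1) / threads; // ceiling division
            for (int t = 0; t < threads; t++) {
                final int from = t * chunk;
                final int to = Math.min(n, from + chunk);
                Callable<Long> task = () -> {
                    long s = 0;
                    for (int i = from; i < to; i++) {
                        s += (long) i * i;
                    }
                    return s;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : parts) {
                total += f.get(); // waits for each chunk
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int logical = Runtime.getRuntime().availableProcessors();
        System.out.println("Logical processors visible to the JVM: " + logical);
        System.out.println(sumOfSquares(1_000_000, logical));
    }
}
```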

But really, to get the most out of HTT, you need to code your applications a certain way. Intel has guides and all kinds of tools to help application developers get the most out of HTT. But what if those aren't used?

In a single-user use case, the average person might never know that the applications they are using aren't taking advantage of HTT, thanks to the multiple cores and relatively high clock rate. The multi-core HTT chip becomes a Swiss Army knife, so to speak. If your app can take advantage of HTT, great. If it can't, we've got cores. And on top of that, we have speed! That's beautiful for a PC. A single-user PC.

But how well does it scale when we are talking about multiple users running those "non-HTT-aware" apps on the same server? In the AMD design, multi-core (but non-HTT-aware) apps have 24 "physical" cores to work with. What's the trade-off of the "virtual" cores on the HTT chips? Is the higher clock speed enough to make up for it? Are the other features of the HTT chips enough to tip the scales? Maybe. Probably. It depends.

If you are running Sun Ray Server Software in kiosk mode, or choosing a server to be the hypervisor for Oracle VDI, go with Intel and their HTT "Nehalem" processors. You won't be disappointed. At least I haven't been. I'm sure I'd have a lot of good things to say about the AMD as well.

But if you are actually running desktop apps on the Sun Ray Server, and trying to do so at any kind of scale, I'd say it's at least worth doing some investigating and maybe even some application testing at scale.  Then you can really understand what's the best fit for your environment.

GlassFish 3.1 Milestone 3 - Admin console can now speak cluster!

Fri, 2010-07-30 21:56

The GlassFish admin console is often cited as one of GlassFish's strong points. Yes, open source and ease of use can live happily together! After delivering clustering and centralized admin features in Milestones 1 (post) and 2 (post) of the ongoing 3.1 work, it was time in Milestone 3 to deliver the first drop of a graphical user interface that can interact with these features.

The following is a short screencast (hosted on the relocated "GlassFish Channel" property) walking you through the basic scenario of creating a cluster, populating it with instances, starting the cluster, deploying an application to various targets (cluster or standalone instances) and closing with a short part on monitoring the system, all from the graphical user interface:

The GlassFish Open Source Edition 3.1 Promoted Builds are available from http://download.java.net/glassfish/3.1/promoted/ and numerous details are offered on the GlassFish Wiki for this milestone and the upcoming work.

Check out youtube.com/user/GlassFishVideos for more videos soon.

More Than Just List

Fri, 2010-07-30 21:06

I've noticed that a lot of people who use the java.util Collections classes in their APIs will most often use List. Sometimes to the exclusion of the other Collections types. The other core Collections classes Set, Map and Collection are under-represented in most public Java APIs.

In the APIs I've written, especially since the introduction of generics, I try to use the other types more liberally. Why not just use List? To me List implies "ordered" and "duplicates allowed". If ordering isn't a relevant characteristic for the values I'll use Collection. If I wish to indicate that duplicate elements are not allowed I'll use Set or for cases when order does matter, SortedSet. When the collection

Using the right Collection type gives the API user a good hint about key characteristics of the collection data. Using only List in APIs makes those characteristics less obvious and may lead to misuse or abuse of APIs.
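
Here is a hypothetical API in the spirit of the advice above, where each method's return type states what the caller can rely on: List for ordered data with meaningful duplicates, Set for uniqueness, SortedSet when sorted order is part of the contract, Collection when nothing beyond "some values" is promised, and Map for keyed lookup. The Library class and its methods are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical API: each return type documents a property of the data.
class Library {
    // Every checkout in the order it happened; the same book can appear twice.
    private final List<String> checkoutLog = new ArrayList<>();
    // Author of each book, keyed by ISBN.
    private final Map<String, String> authorByIsbn = new HashMap<>();

    void checkout(String isbn, String author) {
        checkoutLog.add(isbn);
        authorByIsbn.put(isbn, author);
    }

    /** Order matters and duplicates are meaningful: List. */
    List<String> checkoutHistory() {
        return Collections.unmodifiableList(checkoutLog);
    }

    /** No duplicates, no particular order: Set. */
    Set<String> isbns() {
        return Collections.unmodifiableSet(authorByIsbn.keySet());
    }

    /** No duplicates, and sorted order is part of the contract: SortedSet. */
    SortedSet<String> authors() {
        return Collections.unmodifiableSortedSet(new TreeSet<>(authorByIsbn.values()));
    }

    /** One entry per book; duplicates possible, order unspecified: Collection. */
    Collection<String> authorOfEachBook() {
        return Collections.unmodifiableCollection(authorByIsbn.values());
    }

    /** Lookup by key: Map. */
    Map<String, String> authorIndex() {
        return Collections.unmodifiableMap(authorByIsbn);
    }
}
```

Returning List from all five methods would compile just as well, but the caller could no longer tell, for example, that authors() never repeats a name and is already sorted.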

Dallas Tech Fest 2010 Trip Report

Fri, 2010-07-30 20:41

Oracle was a gold sponsor of Dallas Tech Fest 2010 - 9 parallel tracks and 5 sessions/track.

I gave a 3-hour hands-on workshop on Java EE 6, GlassFish, and NetBeans. The room was packed during the first part (about 60 people), and most of the audience stayed for the second part. The workshop explained several advantages of Java EE 6, such as simplicity, ease of use, and the richness of the platform. These concepts were demonstrated in multiple coding sessions involving several technologies from the platform. Specifically, it showed:

  • Creating a simple Java EE 6 application using JSP, Servlets 3.0 and Enterprise JavaBeans 3.1
  • Facelets-based page creation using JavaServer Faces 2
  • Contexts & Dependency Injection with JavaServer Faces 2
  • Accessing a database table using the Java Persistence API 2
  • RESTful Web services using JAX-RS

NetBeans IDE-specific features such as Deploy-on-Save and Session Preservation, which boost your development productivity, were also demonstrated using code samples.

The slides are now available:

Java EE 6 Hands-on Workshop at Dallas Tech Fest 2010

Watch Tim Rayburn (one of the conference organizers) talk about where Dallas Tech Fest is today and how they would like to evolve it next year:

One of the attendees mentioned after the workshop that it was like "drinking from the fire hose". The content could be overwhelming for users who are not familiar with NetBeans and new to Java EE 6. As repeated multiple times during the workshop, all the code shown in the workshop is clearly explained in screencast #30 (also in-lined below).

Feel free to re-run the workshop at your own pace and convenience. And you can always post a comment on this blog or the GlassFish forum with questions or requests for clarification.

Check out some pictures from the event:

On the personal front, I met a few avid readers of my blog, connected with some Oracle folks, and briefly met Ted Neward. I found Texas very humid and hot, even early in the morning, but I still managed to squeeze in a 10K run, at a much slower pace:


One of the advantages of staying at Westin hotels is that they have running maps for routes close to the hotel. And if not, their "Westin Workout" gyms are typically well equipped, even with a stepper and an exercise ball ;-) A complimentary upgrade to United First on both legs of the trip, in each direction, certainly added to the overall great experience.

Thanks, Erik & Tim, for giving me the opportunity to speak. I definitely look forward to participating next year!

And here is the complete photo album below:

Technorati: conf dallastechfest dallas glassfish javaee6 netbeans


Award Nominations and Bocce Ball in the Park

Fri, 2010-07-30 17:21

Just a reminder that the JCP is calling for Award Nominations. This year will be the 8th annual awards, celebrating excellence in JCP program membership, JSR development, innovation and leadership. Please submit nominations via the board or send email to either heather at jcp dot org or the PMO (pmo at jcp dot org) today! The nominations will be put to an Executive Committee vote, and the awards will be presented during JavaOne this September.

For fun, I thought I would share some images from the JCP program office potluck last week. We enjoyed a picnic lunch and played bocce ball in the park.

Patrick, Max and Harold observing 

 Joe measuring distance

The PMO:  Harold, Joe, Max, Patrick, Heather

A fine time enjoyed by all :-). 

DGC III: Confluence Configuration and Tuning

Fri, 2010-07-30 15:12
This blog post is part of the DevOps Guide to Confluence series. In this chapter of the guide, we’ll have a look at Confluence configuration and tuning.

There are four ways to modify Confluence's runtime behavior:
  • Config Files in Confluence Home directory
  • Config Files in WEB-INF/classes
  • JVM Options
  • Admin UI

Config Files in Confluence Home directory
The Confluence Home directory contains one or more config files that control the runtime behavior of Confluence. The most important file is confluence.cfg.xml, which must be present in order for Confluence to start. This file can be modified by hand while Confluence is shut down, but it also gets modified by Confluence occasionally (mostly during upgrades). Your changes will be preserved as long as you made them while Confluence was offline.

Another relevant file is tangosol-coherence-override.xml which must unfortunately be used to override Confluence’s lame multicast configuration needed for cluster configuration (see below).

Lastly there is config/confluence-coherence-cache-config-clustered.xml which contains configuration of the Confluence cache. Generally you don't want to modify this file by hand. I’ll come back to talk about cache configuration later in the Admin UI section of this chapter.

In general it is advisable to be very consistent about your environment, so that you can then just have a single version of these files that you can distribute on all servers when needed. This includes the directory layout, network interface names, and so on.

A combination of the first two files will allow you to configure the following:

Clustering
As I mentioned, this configuration is split between two config files. confluence.cfg.xml contains confluence.cluster.* properties, which allow you to set multicast IP, interface and TTL, but not the port. Only tangosol-coherence-override.xml can do that.

The cluster IP is by default derived from a "cluster name" specified via the Admin UI or installation wizard. For some reason Atlassian believes that in an enterprise environment one can just let software pick a random IP and port to run multicast on. I don’t know of any serious datacenter where things work this way. You’ll likely want to set the IP, port, interface name and TTL explicitly, and the only way to do that is to modify these files by hand and ignore the "cluster name" setting in the UI. Make sure that the settings are consistent in both files.
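
Since the same settings live in two files, a small consistency check can catch drift before cluster nodes fail to find each other. This is a hypothetical sketch: parsing confluence.cfg.xml and tangosol-coherence-override.xml is left out, the ClusterSettings class is invented, and only IPv4 multicast addresses (224.0.0.0 to 239.255.255.255) are handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for multicast settings already extracted from one file.
final class ClusterSettings {
    final String multicastIp;
    final int port;      // only meaningful for the Coherence override file
    final String iface;
    final int ttl;

    ClusterSettings(String multicastIp, int port, String iface, int ttl) {
        this.multicastIp = multicastIp;
        this.port = port;
        this.iface = iface;
        this.ttl = ttl;
    }

    /** True for a dotted-quad IPv4 address in 224.0.0.0/4 (the multicast range). */
    static boolean isIpv4Multicast(String ip) {
        String[] octets = ip.split("\\.", -1); // -1 keeps trailing empty parts
        if (octets.length != 4) return false;
        for (String o : octets) {
            if (!o.matches("\\d{1,3}") || Integer.parseInt(o) > 255) return false;
        }
        int first = Integer.parseInt(octets[0]);
        return first >= 224 && first <= 239;
    }

    /** Validates the values and reports mismatches between the two files. */
    static List<String> check(ClusterSettings fromCfgXml, ClusterSettings fromCoherence) {
        List<String> problems = new ArrayList<>();
        if (!isIpv4Multicast(fromCfgXml.multicastIp)) {
            problems.add("not an IPv4 multicast address: " + fromCfgXml.multicastIp);
        }
        if (fromCfgXml.ttl < 0 || fromCfgXml.ttl > 255) {
            problems.add("TTL out of range: " + fromCfgXml.ttl);
        }
        if (!fromCfgXml.multicastIp.equals(fromCoherence.multicastIp)) {
            problems.add("multicast IP differs between files");
        }
        if (!fromCfgXml.iface.equals(fromCoherence.iface)) {
            problems.add("interface differs between files");
        }
        if (fromCfgXml.ttl != fromCoherence.ttl) {
            problems.add("TTL differs between files");
        }
        // Per the text, only the Coherence override file can set the port.
        if (fromCoherence.port < 1 || fromCoherence.port > 65535) {
            problems.add("port out of range: " + fromCoherence.port);
        }
        return problems;
    }
}
```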

DB Connection Pool
Confluence comes with an embedded connection pool. I believe that you can use your own too (if it comes with your servlet container), but I’d suggest sticking with the embedded one, since it is widely used and Atlassian runs their tests with it as well. The pool is configured via the hibernate.c3p0.* properties in confluence.cfg.xml. The most important property is max_size (hibernate.c3p0.max_size), which prevents the pool from opening more than a defined number of connections at a time. You want this number to be higher than your typical peak concurrent request count (are you monitoring that?), but not higher than what your db can handle. We have ours set to 300, which is double our occasional peaks. Don’t forget that in order to take advantage of these connections, you’ll likely also need to increase the worker thread count in your servlet container.
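
The sizing rule can be made explicit. This is a hypothetical helper, not an Atlassian formula. It takes the observed peak of concurrent requests (assumed here to be measured per node), a headroom factor (2.0 mirrors "double our occasional peaks"), the database's connection limit and the number of cluster nodes, since each node runs its own pool. The database limits in the example are assumed values.

```java
// Hypothetical sizing helper for hibernate.c3p0.max_size.
class PoolSizing {
    static int maxPoolSize(int peakConcurrentRequests, double headroom,
                           int dbConnectionLimit, int clusterNodes) {
        if (clusterNodes < 1 || headroom < 1.0) {
            throw new IllegalArgumentException("need at least one node and headroom >= 1");
        }
        // Stay comfortably above the observed peak...
        int wanted = (int) Math.ceil(peakConcurrentRequests * headroom);
        // ...but every node opens its own pool, so all pools together
        // must fit within what the database accepts.
        int perNodeCeiling = dbConnectionLimit / clusterNodes;
        return Math.min(wanted, perNodeCeiling);
    }

    public static void main(String[] args) {
        // Peaks of about 150, doubled, on a 2-node cluster (db limits assumed).
        System.out.println(maxPoolSize(150, 2.0, 1000, 2)); // 300
        System.out.println(maxPoolSize(150, 2.0, 400, 2));  // 200, capped by the database
    }
}
```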

DB Connection
The connection is configured via hibernate.connection.* properties in confluence.cfg.xml. Depending on your db, you might need to specify several settings for the connection to work well and grok UTF-8. For our MySQL db, we need to set the connection URL to something like:

jdbc:mysql://server:3306/wikisdb?autoReconnect=true&useUnicode=true&characterEncoding=utf8

Note that if you are editing this file by hand, you must escape characters that are illegal in XML; for example, each & in the URL above must be written as &amp;. More info about the db connection can be found in the Confluence documentation.
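
Escaping by hand is easy to get wrong, so here is a minimal sketch of doing it programmatically. XmlEscape is a hypothetical helper that handles only the five predefined XML entities, which is enough for the & separators in a JDBC query string. The escaped string is what would go into the hibernate.connection.url value.

```java
// Hypothetical helper: escape a value before pasting it into an XML config file.
class XmlEscape {
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String url = "jdbc:mysql://server:3306/wikisdb?autoReconnect=true&useUnicode=true&characterEncoding=utf8";
        System.out.println(escape(url));
    }
}
```

Running main prints jdbc:mysql://server:3306/wikisdb?autoReconnect=true&amp;useUnicode=true&amp;characterEncoding=utf8.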

Config Files in WEB-INF/classes
Just a side note: if you are building Confluence from source, these files can be found at confluence/confluence-project/conf-webapp/src/main/resources/.

These files are the most cumbersome to work with, because you need to reapply your changes to them after each upgrade. I'll describe how we use our automated patching machinery to do this in a future chapter of this guide. For now, let's just go over the available config files and what you can change in them.

atlassian-user.xml - used to configure user provisioning, e.g. LDAP. For more info read the docs.

confluence-init.properties - this file allows you to specify the path to Confluence Home directory. There is a better way to set this; see the JVM Options section below.

log4j.properties - modify logging preferences. This can also be done via the UI, but AFAIK those changes are not preserved after a restart or upgrade.

seraph-config.xml - controls authentication framework. You'll likely need to modify this file if you have a custom authenticator and login page.

I should note that there are many other (usually xml) configuration files bundled with individual jars in WEB-INF/lib, but those rarely need to be modified.

JVM Options
Another way to configure certain settings is via JVM options. From the complete list of recognized options, these are the ones we use:

-Dcom.atlassian.user.experimentalMapping=true - this is a critically important setting for us with 180k users. Without it, our cluster panics due to data overload (CONF-12319). Unfortunately, despite Atlassian’s claims that this experimental feature is production ready, it broke soon after release and then again recently, so you’ll have to patch the atlassian-user module to get it to work.

-Dconfluence.disable.peopledirectory.anonymous=true - for big public deployments, the people directory is a privacy risk and generally useless for anonymous users, so we have it disabled for them.

-Dconfluence.disable.mailpolling=true - early on we decided that we don’t want people to build up mail archives on our site. While the feature is useful for small internal wikis, it’s too much of a risk with little reward to provide it on a public wiki. Unfortunately, this option only disables mail fetching. The UI for setting up mail archives will still be present in the wiki; you'll have to patch Confluence to remove it.

I didn't learn about -Dconfluence.home until recently. I would much rather use it than mess with the confluence-init.properties file in WEB-INF/classes.
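-D flags simply become Java system properties, which is presumably how Confluence reads them. The hypothetical JvmOptionsReport class below only prints what the JVM it runs in was started with; it does not reproduce Confluence's own handling of these options.

```java
/** Prints the -D options discussed in this section as the current JVM sees them. */
final class JvmOptionsReport {

    // Option names taken from the text.
    static final String[] OPTIONS = {
        "com.atlassian.user.experimentalMapping",
        "confluence.disable.peopledirectory.anonymous",
        "confluence.disable.mailpolling",
        "confluence.home",
    };

    /** Returns "name = value", or "name = (not set)" if the property is absent. */
    static String describe(String name) {
        String value = System.getProperty(name);
        return name + " = " + (value == null ? "(not set)" : value);
    }

    public static void main(String[] args) {
        for (String name : OPTIONS) {
            System.out.println(describe(name));
        }
    }
}
```

Run it as, e.g., java -Dconfluence.disable.mailpolling=true JvmOptionsReport. With Tomcat, such flags usually go into CATALINA_OPTS (for example in bin/setenv.sh). To inspect a server that is already running, the JDK's jinfo -sysprops <pid> lists the system properties of another JVM.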

Admin UI
Most of the Confluence settings can be configured via the Confluence admin interface. The downside is that this configuration is not versioned, and there is no easy way to see diffs or roll back unless you want to hack the db and replace data from backups. With that in mind, let's look at the most important settings.

General Configuration
Server Base Url - make sure this is set up correctly, otherwise Confluence and its plugins won’t work properly.

Users see Rich Text Editor by default - we have this set to off. In the past, many RTE bugs caused headaches for our writers, especially those who did lots of editing. In Confluence 3.2 and 3.3 the editor has improved a lot, and it might be time for us to reconsider this decision.

CamelCase Links - a few years ago this used to be one of THE wiki features, but as wikis have matured and people started creating more and more content, the automatic linking started to cause more problems than it solved. We have it off.

Threaded Comments - very useful; make sure it’s on.

Remote API (XML-RPC & SOAP) - we have ours on, but I patched the remote API code to restrict access to it.

Compress HTTP Responses - OMG, please turn this on if it isn't already. It’s a major performance booster. Alternatively, you might want to do the compression in your webserver, as Tim pointed out in the comments below.

JavaScript served in header - we have this on, but for better performance it should be off. Unfortunately, that breaks many plugins and legacy code that use obtrusive JavaScript. Since this option has been around for a while, it might be worth just setting it to off and dealing with the remaining broken things as they are identified.

User email visibility - we have this set to visible to admins only, but our power users found it to be a collaboration barrier, so I patched the code and made emails visible to our global employees group in addition to the admin group. It would be nice if Confluence allowed such a configuration out of the box.

Anonymous Access to Remote API - No sane person would leave this on. If I were in charge, I would go as far as removing it from the Confluence product.

Anti XSS Mode - This is a very handy feature. It is not 100% bulletproof, but since its introduction it has significantly decreased the number of XSS exploits in Confluence.

Attachment Maximum Size (B) - I mentioned this one already in the first chapter when discussing the db configuration. If you are running a cluster (or think that you eventually will), set this to a low value. Ours is 5MB.
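The field takes a value in bytes, so 5MB has to be converted. A trivial Java sketch, assuming binary megabytes (1MB = 1024 × 1024 bytes); with decimal megabytes the value would be 5000000 instead.

```java
/** Converts a size in mebibytes to the byte value expected by the "(B)" field. */
final class AttachmentLimit {

    static long mebibytesToBytes(long mib) {
        return mib * 1024L * 1024L;
    }

    public static void main(String[] args) {
        System.out.println(mebibytesToBytes(5)); // prints 5242880
    }
}
```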

Connection Timeouts - these options are pretty handy when you have lots of feed macros, gadgets and other plugins that pull content from remote sites. To prevent worker threads from piling up in your servlet container, don’t go beyond the default 10 seconds (which is already pretty high).
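The same idea applies to any code that fetches remote content from inside a request thread: always set both timeouts. A minimal Java sketch using the standard HttpURLConnection; it is not how Confluence implements this setting, and the feed URL is a placeholder.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Creates HTTP connections whose connect and read waits are bounded. */
final class RemoteFetch {

    // The 10 second default mentioned in the text.
    static final int TIMEOUT_MS = 10_000;

    /** Creates an HTTP connection with both timeouts set; no network I/O happens yet. */
    static HttpURLConnection open(String address) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setConnectTimeout(TIMEOUT_MS); // max wait to establish the connection
        conn.setReadTimeout(TIMEOUT_MS);    // max wait for each blocking read
        return conn;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL; calling conn.getInputStream() would actually connect.
        HttpURLConnection conn = open("http://example.com/feed.xml");
        System.out.println(conn.getConnectTimeout() + " ms / " + conn.getReadTimeout() + " ms");
    }
}
```

The read timeout applies to each blocking read rather than to the whole transfer, so a server that trickles data slowly can still hold a thread for longer than 10 seconds.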

Daily Backup Administration
As I previously mentioned, this backup feature is useless for anything but tiny sites. Disable it.

Manage Referrers
Collecting referrers is ok, but don’t display them publicly if you run a site on the Internet. Otherwise you risk exposing internal-only URIs that might contain confidential information.

Languages
Most of our documentation and content is written in American English, but unfortunately Atlassian doesn’t provide such a language pack. I just patch the default Australian English pack to get a US English pack. It works great and is almost no hassle to maintain.

User macros
I discourage their use in enterprise environments. The lack of versioning, automated testing and documentation makes them a nightmare to maintain. Just create Confluence plugins for everything you need.

PDF Export Language Support
This is a tricky one. It took us quite a while to find a single font that could be used to generate PDFs in almost all languages. Finally we found soui_zhs.ttf, which is distributed with OpenOffice. It’s a huge file, but it works like a charm for all kinds of non-Western languages.

Themes
For reasons I’ll discuss later, we disabled all the themes except for our custom one, which is the global and default space theme. To disable a theme, you have to go to the plugins view and disable the appropriate theme plugins.

Cache Statistics
The name of this section in the UI is misleading, because not only can you view cache statistics here, but more importantly you can fully control the cache size via the UI. And in this case, I’m really glad that there is a UI to manage the cache config xml file, which due to its size is really hard to work with by hand. The changes you make via the UI are persisted in the Confluence Home directory and propagated throughout the cluster.

Out of all the things you can tune via the admin UI, the cache tuning will have the biggest impact on your site’s performance. Confluence ships with cache settings optimized for smaller sites, so increasing the cache size is unavoidable for larger deployments.

Tuning the cache settings is a time-consuming process because you need to balance memory consumption against performance improvements. Usually I revisit the cache stats once a month and look for caches that are performing badly because the number of objects allowed in them is too low. The Confluence caching system is composed of many caches that are all controlled via this UI.

The best indicator of an overflowing cache is when the "Effectiveness" value is low (under 70-80%) AND “Percent Used” value is high (over 80%) AND usually the “Expired” value will be relatively high compared to “Hit” value in the same cell. This means that Confluence needs to go to the DB too often, even though it could cache the data in memory if the cache was bigger.
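The rule of thumb above can be written down as a small Java sketch, e.g. to flag suspicious rows while reviewing the cache statistics. The thresholds are assumptions to tune, and Confluence itself offers no such check.

```java
/**
 * The "overflowing cache" rule of thumb, written as code. Thresholds are
 * assumptions: the text gives 70-80% for Effectiveness (75 is an arbitrary
 * midpoint) and "over 80%" for Percent Used.
 */
final class CacheHeuristic {

    static final double MIN_EFFECTIVENESS_PCT = 75.0;
    static final double MAX_PERCENT_USED = 80.0;

    /** Low effectiveness AND a nearly full cache suggests the cache is too small. */
    static boolean looksOverflowing(double effectivenessPct, double percentUsedPct) {
        return effectivenessPct < MIN_EFFECTIVENESS_PCT && percentUsedPct > MAX_PERCENT_USED;
    }

    /** Supporting signal from the text: Expired relatively high compared to Hit. */
    static double expiredToHitRatio(long expired, long hits) {
        return hits == 0 ? Double.POSITIVE_INFINITY : (double) expired / hits;
    }

    public static void main(String[] args) {
        // Made-up readings for one cache row, not real statistics.
        System.out.println(looksOverflowing(55.0, 98.0));     // prints true
        System.out.println(expiredToHitRatio(4_000, 10_000)); // prints 0.4
    }
}
```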

If you don’t understand what all the cache names and numbers mean, don’t worry about that too much. As long as you don’t make any dramatic changes too quickly and you monitor your JVM heap usage, you can’t break anything.

As you increase the cache sizes, you’ll eventually start running out of heap space. That’s why you need to monitor the JVM and increase the -Xmx value as needed. If the number of concurrent users increases, you might also need to slightly increase the -Xmn value (see the JVM Tuning chapter for more info).
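For a quick look at how close the heap is to -Xmx, the standard java.lang.Runtime methods are enough. A minimal sketch; in production you would more likely watch the same numbers via JMX or GC logs.

```java
/** Reports current heap usage as a percentage of the maximum heap (-Xmx). */
final class HeapCheck {

    /** used = total - free; returned as a percentage of max. */
    static double usedPercentOfMax(long total, long free, long max) {
        return 100.0 * (total - free) / max;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        double pct = usedPercentOfMax(rt.totalMemory(), rt.freeMemory(), rt.maxMemory());
        System.out.printf("heap used: %.1f%% of max (%d MiB)%n", pct, rt.maxMemory() >> 20);
    }
}
```

Because used memory includes garbage that has not been collected yet, a single reading overstates live data; watch the trend, ideally right after full GCs.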

I wish Atlassian would provide better descriptions for all the available caches, because unless you know Confluence internals well, you won’t know what you are doing and that doesn’t feel good. Additionally, I’d like to see a way to limit memory usage, not the number of objects, because their size varies. Ideally, I'd really like to be able to just say "Use 3GB of memory for cache and distribute it in the most efficient way. Oh and let me know if you need more or less memory to work effectively". It would be better if Atlassian moved away from an in-process cache which in my opinion is not a good fit for Confluence. Maybe we'll get there one day.

Plugins
This section of the Admin UI is where you can install, uninstall, enable and disable plugins and their modules. There is also a Plugin Repository, which additionally allows you to install plugins from Atlassian’s remote servers or user-specified URIs. The recently released Atlassian Universal Plugin Manager will eventually replace the latter (or both?); I’m glad to see that happening.

I suggest that you disable plugins that you don’t use or don’t want your users to use as soon as possible. We disabled all the bundled themes because we wanted to provide users with only one custom theme, developed and maintained by us (I’ll explain the reasoning in a future chapter). For security reasons, the html and html-include macros should in my opinion be disabled on all but family Confluence deployments. And for performance reasons, the Confluence Usage Stats plugin is not suitable for bigger deployments.

Plugin installation is very easy to do. That’s both good and bad. The plugin framework provided by Confluence is a very sophisticated piece of software which allows you to install and uninstall plugins on the fly without any need to restart the server. Need to quickly install a fixed version of a buggy plugin without disturbing hundreds or thousands of users that are currently using your site? Done. That’s how easy it is.

On the other hand, it is tempting to install plugins just because they have cool names or promise great features. You can do that in your dev or test environment, but in production you should only install plugins that you picked after some serious consideration.

This is what I look for when deciding whether to install a plugin or not:
  • was the functionality provided by the plugin requested by a larger group of users, or is the plugin needed for site administration purposes?
  • was the plugin developed and tested in-house? If not, is it supported by Atlassian? If not, can we or a respected Atlassian partner support it should problems arise?
  • is the plugin compatible with our Confluence version? Does it have a track record of being compatible, or was it made compatible with new Confluence versions as they were released?
  • are there no major unresolved bugs in the areas of performance, scalability, data integrity and security?
  • does the plugin have an automated test suite with good test coverage?
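The checklist can be captured as a simple Java record, e.g. as part of an internal review form. The field names are ours; only the rule that every answer must be “yes” before a trial comes from the text.

```java
/** One plugin evaluation; each field is one question from the checklist. */
record PluginReview(
        boolean requestedByManyOrNeededForAdmin,
        boolean supportable,            // in-house, by Atlassian, or by a trusted partner
        boolean compatibleWithOurVersion,
        boolean noMajorUnresolvedBugs,  // performance, scalability, data integrity, security
        boolean goodAutomatedTests) {

    /** Every answer must be "yes" before the plugin goes to a trial. */
    boolean readyForTrial() {
        return requestedByManyOrNeededForAdmin
                && supportable
                && compatibleWithOurVersion
                && noMajorUnresolvedBugs
                && goodAutomatedTests;
    }

    public static void main(String[] args) {
        // Hypothetical plugin without an automated test suite.
        PluginReview review = new PluginReview(true, true, true, true, false);
        System.out.println(review.readyForTrial()); // prints false
    }
}
```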

If you answer “yes” to all of these questions, then you may go ahead and do a trial before installing the plugin in production. Otherwise, you might provide your feedback to the plugin authors and wait until the pending issues are resolved before proceeding.

I don’t want to be harsh, but especially 2-3 years ago, most of the plugins created for Confluence were crap. But as the platform matures and Atlassian partners get more involved, the quality of available plugins has been slowly increasing. The main issue that I see is that the existing plugins are not developed and tested with large-scale deployments in mind. Hopefully things will change as more and more deployments grow beyond small and medium sites. It’s unfortunate that even some commercial plugins suffer from the very same issues that plague plugins created by a bunch of volunteers and enthusiasts. So pick your plugins carefully, do a trial, check for unresolved bugs and existing user complaints, and then decide.

I've been reasonably active in the Atlassian development community and from these interactions, I'd like to highlight the work done by Dan Hardiker (Adaptavist) and Roberto Dominguez (Comalatech). And though I haven't worked with guys from CustomWare, they are also considered to be pretty sharp.

Be especially careful with plugins that provide new macros for the wiki content. Once you install such a plugin you won't be able to uninstall it without breaking wiki pages until all the references to that macro are removed (with tens of thousands of pages and no ability to track the references this might be a big challenge).
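Before uninstalling a macro plugin, you can at least look for references yourself. A minimal Java sketch, assuming you already have page bodies in wiki markup (for example from a space export or the remote API; obtaining them is left out) and assuming macros appear as {name} or {name:params}. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Finds pages whose wiki markup references a given macro. */
final class MacroUsageScan {

    /** Counts "{name}" and "{name:...}" occurrences; "{name-other}" does not match. */
    static int countReferences(String wikiMarkup, String macroName) {
        Pattern p = Pattern.compile("\\{" + Pattern.quote(macroName) + "(?=[:}])");
        Matcher m = p.matcher(wikiMarkup);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    /** Returns the titles of pages that reference the macro at least once. */
    static List<String> pagesUsing(Map<String, String> pages, String macroName) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : pages.entrySet()) {
            if (countReferences(e.getValue(), macroName) > 0) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical page bodies keyed by title.
        Map<String, String> pages = new LinkedHashMap<>();
        pages.put("Home", "Welcome! {html}<b>hi</b>{html}");
        pages.put("FAQ", "{toc} Plain wiki text.");
        System.out.println(pagesUsing(pages, "html")); // prints [Home]
    }
}
```

The match is case-sensitive, body macros such as {html} are counted twice (opening and closing tag), and references inside {code} or {noformat} blocks show up as false positives, so treat the result as a starting point.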

In general, however, try to keep the number of plugins low. It’s better for performance, and you won’t get into trouble as often when you need to upgrade Confluence but some of the plugins you use are not compatible with the new Confluence version.

Conclusion
You should now have a good idea about how to configure Confluence and where this configuration is done. In the next chapters we'll look at upgrading Confluence, patching and more.

A large SMF/FMA putback to OpenSolaris

Fri, 2010-07-30 09:25
Last night a really large putback found its way into the codebase. It centers on the Fault Management Architecture (FMA) of Solaris, and a lot of new features found their way into OpenSolaris. One example is smtp-notify:

smtp-notify is a small, lightweight daemon that is fully managed by the Service Management Facility (SMF). It uses the interfaces delivered in libfmevent to subscribe to both software and FMA problem lifecycle events. Upon receipt of an event, it produces an email notification based on a set of notification preferences which are stored in the SMF service configuration repository.

Another is snmp-notify:

snmp-notify will generate one of two types of SNMP traps, based on the event class. For FMA events (list.* events), snmp-notify will generate the existing sunFmProblemTrap trap as defined in /etc/net-snmp/snmp/mibs/SUN-FM-MIB.mib. For software events (swevent.*), snmp-notify will generate a sunSweventTrap trap as defined in /etc/net-snmp/snmp/mibs/SUN-SWEVENT-MIB.mib.

Furthermore, there are lots of other changes, for example a way to configure the notifications in the SMF repository, or the definition of FMA events for instance state transitions (e.g. a service going online, offline or into maintenance state).
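The snmp-notify rule quoted above boils down to a prefix match on the event class. A Java sketch of just that decision; it is an illustration, not snmp-notify's actual implementation, and swevent.example is a made-up class name.

```java
/** Picks the SNMP trap type for an event class, following the rule described in the post. */
final class TrapSelector {

    static String trapFor(String eventClass) {
        if (eventClass.startsWith("list.")) {
            return "sunFmProblemTrap";  // FMA problem events, SUN-FM-MIB
        }
        if (eventClass.startsWith("swevent.")) {
            return "sunSweventTrap";    // software events, SUN-SWEVENT-MIB
        }
        return null;                    // no trap for other classes in this sketch
    }

    public static void main(String[] args) {
        System.out.println(trapFor("list.suspect"));    // prints sunFmProblemTrap
        System.out.println(trapFor("swevent.example")); // prints sunSweventTrap
    }
}
```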

As SMF runs in each zone on its own, fmd (the fault management daemon) had to be enabled to run in non-global zones as well. This was introduced by PSARC 2010/225:

This RFE will enable fmd and associated SMF services in a non-global Solaris zone. The set of fmd plugins delivered into a zone will be restricted to those that are relevant - the hardware-oriented modules, particularly, will not be delivered in a non-global zone.

The first consumer of the fmd service in a non-global Solaris zone will be SMF; this will work with fmd to achieve snmp and email notifications of instance state transitions.

Happy SysAdminDay

Fri, 2010-07-30 07:44
The world celebrates the 11th SysAdminDay today. To all the users out there: this is a nice opportunity to thank your admin staff for doing migrations at night to keep your business running by day, for making seemingly impossible requirements possible, for keeping the systems up and running, and for answering the questions you could have answered yourself with Google in the time it takes to pick up the phone.

And keep in mind, your admin would do everything for you to restore the operations of your datacenter:

Oracle Premier Support for Operating Systems on non-Oracle hardware

Fri, 2010-07-30 05:12
Yesterday's announcement regarding support subscriptions for HP and Dell x86 systems has a much broader background. In this article I want to share this information with you:

  • Although the recent announcement only talked about HP and Dell, this offer is valid for all certified systems on the HCL. There are 508 server systems on the HCL right now. You will find the HCL here. If a system isn't on the HCL, ask your vendor to certify it. So you are able to buy support for Fujitsu systems as well as for IBM systems, for example.
  • Pricing is socket-based. You pay $1000 per socket per year for Oracle Premier Support for Operating Systems on systems with one to four sockets, and $2000 per socket per year on systems with five or more sockets.
  • As most of my readers are interested in Solaris, I think I know the next question. As you may know, the Software License Agreement for Solaris, for example, only allows you to use Solaris 10 in production for up to 90 days without an entitlement. This support offering provides you a non-perpetual entitlement to run Solaris 10 on non-Oracle hardware. To put it simply: it's a subscription. This entitlement is valid for the same period as your support. If you don't renew the support, you no longer have an entitlement to use Solaris in production after your support expires.
  • The support is provided directly from Oracle.
  • The rules for Sun servers are different: you get a bundled perpetual license with your server and pay only 8% (SW) or 12% (SW+HW) of the net system price for support.
  • Although you now have the choice to run Solaris on non-Oracle x86 servers, I see several advantages to using Solaris on Sun equipment. Those advantages range from "Same vendor of OS and HW" to "Better integration of the system into the Fault Management Architecture of Solaris".
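To make the pricing concrete, here is a small Java sketch of the figures quoted in this post. It treats every system with more than four sockets as the $2000 tier, and since the post does not say how often the Sun percentage is charged, the second method simply returns the amount. Class and method names are hypothetical.

```java
/** Support cost arithmetic using the figures quoted in the post (USD). */
final class SupportPricing {

    /** Oracle Premier Support for Operating Systems on non-Oracle x86, per year. */
    static int yearlyOsSupportUsd(int sockets) {
        if (sockets < 1) {
            throw new IllegalArgumentException("sockets must be >= 1");
        }
        int perSocket = sockets <= 4 ? 1000 : 2000; // 1-4 sockets vs. larger systems
        return sockets * perSocket;
    }

    /** Sun server support: 8% (SW) or 12% (SW+HW) of the net system price. */
    static double sunServerSupportUsd(long netSystemPriceUsd, boolean includeHardware) {
        return netSystemPriceUsd * (includeHardware ? 12 : 8) / 100.0;
    }

    public static void main(String[] args) {
        System.out.println(yearlyOsSupportUsd(2)); // prints 2000
        System.out.println(yearlyOsSupportUsd(8)); // prints 16000
    }
}
```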

(Safe Harbour: Keep in mind that this is a private blog. I don't write in my capacity as an Oracle employee. The interpretation of the rules expressed by an Oracle representative is the authoritative one.)