Monday, March 31, 2014

Troubleshoot the SDL Tridion Deployer Service - Part 1

The Deployer service is step 3, the final step in the SDL Tridion publishing process.
In working with the Deployer, I've come across many different reasons something could go sideways, and over time several best practices, a certain analysis pattern and some specific areas of interest have emerged in the way I look at an issue here.

The purpose of the Deployer service is to complete the publishing process by picking up the transport package from a specified location, usually known as the "incoming" folder, and processing its instructions.  This is done in accordance with the deployer's configuration (a combination of deployment settings and data storage options).
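To illustrate, the incoming location is set in the Queues section of the deployer's configuration file. Below is a minimal sketch of what that section could look like; the path and attribute values are examples only, and the exact schema depends on the Tridion version in use:

    <Configuration Version="6.1">
      <Queues>
        <!-- The deployer polls this folder for transport packages -->
        <Queue Default="true" Verbs="Content" Id="ContentQueue" Workers="10">
          <Location Path="d:/tridion/incoming" WindowSize="20" Workers="10"
                    Cleanup="true" Interval="2s" Type="FileSystem"/>
        </Queue>
      </Queues>
      <!-- Processors, License and other settings follow in the full file -->
    </Configuration>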

It's hard to pinpoint exactly where I begin, hence the brainstorming below.  Typically, if there is a specific error I treat that as the starting line, and the next logical step takes me to the logs; more often than not, though, I feel better equipped to figure out an issue if I know the architecture of the system involved.  It also helps to pay close attention to the "behavior" exhibited.
Time and time again, especially in situations where seemingly unexplainable ghosting behavior is observed, the first question that surfaces is about the number, type and setup of the deployers.  These have actually become my favorite cases: explain the unexplainable!

As I begin the analysis, I make sure I have the following 3 things: 
- the item id
- the Publish Queue transaction id I will need to trace
- the timeframe in which the issue occurred
... and of course, as much clarity as I can muster about the symptoms experienced.

1. Are there multiple (2+) deployers referencing the same incoming folder - the behavior encountered is somewhat mystical in nature: records will not arrive at their destinations, transactions will not show up in the logs, and so content will not be available on the respective websites.
This can happen when two or more deployers are configured to monitor the same incoming folder: a race condition occurs where the deployers compete and pick up transport packages depending on availability, so each transaction ends up processed, and logged, only by whichever deployer happened to grab it.

2. Are the licenses correct - validate the license file path specified in the configuration files, and if necessary, remember to request licenses ahead of time from SDL Customer Support.
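For reference, the license file is typically pointed to from each Content Delivery configuration file with a License element along these lines (the path shown is only an example):

    <!-- Make sure this path exists on the deployer machine and the license has not expired -->
    <License Location="c:/tridion/config/cd_licenses.xml"/>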

3. Are there multiple destinations in the publication target - if so, be aware that each destination needs proper permissions and correct paths or URLs.

4. Is there a "rogue" deployer running somewhere - another mystical behavior, the clue here is to watch if transactions just "disappear" and don't show up in known logs at all, with some other deployer running and picking up jobs.  An interesting place to check in this case is the QUEUE_CONSUMERS table, look for machines activated which should not be. Deactivate what should not be in use anymore.  Note the deployer itself is not registered in the QUEUE_CONSUMERS table, the reason this is mentioned is to serve as a hint for what machines may possibly be carrying it which should not.  It is possible to have a deployer installed and running from a Content Manager server, although the preferred setup for scalability and performance is to decouple Content Delivery from Content Management.

5. Check which hotfixes are installed - perhaps the error message is a clear indication of an issue that has already been fixed.  An excellent source for this is http://www.sdltridionworld.com: I log in and visit the Support section to check which hotfixes have been posted and compare against what has been installed.

6. Check the cd_core and cd_deployer logs - I look here to trace a specific transaction or item.  The best bet is to put these in DEBUG or TRACE mode, but beware that the logs will grow quite rapidly depending on the volume of data published.  Also search for "ERROR" and "WARN" entries.
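As a sketch of what this could look like: in the logback.xml shipped with Content Delivery, the overall level is usually controlled by a property near the top of the file (the property name may differ per version, so verify against your own file):

    <!-- Raise to DEBUG or TRACE while troubleshooting, then set back to INFO;
         verbose logging grows the cd_core and cd_deployer logs very quickly -->
    <property name="log.level" value="DEBUG"/>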

7. Are there jars missing - probably one of the lengthier checks, but it can be scripted as a comparison against a working site or the installation folders.  Typically the errors here will be "Class not found" or "java.lang.NullPointerException", so I follow the stack trace to determine which call may have originated the issue and therefore what is missing.

8. Is the storage configured correctly - beware of what is defaulted and what is inherited.  Beware of what goes to the file system versus the database.  Beware of what is cached.  Make sure the database connection information is up to date and test that the connections work.
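A trimmed cd_storage_conf.xml sketch of the areas I am referring to; the server names, credentials, paths and item mappings below are placeholders, not recommendations:

    <Global>
      <Storages>
        <!-- Database storage: verify server, database, user and password are current -->
        <Storage Type="persistence" Id="defaultdb" dialect="MSSQL"
                 Class="com.tridion.storage.persistence.JPADAOFactory">
          <Pool Type="jdbc" Size="5" MonitorInterval="60" IdleTimeout="120" CheckoutTimeout="120"/>
          <DataSource Class="com.microsoft.sqlserver.jdbc.SQLServerDataSource">
            <Property Name="serverName" Value="DB_SERVER"/>
            <Property Name="databaseName" Value="Tridion_Broker"/>
            <Property Name="user" Value="TridionBrokerUser"/>
            <Property Name="password" Value="*****"/>
          </DataSource>
        </Storage>
        <!-- File system storage: verify the path exists and is writable by the deployer -->
        <Storage Type="filesystem" Id="defaultFile" defaultFilesystem="false">
          <Root Path="d:/tridion/data"/>
        </Storage>
      </Storages>
    </Global>
    <!-- Anything not mapped explicitly inherits the default storage and cached setting -->
    <ItemTypes defaultStorageId="defaultdb" cached="false">
      <Item typeMapping="Binary" storageId="defaultFile" cached="false"/>
    </ItemTypes>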

9. Is the cd_deployer_conf.xml correct - is the XML well-formed, and are the locations and paths correct?

10. Set up hibernate logging - if seemingly database-related errors are seen, this logging can trace most calls to the database.  It comes with a strong warning: information is logged very quickly, so use it to replicate the issue fast and turn it off promptly when done.  Check Tridion StackExchange for steps on how to enable it and what the hibernate entries look like.
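As an example of what those entries could look like, hibernate logging is normally switched on through logger elements in the same logback.xml; the logger names below are standard Hibernate categories rather than anything Tridion-specific:

    <!-- Logs the SQL statements Hibernate issues -->
    <logger name="org.hibernate.SQL" level="DEBUG"/>
    <!-- Logs the bound parameter values - extremely verbose, turn off promptly -->
    <logger name="org.hibernate.type" level="TRACE"/>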


By now, I may have solved the issue; otherwise I am in possession of quite a lot of good information and ready to perform a more in-depth analysis.  If I still cannot determine the root cause or fix the issue, I continue my research to increase the information collected.
In Part 2 of this article, I follow up with more clues, tips and fixes.

1 comment:

  1. Nice tips, Elena! The race condition and rogue deployer scenarios would not have been obvious to me, especially when focusing only on a specific setup and assuming everything else is okay. :-)
