Tuesday, October 8, 2013

Under the Magnifying Glass - Cache Channel Service Troubleshooting Steps

Lately I found myself spending a lot more time troubleshooting the Cache Channel Service, and learning more and more about it. Funny how the more I fixed different issues, the more I started liking it!  
In general, most issues seemed to revolve around connections between the three players involved: the Cache Channel Service, the Deployer and the Website.

So let's track them one by one and place them under the magnifying glass to discover what went wrong.

  • First off, be clear on what type of CCS is installed: Windows service or Java process? Then check the documentation for the matching configuration and installation steps
  • Is CCS running?  Test it with telnet.
    • Telnet 127.0.0.1 1099
    • If the service is running, you should get a blank screen
    • If the service is not running, you will likely see the following: "Could not open connection to the host, on port 1099: Connect failed"
    • Always restart the Cache Channel service before restarting the Broker / Deployer
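
If telnet is not installed on the machine, the same check can be sketched in a few lines of Java. This is only a minimal sketch: the class name CcsPortCheck and the 2-second timeout are my own choices for illustration, and the host and port default to the 127.0.0.1 and 1099 values used above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class CcsPortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out or host unreachable: nothing is answering on this port
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults mirror the telnet test above; pass host and port as arguments to override
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 1099;
        boolean up = isListening(host, port, 2000);
        System.out.println(host + ":" + port + (up ? " is accepting connections" : " is not reachable"));
    }
}
```

Like the telnet test, this only tells you that something accepts connections on the port, not that it is actually the Cache Channel Service.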

    • Is CCS communicating on the same port as the Deployer and Website Cache?  Think of it as people trying to call each other on different phone numbers!  If the ports don't match, that call is not going to happen.
      • In Windows, check that the registry key HKEY_LOCAL_MACHINE->SYSTEM->CurrentControlSet->Services->TCDCacheService shows the port; otherwise, CCS is running on the default port 1099
    • Is CCS running in a 64-bit JVM (as it is a 64-bit Windows service)?
    • What about that RMIChannel setting?  In RMI, the first party (in our case the Cache Channel Service) opens a socket to listen for incoming requests on a specific port.  When a request is received (such as from the Deployer/Website), a different port is used to initiate and respond to it.  This other port is typically chosen by the underlying operating system from a fairly large range, which may be problematic in environments where firewalls keep tight control over which ports are available*.  It is possible, however, to exert tighter control over this by combining configuration settings with code that runs when the (web) application starts.
      • Set <RMIChannel Port="xyzt"> in all Tridion configuration files where the setting is available
      • In Java, set com.tridion.util.TridionRMISocketFactory.setRMIChannelListener(listenerPort);
      • In .NET, set Com.Tridion.Util.TridionRMISocketFactory.SetRMIChannelListener(listenerPort)
    • Check firewalls, both Windows and custom software: are any of the ports blocked?
    • Restart CCS - simple, but it may be all that is needed to apply recent configuration changes.  If running it as a Windows service, stop and start the service.  If running it as a process, try the following** (change the CCS path if yours is different):
      • > cd opt/tridion/cds/common/scripts
      • > ls -la
      • > ./start_ccs.sh stop
      • > ./start_ccs.sh start
      • > ./start_ccs.sh status
    • Try using 'localhost' if all parties are on the same machine
    • Check the cd_core logs from the Deployer, the web application and CCS (if it runs in its own JVM and therefore writes its own logs)
    • In the logs, check the communication from each party to see whether the Deployer wrote notifications and whether the application received them.  There is a good reference on SDL Tridion World***
    • Run netstat -a from the command prompt to get a list of all active connections and of the ports the computer is listening on (both TCP and UDP), then check if the ports you are using are open
    • Google the errors seen in the logs; there may be explanations outside of Tridion!
    • Check the other Content Delivery settings in the documentation to take advantage of some pretty nifty things the cache is capable of, like "FlushCacheDuringDisconnectInterval", which can be used to control the behavior of the cache even when disconnected (keep items in cache or flush on first disconnect)
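
To make the RMIChannel point above a bit more concrete, here is a rough sketch of the general JDK mechanism for pinning RMI's "other" port, using only the standard java.rmi.server.RMISocketFactory class. I am not claiming this is how TridionRMISocketFactory works internally; the class name FixedPortRmiSocketFactory, the resolvePort helper and the example port 1098 are made up for illustration.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.rmi.server.RMISocketFactory;

// Illustrative only: a generic JDK socket factory, not Tridion's implementation
public class FixedPortRmiSocketFactory extends RMISocketFactory {

    private final int listenerPort;

    public FixedPortRmiSocketFactory(int listenerPort) {
        this.listenerPort = listenerPort;
    }

    /** RMI asks for port 0 when it wants "any free port"; swap that for the fixed one. */
    int resolvePort(int requestedPort) {
        return requestedPort == 0 ? listenerPort : requestedPort;
    }

    @Override
    public ServerSocket createServerSocket(int port) throws IOException {
        return new ServerSocket(resolvePort(port));
    }

    @Override
    public Socket createSocket(String host, int port) throws IOException {
        // Outgoing connections are left untouched
        return new Socket(host, port);
    }

    public static void main(String[] args) throws IOException {
        // 1098 is an arbitrary example; use whatever port your firewall allows
        int listenerPort = args.length > 0 ? Integer.parseInt(args[0]) : 1098;
        // Must be called once, before any remote object is exported
        RMISocketFactory.setSocketFactory(new FixedPortRmiSocketFactory(listenerPort));
        System.out.println("Anonymous RMI ports will now be bound to " + listenerPort);
    }
}
```

The idea is simply that the firewall then only needs to allow the known listening port plus this one fixed port, instead of a whole range picked by the operating system.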
    By now, I hope the magnifying glass has been put away because one of the checks above helped restore caching to its duty, but if not, there is an entire armada of magnifying glasses on tridion.stackexchange.com.  Try posting your special circumstances there!

    Cheers, and go have fun tracing!


    * http://www.netcluesoft.com/rmi-through-a-firewall.html
    ** The Tridion LiveDoc will tell you how you can create your own scripts to start/stop CCS
    *** http://www.sdltridionworld.com/articles/sdltridion2011/analyzing_object_cache.aspx