Feature Switches, Inheritance and Agile with Scala & JMX on the JVM

By William Narmontas, Sep 30, 2016

Remember the last time a critical service in production misbehaved and you had to revert your changes, or restart the services to pick up new configuration? Neither do I.

In one of my recent contracts, 10 days before production deployment, I began work on a feature that would actually take 2–3 months. Management had not accounted for the complexity of the task, but no worries: we have Agile and Play Scala at our disposal.

Once you know that you won't make a deadline, all risk is gone, and only certainty is left.

I agreed with the project manager to implement the simplest, dumbest, safest thing there was, so at least we would not miss the deadline. It would take 1 day to get into master.

This would be our basic interface:

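import scala.concurrent.Future
import scala.xml.Elem
import play.api.mvc.Result // assumed: Play's Result type, since this is a Play Scala app
trait Feature {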
  def execute(inputXml: Elem): Future[Result]
}

And a very dumb implementation in Scala, called ‘StaticFeature’:

class StaticFeature extends Feature {
  def execute(inputXml: Elem): Future[Result] = {
    Future.successful(StaticFeature.staticResult)
  }
}
object StaticFeature {
  val staticResult: Result = ???
}

Knowing that management would keep asking for updates on the final deliverable, I agreed with the Product Owner that it made more sense to first implement something simple that at least partially met the requirement but took less time. It took 2 weeks, and we were able to deploy it to production quite easily. If anything went wrong, we'd be able to switch back to what we had agreed on earlier (i.e. ‘StaticFeature’).

Welcome to ‘AdvancedFeature’, which is 10x more complicated than ‘StaticFeature’, is tested, and can fall back to ‘StaticFeature’ when it is unable to fit the bill.

class AdvancedFeature(staticFeature: StaticFeature)
  extends Feature {
  def execute(inputXml: Elem): Future[Result] = {
    Future(blocking(AdvancedFeature.process(inputXml)))
      .recoverWith {
        case NonFatal(e) => staticFeature.execute(inputXml)
      }
  }
}
object AdvancedFeature {
  def process(inputXml: Elem): Result = ???
}
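To see the fallback in action without Play or scala-xml on the classpath, here is a minimal self-contained sketch. The stand-ins are my assumptions for illustration: the input is a plain String rather than an Elem, Result is a String, and process simply rejects empty input to simulate a failure.

```scala
import scala.concurrent.{Await, ExecutionContext, Future, blocking}
import scala.concurrent.duration._
import scala.util.control.NonFatal

object FallbackSketch {
  // Stand-in types (assumptions): String input instead of scala.xml.Elem,
  // String result instead of Play's Result.
  type Result = String

  trait Feature {
    def execute(input: String): Future[Result]
  }

  class StaticFeature extends Feature {
    def execute(input: String): Future[Result] =
      Future.successful("static")
  }

  class AdvancedFeature(staticFeature: StaticFeature)(implicit ec: ExecutionContext)
      extends Feature {
    def execute(input: String): Future[Result] =
      Future(blocking(AdvancedFeature.process(input)))
        .recoverWith {
          // Any non-fatal exception falls back to the static answer.
          case NonFatal(_) => staticFeature.execute(input)
        }
  }

  object AdvancedFeature {
    // Hypothetical processing: throws on empty input to simulate a failure.
    def process(input: String): Result = {
      require(input.nonEmpty, "empty input")
      s"advanced($input)"
    }
  }

  // Runs the advanced feature and waits for its (possibly fallen-back) result.
  def run(input: String): Result = {
    import ExecutionContext.Implicits.global
    val feature = new AdvancedFeature(new StaticFeature)
    Await.result(feature.execute(input), 5.seconds)
  }
}
```

Calling FallbackSketch.run with a non-empty input goes through process; an empty input throws inside the Future and the caller still gets the static result.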

But what if, rather than failing and falling back, ‘AdvancedFeature’ returned incorrect output? Because the input XML is very loose, there was no guarantee that everything would work as expected, and none of it could be easily unit tested.

So how do we switch back? Revert the code in Git? Redeploy a configuration change? Come on! We have the JVM at our disposal. It has something called JMX (“Java Management Extensions”), which gives you a way to manage your application at runtime. You can do metrics, rebind ports, change configuration options, change logging verbosity, run diagnostics: all sorts of things. So here's how you do it:

First you write a generic interface, a “Management Bean”:

package features
trait FeatureConfiguratorMBean {
  def getFeatureLevel: String
  def setFeatureLevel(name: String): Unit
}

Then you write an implementation which registers this component to the JMX registry and allows its methods to be called remotely:

package features
import java.lang.management.ManagementFactory
import javax.management.ObjectName
class FeatureConfigurator() extends FeatureConfiguratorMBean {
  val platformMBeanServer = ManagementFactory.getPlatformMBeanServer
  val objectName = new ObjectName("app:type=FeatureConfigurator")
  platformMBeanServer.registerMBean(this, objectName)
  def stop(): Unit = {
    platformMBeanServer.unregisterMBean(objectName)
  }
  def getFeatureLevel: String = ???
  def setFeatureLevel(name: String): Unit = ???
}
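The two ??? bodies are left open above. One way to fill them in, as a sketch under my own assumptions, is to hold the level in an AtomicReference and let the caller pick an implementation by reading it on every request. The level names ("static", "advanced", "complex"), the dispatch function and the "sketch:" object name are all hypothetical; the features are reduced to strings to keep the example self-contained.

```scala
import java.lang.management.ManagementFactory
import java.util.concurrent.atomic.AtomicReference
import javax.management.{Attribute, ObjectName}

object FeatureSwitchSketch {
  // Standard MBean convention: interface name = implementing class name + "MBean".
  trait FeatureConfiguratorMBean {
    def getFeatureLevel: String
    def setFeatureLevel(name: String): Unit
  }

  // Hypothetical level names; the article does not list the real ones.
  val Levels: Set[String] = Set("static", "advanced", "complex")

  class FeatureConfigurator(initial: String, objectName: ObjectName)
      extends FeatureConfiguratorMBean {
    private val level = new AtomicReference[String](initial)
    private val server = ManagementFactory.getPlatformMBeanServer
    server.registerMBean(this, objectName)

    def stop(): Unit = server.unregisterMBean(objectName)

    def getFeatureLevel: String = level.get()

    // Reject unknown names so a typo in a JMX console cannot select a missing level.
    def setFeatureLevel(name: String): Unit = {
      require(Levels.contains(name), s"unknown feature level: $name")
      level.set(name)
    }
  }

  // Reads the level on every call, so a change takes effect on the next request.
  def dispatch(configurator: FeatureConfigurator, input: String): String =
    configurator.getFeatureLevel match {
      case "complex"  => s"complex($input)"
      case "advanced" => s"advanced($input)"
      case _          => "static"
    }

  def demo(): List[String] = {
    val name = new ObjectName("sketch:type=FeatureConfigurator")
    val configurator = new FeatureConfigurator("complex", name)
    try {
      val before = dispatch(configurator, "x")
      // Same effect as editing the FeatureLevel attribute in a JMX console.
      ManagementFactory.getPlatformMBeanServer
        .setAttribute(name, new Attribute("FeatureLevel", "static"))
      val after = dispatch(configurator, "x")
      List(before, after)
    } finally configurator.stop()
  }
}
```

The getter/setter pair is what makes JMX expose a single writable attribute named FeatureLevel, which is the value you edit from a console.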

Now, when your app runs, you open up Java Mission Control (jmc in a UNIX shell), connect to the app, and change values immediately (pressing Return calls the setFeatureLevel method).

Of course, in production you might want something easier to use. In my case I used jjs, Java 8's JavaScript interpreter, to connect to the remote running process and change the configuration value from a shell script.
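The jjs script itself is not reproduced here, so the following is only a sketch of the same idea in Scala, using the standard JMX remote API. It assumes the target JVM was started with remote JMX enabled on a known port (for example through the com.sun.management.jmxremote.port system property) and that the bean is registered under the object name used above. DemoSwitch and localDemo are hypothetical helpers that let the sketch run in-process.

```scala
import java.lang.management.ManagementFactory
import javax.management.{Attribute, MBeanServerConnection, ObjectName}
import javax.management.remote.{JMXConnectorFactory, JMXServiceURL}

object JmxSwitchClientSketch {
  // Sets the FeatureLevel attribute and reads it back to confirm the change.
  def switchLevel(conn: MBeanServerConnection, name: ObjectName, level: String): String = {
    conn.setAttribute(name, new Attribute("FeatureLevel", level))
    conn.getAttribute(name, "FeatureLevel").toString
  }

  // Remote variant: assumes remote JMX is reachable on host:port.
  def switchRemote(host: String, port: Int, level: String): String = {
    val url = new JMXServiceURL(s"service:jmx:rmi:///jndi/rmi://$host:$port/jmxrmi")
    val connector = JMXConnectorFactory.connect(url)
    try {
      switchLevel(connector.getMBeanServerConnection,
                  new ObjectName("app:type=FeatureConfigurator"), level)
    } finally connector.close()
  }

  // Minimal in-process MBean so the sketch can be exercised without a remote JVM.
  trait DemoSwitchMBean {
    def getFeatureLevel: String
    def setFeatureLevel(name: String): Unit
  }

  class DemoSwitch extends DemoSwitchMBean {
    @volatile private var level = "complex"
    def getFeatureLevel: String = level
    def setFeatureLevel(name: String): Unit = { level = name }
  }

  def localDemo(): String = {
    val server = ManagementFactory.getPlatformMBeanServer
    val name = new ObjectName("sketch:type=DemoSwitch")
    server.registerMBean(new DemoSwitch, name)
    try switchLevel(server, name, "static")
    finally server.unregisterMBean(name)
  }
}
```

Wrapped in a small main method, switchRemote is the kind of call an Ops shell script could invoke with a host, port and level name.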

Now the business had the certainty that, instead of being stuck with incorrect behaviour, they would get a lesser version that gives the expected behaviour.

Final implementation, ‘ComplexFeature’:

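class ComplexFeature(serviceA: ServiceA, serviceB: ServiceB,
                     advancedFeature: AdvancedFeature)
  extends Feature {
  def execute(inputXml: Elem): Future[Result] = {
    // process already returns a Future; wrapping the call also catches a
    // synchronous throw, and flatMap(identity) flattens the nested Future.
    Future(ComplexFeature.process(serviceA, serviceB))
      .flatMap(identity)
      .recoverWith {
        case NonFatal(e) => advancedFeature.execute(inputXml)
      }
  }
}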
object ComplexFeature {
  def process(serviceA: ServiceA, serviceB: ServiceB):
    Future[Result] = ???
}

This was the most complicated by far, and the business wanted an A/B-style rollout of the feature across several production environments: run it on the first environment for a few weeks, then on the second, then the third, and finish with the fourth. Should anything go wrong, we resort to the lesser versions and change the level at runtime. Easy & convenient.

This approach was successful. Ops loved it. PO loved it.

If feature C failed, it would fall back to B.
If feature B failed, it would fall back to A.
Ops were able to switch between these levels at runtime.
Ops were able to deploy the same code to different production environments and A/B test the changes.
Ops and Test were able to deploy the same code to production, staging and test environments and verify the behaviour of every single feature.
Ops were able to monitor behaviour and the rates of fall back on DataDog.
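How the fall back rates reached the dashboard is not covered here. As a rough sketch of the counting side only, a wrapper can increment counters around the recoverWith; Counters and withFallback are hypothetical names, and shipping the numbers to a monitoring system is left out.

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.control.NonFatal

object FallbackMetricsSketch {
  // Thread-safe counters that a metrics reporter could poll.
  final class Counters {
    val calls = new AtomicLong(0)
    val fallbacks = new AtomicLong(0)
    def fallbackRate: Double = {
      val total = calls.get()
      if (total == 0) 0.0 else fallbacks.get().toDouble / total
    }
  }

  // Runs primary; on any non-fatal failure counts a fallback and runs fallback.
  def withFallback[A](counters: Counters)(primary: => Future[A])(fallback: => Future[A])(
      implicit ec: ExecutionContext): Future[A] = {
    counters.calls.incrementAndGet()
    Future(primary).flatMap(identity).recoverWith {
      case NonFatal(_) =>
        counters.fallbacks.incrementAndGet()
        fallback
    }
  }

  def demo(): (Long, Long, Double) = {
    import ExecutionContext.Implicits.global
    val counters = new Counters
    val ok = withFallback(counters)(Future.successful("C"))(Future.successful("B"))
    val failed = withFallback(counters)(
      Future.failed[String](new RuntimeException("boom")))(Future.successful("B"))
    Await.result(Future.sequence(List(ok, failed)), 5.seconds)
    (counters.calls.get(), counters.fallbacks.get(), counters.fallbackRate)
  }
}
```

The same counters could be exposed as read-only attributes on a management bean, so whatever already collects JMX data can pick them up.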

Something now is almost always better than nothing now.

But everything now is better than something now, so choose Scala & JDK 8: together they have it all.

Connect up with me on Twitter.

—

Update, 4 Nov 2016: A reader asked a very good question:

So what are the downsides and upsides compared to, say, database switches?

My answer:

No downsides compared to database switches, only upsides. The thing you want to work out is how to execute this trigger since your service is remote.

1. Java Mission Control via SSH tunnelling. You'll need to specify extra Java properties to allow that.
2. A custom script to make the call from the server itself. This is the approach I took in the article. Tooling takes a bit of time to build.
3. Launch something like Hawtio (http://hawt.io/) on the server. This is a very nice, simple approach, but you'll need to secure access to it. It needs Jolokia: https://jolokia.org/
4. Use Jolokia (https://jolokia.org/) with your own administrative panel via REST, for when you want to abstract away the low-level details.

Choose #1 if your developers have access to production machines and know what they're doing (small team, few apps).
Choose #2 if your DevOps and Dev are in close communication but Dev don't have access to production.
Choose #3 if you have more applications running on the machine, maybe even several instances of the same app.
Choose #4 if you are delivering to a customer who has their own Ops team and needs special, reliable control of your stuff.
I think this will warrant a further article.
Databases shouldn't really be used for configuration or management. But of course not all platforms support management facilities.