On every major project I’ve worked on, we ran into a situation where a database query or an HTTP call to a remote service would start timing out and failing. At times that would cause the entire application to degrade or go down completely. We would try to fix every case, often urgently, or manually disable the piece causing trouble so that at least the rest of the application kept running. Eventually we’d introduce timeouts for critical pieces and wrap parts of the code in try/catch blocks.
The truth is that in complex distributed systems failure is inevitable. Meet Phystrix, a PHP library by oDesk that we recently open-sourced. It helps solve the problem in a uniform, clear way.
Let’s say you have a SQL query that finds the top commenter on your website for the last week.
You may be invoking it directly or your ORM may be doing this for you.
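For illustration, such a query might look like the sketch below. The `comments` table, its `user_id` and `created_at` columns, and the `findTopCommenter()` helper are assumptions made up for this example; the `LIMIT` clause as written suits MySQL, PostgreSQL, and SQLite.

```php
<?php
// Hypothetical schema: a `comments` table with `user_id` and `created_at` columns.
const TOP_COMMENTER_SQL = <<<'SQL'
SELECT user_id, COUNT(*) AS comment_count
FROM comments
WHERE created_at >= :since
GROUP BY user_id
ORDER BY comment_count DESC
LIMIT 1
SQL;

/**
 * Returns the row for the most active commenter since $since,
 * or null when nobody has commented in that period.
 */
function findTopCommenter(PDO $pdo, DateTimeImmutable $since)
{
    $statement = $pdo->prepare(TOP_COMMENTER_SQL);
    $statement->execute([':since' => $since->format('Y-m-d H:i:s')]);
    $row = $statement->fetch(PDO::FETCH_ASSOC);

    return $row === false ? null : $row;
}
```

Passing the cutoff date as a bound parameter keeps the SQL free of vendor-specific date functions; "the last week" is then just `new DateTimeImmutable('-7 days')` at the call site.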
One day you notice that your app has slowed down and keeps getting worse, and your clients are complaining. Stressed and working overtime, you finally find out that the culprit is this query, which displays a piece of information on every page load. Until you have it figured out, you disable the widget and let the rest of the application run. Things are back to normal for a while.
It doesn’t even matter what exactly was wrong with the query. You may have misconfigured some RDBMS indexes, or your initial analysis of how the data was going to be used was wrong. It doesn’t have to be a SQL query either; it could just as well be an HTTP call to a backend or to something like a Google API.
With Phystrix, you wrap your SQL in a command (see the Command pattern).
Then, whenever you need to execute it, you instantiate your command and call the execute() method.
Here the command is instantiated via the Phystrix command factory, which makes sure all required dependencies are injected. You could also instantiate the command with the new keyword, but for that you would need to add some static initialization code to your commands.
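To make the shape of a command concrete, here is a self-contained sketch. `SketchCommand` is a hypothetical stand-in written for this article, not Phystrix's own base class; with the library itself you would extend its abstract command and obtain instances from its command factory. `TopCommenterCommand`, the closure standing in for the query, and the `null` fallback are likewise assumptions.

```php
<?php
// Hypothetical stand-in for a command base class (NOT the Phystrix class itself).
abstract class SketchCommand
{
    /** Runs the command; any exception is routed to the fallback. */
    public function execute()
    {
        try {
            return $this->run();
        } catch (Exception $e) {
            return $this->getFallback($e);
        }
    }

    /** The risky operation: a SQL query, an HTTP call, and so on. */
    abstract protected function run();

    /** The value to return when run() fails. */
    abstract protected function getFallback(Exception $e);
}

class TopCommenterCommand extends SketchCommand
{
    private $query;

    /** @param callable $query executes the SQL and returns the row */
    public function __construct(callable $query)
    {
        $this->query = $query;
    }

    protected function run()
    {
        return call_user_func($this->query);
    }

    protected function getFallback(Exception $e)
    {
        return null; // the widget is simply hidden when the query fails
    }
}

// Usage: a healthy query returns its row, a failing one returns the fallback.
$healthy = new TopCommenterCommand(function () {
    return ['user_id' => 42, 'comment_count' => 17];
});
$failing = new TopCommenterCommand(function () {
    throw new RuntimeException('query timed out');
});

$healthyResult = $healthy->execute();
$failingResult = $failing->execute();
```

The point of the pattern is that callers only ever see `execute()`: the failure handling lives in one place instead of in try/catch blocks scattered across the codebase.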
Now, with the command pattern used to isolate the SQL query, Phystrix will count all successful and failed requests. If the query times out more often than allowed within a specific timeframe, Phystrix will block the query and return the result of getFallback() instead. For a while, Phystrix will not attempt to execute the query again, letting the rest of your application function as it should and relieving the database engine.
In case it was a temporary outage, Phystrix will try to execute the query every once in a while: a single attempt to see whether this point of access to the database is back up. If it is, Phystrix unblocks the command and starts gathering success/failure statistics again.
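The behavior described above can be sketched in a few dozen lines. This is a simplified illustration of the circuit breaker idea, not Phystrix's actual code: the class name, the constructor parameters, and the injectable clock are assumptions, and the counters here are never aged out, whereas a real implementation would keep statistics over a rolling time window.

```php
<?php
// Simplified illustration of a circuit breaker (not Phystrix's implementation).
class SketchCircuitBreaker
{
    private $errorThresholdPercentage;
    private $requestVolumeThreshold;
    private $sleepWindowSeconds;
    private $clock;           // callable returning the current time in seconds
    private $successes = 0;
    private $failures = 0;
    private $openedAt = null; // null means the circuit is closed

    public function __construct($errorThresholdPercentage, $requestVolumeThreshold, $sleepWindowSeconds, callable $clock)
    {
        $this->errorThresholdPercentage = $errorThresholdPercentage;
        $this->requestVolumeThreshold = $requestVolumeThreshold;
        $this->sleepWindowSeconds = $sleepWindowSeconds;
        $this->clock = $clock;
    }

    /** Runs $operation unless the circuit is open; returns the fallback on failure or when blocked. */
    public function call(callable $operation, callable $fallback)
    {
        $now = call_user_func($this->clock);
        $wasOpen = $this->isOpen();

        if ($wasOpen && $now - $this->openedAt < $this->sleepWindowSeconds) {
            return call_user_func($fallback); // blocked: the database is not touched
        }

        try {
            $result = call_user_func($operation);
        } catch (Exception $e) {
            if ($wasOpen) {
                $this->openedAt = $now; // test attempt failed: restart the sleep window
            } else {
                $this->failures++;
                $this->tripIfUnhealthy($now);
            }
            return call_user_func($fallback);
        }

        if ($wasOpen) {
            // Test attempt succeeded: close the circuit and start statistics afresh.
            $this->openedAt = null;
            $this->successes = 0;
            $this->failures = 0;
        }
        $this->successes++;

        return $result;
    }

    public function isOpen()
    {
        return $this->openedAt !== null;
    }

    private function tripIfUnhealthy($now)
    {
        $total = $this->successes + $this->failures;
        if ($total >= $this->requestVolumeThreshold
            && 100 * $this->failures / $total >= $this->errorThresholdPercentage) {
            $this->openedAt = $now;
        }
    }
}

// Usage with a fake clock: trip at 50% errors once at least 4 requests were seen, sleep for 5 seconds.
$time = 0;
$attempts = 0;
$breaker = new SketchCircuitBreaker(50, 4, 5, function () use (&$time) { return $time; });
$failingQuery = function () use (&$attempts) { $attempts++; throw new RuntimeException('timeout'); };
$fallback = function () { return 'fallback'; };

for ($i = 0; $i < 6; $i++) {
    $breaker->call($failingQuery, $fallback); // only the first 4 calls reach the query
}
$openAfterFailures = $breaker->isOpen();

$time = 6; // the sleep window has passed, so the next call is the single test attempt
$recovered = $breaker->call(function () { return 'ok'; }, $fallback);
```

Note that the last two calls in the loop never execute the failing query at all: once the circuit is open, the fallback is returned immediately, which is what relieves the struggling backend.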
How Phystrix behaves is fully configuration-driven. Following the examples on the official GitHub page, you can instantiate the command factory by providing a configuration file.
“Circuit Breaker” is the feature responsible for blocking any given command execution. For instance, if you want to disable a command unconditionally, you can do so by setting “forceOpen” to true.
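A configuration file along these lines might look like the sketch below. Only “forceOpen” is named in this article; the other key names follow the Hystrix-style naming used in the project's README as far as I can tell, so treat them as assumptions and verify them against the version you install. `TopCommenterCommand` is a hypothetical command name.

```php
<?php
// Sketch of a configuration file; key names other than 'forceOpen' are assumptions.
return [
    // Settings shared by every command
    'default' => [
        'fallback' => [
            'enabled' => true,
        ],
        'circuitBreaker' => [
            'enabled' => true,
            'errorThresholdPercentage' => 50,    // trip when half of the requests fail
            'requestVolumeThreshold' => 10,      // ...but only after enough requests were seen
            'sleepWindowInMilliseconds' => 5000, // how long to wait before a test attempt
            'forceOpen' => false,
        ],
    ],
    // Settings for one specific command: block it unconditionally
    'TopCommenterCommand' => [
        'circuitBreaker' => [
            'forceOpen' => true,
        ],
    ],
];
```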
You can also pass custom configuration parameters, such as a SQL timeout specific to a particular command.
You can then read this configuration from within your command.
You can also define configuration for a particular context.
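As a rough sketch of how a command-specific value can be resolved, assume that settings for a command are layered over a shared default section. The `sqlTimeoutSeconds` parameter, the `resolveCommandConfig()` helper, and the command names are hypothetical and not part of Phystrix.

```php
<?php
// Hypothetical configuration: 'sqlTimeoutSeconds' is a made-up custom parameter.
$config = [
    'default' => [
        'circuitBreaker' => ['forceOpen' => false],
        'sqlTimeoutSeconds' => 5,
    ],
    'TopCommenterCommand' => [
        'sqlTimeoutSeconds' => 1, // this widget must never hold up a page for long
    ],
];

/**
 * Returns the effective settings for one command:
 * the defaults, overridden by anything defined for that command.
 */
function resolveCommandConfig(array $config, $commandKey)
{
    $defaults = isset($config['default']) ? $config['default'] : [];
    $specific = isset($config[$commandKey]) ? $config[$commandKey] : [];

    return array_replace_recursive($defaults, $specific);
}

$topCommenter = resolveCommandConfig($config, 'TopCommenterCommand');
$other = resolveCommandConfig($config, 'SomeOtherCommand');

$topCommenterTimeout = $topCommenter['sqlTimeoutSeconds'];
$otherTimeout = $other['sqlTimeoutSeconds'];
```

A command with no section of its own simply inherits the defaults, so a custom parameter only needs to be spelled out where it differs.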
Phystrix has a feature set similar to that of Hystrix, the Java library by Netflix. Request cache and request log, custom service locator, and flexible configuration all make Phystrix convenient to use. Read more about them on the official GitHub page.
It’s also possible to use the awesome Hystrix monitoring tool with Phystrix.
See Phystrix Dashboard for that.
After using Phystrix in a production environment for a while, I can see that every project I worked on in the past would have benefited from it. For enterprise, high-load, complex distributed systems with diverse data sources, a tool like Phystrix is simply a must.