Two Sigma strives to leverage massive amounts of data, advanced engineering, and quantitative research through every step of the investment process, from data ingestion to forecasting, portfolio selection, and trade execution. Parameterized systems sit at the heart of each stage. Finding the best settings for these parameters is an ongoing challenge, and we have dedicated significant resources both to thinking about how to evaluate the utility of all our system settings and to building infrastructure that scales those evaluations.
Even with the capability to perform massive amounts of computation on the cloud, we still need to think deeply about how to search the parameter space for the best possible values as quickly and cheaply as possible. For common problems like this, we prefer to adopt open-source solutions and fall back on in-house development when we think it will give us a competitive edge. For this problem, however, we found that the most logical solution was a proprietary service that outperformed everything else we evaluated.
The optimization bottleneck
Some systems are simple or well-studied enough to have closed-form analytic solutions. Others, like increasingly popular deep learning algorithms, have differentiable formulations that make them good targets for powerful gradient-based methods. Unfortunately, much of our research requires intricate simulations where none of these fast optimization methods can be used. We also have cases where the parameters of machine learning algorithms must be tuned and none of the typical tricks work. For these scenarios, we have to turn to techniques that treat the system as a “black-box” function.
We’ve tried a variety of standard techniques to solve this problem. The simplest approach is called a “grid search,” in which every parameter is varied over some range in equal step sizes and every combination is evaluated. While easy to code and use, this approach becomes very expensive and slow for even a moderate number of parameters: with k candidate values for each of d parameters, it requires k^d evaluations. We’ve implemented more sophisticated techniques internally and have also brought in a number of open-source packages that try to solve the problem. Recently, we’ve turned to the increasingly popular field of Bayesian optimization and tried incorporating open source tools like GPyOpt into our process.
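A grid search like the one described above fits in a few lines of Python. The objective function and parameter grid below are purely illustrative; they are not drawn from any actual Two Sigma system:

```python
import itertools

def grid_search(objective, grid):
    """Evaluate every combination of parameter values and return the best.

    With k candidate values for each of d parameters, this costs k**d
    evaluations, which is why grid search scales so poorly."""
    names = list(grid)
    best_params, best_value = None, float("inf")
    for combo in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, combo))
        value = objective(params)
        if value < best_value:
            best_params, best_value = params, value
    return best_params, best_value

# Illustrative objective: minimize a simple quadratic bowl.
grid = {"x": [0, 1, 2, 3, 4, 5], "y": [-2, -1, 0, 1, 2]}
best, value = grid_search(lambda p: (p["x"] - 3) ** 2 + (p["y"] + 1) ** 2, grid)
# best == {"x": 3, "y": -1} with value == 0, found after 30 evaluations
```

Even this toy problem takes 30 evaluations for two parameters with five or six values each; adding a third parameter of similar resolution would push the count past 150.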
None of these approaches have been perfect. Many of the techniques are hard to use and require a lot of tuning themselves to work well. Replacing the problem of tuning parameters with the problem of tuning our tuners isn’t really a solution at all. Even with all the tuning in the world, many of these algorithms will never work well on certain classes of problems. For the internally built tools, we found that the cost of building, updating and maintaining the systems was a greater tax on our resources than we expected.
To solve these challenges, we’ve integrated SigOpt’s optimization service and are now able to get results faster and cheaper than any solution we’ve seen before. We were so pleased with SigOpt that, in addition to becoming a customer, Two Sigma decided to become a strategic partner, which includes investing in SigOpt.
SigOpt optimizes any system at scale
In a departure from our preference for open-source or internally built tools, we trialed SigOpt as the optimization engine in a component of our research platform. With a few lines of Python code, we were able to call the SigOpt optimization algorithms for any of our tuning problems in a standardized way. We then incorporated SigOpt as the optimization engine for multiple research tools, which expose experiment results in a consistent way through the platform’s powerful dashboard. Since then, our researchers have seen multiple benefits that accelerate the development process and amplify their impact on our business.
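The standardized pattern is a suggest/observe loop: ask the optimizer for a candidate, evaluate it, and report the result back. The sketch below substitutes a hypothetical random-search backend for SigOpt’s hosted service (SigOpt’s actual client API differs); only the shape of the loop is the point:

```python
import random

class RandomSearchOptimizer:
    """Hypothetical stand-in for a hosted optimizer: it hands out parameter
    suggestions and records observed objective values."""

    def __init__(self, bounds, seed=0):
        self.bounds = bounds          # {name: (low, high)}
        self.rng = random.Random(seed)
        self.best = None              # (value, params)

    def suggest(self):
        return {name: self.rng.uniform(lo, hi)
                for name, (lo, hi) in self.bounds.items()}

    def observe(self, params, value):
        if self.best is None or value < self.best[0]:
            self.best = (value, params)

def tune(objective, optimizer, budget):
    """The standardized loop: any tuning problem plugs in as `objective`."""
    for _ in range(budget):
        params = optimizer.suggest()
        optimizer.observe(params, objective(params))
    return optimizer.best

opt = RandomSearchOptimizer({"x": (0.0, 10.0)}, seed=0)
best_value, best_params = tune(lambda p: (p["x"] - 3) ** 2, opt, budget=200)
```

Because every tuning problem reduces to the same `objective(params)` interface, swapping the backend (random search, Bayesian optimization, a hosted service) requires no changes to the research code being tuned.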
First, SigOpt drove significant performance gains for our systems. In testing against alternative methods like GPyOpt, SigOpt delivered results much faster. To contextualize this gain, consider one machine learning algorithm with particularly lengthy training cycles. Using GPyOpt, it took 24 days to tune. With SigOpt, tuning took only 3 days and produced a more accurate solution: a better-performing result, discovered 8x faster than we would have found it otherwise. The faster time to tune is often the difference between attempting an optimization and skipping it entirely. In this sense, these performance gains led to a productivity improvement with real impact on our business.
Second, SigOpt offered advanced optimization features that allowed us to solve entirely new business problems. One of the more intuitive examples is multi-metric optimization. When analyzing simulation results, there are many measures of performance. We could try to boil everything down to a single metric, but this would require hard-coding trade-offs in a way that may be suboptimal. Instead, SigOpt allows us to optimize multiple metrics at the same time and see the Pareto-optimal frontier of solutions, which we can analyze using its web dashboard. This feature is also useful in traditional machine learning scenarios where we might want to trade some accuracy for faster inference and want to understand exactly what that trade-off is.
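For intuition, the Pareto-optimal frontier of a two-metric problem can be recovered with a simple dominance check. The sketch below uses made-up (accuracy, latency) pairs, where higher accuracy and lower latency are both better; it is not how SigOpt computes its frontier, just an illustration of the concept:

```python
def pareto_frontier(points):
    """Return the non-dominated (accuracy, latency) pairs.

    A point is dominated if some other point is at least as accurate
    and at least as fast -- i.e., there is no reason to ever pick it."""
    return [
        p for p in points
        if not any(q[0] >= p[0] and q[1] <= p[1] and q != p for q in points)
    ]

candidates = [(0.90, 10.0), (0.80, 5.0), (0.85, 12.0), (0.70, 3.0)]
frontier = pareto_frontier(candidates)
# (0.85, 12.0) is dominated by (0.90, 10.0); the other three points
# form the frontier, each representing a different accuracy/speed trade-off
```

Each frontier point is a defensible choice; which one to deploy depends on how much latency the business can tolerate, which is exactly the trade-off the dashboard makes visible.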
Unexpectedly, the increased power of the optimization algorithm and the new ways of specifying multiple metrics have pushed us to rethink several of our metrics and revealed new insights into our research process.
Finally, SigOpt offers asynchronous parallelization across our distributed compute environment. Other solutions take advantage of our massive clusters but evaluate tasks in batches, waiting for every task within a batch to complete before launching the next set. SigOpt provides a new task to evaluate as soon as one completes, meaning all of our machines stay utilized throughout the optimization process. This gets us answers faster and ensures that we’re using our compute resources efficiently. At our scale, any optimization to the process makes a significant difference.
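The scheduling difference can be sketched with Python’s standard concurrent.futures module. Here `suggest`, `evaluate`, and `report` are placeholders for the real optimization loop; the key is that a finished evaluation immediately frees its worker for the next suggestion rather than idling until the whole batch drains:

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def optimize_async(suggest, evaluate, report, n_workers, budget):
    """Keep every worker busy: as each evaluation finishes, report its
    result and launch the next suggestion, instead of waiting for a
    whole batch of n_workers tasks to complete."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        submitted = min(n_workers, budget)
        pending = {pool.submit(evaluate, suggest()) for _ in range(submitted)}
        while pending:
            # Wake up on the first completion, not the slowest in a batch.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                report(future.result())
                if submitted < budget:
                    pending.add(pool.submit(evaluate, suggest()))
                    submitted += 1
```

With batch scheduling, every round is as slow as its slowest evaluation; with this pattern, a single long-running task no longer stalls the other workers.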
Why empowering our experts is the key to our success
Each of these capabilities is designed to automate and accelerate the parts of the optimization process that do not benefit from domain expertise and are traditionally handled with expensive trial and error or the distracting maintenance of brittle, unscalable open-source methods. As a consequence, each of our researchers has more time to apply their domain expertise to developing new insights and new solutions, with the goal of having an even greater impact on our business.
Practitioners who face optimization bottlenecks similar to Two Sigma’s can learn more about SigOpt by reading its ongoing applied optimization research, reviewing its API docs to see whether it is a potential fit for their team, or signing up for a trial.