The ultimate goal of every software product is to convert inputs (provided by end-users or automatically received from external systems) into valuable outputs (insights).
This is typically done via processing logic (= algorithms), which transforms a number of inputs into a set of outputs.
Based on my experience, these processing-logic components can be divided into 5 categories:
- Calculation: a calculation is a clear mathematical formula that yields a specific value. This category can be identified by the fact that all required information is available as input and that the output is precise, i.e. there is only 1 correct result and it is perfectly repeatable. Examples are the calculation of the average price of a share over the last year, the calculation of the time-weighted rate of return of an investment portfolio, the repayment schedule of a loan or the interest amount to be paid out on a savings account every quarter.
- Data lookups: a data lookup consists of retrieving a result from a database, in its raw form or in a processed form (e.g. sorted, filtered, aggregated…). Typically this consists of a query executed to retrieve specific data. Examples are the lookup of the data of a specific customer, the retrieval of all customer transactions of last month in descending order of transaction amount, or the number of customers and total transaction amount generated last year, grouped by customer country.
- Expert systems: these are trees of (nested) "If-Then-Else" clauses (or similar conditional logic), resulting in a workflow, flowchart or decision tree. This means the algorithm will check one or more conditions to determine which path in the business logic needs to be taken (i.e. in order to determine the next step). Examples are decision rules to determine which type of analysis is required for a credit file, fraud and AML protection rules in the payments domain…
- Optimization problems: these are mathematical calculations with, in principle, only 1 solution, but for which it is often impossible (or too complex or too resource/time-consuming) to know whether the perfect solution has been obtained. Usually these algorithms work with a cost function that needs to be optimized, but it is difficult to know whether a local minimum or the absolute minimum has been reached. Nonetheless, all rules are clearly defined and, given enough processing (calculation) power, an exact result can be obtained. Examples are resource scheduling problems, portfolio rebalancing based on a number of (customer) investment constraints, and recommendation lists or model portfolios… More info can also be found in my blog "Optimisation problems - Far from being a commodity" (https://bankloch.blogspot.com/2020/05/optimisation-problems-far-from-being.html).
- System Identification (= SI) & Machine Learning (= ML) problems: these are problems for which it is difficult or even impossible to describe the rules of how the system (the algorithm) should behave. Instead the algorithm is set up as a black box, for which a large number of coefficients/parameters (typically elements in a matrix) are estimated by training the model. This is done by tuning these coefficients so that the model best reproduces the in- and outputs of the training data provided to it. Typical examples are fraud/AML detection, recommendation engines and engines to define the next-best-action for users, anomaly detection, speech and image recognition…
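The difficulty mentioned under optimization problems, i.e. not knowing whether a local minimum or the absolute minimum has been reached, can be illustrated with a minimal sketch. The cost function, the step size and the starting points below are arbitrary assumptions chosen for illustration; they do not come from a real scheduling or rebalancing problem.

```python
def cost(x: float) -> float:
    # Hypothetical cost function with two valleys (illustration only).
    return x**4 - 3 * x**2 + x


def gradient(x: float) -> float:
    # Derivative of the cost function above.
    return 4 * x**3 - 6 * x + 1


def gradient_descent(start: float, step: float = 0.01, iterations: int = 5000) -> float:
    """Follow the slope downhill from `start` and return where it settles."""
    x = start
    for _ in range(iterations):
        x -= step * gradient(x)
    return x


# Same algorithm, same cost function, two different starting points.
left = gradient_descent(-2.0)
right = gradient_descent(2.0)

print(f"start -2.0 -> x = {left:.4f}, cost = {cost(left):.4f}")
print(f"start  2.0 -> x = {right:.4f}, cost = {cost(right):.4f}")
```

Both runs stop at a point where the slope is zero, but only the run starting at -2.0 ends in the deeper valley. Looking at a single run, nothing tells the algorithm that a better solution exists elsewhere.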
Obviously, many algorithms are a combination of these categories. E.g. a pricing calculation engine (cf. my blog "Calculation engines in Financial Services - A key differentiator in the business strategy") can combine a number of categories:
- Data lookup: to retrieve parameterizable values that can easily be configured by business users (e.g. the percentage of discount granted for specific credit types), but also to retrieve the data required to feed the engine (e.g. the segmentation of the customer)
- Expert system: a set of rules to decide which pricing regime should be applied
- Calculation: a number of calculations need to be applied, like calculating the price based on different factors (e.g. applying a percentage to the transaction amount), applying minimum and maximum thresholds…
- SI & ML: to set up dynamic pricing models, which evolve automatically based on whether or not customers accepted products at a proposed price.
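As a rough sketch of how such a combination could look in code, the example below chains a data lookup, an expert-system rule set and a calculation. All names, segments, rates, discounts and thresholds are hypothetical values invented for illustration, and the SI & ML part (dynamic pricing) is deliberately left out.

```python
# Data lookup: parameter tables that business users could maintain.
# In practice these would live in a database; all values are hypothetical.
DISCOUNT_BY_SEGMENT = {"retail": 0.00, "premium": 0.10, "private": 0.25}
CUSTOMER_SEGMENT = {"C001": "retail", "C002": "private"}

MIN_FEE = 1.00   # hypothetical minimum threshold
MAX_FEE = 50.00  # hypothetical maximum threshold


def lookup_segment(customer_id: str) -> str:
    """Data lookup: retrieve the customer segment, defaulting to retail."""
    return CUSTOMER_SEGMENT.get(customer_id, "retail")


def select_rate(segment: str, amount: float) -> float:
    """Expert system: If-Then-Else rules that pick the pricing regime."""
    if segment == "private":
        return 0.002
    if amount >= 10_000:
        return 0.003
    return 0.005


def price(customer_id: str, amount: float) -> float:
    """Calculation: apply the rate and discount, then the min/max thresholds."""
    segment = lookup_segment(customer_id)
    rate = select_rate(segment, amount)
    fee = amount * rate * (1 - DISCOUNT_BY_SEGMENT[segment])
    return round(min(max(fee, MIN_FEE), MAX_FEE), 2)


print(price("C001", 1_000))    # retail customer, standard regime
print(price("C002", 100_000))  # private customer, capped at MAX_FEE
```

Keeping the three steps in separate functions makes it possible to change a parameter, a rule or a formula independently of the others.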
When defining business logic (or an algorithm) it is important to choose the right category (or right combination of categories).
This is typically a compromise between:
- Flexibility to change the behavior of the algorithm
- Operational Complexity, i.e. how difficult it is to explain/understand a result, but also to validate that the algorithm behaves as expected
- Implementation Complexity, i.e. the complexity to implement the algorithm in code form
- Knowledge of the rules that drive the logic, the ability to describe them easily, and the number of factors to take into account
For example: an algorithm of type "calculation" is not very flexible and requires very precise knowledge and description of all governing rules, but once defined it is typically easy to explain a result and to test it. Given that a certain set of inputs will always lead to the same output(s), it is also easy to set up automated regression tests.
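A minimal sketch of such a regression test is shown below, using the quarterly interest on a savings account as the calculation. The interest convention (annual rate divided by 4, rounded half up to the cent) and the test cases are assumptions made for this example, not a statement of how any particular bank calculates interest.

```python
from decimal import Decimal, ROUND_HALF_UP


def quarterly_interest(balance: Decimal, annual_rate: Decimal) -> Decimal:
    """Interest for one quarter, assuming a simple annual_rate / 4 convention."""
    interest = balance * annual_rate / Decimal(4)
    return interest.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Regression cases: fixed inputs with the output they must always produce.
REGRESSION_CASES = [
    (Decimal("10000.00"), Decimal("0.02"), Decimal("50.00")),
    (Decimal("1234.56"), Decimal("0.015"), Decimal("4.63")),
    (Decimal("0.00"), Decimal("0.02"), Decimal("0.00")),
]


def run_regression() -> bool:
    """Return True when every case still yields its expected result."""
    return all(
        quarterly_interest(balance, rate) == expected
        for balance, rate, expected in REGRESSION_CASES
    )


print("regression passed:", run_regression())
```

Because the same inputs always give the same output, the list of cases can simply grow over time and be re-run after every change.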
On the other hand an algorithm of type "SI & ML" can be very flexible (often it will even adapt itself automatically) and requires less knowledge of the associated rules, but it is often impossible to explain/understand a result and it is nearly impossible to fully validate the algorithm, i.e. to ensure that the algorithm will not give very bad results in certain edge cases.
The implementation of every category of algorithm is furthermore supported by specific implementation software, i.e.
- Calculation: hundreds of (often open source) mathematical libraries exist to calculate any type of result (e.g. libraries with statistical formulas or libraries with financial calculations)
- Data lookup: SQL is the standard for interacting with a database, but all kinds of abstraction layers have been built on top of it to make complex data lookups easier, like Hibernate, QueryDSL, jOOQ, Spring Data…
- Expert systems: obviously every programming language supports If-Then-Else and Case clauses, but there are also hundreds of abstraction layers that help to implement these types of algorithms, like BPMS systems (like TIBCO ActiveMatrix BPM, IBM BPM, Oracle BPM, Camunda BPM, jBPM…), Workflow systems (like Nintex, Zapier, ProcessMaker…) and Business Rule engines (like Kissflow Process, IBM Operational Decision Manager, Drools, Red Hat Decision Manager (formerly JBoss BRMS), Progress Corticon Business Rules Engine, SAS Business Rules Manager, Hyperon…)
- Optimization problems: this is still a bit of an unexplored and immature domain, with little (user-friendly) tooling available, as I also mentioned in one of my previous blogs. Interesting names to look at are JuMP (based on the Julia language), ADMB, GLPK, OpenMDAO, Motulus, OptaPlanner… However, all those tools are still rather complex and therefore difficult to use for non-specialized developers.
- SI & ML: in this space TensorFlow is the best-known abstraction to set up such algorithms, but obviously many alternatives exist, like PyTorch, Keras, Amazon SageMaker, IBM Watson Studio…
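To make the data lookup case concrete, here is a small sketch in plain SQL, run through Python's built-in sqlite3 module against an in-memory database. The table, its columns and the rows are hypothetical; the query mirrors the earlier example of the number of customers and the total transaction amount grouped by customer country.

```python
import sqlite3

# In-memory database with a hypothetical transactions table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (customer_id TEXT, country TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("C001", "BE", 120.0),
        ("C001", "BE", 80.0),
        ("C002", "NL", 300.0),
        ("C003", "BE", 50.0),
    ],
)

# Processed lookup: aggregate per country, largest total first.
rows = conn.execute(
    """
    SELECT country,
           COUNT(DISTINCT customer_id) AS customers,
           SUM(amount) AS total_amount
    FROM transactions
    GROUP BY country
    ORDER BY total_amount DESC
    """
).fetchall()
conn.close()

for country, customers, total_amount in rows:
    print(country, customers, total_amount)
```

An abstraction layer such as the ones listed above would generate a comparable query behind the scenes; the logic of the lookup itself stays the same.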
With the popularity of AI/ML, people try to implement many algorithms with AI/ML logic. While AI/ML is a powerful tool, it comes with a number of disadvantages as well. As such it is best to use it only when the majority of the rules to describe the relation between in- and outputs are unknown. If a part is known, it is probably better to start with other algorithm categories and potentially fine-tune the result with AI/ML.
It is therefore important to approach the categories in the order listed above, i.e. use an exact calculation if possible, otherwise use rules derived via an expert system, data lookup or optimization problem. AI/ML should be the last resort, when it is impossible to properly define the guiding business rules.
This is important, as nowadays people too often consider AI/ML as the first solution to any problem. E.g. in recommendation engines, people are pushing AI/ML more and more, while this might not always be the best idea: a lot of the rules to recommend a product are known, there is often insufficient data available to properly train the AI/ML model, and being able to explain recommendations is often important, as a salesperson needs to support the process or regulators require proof of certain propositions/decisions.
As always in IT, there is no silver bullet and no one solution to rule them all. Instead a deliberate choice needs to be made, where pros and cons are weighed against each other, instead of surfing along on the latest buzz.
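The point about known rules and explainability can be sketched as follows: a recommendation engine built as an expert system can return the reason together with each recommendation. The products, the customer attributes and the rules below are purely hypothetical and only serve to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    age: int
    has_mortgage: bool
    savings: float


# Each rule: (product, condition, human-readable reason). All hypothetical.
RULES = [
    ("home insurance", lambda c: c.has_mortgage, "customer has a mortgage"),
    ("investment fund", lambda c: c.savings >= 25_000, "savings of at least 25,000"),
    ("pension savings", lambda c: 25 <= c.age <= 60, "age between 25 and 60"),
]


def recommend(customer: Customer) -> list[tuple[str, str]]:
    """Return every product whose rule matches, together with the reason."""
    return [
        (product, reason)
        for product, condition, reason in RULES
        if condition(customer)
    ]


for product, reason in recommend(Customer(age=40, has_mortgage=True, savings=1_000.0)):
    print(f"{product}: {reason}")
```

Every recommendation can be traced back to one explicit rule, which is what a salesperson or a regulator would ask for. If enough data becomes available later, an AI/ML score could still be used to rank the products that pass the rules.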