How ML can improve alarms classification to detect market abuse
A crucial balance in financial regulation
In the financial sector, market abuse, a collection of practices that undermine or illegally exploit market operations, threatens market integrity and public trust. Under the Market Abuse Regulation (MAR), adopted by the European Parliament and the Council of the European Union in 2014 and in effect since 3 July 2016, institutions are required to monitor transactions and to develop specific algorithms that identify anomalous patterns in data flows. The MAR, which applies across all EU member states, defines what constitutes an anomalous pattern and the rules to apply for different abuse categories:
- Insider Dealing
- Market Manipulation
- High-Frequency Trading (HFT)
- Cross Product Manipulation
- Inter-Trading Venues Manipulation
- Bid-Ask Spread
This is where market abuse detection (MAD) systems come into play, offering an advanced solution in this space.
A high number of alarms to review
In MAD systems, the parameters defining unusual trading activity are set within certain ranges, and surpassing these limits prompts the system to raise an alarm. Each alarm is then assessed by a person responsible for reporting verified market abuse instances to regulators. Firms must calibrate the parameter thresholds to detect real abuses while minimizing false alarms. Typically, the approach is conservative, leading to an excessive number of alerts that, upon review, turn out to be harmless. This burdens compliance officers with a largely unnecessary workload and undermines confidence in the monitoring tools.
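To make the mechanism concrete, here is a minimal sketch of how a threshold-based rule engine might raise alarms. The rule names, fields, and limits below are purely illustrative, not those of an actual MAD system:

```python
from dataclasses import dataclass

@dataclass
class ThresholdRule:
    name: str     # pattern this rule screens for
    field: str    # feature of the daily activity record to test
    limit: float  # calibrated threshold; exceeding it raises an alarm

# Hypothetical rules with hypothetical limits, for illustration only.
RULES = [
    ThresholdRule("Large daily gross", "gross_traded_value", 500_000.0),
    ThresholdRule("Order frequency", "orders_per_minute", 120.0),
]

def raise_alarms(activity: dict) -> list[str]:
    """Return the names of all rules whose limit the activity exceeds."""
    return [r.name for r in RULES if activity.get(r.field, 0.0) > r.limit]

# Example: this record breaches the gross-value limit only.
print(raise_alarms({"gross_traded_value": 834_162.69, "orders_per_minute": 34}))
```

Tightening a limit suppresses false alarms but risks missing real abuse, which is exactly the calibration dilemma described above.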
A machine learning approach
At LISTGroup, we are committed to leveraging cutting-edge technologies, including artificial intelligence (AI). Building on this commitment, and to overcome the challenge described above, we designed a machine learning (ML) classifier trained on the past decisions of compliance officers to guide their future ones. To do this, we analyzed a dataset of about 360,000 alarms generated by a European tier-one bank over a year (Table 1).
Table 1. Alarms generated per pattern over one year: share of total alarms, review outcomes, and percentage closed.

| Pattern | % of total | Closed | Signaled | Total | Closed % |
|---|---|---|---|---|---|
| Large daily gross | 19.3% | 69855 | 177 | 70032 | 99.75% |
| Insider trading MAR | 18.9% | 68442 | 277 | 68719 | 99.60% |
| Pair crossed trades | 12.7% | 46028 | 4 | 46032 | 99.99% |
| Order frequency | 5.9% | 21568 | 22 | 21590 | 99.90% |
| Painting the tape | 6.0% | 21379 | 440 | 21819 | 97.98% |
| OTC off price | 5.0% | 18155 | 2 | 18157 | 99.99% |
| Daily gross leading price move | 4.7% | 16945 | 152 | 17097 | 99.11% |
| Daily net leading price move | 3.6% | 13006 | 88 | 13094 | 99.33% |
| Smoking | 2.9% | 10724 | 0 | 10724 | 100.00% |
| Unbalanced bid-ask spread | 2.5% | 9134 | 0 | 9134 | 100.00% |
| Crossed trades | 2.3% | 8202 | 323 | 8525 | 96.21% |
| Front running and tailgating | 2.2% | 7995 | 0 | 7995 | 100.00% |
| Decreasing the offer | 1.7% | 6330 | 2 | 6332 | 99.97% |
| Advancing the bid | 1.7% | 6017 | 17 | 6034 | 99.72% |
| Spoofing | 1.4% | 4803 | 393 | 5196 | 92.44% |
| Abusive squeeze | 1.3% | 4538 | 70 | 4608 | 98.48% |
| Momentum ignition | 1.3% | 4517 | 121 | 4638 | 97.39% |
| Marking the close | 1.2% | 4301 | 85 | 4386 | 98.06% |
| Quote stuffing | 0.9% | 3337 | 1 | 3338 | 99.97% |
| Creation of a ceiling in the price pattern | 0.9% | 3195 | 6 | 3201 | 99.81% |
| Creation of a floor in the price pattern | 0.8% | 2738 | 5 | 2743 | 99.82% |
| Attempted position reversal | 0.8% | 2564 | 315 | 2879 | 89.06% |
| Deleted orders | 0.5% | 1721 | 24 | 1745 | 98.62% |
| Position reversal | 0.4% | 1271 | 21 | 1292 | 98.37% |
| Front running | 0.3% | 1042 | 0 | 1042 | 100.00% |
| Wash trades on own account | 0.3% | 976 | 3 | 979 | 99.69% |
| Improper matched orders | 0.2% | 863 | 39 | 902 | 95.68% |
| Placing orders with no intention of executing … | 0.1% | 422 | 0 | 422 | 100.00% |
| Pump and dump | 0.1% | 271 | 14 | 285 | 95.09% |
| Front running and tailgating OTC | 0.1% | 235 | 0 | 235 | 100.00% |
| Ping orders | 0.1% | 232 | 1 | 233 | 99.57% |
| Inter trading venues manipulation | 0.0% | 133 | 0 | 133 | 100.00% |
| Trash and cash | 0.0% | 70 | 4 | 74 | 94.59% |
| Underlying leaded cross product manipulation | 0.0% | 1 | 0 | 1 | 100.00% |
The dataset is labeled with two categories: Closed and Signaled (that is, those sent to the regulators). Roughly 360,000 alarms were raised in total, but as the table shows, the vast majority were closed upon review as not actually related to fraud.
Each alarm is represented by a set of relevant features, including both numerical and categorical variables. Table 2 shows a subset of the features available for the Large Daily Gross (LDG) pattern, which refers to a significantly or unusually high level of gross trading activity observed within a single trading day.
Table 2. Example feature values for a Large Daily Gross alarm.

| Field | Value |
|---|---|
| TRADING_TYPE | Professional |
| CLIENT_TYPE | Professional |
| ENTITY_CATEGORY1 | None |
| ENTITY_CATEGORY2 | S02 |
| INVOLVED_INSTRUMENT_ASSET_CLASS | Equity |
| REFERENCE_MIC | MTAA |
| NUMBER_OF_INVOLVED_ORDER | 34 |
| NUMBER_OF_INVOLVED_TRADES | 76 |
| NUMBER_OF_ALERTS_SAME_CLIENT | 448 |
| NUMBER_OF_ALERTS_SAME_CLIENT_INSTRUMENT | 6 |
| NUMBER_OF_ALERTS_SAME_CLIENT_INSTRUMENT_TYPE | 1 |
| Absolute % intraday price delta_ACTUALVALUE | 28.63 |
| High day price_ACTUALVALUE | 0.656 |
| Low day price_ACTUALVALUE | 0.51 |
| Market daily volume_ACTUALVALUE | 541637 |
| Min % daily volume_ACTUALVALUE | 255.11 |
| Min % exec daily volume_ACTUALVALUE | 55.53 |
| Min daily volume_ACTUALVALUE | None |
| Minimum value of orders/trades_ACTUALVALUE | None |
| Minimum value of single order/trade_ACTUALVALUE | None |
| Value of orders_ACTUALVALUE | 834162.69 |
| Value of trades_ACTUALVALUE | 173260.75 |
| Volume of orders_ACTUALVALUE | 1381786 |
| Volume of trades_ACTUALVALUE | 300755 |
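For illustration, here is one standard way such mixed features could be prepared for a model, assuming pandas and scikit-learn: one-hot encoding for the categorical fields and median imputation for the missing ("None") numeric values. The production pipeline is not described here, and the toy field values (e.g. "Retail", "XPAR") are hypothetical:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Two toy alarms with the mixed (and partly missing) fields of Table 2.
alarms = pd.DataFrame({
    "CLIENT_TYPE": ["Professional", "Retail"],
    "REFERENCE_MIC": ["MTAA", "XPAR"],
    "NUMBER_OF_INVOLVED_TRADES": [76, 12],
    "Min daily volume_ACTUALVALUE": [None, 1500.0],
})

preprocess = ColumnTransformer([
    # One-hot encode categoricals; ignore categories unseen in training.
    ("cat", OneHotEncoder(handle_unknown="ignore"),
     ["CLIENT_TYPE", "REFERENCE_MIC"]),
    # Replace missing numeric values (the "None" entries) with the median.
    ("num", SimpleImputer(strategy="median"),
     ["NUMBER_OF_INVOLVED_TRADES", "Min daily volume_ACTUALVALUE"]),
])

X = preprocess.fit_transform(alarms)  # numeric matrix ready for a classifier
```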
Crafting the most suitable classifier
The primary challenge of training our model comes from the significant imbalance within our dataset, with Closed alarms far outnumbering Signaled ones. This imbalance introduces a risk of asymmetric classification errors. Mistakenly closing an alarm that should have been signaled (a false negative) could have serious regulatory repercussions. Conversely, incorrectly signaling an alarm that should be closed (a false positive) is less impactful, although still undesirable. Balancing the false negative rate (FNR) against the false positive rate (FPR) is our primary objective in the model fine-tuning process. We therefore chose a Random Forest Algorithm (RFA), due to its proficiency in managing imbalanced datasets and its robustness against overfitting.
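As a minimal sketch of this setup, assuming scikit-learn (the post does not name the library or hyperparameters), a random forest can be trained on synthetic data that mimics the roughly 99:1 class imbalance of Table 1, with `class_weight="balanced"` as one common counter-measure:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the alarm dataset: ~1% "signaled" (label 1) vs
# ~99% "closed" (label 0); real features would come from Table 2.
X, y = make_classification(n_samples=50_000, n_features=20,
                           weights=[0.99, 0.01], random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

rfa = RandomForestClassifier(
    n_estimators=500,
    class_weight="balanced",  # re-weights the rare "signaled" class
    random_state=0,
).fit(X_train, y_train)
```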
Decoding ML classification: a look at metrics
The effectiveness of our RFA is measured by two custom metrics: RISK and WORK.
RISK quantifies the chance of mistakenly closing an alarm that should be signaled. Considering 1000 alarms, it is expressed as:
RISK = 1000 × P(RFA=clo) × P(sig|RFA=clo),
where P(RFA=clo) is the probability that RFA predicts ‘closed’, and P(sig|RFA=clo) is the conditional probability of an alert being signaled, given the RFA predicting ‘closed’.
WORK, on the other hand, calculates the residual alarms that require manual checks by the Compliance Officer, and is defined as:
WORK = 1000 × P(RFA=sig).
Here, P(RFA=sig) represents the probability of RFA classifying an alarm as ‘signaled’.
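Both metrics follow directly from a model's predictions on held-out data. A minimal sketch, continuing the previous snippet and assuming label 1 marks the Signaled class; note that P(RFA=clo) × P(sig|RFA=clo) reduces to the fraction of false negatives among all alarms:

```python
import numpy as np

def risk_and_work(y_true, y_pred, sig=1, per=1000):
    """Compute RISK and WORK per `per` alarms from true and predicted labels.

    P(RFA=clo) * P(sig | RFA=clo) is simply the fraction of alarms that are
    truly signal-worthy but predicted closed (the false negatives).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = len(y_true)
    risk = per * np.sum((y_pred != sig) & (y_true == sig)) / n
    work = per * np.sum(y_pred == sig) / n
    return risk, work

# Scoring the classifier from the previous sketch:
risk, work = risk_and_work(y_test, rfa.predict(X_test))
print(f"RISK = {risk:.2f}, WORK = {work:.2f} per 1000 alarms")
```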
Results
Here, we present the results yielded by the classifier and their meaning for day-to-day MAD activities.
Our experiments consist of randomly splitting the data into training and test sets, training the RFA, and then measuring its performance with a confusion matrix (a table with the true classes on the rows and the predicted classes on the columns) to calculate our key metrics, RISK and WORK. The results of 100 such experiments are shown in the box plots above (RISK on the left, WORK on the right), where each green triangle marks the average across all experiments for a given pattern.
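A minimal sketch of this protocol, reusing the synthetic data, model configuration, and metric function from the previous snippets (the post evaluates per pattern, whereas this loop scores the dataset as a whole):

```python
# 100 random train/test splits, each scored with RISK and WORK.
risks, works = [], []
for seed in range(100):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    model = RandomForestClassifier(
        n_estimators=500, class_weight="balanced", random_state=seed
    ).fit(X_tr, y_tr)
    r, w = risk_and_work(y_te, model.predict(X_te))
    risks.append(r)
    works.append(w)

# The distributions of `risks` and `works` are what the box plots summarize.
print(f"RISK: mean={np.mean(risks):.2f} per 1000 alarms")
print(f"WORK: mean={np.mean(works):.2f} per 1000 alarms")
```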
The findings are promising: employing ML reduces the number of alarms to review from about 30,000 per month to roughly 2,593, with an average RISK of missing about 10.5 market abuse cases per month. While this may seem significant, a closer inspection reveals that the risk is mostly associated with complex patterns like "Insider Trading" or "Painting the Tape", which together account for half of the total risk.
Efforts are ongoing to enrich the dataset with relevant features we may have missed, to further refine the model. It is worth noting that the strength of this supervised learning approach to MAD is intrinsically linked to the availability of historical data: a robust history of classified alarms is critical for the algorithm to effectively identify, and adapt to, the various abuse patterns.
This solution equips compliance officers with a tool to prioritize their attention on specific alarms. Integrated into a streamlined system that manages the generated alarms, it allows for a significant reduction in working time. Below we show an example of our implementation in the LookOut product.
On the left part of the screen, you can see the status of each alert together with the metrics returned by the ML classifier, such as the suggested classification, a risk measure, and the model's confidence level. Selecting a specific alarm opens a section on the right containing all the relevant information about the generated alert, which can be used to carry out a deeper investigation.
In conclusion, we have shown the promising role of ML in addressing the increasingly resource-demanding task of market abuse detection, significantly lightening the compliance officer’s load.
A common concern with deploying ML in real-world scenarios is the perceived opacity of the models during their training phase, which can lead to skepticism about their decisions. Stay tuned for our upcoming posts, where we’ll tackle the topic of ML transparency, exploring methods to enhance the classifiers’ interpretability and demystifying the processes behind these powerful tools.
The content is adapted from this series on the LIST team’s blog.