How ML can improve alarm classification to detect market abuse

May 10, 2024

A crucial balance in financial regulation

In the financial sector, market abuse, a collection of practices that undermine or illegally exploit market operations, threatens market integrity and public trust. The Market Abuse Regulation (MAR), adopted by the European Parliament and the Council of the European Union in 2014 and applicable from 3 July 2016, enforces rigorous screening: institutions are tasked with monitoring transactions and are responsible for developing specific algorithms that identify anomalous patterns in data flows. The MAR, which applies across all EU member states, defines what constitutes an anomalous pattern and the rules that apply to the different abuse categories:

  • Insider Dealing
  • Market Manipulation
  • High-Frequency Trading (HFT)
  • Cross Product Manipulation
  • Inter-Trading Venues Manipulation
  • Bid-Ask Spread

This is where market abuse detection (MAD) systems come into play: they automate this screening by running pattern-specific detection algorithms over the trading data.

A high number of alarms to review

In MAD, each algorithm defines unusual trading activity through parameters set within certain ranges; when an observed value surpasses these limits, the system raises an alarm. The alarm is then assessed by a compliance officer, who is responsible for reporting verified instances of market abuse to the regulator. Firms must calibrate the parameter thresholds to detect real abuse while minimizing false alarms. Typically, the calibration is conservative, so the system generates an excessive number of alerts that turn out to be harmless on review. This burdens compliance officers with a largely unnecessary workload and undermines confidence in the monitoring tools.
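As a minimal illustration of this threshold mechanism, the Python sketch below raises an alarm when a monitored parameter exceeds its limit. The parameter names, the limits, and the rule that a single breach is enough to fire are hypothetical assumptions; actual MAD rule definitions and calibrations are specific to each pattern and institution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """A hypothetical rule parameter: it is breached when the value exceeds `limit`."""
    parameter: str
    limit: float


def breached_parameters(activity: dict[str, float], thresholds: list[Threshold]) -> list[str]:
    """Return the parameters whose observed value exceeds the configured limit.

    Parameters missing from `activity` are treated as not breached.
    """
    return [
        t.parameter
        for t in thresholds
        if t.parameter in activity and activity[t.parameter] > t.limit
    ]


def raise_alarm(activity: dict[str, float], thresholds: list[Threshold]) -> bool:
    # Assumption: one breached limit is enough to send the case to manual review,
    # so low (conservative) limits translate directly into many alarms.
    return bool(breached_parameters(activity, thresholds))


# Hypothetical limits for a "large daily gross"-style check.
LDG_THRESHOLDS = [
    Threshold("pct_daily_volume", 50.0),      # % of the market daily volume
    Threshold("value_of_trades", 100_000.0),  # traded value in the day
]

daily_activity = {"pct_daily_volume": 55.5, "value_of_trades": 80_000.0}
print(breached_parameters(daily_activity, LDG_THRESHOLDS))  # ['pct_daily_volume']
print(raise_alarm(daily_activity, LDG_THRESHOLDS))          # True
```

Lowering either limit can only add alarms, never remove them, which is why a conservative calibration floods reviewers with harmless cases.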

A machine learning approach

At LISTGroup, we prioritize leveraging cutting-edge technologies, including artificial intelligence (AI). In line with this commitment, and to overcome the challenge above, we designed a machine learning (ML) classifier that is trained on the past decisions of compliance officers and guides their future decision-making. To build it, we analyzed a dataset of about 360,000 alarms generated over one year by a European tier-one bank (Table 1).


Table 1. Alarms raised per pattern over one year. Share is each pattern's share of all alarms; Closed % is Closed / Total.

Pattern                                           Share  Closed  Signaled  Total  Closed %
Large daily gross                                 19.3%   69855       177  70032    99.75%
Insider trading MAR                               18.9%   68422       277  68719    99.96%
Pair crossed trades                               12.7%   46028         4  46032    99.99%
Order frequency                                    5.9%   21568        22  21590    99.90%
Painting the tape                                  6.0%   21379       440  21819    97.98%
OTC off price                                      5.0%   18155         2  18157    99.99%
Daily gross leading price move                     4.7%   16945       152  17097    99.11%
Daily net leading price move                       3.6%   13006        88  13094    99.33%
Smoking                                            2.9%   10724         0  10724   100.00%
Unbalanced bid-ask spread                          2.5%    9134         0   9134   100.00%
Crossed trades                                     2.3%    8202       323   8525    96.21%
Front running and tailgating                       2.2%    7995         0   7995   100.00%
Decreasing the offer                               1.7%    6330         2   6332    99.97%
Advancing the bid                                  1.7%    6017        17   6034    99.72%
Spoofing                                           1.4%    4803       193   5196    92.44%
Abusive squeeze                                    1.3%    4538        70   4608    98.48%
Momentum ignition                                  1.3%    4517       121   4638    97.39%
Marking the close                                  1.2%    4301        85   4386    98.06%
Quote stuffing                                     0.9%    3337         1   3338    99.97%
Creation of a ceiling in the price pattern         0.9%    3195         6   3201    99.81%
Creation of a floor in the price pattern           0.8%    2738         5   2743    99.82%
Attempted position reversal                        0.8%    2564       315   2879    89.06%
Deleted orders                                     0.5%    1721        24   1745    98.62%
Position reversal                                  0.4%    1271        21   1292    98.37%
Front running                                      0.3%    1042         0   1042   100.00%
Wash trades on own account                         0.3%     976         3    979    99.69%
Improper matched orders                            0.2%     863        39    902    95.68%
Placing orders with no intention of executing …             422         0    422   100.00%
Pump and dump                                      0.1%     271        14    285    95.09%
Front running and tailgating OTC                   0.1%     235         0    235   100.00%
Ping orders                                        0.1%     232         1    233    99.57%
Inter trading venues manipulation                  0.0%     133         0    133   100.00%
Trash and cash                                     0.0%      70         4     74    94.59%
Underlying leaded cross product manipulation       0.0%       1         0      1   100.00%

Each alarm is labeled with one of two classes: Closed, or Signaled (that is, sent to the regulator). Of the roughly 360,000 alarms raised, the vast majority were closed on review as not related to actual market abuse; fewer than 1% were signaled.
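The imbalance can be quantified directly from Table 1. This short Python sketch recomputes the Closed % column for three of its patterns, using the Closed and Signaled counts exactly as listed, together with the number of closed alarms per signaled one.

```python
# Closed and Signaled counts copied from three rows of Table 1.
TABLE_1_SAMPLE = {
    "Large daily gross": (69855, 177),
    "Painting the tape": (21379, 440),
    "Attempted position reversal": (2564, 315),
}


def closed_pct(closed: int, signaled: int) -> float:
    """Percentage of a pattern's alarms that were closed after review."""
    return 100.0 * closed / (closed + signaled)


for pattern, (closed, signaled) in TABLE_1_SAMPLE.items():
    ratio = closed / signaled  # closed alarms reviewed for every signaled one
    print(f"{pattern}: {closed_pct(closed, signaled):.2f}% closed, "
          f"about {ratio:.0f} closed per signaled alarm")
```

Even for the least imbalanced of these three patterns, a classifier that always predicted Closed would be right about 89% of the time, which is why plain accuracy is not a useful target for this task.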

Each alarm is described by a set of relevant features, including both numerical and categorical variables. Table 2 shows a subset of the features available for the Large Daily Gross (LDG) pattern, which flags an unusually high level of trading activity, or gross traded volume, within a single trading day.

Table 2. A subset of the features of an LDG alarm, with example values.

Field                                            Value
TRADING_TYPE                                     Professional
CLIENT_TYPE                                      Professional
ENTITY_CATEGORY1                                 None
ENTITY_CATEGORY2                                 S02
INVOLVED_INSTRUMENT_ASSET_CLASS                  Equity
REFERENCE_MIC                                    MTAA
NUMBER_OF_INVOLVED_ORDER                         34
NUMBER_OF_INVOLVED_TRADES                        76
NUMBER_OF_ALERTS_SAME_CLIENT                     448
NUMBER_OF_ALERTS_SAME_CLIENT_INSTRUMENT          6
NUMBER_OF_ALERTS_SAME_CLIENT_INSTRUMENT_TYPE     1
Absolute % intraday price delta_ACTUALVALUE      28.63
High day price_ACTUALVALUE                       0.656
Low day price_ACTUALVALUE                        0.51
Market daily volume_ACTUALVALUE                  541637
Min % daily volume_ACTUALVALUE                   255.11
Min % exec daily volume_ACTUALVALUE              55.53
Min daily volume_ACTUALVALUE                     None
Minimum value of orders/trades_ACTUALVALUE       None
Minimum value of single order/trade_ACTUALVALUE  None
Value of orders_ACTUALVALUE                      834162.69
Value of trades_ACTUALVALUE                      173260.75
Volume of orders_ACTUALVALUE                     1381786
Volume of trades_ACTUALVALUE                     300755
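Because the features mix categorical fields (such as CLIENT_TYPE) with numerical ones, and some values are None, they have to be turned into a numeric vector before training. The Python sketch below assumes one-hot encoding for categories and a missing-value indicator plus a sentinel fill value for absent numbers; the post does not say which encoding was actually used, and the category vocabularies are hypothetical.

```python
# Hypothetical category vocabularies; in practice they would be collected
# from the training data.
CATEGORICAL_VOCAB = {
    "CLIENT_TYPE": ["Professional", "Retail"],
    "INVOLVED_INSTRUMENT_ASSET_CLASS": ["Equity", "Bond", "Derivative"],
}
NUMERICAL_FIELDS = ["NUMBER_OF_INVOLVED_TRADES", "Min daily volume_ACTUALVALUE"]


def encode_alarm(alarm: dict, fill_value: float = -1.0) -> list[float]:
    """Turn one alarm's raw fields into a flat numeric feature vector."""
    vector: list[float] = []
    for field, categories in CATEGORICAL_VOCAB.items():
        value = alarm.get(field)
        # One-hot encoding: unknown or missing categories give an all-zero block.
        vector.extend(1.0 if value == category else 0.0 for category in categories)
    for field in NUMERICAL_FIELDS:
        raw = alarm.get(field)
        missing = raw is None or raw == "None"
        # Not every random-forest implementation accepts NaN inputs, so we add an
        # explicit missing-value indicator followed by a sentinel fill value.
        vector.append(1.0 if missing else 0.0)
        vector.append(fill_value if missing else float(raw))
    return vector


# Values from the LDG example in Table 2, given here as strings.
example = {
    "CLIENT_TYPE": "Professional",
    "INVOLVED_INSTRUMENT_ASSET_CLASS": "Equity",
    "NUMBER_OF_INVOLVED_TRADES": "76",
    "Min daily volume_ACTUALVALUE": "None",
}
print(encode_alarm(example))  # [1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 76.0, 1.0, -1.0]
```

Keeping the indicator separate lets a tree split on "value missing" directly instead of relying on the sentinel being far from real values.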

Crafting the most suitable classifier

The primary challenge in training our model is the strong imbalance in the dataset: Closed alarms far outnumber Signaled ones. This imbalance biases a classifier toward predicting Closed, and the two possible errors are not equally costly. Mistakenly closing an alarm that should have been signaled (a false negative, measured by the False Negative Rate, FNR) could have serious regulatory repercussions. Conversely, incorrectly signaling an alarm that should be closed (a false positive, measured by the False Positive Rate, FPR) is less impactful, although still undesirable. Balancing these two quantities is our primary objective when fine-tuning the model. We chose a Random Forest Algorithm (RFA) for its proficiency in managing imbalanced datasets and its robustness against overfitting.
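To make the FNR/FPR trade-off concrete, the Python sketch below computes both rates when an alarm is flagged as signaled whenever its predicted probability of being signaled reaches a threshold. The labels and scores are made-up toy values, and threshold tuning is shown only as one common way to trade FNR against FPR; the post does not state that this is how the RFA was tuned.

```python
def error_rates(y_true: list[int], scores: list[float], threshold: float) -> tuple[float, float]:
    """Return (FNR, FPR) when alarms scoring >= threshold are predicted 'signaled' (1).

    FNR = FN / (FN + TP): share of truly signaled alarms that would be closed.
    FPR = FP / (FP + TN): share of truly closed alarms that would be signaled.
    """
    tp = fn = fp = tn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    fnr = fn / (fn + tp) if fn + tp else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return fnr, fpr


# Toy data: 1 = signaled, 0 = closed; scores are hypothetical model outputs.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.60, 0.45, 0.90]

for t in (0.5, 0.3):
    fnr, fpr = error_rates(y_true, scores, t)
    print(f"threshold={t}: FNR={fnr:.2f}, FPR={fpr:.3f}")
```

On this toy data, lowering the threshold from 0.5 to 0.3 drops the FNR from 0.5 to 0 while the FPR rises from 0.125 to 0.5: fewer missed abuse cases are paid for with more alarms to review.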

Decoding ML classification: a look at metrics

We measure the effectiveness of our RFA with two custom metrics: RISK and WORK.

RISK quantifies the chance of mistakenly closing an alarm that should be signaled. Expressed per 1,000 alarms, it is:

RISK = 1000 × P(RFA=clo) × P(sig|RFA=clo),

where P(RFA=clo) is the probability that the RFA predicts ‘closed’, and P(sig|RFA=clo) is the conditional probability that an alarm should be signaled, given that the RFA predicts ‘closed’.
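By the definition of conditional probability, the product in the RISK formula collapses to a joint probability, which can be read directly off a confusion matrix. Writing N for the number of test alarms and FN for the alarms that should be signaled but are predicted as closed (notation introduced here for illustration):

```latex
\begin{aligned}
\mathrm{RISK} &= 1000 \times P(\mathrm{RFA}=\mathrm{clo}) \times P(\mathrm{sig} \mid \mathrm{RFA}=\mathrm{clo}) \\
              &= 1000 \times P(\mathrm{sig},\ \mathrm{RFA}=\mathrm{clo}) \\
              &\approx 1000 \times \frac{\mathrm{FN}}{N}
\end{aligned}
```

In other words, RISK is the expected number of missed abuse cases per 1,000 alarms.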

WORK, on the other hand, counts the residual alarms, again per 1,000, that still require a manual check by the compliance officer. It is defined as:

WORK = 1000 × P(RFA=sig).

Here, P(RFA=sig) represents the probability of RFA classifying an alarm as ‘signaled’.
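Both metrics can be estimated from the four cells of a test-set confusion matrix. The following Python sketch does so with ‘signaled’ as the positive class; the class and function names are our own, and the counts in the usage line are hypothetical, not results from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts with 'signaled' as the positive class (rows: true, columns: predicted)."""
    tp: int  # should be signaled, predicted signaled
    fn: int  # should be signaled, predicted closed -> missed abuse
    fp: int  # should be closed, predicted signaled -> extra manual review
    tn: int  # should be closed, predicted closed

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.fp + self.tn


def risk(cm: ConfusionMatrix) -> float:
    """RISK = 1000 * P(RFA=clo) * P(sig | RFA=clo), estimated from the counts."""
    predicted_closed = cm.fn + cm.tn
    if predicted_closed == 0:
        return 0.0
    p_clo = predicted_closed / cm.total
    p_sig_given_clo = cm.fn / predicted_closed
    return 1000 * p_clo * p_sig_given_clo


def work(cm: ConfusionMatrix) -> float:
    """WORK = 1000 * P(RFA=sig): alarms per 1,000 left for manual review."""
    return 1000 * (cm.tp + cm.fp) / cm.total


# Hypothetical test-set counts, for illustration only.
cm = ConfusionMatrix(tp=60, fn=4, fp=2000, tn=7936)
print(f"RISK={risk(cm):.2f}, WORK={work(cm):.0f}")  # RISK=0.40, WORK=206
```

Computing RISK as a product of the two estimated probabilities mirrors the definition; numerically it equals 1000 × FN / N.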

Results

Here, we present the results yielded by the classifier and their meaning for day-to-day MAD activities.

Figure: box plots of RISK and WORK per pattern across 100 experiments.


Each experiment consists of randomly splitting the data into training and test sets, training the RFA, and measuring its performance with a confusion matrix (a table with the true classes on the rows and the predicted classes on the columns), from which we compute our key metrics, RISK and WORK. We repeated this 100 times; the results are shown in the box plots above (RISK on the left, WORK on the right), where each green triangle marks the average over all experiments for that pattern.
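The evaluation protocol (random split, training, confusion matrix, metrics, repeated 100 times) can be sketched in Python as follows. This is a simplified illustration under stated assumptions: `threshold_stand_in` is a hypothetical placeholder for the trained RFA, the 70/30 split ratio is our choice (the post does not state it), the data are synthetic, and in the study the procedure would be run separately for each pattern.

```python
import random
from typing import Callable, Sequence

# A classifier is modeled as a function trained on (X_train, y_train) that
# returns predictions for X_test; 1 = signaled, 0 = closed.
FitPredict = Callable[[Sequence, Sequence[int], Sequence], list[int]]


def repeated_holdout(X: Sequence, y: Sequence[int], fit_predict: FitPredict,
                     n_runs: int = 100, test_size: float = 0.3,
                     seed: int = 0) -> tuple[float, float]:
    """Average RISK and WORK over `n_runs` random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(y)))
    n_test = max(1, int(len(y) * test_size))
    risks: list[float] = []
    works: list[float] = []
    for _ in range(n_runs):
        rng.shuffle(indices)  # a fresh random split for every run
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        y_pred = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                             [X[i] for i in test_idx])
        y_test = [y[i] for i in test_idx]
        # Only two confusion-matrix quantities are needed for the metrics.
        fn = sum(1 for t, p in zip(y_test, y_pred) if t == 1 and p == 0)
        predicted_signaled = sum(1 for p in y_pred if p == 1)
        risks.append(1000 * fn / n_test)  # RISK = 1000 * FN / N
        works.append(1000 * predicted_signaled / n_test)  # WORK = 1000 * P(RFA=sig)
    return sum(risks) / n_runs, sum(works) / n_runs


def threshold_stand_in(X_train, y_train, X_test):
    # Hypothetical placeholder for the RFA: flags alarms whose first feature > 0.5.
    return [1 if row[0] > 0.5 else 0 for row in X_test]


# Synthetic, heavily imbalanced data (about 5% signaled) for demonstration.
data_rng = random.Random(42)
y = [1 if data_rng.random() < 0.05 else 0 for _ in range(1000)]
X = [[label + data_rng.gauss(0, 0.3)] for label in y]
mean_risk, mean_work = repeated_holdout(X, y, threshold_stand_in)
print(f"mean RISK={mean_risk:.2f}, mean WORK={mean_work:.1f}")
```

Averaging over many random splits, as the green triangles in the figure do, smooths out the variance that a single split would show when signaled alarms are this rare.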

The findings are promising: employing ML reduces the number of alarms to review from about 30,000 to roughly 2,593 per month, with an average RISK of missing about 10.5 market abuse cases per month. While this may seem high, a closer inspection reveals that the risk is concentrated in complex patterns such as “Insider Trading” and “Painting the Tape”, which together account for half of the total risk.
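The monthly figures relate to the per-1,000 metrics by simple scaling. Assuming about 30,000 alarms per month (roughly 360,000 per year divided by 12), this Python sketch converts the reported monthly numbers back to the per-1,000 scale and computes the implied reduction in review load; the helper names are our own.

```python
MONTHLY_ALARMS = 30_000  # assumption: about 360,000 alarms per year / 12 months


def per_thousand(monthly_count: float, monthly_alarms: int = MONTHLY_ALARMS) -> float:
    """Express a monthly count on the per-1,000-alarms scale used by RISK and WORK."""
    return 1000 * monthly_count / monthly_alarms


def per_month(per_1000: float, monthly_alarms: int = MONTHLY_ALARMS) -> float:
    """Inverse conversion: a per-1,000 metric to an expected count per month."""
    return per_1000 * monthly_alarms / 1000


work_rate = per_thousand(2_593)  # alarms per 1,000 still needing manual review
risk_rate = per_thousand(10.5)   # missed abuse cases per 1,000 alarms
reduction = 1 - 2_593 / MONTHLY_ALARMS
print(f"WORK ~ {work_rate:.1f}, RISK ~ {risk_rate:.2f}, review load cut by {reduction:.1%}")
# WORK ~ 86.4, RISK ~ 0.35, review load cut by 91.4%
```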

We are currently working to enrich the dataset with relevant features that we may have missed, to further refine the model. Note that the strength of this supervised learning approach in MAD depends directly on the availability of historical data: a substantial history of classified alarms is critical for the algorithm to identify, and adapt to, the various abuse patterns.

This solution gives compliance officers a tool to focus their attention on specific alarms. Integrated into a streamlined system that manages the generated alarms, it significantly reduces working time. Below we show an example of our implementation in the LookOut product.

On the left part of the screen, you can see the status of each alarm and the metrics returned by the ML classifier for it, such as the suggested classification, a risk measure, and the model's confidence level. When a specific alarm is selected, a panel opens on the right with all the relevant information about it, which can be used to carry out a deeper investigation.
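Per-alarm outputs like these can drive a review queue. The Python sketch below is our own illustration, not the LookOut implementation: each alarm carries a hypothetical model probability of being signaled, and the 0.5 suggestion cut-off, the confidence definition (the probability assigned to the suggested class), and the ordering rule are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredAlarm:
    alarm_id: str
    p_signaled: float  # hypothetical model probability that the alarm should be signaled

    @property
    def suggestion(self) -> str:
        # Assumption: a plain 0.5 cut-off for the suggested classification.
        return "signal" if self.p_signaled >= 0.5 else "close"

    @property
    def confidence(self) -> float:
        # Probability the model assigns to its suggested class.
        return max(self.p_signaled, 1 - self.p_signaled)


def review_queue(alarms: list[ScoredAlarm]) -> list[ScoredAlarm]:
    """Order alarms so that the most likely abuse cases are reviewed first."""
    return sorted(alarms, key=lambda a: a.p_signaled, reverse=True)


queue = review_queue([
    ScoredAlarm("A-1", 0.02), ScoredAlarm("A-2", 0.81), ScoredAlarm("A-3", 0.47),
])
for alarm in queue:
    print(alarm.alarm_id, alarm.suggestion, f"{alarm.confidence:.2f}")
```

Note that A-3 is suggested for closing with low confidence (0.53), which is exactly the kind of alarm a reviewer may still want to open.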

Figure: ML alert dashboard with classifier metrics and a detailed alert panel.

In conclusion, we have shown the promising role of ML in addressing the increasingly resource-demanding task of market abuse detection, significantly lightening the compliance officer’s load.

A common concern with deploying ML in real-world scenarios is the perceived opacity of how models reach their decisions, which can lead to skepticism about those decisions. Stay tuned for our upcoming posts, where we will tackle ML transparency, exploring methods to improve the classifiers' interpretability and demystify the processes behind these powerful tools.

The content is adapted from this series on the LIST team’s blog.

ION Markets
