2025 marks a turning point for artificial intelligence in finance. After a year of conversations dominated by consumer chatbots, the talk has now shifted to whether industrial-grade AI can withstand regulatory scrutiny, operate within risk frameworks, and survive a compliance audit.

This shift is captured clearly in the HKMA GenAI Sandbox report, developed with Cyberport, which offers one of the most structured and transparent examinations to date of how banks are working to deploy GenAI.

Rather than asking what a model can generate, it focuses on whether a model can be trusted, validated, governed, and deployed safely across high-stakes workflows.

The HKMA GenAI Sandbox tested GenAI capabilities across three critical domains: risk management, anti-fraud, and customer experience.

HKMA GenAI report
Source: HKMA

Key Considerations in Evaluating GenAI Sandbox Applications

To determine which proposals made the cut in the HKMA GenAI sandbox, four strategic factors guided the prioritisation process.

HKMA GenAI Sandbox report
Source: HKMA

First, evaluators looked for a high level of innovation, specifically targeting solutions that introduced novel ideas or methodologies capable of unlocking new digitalisation opportunities.

Next, this was balanced against the complexity of the solutions, ensuring that proposals demonstrated the technical sophistication and intricacy necessary to drive genuine advancement and add value.

Beyond technical merit, the framework prioritised broader impact and responsibility.

Proposals needed to show a significant expected contribution to the industry, such as addressing sector-wide challenges through applications that are replicable and scalable.

Finally, the selection process required strict adherence to the principle of fair use. This ensured that the sandbox’s pooled computing resources would be used in a responsible, efficient, and equitable manner.

Since the sandbox operates on shared infrastructure, this particular criterion assesses whether the solution is engineered to be efficient, ensuring it doesn’t monopolise computing power at the expense of other participants.

How Data Strategy and Preparation Affect GenAI Model Development

The performance and reliability of GenAI solutions depend on the quality of the data they are trained or fine-tuned on.

Test data are equally critical for rigorously evaluating the accuracy and contextual relevance of the model's outputs.

The sandbox observed that planning and preparing datasets was one of the most resource-intensive and critical phases of development. Datasets require meticulous sourcing, cleaning, labelling, and curation to ensure they are representative, unbiased, and fit for purpose.

This phase can consume as much time and resources as fine-tuning the model itself, if not more.

Data Collection

To ensure datasets were inclusive and dependable, HKMA GenAI sandbox participants used a mix of independent public and proprietary sources. These datasets were curated to cover critical dimensions such as product lines, languages, and customer personas.

One bank used a recursive web crawler to extract information from its public website. The crawler moved through all accessible links from the main domain and automatically fed the content into a datastore.

This approach captured the website’s information completely without human intervention, and it preserved a highly accurate copy of the site’s content.
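The report does not include the bank's crawler code, so the following is a minimal sketch of the recursive-crawl idea: a breadth-first walk over every same-domain link reachable from the main page, feeding each page's content into a datastore. The `fetch_page` helper (which would wrap an HTTP client and HTML parser in practice) is a hypothetical name introduced here for illustration.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

def crawl(start_url, fetch_page, max_pages=1000):
    """Breadth-first crawl of all links reachable from start_url that
    stay on the same domain. fetch_page(url) -> (page_text, link_list)."""
    domain = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    datastore = {}                      # url -> extracted content
    while queue and len(datastore) < max_pages:
        url = queue.popleft()
        text, links = fetch_page(url)
        datastore[url] = text           # feed content into the datastore
        for link in links:
            absolute = urljoin(url, link)
            # follow only links that remain on the bank's own domain
            if urlparse(absolute).netloc == domain and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return datastore
```

Because the crawler tracks visited URLs and restricts itself to one domain, it terminates on sites with cyclic links and ignores external destinations.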

Data Pre-processing

Once datasets were collected, they required pre-processing before being used to improve data quality and protect privacy.

To maintain consistency and support efficient fine-tuning, banks carried out several data-cleansing and standardisation steps. A key objective was to remove irrelevant “noise” from text inputs and ensure more uniform model outputs.
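The report lists the goals of cleansing rather than specific code, but a typical noise-removal step might look like this sketch: normalise unicode, strip markup left over from web-sourced text, and collapse inconsistent whitespace.

```python
import re
import unicodedata

def cleanse(text):
    """Basic text cleansing: normalise unicode forms, strip HTML tags
    left over from crawled pages, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)   # unify full/half-width forms
    text = re.sub(r"<[^>]+>", " ", text)         # remove residual markup
    text = re.sub(r"\s+", " ", text)             # collapse whitespace
    return text.strip()
```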

Banks also implemented several techniques to remove or obscure sensitive information, including data masking, tokenisation (in the data-security sense rather than the NLP sense), synthetic data generation, and pseudonymisation.
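Two of those techniques can be sketched briefly. Masking hides most of an identifier while keeping it recognisable; pseudonymisation replaces an identifier with a stable, irreversible token. The salt value below is a placeholder for illustration only; a real deployment would use a secret, securely stored salt.

```python
import hashlib

def mask_account(acct):
    """Data masking: hide all but the last four characters."""
    return "*" * (len(acct) - 4) + acct[-4:]

def pseudonymise(customer_id, salt="demo-salt"):
    """Pseudonymisation: map an identifier to a stable token that
    cannot be reversed without the (secret) salt."""
    digest = hashlib.sha256((salt + customer_id).encode()).hexdigest()
    return digest[:16]
```

The same customer always maps to the same token, so joins across tables still work, while the raw identifier never reaches the training data.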

Generating synthetic data with GenAI
Source: HKMA

Beyond structured data, banks also worked with large amounts of unstructured content. Document pre-processing remains important, even for multimodal LLMs.

As LLMs have a limited context window, long documents must be split into smaller, logically structured segments. According to the report, this data chunking helped preserve context.
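The report does not specify a chunking algorithm; one common approach, sketched below, splits on paragraph boundaries and carries a trailing paragraph of overlap into the next chunk so that context is not lost at the cut points.

```python
def chunk_document(text, max_chars=1000, overlap=1):
    """Split a long document into paragraph-aligned chunks that fit a
    model's context budget, carrying `overlap` trailing paragraphs
    between consecutive chunks to preserve context."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], []
    for para in paragraphs:
        if current and sum(len(p) for p in current) + len(para) > max_chars:
            chunks.append("\n\n".join(current))
            current = current[-overlap:]     # keep trailing context
        current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Production systems usually budget in tokens rather than characters, but the splitting logic is the same.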

Data Augmentation

While standardisation is important, having diverse data is essential for improving a model’s adaptability.

To reduce the risk of overfitting and minimise hallucinations arising from limited training samples, some banks used data augmentation to expand and enrich their datasets. Another method focused on synthesising realistic user inputs.
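The report leaves the synthesis method unspecified; one simple, controllable way to synthesise user inputs is template-based generation, sketched here with hypothetical template and slot names. Intent templates are filled with randomly sampled slot values to produce varied phrasings of the same request.

```python
import random

def synthesise_queries(intent_templates, slots, n=5, seed=0):
    """Generate varied user inputs by filling intent templates with
    randomly sampled slot values (simple template-based augmentation)."""
    rng = random.Random(seed)                    # seeded for reproducibility
    samples = []
    for _ in range(n):
        template = rng.choice(intent_templates)
        filled = template.format(**{k: rng.choice(v) for k, v in slots.items()})
        samples.append(filled)
    return samples
```

In practice banks might instead paraphrase real (anonymised) queries with an LLM, but templates make the generated distribution easy to audit.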

Banks also paid close attention to multilingual balance. Tests confirmed that balancing English, Traditional Chinese, and Simplified Chinese is critical to prevent data skew and ensure terminological accuracy across language variants.
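A balance check like the following sketch makes language skew visible before fine-tuning begins; the language tags are assumed labels attached during data collection.

```python
def language_balance(samples):
    """Report each language's share of a labelled training set, so skew
    across English / Traditional Chinese / Simplified Chinese can be
    spotted before fine-tuning."""
    counts = {}
    for lang, _text in samples:
        counts[lang] = counts.get(lang, 0) + 1
    total = len(samples)
    return {lang: n / total for lang, n in counts.items()}
```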

Data Quality Check

A thorough data quality check is vital to assess the reliability of the data assets and uncover any underlying issues.

When preparing sample data for training, one participating bank reviewed common scenarios, edge cases and exceptional situations to ensure the data aligned with its internal standards.

Handling missing data was another important part of the process.

This required identifying which fields were optional or frequently absent, then deciding on the most appropriate approach: whether to impute the missing values, label them as “Unknown,” or remove non-essential fields that were consistently incomplete.
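Those three options can be expressed as one pass over the records. The sketch below uses hypothetical field names and keeps the policy explicit: impute where a default makes sense, label optional fields as "Unknown", and drop fields that are consistently incomplete.

```python
def handle_missing(records, impute=None, optional=(), drop=()):
    """Apply a missing-data policy to a list of record dicts:
    impute known defaults, label optional fields as 'Unknown',
    and drop consistently incomplete non-essential fields."""
    impute = impute or {}
    cleaned = []
    for rec in records:
        out = {}
        for field, value in rec.items():
            if field in drop:
                continue                      # remove non-essential field
            if value is None:
                if field in impute:
                    value = impute[field]     # impute a default
                elif field in optional:
                    value = "Unknown"         # label as unknown
            out[field] = value
        cleaned.append(out)
    return cleaned
```

Keeping the policy in one function also supports the documentation step the report highlights: the rules are reproducible and reviewable in one place.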

The final step was comprehensive documentation, which ensured the entire process could be reproduced and provided a clear reference for future updates.

Continuous Data Updates

Key optimisation strategies
Source: HKMA

Maintaining model accuracy requires a continuous flow of fresh, relevant data. This involves implementing automated or semi-automated data pipelines designed to ingest new information from approved sources.

Once collected, this incoming data was processed using the same cleansing and standardisation procedures applied during the initial pre-processing stage, ensuring consistency across the dataset.

With refreshed and expanded datasets in place, periodic model retraining helps prevent performance degradation over time and keeps the model aligned with evolving language patterns, product updates, and current market conditions.
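The report describes the pipeline's behaviour rather than its code, but the core loop can be sketched: ingest new records, deduplicate against what is already held, and run the same cleansing function used at initial pre-processing so old and new data stay consistent.

```python
def refresh_dataset(dataset, new_records, cleanse, seen_ids=None):
    """Ingest new records from an approved source, skip duplicates by id,
    and apply the same cleansing used in initial pre-processing."""
    if seen_ids is None:
        seen_ids = {r["id"] for r in dataset}
    for rec in new_records:
        if rec["id"] in seen_ids:
            continue                          # already ingested
        seen_ids.add(rec["id"])
        dataset.append({**rec, "text": cleanse(rec["text"])})
    return dataset
```

Passing the cleansing function in as a parameter is the point: a refresh pipeline that reuses the exact pre-processing code cannot silently drift from the original dataset's conventions.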

Strong Early Results, With Guardrails Still Essential

The first cohort of the HKMA GenAI Sandbox delivered clear, tangible gains for participating banks.

Institutions reported significant time savings, cutting preparation work for Suspicious Transaction Reports by 30-80%, reducing memo processing from a full day to just minutes, and shortening the production time for high-quality outputs by 60%.

Fine-tuned LLMs were also able to analyse 100% of case narratives, far surpassing traditional sampling methods. User response was consistently positive as well, with 86% of GenAI outputs rated favourably and more than 70% of GenAI-generated credit assessments considered valuable references.

Business outcomes of the GenAI Sandbox
Source: HKMA

At the same time, the sandbox's report indicated that risks must still be managed. Hallucinations and factual inaccuracies remain key concerns in high-stakes banking environments.

While advanced prompt-engineering techniques helped reduce these errors, banks recognised that these methods mitigate rather than eliminate the issue. For sensitive applications, additional safeguards like RAG and fine-tuning are still required to ensure accuracy and reliability.
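The retrieval step behind RAG can be illustrated with a toy sketch (the report does not describe the banks' actual implementations, which would use embedding-based search rather than word overlap): rank documents by relevance to the query and prepend the top results so the model answers from supplied context instead of from memory alone.

```python
def retrieve(query, documents, k=2):
    """Toy retrieval step of a RAG pipeline: rank documents by word
    overlap with the query and return the top-k as grounding context."""
    q_words = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, documents):
    """Prepend retrieved passages so the model is instructed to answer
    only from the cited context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Grounding answers in retrieved source text is what lets reviewers trace an output back to an approved document, which prompt engineering alone cannot guarantee.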

The second cohort of the GenAI sandbox will build on this foundation by shifting its focus toward deeper governance and risk capabilities.

Key priorities for the second cohort include “embedding AI governance within three lines of defence, developing ‘AI vs AI’ frameworks for proactive risk management, and implementing adaptive guardrails and self-correcting systems”.

Featured image by freepik on Freepik