LLMs in Investment Research (I) - The Excel Enigma

neuralgap.io

Despite the remarkable strides LLMs have made, deciphering complex financial Excel models remains a formidable challenge. The structured, data-rich environment of spreadsheets, particularly those used in financial modeling, presents a unique set of obstacles that push the boundaries of what LLMs can currently achieve.

At first glance, an Excel spreadsheet might seem simpler to process than a lengthy text document because it’s just rows and columns of data. But for an LLM, which was primarily trained on natural language, interpreting a financial model is akin to reading a foreign language written in a three-dimensional, interconnected script. In this article we dive into why these types of data remain a challenge; how we can overcome them is the subject of the articles that follow.

Complexity of Linkages

The first hurdle LLMs encounter is the complex structure of financial models. Unlike linear text, these spreadsheets are often sprawling ecosystems of multiple interconnected sheets. Each cell can be a world unto itself, hosting formulas that reference data across different sheets and even different files. The hierarchical nature of financial statements – where an income statement feeds into a balance sheet and cash flow statement – adds yet another layer of complexity. For an LLM, understanding these intricate relationships is like trying to map a city by looking at it through a keyhole.
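To make the linkage problem concrete, here is a minimal sketch of how cross-sheet references might be pulled out of formula strings and turned into a dependency map. The formulas, the sheet names, and the helper `extract_refs` are hypothetical. The regular expression is deliberately simplified: it recognises only `Sheet!A1` and `'Sheet Name'!A1` style references, treats a range such as `B2:B10` as its two endpoints, and ignores external workbooks and named ranges.

```python
import re
from collections import defaultdict

# Optional sheet qualifier (quoted or bare), then an A1-style cell address.
# Simplified on purpose: no external files, no named ranges, no table references.
REF_PATTERN = re.compile(
    r"(?:(?:'([^']+)'|([A-Za-z_][A-Za-z0-9_]*))!)?"
    r"\$?([A-Z]{1,3})\$?([0-9]+)"
)

def extract_refs(formula, current_sheet):
    """Return the set of (sheet, cell) pairs referenced by a formula string."""
    refs = set()
    for quoted, bare, col, row in REF_PATTERN.findall(formula):
        # An unqualified reference points at the sheet the formula lives on.
        sheet = quoted or bare or current_sheet
        refs.add((sheet, f"{col}{row}"))
    return refs

# Hypothetical formulas keyed by (sheet, cell).
formulas = {
    ("Income", "B7"): "=B5-B6",
    ("Balance", "C10"): "=Income!B7+'Cash Flow'!D4",
    ("Cash Flow", "D4"): "=Income!$B$7*0.8",
}

# Map every formula cell to the cells it depends on.
graph = defaultdict(set)
for (sheet, cell), formula in formulas.items():
    graph[(sheet, cell)] = extract_refs(formula, sheet)

for target, sources in graph.items():
    print(target, "<-", sorted(sources))
```

Even this toy map shows why a flat dump of cell values loses information: the fact that `Balance!C10` depends on two other sheets is visible only once the references are resolved.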

Formula Comprehension

Formula comprehension presents another significant challenge. Financial Excel models are often built on a foundation of complex formulas, ranging from nested IF statements to array formulas and financial functions like NPV and IRR. These aren’t just calculations; they’re the DNA of the financial model, encoding business logic, assumptions, and financial principles. An LLM needs to not only parse the syntax of these formulas but also grasp their financial implications and how they contribute to the overall model structure. It’s not enough to know what a VLOOKUP does; the LLM needs to understand why it’s being used and what its output signifies in the broader financial context.
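A small worked example shows how much convention hides inside a single function. Excel's NPV treats its first value as arriving at the end of period one, so an initial outlay at time zero is normally added outside the function. The sketch below mirrors that convention in plain Python; the cash flows, the discount rate, and the function names are illustrative assumptions, and the IRR routine is a simple bisection search, not Excel's own algorithm.

```python
def excel_style_npv(rate, cash_flows):
    """Discount each cash flow as if it arrives at the END of periods 1..n."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))

def irr_bisection(cash_flows, low=-0.99, high=1.0, tol=1e-9):
    """Find the rate at which time-zero NPV is zero; cash_flows[0] occurs at t=0.

    Assumes the NPV changes sign exactly once between `low` and `high`.
    """
    def npv0(rate):
        return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

    for _ in range(200):
        mid = (low + high) / 2
        if npv0(low) * npv0(mid) <= 0:
            high = mid  # the root lies in the lower half
        else:
            low = mid   # the root lies in the upper half
        if high - low < tol:
            break
    return (low + high) / 2

rate = 0.10
outlay = -1000.0                 # paid today (t=0), hypothetical
inflows = [400.0, 400.0, 400.0]  # received at the end of years 1..3, hypothetical

# Usual pattern: the outlay is added outside the NPV call.
project_value = outlay + excel_style_npv(rate, inflows)      # about -5.26

# Common slip: passing the outlay as the first NPV argument discounts
# everything by one extra period.
shifted_value = excel_style_npv(rate, [outlay] + inflows)    # about -4.78

irr = irr_bisection([outlay] + inflows)

print(round(project_value, 2), round(shifted_value, 2), round(irr, 4))
```

Both formulas are syntactically valid, and only knowledge of the timing convention reveals that the second one answers a different question.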

Data Types and the Temporal Nature of Data

The variability of data types within a single model further complicates matters: dollar amounts, ratios, and dates can sit side by side in the same sheet. Each data type comes with its own rules and conventions. An LLM must correctly identify and interpret these different data types, especially when they’re used together in calculations. For instance, is a number a dollar amount or a ratio? Is a date the end of a fiscal year or a loan maturity? These distinctions are second nature to a human financial analyst but require sophisticated understanding from an AI model.
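One cheap source of type information is the number format attached to each cell, which spreadsheet libraries generally expose as a string (openpyxl, for example, provides it as `cell.number_format`). The classifier below is a rough heuristic sketch, not a complete parser of Excel format codes; the function names and the category labels are assumptions made for illustration.

```python
def classify_number_format(fmt):
    """Guess the semantic type of a cell from its Excel number-format string."""
    f = fmt.lower()
    if "%" in f:
        return "percentage"
    if any(symbol in fmt for symbol in ("$", "€", "£")):
        return "currency"
    # Date formats combine year/day/month tokens such as yyyy, dd, mmm.
    if any(token in f for token in ("yy", "dd", "mmm")):
        return "date"
    # Valuation multiples are often formatted with a literal trailing x.
    if f.endswith('"x"'):
        return "multiple"
    return "number"

def describe(value, fmt):
    """Attach the guessed type to a raw value before it is shown to an LLM."""
    return f"{value} ({classify_number_format(fmt)})"

# Hypothetical cells: (raw value, number format).
samples = [
    (0.35, "0.0%"),
    (1250000, "$#,##0.00"),
    ("2024-12-31", "yyyy-mm-dd"),
    (8.5, '0.0"x"'),
    (42, "General"),
]

for value, fmt in samples:
    print(describe(value, fmt))
```

A format string cannot say whether a date is a fiscal year end or a loan maturity, so this kind of annotation narrows the ambiguity without removing it.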

Moreover, one of the most challenging aspects for LLMs is dealing with the temporal nature of financial models. Many spreadsheets are structured with columns representing different time periods – months, quarters, or years. Understanding these temporal relationships and how they affect calculations, such as year-over-year growth or cumulative totals, is crucial. An LLM needs to develop a sense of financial time, recognizing how past, present, and projected future data interact within the model.
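The two calculations named above can be written out in a few lines. This is a minimal sketch with hypothetical period labels and revenue figures; the point is that each number only makes sense when it stays tied to its period and to the period before it.

```python
from itertools import accumulate

# Hypothetical columns of a model: one value per fiscal year.
periods = ["FY2021", "FY2022", "FY2023", "FY2024"]
revenue = [100.0, 120.0, 150.0, 165.0]

def yoy_growth(values):
    """Growth of each period over the previous one; the first period has none."""
    return [None] + [(curr - prev) / prev for prev, curr in zip(values, values[1:])]

growth = yoy_growth(revenue)
cumulative = list(accumulate(revenue))  # running total across periods

# Serialise every value together with its period label so the temporal
# relationship is explicit in the text.
for label, value, g, total in zip(periods, revenue, growth, cumulative):
    g_text = "n/a" if g is None else f"{g:.1%}"
    print(f"{label}: revenue={value:.1f}, yoy={g_text}, cumulative={total:.1f}")
```

Dropping the labels, or reordering the columns, leaves the same numbers on the page but makes both derived series meaningless.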

Size and Added Feature Complexity

As we delve deeper into the world of financial modeling, we uncover even more complex features that push the limits of current LLM capabilities. Named ranges, custom functions, circular references, and scenario analyses are just a few of the advanced Excel techniques that are commonplace in sophisticated financial models. Each of these features requires not just technical understanding but also an appreciation of their purpose and implications in financial analysis.
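Circular references are a good illustration, since they can be found mechanically once cell dependencies are known. The sketch below runs a depth-first search over a hypothetical dependency map in which interest depends on debt, debt on cash, and cash on interest. The cell names and the function `find_cycle` are assumptions for illustration, and the search reports only the first loop it encounters.

```python
def find_cycle(graph):
    """Return one circular reference chain as a list of cells, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for dep in graph.get(node, ()):
            state = color.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path, so the loop is closed.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None

# Hypothetical dependencies: each cell maps to the cells its formula reads.
model = {
    "Interest": ["Debt"],
    "Debt": ["Cash"],
    "Cash": ["Interest"],
    "Revenue": [],
}

print(find_cycle(model))
```

Detecting the loop is the easy part. Deciding whether it is an error or an intentional iterative calculation still takes the financial judgement described above.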

The sheer scale of many financial models presents another significant hurdle. It’s not uncommon for these spreadsheets to contain thousands of rows of data across multiple sheets. This volume of information stretches the processing capabilities of LLMs, which often struggle with very long input sequences. How can an AI model maintain context and accuracy when dealing with such vast amounts of interconnected data?
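One naive response to the size problem is to serialise only the populated cells and split them into pieces that fit a fixed budget. The sketch below does this with a character budget as a crude stand-in for a token limit; the function name, the budget, and the generated cells are all hypothetical.

```python
def chunk_cells(cells, max_chars=200):
    """Group serialised cells into chunks that stay under a character budget."""
    chunks, current, size = [], [], 0
    for address, content in cells:
        line = f"{address}: {content}"
        # Start a new chunk when adding this line would exceed the budget.
        if current and size + len(line) + 1 > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line) + 1  # +1 for the newline separator
    if current:
        chunks.append("\n".join(current))
    return chunks

# Hypothetical column of 50 formula cells, each growing the previous row by 5%.
cells = [(f"Income!B{row}", f"=B{row - 1}*1.05") for row in range(2, 52)]

chunks = chunk_cells(cells)
print(len(chunks), "chunks; first chunk:")
print(chunks[0])
```

The weakness is easy to see: a formula in one chunk may reference a cell that landed in another, so splitting by position alone cuts exactly the links that give the model its meaning.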

In essence, interpreting a financial Excel model requires more than just processing rows and columns of data. It demands an understanding of financial principles, Excel-specific features, industry conventions, and the ability to follow complex logical and mathematical relationships across a multidimensional structure. For LLMs, mastering this challenge is not just about improving their ability to handle structured data – it’s about developing a form of financial literacy and Excel fluency that rivals that of human experts.

In our next set of articles we are going to navigate how we address each of the above (at least partially) for LLMs. Stay tuned!