Thought Leadership

LLMs and the Data Stack

Written by Cervin Ventures | 27 June 2023


LLMs allow enterprises to be more nimble in leveraging data and to make better business decisions, gaining competitive advantage. They have the potential to change the architecture of data stacks, with implications for data lineage, data quality, data security, and observability. While this poses a threat to companies that address these issues with "pre-LLM" technology, it creates an opportunity for founders to rethink solutions that leverage the power of LLMs.


A modern data stack is the set of tools and technologies an organization uses to store, process, and analyze data. It typically includes cloud-based platforms and databases optimized for specific data types, such as NoSQL databases, graph databases, and, more recently, vector databases. These databases are complemented by tools for cataloging, lineage and quality, governance, and observability. Analytics and business intelligence tools sit on top of these databases and tools to deliver insights.


LLMs, on the other hand, are pre-trained artificial intelligence models capable of understanding human language at scale and handling considerable complexity. They have enormous capabilities when it comes to automating tasks, drawing inferences, and generating documents, and they can be used to drive business decisions. Combining the power of LLMs with modern data platforms therefore has huge promise.


The integration of LLMs into modern data stacks has led to more advanced natural language processing, better customer experiences, and improved business outcomes. LLMs have made it easier to process and analyze large amounts of data quickly and accurately, freeing up analysts and data scientists to focus on higher-level analysis and decision-making. They have also improved the accuracy of natural language processing and text analytics, enabling organizations to extract meaning from unstructured data sources such as social media and customer feedback. Querying databases in natural language through an LLM allows business users who may not know SQL to put the power of data behind their business decisions, as sketched below. Additionally, LLMs have improved the quality of predictive models by identifying correlations and patterns that may not be apparent to human analysts.
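To make the natural-language querying idea concrete, here is a minimal sketch assuming the OpenAI Python client (openai >= 1.0) and a hypothetical SQLite warehouse with an orders table; the table schema, the warehouse.db file, and the model name are illustrative assumptions rather than a prescribed setup.

```python
# Minimal sketch of natural-language-to-SQL, assuming the OpenAI Python client
# (openai >= 1.0) and a hypothetical SQLite "orders" table. The schema, model
# name, and database file are placeholders for illustration only.
import sqlite3
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SCHEMA = "orders(order_id INTEGER, customer TEXT, region TEXT, amount REAL, order_date TEXT)"

def nl_to_sql(question: str) -> str:
    """Ask the model to translate a business question into a single SQL query."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": f"Translate the user's question into one SQLite query "
                        f"against this schema: {SCHEMA}. Return only the SQL."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content.strip()

# A business user asks in plain English; the generated SQL runs against the warehouse.
sql = nl_to_sql("What was total revenue by region last quarter?")
rows = sqlite3.connect("warehouse.db").execute(sql).fetchall()
print(sql, rows, sep="\n")
```

In practice the generated SQL would be validated, or the connection restricted to read-only access, before execution, but the pattern stays the same: the model handles the translation, and the existing database does the work.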


The integration of LLMs is bringing significant changes to the architecture of modern data stacks. Enterprises are looking for ways to use LLMs to improve data quality, cleaning, and pre-processing. Using the full power of LLMs also requires adjusting the underlying databases that support them. One emerging example is the vector database. Vector databases can serve as repositories of inferences drawn from LLMs; they can also power AI-driven detection and prevention of security attacks. It is still unclear whether vector databases will represent a full-blown fork of the database market in the long term, given parallel and somewhat contradictory shifts in the broader database segment.
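To illustrate the "repository of inferences" idea, here is a minimal sketch in which LLM-generated inferences are embedded and then retrieved by similarity. The embed() function is a placeholder for whatever embedding model an organization already uses, and the flat NumPy index stands in for a purpose-built vector database such as Chroma, Weaviate, or Pinecone.

```python
# Minimal sketch of using a vector store as a repository of LLM inferences:
# each inference (e.g. a model-generated summary) is embedded and stored, then
# retrieved by similarity. A flat NumPy index stands in for a real vector
# database; embed() is a toy placeholder for an actual embedding model.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding; swap in a real embedding model or API."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.standard_normal(384)
    return vec / np.linalg.norm(vec)  # unit-normalize for cosine similarity

class InferenceStore:
    def __init__(self):
        self.texts, self.vectors = [], []

    def add(self, inference: str) -> None:
        """Store an LLM-generated inference alongside its embedding."""
        self.texts.append(inference)
        self.vectors.append(embed(inference))

    def query(self, question: str, k: int = 3) -> list[str]:
        """Return the k stored inferences most similar to the question."""
        scores = np.array(self.vectors) @ embed(question)  # cosine similarity of unit vectors
        return [self.texts[i] for i in np.argsort(scores)[::-1][:k]]

store = InferenceStore()
store.add("Churn risk is concentrated in accounts with fewer than two logins per month.")
store.add("Q2 revenue growth was driven primarily by the EMEA region.")
print(store.query("What is driving customer churn?", k=1))
```

Swapping the flat index for a real vector database adds persistence, metadata filtering, and approximate nearest-neighbor search at scale, but the storage-and-retrieval pattern remains the same.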