
Modern data analysis is no longer limited to spreadsheets or SQL queries. We’re now working with diverse formats such as text, structured tables, images, graphs, audio, and even video. Large Language Models (LLMs) like GPT-4 are unlocking powerful new ways to make sense of this complexity.
Data Analysis with LLMs (affiliate link) is a hands-on guide by Immanuel Trummer, associate professor at Cornell University. Published by Manning Publications, it delivers data scientists, analysts, and developers with practical tools to analyze multimodal data using Python and modern language models.
In this review, I’ve highlighted 8 practical takeaways from the book that will shift how you think about working with modern data.
8 Key Insights from Data Analysis with LLMs
The book is divided into three parts: an introduction to LLMs, hands-on data analysis with Python, and advanced topics like model selection, cost optimization, and frameworks. Based on this structure, here are 8 key insights that stand out.
1. Discover the Full Power of Modern LLMs
Large language models are no longer limited to generating or summarizing text. Immanuel introduces their expanded capabilities across structured tables, documents, images, audio, video, and graph data. Whether you’re dealing with relational databases or raw image files, LLMs can now support analysis across a wide variety of data types. This flexibility makes them essential tools for modern data professionals who work with complex, multimodal datasets.
2. Build Real Projects with the OpenAI Python Library
The book provides practical guidance on using the OpenAI Python library to interact with language models through code. You’ll learn how to install the library, authenticate your API key, select the right model, and structure conversations using the chat.completions.create()
method.
It also introduces important configuration options to control model behavior, such as max_tokens
, temperature
, top_p
, stop
, logit_bias
, and repetition penalties. These parameters help developers fine-tune output quality, manage randomness, and generate consistent, reliable results across different use cases.
3. Extract Structured Insights from Raw Text
Working with raw text can be messy, but LLMs help turn that chaos into structured, usable data. The author explains how to classify documents, perform named entity recognition (NER), and cluster similar content using embedding vectors, all with Python and prompt design.
Each workflow is clearly explained, from prompt engineering to API calls and postprocessing. Whether you’re building a document tagging system or automating information extraction, you will see how LLMs apply zero-shot learning to handle complex NLP tasks without labeled data, making unstructured content ready for real-world analysis.
New to NLP? Start with Mastering NLP: A Journey Through Sentiment Analysis.
4. From Natural Language to Powerful Queries
The book demonstrates how LLMs, a form of NLP technology, can convert natural language into structured queries using SQL and Cypher. Through examples in PostgreSQL and Neo4j, you learn to build systems where users can ask questions like “Show top-selling products last month” and get accurate results without writing any code.
By using prompt design and automatic schema extraction, it enables intuitive data access across relational and graph databases. This makes exploration faster and more interactive for non-technical users.
5. Turn Image, Audio, and Video Data into Insights
This book goes beyond text by showing how to work with images, audio, and video using LLMs and related tools. You’ll learn how to tag people in photos, answer questions about image content, and even generate titles from video frames using OpenCV and GPT-4o. For audio, it covers transcription with Whisper, voice-to-query interaction, and speech-to-speech translation.
These hands-on examples teach you how to encode multimedia files, manage costs, and combine modalities into powerful prompts, turning unstructured media into usable, structured insights.
6. Explore LLM Options Beyond OpenAI
The author introduces key alternatives to OpenAI, including Claude by Anthropic, Command R+ by Cohere, Gemini by Google, and open-source models from Hugging Face. These models differ in output quality, cost, and access flexibility. Most providers offer Python libraries or APIs, making it easy to integrate them into your projects. This comparison helps you stay adaptable, explore multiple capabilities, and choose the right model based on your specific needs for performance, pricing, and privacy.
7. Balance Performance, Quality, and Cost
Working with LLMs at scale can become expensive fast, especially if you’re using powerful models for every task. Instead of defaulting to the largest option, a smarter approach involves selecting models based on the complexity of the task and using smaller models when possible. Prompt engineering and hyperparameter tuning can significantly boost performance without adding cost. For specialized use cases, fine-tuning a base model with a small, targeted dataset can reduce token usage and improve output quality. These optimization techniques are essential for balancing effectiveness with efficiency in real-world applications.
8. Build Scalable Systems with LangChain and LlamaIndex
For those ready to move beyond basic prompting, this book introduces how to build robust, scalable systems using LangChain and LlamaIndex. LangChain helps you create intelligent agents that can coordinate tools, handle workflows, and solve multi-step problems. LlamaIndex enables efficient indexing of documents, images, and other data types using embeddings, making it easy to retrieve and query custom knowledge bases. Together, these frameworks allow you to transform prototype-level ideas into production-ready AI systems.
What Made This Book Click for Me
What I liked most about the book is how clearly Immanuel explains each step. He doesn’t just talk about concepts. He guides you through real Python code and prompt templates with clear, line-by-line explanations. You’re not just copying code, you’re understanding what each part does and why. From unstructured text to structured data and natural language tools, the examples are practical and easy to follow. This makes the book especially helpful for data analysts and developers who want to apply LLMs in real projects.
Who This Book Is For
Data Analysis with LLMs is ideal for data scientists and analysts who want to integrate LLMs into their data workflows. It’s also a great fit for Python developers building AI-powered features and applications, as well as machine learning engineers working with structured, unstructured, or multimodal data.
Finally, if you’re a curious learner or professional who feels overwhelmed by LLMs, this book offers a structured, hands-on guide to help you get started with confidence.
Final Thoughts
Data Analysis with LLMs by Immanuel Trummer is one of the few books that clearly show how to apply large language models to real data workflows. It covers everything from structured and unstructured data to images, audio, and graphs, using hands-on Python examples that are ready to run and adapt.
The book also helps you compare different LLM providers, manage cost through smart prompt tuning, and scale systems using tools like LangChain and LlamaIndex. Whether you’re just starting with LLMs or building advanced pipelines, it gives you a clear and practical path forward.
Buy the Book
Data Analysis with LLMs (affiliate link) by Immanuel Trummer is available on Amazon and Manning Publications. If you’re ready to explore how LLMs can analyze text, tables, images, audio, and more, this book is a clear and practical guide worth having.
Disclosure: This article about Data Analysis with LLMs contains affiliate links. If you buy through these links, we may earn a small commission at no extra cost to you. It helps us keep creating free content on Noro Insight.