Enroll for free now: https://bit.ly/4aRnn7Z
GitHub Repo: https://github.com/HandsOnLLM/Hands-On-Large-Language-Models
We're ecstatic to bring you "How Transformer LLMs Work" -- a free course with ~90 minutes of video, code, and crisp visuals and animations that explain the modern Transformer architecture, tokenizers, embeddings, and mixture-of-expert models.
@MaartenGrootendorst and I have developed a lot of the visual language for the book over the last several years (tens of thousands of iterations for hundreds of figures). This was informed by many incredible colleagues at Cohere, C4AI, and the open source and open science ML community. Collaborating with the legendary Andrew Ng and the team at @Deeplearningai gave us the opportunity to take these visuals to the next level with animations and a concise narrative meant to enable technical learners to pick up an ML paper and understand the architecture description.
In this course, you'll learn how the transformer architecture that powers LLMs works. You'll build intuition for how LLMs process text and work through code examples that illustrate the key components of the transformer architecture.
Key topics covered in this course include:
The evolution of how language has been represented numerically, from the Bag-of-Words model through Word2Vec embeddings to the transformer architecture that captures word meanings in full context.
How LLM inputs are broken down into tokens, which represent words or pieces of words, before they are sent to the language model.
The details of a transformer and its three main stages: tokenization and embedding, the stack of transformer blocks, and the language model head.
The details of the transformer block, including attention, which calculates relevance scores, followed by the feedforward layer, which incorporates stored information learned in training.
How cached calculations make transformers faster, how the transformer block has evolved in the years since the original paper was released, and how transformers continue to be widely used.
An exploration of the implementation of recent models in the Hugging Face Transformers library.
By the end of this course, you’ll have a deep understanding of how LLMs process language and you'll be able to read through papers describing models and understand the details that are used to describe these architectures. This intuition will help improve your approach to building LLM applications.
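As a taste of the attention step mentioned in the topics above, here is a minimal sketch of scaled dot-product attention for a single query, written in plain Python. It is not taken from the course materials: the function names `softmax` and `attend` and all of the vectors are made up for illustration, and a real model works on learned, much larger matrices.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for one query vector (illustrative only)."""
    scale = math.sqrt(len(query))
    # Relevance score of each token's key with respect to the query.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # Blend the value vectors according to those relevance weights.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Toy 2-dimensional vectors for three tokens (made-up numbers).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 1.0]

weights, output = attend(query, keys, values)
print(weights)
print(output)
```

The third token's key points in the same direction as the query, so it receives the largest weight, while the first two tokens are weighted equally.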
The SWE-bench task measures AI agents on software engineering tasks at the level of a GitHub issue. It was one of the most important tasks measuring the progress of agents tackling software engineering tasks in 2024. We caught up with two of its creators, Ofir Press and Carlos E. Jimenez, to share their ideas on the state of LLM-backed agents.
---
Check out our book: https://www.llm-book.com/
Mailing List: https://newsletter.languagemodels.co/
Bluesky: https://bsky.app/profile/jayalammar.bsky.social
Twitter: https://twitter.com/JayAlammar
Blog: https://jalammar.github.io/
Tool use is a method that allows developers to connect Cohere's Command models to external tools like search engines, APIs, databases, and other software tools. Just as Retrieval-Augmented Generation (RAG) allows a model to use an external data source to improve factual generation, tool use is a capability that allows retrieving data from multiple sources. But it goes beyond simply retrieving information: the model can also use software tools to execute code, or even create entries in a CRM system.
In this video, we'll see how we can use two tools to create a simple data analyst agent that is able to search the web and run code in a Python interpreter. This agent uses Cohere's Command R+ model and LangChain.
Find the code here at Colab:
https://colab.research.google.com/github/cohere-ai/notebooks/blob/main/notebooks/agents/Data_Analyst_Agent_Cohere_and_Langchain.ipynb
GitHub: https://github.com/cohere-ai/notebooks/blob/main/notebooks/agents/Data_Analyst_Agent_Cohere_and_Langchain.ipynb
LangChain Tools: https://python.langchain.com/docs/integrations/tools/
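The core idea of tool use described above can be sketched without any particular library: the model asks for a tool by name, and the application looks it up and runs it. Everything below is hypothetical and is not the Cohere or LangChain API: the tool functions are stubs, the JSON shape with `name` and `parameters` keys is an assumption, and the model's request is hard-coded. See the notebook linked above for the actual agent.

```python
import json

# Hypothetical tools: plain Python stubs standing in for real integrations.
def search_web(query: str) -> str:
    return f"(stub) top result for: {query}"

def add_numbers(a: float, b: float) -> float:
    return a + b

# Registry mapping the names a model may request to callables.
TOOLS = {"search_web": search_web, "add_numbers": add_numbers}

def run_tool_call(tool_call_json: str):
    """Execute one tool call that a model has requested as JSON (assumed format)."""
    call = json.loads(tool_call_json)
    tool = TOOLS.get(call["name"])
    if tool is None:
        raise ValueError(f"unknown tool: {call['name']}")
    return tool(**call["parameters"])

# In a real agent the model would emit this request; here it is hard-coded.
request = '{"name": "add_numbers", "parameters": {"a": 2, "b": 3}}'
result = run_tool_call(request)
print(result)
```

In a full agent loop, the tool's result would be passed back to the model so it can decide whether to call another tool or write the final answer.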
Tokenizers are one of the key components of Large Language Models (LLMs). One of the best ways to understand what they do is to compare the behavior of different tokenizers. In this video, Jay takes a carefully crafted piece of text (that contains English, code, indentation, numbers, emoji, and other languages) and passes it through different trained tokenizers to reveal what they succeed and fail at encoding, which design choices each tokenizer makes, and what those choices say about their respective models.
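To give a feel for why cased and uncased tokenizers behave differently, here is a toy sketch of WordPiece-style greedy longest-match tokenization, the general approach BERT's tokenizer uses at inference time. The six-entry vocabulary is made up and contains only lowercase pieces, so the outputs below illustrate the mechanism and are not what any real trained tokenizer would produce.

```python
def wordpiece(word, vocab):
    """Greedy longest-match-first split of one word into subword tokens."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # mark a continuation piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no piece fits, so the whole word is unknown
        tokens.append(piece)
        start = end
    return tokens

def tokenize(text, vocab, lowercase):
    """Split on whitespace, optionally lowercase, then apply wordpiece per word."""
    if lowercase:
        text = text.lower()
    return [tok for word in text.split() for tok in wordpiece(word, vocab)]

# Tiny made-up vocabulary; real tokenizers learn tens of thousands of entries.
VOCAB = {"token", "##izer", "##s", "capital", "##ization", "show"}

text = "CAPITALIZATION show tokenizers"
uncased = tokenize(text, VOCAB, lowercase=True)
cased = tokenize(text, VOCAB, lowercase=False)
print(uncased)  # ['capital', '##ization', 'show', 'token', '##izer', '##s']
print(cased)    # ['[UNK]', 'show', 'token', '##izer', '##s']
```

With this toy vocabulary, the uncased path recovers the all-caps word as two known pieces, while the cased path has no uppercase entries to match and falls back to the unknown token.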
---
Contents:
0:00 Introduction
1:25 The carefully polished text to test tokenizers
2:19 BERT Uncased
3:59 BERT Cased
4:29 GPT-2
6:00 FLAN-T5
7:00 GPT-4
9:24 Starcoder
21:31 Galactica
---
Twitter: https://twitter.com/JayAlammar
Blog: https://jalammar.github.io/
Mailing List: https://jayalammar.substack.com/
Access the Early Release version of the book with a 30-day free trial of the O'Reilly learning platform: https://learning.oreilly.com/get-learning/?code=HOLLM23 [The formatting for the tokenization chapter is still a work-in-progress, but the video gives you a better look at the approach]
Despite processing internet-scale text data, large language models never see words as we do. Yes, they consume text, but another piece of software called a tokenizer is what actually takes in the text and translates it into the format that the language model operates on. In this video, Jay examines a language model tokenizer to give you a sense of how tokenizers work.
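The "different format" is a list of integer token IDs, each of which indexes a row in the model's embedding table. Here is a minimal sketch of that round trip; the four-entry vocabulary, the whitespace splitting, and the embedding numbers are all made up for illustration, and real tokenizers use learned subword vocabularies instead.

```python
# Hypothetical four-entry vocabulary; index 0 is reserved for unknown words.
VOCAB = ["<unk>", "hello", "world", "!"]
TOKEN_TO_ID = {token: i for i, token in enumerate(VOCAB)}

# One made-up 3-dimensional embedding row per vocabulary entry.
EMBEDDINGS = [
    [0.0, 0.0, 0.0],
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
]

def encode(text):
    """Map whitespace-separated words to token IDs (unknown words map to 0)."""
    return [TOKEN_TO_ID.get(word, 0) for word in text.lower().split()]

def decode(ids):
    """Map token IDs back to text."""
    return " ".join(VOCAB[i] for i in ids)

ids = encode("Hello world !")
vectors = [EMBEDDINGS[i] for i in ids]  # what the model actually operates on
print(ids)          # [1, 2, 3]
print(decode(ids))  # hello world !
```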
Follow our upcoming book, Hands-On Large Language Models, for more details about tokenizers and LLMs in general.
Updates on the book coming on https://jayalammar.substack.com/
My co-author: https://twitter.com/MaartenGr / https://maartengrootendorst.substack.com/
Early access on https://www.oreilly.com/library/view/hands-on-large-language/9781098150952/
---
Twitter: https://twitter.com/JayAlammar
Blog: https://jalammar.github.io/
Mailing List: https://jayalammar.substack.com/
---
0:00 Introduction
0:41 We're writing: Hands-On Large Language Models
1:13 Generating text with ChatGPT Cohere Command
2:42 Looking at the generation code
5:03 What is the actual input to a language model?
7:14 What is the actual output of a language model?
7:50 The tokenizer's lookup table and embeddings inside a model
9:07 Looking at the model, tokenizer
12:27 Summary
Tools like LangChain (https://github.com/hwchase17/langchain/tree/master/langchain) help you build applications on top of large language models. #shorts
Over a decade ago, the phrase “software is eating the world” described how software was rapidly becoming the center of many industries beyond the technology sector. The leading book retailers, video service providers, music companies, entertainment companies, and even movie production companies were essentially software companies.
That trend is still going strong.
In this video, Jay shares observations on the value in the AI technology stack and focuses on where some of the technical moats might be.
The previous video in this series (https://www.youtube.com/watch?v=AeW9r3lopp0) discussed 4 major points about useful perspectives on generative AI. Here we continue the series with points 5-8.
Blog post: https://txt.cohere.com/ai-is-eating-the-world/
---
Twitter: https://twitter.com/JayAlammar
Blog: https://jalammar.github.io/
Mailing List: https://jayalammar.substack.com/
---
0:00 Introduction
3:44 5) Maps and Landscapes of AI Technology and Value Stacks
7:16 6) Enterprises: Plan Not for One, but Thousands of AI Touchpoints in Your Systems
8:46 7) Account for the Many Descendants and Iterations of a Foundation Model
16:01 8) Model Usage Datasets Allow Collective Exploration of a Model’s Generative Space
What's the big deal with Generative AI? Is it the future or the present? In this video, Jay goes over four key reflections on how best to think about the current state of AI products and features, and how to avoid the mistakes people tend to make with new tech.
Blog post: https://txt.cohere.ai/generative-ai-future-or-present/
What is Neural Search? Nils Reimers - Sentence Transformers and Embedding Evaluation https://www.youtube.com/watch?v=Z_4rohX4Ki8&ab_channel=Cohere
---
Twitter: https://twitter.com/JayAlammar
Blog: https://jalammar.github.io/
Mailing List: https://jayalammar.substack.com/
---
0:00 Introduction
1:04 1- Recent AI developments are awe-inspiring and promise to change the world. But when?
5:47 2- Make a distinction between impressive 🍒 cherry-picked demos, and reliable use cases that are ready for the marketplace
6:35 3- Think of models as components of intelligent systems, not minds
7:56 4- Generative AI alone is only the tip of the iceberg
Learn how AI image generation works. This video goes over the components of AI image generation models like Stable Diffusion and explains how they work and how they're trained.
Blog post: https://jalammar.github.io/illustrated-stable-diffusion/
---
Twitter: https://twitter.com/JayAlammar
Blog: https://jalammar.github.io/
Mailing List: https://jayalammar.substack.com/
---
Introduction (0:00)
Text-to-image and image-to-image (1:32)
The components of Stable Diffusion - high-level overview (3:06)
The three models inside the AI Image Generator (5:48)
Generating images with reverse diffusion (8:36)
Images emerging from noise (11:09)
How the model is trained. 1 - Diffusion (12:46)
How the model is trained. 2 - Compression (17:44)
The importance of language models for image generation (20:43)
How CLIP is trained (training on both text and images) (22:55)
Guiding image generation with text prompts (25:57)
Conclusion (28:07)
This is a version of the intro cinematic to an old video game (Nemesis 2 on the MSX system). The graphics are remade using AI-generated images. Read more about the process at:
https://jalammar.github.io/ai-image-generation-tools/
Original: https://www.youtube.com/watch?v=nWdTIHpUORE&ab_channel=Zebpro