Evaluating Large Language Models (LLMs)
Release date: February 2025
Publisher: Pearson via O'Reilly Learning
Publisher website:
https://learning.oreilly.com/course/evaluating-large-language/9780135451922/
Author: Sinan Ozdemir
Duration: 7h 56m
Type of material: Video course
Language: English + subtitles
Description:
8 Hours of Video Instruction
Equips you with the knowledge and skills to assess LLM performance effectively
Evaluating Large Language Models (LLMs) introduces you to the process of evaluating LLMs, Multimodal AI, and AI-powered applications like agents and RAG. These AI tools are powerful but often unwieldy, so you need to assess and evaluate them to use them fully and to make sure they meet your real-world needs. This video prepares you to evaluate and optimize LLMs so you can produce cutting-edge AI applications.
Learn How To
• Distinguish between generative and understanding tasks
• Apply key metrics for common tasks
• Evaluate multiple-choice tasks
• Evaluate free text response tasks
• Evaluate embedding tasks
• Evaluate classification tasks
• Build an LLM classifier with BERT and ChatGPT
• Evaluate LLMs with benchmarks
• Probe LLMs
• Fine-tune LLMs
• Evaluate and clean data
• Evaluate AI agents
• Evaluate retrieval-augmented generation systems
• Evaluate a recommendation engine
• Use evaluation to combat AI drift
Who Should Take This Course
AI practitioners, machine learning engineers, and data scientists who want to systematically evaluate LLMs, optimize their performance, and ensure they meet real-world application needs.
Course Requirements
• Python 3 proficiency with some experience working in interactive Python environments including Notebooks (Jupyter/Google Colab/Kaggle Kernels)
• Comfortable using the Pandas or Transformers library and either TensorFlow or PyTorch
• Understanding of ML/deep learning fundamentals including train/test splits, loss/cost functions, and gradient descent
Contents
Introduction
Lesson 1 Foundations of LLM Evaluation
Lesson 2 Evaluating Generative Tasks
Lesson 3 Evaluating Understanding Tasks
Lesson 4 Using Benchmarks Effectively
Lesson 5 Probing LLMs for a World Model
Lesson 6 Evaluating LLM Fine-Tuning
Lesson 7 Case Studies
Lesson 8 Summary of Evaluation and Looking Ahead
Summary
Example files: none
Video format: MP4
Video: AVC, 1280×720, 16:9, 30.000 fps, 3 000 kb/s (0.017 bit/pixel)
Audio: AAC, 44.1 kHz, 2 channels, 128 kb/s, CBR
UPD 2025-03-02: added subtitles and a description; the file 003. 2.2 Evaluating Free Text Response Tasks.mp4 is now split into two parts.