Python for Data Science: A Hands-On Introduction
Year of publication: 2022
Author: Yuli Vasiliev
Publisher: No Starch Press
ISBN: 978-1-7185-0221-5
Language: English
Format: PDF (not a true PDF), EPUB
Quality: Publisher's layout or text (eBook)
Interactive table of contents: Yes
Number of pages: 303
Description:
A hands-on, real-world introduction to data analysis with the Python programming language, loaded with wide-ranging examples.
Python is an ideal choice for accessing, manipulating, and gaining insights from data of all kinds. Python for Data Science introduces you to the Pythonic world of data analysis with a learn-by-doing approach rooted in practical examples and hands-on activities. You’ll learn how to write Python code to obtain, transform, and analyze data, practicing state-of-the-art data processing techniques for use cases in business management, marketing, and decision support.
You will discover Python’s rich set of built-in data structures for basic operations, as well as its robust ecosystem of open-source libraries for data science, including NumPy, pandas, scikit-learn, matplotlib, and more. Examples show how to load data in various formats, how to streamline, group, and aggregate data sets, and how to create charts, maps, and other visualizations. Later chapters go in-depth with demonstrations of real-world data applications, including using location data to power a taxi service, market basket analysis to identify items commonly purchased together, and machine learning to predict stock prices.
Table of Contents
TITLE PAGE
COPYRIGHT
ABOUT THE AUTHOR
INTRODUCTION
Using Python for Data Science
Who Should Read This Book?
What’s in the Book?
CHAPTER 1: THE BASICS OF DATA
Categories of Data
Unstructured Data
Structured Data
Semistructured Data
Time Series Data
Sources of Data
APIs
Web Pages
Databases
Files
The Data Processing Pipeline
Acquisition
Cleansing
Transformation
Analysis
Storage
The Pythonic Way
Summary
CHAPTER 2: PYTHON DATA STRUCTURES
Lists
Creating a List
Using Common List Object Methods
Using Slice Notation
Using a List as a Queue
Using a List as a Stack
Using Lists and Stacks for Natural Language Processing
Making Improvements with List Comprehensions
Tuples
A List of Tuples
Immutability
Dictionaries
A List of Dictionaries
Adding to a Dictionary with setdefault()
Loading JSON into a Dictionary
Sets
Removing Duplicates from Sequences
Performing Common Set Operations
Exercise #1: Improved Photo Tag Analysis
Summary
CHAPTER 3: PYTHON DATA SCIENCE LIBRARIES
NumPy
Installing NumPy
Creating a NumPy Array
Performing Element-Wise Operations
Using NumPy Statistical Functions
Exercise #2: Using NumPy Statistical Functions
pandas
pandas Installation
pandas Series
Exercise #3: Combining Three Series
pandas DataFrames
Exercise #4: Using Different Joins
scikit-learn
Installing scikit-learn
Obtaining a Sample Dataset
Loading the Sample Dataset into a pandas DataFrame
Splitting the Sample Dataset into a Training Set and a Test Set
Transforming Text into Numerical Feature Vectors
Training and Evaluating the Model
Making Predictions on New Data
Summary
CHAPTER 4: ACCESSING DATA FROM FILES AND APIS
Importing Data Using Python’s open() Function
Text Files
Tabular Data Files
Exercise #5: Opening JSON Files
Binary Files
Exporting Data to Files
Accessing Remote Files and APIs
How HTTP Requests Work
The urllib3 Library
The Requests Library
Exercise #6: Accessing an API with Requests
Moving Data to and from a DataFrame
Importing Nested JSON Structures
Converting a DataFrame to JSON
Exercise #7: Manipulating Complex JSON Structures
Loading Online Data into a DataFrame with pandas-datareader
Summary
CHAPTER 5: WORKING WITH DATABASES
Relational Databases
Understanding SQL Statements
Getting Started with MySQL
Defining the Database Structure
Inserting Data into the Database
Querying Database Data
Exercise #8: Performing a One-to-Many Join
Using Database Analytics Tools
NoSQL Databases
Key-Value Stores
Document-Oriented Databases
Exercise #9: Inserting and Querying Multiple Documents
Summary
CHAPTER 6: AGGREGATING DATA
Data to Aggregate
Combining DataFrames
Grouping and Aggregating the Data
Viewing Specific Aggregations by MultiIndex
Slicing a Range of Aggregated Values
Slicing Within Aggregation Levels
Adding a Grand Total
Adding Subtotals
Exercise #10: Excluding Total Rows from the DataFrame
Selecting All Rows in a Group
Summary
CHAPTER 7: COMBINING DATASETS
Combining Built-in Data Structures
Combining Lists and Tuples with +
Combining Dictionaries with **
Combining Corresponding Rows from Two Structures
Implementing Different Types of Joins for Lists
Concatenating NumPy Arrays
Exercise #11: Adding New Rows/Columns to a NumPy Array
Combining pandas Data Structures
Concatenating DataFrames
Joining Two DataFrames
Summary
CHAPTER 8: CREATING VISUALIZATIONS
Common Visualizations
Line Graphs
Bar Graphs
Pie Charts
Histograms
Plotting with Matplotlib
Installing Matplotlib
Using matplotlib.pyplot
Working with Figure and Axes Objects
Exercise #12: Combining Bins into an “Other” Slice
Using Other Libraries with Matplotlib
Plotting pandas Data
Plotting Geospatial Data with Cartopy
Exercise #13: Drawing a Map with Cartopy and Matplotlib
Summary
CHAPTER 9: ANALYZING LOCATION DATA
Obtaining Location Data
Turning a Human-Readable Address into Geo Coordinates
Getting the Geo Coordinates of a Moving Object
Spatial Data Analysis with geopy and Shapely
Finding the Closest Object
Finding Objects in a Certain Area
Exercise #14: Defining Two or More Polygons
Combining Both Approaches
Exercise #15: Further Improving the Pick-Up Algorithm
Combining Spatial and Nonspatial Data
Deriving Nonspatial Attributes
Exercise #16: Filtering Data with a List Comprehension
Joining Spatial and Nonspatial Datasets
Summary
CHAPTER 10: ANALYZING TIME SERIES DATA
Regular vs. Irregular Time Series
Common Time Series Analysis Techniques
Calculating Percentage Changes
Rolling Window Calculations
Calculating the Percentage Change of a Rolling Average
Multivariate Time Series
Processing Multivariate Time Series
Analyzing Dependencies Between Variables
Exercise #17: Adding More Metrics to Analyze Dependencies
Summary
CHAPTER 11: GAINING INSIGHTS FROM DATA
Association Rules
Support
Confidence
Lift
The Apriori Algorithm
Creating a Transaction Dataset
Identifying Frequent Itemsets
Generating Association Rules
Visualizing Association Rules
Gaining Actionable Insights from Association Rules
Generating Recommendations
Planning Discounts Based on Association Rules
Exercise #18: Mining Real Transaction Data
Summary
CHAPTER 12: MACHINE LEARNING FOR DATA ANALYSIS
Why Machine Learning?
Types of Machine Learning
Supervised Learning
Unsupervised Learning
How Machine Learning Works
Data to Learn From
A Statistical Model
Previously Unseen Data
A Sentiment Analysis Example: Classifying Product Reviews
Obtaining Product Reviews
Cleansing the Data
Splitting and Transforming the Data
Training the Model
Evaluating the Model
Exercise #19: Expanding the Example Set
Predicting Stock Trends
Getting Data
Deriving Features from Continuous Data
Generating the Output Variable
Training and Evaluating the Model
Exercise #20: Experimenting with Different Stocks and New Metrics
Summary
INDEX