Description
Format: 16mo
Paper: offset paper
Binding: paperback (perfect bound)
Boxed set: No
ISBN: 9787564169060
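About the Book
Machine learning has become an integral part of many commercial applications and research projects, and large companies with extensive research teams are investing in the field as well. If you use Python, even as a beginner, this book will teach you how to build your own machine learning solutions. With the wealth of data available today, machine learning applications are limited only by your imagination.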
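You will learn all the steps needed to create a successful machine learning application with Python and the scikit-learn library. The authors of Introduction to Machine Learning with Python (《Python机器学习入门》, reprint edition, in English), Andreas Müller and Sarah Guido, focus on the practical aspects of using machine learning algorithms rather than dwelling on the math behind them. Familiarity with the NumPy and matplotlib libraries will help you get the most out of this book.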
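With this book, you will learn:
  Fundamental concepts and applications of machine learning
  Advantages and shortcomings of widely used machine learning algorithms
  How to represent data processed by machine learning, including which aspects of the data to focus on
  Advanced methods for model evaluation and parameter tuning
  The concept of pipelines for chaining models and encapsulating your workflow
  Methods for working with text data, including text-specific processing techniques
  Suggestions for improving your machine learning and data science skills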
Table of Contents
Preface
1. Introduction
  Why Machine Learning?
  Problems Machine Learning Can Solve
  Knowing Your Task and Knowing Your Data
  Why Python?
  scikit-learn
  Installing scikit-learn
  Essential Libraries and Tools
  Jupyter Notebook
  NumPy
  SciPy
  matplotlib
  pandas
  mglearn
  Python 2 Versus Python 3
  Versions Used in this Book
  A First Application: Classifying Iris Species
  Meet the Data
  Measuring Success: Training and Testing Data
  First Things First: Look at Your Data
  Building Your First Model: k-Nearest Neighbors
  Making Predictions
  Evaluating the Model
  Summary and Outlook
2. Supervised Learning
  Classification and Regression
  Generalization, Overfitting, and Underfitting
  Relation of Model Complexity to Dataset Size
  Supervised Machine Learning Algorithms
  Some Sample Datasets
  k-Nearest Neighbors
  Linear Models
  Naive Bayes Classifiers
  Decision Trees
  Ensembles of Decision Trees
  Kernelized Support Vector Machines
  Neural Networks (Deep Learning)
  Uncertainty Estimates from Classifiers
  The Decision Function
  Predicting Probabilities
  Uncertainty in Multiclass Classification
  Summary and Outlook
3. Unsupervised Learning and Preprocessing
  Types of Unsupervised Learning
  Challenges in Unsupervised Learning
  Preprocessing and Scaling
  Different Kinds of Preprocessing
  Applying Data Transformations
  Scaling Training and Test Data the Same Way
  The Effect of Preprocessing on Supervised Learning
  Dimensionality Reduction, Feature Extraction, and Manifold Learning
  Principal Component Analysis (PCA)
  Non-Negative Matrix Factorization (NMF)
  Manifold Learning with t-SNE
  Clustering
  k-Means Clustering
  Agglomerative Clustering
  DBSCAN
  Comparing and Evaluating Clustering Algorithms
  Summary of Clustering Methods
  Summary and Outlook
4. Representing Data and Engineering Features
  Categorical Variables
  One-Hot-Encoding (Dummy Variables)
  Numbers Can Encode Categoricals
  Binning, Discretization, Linear Models, and Trees
  Interactions and Polynomials
  Univariate Nonlinear Transformations
  Automatic Feature Selection
  Univariate Statistics
  Model-Based Feature Selection
  Iterative Feature Selection
  Utilizing Expert Knowledge
  Summary and Outlook
5. Model Evaluation and Improvement
  Cross-Validation
  Cross-Validation in scikit-learn
  Benefits of Cross-Validation
  Stratified k-Fold Cross-Validation and Other Strategies
  Grid Search
  Simple Grid Search
  The Danger of Overfitting the Parameters and the Validation Set
  Grid Search with Cross-Validation
  Evaluation Metrics and Scoring
  Keep the End Goal in Mind
  Metrics for Binary Classification
  Metrics for Multiclass Classification
  Regression Metrics
  Using Evaluation Metrics in Model Selection
  Summary and Outlook
6. Algorithm Chains and Pipelines
  Parameter Selection with Preprocessing
  Building Pipelines
  Using Pipelines in Grid Searches
  The General Pipeline Interface
  Convenient Pipeline Creation with make_pipeline
  Accessing Step Attributes
  Accessing Attributes in a Grid-Searched Pipeline
  Grid-Searching Preprocessing Steps and Model Parameters
  Grid-Searching Which Model To Use
  Summary and Outlook
7. Working with Text Data
  Types of Data Represented as Strings
  Example Application: Sentiment Analysis of Movie Reviews
  Representing Text Data as a Bag of Words
  Applying Bag-of-Words to a Toy Dataset
  Bag-of-Words for Movie Reviews
  Stopwords
  Rescaling the Data with tf-idf
  Investigating Model Coefficients
  Bag-of-Words with More Than One Word (n-Grams)
  Advanced Tokenization, Stemming, and Lemmatization
  Topic Modeling and Document Clustering
  Latent Dirichlet Allocation
  Summary and Outlook
8. Wrapping Up
  Approaching a Machine Learning Problem
  Humans in the Loop
  From Prototype to Production
  Testing Production Systems
  Building Your Own Estimator
  Where to Go from Here
  Theory
  Other Machine Learning Frameworks and Packages
  Ranking, Recommender Systems, and Other Kinds of Learning
  Probabilistic Modeling, Inference, and Probabilistic Programming
  Neural Networks
  Scaling to Larger Datasets
  Honing Your Skills
  Conclusion
Index