Python for Data Analysis, 2e: Data Wrangling with Pandas, NumPy, and IPython Paperback – 3 November 2017
What Is This Book About?
This book is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. My goal is to offer a guide to the parts of the Python programming language and its data-oriented library ecosystem and tools that will equip you to become an effective data analyst. While 'data analysis' is in the title of the book, the focus is specifically on Python programming, libraries, and tools as opposed to data analysis methodology. This is the Python programming you need for data analysis.
New for the Second Edition
The first edition of this book was published in 2012, during a time when open source data analysis libraries for Python (such as pandas) were very new and developing rapidly. In this updated and expanded second edition, I have overhauled the chapters to account for incompatible changes and deprecations as well as new features introduced over the last five years. I’ve also added fresh content to introduce tools that either did not exist in 2012 or had not matured enough to make the first cut. Finally, I have tried to avoid writing about new or cutting-edge open source projects that may not have had a chance to mature. I would like readers of this edition to find that the content is still almost as relevant in 2020 or 2021 as it is in 2017.
The major updates in this second edition include:
- All code, including the Python tutorial, updated for Python 3.6 (the first edition used Python 2.7)
- Updated Python installation instructions for the Anaconda Python Distribution and other needed Python packages
- Updates for the latest versions of the pandas library in 2017
- A new chapter on some more advanced pandas tools, and some other usage tips
- A brief introduction to using statsmodels and scikit-learn
I also reorganized a significant portion of the content from the first edition to make the book more accessible to newcomers.
Most helpful customer reviews on Amazon.com
This book's problem is the classic curse of knowledge. The author does not know what it's like to get started with pandas or what difficulties new users will run into.
Overall, this book provides a jumping-off point for understanding the capabilities and strengths of pandas, but it wasn't terribly useful for learning even basic data science workflows and concepts. For that, I highly recommend something like Hadley Wickham's "R for Data Science," which is much more approachable and rewarding in its use of example datasets, its more personable writing style, and its outlining of good practices for data science.
This book primarily focuses on the pandas Python library, which is awesome at processing and organizing data. (Python pandas is like MS Excel times 100. This is not an exaggeration.) It also introduces the reader to numpy (lower-level number crunching and arrays), matplotlib (data visualizations), scikit-learn (machine learning), and other useful data science libraries. The book also recommends other books for continuing education.
Although this would be a challenging book for a brand new Python user, I would still recommend it, especially if you are currently doing a lot of work in MS Excel and/or exporting data from databases. I had a few false starts learning Python, and my biggest stumbling block was the lack of practical application in what I was learning. This book puts practical tools in the reader's hands very quickly. I personally don't have time to make goofy games, etc., that other books have used as practice examples. Despite other reviews criticizing the use of random data throughout the book, I found the examples easy to follow and useful. I would also argue that learning how to generate random data is useful in itself (thus the purpose of the numpy random library), and that there are practical examples throughout the book. Chapter 14 is devoted to real-world data analysis examples.
I am almost finished with my second time through the book, this time working through every example. This book has been well worth the hours spent in it. For context, I previously relied on Excel, SQL, and some AutoHotKey. This book has significantly improved how I work.
Thanks, Wes and team.