Python for social scientists

I’ve recently started getting into Python programming for social science, and I’m finding it particularly helpful as a way to formalize and organize some of the analytical steps that I previously left undocumented and unreproducible. It’s also very empowering to discover how quickly you can learn to make computers work for you in your analytical work. The natural choice for most researchers in political science and sociology is the R programming language. R is unbeaten when it comes to state-of-the-art statistical calculations, but I think Python is the better choice for me for three main reasons: (1) webscraping, (2) natural language processing, and (3) simplicity and readability.

Python basics

For the benefit of others, I’ve collected some of the packages, guides, and courses here that I’ve found to be especially helpful in getting to grips with Python and what it can do for my analytical work. For learning the basics, I recommend DataCamp and their Data Scientist track. It’s important to keep in mind that Python is a general purpose programming language and it can be used for everything from building websites and games to data analysis. DataCamp focuses on Python for data science, which is why this is my pick. They also employ a great mix of short vidoes and lots of exercises that really works for me.

In terms of tools for actual coding and analysis, I think the Jupyter Notebook IDE provides the best combination of ease-of-use, a great environment for exploratory data analysis, literate programming (which is really wonderful for non-computer-science trained people like myself), and shareability.

Natural language processing

There are a ton of resources for doing natural language processing (NLP) in Python, especially thanks to the Natural Language Toolkit (NLTK), which has been in development since 2001. For getting text from the web, the Requests and BeautifulSoup packages are beyond compare. I’m also looking forward to getting into Gensim and SpaCy for topic modeling and other text analysis tasks. NLP in Python has really benefited from what seems like a growing adoption of the language within history and the humanities. There are some really great guides available at the Programming Historian, as well as Folgert Karsdorp’s interactive course written in Jupyter Notebooks.

Network analysis

The NetworkX package is the go-to network analysis package in Python it seems. It’s very easy to use, quite powerful, and under rapid development. NetworkX prioritizes graph manipulation and analysis instead of visualization. DataCamp has two good courses on Network analysis in Python by Eric Ma, who is also the maintainer of the nxviz package for rational network visualizations, such as arc, hive, and circos plots. For other visualizations, I think NetworkX integrates very well with matplotlib, as well as supporting exports to Gephi.

Written on November 14, 2017