Peter Mucha’s Rendering of Wayne Zachary’s Karate Club Example
Duke and UNC jointly hosted the 2012 Meeting of the Society for Political Methodology (“PolMeth”) this past weekend. I had the pleasure of attending, and it ranked highly among my limited conference experiences. Below I present the papers and posters that were interesting to me, in the order that I saw/heard them. A full program of the meeting can be found here.
First up was Scott de Marchi’s piece on “Statistical Tests of Bargaining Models.” (Full disclosure: Scott and most of his coauthors are good friends of mine.) Unfortunately there’s no online version of the paper at the moment, but the gist of it is that calculating minimum integer weights (MIW) for the bargaining power of parties in coalition governments has been done poorly in the past. The paper uses a nice combination of computational, formal, and statistical methods to substantially improve on previous bargaining models.
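Minimum integer weights are easier to picture with a small example. Assuming the usual definition (the smallest integer weights, together with a quota, that produce exactly the same winning coalitions as the parties' actual seat counts under majority rule), a brute-force search works for a handful of parties. The seat counts below are invented, and the sketch only illustrates what an MIW is; it is not the estimation approach in de Marchi and coauthors' paper.

```python
from itertools import combinations, product


def winning_coalitions(weights, quota):
    """All coalitions (tuples of party indices) whose combined weight meets the quota."""
    parties = range(len(weights))
    return {
        coalition
        for size in range(1, len(weights) + 1)
        for coalition in combinations(parties, size)
        if sum(weights[i] for i in coalition) >= quota
    }


def minimum_integer_weights(seats, max_total=30):
    """Smallest-total integer weights (and quota) that reproduce exactly the
    coalitions winning a simple majority of `seats`. Brute force: small cases only."""
    target = winning_coalitions(seats, sum(seats) // 2 + 1)
    for total in range(1, max_total + 1):
        for weights in product(range(total + 1), repeat=len(seats)):
            if sum(weights) != total:
                continue
            for quota in range(1, total + 1):
                if winning_coalitions(weights, quota) == target:
                    return weights, quota
    return None  # no representation found within max_total


print(minimum_integer_weights([45, 40, 15]))  # ((1, 1, 1), 2)
print(minimum_integer_weights([50, 30, 20]))  # ((2, 1, 1), 3)
```

In the first (hypothetical) parliament any two parties form a majority, so all three are interchangeable and each gets weight 1 despite very different seat counts. In the second, the largest party belongs to every winning coalition, and its minimum weight is twice that of each partner.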
Next I saw a presentation by Jake Bowers and Mark Fredrickson on their paper (with Costas Panagopoulos) entitled “Interference is Interesting: Statistical Inference for Interference in Social Network Experiments” (pdf). The novelty of this project, at least to me, was viewing a treatment as a vector. For example, given units of interest (a,b,c), the treatment vector (1,0,1) might have different effects on a than (1,1,0) due to network effects, even though a is treated in both. In real-world terms, this kind of interference could bias the evaluation of an information campaign: treated individuals tell their control-group neighbors about what they heard, so the control group is no longer truly untreated.
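A toy version of the treatment-vector idea, assuming a hypothetical three-person network and a simple linear outcome with made-up direct and spillover effects (this is not the model in Bowers, Fredrickson, and Panagopoulos's paper): each unit's outcome depends on its own treatment plus the number of its treated neighbors.

```python
# Hypothetical network: a is tied only to b; b is tied to a and c.
NEIGHBORS = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
UNITS = ("a", "b", "c")


def outcome(unit, treatment, direct=1.0, spillover=0.5):
    """Hypothetical outcome for `unit` given the whole treatment vector.
    direct: effect of the unit's own treatment; spillover: effect per treated neighbor."""
    z = dict(zip(UNITS, treatment))
    treated_neighbors = sum(z[other] for other in NEIGHBORS[unit])
    return direct * z[unit] + spillover * treated_neighbors


for vector in [(1, 0, 1), (1, 1, 0)]:
    print(vector, {u: outcome(u, vector) for u in UNITS})
# (1, 0, 1) {'a': 1.0, 'b': 1.0, 'c': 1.0}
# (1, 1, 0) {'a': 1.5, 'b': 1.5, 'c': 0.5}
```

Unit a is treated under both vectors, yet its outcome differs because its neighbor b is treated only in the second. Under (1,1,0), the control unit c also moves off zero purely through b, which is the information-campaign problem described above. Setting `spillover=0.0` recovers the usual no-interference assumption, where only a unit's own treatment matters.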
The third paper presentation I attended was “An Alternative Solution to the Heckman Selection Problem: Selection Bias as Functional Form Misspecification” by Curtis Signorino and Brenton Kenkel. This paper presents a neat estimation strategy for when only one stage of data has been (or can be) collected for a two-stage decision process. The downside is that the number of parameters in a k-order Taylor series expansion with n variables grows combinatorially, so a lot of observations are necessary.* Arthur Spirling, the discussant for this panel, was my favorite discussant of the day for his helpful critique of the framing of the paper.
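Counting terms makes the growth concrete. Assuming the expansion includes every monomial of total degree at most k in n variables, plus the constant, there are C(n + k, k) terms; the exact set of terms Signorino and Kenkel use may differ.

```python
from itertools import product
from math import comb


def num_polynomial_terms(n_vars, order):
    """Monomials of total degree <= order in n_vars variables, counting the constant:
    C(n_vars + order, order)."""
    return comb(n_vars + order, order)


def brute_force_count(n_vars, order):
    """Check by enumerating exponent tuples directly (feasible only for small cases)."""
    return sum(1 for e in product(range(order + 1), repeat=n_vars) if sum(e) <= order)


for n_vars in (5, 10, 20):
    print(n_vars, [num_polynomial_terms(n_vars, k) for k in (1, 2, 3)])
# 5 [6, 21, 56]
# 10 [11, 66, 286]
# 20 [21, 231, 1771]
```

Going from a linear to a cubic expansion with 20 regressors multiplies the number of terms by more than 80.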
Thursday’s plenary session was a talk by Peter Mucha of the UNC Math Department on “Community Detection in Multislice Networks.” This paper introduced me to the karate club example, the voter model, and some cool graphs (see above).
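Community detection is often framed as finding the split of a network that maximizes Newman and Girvan's modularity Q: the fraction of edges that fall inside communities minus the fraction expected if edges were placed at random given each node's degree. The sketch below computes ordinary single-network Q for a hypothetical graph of two triangles joined by one edge. It uses neither the karate club data nor anything specific to multislice networks.

```python
from collections import defaultdict


def modularity(edges, community):
    """Q = sum over communities c of [L_c / m - (d_c / (2m))^2] for an undirected,
    unweighted graph; L_c = edges inside c, d_c = total degree of c's nodes."""
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    total_degree = defaultdict(int)
    for node, d in degree.items():
        total_degree[community[node]] += d
    return sum(internal[c] / m - (total_degree[c] / (2 * m)) ** 2 for c in total_degree)


# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
lumped = {node: "A" for node in range(6)}
print(round(modularity(EDGES, split), 3))   # 0.357
print(round(modularity(EDGES, lumped), 3))  # 0.0
```

Splitting along the bridge scores 5/14, while lumping everything into one community scores zero, so a modularity-maximizing method would prefer the two-triangle split.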
At the evening poster session, my favorite was Jeffrey Arnold’s “Pricing the Costly Lottery: Financial Market Reactions to Battlefield Events in the American Civil War.” The project compares the price of gold in Confederate graybacks and Union greenbacks throughout the Civil War as they track battlefield events. As you can probably guess, the paper has some cool data. My other favorite was Scott Abramson’s labor-intensive maps for his project “Production, Predation and the European State 1152–1789.”
I’ll discuss the posters and papers from Friday in tomorrow’s post.
*Curtis Signorino sends along a response, which I have abridged slightly here:
Although the variables (and parameters) grow combinatorically, the method we use is actually designed for problems where you have more regressors/parameters than observations in the data. That’s obviously a non-starter with traditional regression techniques. The underlying variable selection techniques we use (adaptive lasso and SCAD) were first applied to things like trying to find which of thousands of genetic markers might be related to breast cancer. You might only have 300 or a 1000 women in the data, but 2000-3000 genetic markers (which serve as regressors). The technique can find the one or two genetic markers associated with cancer onset. We use it to pick out the polynomial terms that best approximate the unknown functional relationship. Now, it likely won’t work well with N=50 and thousands of polynomial terms. However, it tends to work just fine with the typical numbers of regressors in poli sci articles and as little as 500-1000 observations. The memory problem I mentioned during the discussion actually occurred when we were running it on an IR dataset with something like 400,000 observations. The expanded set of polynomials required a huge amount of memory. So, it was more a memory storage issue due to having too many observations. But that will become a non-issue as memory gets cheaper, which it always does.
This is a helpful correction, and perhaps I should have pointed out that there was a fairly thorough discussion of this point during the panel. IR datasets are indeed growing rapidly, and this method helps avoid an almost infinite iteration of “well, what about the previous stage…?” questions that reviewers could pose.
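Signorino's point about picking out polynomial terms can be illustrated with a toy adaptive lasso. This is a hedged sketch, not the authors' implementation. The data are simulated from a made-up relationship (y = x1 + 2*x1*x2 + noise), and the candidate terms are all monomials of degree 1 to 3 in two regressors. The first stage is plain OLS, so unlike the genomics setting he describes it assumes fewer terms than observations. The penalty lam = 0.02 is picked by hand rather than tuned (for example by cross-validation), and SCAD is not shown. Everything is pure-Python coordinate descent so it runs without dependencies.

```python
import random
from itertools import product


def soft_threshold(z, lam):
    """Lasso shrinkage operator: move z toward zero by lam, stopping at zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(cols, y, lam, sweeps=5000, tol=1e-10):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1 with centered y and columns."""
    n = len(y)
    b = [0.0] * len(cols)
    resid = list(y)
    sq = [sum(v * v for v in c) / n for c in cols]
    for _ in range(sweeps):
        max_step = 0.0
        for j, c in enumerate(cols):
            if sq[j] == 0.0:
                continue  # an all-zero column can never enter the model
            # inner product of column j with the partial residual that excludes column j
            z = sum(ci * ri for ci, ri in zip(c, resid)) / n + sq[j] * b[j]
            step = soft_threshold(z, lam) / sq[j] - b[j]
            if step != 0.0:
                resid = [ri - ci * step for ci, ri in zip(c, resid)]
                b[j] += step
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b


def adaptive_lasso(cols, y, lam):
    """Two-stage adaptive lasso (gamma = 1): OLS first (lam = 0), then a lasso on columns
    rescaled by |OLS estimate|, so terms with small first-stage estimates are penalized harder."""
    b_init = lasso_cd(cols, y, lam=0.0)
    scaled = [[v * abs(w) for v in c] for c, w in zip(cols, b_init)]
    b_star = lasso_cd(scaled, y, lam)
    return [bs * abs(w) for bs, w in zip(b_star, b_init)]


def poly_terms(x1, x2, order=3):
    """Every monomial x1^a * x2^b with 1 <= a + b <= order (intercept handled by centering)."""
    terms = {}
    for a, b in product(range(order + 1), repeat=2):
        if 1 <= a + b <= order:
            name = "*".join(["x1"] * a + ["x2"] * b)
            terms[name] = [u ** a * v ** b for u, v in zip(x1, x2)]
    return terms


def standardize(col):
    """Center a column, scale it to unit (population) variance, and return its sd."""
    n = len(col)
    mean = sum(col) / n
    sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
    return [(v - mean) / sd for v in col], sd


# Simulated data with a made-up "true" relationship: y = x1 + 2*x1*x2 + noise.
rng = random.Random(0)
n = 200
x1 = [rng.uniform(-1, 1) for _ in range(n)]
x2 = [rng.uniform(-1, 1) for _ in range(n)]
y = [u + 2.0 * u * v + rng.gauss(0, 0.1) for u, v in zip(x1, x2)]
y_mean = sum(y) / n
y_centered = [v - y_mean for v in y]

names, cols, sds = [], [], []
for name, col in poly_terms(x1, x2).items():
    z, sd = standardize(col)
    names.append(name)
    cols.append(z)
    sds.append(sd)

coef_std = adaptive_lasso(cols, y_centered, lam=0.02)
coef = {name: c / sd for name, c, sd in zip(names, coef_std, sds)}  # back to original units
selected = {name: round(c, 2) for name, c in coef.items() if c != 0.0}
print(selected)
```

Of the nine candidate terms, the printout should keep only x1 and x1*x2, with both coefficients shrunk slightly toward zero, which is the usual price of the lasso penalty. Rescaling each column by its first-stage estimate is what lets the adaptive lasso drop correlated impostors such as x1*x1*x1 that an ordinary lasso can sometimes let in.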