THE DATA ORGANISATION

Women in Politics | Exploratory Data Analysis in R

It all started in Montana. In 1916, Jeannette Rankin, a peace activist and a strong advocate for women’s suffrage, broke down centuries-long barriers by becoming the first woman elected to Congress. Since then, 366 women have served in the U.S. Congress, and thousands more have held elected and executive offices at the state level.

In this post, we will analyze the Women in Politics dataset published by the Eagleton Institute of Politics’ Center for American Women and Politics (CAWP). CAWP, the nation’s leading source of scholarly research and data related to women’s political participation in the United States, has gathered tens of thousands of records of women holding political office (both elected and appointed) across the United States, dating back to the late 1800s, and made them all available in its centralized Women in Political Office Database. We will use this data along with a variety of exploratory and modeling techniques to answer the following questions:

  • How many women have held political office by level of government?
  • Is one political party more apt to be represented by women?
  • Does race add a layer of complexity to gender representation for political office?
  • How long do women serve in office?
  • When will the U.S. House of Representatives and Senate achieve full gender parity?
  • Is there a geographic component to equitable gender representation? Are there more female officeholders in certain states?

Let’s start by applying some basic data cleaning/manipulation techniques, dealing with missing (NA) values, and touching up our dataset.

Id | Year | Min Year | Max Year | Years Of Service | Years In Office | First Name | Middle Name | Last Name | Full Name | State | State Abb | Party | Level | Position | Race Ethnicity | Party Grouped
11841h | 1981 | 1981 | 1990 | 9 | 1981-1990 | Betty | S. | Aaron | Betty S. Aaron | Georgia | GA | Democrat | State Legislative | State Representative | Unavailable | Democrat
11841h | 1982 | 1981 | 1990 | 9 | 1981-1990 | Betty | S. | Aaron | Betty S. Aaron | Georgia | GA | Democrat | State Legislative | State Representative | Unavailable | Democrat
11841h | 1983 | 1981 | 1990 | 9 | 1981-1990 | Betty | S. | Aaron | Betty S. Aaron | Georgia | GA | Democrat | State Legislative | State Representative | Unavailable | Democrat
11841h | 1984 | 1981 | 1990 | 9 | 1981-1990 | Betty | S. | Aaron | Betty S. Aaron | Georgia | GA | Democrat | State Legislative | State Representative | Unavailable | Democrat
11841h | 1985 | 1981 | 1990 | 9 | 1981-1990 | Betty | S. | Aaron | Betty S. Aaron | Georgia | GA | Democrat | State Legislative | State Representative | Unavailable | Democrat
11841h | 1986 | 1981 | 1990 | 9 | 1981-1990 | Betty | S. | Aaron | Betty S. Aaron | Georgia | GA | Democrat | State Legislative | State Representative | Unavailable | Democrat
11841h | 1987 | 1981 | 1990 | 9 | 1981-1990 | Betty | S. | Aaron | Betty S. Aaron | Georgia | GA | Democrat | State Legislative | State Representative | Unavailable | Democrat
11841h | 1988 | 1981 | 1990 | 9 | 1981-1990 | Betty | S. | Aaron | Betty S. Aaron | Georgia | GA | Democrat | State Legislative | State Representative | Unavailable | Democrat

After cleaning up the data a bit and deriving some new columns, we have our dataset! In it, one row corresponds to a woman having held a given political office (either elected or appointed) for one year. If a woman has held a given political office for, say, five years, she will appear as five separate rows in the dataset.
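To make the one-row-per-year layout concrete, here is a minimal base R sketch of how a record with a first and last year could be expanded into yearly rows. The records and the column names (id, min_year, max_year) are made up for illustration, and counting both endpoint years is an assumption on my part, not necessarily how the CAWP data is structured.

```r
# Hypothetical raw records: one row per woman per office (not real CAWP data).
records <- data.frame(
  id        = c("a1", "b2"),
  full_name = c("Officeholder A", "Officeholder B"),
  min_year  = c(1981, 2001),
  max_year  = c(1985, 2002),
  stringsAsFactors = FALSE
)

# Number of yearly rows per record (both endpoints counted: an assumption).
n_years <- records$max_year - records$min_year + 1

# Repeat each record once per year served, then fill in the individual year.
yearly <- records[rep(seq_len(nrow(records)), times = n_years), ]
yearly$year <- unlist(Map(seq, records$min_year, records$max_year))
rownames(yearly) <- NULL

print(yearly)
```

Each officeholder now appears once per year in office, which is the shape the rest of the analysis relies on.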

As part of my setup, I took a look at the number of missing (null) values for each variable (field). Based on this, I removed the “District” column, which is entirely null in the dataset I downloaded and thus provides no value. Middle name, which is about 1/3 null, has been left in the dataset as is: it’s not an essential variable, so its missing values don’t affect the analysis.
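A check like this can be sketched in a few lines of base R. The data frame below is a made-up stand-in (three rows, hypothetical column names) that mimics the situation described above: one column that is entirely missing and one that is about a third missing.

```r
# Hypothetical slice of the data; "district" is entirely missing.
women <- data.frame(
  first_name  = c("Ann", "Beth", "Cara"),
  middle_name = c("S.", NA, "J."),
  district    = c(NA, NA, NA),
  stringsAsFactors = FALSE
)

# Share of missing values in each column.
na_share <- colMeans(is.na(women))
print(round(na_share, 2))

# Keep only the columns that are not entirely missing.
women_clean <- women[, na_share < 1, drop = FALSE]
print(names(women_clean))
```

Working with shares rather than raw counts makes it easy to apply the same rule (drop anything that is 100% missing) no matter how many rows the download contains.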

Now that we have our data, let’s start to explore it.

In absolute terms, most of the women who have held office have done so at the state legislative level, which makes sense given that there are more seats available at that level. Interestingly enough, in U.S. Territories/D.C., a lot of women outside the two major parties have held office.
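Because the dataset has one row per woman per year, counting rows directly would over-weight long tenures. Below is a small sketch of one way to count women (rather than woman-years) by level and party. The rows, level labels, and party labels are invented for illustration, and deduplicating on id is my assumption about how to identify a unique officeholder.

```r
# Hypothetical yearly rows (one row per woman per year in office).
yearly <- data.frame(
  id            = c("a1", "a1", "b2", "c3", "c3", "d4"),
  level         = c("State Legislative", "State Legislative", "State Legislative",
                    "Congress", "Congress", "Territorial/D.C."),
  party_grouped = c("Democrat", "Democrat", "Republican",
                    "Democrat", "Democrat", "Other"),
  stringsAsFactors = FALSE
)

# Keep one row per woman, level, and party so long tenures are not over-counted.
officeholders <- unique(yearly[, c("id", "level", "party_grouped")])

# Cross-tabulate women by level of government and party.
counts <- table(officeholders$level, officeholders$party_grouped)
print(counts)
```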

It’s great to see total numbers, but our data, depending on which level of government we’re talking about, can span 130 years! Let’s break out our totals over time to look for trends.

Here are some key takeaways:
1. The number of women holding political office has been growing incredibly fast over the past few decades.
2. Most of the positions held are at the State Legislative level, which makes sense given the greater number of positions open at that level.
3. U.S. Territories and D.C. are late to the game, not having any semblance of gender representation until the 2000s.

Let’s look to see if these trends have differed based on political party.

For female officeholders in Congress, Democrats and Republicans were in lockstep until 1990, when Democrats took off at a much faster rate.

How does gender representation in political office change not by political party, but by race? Let’s take a look.

We can see rather clearly that white women dominate across the board (except in U.S. Territories/D.C., where women of Hispanic/Latina ethnicity dominate). This is not too surprising given that places like Puerto Rico are included in this group. The difference is clearest in Statewide Executive positions (Secretary of State, Treasurer, Superintendent of Public Instruction, Lt. Governor, etc.), which are often appointed rather than elected.

Let’s take a look at how long individual female officeholders have held their respective office. We’ll do so by utilizing a density plot, which shows the distribution of the number of years a given politician has served in their respective office.

From this, we can see the distribution of the number of years a given politician has held her respective office over the past 120 years, broken out by level of government. In particular, we can clearly see that the women who have held office in the various U.S. Territories/D.C. have done so for a very short period of time: 3 years. This makes sense given that women have only been holding office in U.S. Territories/D.C. since the early 2000s. On the other end of the spectrum, women in Congress have held their positions for a much more uniform amount of time, averaging about 21 years in office!
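For readers who want to see the mechanics, here is a minimal sketch of how tenure per woman could be derived from the yearly rows and fed into a kernel density estimate. The data is invented and the column names are assumptions. I use base R’s density() here purely for illustration, which may differ from the plotting code behind the chart.

```r
# Hypothetical yearly rows: three women in Congress, one in a territory.
yearly <- data.frame(
  id    = c(rep("a1", 3), rep("b2", 21), rep("c3", 2), rep("d4", 12)),
  level = c(rep("Territorial/D.C.", 3), rep("Congress", 21),
            rep("Congress", 2), rep("Congress", 12)),
  stringsAsFactors = FALSE
)
yearly$n <- 1

# Years in office per woman and level = number of yearly rows.
tenure <- aggregate(n ~ id + level, data = yearly, FUN = sum)
names(tenure)[names(tenure) == "n"] <- "years_in_office"
print(tenure)

# Kernel density estimate of tenure for a single level of government.
congress_tenure <- tenure$years_in_office[tenure$level == "Congress"]
dens <- density(congress_tenure)
print(mean(congress_tenure))
```

Calling plot(dens) would draw the curve; repeating the last step for each level gives one density per level of government.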

Now let’s home in on the Federal/Congressional piece of the puzzle. Specifically, let’s look at the proportion of women who have been members of the U.S. Senate and House of Representatives over time. Unfortunately, our dataset only includes the number of women who have held political office, so at most levels of government it’s difficult to compare that to the number of men who have held office. The exceptions are the U.S. Senate, which has had 100 members since 1959, and the U.S. House of Representatives, which has had 435 members since 1959. Because those chamber sizes are fixed, we can compute the share of seats held by women. Thus, we’ll limit our analysis to 1959 and later.
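The share calculation itself is simple division by the fixed chamber sizes mentioned above (100 and 435). Here is a small sketch with invented yearly counts; the numbers and column names are placeholders, not CAWP figures.

```r
# Hypothetical counts of women per year and chamber (toy numbers).
congress <- data.frame(
  year    = c(1959, 1959, 1993, 1993),
  chamber = c("Senate", "House", "Senate", "House"),
  women   = c(2, 15, 10, 50),
  stringsAsFactors = FALSE
)

# Fixed chamber sizes, as stated in the text.
seats <- c(Senate = 100, House = 435)

# Share of seats held by women in each chamber and year.
congress$share <- congress$women / unname(seats[congress$chamber])
print(congress)
```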

From this, we can see that in both chambers of Congress, the share of female members has increased steadily, with a significant bump in the 1990s and a steady increase from there on out. The next question that comes to mind is:

When will we achieve a 50-50 parity in each chamber of Congress?

Time Series Analysis (ARIMA)

In this next section, I’d like to use an ARIMA (Autoregressive Integrated Moving Average) time series forecasting model to determine when each chamber of the U.S. Congress will achieve full gender parity. What does that mean in plain English? Can we use data science methods to project when the Senate will have 50 female members and the House 217 (435/2, rounded down) female members?

By building an ARIMA model with a trend component, we can estimate the increase in the number of female officeholders in Congress over the next 80 years:

According to the Senate ARIMA model, the U.S. Senate will first achieve full gender parity in the year 2053.
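For anyone curious what a forecast like this looks like in code, here is a minimal base R sketch. It is an illustration under my own assumptions rather than the exact model used for the result above: the series is invented, the order (0, 1, 0) is picked only to keep the example small, and the trend is added by passing a time index as an external regressor to stats::arima().

```r
# Hypothetical yearly counts of women in a 100-seat chamber (toy data).
years <- 2001:2020
women <- c(13, 13, 14, 14, 14, 16, 16, 17, 17, 17,
           20, 20, 20, 20, 21, 21, 23, 25, 26, 26)
y <- ts(women, start = years[1])

# A time index as regressor gives the differenced model a constant drift.
t_index <- seq_along(women)
fit <- arima(y, order = c(0, 1, 0), xreg = t_index)
drift <- unname(coef(fit)[1])

# Forecast 80 years ahead, extending the time index into the future.
h <- 80
future_t <- length(women) + seq_len(h)
fc <- predict(fit, n.ahead = h, newxreg = future_t)

# First forecast year in which the projected count reaches 50 seats.
future_years <- max(years) + seq_len(h)
reached <- as.numeric(fc$pred) >= 50
parity_year <- if (any(reached)) future_years[which(reached)[1]] else NA
print(drift)
print(parity_year)
```

With this specification the forecast is essentially a straight line continuing the average yearly change, so the parity year depends heavily on the assumed trend and should be read as a rough projection.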

Let’s now do the same thing for the House.

Thus, according to the House ARIMA model, the U.S. House of Representatives will first achieve full gender parity in the year 2061.

Let’s plot the Senate and House graphs side-by-side, just to get a better view of the whole picture.

From these graphs, we can see that both ARIMA models took a rather linear approach, projecting a steady increase over time. This makes sense, because the models had no seasonal pattern to pick up (our data does NOT have one), only the general trend (which our data does have). Obviously, this assumes that the rate of increase stays steady over time and doesn’t plateau as the number of women in Congress hits a certain threshold. Under that assumption, the Senate will achieve full parity 8 years before the House.

Geographic Analysis

Next, we’ll look into how the number of women holding certain political office varies based on location (state). This will help us understand if there are certain areas of the country where the phenomenon of women holding office is taking off, and others where it’s lagging.

First, let’s take a look at our all-time leaders. Which states have had the most women state representatives in the past 120 years? To look at this, I’ve pulled state population data from the U.S. Census website to help normalize our results (so that California and New York, which are obviously larger states, don’t disproportionately dominate the graph compared to smaller states, like Connecticut and Vermont).
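The normalization step can be sketched as a join followed by a per-capita rate. Everything below is a placeholder: the state names, counts, and populations are invented, and expressing the result per 100,000 residents is my choice of scale for the example.

```r
# Hypothetical counts of women state representatives (toy numbers).
reps <- data.frame(
  state      = c("State A", "State B", "State C"),
  women_reps = c(300, 400, 500),
  stringsAsFactors = FALSE
)

# Hypothetical state populations (stand-ins for Census figures).
pop <- data.frame(
  state      = c("State A", "State B", "State C"),
  population = c(600000, 39000000, 3600000),
  stringsAsFactors = FALSE
)

# Join on state, then compute representatives per 100,000 residents.
merged <- merge(reps, pop, by = "state")
merged$per_100k <- merged$women_reps / merged$population * 1e5

# Rank states from highest to lowest per-capita representation.
merged <- merged[order(-merged$per_100k), ]
print(merged)
```

In this toy example the smallest state ranks first even though it has the lowest raw count, which is exactly the effect the normalization is meant to reveal.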

From this, we can see that certain states dominate consistently throughout the past century, specifically New Hampshire, Vermont, and Connecticut. Let’s look at the same data in map form to give us a better sense of the geographic trends.

Conclusion

Thanks for reading! I hope you were able to learn a bit more about the data behind women in political office across the United States. If you felt like the article was educational, interesting, or if you just want to support me, feel free to follow me on any social media platform (all are listed as icons on the homepage of my website) and stay tuned for the next post. If you have ideas on public policy/political topics that could use some data expertise, please send me your suggestions – the more ideas, the better!

Additional Resources

The original blog post can be found here:
https://towardsdatascience.com/women-in-politics-exploratory-data-analysis-in-r-54bfd49dcd3a

Interested in seeing my original code? Go to my GitHub repository here:
https://github.com/jschulberg/Women-in-Politics

Interested in learning more about this subject or the Center for American Women and Politics? Go to:
https://cawp.rutgers.edu/facts/milestones-for-women/

Interested in seeing the Eagleton Institute of Politics, Center for American Women and Politics Women Elected Officials Database? Go to:
https://cawpdata.rutgers.edu/
