Codebook for U.S. House Returns 1976–2020

The data file 1976-2020-house contains constituency (district) returns for elections to the U.S. House of Representatives from 1976 to 2020. The data source is the document “Statistics of the Congressional Election,” published biennially by the Clerk of the U.S. House of Representatives. Data for 2018 come from official state election websites; for Kansas, they come from Stephen Pettigrew and the Kansas Secretary of State's office (in some cases, results are marked as unofficial and will be updated at a later time).

All string variables are in upper case.

Variables

The variables are listed as they appear in the data file.

year

  • Description: year in which election was held

state

  • Description: state name

state_po

  • Description: U.S. postal code state abbreviation

state_fips

  • Description: State FIPS code

state_cen

  • Description: U.S. Census state code

state_ic

  • Description: ICPSR state code

office

  • Description: U.S. House (constant)

district

  • Description: district number
  • Note: At-large districts are coded as 0 (zero).

stage

  • Description: electoral stage
  • Coding:
      code     definition
      “gen”    general elections
      “pri”    primary elections
  • Note: Only appears in special cases. Consult original House Clerk report for these cases.

special

  • Description: special election
  • Coding:
      code       definition
      “TRUE”     special elections
      “FALSE”    regular elections
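
As a rough illustration of how the stage and special codes might be used together, the R sketch below keeps only regular general-election rows. The data frame toy and its values are invented for illustration. The sketch assumes that special is read in as a logical column and normalizes the case of stage with toupper(), since the stored case may differ from the codes listed above.

```r
library(dplyr)

# Hypothetical rows; candidate names and values are invented for illustration.
toy <- data.frame(
  candidate = c("A", "B", "C", "D"),
  stage = c("GEN", "GEN", "PRI", "GEN"),
  special = c(FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Keep regular (non-special) general-election rows.
# toupper() guards against the stage code being stored in lower case.
regular_general <- toy %>%
  filter(toupper(stage) == "GEN", !special)
```

With the real file, the same filter would be applied to the data frame returned by read.csv().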

candidate

  • Description: name of the candidate
  • Note: The name is as it appears in the House Clerk report.

party

  • Description: party of the candidate (always entirely lowercase)
    • Note: Parties are as they appear in the House Clerk report. In states that allow candidates to appear on multiple party lines, separate vote totals are indicated for each party. Therefore, for analysis that involves candidate totals, it will be necessary to aggregate across all party lines within a district. For analysis that focuses on two-party vote totals, it will be necessary to account for major party candidates who receive votes under multiple party labels. Minnesota party labels are given as they appear on the Minnesota ballots. Future versions of this file will include codes for candidates who are endorsed by major parties, regardless of the party label under which they receive votes.
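
The two-step adjustment described in the note above can be sketched in R as follows. This is a minimal sketch, not the file maintainers' method. The data frame toy, the party labels, the vote counts, and the column names main_party, votes_all_lines, and two_party_share are all hypothetical. It also assumes that a candidate's main party is the party line on which the candidate received the most votes, and that the major-party labels are "DEMOCRAT" and "REPUBLICAN" after upper-casing.

```r
library(dplyr)

# Hypothetical returns for one district; names, labels, and counts are invented.
toy <- data.frame(
  year = 2020,
  state_po = "XX",
  district = 1,
  candidate = c("A", "A", "B", "B", "C"),
  party = c("DEMOCRAT", "WORKING FAMILIES", "REPUBLICAN", "CONSERVATIVE", "GREEN"),
  candidatevotes = c(500, 50, 380, 40, 30),
  stringsAsFactors = FALSE
)

# Step 1: one row per candidate, summing votes across party lines.
# Assumption: the main party is the line with the most votes.
by_candidate <- toy %>%
  group_by(year, state_po, district, candidate) %>%
  summarise(
    main_party = toupper(party[which.max(candidatevotes)]),
    votes_all_lines = sum(candidatevotes),
    .groups = "drop"
  )

# Step 2: two-party share among candidates with an assumed major-party label.
two_party <- by_candidate %>%
  filter(main_party %in% c("DEMOCRAT", "REPUBLICAN")) %>%
  group_by(year, state_po, district) %>%
  mutate(two_party_share = votes_all_lines / sum(votes_all_lines)) %>%
  ungroup()
```

In this toy district, candidate A's votes on both lines count toward the Democratic total and candidate B's toward the Republican total, so the two-party shares are computed from 550 and 420 votes rather than from the major-party lines alone.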

writein

  • Description: vote totals associated with write-in candidates
  • Coding:
code definition
“TRUE” write-in candidates
“FALSE” non-write-in candidates

mode

  • Description: mode of voting; states with data that doesn’t break down returns by mode are marked as “total”

candidatevotes

  • Description: votes received by this candidate for this particular party

totalvotes

  • Description: total number of votes cast for this election
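
To show how candidatevotes and totalvotes relate, the R sketch below computes each row's share of all votes cast. The data frame toy and the column name vote_share are hypothetical, and the values are invented.

```r
library(dplyr)

# Hypothetical rows for one district; values are invented for illustration.
toy <- data.frame(
  year = c(2020, 2020, 2020),
  state_po = c("XX", "XX", "XX"),
  district = c(1, 1, 1),
  candidate = c("A", "B", "C"),
  candidatevotes = c(600, 300, 100),
  totalvotes = c(1000, 1000, 1000),
  stringsAsFactors = FALSE
)

# vote_share is a made-up column name: the row's share of all votes cast.
toy <- toy %>%
  mutate(vote_share = candidatevotes / totalvotes)
```

Because candidatevotes is recorded per candidate and party, in states with multiple party lines this gives the share for one party line, not the candidate's overall share.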

fusion_ticket

  • Description: TRUE/FALSE indicator for whether the given candidate is running on a fusion party ticket, in which case the candidate appears multiple times for a given election, once under each party. States with fusion tickets include Connecticut, New Jersey, New York, and South Carolina.

unofficial

  • Description: TRUE/FALSE indicator for unofficial result (to be updated later); this appears only for 2018 data in some cases

version

  • Description: date when this dataset was finalized

NOTES:

  • candidatevotes: for uncontested races in FL, the value is set to 1. Users who want a higher value for analysis purposes can consider using the maximum for the given state-year, which assumes that uncontested candidates would have received as many votes as the best contested candidate. The code in R would be the following:
    library(dplyr)
    df <- read.csv("1976-2020-house.csv", stringsAsFactors = FALSE)
    # maximum candidate vote within each state-year
    df <- df %>%
     group_by(state_po, year) %>%
     mutate(max_st_year_vote = max(candidatevotes, na.rm = TRUE)) %>%
     ungroup()

  • district: district is set to 0 for states with a single at-large district.

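  • party and candidate: candidate-party combinations are recorded as they were on the state elections website. This means that in states where the same candidate can appear under multiple parties, as in NY, each combination is recorded as a separate row. Therefore, users interested in finding a candidate's primary party (the party line with the most votes) and the candidate's total votes across party lines can run the following code:
    library(dplyr)
    library(stringr)
    df <- read.csv("1976-2020-house.csv", stringsAsFactors = FALSE)
    # build a district identifier from zero-padded state FIPS and district codes
    df$district <- str_pad(df$district, width = 2, pad = "0", side = "left")
    df$state_fips <- str_pad(df$state_fips, width = 2, pad = "0", side = "left")
    df$GEOID <- paste(df$state_fips, df$district, sep = "")
    # primary party: keep the party line with the most votes for each candidate
    df_max <- df %>%
     group_by(candidate, GEOID, year) %>%
     slice(which.max(candidatevotes)) %>%
     ungroup()
    # total votes for each candidate across all party lines
    df_sum <- df %>%
     group_by(candidate, GEOID, year) %>%
     summarise(candvotes_sum = sum(candidatevotes), .groups = "drop")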