Quite a few people emailed me about my post on Econ Job Market. This post shows how you can use a few basic R tools to sort through the Job Openings for Economists (JOE) list from the American Economic Association. This definitely helped me build my spreadsheet of places to apply to.
Before I begin, I would like to briefly revisit EJM.
Revisiting Using R for Econ Job Market
It turns out that my post about scraping the EJM website to obtain a listing of the job posts was (partly) redundant. The new EJM system on “myeconjobmarket.org” lets you download a spreadsheet of the postings. Unfortunately, that spreadsheet does not contain the “position ID”, which is important because it lets you construct a link to each application.
An example:
https://econjobmarket.org/AdDetails.php?posid=2723
The application link then becomes:
https://econjobmarket.org/Apply/PosApp.php?posid=2723
For this to work, you need an active login session; otherwise, you will be redirected to the main homepage. I updated the spreadsheet, and it is available here for download. I have emailed EJM asking them to add the position ID to their spreadsheet; once they do, you can merge the two spreadsheets.
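As a small illustration, here is a minimal R sketch that builds both EJM links from a vector of position IDs. It assumes the two URL patterns shown above stay stable, and `ejm_urls` is a hypothetical helper, not part of any EJM tool.

```r
# Hypothetical helper: build the EJM ad and application links for a vector of
# position IDs, following the two URL patterns shown above (assumed stable).
ejm_urls <- function(posid) {
  data.frame(
    posid = posid,
    ad_url = paste0("https://econjobmarket.org/AdDetails.php?posid=", posid),
    apply_url = paste0("https://econjobmarket.org/Apply/PosApp.php?posid=", posid),
    stringsAsFactors = FALSE
  )
}

ejm_urls(2723)
```

The application links only resolve while you are logged in, so they are mainly useful as a clickable column in your own spreadsheet.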
Leveraging R for JOE?
Now I turn to JOE. As with EJM, you can download the job openings, and again the download does not include a link to each posting. However, you can easily construct one, because the posting ID (JOE_ID) is simply the fields “joe_issue_ID” and “jp_id” concatenated with an underscore:
https://www.aeaweb.org/joe/listing.php?JOE_ID=2014-02_111451008
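A minimal sketch of that construction in R, assuming the two columns are read from the JOE download under those names; `joe_listing_url` is a hypothetical helper name.

```r
# Hypothetical helper: JOE_ID is joe_issue_ID and jp_id joined by an
# underscore, appended to the JOE listing URL. Works on whole columns too.
joe_listing_url <- function(joe_issue_ID, jp_id) {
  paste0("https://www.aeaweb.org/joe/listing.php?JOE_ID=", joe_issue_ID, "_", jp_id)
}

joe_listing_url("2014-02", 111451008L)
```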
You can use the filters on JOE itself to limit the types of postings, but you can also do the filtering in R, which lets you add some features of your own.
Filtering jobs / adding a common country code
The first thing I want to show you is how to filter the job listings and add a common country name or country code.
library(countrycode)
library(data.table)
library(stringr)
options(stringsAsFactors=FALSE)
JOBS<-data.table(read.csv(file="~/Downloads/joe_resultset.csv"))
###this will keep all full time academic jobs, be it international or just within US
JOBS<-JOBS[grep("Assistant|Professor|Lecturer",jp_title)][grep("Full-Time Academic", jp_section)]
##split out the country: keep the leading upper-case word (plus an optional second upper-case word) of the locations field
JOBS$country<-gsub("([A-Z]*)( [A-Z]{2,})?(.*)","\\1\\2", JOBS$locations)
###get harmonized country codes...
JOBS$iso3<-countrycode(JOBS$country, "country.name", "iso3c", warn = FALSE)
###transfer application deadline into a date format
JOBS$Application_deadline<-as.POSIXct(JOBS$Application_deadline)
###drop postings whose deadline has already passed (cut-off: 20 October 2014)
JOBS<-JOBS[order(Application_deadline)][Application_deadline>as.POSIXct("2014-10-20")]
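To see what the country-splitting pattern does, here is a self-contained check on made-up location strings. The "COUNTRY-State-City" layout is an assumption for illustration, not copied from the JOE file, and `extract_country` is a hypothetical wrapper around the same `gsub` call used above.

```r
# Same pattern as in the code above: keep the leading run of upper-case
# letters plus an optional second upper-case word, and drop the rest.
extract_country <- function(locations) {
  gsub("([A-Z]*)( [A-Z]{2,})?(.*)", "\\1\\2", locations)
}

# Made-up location strings (assumed layout)
locs <- c("UNITED STATES-Massachusetts-Cambridge", "UNITED KINGDOM-London", "GERMANY-Berlin")
extract_country(locs)
```

Names that `countrycode` cannot match come back as NA (silently, with `warn = FALSE`), so it is worth eyeballing the rows where `iso3` is missing.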
When running this code, you will notice something odd: in quite a few cases, the application deadline is wrong.
Consider, for example, the job posting for an Assistant Professor position at the Harvard Kennedy School (see https://www.aeaweb.org/joe/listing.php?JOE_ID=2014-02_111451068).
In the spreadsheet, you will see a deadline of 31.01.2015, which definitely can't be right: first-round interviews happen at the ASSA meetings in early January, so a deadline at the end of January makes little sense. So how can we fix these up? If you look at the plain text of the posting, you will see that applications will begin to be considered on 18-Nov-14, which is much more reasonable.
If you sort postings by the application deadline field that JOE provides, quirks like this can make you miss deadlines.
One way around this is to run some regular expressions on the full-text field to flag common date formats. This way, you do not need to read every posting looking for a date; you can focus on the postings whose listed deadlines look odd (for example, later than December).
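The "look at odd deadlines" step can be sketched as a simple flag. This is a minimal base-R example on made-up rows; the cut-off of 1 January 2015 is an assumption standing in for "later than December", and `flag_late_deadline` is a hypothetical helper.

```r
# Hypothetical example rows; in practice these would come from the JOBS table.
jobs <- data.frame(
  jp_id = c(101L, 102L, 103L),
  Application_deadline = as.POSIXct(c("2014-11-15", "2015-01-31", "2014-12-01"), tz = "UTC")
)

# TRUE for deadlines on or after the cut-off (assumed: 1 January 2015);
# missing deadlines are not flagged.
flag_late_deadline <- function(deadline, cutoff = as.POSIXct("2015-01-01", tz = "UTC")) {
  !is.na(deadline) & deadline >= cutoff
}

jobs$check_by_hand <- flag_late_deadline(jobs$Application_deadline)
jobs[jobs$check_by_hand, ]
```

The flagged rows are the ones worth opening by hand, since their listed deadline may be a quirk like the one above.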
A regular expression could take the form:
(November|December) ([0-9]{1,2})(,)? (2014)?
which would match date formats like “November 20, 2014” or “November 20”. The following code matches a few common date formats and stores the match in a new column, otherdate; the resulting spreadsheet, filtered to academic jobs only, is attached. This formed the starting point for my job market application spreadsheet.
JOBS$jp_full_text<-as.character(JOBS$jp_full_text)
###other dates in November/December (or numeric dates) mentioned in the full text
JOBS$otherdate<-""
patterns<-c("(November|December) ([0-9]{1,2})(,)? (2014)?",  # e.g. November 20, 2014
            "([0-9]{1,2}) (November|December)(,)? (2014)?",  # e.g. 20 November 2014
            "([0-9]{1,2})[./]([0-9]{1,2})[./](2014)?")       # e.g. 20.11.2014 or 11/20/2014
###a later pattern overwrites an earlier match for the same posting
for (p in patterns) {
  whichare<-regexpr(p, JOBS$jp_full_text, perl=TRUE, useBytes=TRUE)
  hits<-which(whichare!=-1)
  if (length(hits)>0) JOBS[hits, otherdate:=regmatches(JOBS$jp_full_text, whichare)]
}
###add the JOB LISTING URL
JOBS[, url:=paste0("https://www.aeaweb.org/joe/listing.php?JOE_ID=", joe_issue_ID, "_", jp_id)]
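To check the date patterns before running them on the whole file, here is a self-contained sketch on made-up posting snippets. `find_other_date` is a hypothetical helper that applies the patterns in order, so a later match overwrites an earlier one, as in the code above. The numeric-date pattern is an assumed reconstruction for dates such as 15.11.2014.

```r
# Hypothetical helper: for each text, keep the snippet matched by the
# patterns, letting later patterns overwrite earlier matches; texts with
# no match get "".
find_other_date <- function(texts, patterns) {
  out <- rep("", length(texts))
  for (p in patterns) {
    m <- regexpr(p, texts, perl = TRUE)
    out[m != -1] <- regmatches(texts, m)
  }
  out
}

patterns <- c(
  "(November|December) ([0-9]{1,2})(,)? (2014)?",
  "([0-9]{1,2}) (November|December)(,)? (2014)?",
  "([0-9]{1,2})[./]([0-9]{1,2})[./](2014)?"  # assumed numeric-date pattern
)

# Made-up posting snippets for illustration
texts <- c(
  "Applications are due November 20, 2014.",
  "Review of applications begins 20 November 2014.",
  "Deadline: 15.11.2014",
  "Applications due January 31, 2015."
)

find_other_date(texts, patterns)
```

The last snippet returns an empty string, which is exactly the kind of posting whose deadline you would then check by hand.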
The resulting spreadsheet is attached: joe-results-refined.