Data & ML

Much of my research sits at the intersection of data science and political science, with a particular focus on applying machine learning to text data. Below are some of the models and datasets I have developed. Most of the models are available on the Hugging Face Model Hub.

Language models


facebook/bart-large-cnn sequence-to-sequence model fine-tuned to summarise policy positions from party press releases

Model available at: z-dickson/bart-large-cnn-climate-change-summarization
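
A minimal sketch of how the summarisation model might be loaded and used with the `transformers` library. It assumes that the Hub repository ships tokenizer files and weights that load through `AutoTokenizer` and `AutoModelForSeq2SeqLM`, that PyTorch is installed, and that press releases are passed in as plain text with no special prefix. The helper names `load_summariser` and `summarise` and the generation settings are illustrative, not the settings used in training.

```python
from typing import Any, Tuple

MODEL_ID = "z-dickson/bart-large-cnn-climate-change-summarization"


def load_summariser(model_id: str = MODEL_ID) -> Tuple[Any, Any]:
    """Download (or load from cache) the tokenizer and seq2seq model."""
    # Imported lazily so the module can be imported without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def summarise(text: str, tokenizer: Any, model: Any, max_new_tokens: int = 60) -> str:
    """Return a short summary of one press release."""
    # Truncate long releases to BART's 1024-token input limit.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    # Beam search with 4 beams is an assumed, commonly used setting.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()
```

Calling `tokenizer, model = load_summariser()` and then `summarise(press_release, tokenizer, model)` would download the weights on first use and return a single summary string.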



vinai/bertweet-large model fine-tuned to predict opposition to COVID-19 policies in tweets by members of the US Congress

Model available at: z-dickson/US_politicians_covid_skepticism



bert-base-multilingual-cased model fine-tuned to predict the CAP issue codes of political text (e.g. bills, speeches and tweets)

Language model trained to predict the CAP issue code of political text. The model was trained on the full set of coded data from the Comparative Agendas Project (huge thanks!) and can accurately predict the CAP code of political text across multiple languages and domains. The model is available on the Hugging Face Model Hub: z-dickson/CAP_multilingual
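
A rough sketch of how the CAP model could be applied to a batch of texts through the `transformers` text-classification pipeline. It assumes the Hub repository loads with `pipeline("text-classification", ...)`; the label strings it returns (CAP topic names or generic `LABEL_n` identifiers) depend on the model's configuration and are not confirmed here. The helper names and the `min_score` threshold are hypothetical additions for illustration.

```python
from typing import Any, Callable, Dict, List

MODEL_ID = "z-dickson/CAP_multilingual"


def load_cap_classifier(model_id: str = MODEL_ID) -> Callable:
    """Build a text-classification pipeline for the CAP model."""
    # Imported lazily so the module can be imported without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def code_texts(classifier: Callable, texts: List[str], min_score: float = 0.5) -> List[Dict[str, Any]]:
    """Attach the top predicted label to each text.

    Predictions scoring below `min_score` (a hypothetical cut-off) get a
    label of None so that they can be flagged for manual coding.
    """
    # truncation=True is forwarded to the tokenizer to handle long documents.
    predictions = classifier(texts, truncation=True)
    results = []
    for text, pred in zip(texts, predictions):
        label = pred["label"] if pred["score"] >= min_score else None
        results.append({"text": text, "label": label, "score": pred["score"]})
    return results
```

Because the model is multilingual, `code_texts(load_cap_classifier(), texts)` could in principle be given bills, speeches or tweets in different languages within the same list.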



bert-base-multilingual-cased sentiment model fine-tuned on Polish, English, Spanish, Dutch and German newspaper headlines

Language model trained to predict the sentiment of newspaper headlines in English, Polish, Spanish, Dutch and German. The model is available on the Hugging Face Model Hub: z-dickson/multilingual_sentiment_newspaper_headlines



Datasets

  • UK Parliamentary Statutory Instruments: 1970-2021 [github]
  • Parliamentary Bills - Dáil Éireann (Ireland): 1950-2020 [github]
  • Parliamentary Bills - New Zealand Parliament: 1900-2020 [github]