models & data

Language models and datasets I have built and released.

Language Models

Facebook bart-large-cnn — sequence-to-sequence model trained to summarise policy positions from party press releases

Available at: z-dickson/bart-large-cnn-climate-change-summarization

Details Climate change party positions

vinai/bertweet-large — model trained to predict opposition to COVID-19 policies in US congressmembers’ tweets

Available at: z-dickson/US_politicians_covid_skepticism

Details Infected timeseries

bert-base-multilingual-cased — trained to predict the CAP issue codes of political text (bills, speeches, tweets, etc.)

Available at: z-dickson/CAP_multilingual

Details Language model trained to predict the CAP Issue Code of political text. The model was trained on the universe of coded data from the Comparative Agendas Project (huge thanks!) and can accurately predict the CAP code of political text in multiple languages and domains. Confusion matrix

bert-base-multilingual-cased — sentiment model trained on Polish, English, Spanish, Dutch and German newspaper headlines

Available at: z-dickson/multilingual_sentiment_newspaper_headlines

Details Language model trained to predict the sentiment of newspaper headlines in English, Polish, Spanish, Dutch and German. Attention to borders newspapers

Datasets

  • UK Parliamentary Statutory Instruments: 1970–2021 · GitHub
  • Parliamentary Bills — Dáil Éireann (Ireland): 1950–2020 · GitHub
  • Parliamentary Bills — New Zealand Parliament: 1900–2020 · GitHub