[Air-L] Free Datasets and Tools for Teaching and Collaborative Research Using Twitter Data

Shulman, Stu stu at texifter.com
Mon Oct 16 03:46:33 PDT 2023


These are the 50 largest of the more than 1,000 historical Twitter datasets
available via DiscoverText for teaching and research. All software and data
are free for academic researchers. For authenticity and compliance, Tweets
are displayed using the Twitter display. Deleted Tweets and suspended
accounts are not displayed. There is a significant amount of understudied
historical data. Signing up does not get you access to all 300,000,000
Tweets in 1,200 projects. It does start a process where we can talk about
your research or teaching needs and the best way to meet them using
the collaborative architecture of DiscoverText. Sharing web-based access to
a single copy of the data, rather than making copies for every user, is
another of the measures that make this ToS compliant.

We have user-friendly methods for sampling to make the very large sets more
manageable and focused, including search, filtering, clustering, duplicate
detection, crowdsourced annotation, and machine learning. For example, if
you need to separate Tweets about human migration from Tweets about
non-human migration, that kind of data cleaning using humans and machines
is in our wheelhouse. If you want to measure and report inter-rater
reliability, we started doing that in 2007 at Pitt and it is a core
feature. If you want to adjudicate annotator disagreements to create gold
standard training sets, while improving and measuring annotator awareness,
adjudication is the most powerful (and least used) piece of the research
platform.
DiscoverText is an NSF-funded scientific instrument.
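
For anyone who wants to see what an inter-rater reliability number involves,
here is a minimal Python sketch of Cohen's kappa for two annotators. It is an
illustration only and not a description of how DiscoverText computes it; the
function name and the example labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items.

    Hypothetical helper for illustration; not DiscoverText code.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items.")

    n = len(labels_a)

    # Observed agreement: share of items where the two labels match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two annotators'
    # marginal proportions, summed over labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label for every item.
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Made-up labels for six Tweets about migration.
annotator_1 = ["human", "human", "animal", "animal", "human", "animal"]
annotator_2 = ["human", "animal", "animal", "animal", "human", "human"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

In this made-up case the annotators agree on four of six Tweets, chance
agreement is one half, and kappa comes out to one third. The items where they
disagree are the ones that would go to adjudication.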

https://tinyurl.com/dtarchives

Topics in the 50 largest include: COVID, Trump, Brexit, Biden, Bots,
Suicide, Policing, Elections, Racism, and Gettr.

Date ranges in the 50 largest: 2017-2022

Archive size in the 50 largest: 1.2 million - 11.5 million

Topics in the most recent smaller 2023 collections (10,000 - 1,200,000)
include: BLM, "Pureblood" ideology, #voice (Australia), digital soldiers,
LGBTQ+ advocacy, vaping, the Lahaina fire, QAnon's return, anti-Semitism,
AmericaFirst, and a range of other events or trends where we suspect
Tweets, whether we like it or not, play an outsized role in the public
perception of events. If you are studying a current event, at least
for now, we can still create a custom dataset from the last 12 months of
Twitter. This may not be an option in the future, but it is one now.

Free sign up:
https://app.discovertext.com/Home/SignupContactTrialView

Free consultation:
https://calendly.com/discovertext

Scholarly mentions (DT literature review in a box):
https://discovertext.com/mentions/

Stu

-- 
Dr. Stuart W. Shulman
Founder and CEO, Texifter
Editor Emeritus, *Journal of Information Technology & Politics*

