Berkman Center for Internet and Society
Harvard Law School Norms in Cyberspace
We analyzed messages, the atomic unit of this study, in subsets defined by four criteria: hierarchy, moderated or unmoderated status, the domain of the message poster, and group. The analysis covers both message headers and message bodies. Linguistic analysis is implemented with Perl regular-expression matching. More penetrating linguistic analysis is certainly possible, but the project's time constraints ruled out the use of Lisp and neural networks. The linguistic portion of the project has been guided by the 80-20 rule: obtain 80% of the meaningful newsgroup statistics with 20% of the relevant education.
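As a rough illustration of this kind of header-and-body matching, here is a minimal sketch. It is written in Python rather than the project's Perl, and it is not the study's code. The feature patterns, the function names `poster_domain` and `message_properties`, and the sample message are all hypothetical. Treating an `Approved:` header as the sign of a moderated post is also an assumption about how that criterion might be detected.

```python
import re
from email import message_from_string

# Hypothetical feature patterns; the study's actual expressions are not given here.
PATTERNS = {
    "has_question": re.compile(r"\?"),
    "has_url": re.compile(r"https?://\S+", re.IGNORECASE),
    "has_quoted_text": re.compile(r"^>", re.MULTILINE),
}

def poster_domain(from_header):
    """Return the top-level domain of the poster's address, or '' if none is found."""
    match = re.search(r"@[\w.-]+\.([A-Za-z]{2,})", from_header)
    return match.group(1).lower() if match else ""

def message_properties(raw):
    """Turn one raw message into a row of properties (one observation)."""
    msg = message_from_string(raw)
    body = msg.get_payload()
    # Use the first listed newsgroup; its first component is the hierarchy.
    group = msg.get("Newsgroups", "").split(",")[0].strip()
    row = {
        "group": group,
        "hierarchy": group.split(".")[0],
        "domain": poster_domain(msg.get("From", "")),
        # Assumption: a post to a moderated group carries an Approved header.
        "moderated": int("Approved" in msg),
    }
    # One 0/1 column per body pattern.
    for name, pattern in PATTERNS.items():
        row[name] = int(bool(pattern.search(body)))
    return row

SAMPLE = """From: Alice <alice@cs.example.edu>
Newsgroups: comp.lang.perl.misc,comp.lang.misc
Subject: Regex question

> You wrote earlier
How do I match a URL? See http://example.com/docs
"""

row = message_properties(SAMPLE)
print(row)
```

Each call yields one row, so a list of such rows forms the matrix of message properties, and the `hierarchy`, `moderated`, `domain`, and `group` columns allow the subsets to be selected.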
The resulting matrix, with one row of properties per message observation,
is then fed to Stata, a statistical analysis program. Stata tabulates the
observations and computes the mean and standard deviation of each variable.
The output log of each Stata run is passed through a Perl script that
converts it to a form Excel can read, and we then use Excel to generate
charts.
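The tabulate-then-export step can be sketched as follows. This is a Python stand-in, not the project's Stata commands or its Perl conversion script, and it does not parse a real Stata log. The function names `summarize` and `to_csv`, the column names, and the sample rows are hypothetical, and the use of the sample (n-1) standard deviation is an assumption.

```python
import csv
import io
import statistics

def summarize(rows, variables):
    """Return one (variable, n, mean, sd) tuple per variable."""
    summary = []
    for var in variables:
        values = [row[var] for row in rows]
        mean = statistics.mean(values)
        # Sample standard deviation; undefined for a single observation, so use 0.0.
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        summary.append((var, len(values), mean, sd))
    return summary

def to_csv(summary):
    """Render the summary as CSV text that a spreadsheet program can open."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["variable", "n", "mean", "sd"])
    for var, n, mean, sd in summary:
        writer.writerow([var, n, f"{mean:.3f}", f"{sd:.3f}"])
    return buf.getvalue()

# Hypothetical observations: one dict of 0/1 properties per message.
ROWS = [
    {"has_question": 1, "has_url": 0},
    {"has_question": 0, "has_url": 0},
    {"has_question": 1, "has_url": 1},
    {"has_question": 1, "has_url": 0},
]

summary = summarize(ROWS, ["has_question", "has_url"])
csv_text = to_csv(summary)
print(csv_text)
```

For a 0/1 variable the mean is simply the share of messages with that property, which is the kind of figure a chart per hierarchy or per group would plot.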