FINAL PRESENTATION VIDEO:

TECHNICAL REFLECTION AND CONCLUSIONS (VIDEO PRESENTATION TL;DR)

When thinking about the broader project and all that has been uncovered in this data, the most successful methods of analysis are clear. The unsupervised K-means clustering first showed that presumptions about the data were unfounded when it unexpectedly fit three clusters of conversation rather than two. This indicates that there is a richer complexity in how artists and AI enthusiasts talk online than the “pro” and “anti” binary reported in interviews. Those three clusters (one tightly focused on copyright, ethics, and data use; another centered on digital-art tools like tablets and pens; and a third, broad “general art” cluster) laid the groundwork for deeper topic exploration.
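
To make that step concrete, here is a minimal sketch of the clustering comparison using scikit-learn. The toy posts, labels, and parameter choices below are placeholders standing in for the real Reddit and news data, not the project’s actual pipeline:

```python
# Minimal sketch of the clustering step: vectorize the posts, then compare
# a two-cluster fit against a three-cluster fit with silhouette scores.
# The toy corpus and labels are illustrative placeholders only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

posts = [
    "training data scraped without artist consent raises copyright concerns",
    "sharing my favorite prompt tricks for cleaner ai generated portraits",
    "does the latest image generation model handle hands any better",
    "which drawing tablet and pen pressure settings do you recommend",
    "finished an oil painting study of a coastal landscape this weekend",
    "looking for book illustration tips for a children's picture book",
]
labels = ["ai_art", "ai_art", "ai_art", "art", "art", "art"]  # source of each post

vectorizer = CountVectorizer(stop_words="english", max_features=5000)
X = vectorizer.fit_transform(posts)

# Compare the presumed two-camp split against a three-cluster fit.
for k in (2, 3):
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    cluster_ids = km.fit_predict(X)
    print(f"k={k}  silhouette={silhouette_score(X, cluster_ids):.3f}")
```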

The next big finding came from Latent Dirichlet Allocation (LDA). With a flexibility that K-means clustering could not offer, LDA uncovered ten distinct themes, and that picture was further bolstered by mapping each keyword back to its original label, revealing which topics tended to come from “AI art” spaces versus “art” spaces. For example, technical debates around consent and training data skewed heavily toward opponents of AI art, while conversations about painting techniques and book illustration were far more balanced.
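
A hedged sketch of that topic-modeling step is below, reusing the X, vectorizer, and labels placeholders from the clustering sketch. Ten components mirrors the ten themes found in the real data; everything else is illustrative:

```python
# Sketch of the LDA step: fit ten topics on the count matrix, list the top
# words per topic, and map each document's dominant topic back to its
# original label. Reuses X, vectorizer, and labels from the sketch above.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

lda = LatentDirichletAllocation(n_components=10, random_state=42)
doc_topic = lda.fit_transform(X)          # per-document topic mixture

# Top words that define each theme.
terms = vectorizer.get_feature_names_out()
for t, weights in enumerate(lda.components_):
    top_words = terms[np.argsort(weights)[::-1][:8]]
    print(f"topic {t}: {', '.join(top_words)}")

# Which label ("ai_art" vs "art") dominates each topic?
dominant = doc_topic.argmax(axis=1)
for t in np.unique(dominant):
    sources, counts = np.unique(np.array(labels)[dominant == t], return_counts=True)
    print(f"topic {t}: {dict(zip(sources, counts))}")
```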

Naïve Bayes then provided a complementary, probabilistic perspective: with around 86% accuracy and a particularly high recall on AI art (93%), it confirmed that opponents of AI art use a consistent, focused vocabulary, whereas human-made-art discussions span a wider and more nuanced lexicon. The same pattern appears in the confusion matrix and accuracies of the neural network.
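
The classification step can be sketched the same way, again reusing the placeholder X and labels from above. The toy corpus will not reproduce the reported 86% accuracy or 93% recall; those numbers come from the full dataset:

```python
# Sketch of the classification step: MultinomialNB on the same count
# features, with a held-out split, a confusion matrix, and per-class recall.
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report, confusion_matrix

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=1 / 3, random_state=42, stratify=labels
)

nb = MultinomialNB()
nb.fit(X_train, y_train)
predictions = nb.predict(X_test)

# Per-class precision/recall is where the AI-art recall figure comes from.
print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions, zero_division=0))
```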

FINAL CONCLUSIONS: WHAT DO WE KNOW?

By examining hundreds of news articles and Reddit forum posts and comments, my data has made clear that debates are not simply “for” or “against” AI art but fall into multiple camps: focused discussions about copyright and ethics, lively chatter about tools and techniques, and broad, general art conversations that span every medium.

The balance of viewpoints uncovered here shows both deep concern for artists’ rights and a genuine excitement about new creative possibilities. Artists’ fears about data usage and intellectual property sit alongside communities of AI users who collaborate on prompt practices, follow the technology and its advancements, and imagine the larger projects made possible by image generation models. The conflicting tensions of focus between these community spaces, as represented in my data, seem to fuel a rich dialogue that’s far more nuanced than any single headline or tweet can convey.

In my familiarity with the data, the discussions, and their sources, I believe this analysis has been subtly pointing toward another finding: platform matters. News articles and subreddit posts each bring their own tone and focus. Forums often spark hands-on discussions of prompt crafting and software hacks, or heated personal and incidental debates, while mainstream outlets frame the debate through legal and ethical lenses, often around specific incidents and events that spilled into the real world. Any effort to interpret this data as a baseline for moderating, educating, or engaging these online communities (a tertiary intention of my research) must reflect those differing priorities of the data sources rather than assume a one-size-fits-all approach.

By applying three core steps, first clustering posts to uncover the main discussion groups, then using topic modeling to pull out the key themes within those groups, and finally running simple classifiers to see how reliably those themes predict a post’s origin, this project establishes a clear, repeatable workflow for mapping any large text collection in this specific AI-versus-art context. All of the tools used (turning words into numbers with CountVectorizer, grouping with K-means, distilling topics via LDA, and testing predictions with Naïve Bayes and a basic neural network) are readily available and can be swapped into other contexts, from product reviews to social-media debates, though in those cases it would make sense to explore more than just the three methods that proved successful here.
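
As a rough illustration of how the supervised leg of that workflow could be packaged for reuse on another labelled corpus, here is one possible arrangement using a scikit-learn Pipeline; the posts and labels placeholders are the same ones used in the earlier sketches, and nothing below is specific to the AI-art data:

```python
# One way to bundle the vectorizer and classifier so the same workflow can
# be pointed at any labelled text collection (product reviews, other debates).
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

text_clf = Pipeline([
    ("counts", CountVectorizer(stop_words="english", max_features=5000)),
    ("nb", MultinomialNB()),
])

# Fit on any labelled corpus, then predict the likely origin of new text.
text_clf.fit(posts, labels)
print(text_clf.predict(["does training on scraped images violate copyright"]))
```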

Together, these methods showed that the AI-art conversation is far more than a straight “for vs. against” fight. Instead, multiple sub-communities emerge, each with its own vocabulary and concerns, from data-use ethics to creative techniques and beyond.

Looking ahead, the same approach can be extended by tagging posts for specific stances within each group, layering in sentiment analysis to capture positive or negative tone, and profiling how different communities engage with AI art. Those additions will help reveal not just what people are talking about, but how they feel—and point toward more focused moderation, policy recommendations, or collaborative projects.
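
The project did not implement sentiment analysis, but one possible shape for that extension is a per-post scorer such as NLTK’s VADER, whose compound scores could then be averaged per cluster or topic; the snippet below is only a sketch of that idea, using the placeholder posts from the earlier sketches:

```python
# Possible sentiment-analysis extension (not part of the project): score each
# post with VADER so clusters or topics could be profiled by average tone.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
sia = SentimentIntensityAnalyzer()

for post in posts:  # placeholder corpus from the earlier sketches
    score = sia.polarity_scores(post)["compound"]  # -1 (negative) to +1 (positive)
    print(f"{score:+.2f}  {post}")
```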
