Battling Trolls with AI


In my last post I mentioned that I was on some medication to fix my hormone levels by burning a tiny non-cancerous tumour off my pituitary gland. Well, it seems to be working – my energy levels are up and I’m dressing up a ton more! More than I ever have, I think. It’s been nice.

I’ve been posting a ton more too on Instagram and Facebook – not so much here on this blog though, unfortunately. While I’ve been able to find time and energy to dress up more, finding enough time and energy to write a blog post about something worthwhile is a different level of exertion! My social media following is relatively small – on the whole I don’t get a ton of followers or comments on anything I post, nor do I really expect to. I’m neither the prettiest girl at the dance, nor the most active! But for some reason my Facebook posts have been getting absurdly popular… but only with trolls.

All of a sudden, my posts on Facebook were getting THOUSANDS of views. Some nice comments and heart emojis from the last few remnants of friendly people on Facebook, but as a counterweight to that I’m also getting hundreds of nasty comments and laugh-emojis that are pointedly meant to hurt me.

It turns out, however, that I am stubborn. When pushed, I am eager to push back.

I spent hours being sassy in the comments. If someone said I was ugly, I would respond with a screenshot of their sad-looking face. If someone commented with a fundamental misunderstanding of basic grammar and punctuation rules, I would refer them back to their elementary school teachers. It’s fun for a short while, but ultimately exhausting in the long term. These folks are, by and large, sad pathetic creatures with nothing better to do with their lives than to try and make themselves feel better by putting other people down, and my wading into the swamp of their making is not worth a microjoule of my own energy. They’re not worth my time or attention, and so the best thing to do is to hide their comments and move on. Annoyingly, that means I have to read them all. A friend of mine worried about the impact on my mental state of having to subject myself to all of these distasteful drive-by insults. It’s hard not to let it get to you – when a hundred people tell you that you’re ugly, you start to wonder “Maybe?”. I tried to put that “maybe” aside and picked up another one – “maybe” there’s a better way to deal with this situation.

Facebook’s moderation tools are nowhere near as good as one might expect. They are un-nuanced and ill-designed to deal with the specific forms of hate I am dealing with. I poked at them initially, but they are such a blunt instrument (hide comments with images, hide comments with profanity, hide comments with particular words) that they missed many comments and required so much manual verification that I was still having to see all of the vile comments to make sure my page was clean. This was not a viable option for my continued mental-health goals.

Enter, the robots.

Not too long ago I had started playing around with Claude Code and OpenAI’s APIs – I had recently built my own AI-powered radio station with these tools, and it seemed probable that with my technical skills I could solve this problem too. Working in the tech industry for the last 18 years has got to be good for something, right? Could I craft a prompt that classified comments as offensive/non-offensive based on some criteria? YOU BET YOUR STINKY LITTLE ASS I CAN! And so Claude and I set about having long conversations about potential strategies and implementation.

I named this system Izzy, after Izzy Rowlands. This video should explain everything about that particular decision.

But, in case it’s not clear… There’s buttholes everywhere. And IzzyBot is gonna deal with ’em.

The first round of changes actually worked pretty well – but we struggled with classification and where to draw the line. We updated prompts, ran backtesting on comments to understand the changes, tweaked settings, and eventually ended up with a system that allows us to separate the comment classification (e.g. misgendering, political, supportive, etc) from the comment policy (e.g. show supportive comments, hide political comments, hide misgendering comments).
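To make that split concrete, here’s a minimal sketch in Python. It isn’t IzzyBot’s actual code: the names `Category`, `Action`, `POLICY` and `decide` are made up for illustration, and hiding anything with an unrecognised label is my assumption about a sensible default. The idea is that the model only ever answers “what kind of comment is this?”, and a separate table answers “what do we do with that kind?”.

```python
from enum import Enum


class Category(Enum):
    # Labels taken from the examples in this post; a real list would be longer.
    SUPPORTIVE = "supportive"
    POLITICAL = "political"
    MISGENDERING = "misgendering"


class Action(Enum):
    SHOW = "show"
    HIDE = "hide"


# The policy is plain data, kept apart from the classifier and its prompt.
# Changing what gets hidden means editing this table, nothing else.
POLICY = {
    Category.SUPPORTIVE: Action.SHOW,
    Category.POLITICAL: Action.HIDE,
    Category.MISGENDERING: Action.HIDE,
}


def decide(raw_label: str, policy=POLICY, default=Action.HIDE) -> Action:
    """Turn the label text returned by the classifier into an action.

    Unknown labels fall back to `default` (an assumed choice, not
    necessarily what IzzyBot does).
    """
    try:
        category = Category(raw_label.strip().lower())
    except ValueError:
        return default
    return policy.get(category, default)
```

With this shape, `decide("supportive")` gives `Action.SHOW`, and deciding to show political comments after all would be a one-line change to `POLICY` with no re-prompting or re-classifying needed.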

Here’s a quick taste of the kind of abuse I’m getting. I’m sure you can understand why it was necessary for my mental health to not have to wade through these. I stopped counting the number of comments I got telling me to kill myself when it hit 3. I’m sure there’s been more that I’m just not seeing, and I am thankful for that. Humans can be pretty vile.

On top of adding functionality to hide the mean stuff, I also wanted to make sure I was able to highlight the good stuff too! I added some pages to see the nice stuff people are saying – it’s somewhat of an ego boost – but more than that, it’s a little affirming and helps to counteract the mental impact of all the hate.

It’s sad that the abuse rate is so high, but Facebook is a swamp, and the political climate we live in isn’t helping. I didn’t think at any point in my life that I’d be the target of a culture war. I’m just trying to live my life and find happiness in my small corner of the world, you know?

I made a deliberate choice a long time ago to not hide any of who I am – but not only that – to hopefully serve as some kind of example to show folks that you can be your whole self and still be happy. I’m optimistic that some people out in the world will see me and think “maybe I can do that too”. I’m hopeful. I still remember the first transwoman I saw, and still remember the impact they had on me. I’m standing on the shoulders of giants, and the least I can do is pay it forward somehow.

Here’s a little breakdown of the type of abuse I’m getting. “Unreadable Media” is just any comment with an image. A lot of what I’m getting are offensive images – I’m hiding them all! There’s probably some supportive ones in there too, but that’s the sacrifice I’m having to make for now.
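The image rule is simple enough to sketch in a few lines of Python. This is an illustration under assumptions rather than the real thing: the `Comment` class, the `has_image` flag and the `classify_text` callback are all hypothetical names. Comments with an image skip the text classifier entirely and get one shared label, and a `Counter` produces the breakdown.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Comment:
    text: str
    has_image: bool = False  # hypothetical flag; set however the comment data allows


def label_comment(comment: Comment, classify_text) -> str:
    # Image comments are never sent to the text classifier. They all get
    # the same label, so supportive images are caught by it too.
    if comment.has_image:
        return "unreadable_media"
    return classify_text(comment.text)


def breakdown(comments, classify_text) -> Counter:
    """Count how many comments fall under each label."""
    return Counter(label_comment(c, classify_text) for c in comments)
```

Here `classify_text` stands in for whatever returns a label for a piece of text, so the same function works with a real classifier or a stub in tests.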

Another thing that felt important to show was that my page is being actively managed, and so Izzy adds a comment to every post that has offensive commentary on it, and keeps a running count of how many buttholes are out there.
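The bookkeeping behind a running count can be very small. Here’s a rough Python sketch, with the caveat that the wording of the notice and the function names are invented for illustration, and that actually posting or editing the comment on Facebook is left out entirely.

```python
def notice_text(hidden_count: int) -> str:
    """Build the text of the bot's notice (wording is made up for this sketch)."""
    noun = "comment" if hidden_count == 1 else "comments"
    return f"IzzyBot has hidden {hidden_count} offensive {noun} on this post."


def record_hidden(counts: dict, post_id: str) -> str:
    """Bump the per-post tally and return the refreshed notice text.

    `counts` maps post IDs to how many comments have been hidden so far.
    The caller would then create or update the bot's comment on that post.
    """
    counts[post_id] = counts.get(post_id, 0) + 1
    return notice_text(counts[post_id])
```

Keeping the tally per post means the notice only ever appears on posts where something was actually hidden.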

There’s still bugs that I’d like to work through – and some of that requires some jumping through hoops with Facebook’s labyrinthine app approval process, but I don’t need that for right now. At some point, I’d like to extend this out of just my own Facebook page to other people’s to help them battle their own set of trolls. It’d definitely be some work, but if it’s something you think you or anyone else would be interested to run on your Facebook pages (or even other social media accounts), let me know and we can figure it out :)

Liz xx
