TLDR

AI: Datasets, Markdown Files, and Repos… Oh, My! Working on a Goblin Scout.


AI:

I was getting sick of working in Google Colab today. It’s still the best bang for the buck and a super easy zero-to-working setup, but I longed for something other than Python. So, I decided to spin up a Rust program to help me quickly manipulate data. I’m calling him Goblin Scout (Hunter), and two more are coming.

Scout is my data collector. I’ve been working fine without anything, but I see a day when I’d like something more robust than manual editing. Scout is starting with some GitHub grabs and a dash of web scraping.

I made him in less than an hour. Then I thought it’d be fun to refactor him to ECS (entity component system). Three hours later… WTF, no. That’s a very dumb idea! Scout and I decided ECS was more bloated than we wanted, and turning “everything and anything” into a learning experience is also a dumb idea.

He’s less of a Scout than when I first rolled him, but he functions fine as a one-trick pony. Right now he just downloads git repos and converts them into a single markdown file, but I’ll finish the processing and bring back web scraping tomorrow.
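For the curious, here is a rough sketch of the repo-to-markdown idea. It is not Scout's actual code, just a minimal std-only version under a few assumptions: the repo is already cloned to a local folder (say, with `git clone`), the `.git` folder and anything that isn't valid UTF-8 get skipped, and every name in it (`collect_files`, `render_section`, `repo_to_markdown`) is made up for illustration.

```rust
use std::ffi::OsStr;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect file paths under `dir`, skipping any `.git` directory.
fn collect_files(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            if path.file_name() == Some(OsStr::new(".git")) {
                continue;
            }
            collect_files(&path, out)?;
        } else {
            out.push(path);
        }
    }
    Ok(())
}

/// Render one file as a markdown section: a heading with the relative path,
/// then the contents in a fenced block tagged with the file extension.
fn render_section(rel_path: &str, contents: &str) -> String {
    let fence = "`".repeat(3); // built at runtime to keep this sketch readable
    let lang = Path::new(rel_path)
        .extension()
        .and_then(|e| e.to_str())
        .unwrap_or("");
    format!(
        "## {}\n\n{}{}\n{}\n{}\n\n",
        rel_path,
        fence,
        lang,
        contents.trim_end(),
        fence
    )
}

/// Walk an already-cloned repo and join every readable text file into one string.
fn repo_to_markdown(root: &Path) -> io::Result<String> {
    let mut files = Vec::new();
    collect_files(root, &mut files)?;
    files.sort(); // stable, predictable section order

    let mut md = String::new();
    for path in files {
        // Files that are not valid UTF-8 (most binaries) are skipped.
        let contents = match fs::read_to_string(&path) {
            Ok(text) => text,
            Err(_) => continue,
        };
        let rel = path.strip_prefix(root).unwrap_or(&path);
        md.push_str(&render_section(&rel.to_string_lossy(), &contents));
    }
    Ok(md)
}

fn main() -> io::Result<()> {
    // Usage (hypothetical): scout <path-to-cloned-repo> > repo.md
    match std::env::args().nth(1) {
        Some(root) => print!("{}", repo_to_markdown(Path::new(&root))?),
        None => eprintln!("usage: scout <path-to-cloned-repo>"),
    }
    Ok(())
}
```

Printing to stdout keeps the sketch out of the business of choosing an output file; redirect it wherever the dataset lives.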

The goal is to manipulate unstructured data better and prepare it for becoming an AI dataset. Think I’ll watch this and head to bed…

 

 

gn,

nicabar