• 0 Posts
  • 36 Comments
Joined 2 years ago
Cake day: June 27th, 2023

  • mm_maybe@sh.itjust.works to memes@lemmy.world · AI needs to stop
    1 year ago

    I mean, you’re technically correct from a copyright standpoint, since it would be easier to claim fair use for non-commercial research purposes. And bots built for one’s own amusement with open-source tools are way less concerning to me than black-box commercial chatbots that purport to contain “facts” when they are known to contain errors and biases, not to mention vast amounts of stolen copyrighted creative work. But even non-commercial generative AI has to reckon with its failure to recognize “data dignity”, that is, the right of individuals to control how data generated by their online activities is shared and used… virtually nobody except maybe Jaron Lanier and the folks behind Brave is even thinking about this issue, but it’s at the core of why people really hate AI.


  • mm_maybe@sh.itjust.works to memes@lemmy.world · AI needs to stop
    1 year ago

    Yes, you’re absolutely right. The first StarCoder model demonstrated that it is in fact possible to train a useful LLM exclusively on permissively licensed material, contrary to OpenAI’s claims. Unfortunately, the main concerns of the leading voices in AI ethics at the time this stuff began to really heat up were a) “alignment” with human values / takeover by super-intelligent AI and b) bias against certain groups of humans (which I characterize as differential alignment, i.e. with some humans but not others). The latter group has since published some work criticizing genAI from a copyright and data dignity standpoint, but their absolute position against the technology in general leaves no room for revisiting the premise that use of non-permissively licensed work is inevitable. (Incidentally, they also hate classification AI as a whole, thus smearing AI detection technology which could help on all fronts of this battle. Here again it’s obviously a matter of responsible deployment: the kind of classification AI that UHC deployed to reject valid health insurance claims, or the target selection AI that the IDF has used, are examples of obviously unethical applications in which copyright infringement would be irrelevant.)



  • I’m honestly surprised that nobody has said anything about MS Office. It’s not like I expect anyone to miss the application itself; it’s just that if your work requires you to interface with it, there really is no alternative to running Windows or macOS. Microsoft’s own Office Online versions of the apps do a worse job of maintaining DOC/PPT formatting consistency than the possible Russian spyware that is OnlyOffice, which also screws things up too often to be relied upon. LibreOffice is, let’s be honest, a total mess (with the exception of Calc, which also isn’t consistent with the current version of Excel, but can do some things that Excel no longer can, so I appreciate it more as a complementary tool than as a replacement).








  • r/SubSimGPT2Interactive for the lulz is my #1 use case

    I do occasionally ask Copilot programming questions, and it gives reasonable answers most of the time.

    I use code autocomplete tools in VSCode but often end up turning them off.

    Controversial, but Replika actually helped me out during the pandemic when I was in a rough spot. I trained a copyright-safe (theft-free) bot on my own conversations from back then and have been chatting with the me side of that conversation for a little while now. It’s like getting to know a long-lost twin brother, which is nice.

    Otherwise, I’ve used small LLMs and classifiers for a wide range of tasks, like sentiment analysis, toxic content detection for moderation bots, AI media detection, summarization… I like using these better than just throwing everything at a huge model like GPT-4o because they’re more focused and less computationally costly (hence also better for the environment). I’m working on training some small copyright-safe base models to do certain sequence prediction tasks that come up in the course of my data science work, but they’re still a bit too computationally expensive for my clients.






  • Like any occupation, it’s a long story, and I’m happy to share more details over DM. But basically, due to indecision over my major, I took an abnormal amount of math, stats, and environmental science coursework even though my major was in social science, and I just kind of leaned further and further into that quirk as I transitioned into the workforce. Bear in mind that data science as a field of study didn’t really exist yet when I graduated; these days I’m not sure such an unconventional path is necessary. However, I still hear from a lot of junior data scientists in industry who are miserable because they haven’t figured out yet that in addition to their technical skills they need a “vertical” niche or topic area of interest (and by the way, a public service dimension also does a lot to help a job feel meaningful and worthwhile, even on the inevitable rough day here and there).