• Ghostalmedia@lemmy.world (OP)
      8 months ago

      Siri was originally in the cloud, but Apple has been trying to handle more Siri requests locally so that requests can be handled faster and without internet access.

  • Fake4000@lemmy.world
    8 months ago

    That’s what they all say. But a lot of these so-called AI features require more processing power than a phone has. Offloading to a server is sometimes a must.

    • fartsparkles@sh.itjust.works
      8 months ago

      Quantised models can be surprisingly small. And if Apple aren’t targeting LLMs for local use, more specific/tailored models absolutely can run on device.
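      To put rough numbers on "surprisingly small", here is a back-of-the-envelope sketch of weight storage at different precisions. The 7-billion parameter count is an assumption chosen for illustration, not a figure for any Apple model, and the estimate covers weights only (no activations, cache, or runtime overhead).

```python
def weight_footprint_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate storage for model weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


# Hypothetical 7-billion-parameter model at three precisions.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_footprint_gib(7e9, bits):.2f} GiB")
```

      Going from 16-bit to 4-bit weights cuts the footprint to a quarter, which is roughly the difference between not fitting in a phone's memory and plausibly fitting.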

      That said, given the precedent set by Siri, their next progression of Siri into an LLM will absolutely require a network connection and be executed server side.

    • kadu@lemmy.world
      8 months ago

      Samsung’s version on One UI 6.1 lets you toggle between running the local models on the phone’s NPU versus connecting to their servers.

      The local version is slightly slower and produces worse results, but can be used for privacy or without the internet. The remote version is what you’d expect.

      The thing is, these AI features are just features that were already present in some way or another, now emphasizing content generation and with AI branding slapped on.

    • Daxtron2@startrek.website
      8 months ago

      Sure, if you’re running large models like GPT. Smaller models tailored to specific use cases can absolutely run on phones. Whether or not they get their implementation right is a different story, though.

    • chrash0@lemmy.world
      8 months ago

      you’d be surprised how fast a model can be if you narrow the scope, quantize, and target specific hardware, like the AI hardware features they’re announcing.

      not a 1-1, but a quantized Mistral 7B runs at ~35 tokens/sec on my M2. that’s not even as optimized as it could be. it can write simple scripts and do some decent writing prompts.

      they could get really narrow in scope (super simple RAG, limited responses, etc), quantize down to even something like 4 bit, and run it on custom accelerated hardware. it doesn’t have to reproduce Shakespeare, but i can imagine a PoC that runs circles around Siri in semantic understanding and generated responses. being able to reach out on Slack to the engineers that built the NPU stack ain’t bad neither.
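      For a feel of what "quantize down to 4 bit" means, here is a minimal sketch of symmetric 4-bit quantization with a single shared scale. This is a simplified illustration, not the scheme used by any particular runtime or by Apple; real implementations typically use per-block scales and packed storage. The function names and sample weights are made up for the example.

```python
def quantize_4bit(weights):
    """Map floats to signed integers in [-7, 7] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    quantized = [max(-7, min(7, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the 4-bit integers."""
    return [q * scale for q in quantized]


# Hypothetical weights, for illustration only.
weights = [0.12, -0.53, 0.98, -0.05, 0.31]
q, scale = quantize_4bit(weights)
restored = dequantize(q, scale)

# Rounding to the nearest level bounds the error by half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print("quantized:", q)
print("max error:", max_error, "half step:", scale / 2)
```

      Each weight is stored as one of 15 integer levels plus a shared scale, so the reconstruction error per weight is at most half a quantization step. That loss is the trade-off against the smaller footprint.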

  • GlitterInfection@lemmy.world
    8 months ago

    This article says some funny things:

    "While more advanced features will ultimately require an internet connection"

    Ok, then?

    "On-device processes could help eliminate certain controversies found with server-side AI tools. For example, these tools have been known to hallucinate, meaning they make up information confidently."

    What? How would on-device processes have any effect on hallucination in LLMs?

    Or are you trying to tell us that this article was written by an LLM and that the whole thing is a confidently made up hallucination?