Apple has released the first version of Apple Intelligence, along with a 47-page technical report on its self-developed large language models.
The first version of Apple Intelligence introduces the following AI features:
1. Siri upgrade. When activated, Siri now shows a glowing edge effect around the screen, can understand users' imprecise instructions, and can answer questions about Apple product troubleshooting.
2. Writing tools upgrade. The new iOS provides Apple's text generation service and also supports AI-generated summaries for emails, messages, and voice transcriptions, among other functions.
3. Visual tools upgrade. This version provides smarter photo search and movie memory creation.
Many of the AI features Apple announced in June have not appeared in the iOS 18.1 developer beta. Apple says it plans to launch them next year, including:
1. Further Siri improvements, including analysis of personal information and integration with external applications to perform tasks.
2. Image and visual generation features, including emoji generation, automatic photo cleanup, and other visual capabilities.
3. Integration with OpenAI's ChatGPT, among others.
The iPadOS 18.1 and macOS Sequoia 15.1 betas also incorporate the new Apple Intelligence features, but they are currently open only to registered Apple developers who pay $99 annually.
In the paper released today, Apple revealed its two Apple Foundation Models (AFM).
Paper link: https://machinelearning.apple.com/papers/apple_intelligence_foundation_language_models.pdf
One is the 3-billion-parameter on-device model AFM-on-device, optimized to run efficiently on iPhones and other devices; the other is the cloud model AFM-server, whose parameter count has not yet been disclosed.
The report details for the first time AFM's model architecture, training data, training process, inference optimization, and evaluation results, and mentions that training the models used a cumulative 10,240 Google TPUs, with no mention of NVIDIA GPUs.
According to the paper, Apple's self-developed large language models surpass GPT-4 in tests of instruction following and text summarization.
I. Apple AI's Debut: Siri Gets a New Look and a New Brain, One-Click Writing Polish
The Apple Intelligence features launched in the iOS 18.1 developer beta mainly cover Siri, writing tools, email summaries, and natural-language photo search, among other areas.
1. The entire screen lights up with a halo, Siri transforms
Siri's first change is its new appearance: the previous circular light spot on the screen has been replaced by a glowing light surrounding the screen to indicate that the assistant is active.
When developers don't want to speak aloud to Siri, they can switch from voice commands to typing: double-tapping the bottom of the iPhone or iPad screen brings up the keyboard for entering Siri queries and commands.
Siri can now understand context across multiple instructions. For example, developers can ask Siri to schedule an event, then ask it to create a reminder without repeating what was said before.
2. Writing tools launched: sentence polishing and email summaries
Writing tools are a major selling point of Apple Intelligence, offering developers suggestions on tone and wording, proofreading text, and summarizing key points.
Voice transcription is also available now. In the iOS 18.1 developer beta, the Voice Memos app and Notes app have built-in voice transcription functionality.
The writing function is available both in Apple's built-in applications and in third-party applications that use the standard text input system, as sketched below.
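To make concrete what "using the standard text input system" could look like for a third-party app, here is a minimal Swift sketch. The view controller and layout are hypothetical examples, not Apple sample code; the assumption, taken from the description above, is that a standard text view surfaces the new writing tools automatically on supported devices.

```swift
import UIKit

// Minimal sketch of a third-party app built on the standard text input system.
// Assumption from the description above: standard text views such as UITextView
// are expected to surface the new writing tools (proofreading, rewording,
// summarizing) automatically on supported devices, with no extra adoption code.
final class NoteEditorViewController: UIViewController {
    private let textView = UITextView()

    override func viewDidLoad() {
        super.viewDidLoad()
        view.backgroundColor = .systemBackground

        // An ordinary, editable text view; selecting text here is where the
        // system-provided writing tools would appear.
        textView.isEditable = true
        textView.font = .preferredFont(forTextStyle: .body)
        textView.translatesAutoresizingMaskIntoConstraints = false
        view.addSubview(textView)

        NSLayoutConstraint.activate([
            textView.topAnchor.constraint(equalTo: view.safeAreaLayoutGuide.topAnchor),
            textView.leadingAnchor.constraint(equalTo: view.leadingAnchor),
            textView.trailingAnchor.constraint(equalTo: view.trailingAnchor),
            textView.bottomAnchor.constraint(equalTo: view.keyboardLayoutGuide.topAnchor)
        ])
    }
}
```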
The Mail app now intelligently identifies priority emails and displays reminder pop-ups at the top of the inbox to flag specific deadlines or important action items that might otherwise be forgotten.
In addition, the new version supports a focus mode called "Reduce Interruptions", which uses AI to identify important notifications and filter out the rest.
3. Natural-language photo search and AI-generated short videos
Developers can now use natural language to find videos and photos. For example, when querying "photos of my daughter eating a cheeseburger", Apple will return the corresponding results. This should make it easier to find specific images or exact moments in videos without resorting to more generic keywords.
The new Movie Memories feature allows developers to input specific prompts to create movies using photos and videos stored in the Photos app.
Developers can input their own prompts or use prompts suggested by Apple Intelligence to get intelligently generated movies with clear chapters and themes.
The Apple Intelligence features released so far still have some usage restrictions.
Currently, Apple Intelligence is open only to registered Apple developers who pay $99 annually, across the iOS, iPadOS, and macOS versions. Developers need to set their device region to the United States and the language to US English.
In addition, a report from June noted that Apple Intelligence requires an iPhone 15 Pro or iPhone 15 Pro Max, or an iPad or Mac with an M1 chip or later.
II. 47-page paper details Apple's large language models, which surpass GPT-4 in text summarization and other tests
Compared with current AI phones, a distinguishing feature of Apple's self-developed models is the introduction of an on-device model that runs locally on the device.
According to Apple's latest paper released today, this on-device model is called AFM-on-device, containing about 3 billion parameters, far smaller than the hundreds of billions of parameters in models from companies like OpenAI and Meta.
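A quick back-of-envelope calculation suggests why roughly 3 billion parameters is a plausible size for on-device use; the bytes-per-weight figures below are common precisions used purely for illustration, not numbers from Apple's report.

```swift
import Foundation

// Rough memory footprint of a 3-billion-parameter model's weights.
// The parameter count is from Apple's report; the precisions below are
// illustrative assumptions, not figures Apple has confirmed for AFM.
let parameters = 3.0e9

let precisions: [(name: String, bytesPerWeight: Double)] = [
    ("fp16  (2 bytes/weight)",   2.0),
    ("int8  (1 byte/weight)",    1.0),
    ("4-bit (0.5 bytes/weight)", 0.5)
]

for p in precisions {
    let gigabytes = parameters * p.bytesPerWeight / 1e9
    print("\(p.name): ~\(String(format: "%.1f", gigabytes)) GB of weights")
}
// fp16 ≈ 6 GB, int8 ≈ 3 GB, 4-bit ≈ 1.5 GB: only aggressive quantization
// keeps the weights comfortably within an iPhone's memory budget, which is
// presumably part of what "optimized to run efficiently on devices" entails.
```

By the same arithmetic, a model with hundreds of billions of parameters would need hundreds of gigabytes of weights even at low precision, which is why models of that scale stay in the cloud.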
To perform more complex tasks, Apple has also developed a cloud model called AFM-server. Although its specific size has not been disclosed, it is designed to run in Apple's cloud infrastructure using a system called "Private Cloud Compute" to protect user data.
As shown in the figure below, AFM-on-device surpasses open-source models such as Phi-3-mini, Mistral-7B, and Gemma-2B in human evaluation, approaching the level of Llama-3-8B.
AFM-server surpasses models such as Llama-3-70B, Mixtral-8x22B, and GPT-3.5 in human evaluation, approaching the capability of GPT-4.
At the same time, in terms of instruction following, AFM-server surpasses GPT-4 in tests, while AFM-on-device surpasses open-source models such as Llama-3-8B and Phi-3-mini.