GigaChat 2.0: A Robust Neural Network Assistant Now Accessible to Everyone

Moscow, April 14 (NationPress) Sber's GigaChat 2.0 is now available to all users, the company said on Monday. It added that a new training approach has significantly improved the model's capabilities.

The artificial intelligence (AI) can now recognize audio files, analyze user queries in greater depth, handle larger volumes of text, and recognize images, according to the company's announcement.

All features of GigaChat are integrated into a single product across various interfaces, eliminating the need for users to switch between different services.

The model is available in two versions: GigaChat 2 Pro and GigaChat 2 Max. The Max version is designed for complex and professional tasks, while the Pro version is suited to quick and efficient solutions to everyday tasks, from answering questions to creating and editing text.

According to the company, GigaChat 2.0 can now utilize current data from the Internet. It analyzes inquiries thoroughly and delivers succinct answers with source links. The AI retrieves information for the user, filtering for relevance and backing its conclusions with links for further exploration.

For instance, users can ask: "Where can I take children aged 7 and 12 in St. Petersburg this weekend?" or "What is the estimated cost to renovate a standard one-room apartment in Moscow?"

Users can now engage with multiple files within a single chat. A document of up to 200 A4 pages can be uploaded. Sample prompt: "What should I focus on in the lease agreement? Emphasize the laws of the Russian Federation." Remember to attach the contract.

GigaChat 2.0 handles audio files at a fundamentally new level. The model interprets audio data directly, without converting it to text first. This helps it pinpoint key details and respond effectively to questions about the content.

"Simply upload a recording and pose a question. It accommodates files up to 60 minutes long and 30 MB. If typing is challenging or unfeasible, you can send a voice message. GigaChat 2.0 communicates in various languages, comprehends complex terminology better, and recognizes spoken language and accents, in addition to background noise and music," stated the company.

Examples of prompts include: "Listen to the audio recording and tell me what my colleague might have disliked"; "Generate a list of medications and recommendations based on my doctor’s voice message"; "Review the video call recording and summarize everything discussed about outdoor advertising"; "Assist me in structuring my speech for a project presentation. [text of speech]".

Users now only need to provide links to the desired content, and GigaChat will extract the key information. The model can create brief summaries of website content, compare articles on similar subjects, work with multiple links at once, and recognize images from websites.

Sample prompt: "Help me prepare for an interview for this position."

GigaChat 2.0 can also process videos from links. By interpreting the audio track, the model can summarize a video essay or answer questions about a lecture (this feature works with English and other languages). Sample prompt: "What is the content of this video? [link]"

GigaChat's ability to create music and songs from text prompts has improved significantly. Songs can now be up to 3 minutes long, with a generation time of approximately 1 minute. The company has also improved how closely the result matches the prompt, the sound quality, and song generation in Chinese.

To try it, click "Generate a song", provide the lyrics or theme, and select a genre or describe your own. Sample prompt: "A song in the style of modern youth pop music. Use driving bass, vibrant synths, and a strong beat."

The model can now analyze and extract more valuable information from an image and provide more accurate responses regarding its content. For instance, it can suggest what clothing style to choose for a specific occasion, assist in solving a textbook equation, or interpret medical test results.

Sample prompt: "I received a bill for housing and utilities. Can you clarify what I am being charged for?"

For the first time in Russia, smart speakers have been integrated with a large language model, raising their cognitive capabilities to a new level.

GigaChat holds live conversations with users in language they can understand or in different roles, while maintaining the conversation thread for up to 10 times longer.

For example, it can explain the theory of relativity in simple terms for a child or present the weather forecast as if it were a movie awards presenter.

The AI not only manages dialogue but also carries out functions such as playing music or setting reminders. Users can give multiple commands in one query, and the speaker will switch between them on its own.

Interaction with the assistant can now be tailored to individual preferences, with 18 combinations of settings, including communication style, the assistant's voice, and whether to address the user formally or informally.

Sample prompts include: "Hi, I drew a giraffe, but it looks dull. What can I improve?"; "Hello, explain the theory of relativity to a seven-year-old"; "Hello, set an alarm for 6 a.m. daily and play workout music."

One of the first platforms to feature GigaChat 2.0 was the Russian digital platform MAX by VK. This application includes a built-in messenger, mini-app, chatbot builder, online registration system, and payment service.

“By using Sber's neural network model, MAX users can generate texts and images, transcribe audio, receive brief summaries of videos and articles, and get answers to numerous questions. To explore GigaChat's capabilities, search for @gigachat and follow the provided instructions,” the company stated.