OpenAI is strengthening its chatbot's capabilities with image analysis and generation, as well as speech synthesis.

OpenAI continues to add new capabilities to its conversational AI, which has ushered in a new era of collaboration between humans and machines. Web browsing, image analysis and generation, and speech analysis and synthesis are now accessible to the chatbot in its paid version.

ChatGPT has been powered by the GPT-4 large language model since the beginning of the year. The model is known to be multimodal, but its visual and audio capabilities had been blocked until now.

In recent days, OpenAI has decided to unleash more of the potential of its generative and conversational AI, although these new features are currently limited to paying users of ChatGPT Plus and ChatGPT Enterprise. For those who want to stick with a free option, Microsoft’s Bing Chat offers most of these functionalities.

The return of web browsing

It started with the return of a feature that appeared briefly earlier this summer but was quickly removed (after some clever individuals discovered that they could use ChatGPT to access paid websites for free): the AI’s connection to the web. The GPT-4 model underlying ChatGPT was trained on documents dating from before the end of 2021. Without internet connectivity, the AI could neither truly analyze web documents nor enrich its responses with recent information. Now, once the “Browse with Bing” setting is activated in “Settings and Beta/ Beta Features,” the conversational AI can answer questions about current topics and events and connect to the web to refine its analyses.

From understanding to image generation

The other key new feature is that OpenAI has finally decided to unlock the multimodal potential of GPT-4. ChatGPT now relies on the brand new GPT-4V iteration of its foundation model and officially offers image analysis. ChatGPT Plus users will soon be able to submit images, or questions illustrated with images, and ask the AI to analyze and comment on them (they can already do so in the iOS and Android mobile versions). The AI can translate handwritten manuscripts, turn hand-drawn algorithms or screens into computer code, analyze and describe a photo or painting, analyze captchas, and much more.
Furthermore, OpenAI will soon integrate its image generator “Dall-E 3” into ChatGPT. It is already available in Bing Image Creator, where its results are noticeably more impressive than those of Dall-E 2, and it competes seriously with Midjourney while offering more variety of styles.

Voice, to expand interactions

One of the great strengths of generative AI is that it revolutionizes human-machine interaction by making natural language its foundation. The idea now is to carry out such interactions by voice instead of in writing. We will have to wait a little longer before we can converse as we would with a human being, because of the time currently needed to analyze and understand human speech. But we are getting closer.
In its mobile version, Bing Chat lets users ask questions by voice, and the AI answers aloud as well. It relies on models developed by Microsoft for this purpose.
OpenAI will soon integrate its speech-to-text model “Whisper” into ChatGPT Plus. The chatbot will also be able to speak thanks to a new “TTS – Text to Speech” model offering five different voices.

In other words, ChatGPT can now connect, see, speak, and hear. Clever users will not hesitate to exploit these new capabilities for unforeseen purposes, trying to bypass the limitations that OpenAI has put in place to prevent malicious or inappropriate use of its AI.

