With the introduction of new speech and image capabilities in ChatGPT, OpenAI is once again pushing the limits of AI technology. By making interactions more intuitive and immersive, these features are poised to fundamentally change how people engage with the model.
ChatGPT Voice Conversations
Voice chat is one of this update’s most notable features. Users can now hold a real-time spoken conversation with their AI assistant, which opens up many new uses. Whether you are on the go, want a bedtime story for your children, or need to settle a dinner-table debate, ChatGPT’s voice capabilities can help.
To get started with voice, go to Settings → “New Features” in the mobile app and opt into voice conversations. Then tap the headphone icon in the top-right corner of the home screen and choose one of five voices. OpenAI worked with professional voice actors to create each voice so that it sounds natural and human-like. Whisper, OpenAI’s open-source speech recognition system, transcribes your spoken words into text so that ChatGPT can respond to them.
Image Interaction with ChatGPT
Sharing images with ChatGPT is another new feature. Users can now show ChatGPT one or more images to troubleshoot problems, explore what the images contain, or analyse complex data. For example, ChatGPT can help you interpret a data graph for work, plan a meal based on what is in your fridge, or figure out why your grill won’t start.
To use this feature, tap the photo button to take or choose an image. On iOS or Android, tap the plus button first. You can share several images at once, and you can use the drawing tool to point your assistant at a specific part of an image. These image capabilities are powered by multimodal GPT-3.5 and GPT-4, which apply their language reasoning skills to a wide range of visual input, such as photographs, screenshots, and documents containing both text and images.
Progressive Deployment for Safety and Resilience
Voice and image capabilities will roll out gradually to Plus and Enterprise subscribers over the next two weeks. Image input will be available on all platforms. Voice will be available only on iOS and Android, where users opt in through Settings.
OpenAI acknowledges the risks that come with these advanced capabilities. To ensure authenticity and safety, the company has focused its speech technology on voice chat and built the voices in partnership with the voice actors it worked with directly. Spotify is also using this technology for its Voice Translation feature. The feature helps podcasters reach a wider audience by translating their episodes into other languages in their own voices.
To protect people’s privacy, OpenAI has taken steps to limit ChatGPT’s ability to analyse people in images and make direct statements about who they are. OpenAI notes that these safeguards will need further refinement without sacrificing the tool’s usefulness. User feedback and real-world use will be essential to that process.