OpenAI now allows ChatGPT to speak, see, hear. Here’s how people can use new voice and image features

Sep 26, 2023 - 19:30

OpenAI has significantly expanded the capabilities of ChatGPT, its generative AI chatbot. ChatGPT can now hold voice-based conversations and can also see and understand images.

In effect, ChatGPT can now hear and speak with its users and see the images they share.

Here’s how ChatGPT’s new features work.

Voice Conversations
Users can now enjoy dynamic and interactive dialogues with their AI assistant, unlocking a realm of exciting possibilities. Whether you’re on the move, seeking a bedtime story for your family, or settling a dinner table debate, ChatGPT’s voice capabilities are primed to assist.

To initiate voice interactions, navigate to the Settings menu in the mobile app, select “New Features,” and opt into voice conversations. Once activated, simply tap the headphone icon in the top-right corner of the home screen to choose from five distinct voices.

Each voice was created in collaboration with professional voice actors to deliver a human-like listening experience. In addition, Whisper, OpenAI’s open-source speech recognition system, transcribes the user’s spoken words into text.

Images and ChatGPT
Users can now present one or more images to ChatGPT for troubleshooting, content exploration, or complex data analysis. Whether you’re attempting to diagnose why your grill won’t start, plan a meal based on the contents of your fridge, or decode a data graph for work, ChatGPT is here to assist.

To use this feature, tap the photo button to capture or select an image. On iOS or Android, tap the plus button first. You can also include multiple images or use the drawing tool to guide the assistant.

These image capabilities harness the power of multimodal models, including GPT-3.5 and GPT-4, which apply linguistic reasoning skills to a wide spectrum of visual content, encompassing photos, screenshots, and documents that contain both text and images.

Safety and Responsiveness
Voice and image capabilities will be rolled out in a phased manner to Plus and Enterprise users over the next two weeks. Voice functionality is available on both iOS and Android platforms, accessible through the settings, while image capabilities will be available on all platforms.

There are many potential risks linked to these advanced capabilities. Concerning voice, the emphasis is on voice chat, and the technology has been developed in collaboration with voice actors to ensure authenticity and safety.

Regarding image input, OpenAI has taken measures to limit ChatGPT’s capacity to analyze and make direct statements about individuals to respect their privacy. Real-world usage and user feedback will play a pivotal role in further enhancing these safeguards while upholding the utility of the tool.
