[Air-L] OpenAI & Rev transcriptions: Ethics concern?

Natalie Rock drnatalierock at gmail.com
Sat Sep 9 09:05:38 PDT 2023


Thanks to all for the helpful suggestions!


> On Sep 7, 2023, at 06:39, kalev leetaru via Air-L <air-l at listserv.aoir.org> wrote:
> 
> As a general point, it is important to note the massive difference between
> newer non-deterministic ASRs like Whisper, which prioritize "fluency over
> fidelity", and both classical and modern deterministic large-model ASRs. In
> short, with models like Whisper the transcription will vary every time you
> run the model, and it will often deviate from what the speaker actually said
> when an utterance is statistically unlikely, because these models are
> designed to essentially rewrite utterances into more readable and
> understandable text rather than capture what was actually said. For a live
> video conference, where understandability trumps accuracy, that might be
> just fine; but for interview transcripts to be used in content analysis, and
> even for providing the most accurate transcripts for assistive needs, keep
> these issues in mind. Also remember that models like Whisper can
> hallucinate and truncate *badly*, and the recommended mitigations often
> don't work on many content types:
> 
> https://blog.gdeltproject.org/a-deep-dive-exploration-applying-openais-whisper-asr-to-a-russian-television-news-broadcast/
> https://blog.gdeltproject.org/a-deep-dive-exploration-applying-openais-whisper-asr-to-a-pbs-newshour-broadcast/
> https://blog.gdeltproject.org/testing-the-new-openai-whisper-asr-large-v2-model-on-a-russian-tv-news-broadcast/
> https://blog.gdeltproject.org/openais-whisper-asr-how-the-nato-threat-to-putin-becomes-putins-threats-to-nato-the-challenges-of-machine-translation/
> https://blog.gdeltproject.org/experiments-with-whisper-asr-model-parameters-non-determinism-temperature_increment_on_fallback/
> https://blog.gdeltproject.org/ais-pivot-to-fluency-existential-non-determinism-vs-fbis-bbcm-osint-the-centrality-of-fidelity-in-translation/
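
A rough way to check the run-to-run variability described above on your own material: transcribe the same interview twice, then compute a word-level edit distance between the two outputs. This is a minimal Python sketch, assuming you already have both transcripts as plain-text strings; the function names and sample sentences are hypothetical, and a non-zero score only tells you the runs differ, not which one is faithful to the audio.

```python
"""Rough check of run-to-run variability between two ASR transcripts."""


def word_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance over lowercased, whitespace-separated words."""
    x, y = a.lower().split(), b.lower().split()
    prev = list(range(len(y) + 1))  # distances from an empty prefix of x
    for i, wx in enumerate(x, start=1):
        curr = [i] + [0] * len(y)
        for j, wy in enumerate(y, start=1):
            curr[j] = min(
                prev[j] + 1,                # delete wx
                curr[j - 1] + 1,            # insert wy
                prev[j - 1] + (wx != wy),   # substitute, or match at no cost
            )
        prev = curr
    return prev[-1]


def run_divergence(run_a: str, run_b: str) -> float:
    """Edit distance divided by the longer run's word count (0.0 = identical)."""
    longest = max(len(run_a.split()), len(run_b.split()), 1)
    return word_edit_distance(run_a, run_b) / longest


if __name__ == "__main__":
    first_run = "we recorded the interview on tuesday"
    second_run = "we recorded an interview on thursday"
    print(f"{run_divergence(first_run, second_run):.2f}")  # prints 0.33
```

With the sample strings it prints 0.33 (two substituted words out of six). Comparing a machine transcript against a careful human transcript of a short excerpt, rather than against another machine run, is what actually measures fidelity.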
> 
> The same holds for translation - keep in mind that the newer generative
> translation models like Meta's Seamless have a number of critical
> departures from what we think of as traditional NMT:
> 
> https://blog.gdeltproject.org/experiments-with-speech-transcription-translation-metas-seamlessm4t-vs-openais-whisper-vs-gcps-sttgt-vs-gcps-usm-chirpgt/
> 
> And keep in mind that if you're using LLMs to summarize or otherwise
> distill transcripts, hallucination will manifest in highly
> unpredictable ways based on the alignment of the source topics and the
> training data:
> 
> https://blog.gdeltproject.org/hallucination-in-summarization-when-chatgpt-hallucinated-new-stories-in-an-evening-news-broadcast/
> 
> Embedding models can be used for mitigation in monolingual tasks:
> 
> https://blog.gdeltproject.org/using-embedding-ranking-to-combat-llm-hallucination-in-generative-summarization-the-abc-news-chinese-spy-balloon-story/
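
The embedding-ranking idea can be sketched roughly as follows: score each summary sentence by its best similarity to any sentence of the source transcript, and flag the weakly supported ones for manual review. This is a hedged Python sketch, not the method from the linked post: a bag-of-words count vector with a small stopword list stands in for a real embedding model, and the 0.3 threshold, the function names and the example sentences are all invented assumptions.

```python
"""Flag summary sentences weakly supported by the source transcript (sketch)."""
import math
import re
from collections import Counter

# Assumption: a tiny stopword list so function words don't dominate the score.
STOPWORDS = {"the", "a", "an", "was", "is", "it", "at", "of", "to", "and", "in", "on"}


def embed(text):
    """Stand-in 'embedding': counts of lowercased content words.

    Replace with a real sentence-embedding model for actual use.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(u, v):
    dot = sum(count * v[word] for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_summary_sentences(source_sentences, summary_sentences, threshold=0.3):
    """Return (sentence, best_score, flagged) tuples, least supported first."""
    source_vecs = [embed(s) for s in source_sentences]
    results = []
    for sentence in summary_sentences:
        vec = embed(sentence)
        best = max((cosine(vec, sv) for sv in source_vecs), default=0.0)
        results.append((sentence, best, best < threshold))
    return sorted(results, key=lambda r: r[1])


if __name__ == "__main__":
    source = [
        "The balloon was tracked across the continental United States.",
        "Officials said it was shot down off the coast.",
    ]
    summary = [
        "The balloon was shot down off the coast.",
        "The pilot was arrested at the airport.",
    ]
    for sentence, score, flagged in rank_summary_sentences(source, summary):
        print(f"{score:.2f} {'FLAG' if flagged else 'ok'} {sentence}")
```

Here the invented "pilot was arrested" sentence shares no content words with the source and is flagged, while the supported sentence scores about 0.73. A real sentence-embedding model would catch paraphrases that word overlap misses; per the caveat about multilingual embedders, this kind of check is best kept to monolingual material.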
> 
> But keep in mind that in multilingual tasks, even bitext models and the
> most recent LLM embedders will still favor stilted NMT translations over
> human translations, so they are counter-productive for multilingual
> mitigation contexts:
> 
> https://blog.gdeltproject.org/authoritative-human-vs-nmt-llm-translation-embedding-based-quality-rankings-nmt-skew/
> 
> Kalev
> 
> 
>> On Wed, Sep 6, 2023 at 7:06 PM Hamlet Lopez via Air-L <air-l at listserv.aoir.org> wrote:
>> 
>> Hi Natalie,
>> 
>> I won't talk about the ethics involved here. But you can get high-quality
>> automatic transcription of your interviews for free, from your personal
>> computer, if you are not afraid of console programs. You can use
>> open-source software.
>> 
>> Check https://github.com/ggerganov/whisper.cpp
>> 
>> I use whisper.cpp for interviews in Spanish, with excellent results.
>> The English language models are even better.
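
For anyone trying the whisper.cpp route, the two steps are converting the recording to 16 kHz mono 16-bit WAV (the input format the whisper.cpp README says its example program expects) and running the binary on that file. The Python sketch below only builds and prints those commands unless dry_run=False. Assumptions to check against your own build: ffmpeg is on your PATH; the binary path (./main in 2023 builds, whisper-cli in later releases) and the model file name are placeholders; and -m, -f, -l and -otxt are taken to be the model, input file, language and write-a-text-file options as listed in the whisper.cpp README. The function names are hypothetical.

```python
"""Transcribe one interview recording with whisper.cpp via ffmpeg (sketch).

Paths and the binary name are placeholders; adjust to your own build.
"""
import subprocess
from pathlib import Path


def build_commands(audio, model, binary="./main", language="es"):
    """Return the ffmpeg conversion command and the whisper.cpp command."""
    audio = Path(audio)
    # whisper.cpp's example program expects 16 kHz, mono, 16-bit PCM WAV.
    wav = audio.with_name(audio.stem + "_16k.wav")
    convert_cmd = [
        "ffmpeg", "-y", "-i", str(audio),
        "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", str(wav),
    ]
    # Keep `binary` as a plain string: str(Path("./main")) drops the "./",
    # which would make subprocess search PATH instead of the current directory.
    whisper_cmd = [
        binary, "-m", str(model), "-f", str(wav),
        "-l", language, "-otxt",  # -otxt also writes a .txt named after the WAV
    ]
    return convert_cmd, whisper_cmd


def run_pipeline(audio, model, binary="./main", language="es", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(audio, model, binary, language):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline("interviews/int01.mp3", "models/ggml-base.bin", language="es")
```

Printing first (the dry run) lets you compare the commands with your local whisper.cpp help output before anything touches the recordings.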
>> 
>> Best wishes,
>> 
>> Hamlet
>> 
>>> On 9/6/23, Natalie Rock via Air-L <air-l at listserv.aoir.org> wrote:
>>> Ethics mavens, your insights would be appreciated: should I be concerned
>>> about this note sent from Rev.com? I use Rev to create automated
>>> transcripts of research interviews. Does this note imply my transcripts
>>> might be used for AI training?
>>> 
>>> 
>>> Begin forwarded message:
>>> 
>>>> From: Your Team at Rev <yourteam at rev.com>
>>>> Date: September 6, 2023 at 14:08:15 PDT
>>>> 
>>>> In your Terms of Service and Data Processing Agreement with Rev, we commit
>>>> to notifying you before any new third-party sub-processor starts
>>>> processing applicable data. We’re sending this message because we’re
>>>> planning to appoint a new third-party sub-processor for Rev to support an
>>>> upcoming new feature.
>>>>
>>>> OpenAI - OpenAI is a company focused on AI research and deployment. Their
>>>> API platform provides models, tools, and technologies for AI development.
>>>> These services are utilized to enhance the efficiency, scalability, and
>>>> flexibility of both human and Automatic Speech Recognition (ASR) services.
>>>> This new third-party sub-processor has been verified to ensure it meets
>>>> Rev security and privacy standards and will meet Rev’s data processing
>>>> terms. No customer owned data will be provided to this third party until
>>>> the sub-processor notification period has passed.
>>>> 
>>>> 
>>> _______________________________________________
>>> The Air-L at listserv.aoir.org mailing list
>>> is provided by the Association of Internet Researchers http://aoir.org
>>> Subscribe, change options or unsubscribe at:
>>> http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>>> 
>>> Join the Association of Internet Researchers:
>>> http://www.aoir.org/
>>> 
>> 
>> 
>> --
>> Dr.C. Hamlet López García
>> Researcher
>> Instituto Cubano de Investigación Cultural
>> "Juan Marinello"
>> Professor
>> Universidad de la Habana


