Electronics-Related.com

Translation services/strategies/costs

Started by Don Y December 22, 2021
On 23/12/2021 10:35, Don Y wrote:
> On 12/23/2021 2:39 AM, Martin Brown wrote:
>> Alexa can't manage for example Tyne & Wear  (tine and weir), dialect
>> Chop Gate (chop yat) and Cholmondeley (Chumlee) catch out most
>> non-native English speakers in fact most non-locals. For that reason
>> the latter was a location for sensitive military intelligence during
>> WWII.
>
> Worcester (WUSS-ter), Billerica (bill-RICK-a), Berlin (BURR-lin, not
> burr-LIN),
> etc.  Or, words that folks often mispronounce (almond, salmon).
>
> I can identify folks who are from my home *town* (not "state"!) by their
> speech habits -- highly localized.
I can recognise a fair number of British accents but I have all but lost mine from time spent away from my home town at university and overseas.
> A neighbor claimed her firstname to be "Lara" -- though she spelled it
> L-A-U-R-A ("Isn't that Laura??").
>
> [BTW, I'm still waiting for a pointer to the code you want compiled...]

If you are willing to give it a go I'll email you a copy (about 150k main file plus a couple of tiny header file stubs to satisfy includes). I don't seem to have your email contact details. My own peculiar looking reply-to address is valid provided that you do not alter it in any way.

First time around just throw it at the Intel compiler and send me the error messages (or if by some happenstance it compiles and links OK the output of running it with no parameters - also about 100-200k).

If you have any nice fast series 10 or 11 Intel CPUs I'd be interested in the output from running an MSC 2019 compiled executable on them too. I'm looking for SSE architectural differences affecting out of order and speculative execution (and how they have changed with time).

I thought you were tied up until the year end. I know how pressured year end shipment deadlines can be. Good luck!

Have a super Christmas!

--
Regards,
Martin Brown
PS most android smartphones have text to speech and support bluetooth earpieces.
Then you could write an application and maybe even use the phone's camera to guide the blind.
If you have a clue about image recognition.

Carrying a raspi and a lipo battery around is not that hard either, 10 hours should go
and fits in ones pocket.

Show us something you did apart from babble.
Else you are - in essence  - just waiting my time.

It is what it is.




On 12/24/2021 4:32 AM, Jan Panteltje wrote:
> On a sunny day (Fri, 24 Dec 2021 03:36:53 -0700) it happened Don Y
> <blockedofcourse@foo.invalid> wrote in <sq47sd$qn2$1@dont-email.me>:
>
>> On 12/24/2021 1:55 AM, Jan Panteltje wrote:
>>> On a sunny day (Thu, 23 Dec 2021 12:57:23 -0700) it happened Don Y
>>> <blockedofcourse@foo.invalid> wrote in <sq2kbf$s5s$1@dont-email.me>:
>>>
>>>> On 12/23/2021 9:53 AM, Jan Panteltje wrote:
>>>>> On a sunny day (Thu, 23 Dec 2021 08:43:24 -0700) it happened Don Y
>>>>> <blockedofcourse@foo.invalid> wrote in <sq25f9$br9$1@dont-email.me>:
>>>>>
>>>>>> On 12/23/2021 6:16 AM, Jan Panteltje wrote:
>>>>>>> On a sunny day (Wed, 22 Dec 2021 17:21:06 -0700) it happened Don Y
>>>>>>> <blockedofcourse@foo.invalid> wrote in <sq0fdu$gjc$1@dont-email.me>:
>>>>>>>
>>>>>>>> On 12/22/2021 10:15 AM, Jan Panteltje wrote:
>>>>>>>>> On a sunny day (Wed, 22 Dec 2021 09:30:43 -0700) it happened Don Y
>>>>>>>>> <blockedofcourse@foo.invalid> wrote in <spvjrv$tj1$1@dont-email.me>:
>>>>>>>>>
>>>>>>>>>> It would be a tough call to determine if American English had evolved more
>>>>>>>>>> OR LESS than the original British. I've read that American English is, in
>>>>>>>>>> many ways, truer to its British roots than modern British English.
>>>>>>>>>>
>>>>>>>>>> Pronunciations also evolve, over time. As well as speech patterns.
>>>>>>>>>>
>>>>>>>>>> E.g., I was taught "the" should be pronounced as "thee" when preceding
>>>>>>>>>> a word beginning with a vowel sound: "Thee English", "Thee other guy"
>>>>>>>>>> but with a schwa ahead of a consonant: "The next one", "the Frenchman".
>>>>>>>>>> This seems to no longer be the norm.
>>>>>>>>>>
>>>>>>>>>> [You're interested in these sorts of things when you design a
>>>>>>>>>> speech synthesizer; the different "wh" sounds, etc.]
>>>>>>>>>
>>>>>>>>> A pretty decent text to speech is google translate.
>>>>>>>>>
>>>>>>>>> This script, called gst2_en on my system, has a female talk in english:
>>>>>>>>>
>>>>>>>>> #!/bin/bash
>>>>>>>>> say() { local IFS=+;/usr/bin/mplayer -ao alsa -really-quiet -noconsolecontrols
>>>>>>>>> "http://translate.google.com/translate_tts?ie=UTF-8&client=tw-ob&q=$*&tl=en"; }
>>>>>>>>> say $*
>>>>>>>>>
>>>>>>>>> You call it like this (with your text as example):
>>>>>>>>> gst2_en ">E.g., I was taught "the" should be pronounced as "thee" when preceding"
>>>>>>>>>
>>>>>>>>> In the script the &tl=en can be changed for the language you want, so &tl=nl for Dutch and &tl=de for German.
>>>>>>>>>
>>>>>>>>> If you want the output to go to a mp3 file then use mplayer -dumpstream in that script.
>>>>>>>>>
>>>>>>>>> I find the quality better than other things I have tried.
>>>>>>>>>
>>>>>>>>> All Linux of course
>>>>>>>>
>>>>>>>> There are lots of synthesizers out there -- FOSS as well as commercial.
>>>>>>>> But, those that run on a PC tend to be bloated implementations -- large
>>>>>>>> dictionaries, unit databases, etc. And, require a fair bit of CPU
>>>>>>>> to deliver speech in real-time. If you're trying to run in a small
>>>>>>>> footprint consuming very little "energy" (think tiny battery), there
>>>>>>>> really isn't much choice -- esp if you want to be able to tweek the voice
>>>>>>>> to suit the listeners' preferences (with unconstrained vocabulary)
>>>>>>>
>>>>>>> Sure
>>>>>>> But the advantge of this script is that it uses NO resources on the PC / raspi or whatever
>>>>>>
>>>>>> Of course it uses resources! You need a network stack, the memory to
>>>>>> handle the packets delivered across that connection, the memory to support
>>>>>> the shell, the filesystem from which to load the script and other binaries,
>>>>>> the kernel, etc.
>>>>>
>>>>> Sure
>>>>>
>>>>>> You just assume they cost nothing because they are already present
>>>>>> in your implementation. Take a *bare* rPi and see how much you have to
>>>>>> add to it to make it speak. *That* is the resource requirement.
>>>>>
>>>>> Not sure what you mean by a 'bare rPi', but even my old raspi one has all that.
>>>>
>>>> Strip all of the code off of it so you are starting with *hardware*.
>>>> Then, add back what you need to make it speak.
>>>
>>> OK, let me give you some example in this, and why the choice between apples and oranges .
>>> Let's say we have nothing but a PIC 18F14k22 (because I have those).
>>>
>>> To do the internet thing you need a TCP stack and add a Microchip ethernet ENC28J60 chip
>>> Microchip _has_ a TCP stack, but then what fun is it, I wrote one back in the 5 1/4 inch floppy days but
>>> those files got lost, but here is project with an UDP stack I wrote in PIC asm
>>> http://panteltje.com/panteltje/pic/ethernet_color_pic/
>>> controls room lighting from anywhere, been working fine 24/7 since 2013
>>> You you will need:
>>> 1 PIC18F14K22
>>> 1 ENC28J60
>>>
>>> Now let's see if we can do audio out with that
>>> Sure I have done audio with same PIC:
>>> http://panteltje.com/panteltje/pic/audio_pic/
>>> that used PWM somehow, but here I would use a R2R DAC on 8 PIC output pins perhaps.
>>> The B I G question is now "With This chip can I decode the mp3 stream from google translate?"
>>> Perhaps, would have to look a the source of mpg123 (open source C mp2/mp3 decoder) to see if I can do it all with 256 bytes
>>> RAM
>>> maybe the buffer size is too small, maybe need a bigger PIC or some external memory is needed.
>>> Never wrote a mp3 decoder so question mark here.
>>> Config system via RS232 in EEPROM (as in projects above) for IP, gateway, MAC etc etc.
>>> So 2 chips (or 3 if memory), some caps, power regulator, wall wart, and now you need to make a board if it is for production.
>>> And you need to write the asm.
>>> And test an debug it
>>> Estimated time: some days.
>>> Cost per hour of qualified person?
>>> A Raspberry Pi that has all connectors plus some, costs 47$ (went up a while ago because of chip shortages I have read)
>>> and a small SDcard.
>>> Runs Linux, is easy to program in minutes, and proven reliable, no board layouts needed, ready in an hour or so.
>>> And can fall back on whatever speech synth. you installed on it if no internet connection for any reason.
>>> The advantage of using google for speech is that THEY will do there best to make the audio as good as possible
>>> and support several languages
>>>
>>> So, in short, the Raspberry way is faster, cheaper, better, proven reliable, available everywhere.
>>> Now show us what YOU did.
>>
>> Put it *in* a bluetooth earpiece and have it run off the battery that's
>> in that earpiece. Make sure that earpiece is paired with a BT host that
>> ultimately has internet access -- to get to your google service. And,
>> maintain this connectivity while I walk, drive, ride a bicycle or
>> any other activity -- above or below ground.
>>
>> You're solving the wrong problem with a sledgehammer.
>
> You should have spcified that right away.
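
The gst2_en script quoted above joins its arguments with `+` and pastes them into the URL unescaped, so text containing `&`, `#` or quotation marks may not reach the service intact. A minimal Python sketch of building the same request URL with percent-encoding is shown here. The endpoint and the `ie`, `client`, `q` and `tl` parameters are copied from the quoted script; the function name `tts_url` is made up for illustration, and the sketch does not check whether the service still answers such requests.

```python
from urllib.parse import urlencode

# Endpoint and parameter names copied from the quoted shell script.
BASE = "http://translate.google.com/translate_tts"


def tts_url(text: str, lang: str = "en") -> str:
    """Return the request URL for `text`, percent-encoding the query string."""
    # urlencode() escapes each value, turning spaces into '+' and '&' into '%26'.
    query = urlencode({"ie": "UTF-8", "client": "tw-ob", "q": text, "tl": lang})
    return f"{BASE}?{query}"


if __name__ == "__main__":
    print(tts_url('I was taught "the" should be pronounced as "thee"'))
    print(tts_url("Tyne & Wear"))
    print(tts_url("hallo wereld", lang="nl"))
```

The resulting string could be handed to mplayer in place of the hand-assembled URL; only the escaping changes, not the resource argument being made in the thread.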
I stated:

   [You're interested in these sorts of things when you design a
   speech synthesizer; the different "wh" sounds, etc.]

I didn't realize I had to explain my entire application in order to defend that statement.

You also need to know braille if you want to transcribe emitted text into braille "on-the-fly". And, how to Daltonize visual presentations if you want to rely on vision. And, how to render graphic images if you want to present information graphically.

Do I have to defend each of these statements, as well?
> So again, PIC, bluetooth chip, asm nothing new.
> Some people here can even design it all in one chip.
> But we are talking text to speech no (or did you change requirenent again)?
> WTF would you get the text from?
A set of applications. As well as audio from music sources, annunciators, etc.
> Much simpler to use a normal bluetooth earpiece and a Raspberry Pi talking to it from a fixed place..
That sort of thinking says it's much easier to use a land-line phone in a fixed place (than to bother with all these silly cell phones)

And, why bother with tablets and laptops when you can sit bolt-upright in front of your desktop PC?!
> https://pimylifeup.com/raspberry-pi-bluetooth/
> For other platforms / system bluetooth USB adaptors plenty, I have some for the PC, also bluetooth earpieces.
> Like I said,the raspi can fall back on any otehr sinth. if no internet connections
Where is that other synthesizer? What resources does *it* use?
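
The fallback arrangement under discussion (use the network service when it is reachable, otherwise an on-device synthesizer) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `speak`, `remote` and `local` are hypothetical names, and the two callables stand in for whatever real synthesizers would be used. The point it makes is that the local synthesizer has to exist and be paid for in resources whether or not the remote one is normally used.

```python
from typing import Callable, Optional


def speak(text: str,
          remote: Optional[Callable[[str], bool]],
          local: Callable[[str], None]) -> str:
    """Speak `text`, preferring the remote service; return which path was used.

    `remote` (hypothetical) returns True when it managed to speak the text and
    may raise OSError on network trouble. `local` (hypothetical) is the
    on-device synthesizer and is assumed to always work.
    """
    if remote is not None:
        try:
            if remote(text):
                return "remote"
        except OSError:
            pass  # no route, DNS failure, timeout: fall through to local voice
    local(text)
    return "local"


if __name__ == "__main__":
    def unreachable(text: str) -> bool:
        raise OSError("network unreachable")

    # With the network down, the message is still delivered by the local voice.
    print(speak("Unable to acquire DHCP lease.", unreachable, print))
```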
> AGAIN where is your text coming from?
> You did not show any design or code

My synthesizers are *big*. There's a lot of code to implement the text normalization, prosody assignment, waveform synthesis, unit databases, exception dictionaries, letter-to-sound rules, etc.

I've implemented a formant-based synthesizer modeled after an enhanced Klatt synthesizer, a diphone synthesizer (but with only one voice, presently, as sampling speech is tedious and requires a fair bit of time from the voice model), and an LPC coder. I've implemented the NRL LTS rules, Hunnicutt's as well as McIlroy's. (all of these technologies and documentation are available, publicly -- but you may have to do some digging)

[The regression tests for the rule sets are each ~50pp. Rhyme tests another dozen or so. etc.]

I've implemented fixed point and floating point versions of each (as necessary) to address the possibilities of having limited target capabilities. So, I can piece together ~20 different synthesizers with different resource requirements (and capabilities/limitations) by mixing and matching these modules.

Unfortunately, there are no concrete criteria that I can use to decide which (combination of algorithms) is "optimal". Judging the quality of speech output isn't something that lends itself to a simple metric. And, different approaches have different tradeoffs (e.g., it's considerably harder to alter a diphone voice than one created via LPC or formant synthesis)

[If algorithm 1 pronounces A, B and C well but chokes on D and E, is that better than algorithm 2 pronouncing C, D and E well and choking on A and B? Does it matter if any of them can pronounce "platypus" correctly? Will questions ever be uttered? If not, then why deal with the associated rising intonation? Will input text always be grammatically correct? Or, will I have to potentially deal with "nonsense"?]

I didn't see any of google's code presented in YOUR implementation...
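
Letter-to-sound rules and exception dictionaries interact even for a single word. Taking the "thee"/schwa rule for "the" that is quoted earlier in the thread, a rough Python sketch might look like the following. It approximates "vowel sound" by spelling plus a tiny exception list; the function name, the exception words and the "thee"/"thuh" respellings are all illustrative assumptions and are not taken from the NRL, Hunnicutt or McIlroy rule sets.

```python
# Words whose first *sound* disagrees with their first *letter* (illustrative only).
VOWEL_SOUND_EXCEPTIONS = {"hour", "honest", "heir"}               # silent "h"
CONSONANT_SOUND_EXCEPTIONS = {"one", "university", "european"}    # begin with a w/y sound

VOWEL_LETTERS = ("a", "e", "i", "o", "u")


def pronounce_the(next_word: str) -> str:
    """Return 'thee' before a vowel sound and 'thuh' (schwa) otherwise."""
    word = next_word.lower()
    # The exception dictionary is consulted before the spelling rule.
    if word in VOWEL_SOUND_EXCEPTIONS:
        return "thee"
    if word in CONSONANT_SOUND_EXCEPTIONS:
        return "thuh"
    # Spelling rule: a leading vowel letter is taken to mean a vowel sound.
    return "thee" if word.startswith(VOWEL_LETTERS) else "thuh"


if __name__ == "__main__":
    for w in ["English", "other", "next", "Frenchman", "hour", "university"]:
        print(pronounce_the(w), w)
```

Even this one-word rule needs an exception list as soon as spelling and sound diverge, which gives a feel for why the rule sets and their regression tests grow large.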
> Just babbling?
I don't think you've thought through the problem space.

Close your eyes -- permanently -- as if you were blind, suffered from macular degeneration, diabetic retinopathy, etc. (We'll assume you're still mentally competent and able to suss-out the details of high tech kit so we can ignore that potential complication...)

Your sole means of getting information from this gizmo is via audio (i.e., speech and annunciators).

Now, take that brand new gizmo that you've been given and make it talk to you. How do you connect to the internet? How do you get status messages from it telling you that it *can't* connect? Are you sure it is powered on? How do you adjust the volume? Voice used? Speaking rate?

How do you even KNOW how to do these things -- as you can't READ a manual! (Gee, it sure would be nice if the device could read it *to* you... but, you'd need to be able to synthesize speech BEFORE you'd *configured* the device in order to learn HOW to *configure* it! OTOH, if it could speak "natively", then it can prompt you to determine if you need assistance: "Would you like me to read the manual to you?")

When your box stops talking, how do you know if it is because it lost the connection to google? Or, a bad battery? Or, just high latency or dropped packets? Is there a way to coax it to utter something to prove it is still operational (but, that would have to be an annunciator and not spoken word)

How do you ask *it* how much battery run-time is left? Does it have to send a message to google so google can sort out how to speak that information? Similarly, when charging, how do you ask it how long until fully recharged? Maybe a series of coded beeps and bops and an outrageously good memory to sort out which is which?

OTOH, if you can *always* speak -- even in the absence of google -- then you can convey all of this information just like any other application-generated information -- in spoken utterances:

   "Battery at 46% charge. Estimate 3 hours until depleted."
   "Battery at 52% charge. Estimate charging complete in 2:10."
   "Nominal battery life exceeded. Replace soon"
   "Volume level 12 (of 20)"
   "High frequency boost +3 (of 5)"
   "Low frequency cut -2 (of 5)"
   "Voice selected is Tommy"
   "Speech rate set to 200 words per minute"
   "Received (radio) signal strength low"
   "Name resolution error: google.com"
   "Unable to acquire DHCP lease."
   "Using IP 192.168.1.105"
   "Available networks are MYHOUSE23, VERIZON19 & YourPlace"
   "Selected server busy; use foo.bar as alternate"
   "For assistance, contact Dr. Jones at x5-2323"
   "Access denied"
   etc.

But, being able to *always* speak means it has to be able to do so "cheaply" as you don't know how long it will have to rely on its own capabilities to communicate with the user!
On 12/24/2021 5:26 AM, Jan Panteltje wrote:
>
> PS most android smartphones have text to speech and support bluetooth earpieces.
> Then you could write an application and maybe even use the phone's camera to guide the blind.
> If you have a clue about image recognition.

Go get that android phone. CLOSE YOUR EYES. Now, find the application and download it. Figure out how to pair your BT earpiece to the phone (still with your eyes closed).

You suffer from the typical engineering Dunning-Kruger effect regarding misjudging your own assessment of problem domains. (surely you have considered the bootstrap issues I presented in previous post -- right?)

I guess you'll write another version that runs under iOS, too? And, ensure that the application takes over the phone completely so the (blind) user isn't stuck wondering why your app isn't talking to him, presently. Or, how to get BACK to it after completing a phone call.

And, when there's a new android version released, keep chasing it? Or, if google stops offering their (free!) service -- assuming, of course, that you don't mind google knowing everything that is SAID to you.

Spend a day blind-folded. Or, in a wheelchair. Or, with ears plugged. Then, explain how "simple" it is to address these shortcomings. And, how patient you will be with the fools who think they understand the problem space and toss out half-baked solutions.

EVERYTHING is easy if you don't actually have to *do* it! Obviously simple enough that there must be TONS of solutions out there, right? (where are they all hiding?)
> Carrying a raspi and a lipo battery around is not that hard either, 10 hours should go
> and fits in ones pocket.

How wonderful of you to decide what users should be willing to carry -- and how! This, in *addition* to the android phone?

What if their only *input* modality is a sip-n-puff? Or, a braille keypad? Or, an ASL glove? Surely the phone is designed to accommodate all of these... I just haven't found them in any stores!
> Show us something you did apart from babble.
Yeah, that's the Larkin trick. Then, when confronted with concrete evidence, skulk away in ignorance.
> Else you are - in essence - just waiting my time.
Show me you've even thought about the application -- as your comments clearly indicate you're just tilting at windmills. Wasting *my* time.

Have you ever done anything with something bigger than a PIC? In anything other than ASM? Ever designed a system other than a prebuilt box (e.g., rPi)? Surely you've written kernel drivers? Network stacks? OSs?

Show us. Something beyond trivial shell scripts.
> It is what it is.
yup. My sentiments exactly! Bye!
On 24/12/2021 11:15, Jeroen Belleman wrote:
> On 2021-12-23 15:37, David Brown wrote:
>> On 23/12/2021 12:23, Jeroen Belleman wrote:
>>> On 2021-12-23 10:39, Martin Brown wrote:
>>> [...]
>>>> Cholmondeley (Chumlee) catch out most
>>>> non-native English speakers in fact most non-locals. [...]
>>>
>>> English is well known for its complete disconnect between
>>> pronunciation and spelling, but this is ridiculous.
>>>
>>
>> It is not a "complete disconnect" - not by a long way. Despite some of
>> the common oddities of spelling in English, and some particularly
>> unusual cases, there are far worse languages. Look at verb endings in
>> French - many different spellings have different meanings, but are
>> pronounced the same. Mongolian and Gaelic have a very much bigger
>> separation between the phonetic values of the written spellings and the
>> actual pronunciation. [...]
>
> French spelling is pretty regular, in the sense that spelling
> usually unambiguously specifies the pronunciation. The reverse
> is far from true though.
That is a good way to put it.
> I should know, I live there.
>

From my school days, I found French was not bad for reading and writing (despite it seeming like all verbs are irregular), but I could never get the hang of understanding even simple spoken French.

I am fortunate that the language of my second home - Norwegian - is quite regular in both directions (although of course regional dialects vary).
On a sunny day (Fri, 24 Dec 2021 06:04:01 -0700) it happened Don Y
<blockedofcourse@foo.invalid> wrote in <sq4gga$gs4$2@dont-email.me>:

>Have you ever done anything with something bigger than a PIC? In
>anything other than ASM? Ever designed a system other than a
>prebuilt box (e.g., rPi)? Surely you've written kernel drivers?
>Network stacks? OSs?
>Show us. Something beyond trivial shell scripts.
My site is full of it, open source too.
>> It is what it is.
>
>yup. My sentiments exactly!
You are an asshole, posting here babble only, have nothing to show for, and cannot read it seems.
>Bye!
Nothing missed
Post to a babble group, not to a design group.
You have nothing.
On a sunny day (Fri, 24 Dec 2021 15:55:55 +0100) it happened David Brown
<david.brown@hesbynett.no> wrote in <sq4n1s$phk$1@dont-email.me>:

>On 24/12/2021 11:15, Jeroen Belleman wrote:
>> On 2021-12-23 15:37, David Brown wrote:
>>> On 23/12/2021 12:23, Jeroen Belleman wrote:
>>>> On 2021-12-23 10:39, Martin Brown wrote:
>>>> [...]
>>>>> Cholmondeley (Chumlee) catch out most
>>>>> non-native English speakers in fact most non-locals. [...]
>>>>
>>>> English is well known for its complete disconnect between
>>>> pronunciation and spelling, but this is ridiculous.
>>>>
>>>
>>> It is not a "complete disconnect" - not by a long way. Despite some of
>>> the common oddities of spelling in English, and some particularly
>>> unusual cases, there are far worse languages. Look at verb endings in
>>> French - many different spellings have different meanings, but are
>>> pronounced the same. Mongolian and Gaelic have a very much bigger
>>> separation between the phonetic values of the written spellings and the
>>> actual pronunciation. [...]
>>
>> French spelling is pretty regular, in the sense that spelling
>> usually unambiguously specifies the pronunciation. The reverse
>> is far from true though.
>
>That is a good way to put it.
>
>> I should know, I live there.
>>
>
>From my school days, I found French was not bad for reading and writing
>(despite it seeming like all verbs are irregular), but I could never get
>the hang of understanding even simple spoken French.
>
>I am fortunate that the language of my second home - Norwegian - is
>quite regular in both directions (although of course regional dialects
>vary).
Here in the Netherlands they started teaching French in kindergarten. Maybe that is why I have few problems with the language when in France. They did not start with German and English until highschool.
On 2021-12-24 16:24, Jan Panteltje wrote:
> Here in the Netherlands they started teaching French in kindergarten.
> Maybe that is why I have few problems with the language when in France.
> They did not start with German and English until highschool.

Oh, these memories... French started when I was 11, English at 12, German at 13. Had exams in all 4 languages.

  Papa fume une pipe.
  Maman coupe le pain.
  Le soldat sur la mur.
  etc...

But it still really helps on holydays in France :-}

Arie
On a sunny day (Fri, 24 Dec 2021 18:14:09 +0100) it happened Arie de Muijnck
<noreply@ademu.com> wrote in <61c5ffe1$0$9511$e4fe514c@usenet.xs4all.nl>:

>On 2021-12-24 16:24, Jan Panteltje wrote:
>> Here in the Netherlands they started teaching French in kindergarten.
>> Maybe that is why I have few problems with the language when in France.
>> They did not start with German and English until highschool.
>
>Oh, these memories... French started when I was 11, English at 12,
>German at 13. Had exams in all 4 languages.
>
> Papa fume une pipe.
> Maman coupe le pain.
> Le soldat sur la mur.
> etc...
>
>But it still really helps on holydays in France :-}
>
>Arie
https://www.youtube.com/watch?v=IJvI0WNihyM
Arie de Muijnck <noreply@ademu.com> wrote:

> On 2021-12-24 16:24, Jan Panteltje wrote:
>> Here in the Netherlands they started teaching French in kindergarten.
>> Maybe that is why I have few problems with the language when in France.
>> They did not start with German and English until highschool.
>
> Oh, these memories... French started when I was 11, English at 12,
> German at 13. Had exams in all 4 languages.
>
> Papa fume une pipe.
> Maman coupe le pain.
> Le soldat sur la mur.
> etc...
>
> But it still really helps on holydays in France :-}
>
> Arie
I am Canadian and married my wife while I was with NATO in Metz, France. Our first child was a boy, and by the time he was 4, he could speak fluent French and English. He knew which was which, and never got them confused. It never ceased to amaze me how quickly children pick up languages. I suppose it is an evolutionary necessity to be able to say they are hungry (they are always hungry), and to learn the other things essential to life. It is a beautiful thing to watch.