[SWLUG] Announcing apertium-cy-en pre-alpha

Neil Jones neil at nwjones.demon.co.uk
Sun Jul 6 20:00:03 UTC 2008

On Mon, 2008-06-30 at 13:29 +0100, Jimmy O'Regan wrote:
> 2008/6/30 Neil Jones <neil at nwjones.demon.co.uk>:
> >
> > On Sun, 2008-06-29 at 21:21 +0100, Jimmy O'Regan wrote:
> >> 2008/6/29 Jimmy O'Regan <joregan at gmail.com>:
> >> > Hi
> >> >
> >> > We at the Apertium project (http://www.apertium.org) have an extremely
> >> > broken Welsh<->English translation in progress, that's now available
> >>
> > Interesting project. It is quite a challenge to get it working, I'll bet.
> >
> Well, the most complicated part was initial mutation: our system was
> designed originally for Romance languages, so there are a few
> challenges involved for any non-Romance language, but not really as
> many as you might think. That said, it's early days yet :)
> > Well it obviously isn't working perfectly yet but it isn't disastrous.
> > The biggest problem seems to be lack of vocabulary. There is an
> > infamously broken translator called intertran that is live on the web
> > and that people have actually used to translate road and shop signs.
> Yes, we're quite aware of that one, and were amused that, though we
> have very few transfer rules (fewer than 30; our Catalan-English
> translator, for example, has something like 200) we still got better
> results from ours than from intertran :)

> > When I tell you that at one time it was translating apostrophe N ('n),
> > an essential part of many present tense Welsh sentences as "Heartburn",
> <spectie>     Mae'r heddlu hefyd yn ymchwilio i honiadau ei bod hi'n
> cael perthynas â dyn llawer hŷn.
> <spectie>     the police Are then investigating his allegations be she
> getting relation with older many man.
> <spectie>     "He ' is being group police force also crookedly
> ymchwiliad I claims you go be she ' heartburn have relation he goes
> tight much hn & #375." (InterTran)
> Ours is the first, though I think I've corrected the case problem
> since he harvested that example :)

InterTran produces results rather like the famous Hungarian Phrasebook
sketch that the Monty Python team did.

I have just come across an even worse problem: a commercial translation
program that does Chinese to English and keeps putting the F-word into
English texts that appear on signs!


> > you'll understand the level of the problem. It turned "cyclists
> > dismount" into "inflammation of the bladder overturn" and "staff and
> > pupils' entrance" into "stick and pupils thrown into a trance". Those
> > are real signs that have appeared around Wales.
> >
> > Having said that, human translations are often not much better. We have
> > nitwits around who think you can translate with a pocket dictionary.
> > You get things like shops selling "traffic jam and marmalade" and
> > whisky labelled "ghosts" instead of spirits.
> >
> Oh, thank God! We often get unrealistic expectations from people who
> fail to bear this in mind! Thank you for restoring my faith in
> humanity :D
> > My favourite at present has to be a set of notices in Swansea put up by
> > the Police with their phone number on them. The English says "No
> > parking. Tow away zone." and a hyper-polite translation of the Welsh is
> > "No parking. Masturbation zone". Flickr has hundreds of examples from
> > all around Wales in an area called Sgymraeg.
> >
> > This web page is part of the local Welsh Language Society website and
> > you can see the tow away sign there.
> >
> > http://www.tyrfe.com/Tafod/enghreifftiau.htm

You can see someone blogging about it in English here

Sorry about the bad language in the link.

> >
> >
> > A few problems I have noticed. Your translator doesn't cope with some of
> > the verbs properly and there is a problem with a peculiar genitive
> > construction.
> >
> Yes; we're working on the verbs; other things, we have to basically
> wait until Welsh speakers come to our IRC channel and throw questions
> at them (#apertium on irc.freenode.net for the masochistic). The main
> problem is that the work is being done by another of our developers,
> with a few minor contributions from me, and neither of us speaks Welsh
> (though he at least lived in Wales, and has access to Welsh speakers;
> I'm Irish, so once everything has been abstracted to the level of
> types of words, the grammar is mostly the same as Irish). We can get
> it working, but it would be a lot quicker if we had more contributions
> from Welsh speakers.
I'd be happy to help, but I don't fancy hanging about on IRC all day.

> > An example from the site above is "Mae croeso i pyrfyrts y ddinas yma".
> > It should be "There is a welcome to the perverts of the city here."
> > Or even "There is a welcome to the city's perverts here".
> >
> > It gives "Is welcome pyrfyrts the city here". OK it is understandable
> > but I don't know how much of that understanding is because I speak
> > Welsh.
> >
> Ah. Well, one thing I should mention is this: for debugging purposes,
> there's an option to 'Mark unknown words'. Word reordering rules fail
> outright when we have out-of-vocabulary items (this is a common
> problem in MT).
> I think I have an idea about how to handle this (the gory details: we
> already attempt subject reordering, so we can probably infer the
> 'there' in the case of a subjectless sentence).
> > It needs to cope with "mae" meaning "there is" without compromising the
> > ability to cope with periphrastic verbal constructions properly.
> >
> > The absence of an indefinite article in Welsh doesn't help either.
> >
> No, it doesn't help :) We're rule-based MT, but in situations like
> this, the language models of statistical MT look attractive, as they
> really are the best way to correct the output. (I'm not a fan of SMT
> in and of itself, but a lot of the ideas are sound). But that's
> something for the 'Future Directions' section of someone's research
> paper :)
> > The second problem in that sentence is the construction where Welsh uses
> > "SomethingA the SomethingB" to represent "The SomethingA of the
> > SomethingB". I have seen it referred to as the "Pobol y Cwm"
> > construction after the BBC's Welsh language soap opera.
> > (the) People (of) the Valley.
> >
> Yes, Irish has the same kind of construct.
> > Another one I have noticed is "y bydd" being translated as "the will",
> > where it should be "that will".
> >
> That's because we don't have a full part of speech tagger yet. Francis
> (cc'd), the main developer of cy-en, is working on that as I type
> this.

> > I think the Welsh-to-English translator is the best one to concentrate on.
> > Practically all Welsh speakers speak English, apart from those in
> > Patagonia. I did actually see a shepherd on Cader Idris (a mountain up
> > north) on the BBC recently who didn't; he pretty obviously had a degree
> > of learning difficulty, as his Welsh wasn't particularly coherent either.
> Well, it's pretty much the same thing for us; when we have a rule for
> one direction, it's normally the case that we can do the opposite
> thing in the other direction. The English-Welsh direction would need
> more attention from Welsh speakers for fine tuning than the
> Welsh-English does, but that's really about the only difference. (I've
> had the same suggestion for Irish too :)

> _______________________________________________
> SWLUG Discussion List - Discuss at swlug.org
> http://swlug.org/mailman/listinfo/discuss
