[SWLUG] Announcing apertium-cy-en pre-alpha
joregan at gmail.com
Mon Jun 30 12:29:05 UTC 2008
2008/6/30 Neil Jones <neil at nwjones.demon.co.uk>:
> On Sun, 2008-06-29 at 21:21 +0100, Jimmy O'Regan wrote:
>> 2008/6/29 Jimmy O'Regan <joregan at gmail.com>:
>> > Hi
>> > We at the Apertium project (http://www.apertium.org) have an extremely
>> > broken Welsh<->English translation in progress, that's now available
> Interesting project. It is quite a challenge to get it working I'll bet.
Well, the most complicated part was initial mutation. Our system was
originally designed for Romance languages, so any non-Romance language
poses a few challenges, but not really as many as you might think.
That said, it's early days yet :)
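To illustrate why initial mutation complicates dictionary lookup, here is a toy Python sketch of Welsh soft mutation. It is not Apertium code: the table layout and the name `soft_mutate` are assumptions made for this example, and it only handles lowercase input and one of the three mutation types.

```python
# Toy sketch of Welsh soft mutation (treiglad meddal); NOT Apertium's method.
# Radical -> mutated initial. Digraphs come first so "ll"/"rh" match whole.
SOFT_MUTATION = [
    ("ll", "l"),
    ("rh", "r"),
    ("p", "b"),
    ("t", "d"),
    ("c", "g"),
    ("b", "f"),
    ("d", "dd"),
    ("g", ""),   # initial g is dropped
    ("m", "f"),
]

# These digraphs are single letters in Welsh and do not soft-mutate.
NON_MUTATING = ("ch", "dd", "ph", "th")


def soft_mutate(word: str) -> str:
    """Return the soft-mutated form of a lowercase word (hypothetical helper)."""
    if word.startswith(NON_MUTATING):
        return word
    for radical, mutated in SOFT_MUTATION:
        if word.startswith(radical):
            return mutated + word[len(radical):]
    return word  # vowels and other initials are unchanged
```

For example, `soft_mutate("dinas")` gives `"ddinas"`, the form seen in "y ddinas". A translator has to map that surface form back to the dictionary entry "dinas", which is the part that does not come up in Romance languages.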
> Well it obviously isn't working perfectly yet but it isn't disastrous.
> The biggest problem seems to be lack of vocabulary. There is an
> infamously broken translator called intertran that is live on the web
> and that people have actually used to translate road and shop signs.
Yes, we're quite aware of that one, and were amused that, though we
have very few transfer rules (fewer than 30; our Catalan-English
translator, for example, has something like 200), we still got better
results from ours than from intertran :)
> When I tell you that at one time it was translating apostrophe N ('n),
> an essential part of many present tense Welsh sentences, as "Heartburn",
<spectie> Mae'r heddlu hefyd yn ymchwilio i honiadau ei bod hi'n
cael perthynas â dyn llawer hŷn.
<spectie> the police Are then investigating his allegations be she
getting relation with older many man.
<spectie> "He ' is being group police force also crookedly
ymchwiliad I claims you go be she ' heartburn have relation he goes
tight much hn & #375." (InterTran)
Ours is the first, though I think I've corrected the case problem
since he harvested that example :)
> you'll understand the level of the problem. It turned "cyclists
> dismount" into "inflammation of the bladder overturn" and "staff and
> pupils' entrance" into "stick and pupils thrown into a trance". Those
> are real signs that have appeared around Wales.
> Having said that human translations often are not much better. We have
> nitwits around who think you can translate with a pocket dictionary.
> You get things like shops selling "traffic jam and marmalade" and
> whisky labelled "ghosts" instead of spirits.
Oh, thank God! We often get unrealistic expectations from people who
fail to bear this in mind! Thank you for restoring my faith in
> My favourite at present has to be a set of notices in Swansea put up by
> the Police with their phone number on them. The English says "No
> parking. Tow away zone." and a hyper polite translation of the Welsh is
> "No parking. Masturbation zone". Flickr has hundreds of examples from
> all around Wales in an area called Sgymraeg.
> This web page is part of the local Welsh Language Society website and
> you can see the tow away sign there.
> A few problems I have noticed. Your translator doesn't cope with some of
> the verbs properly and there is a problem with a peculiar genitive
Yes; we're working on the verbs. For other things, we basically have
to wait until Welsh speakers come to our IRC channel and throw
questions at them (#apertium on irc.freenode.net for the masochistic).
The main problem is that the work is being done by another of our
developers, with a few minor contributions from me, and neither of us
speaks Welsh (though he at least lived in Wales, and has access to
Welsh speakers; I'm Irish, so once everything has been abstracted to
the level of types of words, the grammar is mostly the same as Irish).
We can get it working, but it would be a lot quicker if we had more
contributions from Welsh speakers.
> An example from the site above is "Mae croeso i pyrfyrts y ddinas yma".
> It should be "There is a welcome to the perverts of the city here."
> Or even "There is a welcome to the city's perverts here".
> It gives "Is welcome pyrfyrts the city here". OK it is understandable
> but I don't know how much of that understanding is because I speak
Ah. Well, one thing I should mention is this: for debugging purposes,
there's an option to 'Mark unknown words': word reordering rules fail
outright when we have out-of-vocabulary items (this is a common
problem in MT).
I think I have an idea about how to handle this (the gory details: we
already attempt subject reordering, so we can probably infer the
'there' in the case of a subjectless sentence).
> It needs to cope with "mae" meaning "there is" without compromising the
> ability to cope with periphrastic verbal constructions properly.
> The absence of an indefinite article in Welsh doesn't help either.
No, it doesn't help :) We're rule based MT, but in situations like
this, the language models of statistical MT look attractive, as they
really are the best way to correct the output. (I'm not a fan of SMT
in and of itself, but a lot of the ideas are sound). But that's
something for the 'Future Directions' section of someone's research
> The second problem in that sentence is the construction where Welsh uses
> "SomethingA the SomethingB" to represent "The SomethingA of the
> SomethingB". I have seen it referred to as the "Pobol y Cwm"
> construction after the BBC's Welsh language soap opera.
> (the) People (of) the Valley.
Yes, Irish has the same kind of construct.
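A reordering rule for this construction could look roughly like the Python sketch below. It is a toy, not an actual Apertium transfer rule: the tag names (`n`, `det`, `unk`) and the function `reorder_genitive` are assumptions, and the input is taken to be English glosses already in Welsh word order. It also shows why an out-of-vocabulary word makes the rule fail outright: the pattern simply never matches.

```python
def reorder_genitive(tokens):
    """Rewrite 'noun det noun' as 'the noun of the noun' (hypothetical rule).

    tokens: list of (english_gloss, tag) pairs in Welsh word order.
    """
    out = []
    i = 0
    while i < len(tokens):
        window = tokens[i:i + 3]
        if [tag for _, tag in window] == ["n", "det", "n"]:
            head, _, owner = (gloss for gloss, _ in window)
            out.extend(["the", head, "of", "the", owner])
            i += 3
        else:
            # No match: pass the token through unchanged.
            out.append(tokens[i][0])
            i += 1
    return " ".join(out)


# "Pobol y Cwm": both nouns are known, so the rule fires.
print(reorder_genitive([("people", "n"), ("the", "det"), ("valley", "n")]))

# "pyrfyrts y ddinas": the first word is unknown, so nothing is reordered.
print(reorder_genitive([("pyrfyrts", "unk"), ("the", "det"), ("city", "n")]))
```

The first call prints "the people of the valley"; the second prints "pyrfyrts the city", the same shape as the output reported for the "pyrfyrts" sentence.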
> Another one I have noticed is "y bydd" being translated as "the will".
> where it should be "that will".
That's because we don't have a full part of speech tagger yet. Francis
(cc'd), the main developer of cy-en, is working on that as I type
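As a minimal illustration of what tagging buys here, the Python sketch below picks a gloss for "y" from the tag of the following word. It is only a hand-written toy under assumed tag names (`verb`, `noun`) and an assumed function name `gloss_y`; it is not the tagger being developed for cy-en.

```python
def gloss_y(tokens):
    """Gloss Welsh 'y' as 'that' before a verb, otherwise as 'the'.

    tokens: list of (word, tag) pairs; other words pass through unchanged.
    """
    out = []
    for i, (word, _tag) in enumerate(tokens):
        if word == "y":
            next_tag = tokens[i + 1][1] if i + 1 < len(tokens) else None
            out.append("that" if next_tag == "verb" else "the")
        else:
            out.append(word)
    return out


print(gloss_y([("y", "ambiguous"), ("bydd", "verb")]))    # ['that', 'bydd']
print(gloss_y([("y", "ambiguous"), ("ddinas", "noun")]))  # ['the', 'ddinas']
```

Without any tag information for the following word, the only safe choice is the more common reading, which is how "y bydd" ends up as "the will".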
> I think the Welsh to English translator is best to concentrate on.
> Practically all Welsh speakers speak English apart from those in
> Patagonia. I did actually see a shepherd on Cader Idris (a mountain up
> north) on the BBC recently who didn't, he pretty obviously had a degree
> of learning difficulty as his Welsh wasn't particularly coherent either.
Well, it's pretty much the same thing for us; when we have a rule for
one direction, it's normally the case that we can do the opposite
thing in the other direction. The English-Welsh direction would need
more attention from Welsh speakers for fine tuning than the
Welsh-English does, but that's really about the only difference. (I've
had the same suggestion for Irish too :)