Clean up of translation memories?
Thread poster: Peter Berntsen (X)
Peter Berntsen (X)  Identity Verified
Sweden
Local time: 17:15
English to Swedish
Feb 5, 2015

Does anyone have any ideas on how to go about cleaning up a large TM that contains many old terms and incorrect translations? Where do you start?

 
Vadim Shirkozhukhov
Russian Federation
Local time: 19:15
English to Russian
Just Edit/Delete Specific Words Feb 5, 2015

If I had to clean my translation memory (e.g. if someone paid me to do that), I would do it by searching for specific words/phrases that I consider wrong (incorrectly translated). I would find all instances of those words/phrases in my TM and either edit or delete them manually. I would do it directly in my TM environment. Based on my experience, both SDL Trados and memoQ allow you to do that.

However, like I said, it would take someone paying me to do that. Normally, the idea of cleaning a large TM seems unproductive to me. Even if you only clean out specific words, it takes a significant amount of time. If you want to clean the whole thing, it will take forever. Depending on what you consider large, it may take months just to look through a large TM (again, I would do it in my TM environment). And the benefits of having a clean TM (or not having incorrect translations in your TM) seem so insignificant that, in my eyes, they do not justify the effort.


 
FarkasAndras  Identity Verified
Local time: 17:15
English to Hungarian
Open ended question Feb 5, 2015

It depends on what you want/need to do. If there are specific (incorrect) terms that you need to get rid of, and simply deleting the affected segments is an acceptable solution, then you may be able to do it fairly painlessly. If you want to fix poor translations of varying types, that will take a lot of work.


For scenario 1, you could use some kind of TM manager to batch-delete segments. You can also do it with the newest version of TMLookup (link in the TMLookup thread), but I would suggest using a TM editor instead. I believe Heartsome TMX Editor and Olifant are popular choices. I don't use them, so I can't give specific instructions, but someone else surely will. You will probably need to: 1) export the TM to TMX; 2) open that TMX in your TM editor of choice, make the automated or manual changes you want, and save the TMX; and 3) create a new TM in your CAT tool and import the modified TMX.
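
As a rough illustration of step 2 for the simple case (batch-deleting segments that contain a known bad term), here is a minimal Python sketch. It is not how TMLookup, Heartsome or Olifant work internally; it only assumes the export is ordinary TMX, where each `<tu>` holds one `<tuv>` per language with a `<seg>` inside. The function name `drop_units_with_terms`, the sample TM and the banned term are all made up for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this expanded name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def drop_units_with_terms(root, lang, banned_terms):
    """Remove every <tu> whose segment in `lang` contains a banned term.

    Matching is a plain case-insensitive substring test.
    Returns the number of translation units removed.
    """
    body = root.find("body")
    banned = [term.lower() for term in banned_terms]
    removed = 0
    for tu in list(body.findall("tu")):  # copy the list: we mutate body
        for tuv in tu.findall("tuv"):
            # Older TMX files may use a plain "lang" attribute instead.
            tuv_lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            if not tuv_lang.startswith(lang.lower()):
                continue
            seg = tuv.find("seg")
            # itertext() also collects text nested inside inline tags.
            text = "".join(seg.itertext()).lower() if seg is not None else ""
            if any(term in text for term in banned):
                body.remove(tu)
                removed += 1
                break
    return removed


# Tiny hypothetical TM: the second unit uses a term we want to get rid of.
SAMPLE_TMX = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en"><seg>Save the file</seg></tuv>
    <tuv xml:lang="sv"><seg>Spara filen</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Print the file</seg></tuv>
    <tuv xml:lang="sv"><seg>Skriv ut arkivet</seg></tuv></tu>
</body></tmx>"""

root = ET.fromstring(SAMPLE_TMX)
removed = drop_units_with_terms(root, "sv", ["arkivet"])
remaining = len(root.find("body").findall("tu"))
print(f"removed {removed}, kept {remaining}")
```

For a real export you would load the file with `ET.parse(path)`, run the function on `tree.getroot()`, and write the result to a new file with `tree.write(new_path, encoding="utf-8", xml_declaration=True)` so the original export stays untouched. ElementTree does not keep the DOCTYPE line, so check that your CAT tool still imports the rewritten file before relying on it.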


 
Silvio Picinini  Identity Verified
United States
Local time: 09:15
English to Portuguese
Criteria to decide cleanup Aug 18, 2017

Hi,
Farkas above has pointed you to how to do it. Okapi Olifant is great; Heartsome I don't know, but it should work. Cleaning TMs in CAT tools is usually painful, so prefer a TM editor. However, before the "how" there is the "should I do it". How do you know if you have lots of errors? I am interested in this topic if people want to share ideas.
First I would take some of the QA checks that can be done with QA tools like Verifika and Xbench, or with the QA features in CAT tools. You can find out whether you have lots of inconsistent segments, whether you are following your glossary, and a variety of other things. You may have specific regular expressions for mandatory requirements from your customer (like "our slogan should not be translated"). You can apply those to the TM and find segments that do not comply, maybe because they were created before the rule was established.
Once you find these errors, you have a number for them. Then consider whether that number is significant and decide if the cleanup is needed.
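
To make the "get a number first" idea concrete, here is a small Python sketch. It does not reproduce what Verifika, Xbench or any CAT tool actually does; it assumes you have already pulled source/target pairs out of the TM into a list, and it runs only two of the checks mentioned above: a "do not translate" rule and inconsistent targets for the same source. The function name `audit_pairs`, the slogan "Think Big" and the sample pairs are invented for illustration.

```python
import re
from collections import defaultdict


def audit_pairs(pairs, keep_verbatim):
    """Count simple QA problems in a list of (source, target) pairs.

    keep_verbatim: a phrase that must appear untranslated in the target
    whenever it appears in the source (e.g. a slogan).
    """
    pattern = re.compile(re.escape(keep_verbatim), re.IGNORECASE)

    # Rule check: source contains the phrase, target does not.
    rule_violations = [
        (src, tgt)
        for src, tgt in pairs
        if pattern.search(src) and not pattern.search(tgt)
    ]

    # Consistency check: the same source with more than one target.
    targets_by_source = defaultdict(set)
    for src, tgt in pairs:
        targets_by_source[src].add(tgt)
    inconsistent = [s for s, t in targets_by_source.items() if len(t) > 1]

    return {
        "total": len(pairs),
        "rule_violations": len(rule_violations),
        "inconsistent_sources": len(inconsistent),
    }


# Hypothetical English-to-Swedish pairs.
pairs = [
    ("Think Big today", "Think Big idag"),
    ("Think Big tomorrow", "Tank stort imorgon"),  # slogan was translated
    ("Save the file", "Spara filen"),
    ("Save the file", "Spara arkivet"),  # second target for the same source
]

report = audit_pairs(pairs, "Think Big")
print(report)
```

Dividing each count by `total` gives a rough error rate, which is the figure to weigh against the effort of a cleanup.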

I am also interested in criteria that are specific to TMs (as opposed to the checks above, which can also be applied to content that you have just translated). I wonder if purging older segments from the TM is a good idea. Also, if you (actually your end client) have TMs for obsolete products, should you recommend that they remove those segments from the TM?

I would appreciate hearing your thoughts.

Thanks


 
CafeTran Training (X)
Netherlands
Local time: 17:15
Correct your TM on the fly Aug 20, 2017

Peter Berntsen wrote:

Does anyone have any ideas on how to go about cleaning up a large TM that contains many old terms and incorrect translations? Where do you start?


As others have written here, correcting a TM can be very time-consuming.

In CafeTran you have this nice feature to make changes via Find and Replace simultaneously in the project and in the translation memories attached to that project.

Whenever I encounter a wrong term, typo etc. in my legacy TM (and in the project segments that have been populated from this legacy TM), I make sure that the cursor is placed in the incorrect word, press Cmd+F (Ctrl+F), type the correct replacement word and make sure that the correct Scope radio buttons are selected.

CafeTran will automatically adapt the case of the replacement string when the corresponding checkbox is selected.

While working in the translation project, I can also remove different translations for the same source segment (to condense the TM and possibly avoid the use of different target terms).

CafeTran also offers a full-fledged TMX editor that lets you run its QA tasks, two of which are especially useful here.

In this QA mode you can also perform a spell check or a check for the use of forbidden words/the use of correct terminology from a glossary.


 
CafeTran Training (X)
Netherlands
Local time: 17:15
Some more suggestions Aug 21, 2017

See also: http://www.proz.com/forum/cafetran_support/317815-cleaning_up_tms.html

 

