tips & tricks
for translators
PDF, AutoCAD, Lotus Notes, & WordPerfect  
Various filters

ECM Engineering

ECM Engineering has filters for various file formats: Illustrator 10, Illustrator 11, InDesign, CorelDraw, Photoshop, Visio, Excel, Syscat. These are NOT free: prices range from EUR 80 to EUR 240. Users generally report that these filters work without problems.

Visit: ECM Engineering

Top - Home/Disclaimer


PDF files

Dealing with PDF files


The best way to deal with PDF files is to avoid them! :-)


Ask your client for the editable file that generated the PDF file. 


If the above is not possible:


    1. If the text is heavily formatted, you have to scan and OCR the PDF file. Reformatting the resulting file will not be straightforward. Remember: LOTS OF DTP/FORMATTING to be done.
      OCR software used by members of the group: OmniPage Pro, ABBYY FineReader. Freeware OCR: SimpleOCR

    2. If formatting is not a must, open your document in Acrobat Reader; go to View and select Continuous; go to Edit and select Select All, then select Copy. Paste the text in a Word doc; format as needed (delete paragraph marks, etc.); and import to Déjà Vu normally. 

    3. Treat the PDF file as a "hard-copy" job: type the translation into a new document.

    Most PDF converters/extractors will place text inside boxes, giving the document the same look as the original, but still very difficult to handle. That is, it is not "real formatting." Most of these tools are not free. For a list of some of them, click here.
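
    For option 2 above, text pasted from a PDF usually arrives with a hard line break at the end of every visual line. The following is a minimal sketch, not part of the original tip, of how those unwanted paragraph marks could be removed before importing the text into a CAT tool. It assumes that blank lines mark real paragraph breaks and that a trailing hyphen followed by a lowercase letter is a word split at the line end; the function name is made up for illustration.

```python
import re


def unwrap_pasted_text(text: str) -> str:
    """Join hard-wrapped lines into paragraphs.

    Assumption: blank lines separate real paragraphs; every other
    line break is an artifact of the PDF layout.
    """
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = []
    for para in paragraphs:
        lines = [line.strip() for line in para.split("\n") if line.strip()]
        if not lines:
            continue
        joined = lines[0]
        for line in lines[1:]:
            if joined.endswith("-") and line[:1].islower():
                # Re-join a word that was hyphenated at the end of a line.
                joined = joined[:-1] + line
            else:
                joined += " " + line
        cleaned.append(joined)
    return "\n\n".join(cleaned)


sample = "This is a sen-\ntence that was\nwrapped.\n\nSecond para-\ngraph."
print(unwrap_pasted_text(sample))
```

    Genuine compounds that happen to break at a line end (for example "well-" followed by "known") would lose their hyphen with this heuristic, so the result still needs a quick visual check.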



Splitting a PDF file

Steven Marzuola

To illustrate the procedure, assume that you want to break a 780-page PDF file into 3 sections, with these page numbers:
1-300, 301-600, 601-780

Click Document / Extract pages and enter the page range 1-300 (leave the box "Delete pages after Extraction" unchecked). This opens a new PDF document containing only those pages. Do a "Save As" and give this document an appropriate filename. Close it, and you're back at the original document.

Continue by extracting and saving pages 301-600, and finally 601-780.
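
For longer documents it can help to work out the page ranges before starting. The sketch below is an illustration only (the function name is hypothetical and nothing here talks to Acrobat); it simply computes the ranges to type into the Extract pages dialog.

```python
from typing import List, Tuple


def page_ranges(total_pages: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Return inclusive (first, last) page ranges covering the whole file."""
    if total_pages < 1 or chunk_size < 1:
        raise ValueError("total_pages and chunk_size must be positive")
    return [
        (start, min(start + chunk_size - 1, total_pages))
        for start in range(1, total_pages + 1, chunk_size)
    ]


for first, last in page_ranges(780, 300):
    print(f"{first}-{last}")
```

For the 780-page example this prints the same three ranges used in the procedure: 1-300, 301-600 and 601-780.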



AutoCAD files

Working with AutoCAD files


Steven Marzuola


There is a program that quickly extracts the text from AutoCAD drawings, for translation outside the AutoCAD environment. After translation, the text is merged with the original graphic elements to produce translated drawings.


The translator does not need AutoCAD, but should have at least an AutoCAD viewer, which is available free or at a low cost. AutoCAD or a compatible program is needed only to perform preprocessing and post-processing (to resolve interference between text and other elements such as margins, lines, text boxes), but this job does not have to be done by the translator. In some cases it might be best for the client to do this job.

Steven Marzuola can pre- and post-process your AutoCAD files. For information about his services, please visit:


I have found a suggestion at ProZ.

The address of the file given in that article seems to be broken. Please look for TRANS on Xanadu's website instead.



Importing Lotus Notes

Lotus Notes files (.nsf) can be exported to a tab-delimited file.

Once this is done, you can translate it normally and then import it back to Lotus Notes.
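
If only some columns of the exported file contain translatable text, the other columns (IDs, dates and so on) must come back unchanged for the re-import to work. The sketch below illustrates that idea under loud assumptions: the export is assumed to have a header row, and the column name "Subject" and the function name are invented for the example; a real Notes export may look different.

```python
import csv
import io
from typing import Callable


def replace_column(tsv_text: str, column: str,
                   translate: Callable[[str], str]) -> str:
    """Apply `translate` to one column of a tab-delimited export.

    Assumption: the first row holds column names. All other columns
    are written back exactly as they were read.
    """
    reader = csv.DictReader(io.StringIO(tsv_text, newline=""), delimiter="\t")
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = translate(row[column])
        writer.writerow(row)
    return out.getvalue()


sample = "ID\tSubject\n1\tHello\n2\tGoodbye\n"
# str.upper stands in for the real translation step.
print(replace_column(sample, "Subject", str.upper))
```

In practice the translation itself would be done in the CAT tool; the point of the sketch is only that the non-translatable columns survive the round trip.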



Importing WordPerfect Files

Paul Cowan

I've had good success processing WordPerfect files in Déjà Vu X using the following workflow:

1. Open WP file in OpenOffice.org (OOo) and save it in OOo's own format.

2. Import OOo file into Déjà Vu and translate in the usual way.

3. Export.

4. Open the exported file in OOo and resave in any of the Word formats. RTF will also work, but generates Para Style codes in the final WordPerfect file that need to be stripped out.

5. Open the .doc file in WordPerfect and adjust formatting.

This produces relatively idiomatic WP files with much less formatting garbage than is produced if you use WP's own RTF filters out and in.

In my testing, all three .doc formats offered by OOo produced very similar results; the 6.0 and 95 ones were identical. The smallest WP file was generated from the 97/2000/XP conversion. RTF produced the biggest file, even after the Para Style codes mentioned above were removed.

NOTE: If you are still using an OOo version earlier than 2.0, this requires an open-source filter (import only for now) called WriterPerfect - find it here.


1. Open documents in WordPerfect

2. Save as Word docs (6.0/95 preferably)

3. Translate

4. Open translated "Word" doc in WordPerfect and save as a WordPerfect file.

Comment by Paul Cowan:
It might be a good idea to reiterate that using the proprietary filters in steps 2 and 4 will produce bigger files with lots of garbage.


HELP files

Working with .chm files (.chm; .its; .hhp)

.chm files are compiled HTML Help files. To translate them, you need to decompile them, translate the extracted files, and recompile them.

Several tools can help you decompile and recompile these files.

More info at:
Help Logic, Wikipedia, Microsoft 1 and Microsoft 2

To download software to help you in your task:

Microsoft - Keytools (for .chm and .its)

From Suzanne Bolduc

Message 69446:

The workflow is the same, whether you use DV3 or DVX, or any other CAT.

1) decompile the CHM, using MS HTML Help Workshop.

2) look at what kind of files were extracted: help topics are in HTML format, there is most likely a TOC file with extension .hhc (could be something else*), an index file with extension .hhk, possibly a CSS file and images. Maybe other files, usually not for translation.

3) make sure your client also provided the *HHP* file for this file set -- some may bundle it inside the CHM (this way it's always current and available), but most won't because of an undesirable side-effect on the full-text search feature (the content of the HHP file would be indexed). You need the HHP file to recompile your translated file set into CHM format (using MS HTML Help Workshop). In the HHP file, you'll typically need to change the target file name (for instance, if it contains a language code), the title attribute (the title displayed in the title bar of the help viewer may contain translatable words -- note that this title will only be displayed when the help is viewed with appropriate user/system locale settings for the target language), the language attribute (for correct index sorting), and possibly the font settings (this is usually needed for CJK languages).

Normally you just copy the image files and the CSS file over to the target directory, and translate all HTML, HHC, HHK files, using DV's usual HTML import filter, which you may want to customize (using HTMHide.txt as per the indications in the DV help) to avoid importing all HTML file paths that are part of the index and TOC files.

Once everything is translated, compile the target helpset. Make sure your index is properly sorted (there's a sort tool in MS HHW). If the index contains only single-target entries, you can set it to binary (it would then be automatically sorted). If there is more than one help topic for some index entries then don't use a binary index (because the topic titles won't be listed where expected in the list of found topics -- this is a bug we have to live with since MS HHW is not maintained any more). The proper user/system locale settings must be enabled on your system when the index is to be sorted, in order to achieve the correct sort order.

* The reason why some clients will name their TOC file with a different extension is to work around another HHW bug: if the extension starts with "h", then the TOC is included in the full-text search, and this produces incorrect "untitled" hits in the Topics found search results. To avoid this, replace the ".hhc" with anything else that does not start with "h". This workaround is not an option for the HHP file, because MS HHW only recognizes ".hhp" for its HH project files.
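
The HHP changes described in step 3 are plain-text edits, so they can be made in any text editor. As an illustration only, here is a hedged sketch of doing the same thing with a small script. It assumes the project file is INI-like with an [OPTIONS] section and keys such as "Compiled file", "Title" and "Language"; check these names and values against your own .hhp file, since the sample content and the function name below are invented for the example.

```python
from typing import Dict

# Invented sample; a real .hhp file will contain more options.
SAMPLE_HHP = """[OPTIONS]
Compiled file=manual_en.chm
Contents file=toc.hhc
Index file=index.hhk
Language=0x409 English (United States)
Title=User Guide

[FILES]
intro.htm
"""


def update_hhp_options(hhp_text: str, updates: Dict[str, str]) -> str:
    """Replace values of existing keys in [OPTIONS]; leave all else alone."""
    out = []
    in_options = False
    for line in hhp_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            in_options = stripped.upper() == "[OPTIONS]"
        elif in_options and "=" in line:
            key = line.split("=", 1)[0].strip()
            for wanted, value in updates.items():
                if key.lower() == wanted.lower():
                    line = f"{key}={value}"
                    break
        out.append(line)
    return "\n".join(out) + "\n"


print(update_hhp_options(SAMPLE_HHP, {
    "Compiled file": "manual_fr.chm",          # target file name
    "Title": "Guide de l'utilisateur",         # translated title
    "Language": "0x40c French (France)",       # illustrative value
}))
```

When working on real files, read and write the .hhp in the encoding it was delivered in, and let MS HTML Help Workshop be the final judge by recompiling the project.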

Message 69448:

I think that translators should always compile their translated HTML Help files into the final CHM format themselves. This is the only way one can make sure the index is properly sorted and *tweaked*. I don't want series of redundant entries to remain in my translated index.

This often happens when the same notion is worded in different ways so that the users will have more chances to find what they're looking for in the index. When such 'synonym' entries are translated they may end up starting with the same word, or with the same letter, thus appearing consecutively in the translated index. When this happens I always manually remove the redundant entries from the index file -- they don't help the user; they only create unnecessary noise.

I always review the translated index and postedit it (partly in a text editor, partly in MS Help Workshop).

Another MS HHW bug I forgot to mention in my previous message (to Herbert) is that its manual index sort feature (the A/Z button) does not work well when you have sets of second-level entries in the index that belong to different first-level entries starting with the same word. You then need to move the misplaced entries to their proper location (under their parent first-level entry) manually. For instance, instead of:

meeting room
sorting with the A/Z button produces:
meeting room

Using the binary index options (set in the HHP file) produces a properly sorted index, but it has another bug (see my previous message).

---- ...and an additional note from Suzanne:

BTW, another very powerful set of tools to work with the MS HTML Help format (and also with the MS Help 2 format for .NET applications) is Robert Chandler's FAR HTML (shareware).

FAR is the only tool I know of, from which you can *print* the TOC or index of a compiled help set.

The FAR website provides a lot of information on these help formats, and more (including on the Longhorn/Vista help format).

The author himself appears to be very responsive, including on the user group:




Working with InDesign files

DVX works with Adobe InDesign 2 files. Files should be saved as "tagged text" in InDesign. For more info on how to work with InDesign files directly in DVX (that is, without using StoryCollector or another filter), see page 339 of the Workgroup Manual. If you need to go through Trados, which is often a client requirement, see below...

Rasmus Carlsson / Tim Wright / Guy Penet

InDesign through Trados' StoryCollector

The StoryCollector that comes with Trados FL 6.5.5 works like a charm with InDesign CS. It exports ISC files, which need to be converted to TTX in Trados TagEditor. The TTX files import fine in DVX, and then you just need to reverse the process.

The plug-in files should be installed in a folder called Trados in the InDesign Plug-in folder. When you start InDesign a new menu option will appear on the menubar called Trados. This contains the options for importing and exporting.

All this information and how to proceed with the import/export routines can be found in the file StoryCollectorIND1033.hlp. Double click on the file and all your frustrations will be relieved.

InDesign CS2 is not compatible with Trados 6.5. Only version 2.0 of InDesign works with Trados 6.5. You will have to upgrade to Trados 7.0 for the CS version of InDesign.

Copied from the Help file of the InDesign plug-in section of Trados 7.0:

TRADOS Story Collector is an InDesign Plug-in. The Plug-in is supported by InDesign 2.0 and InDesign CS. It allows you to gather up all the stories in an InDesign document so that they can be presented, in context, within one file for translation. Note that InDesign CS2 is not currently supported by the Story Collector.
Taken from: Story Collector for InDesign Help. Copyright (c) June 2005 TRADOS Inc.

InDesign and DVX

Björn Olofsson

Björn developed a whole procedure to work with InDesign files. Please download Translating_InDesign_documents_with_DVX.pdf.

Other filters




FLASH files

Working with Flash files

Simon Midoux

There is a free utility called Tramigo at


XLIFF files

Info about XLIFF files

Nico van de Water

For info about this standard, visit the OASIS page and read Nico's presentation.

To download XLIFF's DTD, click here.

(DV's XML filter is not appropriate to translate XLIFF files as they are bilingual files.)
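
To see why a monolingual XML filter is a poor fit, it helps to look at how an XLIFF file pairs each source segment with its target. The sketch below is an illustration, not something from the original tip: the sample document is invented and uses the XLIFF 1.2 namespace as an assumption (older XLIFF 1.0 files based on the DTD carry no namespace), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: XLIFF 1.2 namespace; adjust for other versions.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

SAMPLE = """<?xml version="1.0"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="hello.txt" source-language="en"
        target-language="fr" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Hello</source>
        <target>Bonjour</target>
      </trans-unit>
      <trans-unit id="2">
        <source>Goodbye</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def list_units(xliff_text):
    """Return (id, source text, target text or None) for each trans-unit."""
    root = ET.fromstring(xliff_text)
    units = []
    for tu in root.iterfind(".//x:trans-unit", XLIFF_NS):
        source = tu.find("x:source", XLIFF_NS)
        target = tu.find("x:target", XLIFF_NS)
        units.append((
            tu.get("id"),
            "".join(source.itertext()) if source is not None else "",
            "".join(target.itertext()) if target is not None else None,
        ))
    return units


for unit in list_units(SAMPLE):
    print(unit)
```

Each unit carries both languages side by side: the source must stay untouched and only the target is filled in, which a filter that treats every element as translatable text cannot know.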


Publisher (.pub files)

No easy way back

If you have Publisher, it is possible to export files to RTF format (WordArt will not be exported). Once in RTF, the text is pretty easy to handle in any CAT tool. The problem is that there is no way back: the translation has to be laid out again (DTP) in the original file, using Publisher.
See better method below.

Publisher files have the extension .pub.

For more options, see How to share a Publisher file with a user who does not have Publisher installed.


Translating Publisher files through html

Selcuk Akyuz

The best way to translate .pub files is to save them as Web Pages, translate them in any CAT tool, export the translation back to HTML format, and then open the HTML file in Publisher and save it as a .pub file. The only problem is with WordArt, which should be edited in Publisher afterwards.

Compared to the method above (exporting to RTF format and doing DTP work afterwards), this method is better. There is no need for DTP work afterwards (except for WordArt).



Catalyst Files

Gudmund Areskoug

You'd at least need to export stuff from Catalyst, e.g. in its glossary format, then use that for mid-processing in Catalyst (thanks Suzanne for off-list input!).

I don't know yet which version of Catalyst is needed for it, the free one, or one of these versions:
Translator/Pro Edition: €999
Localizer Edition: €3999
Developer/Pro Edition: €6499

What I do know, is that if you have only the free version, you need the client to send you files made with a developer/pro edition in a processable format.

Other than that: try and get behind what source files the client actually wants translated. Chances are, you can do them all in DV, with or without messing about.

...and from a different message sent by Gudmund...

Some hasty reflections:

- If you get the freelance version (be aware that there are two different versions), you will only be able to process a certain kind of ttk files - the client has to have the right version for sending ttk's that can be processed in one of the free versions.

- If the client knows her/his way around (not always the case, it would seem), they should be able to export the stuff as TMX, which you can process in DV. Do take care to export with the right charset, and that the charset indicated in the exported file is the one it's really in. Report back if you choose that way, there's a free TMX validator I can point you to.

- Depending on how sensible (IMHO) a workflow they've opted for, they may or may not have chosen to block all manner of non-translatable strings...

- Worst case, compatibility-wise, Catalyst can handle plain text, tab delimited 2-column files (bilingual) for the Catalyst variety of pretranslation ("leverage").

- Be aware that there may be problems regarding string length etc. that will only show up in Catalyst if DV is used.

- The modern versions (not sure which editions) can interact/integrate with Trados. They also handle XLIFF files. Be aware that the exact meaning of the word "supports" has to be eked out from edition to edition.

I don't know what the XLIFF support "Visual XLIFF 1.0" implies here, but since DV doesn't handle XLIFF (yet, at least), it would mean another (potentially lossy) round trip, e.g. to the po format. If XLIFF and/or XLIFF conversion is lossy in any one step, I doubt there's anything to be gained that way. If it isn't lossy, you might benefit from having access to string-specific comments and advice via F6 in DV(X).
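
For the worst-case route mentioned above (a plain-text, tab-delimited, two-column bilingual file), the file itself is simple to produce. The following is only a sketch under stated assumptions: one source/target pair per line, no header row, and no tabs or line breaks inside a segment. The exact layout and encoding Catalyst expects are not given in the messages above, so verify them against the Catalyst documentation; the function name is made up.

```python
from typing import Iterable, Tuple


def bilingual_lines(pairs: Iterable[Tuple[str, str]]) -> str:
    """Build a two-column, tab-delimited text from (source, target) pairs."""
    lines = []
    for source, target in pairs:
        for text in (source, target):
            if "\t" in text or "\n" in text or "\r" in text:
                # A tab or line break inside a segment would break the
                # two-column layout, so refuse it instead of guessing.
                raise ValueError(f"segment contains tab or line break: {text!r}")
        lines.append(f"{source}\t{target}\n")
    return "".join(lines)


pairs = [("File", "Fichier"), ("Save As...", "Enregistrer sous...")]
print(bilingual_lines(pairs), end="")
```

Rejecting segments that contain tabs or line breaks is a deliberate choice here: silently rewriting them would hide exactly the kind of string problems that only show up later in Catalyst.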

Will answer any additional questions as best I can if they pop up.

I got a lot of invaluable and friendly input from Suzanne Bolduc back then, thanks Suzanne! :)


If I understood Gudmund correctly, one should export translatable text in Catalyst's glossary format, translate it, and then import it back. It seems the export/import should be done with a paid version of the program.

Catalyst Website:
