Uploads training files to customize a translation model. You can customize a model with a forced glossary or with a parallel corpus:

Use a forced glossary to force certain terms and phrases to be translated in a specific way. You can upload only a single forced glossary file for a model. The size of a forced glossary file for a custom model is limited to 10 MB.

Use a parallel corpus when you want your custom model to learn from general translation patterns in parallel sentences in your samples. What your model learns from a parallel corpus can improve translation results for input text that the model has not been trained on. You can upload multiple parallel corpora files with a request. To successfully train with parallel corpora, the corpora files must contain a cumulative total of at least 5000 parallel sentences. The cumulative size of all uploaded corpus files for a custom model is limited to 250 MB.

Depending on the type of customization and the size of the uploaded files, training time can range from minutes for a glossary to several hours for a large parallel corpus. To create a model that is customized with a parallel corpus and a forced glossary, customize the model with a parallel corpus first and then customize the resulting model with a forced glossary.

You can create a maximum of 10 custom models per language pair. For more information about customizing a translation model, including the formatting and character restrictions for data files, see Customizing your model.

You can provide your training data for customization in the following document formats:

TMX (.tmx) - Translation Memory eXchange (TMX) is an XML specification for the exchange of translation memories.
XLIFF (.xliff) - XML Localization Interchange File Format (XLIFF) is an XML specification for the exchange of translation memories.
CSV (.csv) - Comma-separated values (CSV) file with two columns for aligned sentences and phrases.
TSV (.tab) - Tab-separated values (TSV) file with two columns for aligned sentences and phrases. The first row must have two language codes. The first column is for the source language code, and the second column is for the target language code.
JSON (.json) - Custom JSON format for specifying aligned sentences and phrases.
Microsoft Excel (.xlsx) - Excel file with the first two columns for aligned sentences and phrases. The first row contains the language code.

You must encode all text data in UTF-8 format. For more information, see Supported document formats for training data.

You can indicate the format of a file by including the file extension with the file name. Use the file extensions shown in the list of supported formats above. Alternatively, you can omit the file extension and specify a content-type specification for the file, such as the Microsoft Excel type (application/…). With curl, for example, you can attach a content-type specification to indicate the format of a CSV file named glossary.
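The limits described above can be checked locally before an upload is attempted. The following is a minimal sketch in Python, not part of the service or its SDK: the function names `check_glossary` and `check_corpora` are hypothetical, "MB" is assumed to mean 2**20 bytes, and the extension set contains only the extensions named in this text.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

MB = 1024 * 1024  # assumption: "MB" in the text is treated as 2**20 bytes

# Limits quoted in the text above.
MAX_GLOSSARY_BYTES = 10 * MB
MAX_CORPORA_BYTES = 250 * MB
MIN_PARALLEL_SENTENCES = 5000

# Only the extensions that appear in this text; the service may accept others.
KNOWN_EXTENSIONS = {".tmx", ".xliff", ".csv", ".tab", ".json", ".xlsx"}


def check_glossary(name: str, size_bytes: int) -> List[str]:
    """Return a list of problems for a single forced glossary file."""
    problems = []
    if Path(name).suffix.lower() not in KNOWN_EXTENSIONS:
        # Without a known extension, a content type must be given instead.
        problems.append(f"{name}: no recognized extension, specify a content type")
    if size_bytes > MAX_GLOSSARY_BYTES:
        problems.append(f"{name}: {size_bytes} bytes exceeds the 10 MB glossary limit")
    return problems


def check_corpora(files: Iterable[Tuple[str, int]], sentence_pairs: int) -> List[str]:
    """Return a list of problems for a set of (name, size_bytes) corpus files."""
    problems = []
    total_bytes = sum(size for _, size in files)
    if total_bytes > MAX_CORPORA_BYTES:
        problems.append(f"corpora total {total_bytes} bytes exceeds the 250 MB limit")
    if sentence_pairs < MIN_PARALLEL_SENTENCES:
        problems.append(
            f"{sentence_pairs} parallel sentences is below the minimum of "
            f"{MIN_PARALLEL_SENTENCES}"
        )
    return problems


if __name__ == "__main__":
    # Hypothetical file names and sizes, for illustration only.
    print(check_glossary("glossary.csv", 2 * MB))
    print(check_corpora([("part1.tmx", 120 * MB), ("part2.tmx", 140 * MB)], 4200))
```

Both functions take sizes and counts as plain numbers rather than reading files, so the sentence count has to come from whatever tool produced the corpus. The checks only mirror the limits stated in this text and say nothing about whether the service will accept the file contents.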