Ideosphere Forum
Date: Thu, 12 Oct 2017 00:35:50 +0100 (BST)
From: "chrisran.bma e-mail" <>
To: fx-discuss <>
Subject: Re: fx-discuss: FX Claim Tran - Machine Translation by 2015

Possibly relevant to the claim "Tran - Machine Translation by 2015":

Findings of the 2017 Conference on Machine Translation (WMT17)

(September 7-8, 2017)


Findings of the 2016 Conference on Machine Translation (WMT16)

(11-12 August 2016)

Perhaps the following looks like what we want:


5.5.2 Human evaluation results

Table 35 includes DA results for English-German and Table 36 shows results for
German-English APE systems. Clusters are identified by grouping systems together
according to which systems significantly outperform all others in lower-ranking
clusters, according to a Wilcoxon rank-sum test.

  #   Ave %    Ave z   System

  −   84.8    0.520   HUMAN POST EDIT

  1   78.2    0.261   AMU
      77.9    0.261   FBK
      76.8    0.221   DCU

  4   73.8    0.115   JXNU

  5   71.9    0.038   USAAR
      71.1    0.014   CUNI
      70.2   −0.020   LIG

  −   68.6   −0.083   NO POST EDIT

Table 35: EN-DE DA human evaluation results showing average raw DA scores (Ave
%) and average standardized scores (Ave z); lines between systems indicate
clusters according to a Wilcoxon rank-sum test at p ≤ 0.05.
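For anyone curious how that clustering criterion works in practice, here is a
rough sketch — not the WMT implementation, and the per-segment scores are made
up — of a two-sided Wilcoxon rank-sum test via the normal approximation, using
only the standard library:

```python
import math

def rank_sum_p(a, b):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns the p-value for the null hypothesis that samples a and b
    come from the same distribution.
    """
    # Pool the samples, remembering which came from a (index < len(a)).
    combined = sorted((v, i) for i, v in enumerate(a + b))
    vals = [v for v, _ in combined]
    # Assign ranks, averaging over ties.
    ranks = [0.0] * len(vals)
    i = 0
    while i < len(vals):
        j = i
        while j + 1 < len(vals) and vals[j + 1] == vals[i]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    n1, n2 = len(a), len(b)
    # Sum of ranks belonging to sample a.
    w = sum(r for r, (_, idx) in zip(ranks, combined) if idx < n1)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    # Two-sided p-value from the standard normal CDF.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Hypothetical per-segment DA scores for two systems:
sys_a = [78, 80, 79, 81, 77, 82, 78, 80, 79, 81]
sys_b = [70, 71, 69, 72, 70, 68, 71, 69, 70, 72]
p = rank_sum_p(sys_a, sys_b)
print(f"p = {p:.4f}")  # p well below 0.05: A would sit in a higher cluster
```

Two systems land in the same cluster when no such pairwise test separates
them at p ≤ 0.05; a cluster boundary is drawn where one system significantly
outperforms everything below it.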


This seems to indicate that human translation is better than machine
translation, but of course that doesn't guarantee there isn't a better
translation program somewhere from before 31 Dec 2015 that simply wasn't
entered in the conference.

Still, if human-level translation had existed in 2015, you would not expect to
read things like:


This steady improvement has been mainly driven by the massive migration to the
neural approach, which in 2016 allowed the winning system to achieve impressive


I don't believe there is a program that can justifiably claim "equal or better
average quality, as professional human translations", but proving a negative is
difficult. I suggest that if there were such a program it would be big news, not
difficult to find, and conference findings would be markedly different to those
linked above.

I'm not sure how much more a judge might want before deciding how to judge the
claim. Are there any more authoritative events, or other evidence, before the
claim deadline of 31 Dec 2017? (Note the program has to exist by 31 Dec 2015
and translations have to 'be of comparable cost and turnaround time'.)



(crandles 7886)

Disclosure: I hold -3603 in this claim.

