

Winston Churchill

To improve is to change; so to be perfect is to have changed often.

Every influence, every motive, that provokes the spirit of murder among men, impels these mountaineers to deeds of treachery and violence. The strong aboriginal propensity to kill, inherent in all human beings, has in these valleys been preserved in unexampled strength and vigour. That religion, which above all others was founded and propagated by the sword, the tenets and principles of which are instinct with incentives to slaughter, and which in three continents has produced fighting breeds of men, stimulates a wild and merciless fanaticism. The love of plunder, always a characteristic of hill tribes, is fostered by the spectacle of the opulence and luxury which, to their eyes, the cities and plains of the south display. A code of honour not less punctilious than that of old Spain is supported by vendettas as implacable as those of Corsica.
The Story of the Malakand Field Force: An Episode of Frontier War (1898), Chapter I. A description of the tribal areas of what is today Pakistan, commonly referred to as Waziristan. E-book download version(s) of this book can be found online at Project Gutenberg.

It is, thank heaven, difficult if not impossible for the modern European to fully appreciate the force which fanaticism exercises among an ignorant, warlike and Oriental population. Several generations have elapsed since the nations of the West have drawn the sword in religious controversy, and the evil memories of the gloomy past have soon faded in the strong, clear light of Rationalism and human sympathy. Indeed it is evident that Christianity, however degraded and distorted by cruelty and intolerance, must always exert a modifying influence on men's passions, and protect them from the more violent forms of fanatical fever, as we are protected from smallpox by vaccination. But the Mahommedan religion increases, instead of lessening, the fury of intolerance. It was originally propagated by the sword, and ever since, its votaries have been subject, above the people of all other creeds, to this form of madness. In a moment the fruits of patient toil, the prospects of material prosperity, the fear of death itself, are flung aside. The more emotional Pathans are powerless to resist. All rational considerations are forgotten. Seizing their weapons, they become Ghazis, as dangerous and as sensible as mad dogs: fit only to be treated as such. While the more generous spirits among the tribesmen become convulsed in an ecstasy of religious bloodthirstiness, poorer and more material souls derive additional impulses from the influence of others, the hopes of plunder and the joy of fighting. Thus whole nations are roused to arms. Thus the Turks repel their enemies, the Arabs of the Soudan break the British squares, and the rising on the Indian frontier spreads far and wide. In each case civilisation is confronted with militant Mahommedanism. The forces of progress clash with those of reaction. The religion of blood and war is face to face with that of peace. Fortunately the religion of peace is usually the better armed.
The Story of the Malakand Field Force: An Episode of Frontier War (1898), Chapter III.

I pass with relief from the tossing sea of Cause and Theory to the firm ground of Result and Fact.
The Story of the Malakand Field Force: An Episode of Frontier War (1898), Chapter III.

It is better to be making the news than taking it; to be an actor rather than a critic.
The Story of the Malakand Field Force: An Episode of Frontier War (1898), Chapter VIII.

Nothing in life is so exhilarating as to be shot at without result.
The Story of the Malakand Field Force: An Episode of Frontier War (1898), Chapter X.
How dreadful are the curses which Mohammedanism lays on its votaries! Besides the fanatical frenzy, which is as dangerous in a man as hydrophobia in a dog, there is this fearful fatalistic apathy. The effects are apparent in many countries. Improvident habits, slovenly systems of agriculture, sluggish methods of commerce, and insecurity of property exist wherever the followers of the Prophet rule or live. A degraded sensualism deprives this life of its grace and refinement; the next of its dignity and sanctity. The fact that in Mohammedan law every woman must belong to some man as his absolute property, either as a child, a wife, or a concubine, must delay the final extinction of slavery until the faith of Islam has ceased to be a great power among men. Individual Moslems may show splendid qualities. Thousands become the brave and loyal soldiers of the Queen: all know how to die: but the influence of the religion paralyses the social development of those who follow it. No stronger retrograde force exists in the world. Far from being moribund, Mohammedanism is a militant and proselytizing faith. It has already spread throughout Central Africa, raising fearless warriors at every step; and were it not that Christianity is sheltered in the strong arms of science, the science against which it had vainly struggled, the civilisation of modern Europe might fall, as fell the civilisation of ancient Rome.
The River War: An Historical Account of the Reconquest of the Soudan (1899), Volume II, pp. 248–250. (This passage does not appear in the one-volume abridgment of 1902, the version published by Project Gutenberg.) E-book download version(s) of this book can be found online at Project Gutenberg.

It is the habit of the boa constrictor to besmear the body of his victim with a foul slime before he devours it; and there are many people in England, and perhaps elsewhere, who seem unable to contemplate military operations for clear political objects unless they can persuade themselves into the belief that their enemy is utterly and hopelessly vile. To this end the Dervishes, from the Mahdi and the Khalifa downwards, have been loaded with every variety of abuse and charged with all conceivable crimes. This may be very comforting to philanthropic persons at home; but when an army in the field becomes imbued with the idea that the enemy are vermin who cumber the earth, instances of barbarity may easily be the outcome. This unmeasured condemnation is, moreover, as unjust as it is dangerous and unnecessary.
The River War: An Historical Account of the Reconquest of the Soudan (1899), Volume II, pp. 394–395. (This passage does not appear in the one-volume abridgment of 1902, the version published by Project Gutenberg.)

What is the true and original root of Dutch aversion to British rule? It is the abiding fear and hatred of the movement that seeks to place the native on a level with the white man: the Kaffir is to be declared the brother of the European, to be constituted his legal equal, to be armed with political rights.
On the Boer War. London to Ladysmith via Pretoria (1900).

I think we shall have to take the Chinese in hand and regulate them. I believe that as civilized nations become more powerful they will get more ruthless, and the time will come when the world will impatiently bear the existence of great barbaric nations who may at any moment arm themselves and menace civilized nations. I believe in the ultimate partition of China, I mean ultimate. I hope we shall not have to do it in our day. The Aryan stock is bound to triumph.
Speech and interview at the University of Michigan, 1902.
In former days, when wars arose from individual causes, from the policy of a Minister or the passion of a King, when they were fought by small regular armies of professional soldiers, and when their course was retarded by the difficulties of communication and supply, and often suspended by the winter season, it was possible to limit the liabilities of the combatants. But now, when mighty populations are impelled on each other, each individual severally embittered and inflamed, when the resources of science and civilisation sweep away everything that might mitigate their fury, a European war can only end in the ruin of the vanquished and the scarcely less fatal commercial dislocation and exhaustion of the conquerors. Democracy is more vindictive than Cabinets. The wars of peoples will be more terrible than those of kings.
House of Commons, 13 May 1901, Hansard vol. 93 col. 1572.

The ability to foretell what is going to happen tomorrow, next week, next month and next year; and to have the ability afterwards to explain why it did not happen.
Newspaper interview (1902), when asked what qualities a politician required. Halle, Kay, Irrepressible Churchill. Cleveland: World, 1966. Quoted in Churchill by Himself (2008), ed. Langworth, PublicAffairs, p. 489. ISBN 1586486381

Governments create nothing and have nothing to give but what they have first taken away: you may put money in the pockets of one set of Englishmen, but it will be money taken from the pockets of another set of Englishmen, and the greater part will be spilled on the way. Every vote given for Protection is a vote to give Governments the right of robbing Peter to pay Paul, and charging the public a handsome commission on the job.
"Why I am a Free Trader", Chapter I, in T. W. Stead's journal Coming Men on Coming Questions (13 April 1905), p. 9.
The doctrines that by keeping out foreign goods more wealth, and consequently more employment, will be created at home, are either true or they are not true. We contend that they are not true. We contend that for a nation to try to tax itself into prosperity is like a man standing in a bucket and trying to lift himself up by the handle.
"Why I am a Free Trader" (1905). Churchill revised this several times; the first recorded version comes from a speech on Free Trade at the Free Trade Hall, Manchester, 19 February 1904: "It is the theory of the Protectionist that imports are an evil. He thinks that if you shut out foreign imported manufactured goods you will make those goods yourselves, in addition to the goods you make now, including the goods we make to exchange for the foreign goods that come in. If a man can believe that, he can believe anything. (Laughter.) We Free Traders say it is not true. To think you can make a man richer by putting on a tax is like a man thinking that he can stand in a bucket and lift himself up by the handle. (Laughter and cheers.)"

Politics are almost as exciting as war, and quite as dangerous. In war you can only be killed once. But in politics, many times.
From a conversational exchange with Harold Begbie, quoted in Master Workers, Begbie, Methuen & Co. (1906), p. 177.

For my part, I have always felt that a politician is to be judged by the animosities he excites among his opponents. I have always set myself not merely to relish but to deserve their censure.
17 November 1906, Institute of Journalists Dinner, London. In Churchill by Himself (2008), ed. Langworth, PublicAffairs, p. 392. ISBN 1586486381

The conditions of the Transvaal ordinance under which Chinese labour is now being carried on do not, in my opinion, constitute a state of slavery. A labour contract into which men enter voluntarily for a limited and for a brief period, under which they are paid wages which they consider adequate, under which they are not bought or sold, and from which they can obtain relief on payment of seventeen pounds ten shillings, the cost of their passage, may not be a healthy or proper contract, but it cannot, in the opinion of His Majesty's Government, be classified as slavery in the extreme acceptance of the word without some risk of terminological inexactitude.
In the House of Commons, 22 February 1906, King's Speech (Motion for an Address), as Under-Secretary at the Colonial Office, repeating what he had said during the 1906 election campaign. This is the original context for "terminological inexactitude", used simply literally, whereas the term later took on the sense of a euphemism or circumlocution for a lie. As quoted in Sayings of the Century (1984) by Nigel Rees.

I submit respectfully to the House as a general principle that our responsibility in this matter is directly proportionate to our power. Where there is great power there is great responsibility; where there is less power there is less responsibility; and where there is no power there can, I think, be no responsibility.
In the House of Commons, 28 February 1906, The Native Races of South Africa.

The Times is speechless, and takes three columns to express its speechlessness.
Speech at Kinnaird Hall, Dundee, Scotland ("The Dundee Election"), 14 May 1908. In Liberalism and the Social Problem (1909), Churchill, BiblioBazaar (second edition, 2006), p. 148. ISBN 1426451989

What is the use of living, if it be not to strive for noble causes and to make this muddled world a better place for those who will live in it after we are gone? How else can we put ourselves in harmonious relation with the great verities and consolations of the infinite and the eternal? And I avow my faith that we are marching towards better days. Humanity will not be cast down. We are going on, swinging bravely forward along the grand high road, and already behind the distant mountains is the promise of the sun.
Speech at Kinnaird Hall, Dundee, Scotland ("Unemployment"), 10 October 1908. In Liberalism and the Social Problem (1909), Churchill, Echo Library (2007), p. 87. ISBN 1406845817

The unnatural and increasingly rapid growth of the feeble-minded and insane classes, coupled as it is with a steady restriction among all the thrifty, energetic and superior stocks, constitutes a national and race danger which it is impossible to exaggerate. I feel that the source from which the stream of madness is fed should be cut off and sealed up before another year has passed.
(Home Secretary) Churchill to Prime Minister Asquith on the compulsory sterilization of the feeble-minded and insane, quoted as follows (excerpted from a longer note): "It is worth noting that eugenics was not a fringe movement of obscure scientists, but was often led and supported, in Britain and America, by some of the most prominent public figures of the day, across the political divide, such as Julian Huxley, Aldous Huxley, D. H. Lawrence, John Maynard Keynes and Theodore Roosevelt. Indeed... none other than Winston Churchill. While Home Secretary in 1910, he made the following remark: [text of quotation] (quoted in Jones, 1994: 9)." In Race, Sport and British Society (2001), Carrington & McDonald, Routledge, Introduction, note 4, p. 20. ISBN 0415246296

I propose that 100,000 degenerate Britons should be forcibly sterilized and others put in labour camps to halt the decline of the British race.
As Home Secretary, in a 1910 departmental paper.
The original document is in the collection of Asquith's papers at the Bodleian Library, Oxford. Also quoted in Clive Ponting, Churchill (Sinclair-Stevenson, 1994).

Everything is tending towards catastrophe and collapse. I am interested, geared up and happy. Is it not horrible to be built like that?
In a letter to his wife Clemmie, during the build-up to the First World War.

Like chasing a quinine pill around a cow pasture.
On playing golf. As quoted in The Quote Verifier: Who Said What, Where, and When (2006), Keyes, Macmillan, p. 27. ISBN 0312340044

Sure I am of this, that you have only to endure to conquer. You have only to persevere to save yourselves, and to save all those who rely upon you. You have only to go right on, and at the end of the road, be it short or long, victory and honour will be found.
Remarks at the Guildhall, 4 September 1914, after the first British naval victory of the First World War, the sinking of three German cruisers at the Battle of Heligoland Bight. As quoted in Churchill: A Life, Martin Gilbert, Macmillan (1992), p. 279. ISBN 0805023968

I am finished.
On losing his position at the Admiralty in 1915; said to Lord Riddell. As quoted in Maxims and Reflections, Chapter I (On Himself), Churchill, Houghton Mifflin Company (1947).

The truth is incontrovertible. Panic may resent it, ignorance may deride it, malice may distort it, but there it is.
Speech in the House of Commons, 17 May 1916.

I think a curse should rest on me, because I love this war. I know it's smashing and shattering the lives of thousands every moment, and yet I can't help it: I enjoy every second of it.
A letter to a friend (1916).

No compromise on the main purpose; no peace till victory; no pact with unrepentant wrong. That is the Declaration of July 4th, 1918.
At a joint Anglo-American rally at Westminster, 4 July 1918, speaking against calls for a negotiated truce with Germany. As printed in War Aims and Peace Ideals: Selections in Prose & Verse (1919), edited by Tucker Brooke & Henry Seidel Canby, Yale University Press, p. 138.

The Great War differed from all ancient wars in the immense power of the combatants and their fearful agencies of destruction, and from all modern wars in the utter ruthlessness with which it was fought. Europe and large parts of Asia and Africa became one vast battlefield on which, after years of struggle, not armies but nations broke and ran. When it was all over, Torture and Cannibalism were the only two expedients that the civilized, scientific, Christian States had been able to deny themselves: and these were of doubtful utility.
From The World Crisis, 1911–1918, Chapter I ("The Vials of Wrath"), Churchill, Butterworth (1923).

One might as well legalize sodomy as recognize the Bolsheviks.
Paris, 24 January 1919. Churchill: A Life, Gilbert, Martin (1992), New York: Holt, p. 408. ISBN 9780805023961

I do not understand this squeamishness about the use of gas. We have definitely adopted the position at the Peace Conference of arguing in favour of the retention of gas as a permanent method of warfare. It is sheer affectation to lacerate a man with the poisonous fragment of a bursting shell and to boggle at making his eyes water by means of lachrymatory gas. I am strongly in favour of using poisoned gas against uncivilised tribes. The moral effect should be so good that the loss of life should be reduced to a minimum. It is not necessary to use only the most deadly gasses: gasses can be used which cause great inconvenience and would spread a lively terror and yet would leave no serious permanent effects on most of those affected. We cannot, in any case, accept the non-utilisation of any weapons which are available to procure a speedy termination of the disorder which prevails on the frontier.
Statement as President of the Air Council, War Office Departmental Minute (12 May 1919), Churchill Papers 16/16, Churchill Archives Centre, Cambridge. Many argue that quotations of this passage are often taken out of context, because Churchill is distinguishing between non-lethal agents and the deadly gases used in the First World War and is emphasising the use of non-lethal weapons; however, Churchill does not clearly rule out the use of lethal gases, stating simply that it is not necessary to use only the most deadly. It is sometimes claimed that gas killed many Kurds and Arabs, young and old, when the RAF bombed rebel villages in Iraq in 1920 during the British occupation. For more on this subject, see Gas in Mesopotamia.

Lenin was sent into Russia by the Germans in the same way that you might send a phial containing a culture of typhoid or of cholera to be poured into the water supply of a great city, and it worked with amazing accuracy.
On Vladimir Ilyich Lenin. In the House of Commons, 5 November 1919. Quoted in Churchill by Himself (2008), ed. Langworth, PublicAffairs, p. 355. ISBN 1586486381

First there are the Jews who, dwelling in every country throughout the world, identify themselves with that country, enter into its national life and, while adhering faithfully to their own religion, regard themselves as citizens in the fullest sense of the State which has received them. A Jew living in England would say, "I am an Englishman practising the Jewish faith." This is a worthy conception, and useful in the highest degree. We in Great Britain well know that during the great struggle the influence of what may be called the "National Jews" in many lands was cast preponderatingly on the side of the Allies; and in our own Army Jewish soldiers played a most distinguished part, some rising to the command of armies, others winning the Victoria Cross for valour. There is no need to exaggerate the part played in the creation of Bolshevism and in the actual bringing about of the Russian Revolution by these international and for the most part atheistical Jews. It is certainly a very great one; it probably outweighs all others. With the notable exception of Lenin, the majority of the leading figures are Jews. Moreover, the principal inspiration and driving power comes from the Jewish leaders. Thus Tchitcherin, a pure Russian, is eclipsed by his nominal subordinate Litvinoff, and the influence of Russians like Bukharin or Lunacharski cannot be compared with the power of Trotsky, or of Zinovieff, the Dictator of the Red Citadel (Petrograd), or of Krassin or Radek, all Jews. In the Soviet institutions the predominance of Jews is even more astonishing. And the prominent, if not indeed the principal, part in the system of terrorism applied by the Extraordinary Commissions for Combating Counter-Revolution has been taken by Jews, and in some notable cases by Jewesses. The same evil prominence was obtained by Jews in the brief period of terror during which Bela Kun ruled in Hungary. The same phenomenon has been presented in Germany (especially in Bavaria), so far as this madness has been allowed to prey upon the temporary prostration of the German people. Although in all these countries there are many non-Jews every whit as bad as the worst of the Jewish revolutionaries, the part played by the latter in proportion to their numbers in the population is astonishing.
"Zionism versus Bolshevism", Illustrated Sunday Herald (February 1920). (A note: Churchill saw Bolshevism as a strongly Jewish phenomenon. He contrasted the Jewish role in the creation of Bolshevism with a more positive view of the role Jews had played in England.)

The schemes of the International Jews. The adherents of this sinister confederacy are mostly men reared up among the unhappy populations of countries where Jews are persecuted on account of their race. Most, if not all of them, have forsaken the faith of their forefathers and divorced from their minds all spiritual hopes of the next world. This movement among the Jews is not new. From the days of Spartacus-Weishaupt to those of Karl Marx, and down to Trotsky (Russia), Bela Kun (Hungary), Rosa Luxembourg (Germany) and Emma Goldman (United States), this world-wide conspiracy for the overthrow of civilisation and for the reconstitution of society on the basis of arrested development, of envious malevolence and impossible equality, has been steadily growing. It played, as a modern writer, Mrs. Webster, has so ably shown, a definitely recognisable part in the tragedy of the French Revolution. It has been the mainspring of every subversive movement during the nineteenth century; and now at last this band of extraordinary personalities from the underworld of the great cities of Europe and America have gripped the Russian people by the hair of their heads, and have become practically the undisputed masters of that enormous empire.
Rt. Hon. Winston Churchill, "Zionism versus Bolshevism: A Struggle for the Soul of the Jewish People", Illustrated Sunday Herald, 8 February 1920.

However we may dwell upon the difficulties of General Dyer during the Amritsar riots, upon the anxious and critical situation in the Punjab, upon the danger to Europeans throughout that province, one tremendous fact stands out: the slaughter of nearly 400 persons and the wounding of probably three to four times as many, at the Jallian Wallah Bagh on 13 April. That is an episode which appears to me to be without precedent or parallel in the modern history of the British Empire. It is an extraordinary event, a monstrous event, an event which stands in singular and sinister isolation.
Speech in the House of Commons, 8 July 1920, Amritsar. At the time, Churchill was serving as Secretary of State for War under Prime Minister David Lloyd George.

Men who take up arms against the State must expect at any moment to be fired upon. Men who take up arms unlawfully cannot expect that the troops will wait until they are quite ready to begin the conflict.
Speech in the House of Commons, 8 July 1920, Amritsar. At the time, Churchill was serving as Secretary of State for War under Prime Minister David Lloyd George.

Frightfulness is not a remedy known to the British pharmacopoeia.
Speech in the House of Commons, 8 July 1920, Amritsar. At the time, Churchill was serving as Secretary of State for War under Prime Minister David Lloyd George.

I yield to no one in my detestation of Bolshevism, and of the revolutionary violence which precedes it. But my hatred of Bolshevism and of the Bolsheviks is not founded on their barbarous system of economics, or on their absurd doctrine of an impossible equality. It arises from the bloody and devastating terrorism which they practise in every land into which they have broken, and by which alone their criminal regime can be maintained. Governments which have seized upon power by violence and by usurpation have often resorted to terrorism in their desperate efforts to keep what they have stolen, but the august and venerable structure of the British Empire does not need such aid. Such ideas are absolutely foreign to the British way of doing things.
Speech in the House of Commons, 8 July 1920, Amritsar.

Let me marshal the facts. The crowd was unarmed, except with bludgeons. It was not attacking anybody or anything. It was holding a seditious meeting. When fire had been opened upon it to disperse it, it tried to run away. Pinned up in a narrow place considerably smaller than Trafalgar Square, with hardly any exits, and packed together so that one bullet would drive through three or four bodies, the people ran madly this way and the other. When the fire was directed upon the centre, they ran to the sides. The fire was then directed upon the sides. Many threw themselves down on the ground, and the fire was then directed upon the ground. This was continued for 8 or 10 minutes. If the road had not been so narrow, the machine guns and the armoured cars would have joined in. Finally, when the ammunition had reached the point at which only enough remained to allow for the safe return of the troops, and after 379 persons had been killed, and when most certainly 1,200 or more had been wounded, the troops, at whom not even a stone had been thrown, swung round and marched away. We have to make it absolutely clear that this is not the British way of doing business. Our reign in India, or anywhere else, has never rested on physical force alone, and it would be fatal to the British Empire if we were to try to base ourselves only upon it.
Speech in the House of Commons, 8 July 1920, Amritsar.

I cannot pretend to feel impartial about the colours. I rejoice with the brilliant ones, and am genuinely sorry for the poor browns.
In Painting as a Pastime, first published in The Strand Magazine in two parts (December 1921, January 1922). Quoted in Churchill by Himself (2008), ed. Langworth, PublicAffairs, p. 456. ISBN 1586486381

He ought to be lain bound hand and foot at the gates of Delhi, and then trampled on by an enormous elephant with the new Viceroy seated on its back.
Referring to Mahatma Gandhi, in conversation with Edwin Montagu, Secretary of State for India, 1921.

Every day you may make progress. Every step may be fruitful. Yet there will stretch out before you an ever-lengthening, ever-ascending, ever-improving path. You know you will never get to the end of the journey.
But that, so far from discouraging, only adds to the joy and glory of the climb.
In Painting as a Pastime, The Strand Magazine (December 1921, January 1922). Quoted in Churchill by Himself (2008), ed. Langworth, PublicAffairs, p. 568. ISBN 1586486381

Anxious as I am to deal with matters which all Members know are extremely delicate, I must not use any phrase or expression which might cause offence to our friends and allies on the Continent or across the Atlantic Ocean.
Speaking on inter-allied debts in the House of Commons (10 December 1924), reported in Parliamentary Debates (Commons) (1925), 5th series, vol. 179, col. 259.

The choice was clearly open: crush them with vain and unstinted force, or try to give them what they want. These were the only alternatives, and though each had ardent advocates, most people were unprepared for either. Here indeed was the Irish spectre, horrid and inexorable.
The World Crisis, Volume V: The Aftermath (1929), Churchill, Butterworth (London).

No hour of life is lost that is spent in the saddle.
My Early Life, 1874–1904 (1930), Churchill, Winston S., p. 45 (1996 Touchstone edition). ISBN 0684823454

Might a bomb no bigger than an orange be found to possess a secret power to destroy a whole block of buildings, nay, to concentrate the force of a thousand tons of cordite and blast a township at a stroke?
Pall Mall Gazette (1924), on H. G. Wells's suggestion of an atomic bomb; cited in a BBC article.

Often the strong, silent man is silent only because he does not know what to say, and is reputed strong only because he has remained silent.
Winston S. Churchill: His Complete Speeches (1974), Chelsea House, Volume IV: 1922–1928, p. 3462. ISBN 0835206939

I decline utterly to be impartial as between the fire brigade and the fire.
Speech in the House of Commons, 7 July 1926, Emergency Services. Replying to criticism that he had edited the British Gazette in a biased manner during the General Strike. As quoted in The Yale Book of Quotations (2006), ed. Fred R. Shapiro, Yale University Press, p. 152. ISBN 0300107986

Make your minds perfectly clear that if ever you let loose upon us again a general strike, we will loose upon you another British Gazette.
Speech in the House of Commons, 7 July 1926, Emergency Services. At the time, Churchill was serving as Chancellor of the Exchequer under Prime Minister Stanley Baldwin, threatening the Labour Party and the trade union movement with a return of the government-published newspaper he had edited during that May's General Strike.

If I had been an Italian, I am sure that I should have been entirely with you from the start to the finish of your victorious struggle against the bestial appetites and passions of Leninism.
To Benito Mussolini at a press conference in Rome (January 1927), quoted in Churchill: A Life (1992) by Martin Gilbert.

A sheep in sheep's clothing.
On Ramsay MacDonald. This is often taken as referring to Clement Attlee, but the Scottish historian D. W. Brogan is quoted in Safire's Political Dictionary (2008), William Safire, Oxford University Press US, p. 352, ISBN 0195343344, as follows: "Sir Winston Churchill never said of Clement Attlee that he was 'a sheep in sheep's clothing.' I have this on the excellent authority of Sir Winston himself. The phrase was totally inapplicable to Mr. Attlee. It was applicable, and applied, to J. Ramsay MacDonald, a very different kind of Labour leader."

To improve is to change; so to be perfect is to have changed often.
Winston Churchill, His Complete Speeches, 1897–1963, edited by Robert Rhodes James, Chelsea House ed., Vol. 4 (1922–1928), p. 3706. During a debate with Philip Snowden, Chancellor of the Exchequer, on customs duties on silk. Often misquoted as: "To improve is to change; to be perfect is to change often."
An infected Russia, a plague-bearing Russia; a Russia of armed hordes not only smiting with bayonet and with cannon, but accompanied and preceded by swarms of typhus-bearing vermin which slew the bodies of men, and political doctrines which destroyed the health and even the souls of nations. The Aftermath, by Winston Churchill (published 1929), p. 274. My Early Life: A Roving Commission (1930) She shone for me like the Evening Star. I loved her dearly, but at a distance. On his mother, Lady Randolph Churchill, Chapter 1 (Childhood). Where my reason, imagination or interest were not engaged, I would not or I could not learn. Chapter 1 (Childhood). Thus I got into my bones the essential structure of the ordinary British sentence, which is a noble thing. On studying English rather than Latin at school, Chapter 2 (Harrow). Headmasters have powers at their disposal with which Prime Ministers have never yet been invested. Chapter 2 (Harrow). Mr. Gladstone read Homer for fun, which I thought served him right. Chapter 2 (Harrow). I then had one of the three or four long intimate conversations with him which are all I can boast. On his father, Lord Randolph Churchill, Chapter 3 (Examinations). In retrospect these years form not only the least agreeable, but the only barren and unhappy period of my life. I was happy as a child with my toys in my nursery. I have been happier every year since I became a man. But this interlude of school makes a sombre grey patch upon the chart of my journey. It was an unending spell of worries that did not then seem petty, of toil uncheered by fruition; a time of discomfort, restriction and purposeless monotony. This train of thought must not lead me to exaggerate the character of my school days. Harrow was a very good school. Most of the boys were very happy. I can only record the fact that, no doubt through my own shortcomings, I was an exception.
I was on the whole considerably discouraged. All my contemporaries and even younger boys seemed in every way better adapted to the conditions of our little world. They were far better both at the games and at the lessons. It is not pleasant to feel oneself so completely outclassed and left behind at the very beginning of the race. Chapter 3 (Examinations). Certainly the prolonged education indispensable to the progress of Society is not natural to mankind. It cuts against the grain. A boy would like to follow his father in pursuit of food or prey. He would like to be doing serviceable things so far as his utmost strength allowed. He would like to be earning wages however small to help to keep up the home. He would like to have some leisure of his own to use or misuse as he pleased. He would ask little more than the right to work or starve. And then perhaps in the evenings a real love of learning would come to those who are worthy (and why try to stuff in those who are not?) and knowledge and thought would open the magic casements of the mind. Chapter 3 (Examinations). I had a feeling once about Mathematics, that I saw it all. Depth beyond depth was revealed to me: the Byss and the Abyss. I saw, as one might see the transit of Venus, or even the Lord Mayor's Show, a quantity passing through infinity and changing its sign from plus to minus. I saw exactly how it happened and why the tergiversation was inevitable: and how the one step involved all the others. It was like politics. But it was after dinner and I let it go. Chapter 3 (Examinations), p. 27. Although always prepared for martyrdom, I preferred that it should be postponed. Chapter 4 (Sandhurst), p. 72. You will make all kinds of mistakes; but as long as you are generous and true, and also fierce, you cannot hurt the world or even seriously distress her. Chapter 4 (Sandhurst). I wonder whether any other generation has seen such astounding revolutions of data and values as those through which we have lived.
Scarcely anything material or established which I was brought up to believe was permanent and vital, has lasted. Everything I was sure or taught to be sure was impossible, has happened. Chapter 5 (The Fourth Hussars). I have no doubt that the Romans planned the time-table of their days far better than we do. They rose before the sun at all seasons. Except in wartime we never see the dawn. Sometimes we see sunset. The message of sunset is sadness; the message of dawn is hope. The rest and the spell of sleep in the middle of the day refresh the human frame far more than a long night. We were not made by Nature to work, or even play, from eight o'clock in the morning till midnight. We throw a strain upon our system which is unfair and improvident. For every purpose of business or pleasure, mental or physical, we ought to break our days and our marches into two. Chapter 6 (Cuba). I do think unpunctuality is a vile habit, and all my life I have tried to break myself of it. Chapter 7 (Hounslow). I now began for the first time to envy those young cubs at the university who had fine scholars to tell them what was what; professors who had devoted their lives to mastering and focusing ideas in every branch of learning; who were eager to distribute the treasures they had gathered before they were overtaken by the night. But now I pity undergraduates, when I see what frivolous lives many of them lead in the midst of precious fleeting opportunity. After all, a man's Life must be nailed to a cross either of Thought or Action. Without work there is no play. Chapter 9 (Education At Bangalore). I accumulated in those years so fine a surplus in the Book of Observance that I have been drawing confidently upon it ever since. Chapter 9 (Education At Bangalore). It is a good thing for an uneducated man to read books of quotations. Bartlett's Familiar Quotations is an admirable work, and I studied it intently. The quotations when engraved upon the memory give you good thoughts.
They also make you anxious to read the authors and look for more. Chapter 9 (Education At Bangalore). I had been brought up and trained to have the utmost contempt for people who got drunk, and I would have liked to have the boozing scholars of the Universities wheeled into line and properly chastised for their squalid misuse of what I must ever regard as a gift of the gods. Chapter 10 (The Malakand Field Force). Never, never, never believe any war will be smooth and easy, or that anyone who embarks on the strange voyage can measure the tides and hurricanes he will encounter. The statesman who yields to war fever must realise that once the signal is given, he is no longer the master of policy but the slave of unforeseeable and uncontrollable events. Antiquated War Offices, weak, incompetent, or arrogant Commanders, untrustworthy allies, hostile neutrals, malignant Fortune, ugly surprises, awful miscalculations: all take their seats at the Council Board on the morrow of a declaration of war. Always remember, however sure you are that you could easily win, that there would not be a war if the other man did not think he also had a chance. Chapter 18 (With Buller To The Cape), p. 246. Quoted in This Time It's Our War (2003) by Leonard Fein in The Forward (July 25, 2003). The 1930s After the annexation of Zaolzie (part of Czechoslovakia) by Poland in October 1938: Poland is a greedy hyena of Europe. I remember, when I was a child, being taken to the celebrated Barnum's Circus, which contained an exhibition of freaks and monstrosities. But the exhibit on the programme which I most desired to see was the one described as The Boneless Wonder. My parents judged that that spectacle would be too revolting and demoralising for my youthful eyes, and I have waited 50 years to see the boneless wonder sitting on the Treasury Bench.
A jibe at Prime Minister (and First Lord of the Treasury) Ramsay MacDonald during a speech in the House of Commons, January 28, 1931, Trade Disputes and Trade Unions (Amendment) Bill. India is a geographical term. It is no more a united nation than the equator. Speech at Royal Albert Hall, London (18 March 1931). It is alarming and also nauseating to see Mr. Gandhi, a seditious Middle Temple lawyer of the type well-known in the East, now posing as a fakir, striding half naked up the steps of the Viceregal palace to parley on equal terms with the representative of the King-Emperor. Comment on Gandhi's meeting with the Viceroy of India, addressing the Council of the West Essex Unionist Association (23 February 1931), as quoted in Mr Churchill on India in The Times (24 February 1931). We shall escape the absurdity of growing a whole chicken in order to eat the breast or wing, by growing these parts separately under a suitable medium. Fifty Years Hence, The Strand Magazine (December 1931). We are stripped bare by the curse of plenty. Lecture at Cleveland, Ohio (February 3, 1932), reported in Robert Rhodes James, ed., Winston S. Churchill: His Complete Speeches, 1897-1963 (1974), vol. 5, p. 5130, referring to the theory that over-production caused the Depression. We know that he has, more than any other man, the gift of compressing the largest number of words into the smallest amount of thought. A jibe directed at Ramsay MacDonald during a speech in the House of Commons, March 23, 1933, European Situation. This quote is similar to a remark (He can compress the most words into the smallest ideas of any man I ever met) made by Abraham Lincoln. Frederick Trevor Hill credits Lincoln with this remark in Lincoln the Lawyer (1906), adding that History has considerately sheltered the identity of the victim. One may dislike Hitler's system and yet admire his patriotic achievement.
If our country were defeated, I hope we should find a champion as indomitable to restore our courage and lead us back to our place among the nations. Hitler and His Choice, The Strand Magazine (November 1935). We cannot tell whether Hitler will be the man who will once again let loose upon the world another war in which civilisation will irretrievably succumb, or whether he will go down in history as the man who restored honour and peace of mind to the Great Germanic nation. Hitler and His Choice, The Strand Magazine (November 1935). Mr. Gandhi has gone very high in my esteem since he stood up for the untouchables. I do not care whether you are more or less loyal to Great Britain. Tell Mr. Gandhi to use the powers that are offered and make the thing a success. Letter to G. D. Birla (1935), published in Winston S. Churchill, Volume Five: The Coming of War 1922-1939 (1979) by Sir Martin Gilbert. The world looks with some awe upon a man who appears unconcernedly indifferent to home, money, comfort, rank, or even power and fame. The world feels, not without a certain apprehension, that here is someone outside its jurisdiction; someone before whom its allurements may be spread in vain; someone strangely enfranchised, untamed, untrammelled by convention, moving independent of the ordinary currents of human action. At an unveiling of a memorial to T. E. Lawrence at the Oxford High School for Boys (3 October 1936), as quoted in Lawrence of Arabia: The Authorized Biography of T. E. Lawrence (1989) by Jeremy M. Wilson. Occasionally he stumbled over the truth, but hastily picked himself up and hurried on as if nothing had happened. On Stanley Baldwin, as cited in Churchill by Himself (2008), ed. Langworth, PublicAffairs, p. 322, ISBN 1586486381. Also quoted by Kay Halle in Irrepressible Churchill: A Treasury of Winston Churchill's Wit (1966). Anyone can see what the position is. The Government simply cannot make up their mind, or they cannot get the Prime Minister to make up his mind.
So they go on in strange paradox, decided only to be undecided, resolved to be irresolute, adamant for drift, solid for fluidity, all powerful to be impotent. So we go on preparing more months and years, precious, perhaps vital, to the greatness of Britain, for the locusts to eat. Speech in the House of Commons, November 12, 1936, Debate on the Address, criticizing the Government of Stanley Baldwin for its conciliatory stance toward Hitler. The era of procrastination, of half-measures, of soothing and baffling expedients, of delays, is coming to its close. In its place we are entering a period of consequences. Speech in the House of Commons, November 12, 1936, Debate on the Address. Cited in Al Gore's documentary An Inconvenient Truth. This speech is also commonly known by the name The Locust Years. Courage is rightly esteemed the first of human qualities, because, as has been said, it is the quality which guarantees all others. In Great Contemporaries, Alfonso XIII (1937). The essence and foundation of House of Commons debating is formal conversation. The set speech, the harangue addressed to constituents, or to the wider public out of doors, has never succeeded much in our small wisely-built chamber. To do any good you have got to get down to grips with the subject and in human touch with the audience. In Great Contemporaries, Clemenceau (1937). Whatever one may think about democratic government, it is just as well to have practical experience of its rough and slatternly foundations. No part of the education of a politician is more indispensable than the fighting of elections. In Great Contemporaries, Lord Rosebery (1937). I do not agree that the dog in a manger has the final right to the manger even though he may have lain there for a very long time. I do not admit that right. I do not admit, for instance, that a great wrong has been done to the Red Indians of America or the black people of Australia.
I do not admit that a wrong has been done to these people by the fact that a stronger race, a higher-grade race, a more worldly wise race, to put it that way, has come in and taken their place. To the Peel Commission (1937) on a Jewish homeland in Palestine. Dictators ride to and fro on tigers from which they dare not dismount. And the tigers are getting hungry. Armistice - or Peace, published in The Evening Standard (11 November 1937). For five years I have talked to the House on these matters, not with very great success. I have watched this famous island descending incontinently, fecklessly, the stairway which leads to a dark gulf. It is a fine broad stairway at the beginning, but after a bit the carpet ends. A little farther on there are only flagstones, and a little farther on still these break beneath your feet. Look back upon the last five years, since, that is to say, Germany began to rearm in earnest and openly to seek revenge. Historians a thousand years hence will still be baffled by the mystery of our affairs. They will never understand how it was that a victorious nation, with everything in hand, suffered themselves to be brought low, and to cast away all that they had gained by measureless sacrifice and absolute victory: gone with the wind! Now the victors are the vanquished, and those who threw down their arms in the field and sued for an armistice are striding on to world mastery. That is the position; that is the terrible transformation that has taken place bit by bit. Speech in the House of Commons (24 March 1938), Foreign Affairs and Rearmament, 12 days after the Anschluss (the Nazi annexation of Austria). Our loyal, brave people should know the truth. They should know that we have sustained a defeat without a war, and that the terrible words have for the time being been pronounced against the Western democracies: Thou art weighed in the balance and found wanting. And do not suppose that this is the end. This is only the beginning of the reckoning.
This is only the first sip, the first foretaste of a bitter cup which will be proffered to us year by year unless, by a supreme recovery of moral health and martial vigour, we arise again and take our stand for freedom as in the olden time. Speech in the House of Commons (5 October 1938), Policy of His Majesty's Government, a week after the announcement of the Munich Accords. The stations of uncensored expression are closing down; the lights are going out; but there is still time for those to whom freedom and parliamentary government mean something, to consult together. Let me, then, speak in truth and earnestness while time remains. Winston Churchill, in The Defence of Freedom and Peace (The Lights are Going Out), radio broadcast to the United States and to London (16 October 1938). People say we ought not to allow ourselves to be drawn into a theoretical antagonism between Nazidom and democracy; but the antagonism is here now. It is this very conflict of spiritual and moral ideas which gives the free countries a great part of their strength. You see these dictators on their pedestals, surrounded by the bayonets of their soldiers and the truncheons of their police. On all sides they are guarded by masses of armed men, cannons, aeroplanes, fortifications, and the like; they boast and vaunt themselves before the world, yet in their hearts there is unspoken fear. They are afraid of words and thoughts; words spoken abroad, thoughts stirring at home, all the more powerful because forbidden, terrify them. A little mouse of thought appears in the room, and even the mightiest potentates are thrown into panic. They make frantic efforts to bar our thoughts and words; they are afraid of the workings of the human mind. Cannons, airplanes, they can manufacture in large quantities; but how are they to quell the natural promptings of human nature, which after all these centuries of trial and progress has inherited a whole armoury of potent and indestructible knowledge? Winston Churchill,
in The Defence of Freedom and Peace (The Lights are Going Out), radio broadcast to the United States and to London (16 October 1938). I have always said that if Great Britain were defeated in war I hoped we should find a Hitler to lead us back to our rightful position among the nations. I am sorry, however, that he has not been mellowed by the great success that has attended him. The whole world would rejoice to see the Hitler of peace and tolerance, and nothing would adorn his name in world history so much as acts of magnanimity and of mercy and of pity to the forlorn and friendless, to the weak and poor. Let this great man search his own heart and conscience before he accuses anyone of being a warmonger. Mr. Churchill's Reply in The Times (7 November 1938). Britain and France had to choose between war and dishonour. They chose dishonour. They will have war. To Neville Chamberlain in the House of Commons, after the Munich accords (1938). The Second World War (1939-1945) Winston Churchill addressing a joint session of the United States Congress, May 1943. I cannot forecast to you the action of Russia. It is a riddle wrapped in a mystery inside an enigma; but perhaps there is a key. That key is Russian national interest. BBC broadcast (The Russian Enigma), London, October 1, 1939 (partial text; transcript of the First Month of War speech). First, Poland has been again overrun by two of the great powers which held her in bondage for 150 years but were unable to quench the spirit of the Polish nation. The heroic defense of Warsaw shows that the soul of Poland is indestructible, and that she will rise again like a rock, which may for a spell be submerged by a tidal wave, but which remains a rock. BBC broadcast (The Russian Enigma), London, October 1, 1939 (First Month of War (excerpt); transcript of the full text). I would say to the House, as I said to those who have joined this Government: I have nothing to offer but blood, toil, tears, and sweat.
We have before us an ordeal of the most grievous kind. We have before us many, many long months of struggle and of suffering. You ask, what is our policy? I will say: It is to wage war, by sea, land and air, with all our might and with all the strength that God can give us: to wage war against a monstrous tyranny, never surpassed in the dark, lamentable catalogue of human crime. That is our policy. You ask, what is our aim? I can answer in one word: It is victory, victory at all costs, victory in spite of all terror, victory, however long and hard the road may be; for without victory, there is no survival. Speech in the House of Commons, after taking office as Prime Minister (13 May 1940). This has often been misquoted in the form: I have nothing to offer but blood, sweat and tears. The Official Report, House of Commons (5th Series), 13 May 1940, vol. 360, c. 1502. Audio recordings of the speech omit the It is at the beginning of the victory passage. Side by side the British and French peoples have advanced to rescue mankind from the foulest and most soul-destroying tyranny which has ever darkened and stained the pages of history. Behind them gather a group of shattered States and bludgeoned races: the Czechs, the Poles, the Norwegians, the Danes, the Dutch, the Belgians, upon all of whom the long night of barbarism will descend, unbroken even by a star of hope, unless we conquer, as conquer we must; as conquer we shall. Radio broadcast, Be Ye Men of Valour, May 19, 1940 (partial text). Every morn brought forth a noble chance, and every chance brought forth a noble knight. Speech in the House of Commons, June 4, 1940; a passage praising the airmen of the Royal Air Force and their efforts during the evacuation of Dunkirk. This is a close paraphrase of Tennyson: When every morning brought a noble chance, And every chance brought out a noble knight. Alfred Tennyson, Morte d'Arthur,
stanza 23 (1842), and the expanded The Passing of Arthur, stanza 36, in Idylls of the King (1856-1885). We shall not flag or fail. We shall go on to the end, we shall fight in France, we shall fight on the seas and oceans, we shall fight with growing confidence and growing strength in the air, we shall defend our Island, whatever the cost may be, we shall fight on the beaches, we shall fight on the landing grounds, we shall fight in the fields and in the streets, we shall fight in the hills; we shall never surrender, and even if, which I do not for a moment believe, this Island or a large part of it were subjugated and starving, then our Empire beyond the seas, armed and guarded by the British Fleet, would carry on the struggle, until, in God's good time, the New World, with all its power and might, steps forth to the rescue and the liberation of the Old. Speech in the House of Commons (4 June 1940). Bearing ourselves humbly before God, we await undismayed the impending assault. Be the ordeal sharp or long, or both, we shall seek no terms, we shall tolerate no parley; we may show mercy, we shall ask for none. BBC broadcast, London, July 14, 1940, War of the Unknown Warriors. Of this I am quite sure, that if we open a quarrel between the past and the present, we shall find that we have lost the future. Speech in the House of Commons, June 18, 1940, War Situation. Upon this battle depends the survival of Christian civilisation. Upon it depends our own British life and the long continuity of our institutions and our Empire. The whole fury and might of the enemy must very soon be turned on us now. Hitler knows that he will have to break us in this island or lose the war. If we can stand up to him, all Europe may be free and the life of the world may move forward into broad, sunlit uplands.
But if we fail, then the whole world, including the United States, including all that we have known and cared for, will sink into the abyss of a new Dark Age, made more sinister, and perhaps more protracted, by the lights of perverted science. Let us therefore brace ourselves to our duties, and so bear ourselves that, if the British Empire and its Commonwealth last for a thousand years, men will still say, This was their finest hour. Speech in the House of Commons, June 18, 1940, War Situation. The gratitude of every home in our Island, in our Empire, and indeed throughout the world, except in the abodes of the guilty, goes out to the British airmen who, undaunted by odds, unwearied in their constant challenge and mortal danger, are turning the tide of the World War by their prowess and by their devotion. Never in the field of human conflict was so much owed by so many to so few. All hearts go out to the fighter pilots, whose brilliant actions we see with our own eyes day after day; but we must never forget that all the time, night after night, month after month, our bomber squadrons travel far into Germany, find their targets in the darkness by the highest navigational skill, aim their attacks, often under the heaviest fire, often with serious loss, with deliberate careful discrimination, and inflict shattering blows upon the whole of the technical and war-making structure of the Nazi power. Speech in the House of Commons, also known as The Few, made on 20 August 1940. However, Churchill had first made his comment, Never in the field of human conflict was so much owed by so many to so few, to General Hastings Ismay as they got into their car to leave RAF Uxbridge on 16 August 1940 after monitoring the battle from the Operations Room. Farewell to RAF Uxbridge, Global Aviation Resource (6 April 2010), retrieved on 12 September 2010; Crozier, Hazel, RAF Uxbridge 90th Anniversary 1917-2007, RAF High Wycombe: Air Command Media Services.
Churchill repeated the quote in a speech to Parliament four days later complimenting the pilots of the Royal Air Force during the Battle of Britain. The speech in the House of Commons is often incorrectly cited as the origin of the popular phrase never was so much owed by so many to so few. Queen Elizabeth II, in her speech to the Polish Parliament on 26 March 1996, said that Churchill's so few referred to the unforgettable and brave Polish pilots of the Battle of Britain. We are waiting for the long-promised invasion. So are the fishes. Radio broadcast, London, Dieu Protège La France (God protect France), October 21, 1940 (partial text). Goodnight then: sleep to gather strength for the morning. For the morning will come. Brightly will it shine on the brave and true, kindly upon all who suffer for the cause, glorious upon the tombs of heroes. Thus will shine the dawn. Vive la France! Long live also the forward march of the common people in all the lands towards their just and true inheritance, and towards the broader and fuller age. Radio broadcast, London, Dieu Protège La France (God protect France), October 21, 1940 (partial text). These cruel, wanton, indiscriminate bombings of London are, of course, a part of Hitler's invasion plans. He hopes, by killing large numbers of civilians, and women and children, that he will terrorise and cow the people of this mighty imperial city. Little does he know the spirit of the British nation, or the tough fibre of the Londoners. Radio broadcast during the London Blitz, September 11, 1940. Quoted by Martin Gilbert in Churchill: A Life, Macmillan (1992), p. 675, ISBN 0805023968. We do not covet anything from any nation except their respect. Radio broadcast to German-occupied, Vichy, and Free France (21 October 1940). The hour has come; kill the Hun. How Churchill said he would end his speech if Germany invaded Britain (John Colville's diary entry for January 25, 1941). In The Churchill War Papers, 1941 (1993), ed. Gilbert, W. W. Norton, pp.
132-133, ISBN 0393019594. Here is the answer which I will give to President Roosevelt: Put your confidence in us. We shall not fail or falter; we shall not weaken or tire. Neither the sudden shock of battle, nor the long-drawn trials of vigilance and exertion will wear us down. Give us the tools and we will finish the job. BBC radio broadcast, February 9, 1941. In The Churchill War Papers, 1941 (1993), ed. Gilbert, W. W. Norton, pp. 199-200, ISBN 0393019594. I must point out that the British nation is unique in this respect. They are the only people who like to be told how bad things are, who like to be told the worst, and like to be told that they are very likely to get much worse in the future and must prepare themselves for further reverses. Speech in the House of Commons, June 10, 1941, Defence of Crete; in The Churchill War Papers, 1941 (1993), ed. Gilbert, Norton, p. 785, ISBN 0393019594. If Hitler invaded Hell, I would make at least a favourable reference to the devil in the House of Commons. To his personal secretary John Colville the evening before Operation Barbarossa, the German invasion of the Soviet Union. As quoted by Andrew Nagorski in The Greatest Battle (2007), Simon & Schuster, pp. 150-151, ISBN 0743281101. Hitler is a monster of wickedness, insatiable in his lust for blood and plunder. Not content with having all Europe under his heel, or else terrorised into various forms of abject submission, he must now carry his work of butchery and desolation among the vast multitudes of Russia and of Asia. The terrible military machine which we and the rest of the civilised world so foolishly, so supinely, so insensately allowed the Nazi gangsters to build up year by year from almost nothing cannot stand idle lest it rust or fall to pieces. So now this bloodthirsty guttersnipe must launch his mechanized armies upon new fields of slaughter, pillage and devastation. Radio broadcast on the German invasion of Russia, June 22, 1941. In The Churchill War Papers,
1941 (1993), W. W. Norton, pp. 835-836, ISBN 0393019594. We ask no favours of the enemy. We seek from them no compunction. On the contrary, if tonight the people of London were asked to cast their votes as to whether a convention should be entered into to stop the bombing of all cities, an overwhelming majority would cry, No, we will mete out to the Germans the measure, and more than the measure, they have meted out to us. The people of London with one voice would say to Hitler: You have committed every crime under the sun. Where you have been the least resisted there you have been the most brutal. It was you who began the indiscriminate bombing. We remember Warsaw, in the first few days of the war. We remember Rotterdam. We have been newly reminded of your habits by the hideous massacre in Belgrade. We know too well the bestial assaults you're making upon the Russian people, to whom our hearts go out in their valiant struggle. We will have no truce or parley with you, or the grisly gang who work your wicked will. You do your worst and we will do our best. Perhaps it may be our turn soon. Perhaps it may be our turn now. July 14, 1941, in a speech before the London County Council. The original can be found in Churchill's The Unrelenting Struggle (English edition, 187; American edition, 182) or in the Complete Speeches VI:6448. Never give in: never, never, never, never, in nothing great or small, large or petty, never give in except to convictions of honour and good sense. Never yield to force; never yield to the apparently overwhelming might of the enemy. Speech given at Harrow School, Harrow, England, October 29, 1941. Quoted in Churchill by Himself (2008), ed. Langworth, PublicAffairs, 2008, p. 23, ISBN 1586486381. We have not journeyed all this way across the centuries, across the oceans, across the mountains, across the prairies, because we are made of sugar candy. Speech before a Joint Session of the Canadian Parliament, Ottawa (December 30, 1941). The Yale Book of Quotations,
Ed. Fred R. Shapiro, Yale University Press (2006), p. 153, ISBN 0300107986. When we consider the resources of the United States and the British Empire compared to those of Japan, when we remember those of China, which has so long and valiantly withstood invasion, and when also we observe the Russian menace which hangs over Japan, it becomes still more difficult to reconcile Japanese action with prudence or even with sanity. What kind of a people do they think we are? Is it possible they do not realise that we shall never cease to persevere against them until they have been taught a lesson which they and the world will never forget? Members of the Senate and members of the House of Representatives, I turn for one moment more from the turmoil and convulsions of the present to the broader basis of the future. Here we are together facing a group of mighty foes who seek our ruin; here we are together defending all that to free men is dear. Twice in a single generation the catastrophe of world war has fallen upon us; twice in our lifetime has the long arm of fate reached across the ocean to bring the United States into the forefront of the battle. If we had kept together after the last War, if we had taken common measures for our safety, this renewal of the curse need never have fallen upon us. Do we not owe it to ourselves, to our children, to mankind tormented, to make sure that these catastrophes shall not engulf us for the third time? Speech to a joint session of the United States Congress, Washington, D.C. (26 December 1941). It is not given to us to peer into the mysteries of the future. Still, I avow my hope and faith, sure and inviolate, that in the days to come the British and American peoples will for their own safety and for the good of all walk together side by side in majesty, in justice, and in peace. Ending of the Speech to a joint session of the United States Congress, Washington, D.C. (26 December 1941), reported in Winston S.
Churchill: His Complete Speeches, 1897-1963, ed. Robert Rhodes James (1974), vol. 6, p. 6541. The Congressional Record reports that this speech was followed by "Prolonged applause, the Members of the Senate and their guests rising" (Congressional Record, Vol. 87, p. 10119). When I warned them that Britain would fight on alone whatever they did, their generals told their Prime Minister and his divided Cabinet, "In three weeks England will have her neck wrung like a chicken." Some chicken! Some neck! Reference to the French government; speech before a Joint Session of the Canadian Parliament, Ottawa (December 30, 1941). The Yale Book of Quotations, ed. Fred R. Shapiro, Yale University Press (2006), p. 153, ISBN 0300107986. The most dangerous moment of the War, and the one which caused me the greatest alarm ... was when the Japanese Fleet was heading for Ceylon and the naval base there. The capture of Ceylon, the consequent control of the Indian Ocean, and the possibility at the same time of a German conquest of Egypt would have closed the ring and the future would have been black. Quote about the (April 5, 1942) Easter Sunday Raid on Colombo, Ceylon (Sri Lanka). From a conversation at the British Embassy, Washington D. C., as described by Leonard Birchall, RCAF, in Battle for the Skies (2004), Michael Paterson, David & Charles, ISBN 0715318152. It was an experience of great interest to me to meet Premier Stalin. It is very fortunate for Russia in her agony to have this great rugged war chief at her head. He is a man of massive outstanding personality, suited to the sombre and stormy times in which his life has been cast; a man of inexhaustible courage and will-power and a man direct and even blunt in speech, which, having been brought up in the House of Commons, I do not mind at all, especially when I have something to say of my own.
Above all, he is a man with that saving sense of humour which is of high importance to all men and all nations, but particularly to great men and great nations. Stalin also left upon me the impression of a deep, cool wisdom and a complete absence of illusions of any kind. I believe I made him feel that we were good and faithful comrades in this war but that, after all, is a matter which deeds not words will prove. Speech in the House of Commons, September 8, 1942, War Situation. I hate Indians. They are a beastly people with a beastly religion. In conversation to Leo Amery, Secretary of State for India. This quotation is widely cited as written in a letter to Leo Amery (e.g. in Jolly Good Fellows and Their Nasty Ways by Vinay Lal in Times of India (15 January 2007)), but it is actually attributed to Churchill as a remark, in an entry for September 1942 in Leo Amery's Diaries (1988), edited by John Barnes and David Nicholson, p. 832: During my talk with Winston he burst out with: I hate Indians. They are a beastly people with a beastly religion. Now this is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning. Speech at the Lord Mayor's Luncheon, Mansion House, London, November 10, 1942 (partial text), referring to the British victory over the German Afrika Korps at the Second Battle of El Alamein in Egypt. The problems of victory are more agreeable than those of defeat, but they are no less difficult. Speech in the House of Commons, November 11, 1942, Debate on the Address. I have not become the King's First Minister in order to preside over the liquidation of the British Empire. Speech at the Lord Mayor's Luncheon, Mansion House, London, November 10, 1942. The Yale Book of Quotations, ed. Fred R. Shapiro, Yale University Press (2006), p. 153, ISBN 0300107986. Before Alamein we never had a victory. After Alamein, we never had a defeat. The Second World War, Volume IV:
The Hinge of Fate (1951), Chapter 33 (The Battle of Alamein); BBC News story on the 60th anniversary of Alamein. The maxim "Nothing avails but perfection" may be spelt shorter: Paralysis. Minute (brief note) to General Ismay, December 6, 1942, on proposed improvements to landing-craft, in The Second World War, Volume IV: The Hinge of Fate (1951), Appendix C. I am sure it would be sensible to restrict as much as possible the work of these gentlemen, who are capable of doing an immense amount of harm with what may very easily degenerate into charlatanry. The tightest hand should be kept over them, and they should not be allowed to quarter themselves in large numbers among the Fighting Services at the public expense. On psychiatrists, in a letter to John Anderson, Lord President of the Council (December 19, 1942), in The Second World War, Volume IV: The Hinge of Fate (1951), Appendix C. There is no finer investment for any community than putting milk into babies. Radio broadcast (March 21, 1943), cited in Churchill by Himself (2008), ed. Langworth, PublicAffairs, p. 21, ISBN 1586486381. By its sudden collapse, the proud German army has once again proved the truth of the saying, "The Hun is always either at your throat or at your feet." Speech before a Joint Session of Congress (May 19, 1943), Washington, D. C., in Never Give In! The Best of Winston Churchill's Speeches (2003), Hyperion, p. 352, ISBN 1401300561. The empires of the future are the empires of the mind. Speech at Harvard University, September 6, 1943, in The Oxford Dictionary of Quotations (1999), Knowles & Partington, Oxford University Press, p. 215, ISBN 0198601735. To achieve the extirpation of Nazi tyranny there are no lengths of violence to which we will not go. Speech to Parliament, September 21, 1943. Quoted in Churchill, Hitler, and the Unnecessary War (2008) by Patrick J. Buchanan, p. 396. I have nothing to add to the reply which has already been sent.
Response to Dundee Council after refusing to expand on his reasons for not accepting the Freedom of the City. Memo (October 27, 1943). I hate nobody except Hitler, and that is professional. Churchill to John Colville during WWII, quoted by Colville in his book The Churchillians (1981), ISBN 0297779095. Everyone is in favour of free speech. Hardly a day passes without its being extolled, but some people's idea of it is that they are free to say what they like, but if anyone says anything back, that is an outrage. The Coalmining Situation, Speech to the House of Commons (October 13, 1943). We shape our buildings, and afterwards our buildings shape us. Speech to the House of Commons (October 28, 1943), on plans for the rebuilding of the Chamber (destroyed by an enemy bomb May 10, 1941), in Never Give In! The Best of Winston Churchill's Speeches (2003), Hyperion, p. 358, ISBN 1401300561. The essence of good House of Commons speaking is the conversational style, the facility for quick, informal interruptions and interchanges. Harangues from a rostrum would be a bad substitute for the conversational style in which so much of our business is done. But the conversational style requires a fairly small space, and there should be on great occasions a sense of crowd and urgency. There should be a sense of the importance of much that is said, and a sense that great matters are being decided, there and then, by the House. It has a collective personality which enjoys the regard of the public, and which imposes itself upon the conduct not only of individual Members but of parties. Speech in the House of Commons, October 28, 1943, House of Commons Rebuilding. The House of Commons has lifted our affairs above the mechanical sphere into the human sphere. It thrives on criticism, it is perfectly impervious to newspaper abuse or taunts from any quarter, and it is capable of digesting almost anything or almost any body of gentlemen, whatever be the views with which they arrive.
There is no situation to which it cannot address itself with vigour and ingenuity. It is the citadel of British liberty; it is the foundation of our laws; its traditions and its privileges are as lively today as when it broke the arbitrary power of the Crown and substituted that Constitutional Monarchy under which we have enjoyed so many blessings. Speech in the House of Commons, October 28, 1943, House of Commons Rebuilding. You might however consider whether you should not unfold as a background the great privilege of habeas corpus and trial by jury, which are the supreme protection invented by the English people for ordinary individuals against the state. The power of the Executive to cast a man in prison without formulating any charge known to the law, and particularly to deny him the judgment of his peers, is in the highest degree odious and is the foundation of all totalitarian government, whether Nazi or Communist. In a telegram (November 21, 1942) by Churchill from Cairo, Egypt to Home Secretary Herbert Morrison, cited in In the Highest Degree Odious (1992), Simpson, Clarendon Press, p. 391, ISBN 0198257759. When I make a statement of facts within my knowledge I expect it to be accepted. To Joseph Stalin in 1944, on the fact that there had been no plot between Britain and Germany to invade the Soviet Union. The Grand Alliance, Winston S. Churchill. The object of presenting medals, stars, and ribbons is to give pride and pleasure to those who have deserved them. At the same time a distinction is something which everybody does not possess. If all have it, it is of less value. A medal glitters, but it also casts a shadow. Speech in the House of Commons, March 22, 1944, War Decorations. I have left the obvious, essential fact to this point, namely, that it is the Russian Armies who have done the main work in tearing the guts out of the German army.
In the air and on the oceans we could maintain our place, but there was no force in the world which could have been called into being, except after several more years, that would have been able to maul and break the German army unless it had been subjected to the terrible slaughter and manhandling that has fallen to it through the strength of the Russian Soviet Armies. Speech in the House of Commons, August 2, 1944, War Situation. The Russians will sweep through your country and your people will be liquidated. You are on the verge of annihilation. To Stanisław Mikołajczyk in Moscow, October 14, 1944. Quoted in Churchill, Hitler, and the Unnecessary War (2008) by Patrick J. Buchanan, p. 380. A love of tradition has never weakened a nation; indeed it has strengthened nations in their hour of peril; but the new view must come, the world must roll forward. Let us have no fear of the future. Speech in the House of Commons, November 29, 1944, Debate on the Address. It seems to me that the moment has come when the question of bombing of German cities simply for the sake of increasing the terror, though under other pretexts, should be reviewed. After the devastation of Dresden by aerial bombing, and the resulting fire storm (February 1945). Quoted in Where the Right Went Wrong (2004) by Patrick J. Buchanan, p. 119, ISBN 0312341156. It is a mistake to look too far ahead. Only one link in the chain of destiny can be handled at a time. Speech in the House of Commons, February 27, 1945, Crimea Conference, in The Second World War, Volume VI: Triumph and Tragedy (1954), Chapter XXIII, Yalta: Finale. I am going to tell you something you must not tell to any human being. We have split the atom. The report of the great experiment has just come in. A bomb was let off in some wild spot in New Mexico. It was only a thirteen-pound bomb, but it made a crater half a mile across.
People ten miles away lay with their feet towards the bomb; when it went off they rolled over and tried to look at the sky. But even with the darkest glasses it was impossible. It was the middle of the night, but it was as if seven suns had lit the earth; two hundred miles away the light could be seen. The bomb sent up smoke into the stratosphere. It is the Second Coming. The secret has been wrested from nature. Fire was the first discovery; this is the second. Churchill on the atom bomb in conversation with his doctor, Lord Moran, on 23 July 1945 (Lord Moran, Winston Churchill: The Struggle for Survival, 1940-1965 (London: Sphere, 1968), p. 305). The Gathering Storm In the Second World War every bond between man and man was to perish. Crimes were committed by the Germans under the Hitlerite domination to which they allowed themselves to be subjected which find no equal in scale and wickedness with any that have darkened the human record. The wholesale massacre by systematised processes of six or seven millions of men, women, and children in the German execution camps exceeds in horror the rough-and-ready butcheries of Genghis Khan, and in scale reduces them to pigmy proportions. Deliberate extermination of whole populations was contemplated and pursued by both Germany and Russia in the Eastern war. We have at length emerged from a scene of material ruin and moral havoc the like of which had never darkened the imagination of former centuries. The Foreign Secretary has a special position in a British Cabinet. He is treated with marked respect in his high and responsible office, but he usually conducts his affairs under the continuous scrutiny, if not of the whole Cabinet, at least of its principal members. He is under an obligation to keep them informed. He circulates to his colleagues, as a matter of custom and routine, all his executive telegrams, the reports from our embassies abroad, the records of his interviews with foreign Ambassadors or other notables.
At least this has been the case during my experience of Cabinet life. This supervision is, of course, especially maintained by the Prime Minister, who personally or through his Cabinet is responsible for controlling, and has the power to control, the main course of foreign policy. From him at least there must be no secrets. No Foreign Secretary can do his work unless he is supported constantly by his chief. To make things go smoothly, there must not only be agreement between them on fundamentals, but also a harmony of outlook and even to some extent of temperament. This is all the more important if the Prime Minister himself devotes special attention to foreign affairs. Intro to ch. 14, Mr. Eden at the Foreign Office: His Resignation, The Gathering Storm, Volume I of The Second World War, by Winston S. Churchill. Post-war years (1945-1955) We must all turn our backs upon the horrors of the past. We must look to the future. We cannot afford to drag forward across the years that are to come the hatreds and revenges which have sprung from the injuries of the past. Crowdsourcing is a very popular means of obtaining the large amounts of labeled data that modern machine learning methods require. Although cheap and fast to obtain, crowdsourced labels suffer from significant amounts of error, thereby degrading the performance of downstream machine learning tasks. With the goal of improving the quality of the labeled data, we seek to mitigate the many errors that occur due to silly mistakes or inadvertent errors by crowdsourcing workers. We propose a two-stage setting for crowdsourcing where the worker first answers the questions, and is then allowed to change her answers after looking at a (noisy) reference answer. We mathematically formulate this process and develop mechanisms to incentivize workers to act appropriately.
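The two-stage process just described can be pictured with a toy simulation. All probabilities and the trust parameter below are illustrative assumptions, not the paper's mechanism; the sketch only shows why a noisy reference answer can lift accuracy above the worker's stage-one rate:

```python
import random

random.seed(0)

def two_stage_answer(truth, p_worker=0.7, p_reference=0.8, p_switch=0.9):
    """Toy model of the two-stage setting: the worker first answers a
    binary question on her own, then may revise after seeing a noisy
    reference answer. All parameters are illustrative assumptions."""
    first = truth if random.random() < p_worker else 1 - truth
    reference = truth if random.random() < p_reference else 1 - truth
    if reference != first and random.random() < p_switch:
        return reference            # worker revises toward the reference
    return first                    # worker keeps her first answer

truths = [random.randint(0, 1) for _ in range(20000)]
accuracy = sum(two_stage_answer(t) == t for t in truths) / len(truths)
print(f"two-stage accuracy: {accuracy:.3f}")  # stage one alone is 0.7 here
```

With these toy numbers the second stage lifts accuracy to roughly 0.79, though a worker who blindly copies the reference would cap out at 0.8, which is why the paper's mechanisms must reward honest first-stage effort.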
Our mathematical guarantees show that our mechanism incentivizes the workers to answer honestly in both stages, and to refrain from answering randomly in the first stage or simply copying in the second. Numerical experiments reveal a significant boost in performance that such "self-correction" can provide when using crowdsourcing to train machine learning algorithms. There are various parametric models for analyzing pairwise comparison data, including the Bradley-Terry-Luce (BTL) and Thurstone models, but their reliance on strong parametric assumptions is limiting. In this work, we study a flexible model for pairwise comparisons, under which the probabilities of outcomes are required only to satisfy a natural form of stochastic transitivity. This class includes parametric models, including the BTL and Thurstone models, as special cases, but is considerably more general. We provide various examples of models in this broader stochastically transitive class for which classical parametric models provide poor fits. Despite this greater flexibility, we show that the matrix of probabilities can be estimated at the same rate as in standard parametric models. On the other hand, unlike in the BTL and Thurstone models, computing the minimax-optimal estimator in the stochastically transitive model is non-trivial, and we explore various computationally tractable alternatives. We show that a simple singular value thresholding algorithm is statistically consistent but does not achieve the minimax rate. We then propose and study algorithms that achieve the minimax rate over interesting sub-classes of the full stochastically transitive class. We complement our theoretical results with thorough numerical simulations. We show how any binary pairwise model may be uprooted to a fully symmetric model, wherein original singleton potentials are transformed to potentials on edges to an added variable, and then rerooted to a new model on the original number of variables.
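The uprooting construction just described can be sketched on a tiny example. The potential values here are arbitrary toy numbers; the point is only that each singleton potential becomes an edge potential to an added variable x0, and clamping x0 = +1 recovers the original energies:

```python
import itertools

# Toy binary pairwise model over variables 1 and 2 (values in {-1, +1}).
theta_single = {1: 0.5, 2: -0.3}    # singleton (unary) potentials
theta_pair = {(1, 2): 0.8}          # pairwise potential

def energy_original(x):
    e = sum(t * x[i] for i, t in theta_single.items())
    e += sum(t * x[i] * x[j] for (i, j), t in theta_pair.items())
    return e

# Uprooted model: fully symmetric, only pairwise potentials; the edges
# (0, i) to the added variable x0 carry the old singleton weights.
up_pair = dict(theta_pair)
up_pair.update({(0, i): t for i, t in theta_single.items()})

def energy_uprooted(x):
    return sum(t * x[i] * x[j] for (i, j), t in up_pair.items())

# Rerooting at x0 (clamping x0 = +1) recovers every original energy.
for x1, x2 in itertools.product([-1, 1], repeat=2):
    x = {0: 1, 1: x1, 2: x2}
    assert abs(energy_original(x) - energy_uprooted(x)) < 1e-12
print("uprooted energies match the original with x0 clamped to +1")
```

Flipping all variables leaves the uprooted energies unchanged, which is the symmetry the paper exploits when rerooting at a different variable.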
The new model is essentially equivalent to the original model, with the same partition function and allowing recovery of the original marginals or a MAP configuration, yet may have very different computational properties that allow much more efficient inference. This meta-approach deepens our understanding, may be applied to any existing algorithm to yield improved methods in practice, generalizes earlier theoretical results, and reveals a remarkable interpretation of the triplet-consistent polytope. We show how deep learning methods can be applied in the context of crowdsourcing and unsupervised ensemble learning. First, we prove that the popular model of Dawid and Skene, which assumes that all classifiers are conditionally independent, is equivalent to a Restricted Boltzmann Machine (RBM) with a single hidden node. Hence, under this model, the posterior probabilities of the true labels can instead be estimated via a trained RBM. Next, to address the more general case, where classifiers may strongly violate the conditional independence assumption, we propose to apply an RBM-based Deep Neural Net (DNN). Experimental results on various simulated and real-world datasets demonstrate that our proposed DNN approach outperforms other state-of-the-art methods, in particular when the data violates the conditional independence assumption. Revisiting Semi-Supervised Learning with Graph Embeddings. Zhilin Yang, Carnegie Mellon University; William Cohen, CMU; Ruslan Salakhutdinov, U. of Toronto. Paper Abstract: We present a semi-supervised learning framework based on graph embeddings. Given a graph between instances, we train an embedding for each instance to jointly predict the class label and the neighborhood context in the graph. We develop both transductive and inductive variants of our method.
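The transductive/inductive distinction can be made concrete in a few lines of numpy. The linear map W below is an illustrative stand-in for the paper's parametric embedding function, not its actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(5, 4))   # 5 toy instances, 4 features each

# Transductive: one free embedding per training instance (a lookup
# table); nothing is defined for instances outside this table.
transductive = rng.normal(size=(5, 3))
emb_t = transductive[2]              # only valid for a known index

# Inductive: the embedding is a parametric function of the features,
# here a linear map W (illustrative), so unseen instances also embed.
W = rng.normal(size=(4, 3))
emb_i = features[2] @ W
new_instance = rng.normal(size=4)    # never seen during training
emb_new = new_instance @ W           # still well defined

print(emb_t.shape, emb_i.shape, emb_new.shape)
```

This is why only the inductive variant can make predictions on instances not seen during training, as the abstract goes on to note.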
In the transductive variant of our method, the class labels are determined by both the learned embeddings and input feature vectors, while in the inductive variant, the embeddings are defined as a parametric function of the feature vectors, so predictions can be made on instances not seen during training. On a large and diverse set of benchmark tasks, including text classification, distantly supervised entity extraction, and entity classification, we show improved performance over many of the existing models. Reinforcement learning can acquire complex behaviors from high-level specifications. However, defining a cost function that can be optimized effectively and encodes the correct task is challenging in practice. We explore how inverse optimal control (IOC) can be used to learn behaviors from demonstrations, with applications to torque control of high-dimensional robotic systems. Our method addresses two key challenges in inverse optimal control: first, the need for informative features and effective regularization to impose structure on the cost, and second, the difficulty of learning the cost function under unknown dynamics for high-dimensional continuous systems. To address the former challenge, we present an algorithm capable of learning arbitrary nonlinear cost functions, such as neural networks, without meticulous feature engineering. To address the latter challenge, we formulate an efficient sample-based approximation for MaxEnt IOC. We evaluate our method on a series of simulated tasks and real-world robotic manipulation problems, demonstrating substantial improvement over prior methods both in terms of task complexity and sample efficiency. In learning latent variable models (LVMs), it is important to effectively capture infrequent patterns and shrink model size without sacrificing modeling power. Various studies have been done to "diversify" an LVM, which aim to learn a diverse set of latent components in LVMs.
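One way to picture diversity promotion is a penalty that grows when components have small mutual angles. The cosine-based sketch below is an illustrative stand-in, not the mutual angular prior defined later in the abstract:

```python
import numpy as np

def mutual_angle_penalty(components):
    """Illustrative diversity-promoting penalty: the mean squared cosine
    similarity between component vectors. Large mutual angles (near-
    orthogonal components) give a small penalty; a sketch only."""
    C = components / np.linalg.norm(components, axis=1, keepdims=True)
    G = C @ C.T                      # pairwise cosine similarities
    n = len(C)
    off_diag = G[~np.eye(n, dtype=bool)]
    return float(np.mean(off_diag ** 2))

aligned = np.array([[1.0, 0.0], [0.9, 0.1]])    # nearly parallel components
diverse = np.array([[1.0, 0.0], [0.0, 1.0]])    # orthogonal components
print(mutual_angle_penalty(aligned), mutual_angle_penalty(diverse))
```

Adding such a term to a training objective (or, in the Bayesian view below, encoding it in a prior) pushes latent components apart so that rare patterns get their own component instead of being absorbed by dominant ones.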
Most existing studies fall into a frequentist-style regularization framework, where the components are learned via point estimation. In this paper, we investigate how to "diversify" LVMs in the paradigm of Bayesian learning, which has advantages complementary to point estimation, such as alleviating overfitting via model averaging and quantifying uncertainty. We propose two approaches that have complementary advantages. One is to define diversity-promoting mutual angular priors, which assign larger density to components with larger mutual angles based on Bayesian networks and the von Mises-Fisher distribution, and to use these priors to affect the posterior via Bayes' rule. We develop two efficient approximate posterior inference algorithms based on variational inference and Markov chain Monte Carlo sampling. The other approach is to impose diversity-promoting regularization directly over the post-data distribution of components. These two methods are applied to the Bayesian mixture of experts model to encourage the "experts" to be diverse, and experimental results demonstrate the effectiveness and efficiency of our methods. High dimensional nonparametric regression is an inherently difficult problem with known lower bounds depending exponentially on dimension. A popular strategy to alleviate this curse of dimensionality has been to use additive models of first order, which model the regression function as a sum of independent functions on each dimension. Though useful in controlling the variance of the estimate, such models are often too restrictive in practical settings. Between non-additive models, which often have large variance, and first order additive models, which have large bias, there has been little work to exploit the trade-off in the middle via additive models of intermediate order. In this work, we propose salsa, which bridges this gap by allowing interactions between variables, but controls model capacity by limiting the order of interactions.
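The first-order end of this spectrum, kernel ridge regression with an additive kernel, can be sketched in a few lines. The bandwidth, regularisation strength, and toy data below are illustrative choices; salsa itself tunes these and allows higher interaction orders:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(80, 5))
# Toy target that is genuinely additive in the first two dimensions.
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.05 * rng.normal(size=80)

def rbf_1d(a, b, bw=0.5):
    """One-dimensional RBF kernel matrix between vectors a and b."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * bw ** 2))

def additive_kernel(A, B):
    # First-order additive kernel: a sum of per-dimension RBF kernels,
    # the order-1 special case of interaction-limited kernels.
    return sum(rbf_1d(A[:, d], B[:, d]) for d in range(A.shape[1]))

lam = 0.1                                   # ridge regularisation (toy choice)
K = additive_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)   # kernel ridge solve
pred = K @ alpha
mse = float(np.mean((pred - y) ** 2))
print(f"train MSE: {mse:.4f}")
```

Replacing the plain sum with sums over products of the one-dimensional kernels (which the Girard-Newton formulae make tractable) is what raises the interaction order in salsa.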
salsa minimises the residual sum of squares with squared RKHS norm penalties. Algorithmically, it can be viewed as Kernel Ridge Regression with an additive kernel. When the regression function is additive, the excess risk is only polynomial in dimension. Using the Girard-Newton formulae, we efficiently sum over a combinatorial number of terms in the additive expansion. Via a comparison on 15 real datasets, we show that our method is competitive against 21 other alternatives. We propose an extension to Hawkes processes by treating the levels of self-excitation as a stochastic differential equation. Our new point process allows better approximation in application domains where events and intensities accelerate each other with correlated levels of contagion. We generalize a recent algorithm for simulating draws from Hawkes processes whose levels of excitation are stochastic processes, and propose a hybrid Markov chain Monte Carlo approach for model fitting. Our sampling procedure scales linearly with the number of required events and does not require stationarity of the point process. A modular inference procedure consisting of a combination of Gibbs and Metropolis-Hastings steps is put forward. We recover expectation maximization as a special case. Our general approach is illustrated for contagion following geometric Brownian motion and exponential Langevin dynamics. Rank aggregation systems collect ordinal preferences from individuals to produce a global ranking that represents the social preference. To reduce the computational complexity of learning the global ranking, a common practice is to use rank-breaking. Individuals' preferences are broken into pairwise comparisons and then applied to efficient algorithms tailored for independent pairwise comparisons. However, due to the ignored dependencies, naive rank-breaking approaches can result in inconsistent estimates.
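Naive "full" rank-breaking, which extracts every induced pair from each ranking and treats the pairs as if they were independent, looks like this (a sketch; the paper's point is that such pairs must be weighted according to the data topology to remain consistent):

```python
from itertools import combinations

def full_rank_breaking(ranking):
    """Break one ranking (best item first) into all induced pairwise
    comparisons, each as a (winner, loser) tuple. Naive rank-breaking
    then treats every extracted pair as an independent comparison."""
    return [(w, l) for w, l in combinations(ranking, 2)]

# Two individuals' ordinal preferences over items a, b, c:
pairs = full_rank_breaking(["a", "b", "c"]) + full_rank_breaking(["b", "a", "c"])
wins = {}
for winner, _loser in pairs:
    wins[winner] = wins.get(winner, 0) + 1
print(pairs)
print(wins)   # pairwise win counts per item
```

A ranking of m items yields m(m-1)/2 correlated pairs, so feeding them all into an estimator built for independent comparisons is exactly the source of the inconsistency the abstract describes.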
The key idea to produce unbiased and accurate estimates is to treat the paired comparison outcomes unequally, depending on the topology of the collected data. In this paper, we provide the optimal rank-breaking estimator, which not only achieves consistency but also achieves the best error bound. This allows us to characterize the fundamental tradeoff between accuracy and complexity in some canonical scenarios. Further, we identify how the accuracy depends on the spectral gap of a corresponding comparison graph. Dropout distillation. Samuel Rota Bulò, FBK; Lorenzo Porzi, FBK; Peter Kontschieder, Microsoft Research Cambridge. Paper Abstract: Dropout is a popular stochastic regularization technique for deep neural networks that works by randomly dropping (i.e. zeroing) units from the network during training. This randomization process makes it possible to implicitly train an ensemble of exponentially many networks sharing the same parametrization, which should be averaged at test time to deliver the final prediction. A typical workaround for this intractable averaging operation consists in scaling the layers undergoing dropout randomization. This simple rule, called 'standard dropout', is efficient, but might degrade the accuracy of the prediction. In this work we introduce a novel approach, coined 'dropout distillation', that allows us to train a predictor in a way that better approximates the intractable, but preferable, averaging process, while keeping its computational efficiency under control. We are thus able to construct models that are as efficient as standard dropout, or even more efficient, while being more accurate. Experiments on standard benchmark datasets demonstrate the validity of our method, yielding consistent improvements over conventional dropout. Metadata-conscious anonymous messaging. Giulia Fanti, UIUC; Peter Kairouz, UIUC; Sewoong Oh, UIUC; Kannan Ramchandran, UC Berkeley;
Pramod Viswanath, UIUC. Paper Abstract: Anonymous messaging platforms like Whisper and Yik Yak allow users to spread messages over a network (e.g. a social network) without revealing message authorship to other users. The spread of messages on these platforms can be modeled by a diffusion process over a graph. Recent advances in network analysis have revealed that such diffusion processes are vulnerable to author deanonymization by adversaries with access to metadata, such as timing information. In this work, we ask the fundamental question of how to propagate anonymous messages over a graph to make it difficult for adversaries to infer the source. In particular, we study the performance of a message propagation protocol called adaptive diffusion introduced in (Fanti et al. 2015). We prove that when the adversary has access to metadata at a fraction of corrupted graph nodes, adaptive diffusion achieves asymptotically optimal source-hiding and significantly outperforms standard diffusion. We further demonstrate empirically that adaptive diffusion hides the source effectively on real social networks. The Teaching Dimension of Linear Learners. Ji Liu, University of Rochester; Xiaojin Zhu, University of Wisconsin; Hrag Ohannessian, University of Wisconsin-Madison. Paper Abstract: Teaching dimension is a learning-theoretic quantity that specifies the minimum training set size needed to teach a target model to a learner. Previous studies on teaching dimension focused on version-space learners, which maintain all hypotheses consistent with the training data, and cannot be applied to modern machine learners, which select a specific hypothesis via optimization. This paper presents the first known teaching dimensions for ridge regression, support vector machines, and logistic regression. We also exhibit optimal training sets that match these teaching dimensions. Our approach generalizes to other linear learners. Truthful Univariate Estimators. Ioannis Caragiannis, University of Patras;
Ariel Procaccia, Carnegie Mellon University; Nisarg Shah, Carnegie Mellon University. Paper Abstract: We revisit the classic problem of estimating the population mean of an unknown single-dimensional distribution from samples, taking a game-theoretic viewpoint. In our setting, samples are supplied by strategic agents, who wish to pull the estimate as close as possible to their own value. In this setting, the sample mean gives rise to manipulation opportunities, whereas the sample median does not. Our key question is whether the sample median is the best (in terms of mean squared error) truthful estimator of the population mean. We show that when the underlying distribution is symmetric, there are truthful estimators that dominate the median. Our main result is a characterization of worst-case optimal truthful estimators, which provably outperform the median, for possibly asymmetric distributions with bounded support. Why Regularized Auto-Encoders Learn Sparse Representation. Devansh Arpit, SUNY Buffalo; Yingbo Zhou, SUNY Buffalo; Hung Ngo, SUNY Buffalo; Venu Govindaraju, SUNY Buffalo. Paper Abstract: Sparse distributed representation is the key to learning useful features in deep learning algorithms, because not only is it an efficient mode of data representation but, more importantly, it captures the generation process of most real-world data. While a number of regularized auto-encoders (AE) enforce sparsity explicitly in their learned representation and others don't, there has been little formal analysis of what encourages sparsity in these models in general. Our objective is to formally study this general problem for regularized auto-encoders. We provide sufficient conditions on both regularization and activation functions that encourage sparsity. We show that multiple popular models (e.g. de-noising and contractive auto-encoders) and activations (e.g. rectified linear and sigmoid)
satisfy these conditions; thus, our conditions help explain sparsity in their learned representations. Our theoretical and empirical analysis together shed light on the properties of regularization and activation that are conducive to sparsity, and unify a number of existing auto-encoder models and activation functions under the same analytical framework. k-variates++: more pluses in the k-means++. Richard Nock, Nicta & ANU; Raphael Canyasse, Ecole Polytechnique and The Technion; Roksana Boreli, Data61; Frank Nielsen, Ecole Polytechnique and Sony CS Labs Inc. Paper Abstract: k-means++ seeding has become a de facto standard for hard clustering algorithms. In this paper, our first contribution is a two-way generalisation of this seeding, k-variates++, that includes the sampling of general densities rather than just a discrete set of Dirac densities anchored at the point locations, and a generalisation of the well-known Arthur-Vassilvitskii (AV) approximation guarantee, in the form of an approximation bound of the optimum. This approximation exhibits a reduced dependency on the "noise" component with respect to the optimal potential, actually approaching the statistical lower bound. We show that k-variates++ yields efficient (biased seeding) clustering algorithms tailored to specific frameworks; these include distributed, streaming and on-line clustering, with approximation results for these algorithms. Finally, we present a novel application of k-variates++ to differential privacy. For either the specific frameworks considered here, or for the differential privacy setting, there are little to no prior results on the direct application of k-means and its approximation bounds; state-of-the-art contenders appear to be significantly more complex and/or display less favorable (approximation) properties. We stress that our algorithms can still be run in cases where there is no closed form solution for the population minimizer.
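For reference, the classical k-means++ seeding being generalised here (D² sampling over the data points, i.e. the Dirac-density special case) can be sketched as follows; the toy clusters are illustrative:

```python
import numpy as np

def kmeans_pp_seed(X, k, rng):
    """Classical k-means++ seeding: each new centre is a data point
    drawn with probability proportional to its squared distance to the
    nearest centre chosen so far. k-variates++ generalises this by
    sampling from general densities rather than the points themselves."""
    centres = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centres], axis=0)
        probs = d2 / d2.sum()                 # D^2 sampling weights
        centres.append(X[rng.choice(len(X), p=probs)])
    return np.array(centres)

rng = np.random.default_rng(0)
# Three well-separated toy clusters in the plane.
X = np.vstack([rng.normal(loc, 0.1, size=(30, 2))
               for loc in ([0, 0], [5, 5], [0, 5])])
centres = kmeans_pp_seed(X, 3, rng)
print(centres.shape)   # (3, 2)
```

Because far-away points are proportionally more likely to be chosen, the seeds tend to land one per cluster, which is what underlies the Arthur-Vassilvitskii approximation guarantee the abstract extends.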
We demonstrate the applicability of our analysis via experimental evaluation on several domains and settings, displaying competitive performances versus the state of the art. Multi-Player Bandits – a Musical Chairs Approach Jonathan Rosenski Weizmann Institute of Science . Ohad Shamir Weizmann Institute of Science . Liran Szlak Weizmann Institute of Science Paper Abstract: We consider a variant of the stochastic multi-armed bandit problem, where multiple players simultaneously choose from the same set of arms and may collide, receiving no reward. This setting has been motivated by problems arising in cognitive radio networks, and is especially challenging under the realistic assumption that communication between players is limited. We provide a communication-free algorithm (Musical Chairs) which attains constant regret with high probability, as well as a sublinear-regret, communication-free algorithm (Dynamic Musical Chairs) for the more difficult setting of players dynamically entering and leaving throughout the game. Moreover, both algorithms do not require prior knowledge of the number of players. To the best of our knowledge, these are the first communication-free algorithms with these types of formal guarantees. The Information Sieve Greg Ver Steeg Information Sciences Institute . Aram Galstyan Information Sciences Institute Paper Abstract: We introduce a new framework for unsupervised learning of representations based on a novel hierarchical decomposition of information. Intuitively, data is passed through a series of progressively fine-grained sieves. Each layer of the sieve recovers a single latent factor that is maximally informative about multivariate dependence in the data. The data is transformed after each pass so that the remaining unexplained information trickles down to the next layer. Ultimately, we are left with a set of latent factors explaining all the dependence in the original data and remainder information consisting of independent noise.
We present a practical implementation of this framework for discrete variables and apply it to a variety of fundamental tasks in unsupervised learning, including independent component analysis, lossy and lossless compression, and predicting missing values in data. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin Dario Amodei . Rishita Anubhai . Eric Battenberg . Carl Case . Jared Casper . Bryan Catanzaro . Jingdong Chen . Mike Chrzanowski Baidu USA, Inc. . Adam Coates . Greg Diamos Baidu USA, Inc. . Erich Elsen Baidu USA, Inc. . Jesse Engel . Linxi Fan . Christopher Fougner . Awni Hannun Baidu USA, Inc. . Billy Jun . Tony Han . Patrick LeGresley . Xiangang Li Baidu . Libby Lin . Sharan Narang . Andrew Ng . Sherjil Ozair . Ryan Prenger . Sheng Qian Baidu . Jonathan Raiman . Sanjeev Satheesh Baidu SVAIL . David Seetapun . Shubho Sengupta . Chong Wang . Yi Wang . Zhiqian Wang . Bo Xiao . Yan Xie Baidu . Dani Yogatama . Jun Zhan . Zhenyao Zhu Paper Abstract: We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech – two vastly different languages. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end learning allows us to handle a diverse variety of speech, including noisy environments, accents and different languages. Key to our approach is our application of HPC techniques, enabling experiments that previously took weeks to now run in days. This allows us to iterate more quickly to identify superior architectures and algorithms. As a result, in several cases, our system is competitive with the transcription of human workers when benchmarked on standard datasets. Finally, using a technique called Batch Dispatch with GPUs in the data center, we show that our system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.
An important question in feature selection is whether a selection strategy recovers the "true" set of features, given enough data. We study this question in the context of the popular Least Absolute Shrinkage and Selection Operator (Lasso) feature selection strategy. In particular, we consider the scenario where the model is misspecified, so that the learned model is linear while the underlying real target is nonlinear. Surprisingly, we prove that under certain conditions, Lasso is still able to recover the correct features in this case. We also carry out numerical studies to empirically verify the theoretical results and explore the necessity of the conditions under which the proof holds. We propose minimum regret search (MRS), a novel acquisition function for Bayesian optimization. MRS bears similarities with information-theoretic approaches such as entropy search (ES). However, while ES aims in each query at maximizing the information gain with respect to the global maximum, MRS aims at minimizing the expected simple regret of its ultimate recommendation for the optimum. While empirically ES and MRS perform similarly in most cases, MRS produces fewer outliers with high simple regret than ES. We provide empirical results both for a synthetic single-task optimization problem as well as for a simulated multi-task robotic control problem. CryptoNets: Applying Neural Networks to Encrypted Data with High Throughput and Accuracy Ran Gilad-Bachrach Microsoft Research . Nathan Dowlin Princeton . Kim Laine Microsoft Research . Kristin Lauter Microsoft Research . Michael Naehrig Microsoft Research . John Wernsing Microsoft Research Paper Abstract: Applying machine learning to a problem which involves medical, financial, or other types of sensitive data not only requires accurate predictions but also careful attention to maintaining data privacy and security. Legal and ethical requirements may prevent the use of cloud-based machine learning solutions for such tasks.
In this work, we will present a method to convert learned neural networks to CryptoNets, neural networks that can be applied to encrypted data. This allows a data owner to send their data in an encrypted form to a cloud service that hosts the network. The encryption ensures that the data remains confidential, since the cloud does not have access to the keys needed to decrypt it. Nevertheless, we will show that the cloud service is capable of applying the neural network to the encrypted data to make encrypted predictions, and also return them in encrypted form. These encrypted predictions can be sent back to the owner of the secret key, who can decrypt them. Therefore, the cloud service does not gain any information about the raw data nor about the predictions it made. We demonstrate CryptoNets on the MNIST optical character recognition task. CryptoNets achieve 99% accuracy and can make around 59000 predictions per hour on a single PC. Therefore, they allow high throughput, accurate, and private predictions. Spectral methods for dimensionality reduction and clustering require solving an eigenproblem defined by a sparse affinity matrix. When this matrix is large, one seeks an approximate solution. The standard way to do this is the Nystrom method, which first solves a small eigenproblem considering only a subset of landmark points, and then applies an out-of-sample formula to extrapolate the solution to the entire dataset. We show that by constraining the original problem to satisfy the Nystrom formula, we obtain an approximation that is computationally simple and efficient, but achieves a lower approximation error using fewer landmarks and less runtime. We also study the role of normalization in the computational cost and quality of the resulting solution. As a widely used non-linear activation, the Rectified Linear Unit (ReLU) separates noise and signal in a feature map by learning a threshold or bias.
However, we argue that the classification of noise and signal depends not only on the magnitude of responses, but also on the context of how the feature responses would be used to detect more abstract patterns in higher layers. In order to output multiple response maps with magnitudes in different ranges for a particular visual pattern, existing networks employing ReLU and its variants have to learn a large number of redundant filters. In this paper, we propose a multi-bias non-linear activation (MBA) layer to explore the information hidden in the magnitudes of responses. It is placed after the convolution layer to decouple the responses to a convolution kernel into multiple maps by multi-thresholding magnitudes, thus generating more patterns in the feature space at a low computational cost. It provides great flexibility in selecting responses to different visual patterns in different magnitude ranges to form rich representations in higher layers. Such a simple and yet effective scheme achieves state-of-the-art performance on several benchmarks. We propose a novel multi-task learning method that can minimize the effect of negative transfer by allowing asymmetric transfer between the tasks based on task relatedness as well as the amount of individual task losses, which we refer to as Asymmetric Multi-task Learning (AMTL). To tackle this problem, we couple multiple tasks via a sparse, directed regularization graph, which enforces each task parameter to be reconstructed as a sparse combination of other tasks, selected based on the task-wise loss. We present two different algorithms to solve this joint learning of the task predictors and the regularization graph. The first algorithm solves for the original learning objective using alternating optimization, and the second algorithm solves an approximation of it using a curriculum learning strategy that learns one task at a time.
We perform experiments on multiple datasets for classification and regression, on which we obtain significant improvements in performance over the single-task learning and symmetric multi-task learning baselines. This paper illustrates a novel approach to the estimation of the generalization error of decision tree classifiers. We set out the study of decision tree errors in the context of consistency analysis theory, which proved that the Bayes error can be achieved only when the number of data samples thrown into each leaf node goes to infinity. For the more challenging and practical case where the sample size is finite or small, a novel sampling error term is introduced in this paper to cope with the small-sample problem effectively and efficiently. Extensive experimental results show that the proposed error estimate is superior to the well-known K-fold cross validation methods in terms of robustness and accuracy. Moreover, it is orders of magnitude more efficient than cross validation methods. We study the convergence properties of the VR-PCA algorithm introduced by [cite] for fast computation of leading singular vectors. We prove several new results, including a formal analysis of a block version of the algorithm, and convergence from random initialization. We also make a few observations of independent interest, such as how pre-initializing with just a single exact power iteration can significantly improve the analysis, and what the convexity and non-convexity properties of the underlying optimization problem are. We consider the problem of principal component analysis (PCA) in a streaming stochastic setting, where our goal is to find a direction of approximate maximal variance, based on a stream of i.i.d. data points in R^d. A simple and computationally cheap algorithm for this is stochastic gradient descent (SGD), which incrementally updates its estimate based on each new data point.
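The incremental SGD update just described is Oja-style: on each incoming point x, move the current direction w toward x(x·w) and renormalize. A minimal sketch, with a made-up toy stream and step size chosen purely for illustration:

```python
# Stochastic gradient ascent for streaming PCA (Oja's rule):
# w <- normalize(w + eta * x * (x . w)) for each incoming point x.
# With most variance along the first axis, w should align with (1, 0).

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = sum(a * a for a in v) ** 0.5
    return [a / n for a in v]

def oja_step(w, x, eta):
    c = dot(x, w)
    return normalize([wi + eta * xi * c for wi, xi in zip(w, x)])

# Toy stream: points scattered mainly along the first axis.
stream = [[2.0, 0.3], [-1.8, 0.1], [2.2, -0.2], [-2.1, -0.3],
          [1.9, 0.2], [-2.0, 0.4]] * 50

w = normalize([1.0, 1.0])  # arbitrary starting direction
for x in stream:
    w = oja_step(w, x, eta=0.05)

print(abs(w[0]))  # close to 1: w aligned with the dominant direction
```

Each step touches a single data point, which is what makes the method cheap in the streaming setting; the analytical difficulty discussed next comes from the update being non-convex in w.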
However, due to the non-convex nature of the problem, analyzing its performance has been a challenge. In particular, existing guarantees rely on a non-trivial eigengap assumption on the covariance matrix, which is intuitively unnecessary. In this paper, we provide (to the best of our knowledge) the first eigengap-free convergence guarantees for SGD in the context of PCA. This also partially resolves an open problem posed in [cite]. Moreover, under an eigengap assumption, we show that the same techniques lead to new SGD convergence guarantees with better dependence on the eigengap. Dealbreaker: A Nonlinear Latent Variable Model for Educational Data Andrew Lan Rice University . Tom Goldstein University of Maryland . Richard Baraniuk Rice University . Christoph Studer Cornell University Paper Abstract: Statistical models of student responses on assessment questions, such as those in homeworks and exams, enable educators and computer-based personalized learning systems to gain insights into students' knowledge using machine learning. Popular student-response models, including the Rasch model and item response theory models, represent the probability of a student answering a question correctly using an affine function of latent factors. While such models can accurately predict student responses, their ability to interpret the underlying knowledge structure (which is certainly nonlinear) is limited. In response, we develop a new, nonlinear latent variable model that we call the dealbreaker model, in which a student's success probability is determined by their weakest concept mastery. We develop efficient parameter inference algorithms for this model using novel methods for nonconvex optimization. We show that the dealbreaker model achieves comparable or better prediction performance than affine models on real-world educational datasets.
We further demonstrate that the parameters learned by the dealbreaker model are interpretable: they provide key insights into which concepts are critical (i.e., the dealbreaker) to answering a question correctly. We conclude by reporting preliminary results for a movie-rating dataset, which illustrate the broader applicability of the dealbreaker model. We derive a new discrepancy statistic for measuring differences between two probability distributions, based on combining Stein's identity and reproducing kernel Hilbert space theory. We apply our result to test how well a probabilistic model fits a set of observations, and derive a new class of powerful goodness-of-fit tests that are widely applicable for complex and high-dimensional distributions, even those with computationally intractable normalization constants. Both theoretical and empirical properties of our methods are studied thoroughly. Variable Elimination in the Fourier Domain Yexiang Xue Cornell University . Stefano Ermon . Ronan Le Bras Cornell University . Carla . Bart Paper Abstract: The ability to represent complex high-dimensional probability distributions in a compact form is one of the key insights in the field of graphical models. Factored representations are ubiquitous in machine learning and lead to major computational advantages. We explore a different type of compact representation based on discrete Fourier representations, complementing the classical approach based on conditional independencies. We show that a large class of probabilistic graphical models have a compact Fourier representation. This theoretical result opens up an entirely new way of approximating a probability distribution. We demonstrate the significance of this approach by applying it to the variable elimination algorithm. Compared with the traditional bucket representation and other approximate inference algorithms, we obtain significant improvements.
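As a toy illustration of the discrete Fourier representations the last abstract refers to (this example is ours, not from the paper): a function over n binary variables has one Fourier coefficient per subset of variables, and structured functions concentrate their weight on very few coefficients, which is what makes a compact representation possible. A brute-force transform over 3 variables:

```python
from itertools import product

# Walsh-Hadamard (discrete Fourier) coefficients of f : {0,1}^n -> R:
#   fhat(S) = 2^-n * sum_x f(x) * (-1)^{sum of x_i for i in S}.
# Highly structured functions, like parity, have very sparse spectra.

def fourier_coeffs(f, n):
    coeffs = {}
    for S in product([0, 1], repeat=n):  # subset S as an indicator vector
        total = 0.0
        for x in product([0, 1], repeat=n):
            sign = (-1) ** sum(xi for xi, si in zip(x, S) if si)
            total += f(x) * sign
        coeffs[S] = total / 2 ** n
    return coeffs

# Parity of 3 bits, in the +/-1 convention.
parity = lambda x: (-1) ** sum(x)

c = fourier_coeffs(parity, 3)
nonzero = {S for S, v in c.items() if abs(v) > 1e-9}
print(nonzero)  # only the full subset (1, 1, 1) carries weight
```

Parity needs 2^3 table entries in the factored world but a single nonzero Fourier coefficient, the kind of compression the abstract exploits inside variable elimination.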
Low-rank matrix approximation has been widely adopted in machine learning applications with sparse data, such as recommender systems. However, the sparsity of the data, which is incomplete and noisy, introduces challenges to algorithm stability – small changes in the training data may significantly change the models. As a result, existing low-rank matrix approximation solutions yield low generalization performance, exhibiting high error variance on the training dataset, and minimizing the training error may not guarantee error reduction on the testing dataset. In this paper, we investigate the algorithm stability problem of low-rank matrix approximations. We present a new algorithm design framework, which (1) introduces new optimization objectives to guide stable matrix approximation algorithm design, and (2) solves the optimization problem to obtain stable low-rank approximation solutions with good generalization performance. Experimental results on real-world datasets demonstrate that the proposed work achieves better prediction accuracy than both state-of-the-art low-rank matrix approximation methods and ensemble methods in recommendation tasks. Given samples from two densities p and q, density ratio estimation (DRE) is the problem of estimating the ratio p/q. Two popular discriminative approaches to DRE are KL importance estimation (KLIEP), and least squares importance fitting (LSIF). In this paper, we show that KLIEP and LSIF both employ class-probability estimation (CPE) losses. Motivated by this, we formally relate DRE and CPE, and demonstrate the viability of using existing losses from one problem for the other. For the DRE problem, we show that essentially any CPE loss (e.g. logistic, exponential) can be used, as this equivalently minimises a Bregman divergence to the true density ratio. We show how different losses focus on accurately modelling different ranges of the density ratio, and use this to design new CPE losses for DRE.
For the CPE problem, we argue that the LSIF loss is useful in the regime where one wishes to rank instances with maximal accuracy at the head of the ranking. In the course of our analysis, we establish a Bregman divergence identity that may be of independent interest. We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient (SVRG) methods for them. SVRG and related methods have recently surged into prominence for convex optimization given their edge over stochastic gradient descent (SGD), but their theoretical analysis almost exclusively assumes convexity. In contrast, we prove non-asymptotic rates of convergence (to stationary points) of SVRG for nonconvex optimization, and show that it is provably faster than SGD and gradient descent. We also analyze a subclass of nonconvex problems on which SVRG attains linear convergence to the global optimum. We extend our analysis to mini-batch variants of SVRG, showing (theoretical) linear speedup due to minibatching in parallel settings. Hierarchical Variational Models Rajesh Ranganath . Dustin Tran Columbia University . David Blei Columbia Paper Abstract: Black box variational inference allows researchers to easily prototype and evaluate an array of models. Recent advances allow such algorithms to scale to high dimensions. However, a central question remains: how to specify an expressive variational distribution that maintains efficient computation? To address this, we develop hierarchical variational models (HVMs). HVMs augment a variational approximation with a prior on its parameters, which allows it to capture complex structure for both discrete and continuous latent variables. The algorithm we develop is black box, can be used for any HVM, and has the same computational efficiency as the original approximation. We study HVMs on a variety of deep discrete latent variable models. HVMs generalize other expressive variational distributions and maintain higher fidelity to the posterior.
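The variance-reduced gradient used by SVRG in the finite-sum abstract above replaces the plain stochastic gradient g_i(w) with g_i(w) - g_i(w_snap) + full_grad(w_snap), where w_snap is a periodically refreshed snapshot. A minimal sketch on a convex toy problem (1-D least squares; the data, step size, and epoch count are made up for illustration):

```python
import random

# SVRG on F(w) = 1/n * sum_i 0.5 * (w * x_i - y_i)^2.
# The inner updates use g_i(w) - g_i(w_snap) + full_grad(w_snap),
# an unbiased gradient estimate whose variance shrinks near the optimum.

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.0]  # roughly y = 2x

def grad_i(w, i):
    return (w * xs[i] - ys[i]) * xs[i]

def full_grad(w):
    return sum(grad_i(w, i) for i in range(len(xs))) / len(xs)

random.seed(0)
w = 0.0
for epoch in range(30):
    w_snap = w
    mu = full_grad(w_snap)          # one full gradient per epoch
    for _ in range(len(xs)):
        i = random.randrange(len(xs))
        g = grad_i(w, i) - grad_i(w_snap, i) + mu
        w -= 0.05 * g

print(round(w, 2))  # near the least-squares solution (about 2.02)
```

Because the correction term vanishes as w approaches w_snap, the iterates converge with a constant step size, which plain SGD cannot do; the abstract's contribution is extending this kind of guarantee to nonconvex objectives.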
The field of mobile health (mHealth) has the potential to yield new insights into health and behavior through the analysis of continuously recorded data from wearable health and activity sensors. In this paper, we present a hierarchical span-based conditional random field model for the key problem of jointly detecting discrete events in such sensor data streams and segmenting these events into high-level activity sessions. Our model includes higher-order cardinality factors and inter-event duration factors to capture domain-specific structure in the label space. We show that our model supports exact MAP inference in quadratic time via dynamic programming, which we leverage to perform learning in the structured support vector machine framework. We apply the model to the problems of smoking and eating detection using four real data sets. Our results show statistically significant improvements in segmentation performance relative to a hierarchical pairwise CRF. Binary embeddings with structured hashed projections Anna Choromanska Courant Institute, NYU . Krzysztof Choromanski Google Research NYC . Mariusz Bojarski NVIDIA . Tony Jebara Columbia . Sanjiv Kumar . Yann Paper Abstract: We consider the hashing mechanism for constructing binary embeddings, which involves pseudo-random projections followed by nonlinear (sign function) mappings. The pseudo-random projection is described by a matrix where not all entries are independent random variables; instead, a fixed budget of randomness is distributed across the matrix. Such matrices can be efficiently stored in sub-quadratic or even linear space, provide a reduction in randomness usage (i.e., the number of required random values), and very often lead to computational speed-ups. We prove several theoretical results showing that projections via various structured matrices followed by nonlinear mappings accurately preserve the angular distance between input high-dimensional vectors.
To the best of our knowledge, these results are the first that give theoretical grounds for the use of general structured matrices in the nonlinear setting. In particular, they generalize previous extensions of the Johnson-Lindenstrauss lemma and prove the plausibility of the approach that was so far only heuristically confirmed for some special structured matrices. Consequently, we show that many structured matrices can be used as an efficient information compression mechanism. Our findings build a better understanding of certain deep architectures, which contain randomly weighted and untrained layers, and yet achieve high performance on different learning tasks. We empirically verify our theoretical findings and show the dependence of learning via structured hashed projections on the performance of neural network as well as nearest neighbor classifiers. A Variational Analysis of Stochastic Gradient Algorithms Stephan Mandt Columbia University . Matthew Hoffman Adobe Research . David Blei Columbia Paper Abstract: Stochastic Gradient Descent (SGD) is an important algorithm in machine learning. With constant learning rates, it is a stochastic process that, after an initial phase of convergence, generates samples from a stationary distribution. We show that SGD with constant rates can be effectively used as an approximate posterior inference algorithm for probabilistic modeling. Specifically, we show how to adjust the tuning parameters of SGD so as to match the resulting stationary distribution to the posterior. This analysis rests on interpreting SGD as a continuous-time stochastic process and then minimizing the Kullback-Leibler divergence between its stationary distribution and the target posterior. (This is in the spirit of variational inference.) In more detail, we model SGD as a multivariate Ornstein-Uhlenbeck process and then use properties of this process to derive the optimal parameters.
This theoretical framework also connects SGD to modern scalable inference algorithms; we analyze the recently proposed stochastic gradient Fisher scoring under this perspective. We demonstrate that SGD with properly chosen constant rates gives a new way to optimize hyperparameters in probabilistic models. This paper proposes a new mechanism for sampling training instances for stochastic gradient descent (SGD) methods by exploiting any side-information associated with the instances (e.g. class-labels) to improve convergence. Previous methods have relied on sampling either from a distribution defined over training instances or from a static distribution fixed before training. This results in two problems: (a) any distribution that is set a priori is independent of how the optimization progresses, and (b) maintaining a distribution over individual instances could be infeasible in large-scale scenarios. In this paper, we exploit the side information associated with the instances to tackle both problems. More specifically, we maintain a distribution over classes (instead of individual instances) that is adaptively estimated during the course of optimization to give the maximum reduction in the variance of the gradient. Intuitively, we sample more from those regions in space that make a larger gradient contribution. Our experiments on highly multiclass datasets show that our proposal converges significantly faster than existing techniques. Tensor regression has been shown to be advantageous in learning tasks with multi-directional relatedness. Given massive multiway data, traditional methods are often too slow to operate on or suffer from memory bottlenecks. In this paper, we introduce subsampled tensor projected gradient to solve the problem. Our algorithm is impressively simple and efficient. It is built upon the projected gradient method with fast tensor power iterations, leveraging randomized sketching for further acceleration.
Theoretical analysis shows that our algorithm converges to the correct solution in a fixed number of iterations. The memory requirement grows linearly with the size of the problem. We demonstrate superior empirical performance on both multi-linear multi-task learning and spatio-temporal applications. This paper presents a novel distributed variational inference framework that unifies many parallel sparse Gaussian process regression (SGPR) models for scalable hyperparameter learning with big data. To achieve this, our framework exploits a structure of correlated noise process model that represents the observation noises as a finite realization of a high-order Gaussian Markov random process. By varying the Markov order and covariance function for the noise process model, different variational SGPR models result. This consequently allows the correlation structure of the noise process model to be characterized for which a particular variational SGPR model is optimal. We empirically evaluate the predictive performance and scalability of the distributed variational SGPR models unified by our framework on two real-world datasets. Online Stochastic Linear Optimization under One-bit Feedback Lijun Zhang Nanjing University . Tianbao Yang University of Iowa . Rong Jin Alibaba Group . Yichi Xiao Nanjing University . Zhi-Hua Zhou Paper Abstract: In this paper, we study a special bandit setting of online stochastic linear optimization, where only one bit of information is revealed to the learner at each round. This problem has found many applications, including online advertisement and online recommendation. We assume the binary feedback is a random variable generated from the logit model, and aim to minimize the regret defined by the unknown linear function. Although the existing method for generalized linear bandits can be applied to our problem, its high computational cost makes it impractical for real-world applications.
To address this challenge, we develop an efficient online learning algorithm by exploiting particular structures of the observation model. Specifically, we adopt online Newton steps to estimate the unknown parameter and derive a tight confidence region based on the exponential concavity of the logistic loss. Our analysis shows that the proposed algorithm achieves a regret bound of O(d*sqrt(T)), which matches the optimal result of stochastic linear bandits. We present an adaptive online gradient descent algorithm to solve online convex optimization problems with long-term constraints, which are constraints that need to be satisfied when accumulated over a finite number of rounds T, but can be violated in intermediate rounds. For some user-defined trade-off parameter beta in (0, 1), the proposed algorithm achieves cumulative regret bounds of O(T^max{beta, 1-beta}) and O(T^(1-beta/2)), respectively, for the loss and the constraint violations. Our results hold for convex losses, can handle arbitrary convex constraints, and rely on a single computationally efficient algorithm. Our contributions improve over the best known cumulative regret bounds of Mahdavi et al. (2012), which are respectively O(T^(1/2)) and O(T^(3/4)) for general convex domains, and respectively O(T^(2/3)) and O(T^(2/3)) when the domain is further restricted to be a polyhedral set. We supplement the analysis with experiments validating the performance of our algorithm in practice. Motivated by an application of eliciting users' preferences, we investigate the problem of learning hemimetrics, i.e., pairwise distances among a set of n items that satisfy triangle inequalities and non-negativity constraints. In our application, the (asymmetric) distances quantify private costs a user incurs when substituting one item by another. We aim to learn these distances (costs) by asking the users whether they are willing to switch from one item to another for a given incentive offer.
Without exploiting structural constraints of the hemimetric polytope, learning the distances between each pair of items requires Theta(n^2) queries. We propose an active learning algorithm that substantially reduces this sample complexity by exploiting the structural constraints on the version space of hemimetrics. Our proposed algorithm achieves provably-optimal sample complexity for various instances of the task. For example, when the items are embedded into K tight clusters, the sample complexity of our algorithm reduces to O(nK). Extensive experiments on a restaurant recommendation data set support the conclusions of our theoretical analysis. We present an approach for learning simple algorithms such as copying, multi-digit addition and single-digit multiplication directly from examples. Our framework consists of a set of interfaces, accessed by a controller. Typical interfaces are 1-D tapes or 2-D grids that hold the input and output data. For the controller, we explore a range of neural network-based models which vary in their ability to abstract the underlying algorithm from training instances and generalize to test examples with many thousands of digits. The controller is trained using Q-learning with several enhancements, and we show that the bottleneck is in the capabilities of the controller rather than in the search incurred by Q-learning. Learning Physical Intuition of Block Towers by Example Adam Lerer Facebook AI Research . Sam Gross Facebook AI Research . Rob Fergus Facebook AI Research Paper Abstract: Wooden blocks are a common toy for infants, allowing them to develop motor skills and gain intuition about the physical behavior of the world. In this paper, we explore the ability of deep feed-forward models to learn such intuitive physics. Using a 3D game engine, we create small towers of wooden blocks whose stability is randomized and render them collapsing (or remaining upright).
This data allows us to train large convolutional network models which can accurately predict the outcome, as well as estimate the trajectories of the blocks. The models are also able to generalize in two important ways: (i) to new physical scenarios, e.g. towers with an additional block, and (ii) to images of real wooden blocks, where they obtain a performance comparable to human subjects. Structure Learning of Partitioned Markov Networks Song Liu The Institute of Statistical Mathematics . Taiji Suzuki . Masashi Sugiyama University of Tokyo . Kenji Fukumizu The Institute of Statistical Mathematics Paper Abstract: We learn the structure of a Markov Network between two groups of random variables from joint observations. Since modelling and learning the full MN structure may be hard, learning the links between the two groups directly may be a preferable option. We introduce a novel concept called the partitioned ratio, whose factorization directly associates with the Markovian properties of random variables across the two groups. A simple one-shot convex optimization procedure is proposed for learning the factorizations of the partitioned ratio, and it is theoretically guaranteed to recover the correct inter-group structure under mild conditions. The performance of the proposed method is experimentally compared with state-of-the-art MN structure learning methods using ROC curves. Real applications on analyzing bipartisanship in the US congress and pairwise DNA/time-series alignments are also reported. This work focuses on the dynamic regret of online convex optimization, which compares the performance of online learning to a clairvoyant who knows the sequence of loss functions in advance and hence selects the minimizer of the loss function at each step. By assuming that the clairvoyant moves slowly (i.e.,
the minimizers change slowly), we present several improved variation-based upper bounds of the dynamic regret under the true and noisy gradient feedback, which are optimal in light of the presented lower bounds. The key to our analysis is to explore a regularity metric that measures the temporal changes in the clairvoyant's minimizers, to which we refer as path variation. Firstly, we present a general lower bound in terms of the path variation, and then show that under full information or gradient feedback we are able to achieve an optimal dynamic regret. Secondly, we present a lower bound with noisy gradient feedback and then show that we can achieve optimal dynamic regrets under a stochastic gradient feedback and two-point bandit feedback. Moreover, for a sequence of smooth loss functions that admit a small variation in the gradients, our dynamic regret under the two-point bandit feedback matches that achieved with full information. Beyond CCA: Moment Matching for Multi-View Models Anastasia Podosinnikova INRIA - ENS . Francis Bach Inria . Simon Lacoste-Julien INRIA Paper Abstract: We introduce three novel semi-parametric extensions of probabilistic canonical correlation analysis with identifiability guarantees. We consider moment matching techniques for estimation in these models. For that, by drawing explicit links between the new models and a discrete version of independent component analysis (DICA), we first extend the DICA cumulant tensors to the new discrete version of CCA. By further using a close connection with independent component analysis, we introduce generalized covariance matrices, which can replace the cumulant tensors in the moment matching framework and, therefore, improve sample complexity and simplify derivations and algorithms significantly. As the tensor power method or orthogonal joint diagonalization are not applicable in the new setting, we use non-orthogonal joint diagonalization techniques for matching the cumulants.
We demonstrate performance of the proposed models and estimation techniques on experiments with both synthetic and real datasets. We present two computationally inexpensive techniques for estimating the numerical rank of a matrix, combining powerful tools from computational linear algebra. These techniques exploit three key ingredients. The first is to approximate the projector on the non-null invariant subspace of the matrix by using a polynomial filter. Two types of filters are discussed, one based on Hermite interpolation and the other based on Chebyshev expansions. The second ingredient employs stochastic trace estimators to compute the trace of this eigen-projector, which equals the desired rank of the matrix. In order to obtain a good filter, it is necessary to detect a gap between the eigenvalues that correspond to noise and the relevant eigenvalues that correspond to the non-null invariant subspace. The third ingredient of the proposed approaches exploits the idea of spectral density, popular in physics, and the Lanczos spectroscopic method to locate this gap. Unsupervised Deep Embedding for Clustering Analysis Junyuan Xie University of Washington . Ross Girshick Facebook . Ali Farhadi University of Washington Paper Abstract: Clustering is central to many data-driven application domains and has been studied extensively in terms of distance functions and grouping algorithms. Relatively little work has focused on learning representations for clustering. In this paper, we propose Deep Embedded Clustering (DEC), a method that simultaneously learns feature representations and cluster assignments using deep neural networks. DEC learns a mapping from the data space to a lower-dimensional feature space in which it iteratively optimizes a clustering objective. Our experimental evaluations on image and text corpora show significant improvement over state-of-the-art methods.
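The DEC objective just described alternates between computing soft cluster assignments and sharpening them toward a target distribution. A minimal one-dimensional sketch of that alternation in plain Python, using a Student's-t kernel of the kind the paper describes (the data points and centroids below are illustrative, not from the paper):

```python
def soft_assign(xs, centroids, alpha=1.0):
    """Student's-t soft assignment q_ij of point i to centroid j."""
    Q = []
    for x in xs:
        w = [(1.0 + (x - c) ** 2 / alpha) ** (-(alpha + 1) / 2) for c in centroids]
        s = sum(w)
        Q.append([wi / s for wi in w])
    return Q

def target_distribution(Q):
    """Sharpened targets p_ij proportional to q_ij^2 / f_j, f_j = sum_i q_ij."""
    f = [sum(row[j] for row in Q) for j in range(len(Q[0]))]
    P = []
    for row in Q:
        w = [row[j] ** 2 / f[j] for j in range(len(row))]
        s = sum(w)
        P.append([wi / s for wi in w])
    return P

# Two well-separated 1-D "clusters" and their (assumed known) centroids.
xs = [0.0, 0.1, -0.1, 5.0, 5.2, 4.9]
centroids = [0.0, 5.0]
Q = soft_assign(xs, centroids)
P = target_distribution(Q)  # assignments become more confident
```

Iterating this sharpening (and, in the actual method, backpropagating the KL divergence between P and Q through the encoder) is what drives the representation toward cluster-friendly structure.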
Dimensionality reduction is a popular approach for dealing with high dimensional data that leads to substantial computational savings. Random projections are a simple and effective method for universal dimensionality reduction with rigorous theoretical guarantees. In this paper, we theoretically study the problem of differentially private empirical risk minimization in the projected subspace (compressed domain). Empirical risk minimization (ERM) is a fundamental technique in statistical machine learning that forms the basis for various learning algorithms. Starting from the results of Chaudhuri et al. (NIPS 2009, JMLR 2011), there is a long line of work in designing differentially private algorithms for empirical risk minimization problems that operate in the original data space. We ask: is it possible to design differentially private algorithms with small excess risk given access to only projected data? In this paper, we answer this question in the affirmative, by showing that for the class of generalized linear functions, we can obtain excess risk bounds of O(w(Theta) n ) under eps-differential privacy, and O((w(Theta)n) ) under (eps, delta)-differential privacy, given only the projected data and the projection matrix. Here n is the sample size and w(Theta) is the Gaussian width of the parameter space that we optimize over. Our strategy is based on adding noise for privacy in the projected subspace and then lifting the solution to the original space by using high-dimensional estimation techniques. A simple consequence of these results is that, for a large class of ERM problems, in the traditional setting (i.e. with access to the original data), under eps-differential privacy, we improve the worst-case risk bounds of Bassily et al. (FOCS 2014). We consider the maximum likelihood parameter estimation problem for a generalized Thurstone choice model, where choices are from comparison sets of two or more items.
We provide tight characterizations of the mean square error, as well as necessary and sufficient conditions for correct classification when each item belongs to one of two classes. These results provide insights into how the estimation accuracy depends on the choice of a generalized Thurstone choice model and the structure of comparison sets. We find that for a priori unbiased structures of comparisons, e.g. when comparison sets are drawn independently and uniformly at random, the number of observations needed to achieve a prescribed estimation accuracy depends on the choice of a generalized Thurstone choice model. For a broad set of generalized Thurstone choice models, which includes all popular instances used in practice, the estimation error is shown to be largely insensitive to the cardinality of comparison sets. On the other hand, we find that there exist generalized Thurstone choice models for which the estimation error decreases much faster with the cardinality of comparison sets. Large-Margin Softmax Loss for Convolutional Neural Networks Weiyang Liu Peking University . Yandong Wen South China University of Technology . Zhiding Yu Carnegie Mellon University . Meng Yang Shenzhen University Paper Abstract: Cross-entropy loss together with softmax is arguably one of the most commonly used supervision components in convolutional neural networks (CNNs). Despite its simplicity, popularity and excellent performance, the component does not explicitly encourage discriminative learning of features. In this paper, we propose a generalized large-margin softmax (L-Softmax) loss which explicitly encourages intra-class compactness and inter-class separability between learned features. Moreover, L-Softmax not only can adjust the desired margin but also can avoid overfitting. We also show that the L-Softmax loss can be optimized by typical stochastic gradient descent.
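For reference, the plain softmax cross-entropy loss that L-Softmax generalizes can be written in a few lines. This is the standard formulation, not the authors' margin-augmented variant, and the logits below are made up for illustration:

```python
import math

def softmax(logits):
    # subtract the max for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, target):
    """Negative log-probability of the target class under softmax."""
    return -math.log(softmax(logits)[target])

confident = cross_entropy([4.0, 0.5, 0.2], target=0)  # correct class scored high
uncertain = cross_entropy([1.0, 0.9, 0.8], target=0)  # classes barely separated
```

L-Softmax modifies the logit of the target class with an angular margin so that merely separating the classes, as in the second case, is no longer enough to drive the loss down.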
Extensive experiments on four benchmark datasets demonstrate that the deeply-learned features with L-Softmax loss become more discriminative, hence significantly boosting the performance on a variety of visual classification and verification tasks. A Random Matrix Approach to Echo-State Neural Networks Romain Couillet CentraleSupelec . Gilles Wainrib ENS Ulm, Paris, France . Hafiz Tiomoko Ali CentraleSupelec, Gif-sur-Yvette, France . Harry Sevi ENS Lyon, Lyon, France Paper Abstract: Recurrent neural networks, especially in their linear version, have provided many qualitative insights on their performance under different configurations. This article provides, through a novel random matrix framework, the quantitative counterpart of these performance results, specifically in the case of echo-state networks. Beyond mere insights, our approach conveys a deeper understanding of the core mechanism at play in both training and testing. One-hot CNN (convolutional neural network) has been shown to be effective for text categorization (Johnson & Zhang, 2015). We view it as a special case of a general framework which jointly trains a linear model with a non-linear feature generator consisting of 'text region embedding + pooling'. Under this framework, we explore a more sophisticated region embedding method using Long Short-Term Memory (LSTM). LSTM can embed text regions of variable (and possibly large) sizes, whereas the region size needs to be fixed in a CNN. We seek effective and efficient use of LSTM for this purpose in the supervised and semi-supervised settings. The best results were obtained by combining region embeddings in the form of LSTM and convolution layers trained on unlabeled data. The results indicate that on this task, embeddings of text regions, which can convey complex concepts, are more useful than embeddings of single words in isolation. We report performances exceeding the previous best results on four benchmark datasets.
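The 'text region embedding + pooling' framework described above can be illustrated with a toy region embedder followed by coordinate-wise max-pooling. The deterministic hashing embedder below is a stand-in for illustration only, not the paper's LSTM or CNN embedder:

```python
def _bucket(word, dim):
    # deterministic toy hash: sum of character codes modulo the vector size
    return sum(ord(c) for c in word) % dim

def embed_region(words, dim=8):
    """Toy region embedding: a fixed-size bag-of-words vector for one text region."""
    v = [0.0] * dim
    for w in words:
        v[_bucket(w, dim)] += 1.0
    return v

def region_embed_and_pool(tokens, region_size=3, dim=8):
    """Embed consecutive text regions, then max-pool coordinate-wise over regions."""
    regions = [tokens[i:i + region_size] for i in range(0, len(tokens), region_size)]
    embedded = [embed_region(r, dim) for r in regions]
    return [max(col) for col in zip(*embedded)]

features = region_embed_and_pool("the movie was great and the acting superb".split())
```

Swapping `embed_region` for a learned LSTM over each region, while keeping the pooling and the linear model on top, recovers the shape of the framework the abstract describes.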
Crowdsourcing systems are popular for solving large-scale labelling tasks with low-paid (or even unpaid) workers. We study the problem of recovering the true labels from noisy crowdsourced labels under the popular Dawid-Skene model. To address this inference problem, several algorithms have recently been proposed, but the best known guarantee is still significantly larger than the fundamental limit. We close this gap under a simple but canonical scenario where each worker is assigned at most two tasks. In particular, we introduce a tighter lower bound on the fundamental limit and prove that Belief Propagation (BP) exactly matches this lower bound. The guaranteed optimality of BP is the strongest in the sense that it is information-theoretically impossible for any other algorithm to correctly label a larger fraction of the tasks. In the general setting, when more than two tasks are assigned to each worker, we establish that BP dominates the other existing algorithms with known provable guarantees. Experimental results suggest that BP is close to optimal for all regimes considered, while existing state-of-the-art algorithms exhibit suboptimal performances. Learning control has become an appealing alternative to the derivation of control laws based on classic control theory. However, a major shortcoming of learning control is the lack of performance guarantees which prevents its application in many real-world scenarios. As a step in this direction, we provide a stability analysis tool for controllers acting on dynamics represented by Gaussian processes (GPs). We consider arbitrary Markovian control policies and system dynamics given as (i) the mean of a GP, and (ii) the full GP distribution. For the first case, our tool finds a state space region where the closed-loop system is provably stable. In the second case, it is well known that infinite horizon stability guarantees cannot exist. Instead, our tool analyzes finite time stability.
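In the spirit of the tool's first setting (certifying stability of the mean dynamics over a state-space region), here is a toy check that a quadratic Lyapunov function decreases along a simple one-dimensional closed-loop map. The contraction dynamics and the grid-based check are illustrative stand-ins, not the authors' GP-based tool:

```python
def closed_loop_mean(x):
    # stand-in for the posterior mean of the closed-loop dynamics
    return 0.8 * x

def lyapunov(x):
    # candidate Lyapunov function V(x) = x^2
    return x * x

def certify_region(f, V, grid):
    """True if V strictly decreases at every sampled nonzero state in the region."""
    return all(V(f(x)) < V(x) for x in grid if x != 0.0)

grid = [i / 10.0 for i in range(-20, 21)]  # candidate region [-2, 2]
stable = certify_region(closed_loop_mean, lyapunov, grid)
```

A real certificate needs more than pointwise grid checks (e.g. Lipschitz arguments between grid points), which is precisely the kind of gap the paper's analysis handles rigorously.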
Empirical evaluations on simulated benchmark problems support our theoretical results. Learning a classifier from private data distributed across multiple parties is an important problem that has many potential applications. How can we build an accurate and differentially private global classifier by combining locally-trained classifiers from different parties, without access to any party's private data? We propose to transfer the knowledge of the local classifier ensemble by first creating labeled data from auxiliary unlabeled data, and then training a global differentially private classifier. We show that majority voting is too sensitive and therefore propose a new risk weighted by class probabilities estimated from the ensemble. Relative to a non-private solution, our private solution has a generalization error bounded by O(epsilon^-2 M^-2). This allows strong privacy without performance loss when the number of participating parties M is large, such as in crowdsensing applications. We demonstrate the performance of our framework with realistic tasks of activity recognition, network intrusion detection, and malicious URL detection. Network Morphism Tao Wei University at Buffalo . Changhu Wang Microsoft Research . Yong Rui Microsoft Research . Chang Wen Chen . Paper Abstract: We present a systematic study on how to morph a well-trained neural network into a new one so that its network function can be completely preserved. We define this as network morphism in this research. After morphing a parent network, the child network is expected to inherit the knowledge from its parent network and also to have the potential to continue growing into a more powerful one with much shortened training time. The first requirement for this network morphism is its ability to handle diverse morphing types of networks, including changes of depth, width, kernel size, and even subnet.
To meet this requirement, we first introduce the network morphism equations, and then develop novel morphing algorithms for all these morphing types for both classic and convolutional neural networks. The second requirement is its ability to deal with non-linearity in a network. We propose a family of parametric-activation functions to facilitate the morphing of any continuous non-linear activation neurons. Experimental results on benchmark datasets and typical neural networks demonstrate the effectiveness of the proposed network morphism scheme. Second-order optimization methods such as natural gradient descent have the potential to speed up training of neural networks by correcting for the curvature of the loss function. Unfortunately, the exact natural gradient is impractical to compute for large models, and most approximations either require an expensive iterative procedure or make crude approximations to the curvature. We present Kronecker Factors for Convolution (KFC), a tractable approximation to the Fisher matrix for convolutional networks based on a structured probabilistic model for the distribution over backpropagated derivatives. Similarly to the recently proposed Kronecker-Factored Approximate Curvature (K-FAC), each block of the approximate Fisher matrix decomposes as the Kronecker product of small matrices, allowing for efficient inversion. KFC captures important curvature information while still yielding comparably efficient updates to stochastic gradient descent (SGD). We show that the updates are invariant to commonly used reparameterizations, such as centering of the activations. In our experiments, approximate natural gradient descent with KFC was able to train convolutional networks several times faster than carefully tuned SGD. Furthermore, it was able to train the networks in 10-20 times fewer iterations than SGD, suggesting its potential applicability in a distributed setting. 
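The computational benefit that KFC and K-FAC exploit is the Kronecker identity (A kron B)^-1 = A^-1 kron B^-1: one only ever inverts the small factors, never the large block. A self-contained numerical check on generic 2x2 factors (matrices chosen arbitrarily for illustration):

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of lists."""
    return [[A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def inv2(M):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2.0, 1.0], [1.0, 3.0]]
B = [[4.0, 0.0], [1.0, 2.0]]
# invert the 4x4 Kronecker product using only its 2x2 factors
big_inverse = kron(inv2(A), inv2(B))
product = matmul(kron(A, B), big_inverse)  # should be the 4x4 identity
```

For an n x n block approximated as a Kronecker product of sqrt(n)-sized factors, this turns an O(n^3) inversion into two much smaller ones, which is why the approximate Fisher can be inverted at every update.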
Budget constrained optimal design of experiments is a classical problem in statistics. Although the optimal design literature is very mature, few efficient strategies are available when these design problems appear in the context of sparse linear models commonly encountered in high dimensional machine learning and statistics. In this work, we study experimental design for the setting where the underlying regression model is characterized by an l1-regularized linear function. We propose two novel strategies: the first is motivated geometrically whereas the second is algebraic in nature. We obtain tractable algorithms for this problem; our results also hold for a more general class of sparse linear models. We perform an extensive set of experiments, on benchmarks and a large multi-site neuroscience study, showing that the proposed models are effective in practice. The latter experiment suggests that these ideas may play a small role in informing enrollment strategies for similar scientific studies in the short-to-medium term future. Minding the Gaps for Block Frank-Wolfe Optimization of Structured SVMs Anton Osokin . Jean-Baptiste Alayrac ENS . Isabella Lukasewitz INRIA . Puneet Dokania INRIA and Ecole Centrale Paris . Simon Lacoste-Julien INRIA Paper Abstract: In this paper, we propose several improvements on the block-coordinate Frank-Wolfe (BCFW) algorithm from Lacoste-Julien et al. (2013), recently used to optimize the structured support vector machine (SSVM) objective in the context of structured prediction, though it has wider applications. The key intuition behind our improvements is that the estimates of block gaps maintained by BCFW reveal the block suboptimality that can be used as an adaptive criterion. First, we sample objects at each iteration of BCFW in an adaptive non-uniform way via gap-based sampling. Second, we incorporate pairwise and away-step variants of Frank-Wolfe into the block-coordinate setting.
Third, we cache oracle calls with a cache-hit criterion based on the block gaps. Fourth, we provide the first method to compute an approximate regularization path for SSVM. Finally, we provide an exhaustive empirical evaluation of all our methods on four structured prediction datasets. Exact Exponent in Optimal Rates for Crowdsourcing Chao Gao Yale University . Yu Lu Yale University . Dengyong Zhou Microsoft Research Paper Abstract: Crowdsourcing has become a popular tool for labeling large datasets. This paper studies the optimal error rate for aggregating crowdsourced labels provided by a collection of amateur workers. Under the Dawid-Skene probabilistic model, we establish matching upper and lower bounds with an exact exponent mI(pi), where m is the number of workers and I(pi) is the average Chernoff information that characterizes the workers' collective ability. Such an exact characterization of the error exponent allows us to state a precise sample size requirement m >= (1/I(pi)) log(1/epsilon) in order to achieve an epsilon misclassification error. In addition, our results imply optimality of various forms of EM algorithms given accurate initializers of the model parameters. Unsupervised learning and supervised learning are key research topics in deep learning. However, as high-capacity supervised neural networks trained with a large amount of labels have achieved remarkable success in many computer vision tasks, the availability of large-scale labeled images has reduced the significance of unsupervised learning. Inspired by the recent trend toward revisiting the importance of unsupervised learning, we investigate joint supervised and unsupervised learning in a large-scale setting by augmenting existing neural networks with decoding pathways for reconstruction. First, we demonstrate that the intermediate activations of pretrained large-scale classification networks preserve almost all the information of input images except a portion of local spatial details.
Then, by end-to-end training of the entire augmented architecture with the reconstructive objective, we show improvement of the network performance for supervised tasks. We evaluate several variants of autoencoders, including the recently proposed "what-where" autoencoder that uses the encoder pooling switches, to study the importance of the architecture design. Taking the 16-layer VGGNet trained under the ImageNet ILSVRC 2012 protocol as a strong baseline for image classification, our methods improve the validation-set accuracy by a noticeable margin. Low-rank representation (LRR) has been a significant method for segmenting data that are generated from a union of subspaces. It is also known that solving LRR is challenging in terms of time complexity and memory footprint, in that the size of the nuclear norm regularized matrix is n-by-n (where n is the number of samples). In this paper, we thereby develop a novel online implementation of LRR that reduces the memory cost from O(n^2) to O(pd), with p being the ambient dimension and d being some estimated rank (d << n). In a separate line of work on fixed-point deep convolutional networks (DCNs), quantization yields a more than 20% reduction in the model size without any loss in accuracy on the CIFAR-10 benchmark. We also demonstrate that fine-tuning can further enhance the accuracy of fixed-point DCNs beyond that of the original floating-point model. In doing so, we report a new state-of-the-art fixed-point performance of 6.78% error rate on the CIFAR-10 benchmark. Provable Algorithms for Inference in Topic Models Sanjeev Arora Princeton University . Rong Ge . Frederic Koehler Princeton University . Tengyu Ma Princeton University . Ankur Moitra Paper Abstract: Recently, there has been considerable progress on designing algorithms with provable guarantees, typically using linear algebraic methods, for parameter learning in latent variable models. Designing provable algorithms for inference has proved more difficult. Here we take a first step towards provable inference in topic models.
We leverage a property of topic models that enables us to construct simple linear estimators for the unknown topic proportions that have small variance, and consequently can work with short documents. Our estimators also correspond to finding an estimate around which the posterior is well-concentrated. We also show lower bounds: for shorter documents it can be information-theoretically impossible to find the hidden topics. Finally, we give empirical results that demonstrate that our algorithm works on realistic topic models. It yields good solutions on synthetic data and runs in time comparable to a single iteration of Gibbs sampling. This paper develops an approach for efficiently solving general convex optimization problems specified as disciplined convex programs (DCP), a common general-purpose modeling framework. Specifically, we develop an algorithm based upon fast epigraph projections, projections onto the epigraph of a convex function, an approach closely linked to proximal operator methods. We show that by using these operators, we can solve any disciplined convex program without transforming the problem to a standard cone form, as is done by current DCP libraries. We then develop a large library of efficient epigraph projection operators, mirroring and extending work on fast proximal algorithms, for many common convex functions. Finally, we evaluate the performance of the algorithm, and show it often achieves order of magnitude speedups over existing general-purpose optimization solvers. We study the fixed design segmented regression problem: given noisy samples from a piecewise linear function f, we want to recover f up to a desired accuracy in mean-squared error. Previous rigorous approaches for this problem rely on dynamic programming (DP) and, while sample efficient, have running time quadratic in the sample size.
As our main contribution, we provide new near-linear time algorithms for the problem that, while not being minimax optimal, achieve a significantly better sample-time tradeoff on large datasets compared to the DP approach. Our experimental evaluation shows that, compared with the DP approach, our algorithms provide a convergence rate that is only off by a factor of 2 to 4, while achieving speedups of three orders of magnitude. Energetic Natural Gradient Descent Philip Thomas CMU . Bruno Castro da Silva . Christoph Dann Carnegie Mellon University . Emma . Paper Abstract: We propose a new class of algorithms for minimizing or maximizing functions of parametric probabilistic models. These new algorithms are natural gradient algorithms that leverage more information than prior methods by using a new metric tensor in place of the commonly used Fisher information matrix. This new metric tensor is derived by computing directions of steepest ascent where the distance between distributions is measured using an approximation of energy distance (as opposed to Kullback-Leibler divergence, which produces the Fisher information matrix), and so we refer to our new ascent direction as the energetic natural gradient. Partition Functions from Rao-Blackwellized Tempered Sampling David Carlson Columbia University . Patrick Stinson Columbia University . Ari Pakman Columbia University . Liam . Paper Abstract: Partition functions of probability distributions are important quantities for model evaluation and comparisons. We present a new method to compute partition functions of complex and multimodal distributions. Such distributions are often sampled using simulated tempering, which augments the target space with an auxiliary inverse temperature variable.
Our method exploits the multinomial probability law of the inverse temperatures, and provides estimates of the partition function in terms of a simple quotient of Rao-Blackwellized marginal inverse temperature probability estimates, which are updated while sampling. We show that the method has interesting connections with several alternative popular methods, and offers some significant advantages. In particular, we empirically find that the new method provides more accurate estimates than Annealed Importance Sampling when calculating partition functions of large Restricted Boltzmann Machines (RBMs); moreover, the method is sufficiently accurate to track training and validation log-likelihoods during learning of RBMs, at minimal computational cost. In this paper we address the identifiability and efficient learning problems of finite mixtures of Plackett-Luce models for rank data. We prove that for any k >= 2, the mixture of k Plackett-Luce models for no more than 2k-1 alternatives is non-identifiable, and this bound is tight for k = 2. For generic identifiability, we prove that the mixture of k Plackett-Luce models over m alternatives is generically identifiable if k <= floor((m-2)/2)!. We also propose an efficient generalized method of moments (GMM) algorithm to learn the mixture of two Plackett-Luce models and show that the algorithm is consistent. Our experiments show that our GMM algorithm is significantly faster than the EMM algorithm by Gormley & Murphy (2008), while achieving competitive statistical efficiency. The combinatorial explosion that plagues planning and reinforcement learning (RL) algorithms can be moderated using state abstraction. Prohibitively large task representations can be condensed such that essential information is preserved, and consequently, solutions are tractably computable.
However, exact abstractions, which treat only fully-identical situations as equivalent, fail to present opportunities for abstraction in environments where no two situations are exactly alike. In this work, we investigate approximate state abstractions, which treat nearly-identical situations as equivalent. We present theoretical guarantees of the quality of behaviors derived from four types of approximate abstractions. Additionally, we empirically demonstrate that approximate abstractions lead to reduction in task complexity and bounded loss of optimality of behavior in a variety of environments. Power of Ordered Hypothesis Testing Lihua Lei . William Fithian UC Berkeley, Department of Statistics Paper Abstract: Ordered testing procedures are multiple testing procedures that exploit a pre-specified ordering of the null hypotheses, from most to least promising. We analyze and compare the power of several recent proposals using the asymptotic framework of Li & Barber (2015). While accumulation tests including ForwardStop can be quite powerful when the ordering is very informative, they are asymptotically powerless when the ordering is weaker. By contrast, Selective SeqStep, proposed by Barber & Candes (2015), is much less sensitive to the quality of the ordering. We compare the power of these procedures in different regimes, concluding that Selective SeqStep dominates accumulation tests if either the ordering is weak or non-null hypotheses are sparse or weak. Motivated by our asymptotic analysis, we derive an improved version of Selective SeqStep which we call Adaptive SeqStep, analogous to Storey's improvement on the Benjamini-Hochberg procedure. We compare these methods using the GEO-Query data set analyzed by Li & Barber (2015) and find Adaptive SeqStep has favorable performance for both good and bad prior orderings. PHOG: Probabilistic Model for Code Pavol Bielik ETH Zurich . Veselin Raychev ETH Zurich .
Martin Vechev ETH Zurich Paper Abstract: We introduce a new generative model for code called probabilistic higher order grammar (PHOG). PHOG generalizes probabilistic context free grammars (PCFGs) by allowing conditioning of a production rule beyond the parent non-terminal, thus capturing rich contexts relevant to programs. Even though PHOG is more powerful than a PCFG, it can be learned from data just as efficiently. We trained a PHOG model on a large JavaScript code corpus and show that it is more precise than existing models, while similarly fast. As a result, PHOG can immediately benefit existing programming tools based on probabilistic models of code. We consider the problem of online prediction in changing environments. In this framework the performance of a predictor is evaluated as the loss relative to an arbitrarily changing predictor, whose individual components come from a base class of predictors. Typical results in the literature consider different base classes (experts, linear predictors on the simplex, etc.) separately. Introducing an arbitrary mapping inside the mirror descent algorithm, we provide a framework that unifies and extends existing results. As an example, we prove new shifting regret bounds for matrix prediction problems. Hyperparameter selection generally relies on running multiple full training trials, with selection based on validation set performance. We propose a gradient-based approach for locally adjusting hyperparameters during training of the model. Hyperparameters are adjusted so as to make the model parameter gradients, and hence updates, more advantageous for the validation cost. We explore the approach for tuning regularization hyperparameters and find that in experiments on MNIST, SVHN and CIFAR-10, the resulting regularization levels are within the optimal regions.
The additional computational cost depends on how frequently the hyperparameters are trained, but the tested scheme adds only about 30% computational overhead regardless of the model size. Since the method is significantly less computationally demanding compared to similar gradient-based approaches to hyperparameter optimization, and consistently finds good hyperparameter values, it can be a useful tool for training neural network models. Many of the recent trajectory optimization algorithms alternate between local approximation of the dynamics and conservative policy update. However, linearly approximating the dynamics in order to derive the new policy can bias the update and prevent convergence to the optimal policy. In this article, we propose a new model-free algorithm that backpropagates a local quadratic time-dependent Q-function, allowing the derivation of the policy update in closed form. Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system dynamics, demonstrating improved performance in comparison to related trajectory optimization algorithms that linearize the dynamics. Due to its numerous applications, rank aggregation has become a problem of major interest across many fields of the computer science literature. In the vast majority of situations, Kemeny consensus(es) are considered as the ideal solutions. It is however well known that their computation is NP-hard. Many contributions have thus established various results to apprehend this complexity. In this paper we introduce a practical method to predict, for a ranking and a dataset, how close the Kemeny consensus(es) are to this ranking. A major strength of this method is its generality: it does not require any assumption on the dataset nor the ranking. Furthermore, it relies on a new geometric interpretation of Kemeny aggregation that, we believe, could lead to many other results. Horizontally Scalable Submodular Maximization Mario Lucic ETH Zurich .
Olivier Bachem ETH Zurich . Morteza Zadimoghaddam Google Research . Andreas Krause Paper AbstractA variety of large-scale machine learning problems can be cast as instances of constrained submodular maximization. Existing approaches for distributed submodular maximization have a critical drawback: The capacity 8211 number of instances that can fit in memory 8211 must grow with the data set size. In practice, while one can provision many machines, the capacity of each machine is limited by physical constraints. We propose a truly scalable approach for distributed submodular maximization under fixed capacity. The proposed framework applies to a broad class of algorithms and constraints and provides theoretical guarantees on the approximation factor for any available capacity. We empirically evaluate the proposed algorithm on a variety of data sets and demonstrate that it achieves performance competitive with the centralized greedy solution. Group Equivariant Convolutional Networks Taco Cohen University of Amsterdam . Max Welling University of Amsterdam CIFAR Paper AbstractWe introduce Group equivariant Convolutional Neural Networks (G-CNNs), a natural generalization of convolutional neural networks that reduces sample complexity by exploiting symmetries. G-CNNs use G-convolutions, a new type of layer that enjoys a substantially higher degree of weight sharing than regular convolution layers. G-convolutions increase the expressive capacity of the network without increasing the number of parameters. Group convolution layers are easy to use and can be implemented with negligible computational overhead for discrete groups generated by translations, reflections and rotations. G-CNNs achieve state of the art results on CIFAR10 and rotated MNIST. The partition function is fundamental for probabilistic graphical models8212it is required for inference, parameter estimation, and model selection. 
Evaluating this function corresponds to discrete integration, namely a weighted sum over an exponentially large set. This task quickly becomes intractable as the dimensionality of the problem increases. We propose an approximation scheme that, for any discrete graphical model whose parameter vector has bounded norm, estimates the partition function with arbitrarily small error. Our algorithm relies on a near minimax optimal polynomial approximation to the potential function and a Clenshaw-Curtis style quadrature. Furthermore, we show that this algorithm can be randomized to split the computation into a high-complexity part and a low-complexity part, where the latter may be carried out on small computational devices. Experiments confirm that the new randomized algorithm is highly accurate if the parameter norm is small, and is otherwise comparable to methods with unbounded error.

Correcting Forecasts with Multifactor Neural Attention
Matthew Riemer (IBM), Aditya Vempaty (IBM), Flavio Calmon (IBM), Fenno Heath (IBM), Richard Hull (IBM), Elham Khabiri (IBM)
Paper Abstract: Automatic forecasting of time series data is a challenging problem in many industries. Current forecast models adopted by businesses do not provide adequate means for including data representing external factors that may have a significant impact on the time series, such as weather, national events, local events, social media trends, promotions, etc. This paper introduces a novel neural network attention mechanism that naturally incorporates data from multiple external sources without the feature engineering needed to get other techniques to work. We demonstrate empirically that the proposed model achieves superior performance for predicting the demand of 20 commodities across 107 stores of one of America's largest retailers when compared to other baseline models, including neural networks, linear models, certain kernel methods, Bayesian regression, and decision trees.
Our method ultimately accounts for a 23.9% relative improvement as a result of the incorporation of external data sources, and provides an unprecedented level of descriptive ability for a neural network forecasting model.

Observational studies are rising in importance due to the widespread accumulation of data in fields such as healthcare, education, employment and ecology. We consider the task of answering counterfactual questions such as, “Would this patient have lower blood sugar had she received a different medication?”. We propose a new algorithmic framework for counterfactual inference which brings together ideas from domain adaptation and representation learning. In addition to a theoretical justification, we perform an empirical comparison with previous approaches to causal inference from observational data. Our deep learning algorithm significantly outperforms the previous state-of-the-art.

Gaussian Processes (GPs) provide a general and analytically tractable way of modeling complex, time-varying, nonparametric functions. The Automatic Bayesian Covariance Discovery (ABCD) system constructs natural-language descriptions of time-series data by modeling unknown time series nonparametrically with a GP with a composite covariance kernel function. Unfortunately, learning a composite covariance kernel from a single time-series data set often results in a less informative kernel that may not give qualitative, distinctive descriptions of the data. We address this challenge by proposing two relational kernel learning methods which can model multiple time-series data sets by finding common, shared causes of changes. We show that the relational kernel learning methods find more accurate models for regression problems on several real-world data sets: US stock data, US house price index data and currency exchange rate data.
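The composite-kernel idea underlying ABCD-style systems can be illustrated with a minimal sketch: complex covariance structure is built by adding and multiplying simple base kernels, and any such composition is still a valid GP covariance. The kernel choices, hyperparameter values, and function names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def se_kernel(x1, x2, lengthscale=1.0):
    """Squared-exponential base kernel: encodes smoothness."""
    return np.exp(-0.5 * (x1 - x2) ** 2 / lengthscale ** 2)

def periodic_kernel(x1, x2, period=1.0, lengthscale=1.0):
    """Periodic base kernel: encodes repeating structure."""
    return np.exp(-2.0 * np.sin(np.pi * np.abs(x1 - x2) / period) ** 2
                  / lengthscale ** 2)

def composite_kernel(x1, x2):
    """SE + SE * Periodic: a smooth trend plus a locally periodic component.
    Sums and products of valid kernels are again valid kernels."""
    return (se_kernel(x1, x2, lengthscale=5.0)
            + se_kernel(x1, x2, lengthscale=2.0) * periodic_kernel(x1, x2))

def gram(xs, kernel):
    """Covariance (Gram) matrix of a kernel over a set of inputs."""
    return np.array([[kernel(a, b) for b in xs] for a in xs])

xs = np.linspace(0.0, 3.0, 20)
K = gram(xs, composite_kernel)

# A valid GP covariance matrix is symmetric positive semidefinite.
assert np.allclose(K, K.T)
assert np.min(np.linalg.eigvalsh(K)) > -1e-8
```

Because each base kernel carries an interpretable meaning (smoothness, periodicity), a description of the learned composite kernel can be translated into a natural-language description of the data, which is the mechanism the relational variants share across multiple time series.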
We introduce a new approach for amortizing inference in directed graphical models by learning heuristic approximations to stochastic inverses, designed specifically for use as proposal distributions in sequential Monte Carlo methods. We describe a procedure for constructing and learning a structured neural network which represents an inverse factorization of the graphical model, resulting in a conditional density estimator that takes as input particular values of the observed random variables, and returns an approximation to the distribution of the latent variables. This recognition model can be learned offline, independent of any particular dataset, prior to performing inference. The output of these networks can be used as automatically learned high-quality proposal distributions to accelerate sequential Monte Carlo across a diverse range of problem settings.

Slice Sampling on Hamiltonian Trajectories
Benjamin Bloem-Reddy (Columbia University), John Cunningham (Columbia University)
Paper Abstract: Hamiltonian Monte Carlo and slice sampling are amongst the most widely used and studied classes of Markov Chain Monte Carlo samplers. We connect these two methods and present Hamiltonian slice sampling, which allows slice sampling to be carried out along Hamiltonian trajectories, or transformations thereof. Hamiltonian slice sampling clarifies a class of model priors that induce closed-form slice samplers. More pragmatically, inheriting properties of slice samplers, it offers advantages over Hamiltonian Monte Carlo, in that it has fewer tunable hyperparameters and does not require gradient information. We demonstrate the utility of Hamiltonian slice sampling out of the box on problems ranging from Gaussian process regression to Pitman-Yor based mixture models.

Noisy Activation Functions
Caglar Gulcehre, Marcin Moczulski, Misha Denil, Yoshua Bengio (U. of Montreal)
Paper Abstract: Common nonlinear activation functions used in neural networks can cause training difficulties due to the saturation behavior of the activation function, which may hide dependencies that are not visible to vanilla SGD (using first order gradients only). Gating mechanisms that use softly saturating activation functions to emulate the discrete switching of digital logic circuits are good examples of this. We propose to exploit the injection of appropriate noise so that gradients may flow easily, even if the noiseless application of the activation function would yield zero gradients. Large noise will dominate the noise-free gradient and allow stochastic gradient descent to explore more. By adding noise only to the problematic parts of the activation function, we allow the optimization procedure to explore the boundary between the degenerate (saturating) and the well-behaved parts of the activation function. We also establish connections to simulated annealing: when the amount of noise is annealed down, hard objective functions become easier to optimize. We find experimentally that replacing such saturating activation functions by noisy variants helps optimization in many contexts, yielding state-of-the-art or competitive results on different datasets and tasks, especially when training seems to be the most difficult, e.g. when curriculum learning is necessary to obtain good results.

PD-Sparse: A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification
Ian En-Hsu Yen (University of Texas at Austin), Xiangru Huang (UT Austin), Pradeep Ravikumar (UT Austin), Kai Zhong (ICES department, University of Texas at Austin), Inderjit
Paper Abstract: We consider Multiclass and Multilabel classification with an extremely large number of classes, of which only a few are labeled for each instance. In such a setting, standard methods whose training and prediction costs are linear in the number of classes become intractable.
State-of-the-art methods thus aim to reduce the complexity by exploiting correlation between labels, under the assumption that the similarity between labels can be captured by structures such as a low-rank matrix or a balanced tree. However, as the diversity of labels in the feature space increases, this structural assumption can easily be violated, which leads to degraded testing performance. In this work, we show that a margin-maximizing loss with an l1 penalty, in the case of Extreme Classification, yields an extremely sparse solution both in the primal and in the dual without sacrificing the expressive power of the predictor. We thus propose a Fully-Corrective Block-Coordinate Frank-Wolfe (FC-BCFW) algorithm that exploits both primal and dual sparsity to achieve a complexity sublinear in the number of primal and dual variables. A bi-stochastic search method is proposed to further improve the efficiency. In our experiments on both Multiclass and Multilabel problems, the proposed method achieves significantly higher accuracy than existing approaches to Extreme Classification, with very competitive training and prediction time.
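The primal sparsity that PD-Sparse exploits can be seen in miniature with a generic l1-regularized problem. The sketch below is a toy proximal-gradient (ISTA) illustration of how an l1 penalty drives most coordinates of the solution to exactly zero; it is not the paper's FC-BCFW algorithm, and all names and constants are our own.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrinks small coordinates to exact zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(X, y, lam, step, iters=500):
    """Proximal gradient descent on 0.5*||Xw - y||^2 + lam*||w||_1."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ w - y)
        w = soft_threshold(w - step * grad, step * lam)
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))
w_true = np.zeros(50)
w_true[:3] = [2.0, -1.5, 1.0]           # only 3 of 50 features are informative
y = X @ w_true + 0.01 * rng.standard_normal(100)

step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1/L step size for convergence
w = ista(X, y, lam=5.0, step=step)

# The recovered solution is sparse: almost all coordinates are exactly zero.
assert np.count_nonzero(np.abs(w) > 1e-6) <= 10
```

In the extreme-classification setting the payoff is the same in spirit but larger in scale: if only a handful of primal weights and dual variables are nonzero, each update only needs to touch that active set rather than all classes.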
