Sentiment Analysis for English, German, French and Italian Texts

We have published a new version of our REST Service for Sentiment Analysis. The machine learning model predicts whether the sentiment of a text is positive or negative. The model can predict sentiment for English, German, French and Italian texts. In conjunction with our Language Detection Service, the language of a text can be determined prior to the sentiment analysis.

The machine learning model was implemented with Keras and trained on 50,000 English, 30,000 German, 50,000 French and 30,000 Italian film reviews. The models were tested with TensorFlow as the backend.

Accuracy

The measured accuracy of the predictions is 88.6% for English, 81.3% for German, 84.4% for French and 81.6% for Italian.

The model consists of an embedding layer as input, a hidden layer with 8 LSTM units and an output layer with 1 unit. The maximum length of an input text is 400 words. For the word embeddings, the pre-trained word vectors published by Facebook Research were used.
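The architecture described above could be written in Keras roughly as follows. This is a minimal sketch, not the published training code: the vocabulary size, the 300-dimensional embeddings, the frozen embedding layer, the sigmoid output and the compile settings are all assumptions made for illustration.

```python
MAX_WORDS = 400   # maximum input length in words (from the text above)
LSTM_UNITS = 8    # size of the hidden LSTM layer (from the text above)


def build_model(vocab_size, embedding_dim=300):
    """Embedding -> LSTM(8) -> Dense(1). Sizes other than 400 and 8 are assumed."""
    # Imported here so the constants and the helper below work without TensorFlow.
    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(MAX_WORDS,)),
        # Assumed frozen; loading the pre-trained vectors is not shown here.
        keras.layers.Embedding(vocab_size, embedding_dim, trainable=False),
        keras.layers.LSTM(LSTM_UNITS),
        # Assumed sigmoid, so the output can be read as a probability.
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


def trainable_parameters(embedding_dim=300, units=LSTM_UNITS):
    """Weights in the LSTM and output layers when the embedding is frozen."""
    lstm = 4 * units * (embedding_dim + units + 1)  # four gates, each with a bias
    dense = units + 1                               # one output unit plus its bias
    return lstm + dense
```

Under these assumptions, trainable_parameters() returns 9897, which shows how small the network on top of the embedding layer is.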

Scope of Application

The model was trained with film reviews in English, German, French and Italian. We also spot-checked the model with product reviews and reader comments in those languages, and the results were generally good. However, we recommend running your own tests before using the models in a new domain.
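One simple way to test the models in a new domain is to label a small sample of texts by hand and compare the labels with the predicted probabilities. The sketch below assumes a cut-off of 0.5 between negative and positive. The function name and the sample values are made up for illustration and are not part of the service.

```python
def accuracy(probabilities, gold_labels, threshold=0.5):
    """Share of texts whose predicted label matches the hand-assigned label.

    probabilities: sentiment probabilities returned for each text
    gold_labels:   True for positive, False for negative, in the same order
    threshold:     assumed cut-off between negative and positive
    """
    if len(probabilities) != len(gold_labels):
        raise ValueError("need exactly one gold label per prediction")
    if not probabilities:
        raise ValueError("cannot compute the accuracy of an empty sample")
    correct = sum((p >= threshold) == gold for p, gold in zip(probabilities, gold_labels))
    return correct / len(probabilities)


# Made-up probabilities and hand-assigned labels for four product reviews.
sample_probabilities = [0.91, 0.12, 0.58, 0.40]
sample_gold = [True, False, False, True]
print(accuracy(sample_probabilities, sample_gold))
```

A sample of a few hundred texts per language gives a far more reliable estimate than the four values used here.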

Several texts can be posted to the REST service at once for analysis. The sentiment analysis service can be started with access to a language detection service. In that case, the service first determines the language of each text and then passes it on to the corresponding sentiment analysis model (see the examples below).

Example Query

Here is an example with several languages. Queries use the following JSON format:

{
  "texts": [
    "I found this movie really hard to sit through, my attention kept wandering off the tv.",
    "Dieser Film ist vom Anfang bis am Ende spannend! Die Schauspieler sind wirklich gut!",
    "Dieser Film ist vom Anfang bis am Ende langweilig! Die Schauspieler sind mässig bis schlecht!",
    "J'aime ce film. Les acteurs jouent vraiment bien!",
    "Non mi piace affatto questo film. Gli attori sono cattivi e la storia è noiosa!"
  ]
}
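
A query in this format could be sent from Python with the standard library alone. The sketch below is only an illustration: the URL is a placeholder, the function names are hypothetical, and the actual host, port and path depend on how the service is installed.

```python
import json
import urllib.request

# Placeholder endpoint: replace it with the URL of your own service instance.
SERVICE_URL = "http://localhost:5000/predict"


def build_request(texts, url=SERVICE_URL):
    """Wrap a list of texts in the {"texts": [...]} body shown above."""
    body = json.dumps({"texts": list(texts)}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def analyse(texts, url=SERVICE_URL):
    """Send the texts to the service and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(texts, url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Encoding the body as UTF-8 keeps characters such as "ä" and "è" in the German, French and Italian texts intact.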

The service first determines the language of each text and then performs the sentiment analysis on that basis. The response will look something like this:

{
  "predictions": [
    {
      "lang": {
        "label": "en", 
        "probability": {
          "de": 0.15291942656040192, 
          "en": 0.386442095041275, 
          "fr": 0.1524139940738678, 
          "it": 0.15353836119174957, 
          "rm": 0.1546860933303833
        }
      }, 
      "sentiment": {
        "label": "negativ", 
        "probability": 0.44230401515960693
      }, 
      "text": "I found this movie really hard to sit through, my attention kept wandering off the tv."
    }, 
    {
      "lang": {
        "label": "de", 
        "probability": {
          "de": 0.4036977291107178, 
          "en": 0.1490136682987213, 
          "fr": 0.1489626169204712, 
          "it": 0.14897985756397247, 
          "rm": 0.14934605360031128
        }
      }, 
      "sentiment": {
        "label": "positiv", 
        "probability": 0.9132363200187683
      }, 
      "text": "Dieser Film ist vom Anfang bis am Ende spannend! Die Schauspieler sind wirklich gut!"
    }, 
    {
      "lang": {
        "label": "de", 
        "probability": {
          "de": 0.40277570486068726, 
          "en": 0.1491391956806183, 
          "fr": 0.14907844364643097, 
          "it": 0.1491205096244812, 
          "rm": 0.1498860865831375
        }
      }, 
      "sentiment": {
        "label": "negativ", 
        "probability": 0.0641002506017685
      }, 
      "text": "Dieser Film ist vom Anfang bis am Ende langweilig! Die Schauspieler sind mässig bis schlecht!"
    }, 
    {
      "lang": {
        "label": "fr", 
        "probability": {
          "de": 0.14898933470249176, 
          "en": 0.1492728441953659, 
          "fr": 0.40348654985427856, 
          "it": 0.14923687279224396, 
          "rm": 0.14901447296142578
        }
      }, 
      "sentiment": {
        "label": "positiv", 
        "probability": 0.6126847267150879
      }, 
      "text": "J'aime ce film. Les acteurs jouent vraiment bien!"
    }, 
    {
      "lang": {
        "label": "it", 
        "probability": {
          "de": 0.15224331617355347, 
          "en": 0.15012496709823608, 
          "fr": 0.15062405169010162, 
          "it": 0.39602142572402954, 
          "rm": 0.15098623931407928
        }
      }, 
      "sentiment": {
        "label": "negativ", 
        "probability": 0.20823518931865692
      }, 
      "text": "Non mi piace affatto questo film. Gli attori sono cattivi e la storia è noiosa!"
    }
  ], 
  "success": true
}

A sentiment probability close to 0 means negative; one close to 1 means positive. The service returns the labels as "negativ" and "positiv".
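
The response can be reduced to one line per text, as in the following sketch. The cut-off of 0.5 and the "certainty" value are assumptions introduced here for illustration. They are not returned by the service, and the function name is hypothetical.

```python
def summarise(response, threshold=0.5):
    """Return (language, sentiment, certainty) for each prediction.

    certainty rescales the distance from the assumed threshold to 0..1:
    0.0 is undecided, 1.0 is as far from the threshold as possible.
    """
    rows = []
    for prediction in response["predictions"]:
        p = prediction["sentiment"]["probability"]
        sentiment = "positive" if p >= threshold else "negative"
        certainty = abs(p - threshold) / max(threshold, 1 - threshold)
        rows.append((prediction["lang"]["label"], sentiment, round(certainty, 2)))
    return rows


# Two entries abridged from the example response above.
example = {
    "predictions": [
        {"lang": {"label": "en"},
         "sentiment": {"label": "negativ", "probability": 0.44230401515960693}},
        {"lang": {"label": "de"},
         "sentiment": {"label": "positiv", "probability": 0.9132363200187683}},
    ],
    "success": True,
}
print(summarise(example))
```

For these two entries the English review comes out as only weakly negative (certainty 0.12), while the German one is clearly positive (certainty 0.83).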

Installation and Usage

Refer to the documentation on GitHub for installation and more application examples of the software.

Download

Download the Software including Premium Support. You can also download the Software under the Apache 2.0 License on GitHub for free.

About the author: Thomas studied computational linguistics and philosophy and graduated with a PhD in computer science. He has worked as a consultant for natural language processing and application development for major Swiss banks. Thomas is the founder of ipublia. He lives with his family in Zürich.
