Text Analysis using Machine Learning

MLTXT-EN-D

Who should attend?

  • In the Oil & Gas industry, as in other domains, most engineering knowledge and experience is stored in unstructured documents such as PDF files, MS Office files or scans of paper reports. Extracting information from these files requires a tremendous effort from domain experts and data officers, since deterministic data mining tools have limited capabilities.
  • Today, new machine learning systems offer new ways to extract more information from unstructured documents. This course reviews the most commonly used algorithms in text analysis and is a good introduction to machine learning.
Audience:
  • This training has been designed as a boot camp for geoscientists who have to use machine learning and/or text analysis tools to reach their business objectives. The notebook used in this course is designed for people with little or no Python experience, but some general knowledge of coding is required.

Level: Awareness

Prerequisites:
  • No prerequisites are necessary to follow this course.

Course Content

  • INTRODUCTION

      • Problem statement.
  • TEXT CLASSIFICATION USING A ML ALGORITHM IN A NOTEBOOK

      • Exploring the documents with exploratory data analysis (EDA).
      • Feature engineering using the scikit-learn library.
      • Model training:
          • Logistic regression.
          • Passive aggressive classifier.
          • Random forest classifier.
          • Gradient boosting classifier.
      • Model validation:
          • Precision, recall, F1 score.
          • Confusion matrix.
          • LIME, ELI5.
  • NATURAL LANGUAGE PROCESSING

      • Semantic analysis.
      • Markov chains.
      • Sentiment analysis.
  • DETECTING KEYWORDS/METADATA VALUES IN TEXT

      • Detecting candidate values with a deterministic method.
      • Using ML to assign a confidence factor to each candidate.
      • Improving the model with user feedback.
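
As an illustration of the deterministic step in the last module, the sketch below uses a regular expression to collect candidate depth values from a sentence. It is not taken from the course notebook: the pattern, the function name find_depth_candidates and the sample sentence are all assumptions made for this example. Assigning a confidence factor to each candidate would be a separate, ML-based step that is not shown here.

```python
import re

# Hypothetical pattern: a number (optionally with thousands separators
# or decimals) followed by a depth unit, either "m" or "ft".
DEPTH_PATTERN = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*(m|ft)\b")


def find_depth_candidates(text):
    """Return a (value, unit, start offset) tuple for every depth-like match."""
    candidates = []
    for match in DEPTH_PATTERN.finditer(text):
        # Drop thousands separators before converting to a number.
        value = float(match.group(1).replace(",", ""))
        candidates.append((value, match.group(2), match.start()))
    return candidates


# Invented sentence, for illustration only.
sample = "The well reached a total depth of 2,350 m; casing was set at 1200 ft."
candidates = find_depth_candidates(sample)

for value, unit, offset in candidates:
    print(f"candidate {value} {unit} at character {offset}")
```

Every match is kept as a candidate, including values that may not be the metadata of interest; ranking or filtering them is left to the later steps of the module.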

Learning Objectives

  • Upon completion of the course, attendees will be able to:
  • Understand the main algorithms used for text mining and text analysis, and run a text classifier in a Colab notebook.
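
To give an idea of what running a text classifier in a notebook can look like, here is a minimal sketch using scikit-learn, with TF-IDF features and logistic regression (one of the models listed in the course content). It is not the course notebook: the toy sentences, the two labels and all variable names are invented for this example, and a real exercise would use a much larger corpus.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.pipeline import make_pipeline

# Invented toy corpus with two hypothetical document classes.
train_texts = [
    "drilling mud weight increased before casing run",
    "bit change and casing cemented at shoe depth",
    "rate of penetration dropped while drilling the section",
    "sandstone reservoir with good porosity and permeability",
    "shale formation overlying the carbonate reservoir",
    "core samples show fractured limestone and dolomite",
]
train_labels = ["drilling", "drilling", "drilling",
                "geology", "geology", "geology"]

# Feature engineering (TF-IDF) and model training chained in one pipeline.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

# Held-out sentences, also invented, used for a quick validation.
test_texts = [
    "casing was cemented after drilling",
    "porosity of the sandstone formation",
]
test_labels = ["drilling", "geology"]
predicted = model.predict(test_texts)

# Validation: precision, recall, F1 score and confusion matrix.
class_names = ["drilling", "geology"]
matrix = confusion_matrix(test_labels, predicted, labels=class_names)
print(classification_report(test_labels, predicted, zero_division=0))
print(matrix)
```

Swapping LogisticRegression for another scikit-learn classifier only changes the second step of the pipeline; the feature extraction and the validation code stay the same.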

Ways & Means

  • This course can be delivered in person or as a virtual classroom. Each training module contains lectures, hands-on practice and/or case studies.

More

Coordinator: IFP Training trainers (permanent or contracted) with solid expertise and/or experience in the related topics, trained in adult teaching methods, and whose competencies are kept up to date.

For French entities: IFP Training is listed in DataDock; you may contact your OPCO about potential funding. To check the accessibility of this training program, please contact our disability advisor: referent.handicap@ifptraining.com