Oil & Gas Training
and Competency Development
Competency Management system SLB NEXT

Essential Data Science for Petroleum Geoscientists and Engineers - (Remote Instructor-Led Series)

Interest in data science and machine learning is rapidly expanding, offering the promise of increased efficiency in E&P, and holding the potential to analyze and extract value from vast amounts of under-utilized legacy data.

Combined with petroleum geoscience and engineering domain knowledge, the key elements underlying the successful application of the technology are: data, code, and algorithms. This course builds on public datasets, code examples written in Python, and algorithms from popular data science packages to provide a practical introduction to the subject and its application in the E&P domain.

This course will be delivered entirely online, for 4 hours each day.

Module 1. Overview (1hr) 

    • What is Data Science - Overview of the course, and an outline of the scope of data science.
    • Data Science for E&P - Addressing the role of data science in E&P and an example application to log data quality control and reconstruction using machine learning.

Module 2. Data Science Toolkit - Notebooks, Visualization, and Communication (2hr)

    • Overview of the data science toolkit.
    • Hands-on workshop introducing the toolkit and getting started with Python scripts and notebooks.
    • Overview of how to manage and use Python packages. 
    • Hands-on workshop on Python packages covering how to install and manage packages, and how to use packages from your Python notebooks.
    • Overview of data visualization with SandDance.
    • Hands-on workshop introducing SandDance for interactive data visualization using a dataset of offshore wells from the UK Continental Shelf.
    • Overview of Markdown - a lightweight markup language for adding simple formatting to plain text documents, and documenting Python notebooks.
    • Hands-on workshop on Markdown for formatting text documents and annotating Python notebooks.


Module 3. Computational Thinking (1hr)

    • Introduction to Computational Thinking - the analytical and logical processes of decomposing a complex task and expressing it in a form that can be performed by a computer.
    • Hands-on workshop on Computational Thinking applied to the design and implementation of an interactive base map for UK E&P data.

Module 4. Python Fundamentals (2hr) 

    • Python 101 - Introduction to Python fundamentals including variables, types, statements, expressions, control flow, and functions.
    • Hands-on workshop on Python 101.
    • Python 102 - More Python fundamentals including data structures, modules, files and folders, JavaScript Object Notation (JSON), serialization.
    • Hands-on workshop on Python 102.
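
As an illustration of the JSON and serialization topics listed under Python 102, the following is a minimal sketch using only the standard-library json module. The well record, its field names, and the helper functions to_json and from_json are hypothetical and are not taken from the course material.

```python
import json

# Hypothetical well header record; field names and values are illustrative only.
well = {
    "name": "WELL-A",
    "latitude": 57.5,
    "longitude": 1.25,
    "total_depth_m": 3200.0,
    "tops": ["Formation A", "Formation B"],
}


def to_json(record: dict) -> str:
    """Serialize a dictionary to a JSON string with a stable key order."""
    return json.dumps(record, indent=2, sort_keys=True)


def from_json(text: str) -> dict:
    """Deserialize a JSON string back into a Python dictionary."""
    return json.loads(text)


text = to_json(well)        # dictionary -> JSON text
restored = from_json(text)  # JSON text -> dictionary
```

Dictionaries, lists, strings, and numbers map directly onto JSON types, which is why the record survives the round trip unchanged.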


Module 5. Exploratory Data Analysis (2hr)

    • Exploratory Data Analysis - Introduction to the Exploratory Data Analysis process and key Python packages: pandas for data analysis and plotly for statistical graphics.
    • Hands-on workshop on exploratory data analysis - reading data into pandas data frames, handling dates, merging datasets, creating statistical graphics figures with plotly, exporting figures.
    • Statistical Graphics - Demonstration of a gallery of statistical graphics samples.
    • Descriptive Statistics - Introduction to univariate and multivariate statistics.
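
The exploratory steps named in this module (reading data into data frames, handling dates, merging datasets, and descriptive statistics) might look roughly like the sketch below. It assumes pandas is installed; the two small tables and all column names are made up for illustration, and the plotly figures covered in the workshop are omitted.

```python
import pandas as pd

# Hypothetical well header table; names, dates, and depths are made up.
wells = pd.DataFrame({
    "well": ["W1", "W2", "W3"],
    "spud_date": ["2001-03-15", "2005-11-02", "2010-07-21"],
    "total_depth_m": [2500.0, 3100.0, 2800.0],
})

# Hypothetical operator lookup table, deliberately missing well W3.
operators = pd.DataFrame({
    "well": ["W1", "W2"],
    "operator": ["Operator A", "Operator B"],
})

# Handling dates: parse the text column into datetime values, then derive the year.
wells["spud_date"] = pd.to_datetime(wells["spud_date"])
wells["spud_year"] = wells["spud_date"].dt.year

# Merging datasets: a left merge keeps every well, leaving the operator
# empty (NaN) where the lookup table has no matching row.
merged = wells.merge(operators, on="well", how="left")

# Descriptive statistics: count, mean, std, min, quartiles, and max.
summary = merged["total_depth_m"].describe()
```

A left merge is used here so that wells without a matching operator stay visible as missing values rather than silently dropping out of the analysis.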

Module 6. Exploring E&P Data (4hr) 

    • Well header data - Introduction to handling well header data (surface location and attributes) using the pandas and plotly packages.
    • Hands-on workshop on well header data - including import, data cleaning, date handling, posting well data on cultural/satellite base map and visualizing historical trends.
    • Production data - Introduction to handling field production data using the pandas and plotly packages.
    • Hands-on workshop on field production data - including import, data cleaning, date handling, queries, visualizing hierarchical and time series data.
    • Well log data - Introduction to handling wireline logs from LAS files using the lasio, pandas, and plotly packages. 
    • Hands-on workshop on well log and tops data - including LAS (.las) file import, merging tops, and data visualization.
    • Seismic data - Introduction to handling seismic SEG-Y data using the segyio and plotly packages.
    • Hands-on workshop on seismic data - including SEG-Y (.segy) file import, extracting binary and trace headers, visualizing seismic trace data, and calculating seismic attributes.
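
To give a flavour of the production data steps listed in this module (data cleaning, date handling, queries, and hierarchical summaries), here is a minimal pandas sketch. The field names and volumes are invented for illustration and do not come from the course datasets; file import and plotting are left out.

```python
import pandas as pd

# Hypothetical monthly oil production records; all values are made up.
production = pd.DataFrame({
    "field": ["Alpha", "Alpha", "Alpha", "Beta", "Beta", "Beta"],
    "month": ["2019-11", "2019-12", "2020-01", "2019-11", "2019-12", "2020-01"],
    "oil_m3": [1000.0, 900.0, 850.0, 400.0, 420.0, None],
})

# Date handling: convert year-month strings into datetime values.
production["month"] = pd.to_datetime(production["month"], format="%Y-%m")

# Data cleaning: drop records with a missing volume.
clean = production.dropna(subset=["oil_m3"]).copy()
clean["year"] = clean["month"].dt.year

# Query: keep only the records reported in 2019.
in_2019 = clean[clean["year"] == 2019]

# Hierarchical summary: total volume per field and year.
annual = clean.groupby(["field", "year"])["oil_m3"].sum()
```

The result of the groupby is indexed by field and year, so a single total can be looked up with, for example, annual.loc[("Alpha", 2019)].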

Module 7. Machine Learning Fundamentals (4hr)

    • Machine Learning - introduction to the fundamentals of machine learning including background concepts from probability theory, the different types of machine learning, and the basic workflow to build and evaluate models from data. 
    • Python scikit-learn - introduction to the Python scikit-learn package for machine learning, including a demonstration of typical pipelines and workflows. 
    • Supervised learning with regression - introduction to regression including traditional linear regression, random forest regression, hyperparameter optimization, and performance evaluation.
    • Hands-on workshop on regression for reconstructing wireline logs using random forests.
    • Unsupervised Learning - introduction to unsupervised learning for dimensionality reduction, clustering and outlier detection. 
    • Hands-on workshop on unsupervised learning for outlier detection and clustering of wireline logs.
    • Explainable Machine Learning - introduction to explainable machine learning: techniques for looking inside the so-called black box models of machine learning to understand why particular predictions are made and which variables are important.
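
The regression workflow outlined in this module (random forest regression, hyperparameter optimization, and performance evaluation) can be sketched with scikit-learn as below. This is an illustration under stated assumptions rather than the course's own workshop code: the "log curves" are synthetic random numbers, the relationship between them is invented, and the parameter grid is arbitrary. The impurity-based feature importances at the end are one simple way of seeing which inputs a fitted forest relies on.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for log curves; these are NOT real well data.
n = 400
gamma_ray = rng.uniform(20, 150, n)
neutron = rng.uniform(0.05, 0.45, n)

# Invented target: a "density" curve built from the inputs plus noise.
density = 2.65 - 1.2 * neutron + 0.001 * gamma_ray + rng.normal(0, 0.02, n)

X = np.column_stack([gamma_ray, neutron])
X_train, X_test, y_train, y_test = train_test_split(
    X, density, test_size=0.25, random_state=0
)

# Hyperparameter optimization: cross-validated grid search over a small,
# arbitrary grid of forest sizes and tree depths.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [4, 8]},
    cv=3,
    scoring="r2",
)
search.fit(X_train, y_train)

# Performance evaluation on data held out from training.
y_pred = search.predict(X_test)
score = r2_score(y_test, y_pred)

# Impurity-based feature importances of the best model (they sum to 1).
importances = search.best_estimator_.feature_importances_
```

Holding back a test set before the grid search keeps the final score independent of the data used to choose the hyperparameters.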

This is an introductory course for reservoir geologists, reservoir geophysicists, reservoir engineers, and technical staff who want to learn the key concepts of data science.

  • An introduction to data science and fundamentals of Python programming.
  • Exploratory data analysis, visualization tools, and descriptive statistics.
  • Supervised machine learning, including algorithms for classification and regression, and their advantages and limitations.
  • Unsupervised machine learning, including algorithms for outlier detection and clustering, and their advantages and limitations.

No prior experience of statistics, coding or machine learning is required, although knowledge of basic maths and statistics is useful.

Hands-on computer workshops form a significant part of this course, and participants must come equipped with a laptop computer running Windows (7, 8, 10) or macOS (10.10 or above) with at least 4 GB of free storage.

David Psaila
Remote, Centralized Class
January 18-21, 2021

© 2021 Schlumberger Limited. All rights reserved.