# Introduction to SDAR

## Stratigraphic Data Analysis (SDAR)

#### John Ortiz 1,2, Carlos Jaramillo 2

1 Smithsonian Tropical Research Institute, Balboa, Ancon, Republic of Panama; 2 Corporación Geológica ARES, Bogotá, Colombia.

SDAR is a fast and consistent tool for plotting and facilitating the analysis of stratigraphic and sedimentological data, designed to plot detailed stratigraphic sections and to perform quantitative stratigraphic analyses.

### Abstract

Stratigraphic columns (SCs) are the most useful and common way to represent the field descriptions (e.g., grain size, thickness of rock packages, fossil content, and lithological components) of rock sequences and well logs. In these representations, the width of the SC varies with grain size (i.e., the wider the stratum, the coarser the rock; Miall 1990; Tucker 2011), and the thickness of each layer is represented on the vertical axis of the diagram. Typically these representations are drawn ‘manually’ using vector graphics editors (e.g., Adobe Illustrator®, CorelDRAW®, Inkscape). Several software packages now plot SCs automatically, but versatile open-source tools are lacking, and it remains difficult to both store and analyse stratigraphic information.

This document presents Stratigraphic Data Analysis in R (SDAR), an analytical package designed both to plot stratigraphic data and to facilitate its analysis in R (R Core Team 2019). SDAR uses simple stratigraphic data and takes advantage of the flexible plotting tools available in R to produce detailed SCs. The main benefits of SDAR are:

• It generates accurate and complete SC plots that include multiple features (e.g., sedimentary structures, samples, fossil content, color, structural data, contacts between beds).
• It is developed in a free software environment for statistical computing and graphics.
• It runs on a wide variety of platforms (i.e., UNIX, Windows, and macOS).
• Both plotting and analysis functions can be executed directly from R’s command-line interface (CLI), so users can combine SDAR’s functions with the many add-on packages available for R from the Comprehensive R Archive Network (CRAN).

Keywords: Quantitative Stratigraphy, Stratigraphic log, R package, Geosciences software

#### Acknowledgments

This project has been sponsored by Carlos Jaramillo (Smithsonian Tropical Research Institute (STRI)), COLCIENCIAS with Fondos para la Investigación de la Ciencia y la Tecnología del Banco de la República, and Corporación Geológica ARES.

### Data model

This section summarizes the types of data required by the SDAR package, together with the representation schemes and standard formats that an input data set must satisfy to be used with SDAR.

Many sedimentological, stratigraphical, and paleontological features share common properties. For example, beds, intervals, and lithostratigraphic and chronostratigraphic units are all defined over a stratigraphic range (i.e., each of them is defined by a base and a top). In contrast, features such as samples, structural data, and geochemical and geochronological analyses are usually collected at, or correspond to, a specific stratigraphic position (i.e., each of them represents a unique depth in the SC). Other features, such as fossil occurrences, bioturbation, and sedimentary structures, can be described either by a stratigraphic range or by a specific stratigraphic position (e.g., a fossil occurrence can be recorded at a specific depth, throughout a bed, or throughout a set of beds). A flexible data model able to store and integrate all of these descriptions was therefore implemented. SDAR allows users to provide stratigraphic information as three main types: beds, intervals, and punctual features. A description and an example of each data type are provided below.
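The three data types can be pictured as three small tables. The following base R sketch is only an illustration: the values are invented, and the column names of the punctual table (`depth`, `feature`) and the helper `is_range` are assumptions made for this example, not SDAR's required format.

```r
# Illustrative tables, one per data type (all values are made up).
beds <- data.frame(            # defined over a range; beds must not overlap
  bed_number = 1:2,
  base = c(10.0, 8.5),
  top  = c(8.5, 7.0)
)
intervals <- data.frame(       # defined over a range; intervals may overlap
  base = c(9.5, 9.0),
  top  = c(8.0, 7.5),
  feature = c("bioturbation", "ripple lamination")
)
points_tbl <- data.frame(      # one stratigraphic position per row (assumed columns)
  depth = c(9.2, 7.8),
  feature = c("sample", "fossil occurrence")
)

# Hypothetical helper: TRUE when a table is range-based (has base and top).
is_range <- function(x) all(c("base", "top") %in% names(x))

is_range(beds)
is_range(points_tbl)
```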

### Data format to integrate rock layers (beds)

A layer of rock is the fundamental unit of an SC representation. It describes the thickness, composition, and texture of a rock. To integrate a stratigraphic layer in SDAR, the information required for each layer is bed number, thickness (i.e., defined by a base and a top), rock type, primary lithology, and grain size. To improve communication between geoscientists, SDAR implements conventions defined by sedimentologists for drawing lithology patterns and for describing grain size and color. Details on the information required to define a layer, and the sources of the implemented conventions, are provided below.

• bed_number: numeric. It is often useful to give each bed, or rock unit, a number to facilitate later reference; numbering begins at the stratigraphically lowest bed (Tucker 2011).
• base and top: numeric. They define the thickness of each layer. Layers must not overlap.
• rock_type: string or numeric. This field must contain only one of the values listed in Table 1.
###### Table 1: Rock type.

| id | name |
|---|---|
| 1 | sedimentary |
| 2 | igneous |
| 3 | covered |

• prim_litho: string or numeric. This field must contain only the values listed in Table 2. Lithologic patterns are drawn following the conventions suggested by the Federal Geographic Data Committee (FGDC 2006).
###### Table 2: Primary lithology.

| id | name | id | name |
|---|---|---|---|
| 1 | claystone | 7 | breccia |
| 2 | siltstone | 8 | limestone |
| 3 | mudstone | 9 | dolomite |
| 4 | shale | 13 | coal |
| 5 | sandstone | 24 | tuff |
| 6 | conglomerate | 26 | granite |

• grain_size: string or numeric. This field must contain only the values listed in Table 3. Grain size is a fundamental attribute of siliciclastic sedimentary and pyroclastic rocks, and thus one of their most important descriptive properties. Grain size is represented by the width of the SC (Miall 1990) and is indicated by the graphic scale in the header of the SC (see Figure 1). The abbreviations used are clay, mud, and silt; vf, f, m, c, and vc for very fine, fine, medium, coarse, and very coarse sand; gr for granule; pe for pebble; co for cobble; and bo for boulder. The implemented conventions follow the Wentworth classification for siliciclastic rocks (Wentworth 1922), Wentworth and Williams (1932) for pyroclastic rocks, and Dunham (1962) for carbonate rocks.
###### Table 3: Grain size table.

| id | name | id | name |
|---|---|---|---|
| 1 | clay | 21 | boulder |
| 2 | clay / silt | 22 | mudstone |
| 3 | silt | 23 | wackestone |
| 4 | silt / very fine sand | 24 | packstone |
| 5 | very fine sand | 25 | grainstone |
| 6 | very fine / fine sand | 26 | boundstone |
| 7 | fine sand | 27 | floatstone |
| 8 | fine / medium sand | 28 | rudstone |
| 9 | medium sand | 29 | bafflestone |
| 10 | medium / coarse sand | 30 | bindstone |
| 11 | coarse sand | 31 | framestone |
| 12 | coarse / very coarse sand | 32 | crystalline |
| 13 | very coarse sand | 33 | fine ash |
| 14 | very coarse / granule | 34 | medium ash |
| 15 | granule | 35 | coarse ash |
| 16 | granule / pebble | 36 | fine lapilli |
| 17 | pebble | 37 | medium lapilli |
| 18 | pebble / cobble | 38 | coarse lapilli |
| 19 | cobble | 39 | fine block |
| 20 | cobble / boulder | 40 | coarse block |
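
Because rock_type and prim_litho accept either a string or a numeric id, it can help to see how the two forms relate. The base R sketch below transcribes Tables 1 and 2 into named vectors; the helper `decode` is hypothetical and written only for this illustration, and it is not a function provided by SDAR.

```r
# Lookup vectors transcribed from Table 1 and Table 2 (names are the ids).
rock_type_lut <- c("1" = "sedimentary", "2" = "igneous", "3" = "covered")
prim_litho_lut <- c("1" = "claystone", "2" = "siltstone", "3" = "mudstone",
                    "4" = "shale", "5" = "sandstone", "6" = "conglomerate",
                    "7" = "breccia", "8" = "limestone", "9" = "dolomite",
                    "13" = "coal", "24" = "tuff", "26" = "granite")

# Hypothetical helper (not part of SDAR): accepts ids or names and returns
# names; values found in neither form become NA.
decode <- function(x, lut) {
  x <- as.character(x)
  ifelse(x %in% lut, x, unname(lut[x]))
}

decode(c(5, 13, 2), prim_litho_lut)   # numeric ids
decode("igneous", rock_type_lut)      # already a name
decode(99, prim_litho_lut)            # unknown id
```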

In summary, a table with the structure presented in Table 4 must be provided. Each row of this table describes one stratigraphic bed/layer.

##### Table 4: Example of beds/layers table.

| bed_number | base | top | rock_type | prim_litho | grain_size |
|---|---|---|---|---|---|
| 1 | 671 | 670.2 | sedimentary | claystone | clay |
| 2 | 670.2 | 669.4 | sedimentary | siltstone | silt |
| 3 | 669.4 | 669.18 | sedimentary | sandstone | medium sand |
| 4 | 669.18 | 667.6 | sedimentary | limestone | wackestone |
| 5 | 667.6 | 667.2 | sedimentary | conglomerate | boulder |
| 6 | 667.2 | 666.2 | sedimentary | shale | silt |
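
As an illustration, the rows of Table 4 can be typed into a data frame and checked against the no-overlap rule before they are handed to SDAR. This is a minimal base R sketch: `has_overlap` is a hypothetical helper written for this example and is not SDAR's own validator.

```r
# The six beds of Table 4 as a data frame.
beds <- data.frame(
  bed_number = 1:6,
  base = c(671, 670.2, 669.4, 669.18, 667.6, 667.2),
  top  = c(670.2, 669.4, 669.18, 667.6, 667.2, 666.2),
  rock_type  = "sedimentary",
  prim_litho = c("claystone", "siltstone", "sandstone",
                 "limestone", "conglomerate", "shale"),
  grain_size = c("clay", "silt", "medium sand",
                 "wackestone", "boulder", "silt"),
  stringsAsFactors = FALSE
)

# Hypothetical check (not part of SDAR): TRUE if any two beds share thickness.
# Beds that only touch at a contact do not count as overlapping.
has_overlap <- function(base, top, tol = 1e-9) {
  lo <- pmin(base, top)
  hi <- pmax(base, top)
  ord <- order(lo)
  lo <- lo[ord]
  hi <- hi[ord]
  any(lo[-1] < hi[-length(hi)] - tol)
}

beds$thickness <- abs(beds$base - beds$top)
has_overlap(beds$base, beds$top)
sum(beds$thickness)
```

Using `pmin` and `pmax` makes the check independent of whether depths increase or decrease from base to top.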

The SDAR repository provides a template (Excel spreadsheet) with the format suggested by SDAR for storing the thickness and texture description of rock layers (beds).

### Data format to integrate interval features

An interval is defined over a stratigraphic range; it must have a base and a top. The main requirement for an interval is that the recorded geological feature (e.g., sedimentary structures, bioturbation, unit name, fossil content) is present throughout the defined stratigraphic range. An interval may also cover only part of the thickness of a given bed. For example, if bioturbation is present only at the top of a bed, users can define the base and top of the bioturbated interval and store it in this format. Following this approach, users can store different features in this format, such as sedimentary structures, miscellaneous elements, bioturbation index, and oil stain. In this data type, overlapping is allowed.

To define intervals, the user must provide the stratigraphic base, the top, and the recorded feature of each interval, as shown in Table 5. Each row of the table describes one stratigraphic interval and the feature recorded in it.

##### Table 5: Examples of interval tables.
###### Oil stain

| base | top | intensity |
|---|---|---|
| 8.3 | 3 | very strong |
| 31.5 | 28 | moderate strong |
| 35.3 | 33 | moderate |
| 41.3 | 37 | weak |
| 53.3 | 44 | strong |

###### Bioturbation index

| base | top | index |
|---|---|---|
| 7.1 | 6 | 4 |
| 9.9 | 7.2 | 1 |
| 12.1 | 12 | 2 |
| 24.7 | 24 | 6 |
| 29.8 | 28.8 | 5 |

###### Sedimentary structures

| base | top | sed_structure |
|---|---|---|
| 12.1 | 12.4 | trough cross-stratification |
| 22.1 | 22.8 | wavy lamination |
| 22.1 | 23.4 | planar lamination |
| 27.2 | 28.4 | ripple lamination |
| 52.7 | 58.1 | cross-lamination |
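
Since intervals may overlap, a single depth can fall inside several of them. The base R sketch below uses the sedimentary structures example to show this; `intervals_at` is a hypothetical helper written for this illustration, not an SDAR function.

```r
# Sedimentary structures example as a data frame.
sed <- data.frame(
  base = c(12.1, 22.1, 22.1, 27.2, 52.7),
  top  = c(12.4, 22.8, 23.4, 28.4, 58.1),
  sed_structure = c("trough cross-stratification", "wavy lamination",
                    "planar lamination", "ripple lamination",
                    "cross-lamination"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of SDAR): rows whose range contains `depth`,
# regardless of whether base is numerically above or below top.
intervals_at <- function(tbl, depth) {
  lo <- pmin(tbl$base, tbl$top)
  hi <- pmax(tbl$base, tbl$top)
  tbl[depth >= lo & depth <= hi, , drop = FALSE]
}

intervals_at(sed, 22.5)$sed_structure   # two overlapping intervals match
nrow(intervals_at(sed, 40))             # no interval recorded at this depth
```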

The SDAR repository provides a template (Excel spreadsheet) with the format suggested by SDAR for storing interval information (e.g., metadata, samples, oil stain, bioturbation, sedimentary structures, fossil and trace fossil content).

### Example dataset (Saltarin 1A Well)

This dataset gives a lithologic description of borehole Saltarin 1A, located in the Llanos Basin in eastern Colombia (4.612 N, 70.495 W). The stratigraphic well Saltarin 1A drilled 671 meters of the Miocene succession of the eastern Llanos Basin, corresponding to the Carbonera (124.1 m; 407.1 ft), Leon (105.1 m; 344.8 ft), and Guayabo (441.8 m; 1449.5 ft) Formations (Bayona et al. 2008). The Saltarin core was described at a scale of 1:50 to identify grain-size trends, sedimentary structures, clast composition, lamination thickness, bioturbation patterns, and macrofossils, all of which are used to identify individual lithofacies and for sedimentological and stratigraphic analyses.

The saltarin dataset provided by this package comprises the lithological description of 686 rock layers described along 671 meters of the Saltarin 1A borehole. It is a data frame with 686 rows (layers) and 27 columns that record the thickness, composition, and texture of each layer, stored in the format suggested by SDAR.

```r
library(SDAR)
nrow(saltarin)   # number of rock layers
#> [1] 686
names(saltarin)  # variable names of composition and texture description of each layer
#>  [1] "locality_id"        "bed_number"         "base"
#>  [4] "top"                "rock_type"          "prim_litho"
#>  [7] "grain_size"         "prim_pct"           "sec_litho"
#> [10] "sec_pct"            "ter_litho"          "ter_pct"
#> [16] "grain_size_top"     "grain_size_point_A" "strat_pos_point_A"
#> [19] "grain_size_int_B"   "strat_pos_int_B"    "sorting"
#> [22] "roundness"          "matrix"             "cement"
#> [25] "fabric"             "munsell_code"       "notes"
```

### The strata class

R has three object-oriented (OO) systems: S3, S4, and R5. The S4 object system is much stricter than S3 and much closer to other OO systems (Wickham 2014). The SDAR package introduces a new S4 class called strata to store stratigraphic objects. The S4 style of classes and methods (Chambers 1998) was used to allow validation of the created objects. This S4 class gives a rigorous definition of a strata object: a valid object meets all the requirements specified in the definition (e.g., the columns must be named bed_number, base, top, rock_type, prim_litho, and grain_size, and base and top must be numeric). Defining this S4 class reduces errors, because the class records the type of information the object contains and checks its validity (Wickham 2014).

```r
# The strata function automatically validates the input dataset
# and returns a strata class object.

val.beds.salt <- strata(saltarin)
#>    'beds data has been validated successfully'

# check the class of the object generated by the strata function
class(val.beds.salt)
#> [1] "strata"
#> attr(,"package")
#> [1] "SDAR"
```

The validity of the saltarin object with respect to its class definition is tested using the function strata; the object must satisfy the conditions required by this class. This check is run with the command val.beds.salt <- strata(saltarin), which automatically validates the input dataset (saltarin) and returns a strata class object.
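
To illustrate how an S4 class can enforce requirements of this kind, the sketch below defines a small class with a validity function using the `methods` package. The class name `BedsSketch`, its single slot, and the two checks are assumptions made for this example; this is not SDAR's actual class definition, which may check more conditions.

```r
library(methods)

# Hypothetical class for illustration only; NOT the definition used by SDAR.
setClass("BedsSketch", representation(beds = "data.frame"))

# The validity function returns TRUE or a message describing the problem.
setValidity("BedsSketch", function(object) {
  required <- c("bed_number", "base", "top",
                "rock_type", "prim_litho", "grain_size")
  missing_cols <- setdiff(required, names(object@beds))
  if (length(missing_cols) > 0) {
    return(paste("missing columns:", paste(missing_cols, collapse = ", ")))
  }
  if (!is.numeric(object@beds$base) || !is.numeric(object@beds$top)) {
    return("base and top must be numeric")
  }
  TRUE
})

good <- data.frame(bed_number = 1, base = 671, top = 670.2,
                   rock_type = "sedimentary", prim_litho = "claystone",
                   grain_size = "clay")
obj <- new("BedsSketch", beds = good)   # passes validation

bad <- good
bad$base <- "671"                       # base is no longer numeric
result <- tryCatch(new("BedsSketch", beds = bad),
                   error = function(e) "rejected")
result
```

Because `new()` runs the validity function whenever slots are supplied, an invalid table is rejected at construction time rather than later, during plotting or analysis.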

### Methods within the strata class

In this version of the SDAR package, the methods associated with the strata class are summary and plot. The summary method displays a synopsis of the content of a strata object, including the total number of layers, the thickness of the studied section, and the number of layers by lithology type. Once the stratigraphic data are loaded into R, strata class objects can be plotted to visualise the information. The plot method provides different outputs depending on the parameter settings.

### Plotting method for strata class

The minimal information required to plot an SC using SDAR is a table with the structure presented in Table 4. Once a dataset has been defined and validated as a strata class object, the plot method plot.strata is accessed automatically. The following code produces Figure 1.

```r
# Code to generate example presented in Figure 1.

val.beds.salt <- strata(saltarin)       # validated input data
plot(val.beds.salt, datum="top")        # plot a stratigraphic log with the SDAR default options
```

```r
# Code to generate example presented in Figure 2.
# data.oilstain, data.bio, and data.log hold simulated example data;
# they are not defined in this document.

plot(val.beds.salt, datum="top", d.scale=500, d.barscale=5, data.units="meters",
     subset.base=80, subset.top=0, file.name="saltarin_well_SDAR_demo",
     oil.stain=data.oilstain, bioturbation=data.bio, GR.log=data.log, xlim.GR=c(0, 350),
     main="Graphic log of Saltarin-1A well", sub="Description scale 1:500")
```

Figure 2: Borehole log with additional features such as gamma-ray log, bioturbation index, and oil stain.

```r
# Code to generate example presented in Figure 3.
# All data.* objects hold simulated example data; they are not defined
# in this document.

plot(val.beds.salt, datum="top", d.scale=500, d.barscale=5, data.units="meters",
     subset.base=50, subset.top=0, file.name="saltarin_well_SDAR_symbols",
     oil.stain=data.oilstain, bioturbation=data.bio, GR.log=data.log, xlim.GR=c(0, 350),
     main="Graphic log of Saltarin-1A well", sub="Description scale 1:500",
     fossils=data.fos, sed.structures=data.sed, tracefossils=data.trace,
     other.sym=data.other, samples=data.samples, ncore=data.ncore,
     lithostrat=data.lithostrat, struct.data=data.struct)
```

Figure 3: Borehole log with symbol representations of additional features (e.g., sedimentary structures, body and trace fossils, and samples).
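
The idea behind a graphic log (one rectangle per bed, with thickness on the vertical axis and width following grain size) can be sketched with base R graphics alone. The example below is only a conceptual illustration and does not reproduce how SDAR draws its logs; in particular, using the Table 3 id directly as the bed width is an assumption made here for simplicity.

```r
# Three beds taken from Table 4.
beds <- data.frame(
  base = c(671, 670.2, 669.4),
  top  = c(670.2, 669.4, 669.18),
  grain_size = c("clay", "silt", "medium sand"),
  stringsAsFactors = FALSE
)

# Assumed width rule: use the Table 3 id (1 = clay, 3 = silt, 9 = medium sand).
width_id <- c("clay" = 1, "silt" = 3, "medium sand" = 9)
beds$width <- unname(width_id[beds$grain_size])

out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 3, height = 5)
# Reversed ylim so that depth increases downward, as in a well log.
plot(0, 0, type = "n",
     xlim = c(0, max(beds$width)),
     ylim = rev(range(c(beds$base, beds$top))),
     xlab = "grain size (Table 3 id)", ylab = "depth (m)")
rect(xleft = 0, ybottom = beds$base, xright = beds$width, ytop = beds$top,
     col = "grey85")
invisible(dev.off())
```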

### SDAR output

Figures 1, 2, and 3 present examples of graphic logs generated automatically by the SDAR package after the stratigraphic information has been correctly loaded into R and validated. Graphic logs generated by SDAR are exported as PDF files that are fully editable with any vector drawing application. Each log is drawn on a single page, and the paper size is updated automatically when the vertical scale changes or when different sets of attributes are plotted on the right or left side of the lithological column.
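
The dependence of page size on vertical scale is simple arithmetic: at a scale of 1:500, one meter of section takes 1/500 m on paper. The sketch below computes only the drawn height of the column; `log_height_cm` is a hypothetical helper, and the margins, header, and any other space SDAR adds to the page are not modelled here.

```r
# Hypothetical helper (not part of SDAR): drawn height in centimetres of a
# section `thickness_m` metres thick at a vertical scale of 1:d_scale.
log_height_cm <- function(thickness_m, d_scale) {
  thickness_m / d_scale * 100
}

full_well <- log_height_cm(671, 500)  # whole Saltarin 1A section: 134.2 cm
subset_80 <- log_height_cm(80, 500)   # an 80 m subset: 16 cm
full_well / 2.54                      # same height in inches, about 52.8
```

This is why a long section at a detailed scale produces a tall single page rather than a standard paper format.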

NOTE: The gamma-ray, oil stain, bioturbation, sedimentary structure, and body and trace fossil data represented in Figures 2 and 3 are simulated datasets generated for these examples; the resulting graphic logs have no geological meaning.

### Summary method for strata class data

This section presents the summary method. When the summary function is called on a strata class object, the results are printed in the R console. The output is a synopsis of the content of the strata object: the total number of layers, the thickness of the SC, the thickness of covered intervals, and, for each lithology type, the thickness, the percentage of the section, and the number of layers. The results of running summary on the example data set are printed below.

```r
library(SDAR)
data(saltarin)
saltarin_val <- strata(saltarin)  # input data validation
#>    'beds data has been validated successfully'
summary(saltarin_val)
#>
#>  Number of beds:              610
#>  Number of covered intervals   76
#>
#>  Thickness of the section:        671.0
#>  Thickness of covered intervals:   77.9
#>
#> Summary by lithology:
#>
#>                 Thickness  Percent (%)  Number beds
#> sandstone           233.3        34.77          330
#> claystone           211.6        31.53          130
#> siltstone           143.4        21.37          138
#> coal                  3.1         0.46            8
#> conglomerate          1.8         0.27            4
#> covered              77.9        11.61           76
summary(saltarin_val, grain.size=TRUE)
#>
#>  Number of beds:              610
#>  Number of covered intervals   76
#>
#>  Thickness of the section:        671.0
#>  Thickness of covered intervals:   77.9
#>
#> Summary by lithology:
#>
#>                 Thickness  Percent (%)  Number beds
#> sandstone           233.3        34.77          330
#> claystone           211.6        31.53          130
#> siltstone           143.4        21.37          138
#> coal                  3.1         0.46            8
#> conglomerate          1.8         0.27            4
#> covered              77.9        11.61           76
#>
#> Summary by Grain Size:
#>
#>                              Thickness  Percent (%)  Number beds
#> clay                             194.0        28.92          123
#> clay / silt                       43.7         6.51           28
#> silt                              88.6        13.21           89
#> silt / very fine sand             88.3        13.16          101
#> very fine sand                    71.6        10.68          122
#> very fine / fine sand             32.4         4.83           49
#> fine sand                         27.5         4.10           37
#> fine / medium sand                20.3         3.03           18
#> medium sand                        9.2         1.37           11
#> medium / coarse sand               5.6         0.83            8
#> coarse sand                        5.5         0.82           15
#> coarse / very coarse sand          3.7         0.55            3
#> very coarse / granule              1.5         0.22            3
#> granule                            1.1         0.16            3
#> covered                           77.9        11.61           76
```
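
The per-lithology figures in such a summary (thickness, percent, and number of beds) are plain grouped sums. The base R sketch below computes the same kind of table for the six beds of Table 4; it is an illustration of the arithmetic under that small example, not the code used by SDAR's summary method.

```r
# Base, top, and primary lithology of the six beds in Table 4.
beds <- data.frame(
  base = c(671, 670.2, 669.4, 669.18, 667.6, 667.2),
  top  = c(670.2, 669.4, 669.18, 667.6, 667.2, 666.2),
  prim_litho = c("claystone", "siltstone", "sandstone",
                 "limestone", "conglomerate", "shale"),
  stringsAsFactors = FALSE
)
beds$thickness <- abs(beds$base - beds$top)

# Grouped sums and counts by lithology.
thick  <- tapply(beds$thickness, beds$prim_litho, sum)
counts <- table(beds$prim_litho)

by_litho <- data.frame(
  lithology = names(thick),
  thickness = as.vector(thick),
  percent   = round(100 * as.vector(thick) / sum(thick), 2),
  n_beds    = as.vector(counts[names(thick)])
)
by_litho[order(-by_litho$thickness), ]
```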

### Bibliography

• Federal Geographic Data Committee (2006). FGDC Digital Cartographic Standard for Geologic Map Symbolization.

• Dunham, R. (1962). Classification of carbonate rocks according to depositional texture. In Ham, W. E. (ed.), Classification of Carbonate Rocks. American Association of Petroleum Geologists Memoir, 1:108-121.

• Johnson, M. R. (1992). A proposed format for general-purpose comprehensive graphic logs. Sedimentary Geology, 81(3-4):289-298.

• Miall, A. D. (1990). Principles of Sedimentary Basin Analysis. Springer-Verlag.

• R Core Team (2014). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

• Tucker, M. E. (2011). Sedimentary Rocks in the Field: A Practical Guide. Geological Field Guide. Wiley.

• Wentworth, C. K. (1922). A scale of grade and class terms for clastic sediments. Journal of Geology, 30:377-392.

• Wentworth, C. K. and Williams, H. (1932). Classification and terminology of pyroclastic rocks. National Research Council Bulletin, 89:19-53.