ArchDataPy Tutorial

This short tutorial walks you through using the archdatapy package to access the archaeological datasets in the R archdata package, which is hosted on CRAN. It covers how to download these datasets and load them into a Python session as pandas DataFrames for analysis.

Libraries

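First, import the necessary libraries. The archdatapy package provides two main functions:

- get_archdata: Downloads the archdata package from CRAN, extracts its datasets, and returns a dictionary mapping dataset names to file paths.
- load_archdata: Loads a specified .rda file into Python and returns a dictionary of the objects it contains (typically pandas DataFrames).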

[1]:

from archdatapy import get_archdata, load_archdata
from pprint import pprint

Step 1: Download and List Available Datasets

The first step is to use get_archdata to download the archdata package from CRAN, extract the datasets, and return a dictionary containing the dataset names and file paths.

[2]:

# Download datasets and get the file paths
file_paths = get_archdata()
print("Available datasets and their file paths:")
for dataset_name, path in file_paths.items():
    print(f"{dataset_name}: {path}")

Available datasets and their file paths:
Acheulean: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\Acheulean.rda
Arnhofen: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\Arnhofen.rda
BACups: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\BACups.rda
BarmoseI.grid: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\BarmoseI.grid.rda
BarmoseI.pp: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\BarmoseI.pp.rda
Bornholm: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\Bornholm.rda
DartPoints: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\DartPoints.rda
EIAGraves: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\EIAGraves.rda
EndScrapers: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\EndScrapers.rda
EngrBone: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\EngrBone.rda
ESASites: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\ESASites.rda
EWBurials: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\EWBurials.rda
Fibulae: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\Fibulae.rda
Handaxes: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\Handaxes.rda
MaskSite: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\MaskSite.rda
Mesolithic: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\Mesolithic.rda
Michelsberg: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\Michelsberg.rda
Nelson: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\Nelson.rda
Olorgesailie.maj: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\Olorgesailie.maj.rda
Olorgesailie.sub: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\Olorgesailie.sub.rda
OxfordPots: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\OxfordPots.rda
PitHouses: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\PitHouses.rda
RBGlass1: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\RBGlass1.rda
RBGlass2: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\RBGlass2.rda
RBPottery: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\RBPottery.rda
Snodgrass: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\Snodgrass.rda
TRBPottery: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\TRBPottery.rda

After running this, you should see a list of dataset names along with their file paths. Each key in the file_paths dictionary is a dataset name (based on the file name without extension), and each value is the full path to the corresponding .rda file.
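
To make the name-to-path mapping concrete, here is a minimal sketch of how such a dictionary could be built from a directory of .rda files. The helper index_rda_files is hypothetical and is not part of archdatapy; get_archdata may build its dictionary differently. The sketch only illustrates that the key is the file name with its final extension removed, which is why a file like BarmoseI.grid.rda ends up under the key BarmoseI.grid.

```python
import tempfile
from pathlib import Path


def index_rda_files(data_dir):
    """Hypothetical helper: map each .rda file's stem to its full path."""
    return {p.stem: str(p) for p in sorted(Path(data_dir).glob("*.rda"))}


# Demonstrate on a throwaway directory holding empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for name in ("Acheulean.rda", "BarmoseI.grid.rda", "notes.txt"):
        (Path(tmp) / name).touch()
    demo_paths = index_rda_files(tmp)

# Only the .rda files are indexed; notes.txt is ignored
print(sorted(demo_paths))
```

Because Path.stem strips only the last suffix, dotted dataset names keep their inner dot.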

Step 2: Loading a Specific Dataset

Now that we have the paths, we can use load_archdata to load a specific dataset. Choose one of the dataset names from the output above, then pass its file path to load_archdata. This function will load the contents of the .rda file and return a dictionary of objects.

[3]:

# Choose a dataset to load (any key from file_paths, e.g. 'Handaxes')
dataset_name = 'Acheulean'
data = load_archdata(file_paths[dataset_name])

# Inspect the loaded data
print(f"Contents of the dataset '{dataset_name}':")
for obj_name, obj_value in data.items():
    print(f"Object name: {obj_name}, Object type: {type(obj_value)}")

Contents of the dataset 'Acheulean':
Object name: Acheulean, Object type: <class 'pandas.core.frame.DataFrame'>
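
Indexing file_paths with a name that is not among its keys raises a bare KeyError. The sketch below shows one possible way to turn that into a more helpful message by suggesting close matches. The helper resolve_dataset and the example_paths dictionary are assumptions made for illustration and are not part of archdatapy.

```python
import difflib


def resolve_dataset(file_paths, name):
    """Hypothetical helper: return the path for `name`, or raise a
    KeyError that lists similarly spelled dataset names."""
    if name in file_paths:
        return file_paths[name]
    suggestions = difflib.get_close_matches(name, list(file_paths), n=3)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    raise KeyError(f"No dataset named {name!r}.{hint}")


# Stand-in dictionary; in a real session, pass the one from get_archdata()
example_paths = {"Acheulean": "Acheulean.rda", "Handaxes": "Handaxes.rda"}
print(resolve_dataset(example_paths, "Acheulean"))
```

In a session, the returned path could be passed straight to load_archdata in place of file_paths[dataset_name].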

For datasets that are simple tables (data frames), you can display the DataFrame directly in the notebook by indexing the returned dictionary with the object's name. The dictionary itself comes from the underlying call to pyreadr:

[4]:

data[dataset_name]

[4]:

rownames	Lat	Long	HA	CL	KN	FS	D	CS	P	CH	SP	OLT	SS	OST
Olorgesailie	-1.58	36.45	197	96	58	17	5	11	3	32	52	6	213	218
Isimila	-7.90	35.61	246	208	30	28	6	30	16	62	17	15	98	64
Kalambo Falls	-8.60	31.24	337	264	59	96	8	124	18	69	6	17	303	48
Lochard	-19.92	29.02	45	13	3	2	12	1	0	32	3	8	46	22
Kariandusi	-0.45	36.26	132	56	47	23	3	5	7	6	5	8	17	25
Broken Hill	-14.43	28.45	1	8	1	1	0	1	0	4	25	0	35	18
Nsongezi	-1.03	30.78	15	19	2	9	1	28	1	19	0	10	17	70
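
As a small illustration of a first analysis step, the sketch below rebuilds three rows and three columns of the table above by hand and ranks the sites by the HA column. The name sample is an assumption made so the example runs on its own; in a real session you would work with data[dataset_name] instead.

```python
import pandas as pd

# Values copied from the Acheulean output above (subset of rows and columns)
sample = pd.DataFrame(
    {
        "Lat": [-1.58, -7.90, -8.60],
        "Long": [36.45, 35.61, 31.24],
        "HA": [197, 246, 337],
    },
    index=pd.Index(["Olorgesailie", "Isimila", "Kalambo Falls"], name="rownames"),
)

# Rank sites by the HA column, largest first
ranked = sample.sort_values("HA", ascending=False)
top_site = ranked.index[0]
total_ha = int(sample["HA"].sum())

print(ranked)
print(f"Top site by HA: {top_site}; total HA: {total_ha}")
```

The same sort_values and sum calls apply unchanged to the full DataFrame.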

And that’s it! You can now analyze the data you downloaded.