{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "(datospp)=\n", "# Preprocessed data\n", "In the previous section, we analyzed the data released for the LHCO 2020 exactly as published. In this section, we analyze the preprocessed versions of these datasets.\n", "\n", "The preprocessed data are obtained with `build_features` from the `clustering` module, which uses the `pyjet` library to perform the jet clustering and compute kinematic variables for the two jets with the highest $p_T$ in each event, since these are expected to be the jets originating from the particles $X$ and $Y$ of the signal described in {numref}`datos`. The details of the preprocessing are given in {numref}`bench-pre`. The first five events of the R&D dataset, as preprocessed by `benchtools`, are shown in {numref}`df-RnD`." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "# Import the main libraries\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from myst_nb import glue\n", "from PIL import Image\n", "import os\n", "\n", "# benchtools functions\n", "from benchtools.src.plotools import create_png, image_grid\n", "from benchtools.src.clustering import build_features\n", "\n", "# Define global variables\n", "PATH_IMAGES = '../../figuras/'" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "# Define where the data are located\n", "path_data_RnD = \"../../../datos/events_anomalydetection.h5\"\n", "path_data_BB1 = \"../../../datos/events_LHCO2020_BlackBox1.h5\"\n", "path_key_BB1 = \"../../../datos/events_LHCO2020_BlackBox1.masterkey\"" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A file with that name already exists\n" ] } ], "source": [ "# R&D preprocessing\n", "# This cell is run once to preprocess the data\n", "# Once the output file exists, the preprocessing is not run again\n", "build_features(path_data=path_data_RnD, nbatch=11, outname='RnD-1100000', outdir='../../../datos/', chunksize=100000)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A file with that name already exists\n" ] } ], "source": [ "# BB1 preprocessing\n", "# This cell is run once to preprocess the data\n", "# Once the output file exists, the preprocessing is not run again\n", "build_features(path_data=path_data_BB1, nbatch=10, outname='BB1-1000000', path_label=path_key_BB1, outdir='../../../datos/', chunksize=100000)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "tags": [ "remove-cell" ] }, "outputs": [ { "data": { "application/papermill.record/text/html": "
| | pT_j1 | m_j1 | eta_j1 | phi_j1 | E_j1 | tau_21_j1 | nhadrons_j1 | pT_j2 | m_j2 | eta_j2 | phi_j2 | E_j2 | tau_21_j2 | nhadrons_j2 | m_jj | deltaR_j12 | n_hadrons | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1286.727685 | 106.912129 | 0.185508 | -2.763676 | 1313.290435 | 0.624659 | 36 | 1283.220733 | 63.164215 | 0.064989 | 0.393688 | 1287.481934 | 0.713248 | 33 | 2580.489568 | 3.159663 | 109.0 | 0.0 |
| 1 | 1354.394070 | 614.269108 | 0.826505 | 1.365524 | 1943.559886 | 0.311688 | 84 | 1325.613761 | 439.064150 | -0.874319 | -1.786248 | 1916.370744 | 0.276881 | 97 | 3859.315047 | 3.581406 | 208.0 | 0.0 |
| 2 | 1214.955723 | 645.865619 | -0.196786 | 2.040545 | 1396.840654 | 0.238205 | 119 | 1072.462085 | 113.768840 | 0.143831 | -1.090330 | 1089.530630 | 0.726963 | 59 | 2480.769725 | 3.149348 | 196.0 | 0.0 |
| 3 | 1285.227873 | 516.835248 | 0.328693 | 2.975321 | 1450.485926 | 0.013429 | 65 | 1220.251279 | 174.796077 | 0.294854 | -0.322661 | 1285.618789 | 0.706361 | 89 | 2609.893413 | 3.298155 | 183.0 | 0.0 |
| 4 | 1210.415787 | 129.499352 | -0.744836 | -2.883347 | 1567.345300 | 0.423550 | 54 | 1091.785816 | 155.362262 | 1.060534 | 0.264977 | 1772.340209 | 0.787662 | 57 | 3313.488835 | 3.629229 | 169.0 | 1.0 |