{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Urgences - Image CC0 - pexels.com](img/pexels-pixabay-263402.jpg \"Urgences\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Challenge - [ED Lab Prediction]\n",
"_Nom à trouver_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ce notebook est à destination de l'équipe travaillant autours du projet. \n",
"L'objectif est de présenter sommairement le projet et les données."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Rationel"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Chaque jour, environ 50 000 personnes se présentent dans un service d'accueil des urgences (SAU) en France. En moyenne, 75% des patients retournent à domicile, et 20% sont hospitalisés. La durée moyenne de présence au SAU est longue. On estime que seulement 20% attendront moins d'une heure, tandis que ~30% attendront entre 1h et 2H et ~30% attendront en 2 et 4H. Enfin, un peu plus de 10% resteront au SAU entre 4 et 6H. Dans un contexte de pénurie de soignants, le recours à la consultation au SAU est en constante augmentation depuis plusieurs années. L'optimisation du circuit des urgences est une problématique centrale. Le cout humain et financier des dysfonctionnements du circuit et de l'offre de soin est important. \n",
"\n",
"Le parcours classique du circuit des urgences est le suivant : \n",
"1. **Premier contact d'ordre administratif**\n",
"2. **Premier contact soignant avec une infirmière d'accueil et d'orientation (IAO) (~M30) avec** :\n",
" - Recueil du motif de consultation\n",
" - Prise des constantes\n",
" - Recueil de quelques antécédents et de l'ordonnance du patient \n",
" - Eventuellement ECG \n",
"\n",
"\n",
"Le patient est classé selon un score de gravité (bleu, vert, jaune, orange, rouge, ou 1-2-3-4-5)\n",
"\n",
"3. **Premier contact médical avec un médecin (~H1)** :\n",
" - Interrogatoire\n",
" - Examen clinique\n",
"\n",
" \n",
"A la suite de cette consultation, plusieurs cas de figures selon la situation.\n",
"Le patient peut sortir avec ou sans ordonnance si le diagnostic est posé par l'examen clinique et ne nécéssite ni examen, ni hospitalisation.\n",
"Le patient peut nécéssiter la réalisation d'examens (prise de sang, radiographie, scanner) ou motiver un avis d'un spécialiste. Auquel cas il doit attendre\n",
"\n",
"4. **Réalisation des examens complémentaire ou d'un avis (prescription, réalisation, récupération)**\n",
"5. **Décision finale : Conclusion une fois les examens récupérés. (~H3)**\n",
"\n",
"Entre chaque étape, le patient attend pendant une durée plus ou moins longue. Le médecin lui « jongle » avec plusieurs patients à la fois à des étapes différentes. \n",
"\n",
"Nous proposons d'aider à raccourcir le temps entre l'arrivée du patient et sa sortie, en ne subordonnant pas la décision de réaliser un examen biologique à l'examen clinique du médecin. Nous savons que le temps entre l'arrivée au SAU et la première visite avec le médecin est le temps le plus long et le plus mal vécu par les patients. \n",
"\n",
"Nous proposons à l'aide d'un algorithme d'apprentissage statistique de prédire, dès les données fournies par l'IAO, la nécéssité de réaliser un examen de biologie médicale, afin de permettre aux IDE de prélever cet examen juste après l'IAO, de sorte que le médecin dès sa première visite peut conclure avec les résultats de la biologie, qu'il aurait sans cela, demandé et attendu de récuperer avant de conclure et de prendre en charge le patient. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Méthode"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Nous faisons l'hypothèse que la réalisation ou non (variable qualitative binaire) d'une biologie médicale est (en grande partie) déterminée par les variables disponibles dès l'IAO."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"
\n",
" \n",
" Données d'entrée | \n",
" Algorithme | \n",
" Données de sortie | \n",
"
\n",
" \n",
" | \n",
" | \n",
" Vecteur {0,1}^d d'examens de biologie associée à sa réalisation (1) ou non (0) | \n",
"
\n",
" \n",
" Age | \n",
" MLP NLP (Embeddings, Word2Vec ...) Autres | \n",
" Sodium (Na) - {0,1} | \n",
"
\n",
" \n",
" Sexe | \n",
" Potassium (K) - {0,1} | \n",
"
\n",
" \n",
" Motif de consultation | \n",
" Numération sanguine (NFS) - {0,1} | \n",
"
\n",
" \n",
" Paramètres vitaux (FC, SpO2, PA, T°, FR, EVA) | \n",
" Hémocultures - {0,1} | \n",
"
\n",
" \n",
" Ordonnance d'entrée du patient | \n",
" INR - {0,1} | \n",
"
\n",
" \n",
" | \n",
" ... | \n",
"
\n",
"
\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Les étapes clés de ce projet sont les suivantes :\n",
"\n",
"- I. Identifier une base de données exploitable – Verrouiller la base \n",
"Base MIMIC-IV ED – Verrouillée et disponible via le présent dépôt\n",
"- II. Explorer et visualiser les données \n",
"Recherche de corrélations, visualisation des données, description et analyse des inputs et output features\n",
"- III. Pre-processer les variables d'interêts\n",
"- IV. Entrainer un algorithme de machine learning avec une metric d'interet à définir\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Données"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Les données sont issus du projet MIMIC-IV. \n",
"Le projet MIMIC est un projet d'open-data médical initié par l'hopital _Beth Israel Deaconess_ à Boston. \n",
"Initialement, seul des données de réanimation été accessible.\n",
"\n",
"Pour sa 4ème édition, a été mis à disposition un jeu de données couvrant un spectre bien plus large :\n",
"- Données relatives aux passages aux urgences\n",
"- Données relatives aux hospitalisations\n",
"- Données relatives aux séjour en réanimation\n",
"- Données de radiographie thoracique avec compte rendu associé"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"L'ensemble de ces données ont été mis à disposition dans le cadre de projets complémentaires :\n",
"- [MIMIC-IV](https://physionet.org/content/mimiciv/0.4/) : hospitalisation et réanimation\n",
"- [MIMIC-IV-ED](https://physionet.org/content/mimic-iv-ed/1.0/) : urgences\n",
"- [MIMIC-IV-CXR](https://physionet.org/content/mimic-cxr/2.0.0/) : radiographie thoracique\n",
"\n",
"Ces bases sont complémentaires dans le sens où chaque collecte a été faite durant une période temporelle spécifique, qui se recoupe plus où moins. \n",
"Certains éléments nécessaires à l'exploitation de MIMIC-IV-ED sont présent dans MIMIC-IV. \n",
"La lecture de la documentation de MIMIC-IV et de MIMIC-IV-ED est vivement recommandé (lien ci-dessus).\n",
"\n",
"En complément, un certains nombre de ressources est disponible sur le site du projet [MIMIC-IV](https://mimic.mit.edu/)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Nous proposons de travailler autours de ce jeu de données avec pour objectif la tache suivante : **prédire les examens biologiques qui seront réalisés lors de l'arrivé d'un patient aux urgences**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Téléchargement des données"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"La base de données de biologie étant volumineuse (nous y reviendrons plus bas), un pré-traitement des données a été effectué. \n",
"Le pré-traitement est le suivant :\n",
"- Intégration de l'ensemble des données utiles au sein d'une base de données SQLITE\n",
"- Tri des lignes de biologies afin de ne conserver que celles répondant au critères suivants :\n",
" - Date de réalisation >= date de début du passage aux urgences\n",
" - Date de réalisation <= date de fin du passage aux urgences"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Un token de téléchargement des données vous a normallement été mis à disposition."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```\n",
" pip install -r requirements.txt\n",
" python download_data.py [TOKEN]\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Le script de transformation est disponible dans `database_constitution/database_constitution.py`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Chargement et exploration des données"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Les données sont téléchargés dans le dossier data. \n",
"Il est facile de requêter une base sqlite à l'aide de pandas."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import sqlite3"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" index | \n",
" subject_id | \n",
" stay_id | \n",
" temperature | \n",
" heartrate | \n",
" resprate | \n",
" o2sat | \n",
" sbp | \n",
" dbp | \n",
" pain | \n",
" acuity | \n",
" chiefcomplaint | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0 | \n",
" 15585360 | \n",
" 37573921 | \n",
" 97.0 | \n",
" 87.0 | \n",
" 18.0 | \n",
" 100.0 | \n",
" 150.0 | \n",
" 71.0 | \n",
" 10.0 | \n",
" 3.0 | \n",
" | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" 17192424 | \n",
" 34160628 | \n",
" 98.6 | \n",
" 82.0 | \n",
" NaN | \n",
" 100.0 | \n",
" 111.0 | \n",
" 81.0 | \n",
" 3.0 | \n",
" 3.0 | \n",
" | \n",
"
\n",
" \n",
" 2 | \n",
" 2 | \n",
" 15248757 | \n",
" 32172727 | \n",
" 97.1 | \n",
" 112.0 | \n",
" 20.0 | \n",
" 100.0 | \n",
" 147.0 | \n",
" 97.0 | \n",
" 8.0 | \n",
" 4.0 | \n",
" | \n",
"
\n",
" \n",
" 3 | \n",
" 3 | \n",
" 16648037 | \n",
" 38946064 | \n",
" 98.5 | \n",
" 59.0 | \n",
" 18.0 | \n",
" 99.0 | \n",
" 160.0 | \n",
" 86.0 | \n",
" 2.0 | \n",
" 2.0 | \n",
" | \n",
"
\n",
" \n",
" 4 | \n",
" 4 | \n",
" 13492931 | \n",
" 39828574 | \n",
" 100.6 | \n",
" 90.0 | \n",
" 16.0 | \n",
" 96.0 | \n",
" 107.0 | \n",
" 55.0 | \n",
" 0.0 | \n",
" 3.0 | \n",
" ' | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 448967 | \n",
" 448967 | \n",
" 17192157 | \n",
" 31192693 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" None | \n",
"
\n",
" \n",
" 448968 | \n",
" 448968 | \n",
" 15632852 | \n",
" 37860864 | \n",
" 98.2 | \n",
" 115.0 | \n",
" 18.0 | \n",
" 100.0 | \n",
" 124.0 | \n",
" 68.0 | \n",
" 0.0 | \n",
" 3.0 | \n",
" None | \n",
"
\n",
" \n",
" 448969 | \n",
" 448969 | \n",
" 17526143 | \n",
" 30744048 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 2.0 | \n",
" None | \n",
"
\n",
" \n",
" 448970 | \n",
" 448970 | \n",
" 18321345 | \n",
" 37024607 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" None | \n",
"
\n",
" \n",
" 448971 | \n",
" 448971 | \n",
" 11894181 | \n",
" 39631961 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 1.0 | \n",
" None | \n",
"
\n",
" \n",
"
\n",
"
448972 rows × 12 columns
\n",
"
"
],
"text/plain": [
" index subject_id stay_id temperature heartrate resprate o2sat \\\n",
"0 0 15585360 37573921 97.0 87.0 18.0 100.0 \n",
"1 1 17192424 34160628 98.6 82.0 NaN 100.0 \n",
"2 2 15248757 32172727 97.1 112.0 20.0 100.0 \n",
"3 3 16648037 38946064 98.5 59.0 18.0 99.0 \n",
"4 4 13492931 39828574 100.6 90.0 16.0 96.0 \n",
"... ... ... ... ... ... ... ... \n",
"448967 448967 17192157 31192693 NaN NaN NaN NaN \n",
"448968 448968 15632852 37860864 98.2 115.0 18.0 100.0 \n",
"448969 448969 17526143 30744048 NaN NaN NaN NaN \n",
"448970 448970 18321345 37024607 NaN NaN NaN NaN \n",
"448971 448971 11894181 39631961 NaN NaN NaN NaN \n",
"\n",
" sbp dbp pain acuity chiefcomplaint \n",
"0 150.0 71.0 10.0 3.0 \n",
"1 111.0 81.0 3.0 3.0 \n",
"2 147.0 97.0 8.0 4.0 \n",
"3 160.0 86.0 2.0 2.0 \n",
"4 107.0 55.0 0.0 3.0 ' \n",
"... ... ... ... ... ... \n",
"448967 NaN NaN NaN NaN None \n",
"448968 124.0 68.0 0.0 3.0 None \n",
"448969 NaN NaN NaN 2.0 None \n",
"448970 NaN NaN NaN NaN None \n",
"448971 NaN NaN NaN 1.0 None \n",
"\n",
"[448972 rows x 12 columns]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# D'abord, il fait établir une connection avec la base\n",
"conn = sqlite3.connect(\"./data/mimic-iv.sqlite\")\n",
"\n",
"# Ensuite, on effectue des requête à l'aide de la fonction read_sql de pandas, pour récupérer tous les éléments de la table, il suffit de faire un SELECT * FROM nom_de_la_table\n",
"triage = pd.read_sql(\"SELECT * FROM triage\", conn)\n",
"triage"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"L'avantage de travailler avec une base sqlite, c'est qu'il nous permet de manipuler et transformer des données qui ne peuvent pas toujours tenir en mémoire vive. \n",
"Quelques exemples de traitements :"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dimension de labevents_100 : (100, 17)\n",
"Nombre de lignes dans labevents : 11594360\n",
"Contenu de labevents pour le patient n°10000764 : (27, 17)\n"
]
}
],
"source": [
"# Sélectionner un nombre limité de lignes\n",
"labevents_100 = pd.read_sql(\"SELECT * FROM labevents LIMIT 100\", conn)\n",
"print(\"Dimension de labevents_100 : \", labevents_100.shape)\n",
"\n",
"# Compter le nombre de ligne\n",
"labevents_n = pd.read_sql(\"SELECT COUNT(0) FROM labevents\", conn)\n",
"print(\"Nombre de lignes dans labevents :\", labevents_n.values[0][0])\n",
"\n",
"# Filter des données\n",
"labevents_10000764 = pd.read_sql(\"SELECT * FROM labevents WHERE subject_id = 10000764\", conn)\n",
"print(\"Contenu de labevents pour le patient n°10000764 :\", labevents_10000764.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Grâce à un système d'index, les transformations de données via sqlite se font relativement rapidement. \n",
"C'est notamment le cas pour les filtre (WHERE) et les jointure.\n",
"Un exemple de jointure :"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" index | \n",
" labevent_id | \n",
" subject_id | \n",
" hadm_id | \n",
" specimen_id | \n",
" itemid | \n",
" charttime | \n",
" storetime | \n",
" value | \n",
" valuenum | \n",
" ... | \n",
" flag | \n",
" priority | \n",
" comments | \n",
" stay_id | \n",
" index | \n",
" itemid | \n",
" label | \n",
" fluid | \n",
" category | \n",
" loinc_code | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0 | \n",
" 2599 | \n",
" 10000764 | \n",
" NaN | \n",
" 76601642 | \n",
" 50971 | \n",
" 2132-10-14 20:15:00 | \n",
" 2132-10-14 21:04:00 | \n",
" 4.1 | \n",
" 4.1 | \n",
" ... | \n",
" None | \n",
" STAT | \n",
" None | \n",
" 35420907 | \n",
" 1236 | \n",
" 50971 | \n",
" Potassium | \n",
" Blood | \n",
" Chemistry | \n",
" 2823-3 | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" 2578 | \n",
" 10000764 | \n",
" NaN | \n",
" 6228584 | \n",
" 51146 | \n",
" 2132-10-14 20:15:00 | \n",
" 2132-10-14 20:33:00 | \n",
" 0.1 | \n",
" 0.1 | \n",
" ... | \n",
" None | \n",
" STAT | \n",
" None | \n",
" 35420907 | \n",
" 192 | \n",
" 51146 | \n",
" Basophils | \n",
" Blood | \n",
" Hematology | \n",
" 704-7 | \n",
"
\n",
" \n",
" 2 | \n",
" 2 | \n",
" 2579 | \n",
" 10000764 | \n",
" NaN | \n",
" 6228584 | \n",
" 51200 | \n",
" 2132-10-14 20:15:00 | \n",
" 2132-10-14 20:33:00 | \n",
" 0.3 | \n",
" 0.3 | \n",
" ... | \n",
" None | \n",
" STAT | \n",
" None | \n",
" 35420907 | \n",
" 619 | \n",
" 51200 | \n",
" Eosinophils | \n",
" Blood | \n",
" Hematology | \n",
" 711-2 | \n",
"
\n",
" \n",
" 3 | \n",
" 3 | \n",
" 2580 | \n",
" 10000764 | \n",
" NaN | \n",
" 6228584 | \n",
" 51221 | \n",
" 2132-10-14 20:15:00 | \n",
" 2132-10-14 20:33:00 | \n",
" 40.2 | \n",
" 40.2 | \n",
" ... | \n",
" None | \n",
" STAT | \n",
" None | \n",
" 35420907 | \n",
" 765 | \n",
" 51221 | \n",
" Hematocrit | \n",
" Blood | \n",
" Hematology | \n",
" 4544-3 | \n",
"
\n",
" \n",
" 4 | \n",
" 4 | \n",
" 2581 | \n",
" 10000764 | \n",
" NaN | \n",
" 6228584 | \n",
" 51222 | \n",
" 2132-10-14 20:15:00 | \n",
" 2132-10-14 20:33:00 | \n",
" 14.1 | \n",
" 14.1 | \n",
" ... | \n",
" None | \n",
" STAT | \n",
" None | \n",
" 35420907 | \n",
" 775 | \n",
" 51222 | \n",
" Hemoglobin | \n",
" Blood | \n",
" Hematology | \n",
" 718-7 | \n",
"
\n",
" \n",
" 5 | \n",
" 5 | \n",
" 2582 | \n",
" 10000764 | \n",
" NaN | \n",
" 6228584 | \n",
" 51244 | \n",
" 2132-10-14 20:15:00 | \n",
" 2132-10-14 20:33:00 | \n",
" 4.7 | \n",
" 4.7 | \n",
" ... | \n",
" abnormal | \n",
" STAT | \n",
" None | \n",
" 35420907 | \n",
" 991 | \n",
" 51244 | \n",
" Lymphocytes | \n",
" Blood | \n",
" Hematology | \n",
" 731-0 | \n",
"
\n",
" \n",
" 6 | \n",
" 6 | \n",
" 2583 | \n",
" 10000764 | \n",
" NaN | \n",
" 6228584 | \n",
" 51248 | \n",
" 2132-10-14 20:15:00 | \n",
" 2132-10-14 20:33:00 | \n",
" 32.8 | \n",
" 32.8 | \n",
" ... | \n",
" abnormal | \n",
" STAT | \n",
" None | \n",
" 35420907 | \n",
" 1016 | \n",
" 51248 | \n",
" MCH | \n",
" Blood | \n",
" Hematology | \n",
" 785-6 | \n",
"
\n",
" \n",
" 7 | \n",
" 7 | \n",
" 2584 | \n",
" 10000764 | \n",
" NaN | \n",
" 6228584 | \n",
" 51249 | \n",
" 2132-10-14 20:15:00 | \n",
" 2132-10-14 20:33:00 | \n",
" 35.1 | \n",
" 35.1 | \n",
" ... | \n",
" abnormal | \n",
" STAT | \n",
" None | \n",
" 35420907 | \n",
" 1017 | \n",
" 51249 | \n",
" MCHC | \n",
" Blood | \n",
" Hematology | \n",
" 786-4 | \n",
"
\n",
" \n",
" 8 | \n",
" 8 | \n",
" 2585 | \n",
" 10000764 | \n",
" NaN | \n",
" 6228584 | \n",
" 51250 | \n",
" 2132-10-14 20:15:00 | \n",
" 2132-10-14 20:33:00 | \n",
" 93 | \n",
" 93.0 | \n",
" ... | \n",
" None | \n",
" STAT | \n",
" None | \n",
" 35420907 | \n",
" 1019 | \n",
" 51250 | \n",
" MCV | \n",
" Blood | \n",
" Hematology | \n",
" 787-2 | \n",
"
\n",
" \n",
" 9 | \n",
" 9 | \n",
" 2586 | \n",
" 10000764 | \n",
" NaN | \n",
" 6228584 | \n",
" 51254 | \n",
" 2132-10-14 20:15:00 | \n",
" 2132-10-14 20:33:00 | \n",
" 3.8 | \n",
" 3.8 | \n",
" ... | \n",
" None | \n",
" STAT | \n",
" None | \n",
" 35420907 | \n",
" 1051 | \n",
" 51254 | \n",
" Monocytes | \n",
" Blood | \n",
" Hematology | \n",
" 742-7 | \n",
"
\n",
" \n",
"
\n",
"
10 rows × 23 columns
\n",
"
"
],
"text/plain": [
" index labevent_id subject_id hadm_id specimen_id itemid \\\n",
"0 0 2599 10000764 NaN 76601642 50971 \n",
"1 1 2578 10000764 NaN 6228584 51146 \n",
"2 2 2579 10000764 NaN 6228584 51200 \n",
"3 3 2580 10000764 NaN 6228584 51221 \n",
"4 4 2581 10000764 NaN 6228584 51222 \n",
"5 5 2582 10000764 NaN 6228584 51244 \n",
"6 6 2583 10000764 NaN 6228584 51248 \n",
"7 7 2584 10000764 NaN 6228584 51249 \n",
"8 8 2585 10000764 NaN 6228584 51250 \n",
"9 9 2586 10000764 NaN 6228584 51254 \n",
"\n",
" charttime storetime value valuenum ... flag \\\n",
"0 2132-10-14 20:15:00 2132-10-14 21:04:00 4.1 4.1 ... None \n",
"1 2132-10-14 20:15:00 2132-10-14 20:33:00 0.1 0.1 ... None \n",
"2 2132-10-14 20:15:00 2132-10-14 20:33:00 0.3 0.3 ... None \n",
"3 2132-10-14 20:15:00 2132-10-14 20:33:00 40.2 40.2 ... None \n",
"4 2132-10-14 20:15:00 2132-10-14 20:33:00 14.1 14.1 ... None \n",
"5 2132-10-14 20:15:00 2132-10-14 20:33:00 4.7 4.7 ... abnormal \n",
"6 2132-10-14 20:15:00 2132-10-14 20:33:00 32.8 32.8 ... abnormal \n",
"7 2132-10-14 20:15:00 2132-10-14 20:33:00 35.1 35.1 ... abnormal \n",
"8 2132-10-14 20:15:00 2132-10-14 20:33:00 93 93.0 ... None \n",
"9 2132-10-14 20:15:00 2132-10-14 20:33:00 3.8 3.8 ... None \n",
"\n",
" priority comments stay_id index itemid label fluid category \\\n",
"0 STAT None 35420907 1236 50971 Potassium Blood Chemistry \n",
"1 STAT None 35420907 192 51146 Basophils Blood Hematology \n",
"2 STAT None 35420907 619 51200 Eosinophils Blood Hematology \n",
"3 STAT None 35420907 765 51221 Hematocrit Blood Hematology \n",
"4 STAT None 35420907 775 51222 Hemoglobin Blood Hematology \n",
"5 STAT None 35420907 991 51244 Lymphocytes Blood Hematology \n",
"6 STAT None 35420907 1016 51248 MCH Blood Hematology \n",
"7 STAT None 35420907 1017 51249 MCHC Blood Hematology \n",
"8 STAT None 35420907 1019 51250 MCV Blood Hematology \n",
"9 STAT None 35420907 1051 51254 Monocytes Blood Hematology \n",
"\n",
" loinc_code \n",
"0 2823-3 \n",
"1 704-7 \n",
"2 711-2 \n",
"3 4544-3 \n",
"4 718-7 \n",
"5 731-0 \n",
"6 785-6 \n",
"7 786-4 \n",
"8 787-2 \n",
"9 742-7 \n",
"\n",
"[10 rows x 23 columns]"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# On récupère le libellé des examens\n",
"labevents_with_item = pd.read_sql(\"\"\"\n",
" SELECT * \n",
" FROM labevents as le\n",
" LEFT JOIN d_labitems as li\n",
" ON le.itemid = li.itemid\n",
" LIMIT 1000\n",
"\"\"\", conn)\n",
"# Tous les types de jointures : LEFT, RIGHT, INNER, OUTER sont possible\n",
"\n",
"# L'object retourné est toujours un pandas dataframe sur lequel on peux faire les transformations habituelles\n",
"labevents_with_item.head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tables disponibles"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"L'ensemble des tables disponible est le suivant :"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" nom_de_la_table | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" d_labitems | \n",
"
\n",
" \n",
" 1 | \n",
" diagnosis | \n",
"
\n",
" \n",
" 2 | \n",
" edstays | \n",
"
\n",
" \n",
" 3 | \n",
" labevents | \n",
"
\n",
" \n",
" 4 | \n",
" medrecon | \n",
"
\n",
" \n",
" 5 | \n",
" microbiologyevents | \n",
"
\n",
" \n",
" 6 | \n",
" patients | \n",
"
\n",
" \n",
" 7 | \n",
" triage | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" nom_de_la_table\n",
"0 d_labitems\n",
"1 diagnosis\n",
"2 edstays\n",
"3 labevents\n",
"4 medrecon\n",
"5 microbiologyevents\n",
"6 patients\n",
"7 triage"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_sql(\"\"\"\n",
" SELECT tbl_name AS nom_de_la_table\n",
" FROM sqlite_master\n",
" GROUP BY tbl_name\n",
"\"\"\", conn)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"L'ensemble des tables correspondent aux fichiers correspondant au sein de MIMIC-IV-ED. \n",
"Il y a quelques exceptions qui proviennent de MIMIC-IV :\n",
"- labevents\n",
"- microbiologyevents\n",
"- d_labitems"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### d_labitems"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"d_labitems contient le libellé des examens de biologie. \n",
"Chaque libellé a un identifiant unique `itemid`. Il est utile à l'identification des examens dans la table labevents.\n",
"\n",
"Plus de détail ici : https://mimic.mit.edu/docs/iv/modules/hosp/d_labitems/"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" index | \n",
" itemid | \n",
" label | \n",
" fluid | \n",
" category | \n",
" loinc_code | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0 | \n",
" 51905 | \n",
" | \n",
" Other Body Fluid | \n",
" Chemistry | \n",
" None | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" 51532 | \n",
" 11-Deoxycorticosterone | \n",
" Blood | \n",
" Chemistry | \n",
" None | \n",
"
\n",
" \n",
" 2 | \n",
" 2 | \n",
" 51957 | \n",
" 17-Hydroxycorticosteroids | \n",
" Urine | \n",
" Chemistry | \n",
" None | \n",
"
\n",
" \n",
" 3 | \n",
" 3 | \n",
" 51958 | \n",
" 17-Ketosteroids, Urine | \n",
" Urine | \n",
" Chemistry | \n",
" None | \n",
"
\n",
" \n",
" 4 | \n",
" 4 | \n",
" 52068 | \n",
" 24 Hr | \n",
" Blood | \n",
" Hematology | \n",
" None | \n",
"
\n",
" \n",
" 5 | \n",
" 5 | \n",
" 51066 | \n",
" 24 hr Calcium | \n",
" Urine | \n",
" Chemistry | \n",
" None | \n",
"
\n",
" \n",
" 6 | \n",
" 6 | \n",
" 51067 | \n",
" 24 hr Creatinine | \n",
" Urine | \n",
" Chemistry | \n",
" None | \n",
"
\n",
" \n",
" 7 | \n",
" 7 | \n",
" 51068 | \n",
" 24 hr Protein | \n",
" Urine | \n",
" Chemistry | \n",
" None | \n",
"
\n",
" \n",
" 8 | \n",
" 8 | \n",
" 50853 | \n",
" 25-OH Vitamin D | \n",
" Blood | \n",
" Chemistry | \n",
" None | \n",
"
\n",
" \n",
" 9 | \n",
" 9 | \n",
" 51533 | \n",
" 3t | \n",
" Blood | \n",
" Chemistry | \n",
" None | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" index itemid label fluid category \\\n",
"0 0 51905 Other Body Fluid Chemistry \n",
"1 1 51532 11-Deoxycorticosterone Blood Chemistry \n",
"2 2 51957 17-Hydroxycorticosteroids Urine Chemistry \n",
"3 3 51958 17-Ketosteroids, Urine Urine Chemistry \n",
"4 4 52068 24 Hr Blood Hematology \n",
"5 5 51066 24 hr Calcium Urine Chemistry \n",
"6 6 51067 24 hr Creatinine Urine Chemistry \n",
"7 7 51068 24 hr Protein Urine Chemistry \n",
"8 8 50853 25-OH Vitamin D Blood Chemistry \n",
"9 9 51533 3t Blood Chemistry \n",
"\n",
" loinc_code \n",
"0 None \n",
"1 None \n",
"2 None \n",
"3 None \n",
"4 None \n",
"5 None \n",
"6 None \n",
"7 None \n",
"8 None \n",
"9 None "
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_sql(\"SELECT * FROM d_labitems LIMIT 10\", conn)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### labevents"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"labevents contient les examents de biologie ainsi que leurs résultat.\n",
"Chaque examen a deux dates associés :\n",
"- charttime qui est la date de réception de l'examen, on peux la considérer comme une approximation de la date de prélèvement\n",
"- storetime date de mise à disposition du résultat de l'examen, selon l'examen celle-ci est plus où moins espacé de la date de prélèvement\n",
"\n",
"Chaque ligne correspond au résultat d'un examen (par exemple, dosage du potassium, identifié par `itemid`) et est identifié par un identifiant unique `labevent_id`. Plusieurs examens peuvent être effectué sur le même prélèvement sanguin. \n",
"Chaque prélèvement est identifié par un identifiant unique `specimen_id`. \n",
"Enfin, `subject_id` identifie le patient, `hadm_id` l'identifiant de séjour lorsque le patient a été hospitalisé au décours de son passage aux urgences et `stay_id` l'identifiant du passage aux urgences.\n",
"\n",
"Plus de détail ici : https://mimic.mit.edu/docs/iv/modules/hosp/labevents/"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" index | \n",
" labevent_id | \n",
" subject_id | \n",
" hadm_id | \n",
" specimen_id | \n",
" itemid | \n",
" charttime | \n",
" storetime | \n",
" value | \n",
" valuenum | \n",
" valueuom | \n",
" ref_range_lower | \n",
" ref_range_upper | \n",
" flag | \n",
" priority | \n",
" comments | \n",
" stay_id | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0 | \n",
" 2599 | \n",
" 10000764 | \n",
" None | \n",
" 76601642 | \n",
" 50971 | \n",
" 2132-10-14 20:15:00 | \n",
" 2132-10-14 21:04:00 | \n",
" 4.1 | \n",
" 4.1 | \n",
" mEq/L | \n",
" 3.3 | \n",
" 5.1 | \n",
" None | \n",
" STAT | \n",
" None | \n",
" 35420907 | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" 2578 | \n",
" 10000764 | \n",
" None | \n",
" 6228584 | \n",
" 51146 | \n",
" 2132-10-14 20:15:00 | \n",
" 2132-10-14 20:33:00 | \n",
" 0.1 | \n",
" 0.1 | \n",
" % | \n",
" 0.0 | \n",
" 2.0 | \n",
" None | \n",
" STAT | \n",
" None | \n",
" 35420907 | \n",
"
\n",
" \n",
" 2 | \n",
" 2 | \n",
" 2579 | \n",
" 10000764 | \n",
" None | \n",
" 6228584 | \n",
" 51200 | \n",
" 2132-10-14 20:15:00 | \n",
" 2132-10-14 20:33:00 | \n",
" 0.3 | \n",
" 0.3 | \n",
" % | \n",
" 0.0 | \n",
" 4.0 | \n",
" None | \n",
" STAT | \n",
" None | \n",
" 35420907 | \n",
"
\n",
" \n",
" 3 | \n",
" 3 | \n",
" 2580 | \n",
" 10000764 | \n",
" None | \n",
" 6228584 | \n",
" 51221 | \n",
" 2132-10-14 20:15:00 | \n",
" 2132-10-14 20:33:00 | \n",
" 40.2 | \n",
" 40.2 | \n",
" % | \n",
" 40.0 | \n",
" 52.0 | \n",
" None | \n",
" STAT | \n",
" None | \n",
" 35420907 | \n",
"
\n",
" \n",
" 4 | \n",
" 4 | \n",
" 2581 | \n",
" 10000764 | \n",
" None | \n",
" 6228584 | \n",
" 51222 | \n",
" 2132-10-14 20:15:00 | \n",
" 2132-10-14 20:33:00 | \n",
" 14.1 | \n",
" 14.1 | \n",
" g/dL | \n",
" 14.0 | \n",
" 18.0 | \n",
" None | \n",
" STAT | \n",
" None | \n",
" 35420907 | \n",
"
\n",
" \n",
" 5 | \n",
" 5 | \n",
" 2582 | \n",
" 10000764 | \n",
" None | \n",
" 6228584 | \n",
" 51244 | \n",
" 2132-10-14 20:15:00 | \n",
" 2132-10-14 20:33:00 | \n",
" 4.7 | \n",
" 4.7 | \n",
" % | \n",
" 18.0 | \n",
" 42.0 | \n",
" abnormal | \n",
" STAT | \n",
" None | \n",
" 35420907 | \n",
"
\n",
" \n",
" 6 | \n",
" 6 | \n",
" 2583 | \n",
" 10000764 | \n",
" None | \n",
" 6228584 | \n",
" 51248 | \n",
" 2132-10-14 20:15:00 | \n",
" 2132-10-14 20:33:00 | \n",
" 32.8 | \n",
" 32.8 | \n",
" pg | \n",
" 27.0 | \n",
" 32.0 | \n",
" abnormal | \n",
" STAT | \n",
" None | \n",
" 35420907 | \n",
"
\n",
" \n",
" 7 | \n",
" 7 | \n",
" 2584 | \n",
" 10000764 | \n",
" None | \n",
" 6228584 | \n",
" 51249 | \n",
" 2132-10-14 20:15:00 | \n",
" 2132-10-14 20:33:00 | \n",
" 35.1 | \n",
" 35.1 | \n",
" % | \n",
" 31.0 | \n",
" 35.0 | \n",
" abnormal | \n",
" STAT | \n",
" None | \n",
" 35420907 | \n",
"
\n",
" \n",
" 8 | \n",
" 8 | \n",
" 2585 | \n",
" 10000764 | \n",
" None | \n",
" 6228584 | \n",
" 51250 | \n",
" 2132-10-14 20:15:00 | \n",
" 2132-10-14 20:33:00 | \n",
" 93 | \n",
" 93.0 | \n",
" fL | \n",
" 82.0 | \n",
" 98.0 | \n",
" None | \n",
" STAT | \n",
" None | \n",
" 35420907 | \n",
"
\n",
" \n",
" 9 | \n",
" 9 | \n",
" 2586 | \n",
" 10000764 | \n",
" None | \n",
" 6228584 | \n",
" 51254 | \n",
" 2132-10-14 20:15:00 | \n",
" 2132-10-14 20:33:00 | \n",
" 3.8 | \n",
" 3.8 | \n",
" % | \n",
" 2.0 | \n",
" 11.0 | \n",
" None | \n",
" STAT | \n",
" None | \n",
" 35420907 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" index labevent_id subject_id hadm_id specimen_id itemid \\\n",
"0 0 2599 10000764 None 76601642 50971 \n",
"1 1 2578 10000764 None 6228584 51146 \n",
"2 2 2579 10000764 None 6228584 51200 \n",
"3 3 2580 10000764 None 6228584 51221 \n",
"4 4 2581 10000764 None 6228584 51222 \n",
"5 5 2582 10000764 None 6228584 51244 \n",
"6 6 2583 10000764 None 6228584 51248 \n",
"7 7 2584 10000764 None 6228584 51249 \n",
"8 8 2585 10000764 None 6228584 51250 \n",
"9 9 2586 10000764 None 6228584 51254 \n",
"\n",
" charttime storetime value valuenum valueuom \\\n",
"0 2132-10-14 20:15:00 2132-10-14 21:04:00 4.1 4.1 mEq/L \n",
"1 2132-10-14 20:15:00 2132-10-14 20:33:00 0.1 0.1 % \n",
"2 2132-10-14 20:15:00 2132-10-14 20:33:00 0.3 0.3 % \n",
"3 2132-10-14 20:15:00 2132-10-14 20:33:00 40.2 40.2 % \n",
"4 2132-10-14 20:15:00 2132-10-14 20:33:00 14.1 14.1 g/dL \n",
"5 2132-10-14 20:15:00 2132-10-14 20:33:00 4.7 4.7 % \n",
"6 2132-10-14 20:15:00 2132-10-14 20:33:00 32.8 32.8 pg \n",
"7 2132-10-14 20:15:00 2132-10-14 20:33:00 35.1 35.1 % \n",
"8 2132-10-14 20:15:00 2132-10-14 20:33:00 93 93.0 fL \n",
"9 2132-10-14 20:15:00 2132-10-14 20:33:00 3.8 3.8 % \n",
"\n",
" ref_range_lower ref_range_upper flag priority comments stay_id \n",
"0 3.3 5.1 None STAT None 35420907 \n",
"1 0.0 2.0 None STAT None 35420907 \n",
"2 0.0 4.0 None STAT None 35420907 \n",
"3 40.0 52.0 None STAT None 35420907 \n",
"4 14.0 18.0 None STAT None 35420907 \n",
"5 18.0 42.0 abnormal STAT None 35420907 \n",
"6 27.0 32.0 abnormal STAT None 35420907 \n",
"7 31.0 35.0 abnormal STAT None 35420907 \n",
"8 82.0 98.0 None STAT None 35420907 \n",
"9 2.0 11.0 None STAT None 35420907 "
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_sql(\"SELECT * FROM labevents LIMIT 10\", conn)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### microbiologyevents"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Parmis les examens de biologie, certains revêtent d'un statut particulier, il s'agit des examens de microbiologie, c'est à dire les examens recherchant la présence de bactérie dans un prélèvement biologique (sang, urine, selle, liquide céphalo-rachidien ...).\n",
"\n",
"Ils sont enregistrés dans une table dédié : microbiologyevents \n",
"Le champs \"spec_type_desc\" contient le site de prélèvement, souvent le sang (BLOOD CULTURE) où les urines (URINES) (cf plus bas).\n",
"\n",
"L'interprétation de ces examens faisant l'objet de spécificté, on retrouve de nombreuses colonnes spécifiant : la valeur quantitative de l'analyse, son interprétation, le nimbre de résistance de la bactérie identifié ... En première intention, il semble qu'il s'agit de champs que nous n'aurons pas à exploiter.\n",
"\n",
"Plus de détail ici : https://mimic.mit.edu/docs/iv/modules/hosp/microbiologyevents/"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" index | \n",
" microevent_id | \n",
" subject_id | \n",
" hadm_id | \n",
" micro_specimen_id | \n",
" chartdate | \n",
" charttime | \n",
" spec_itemid | \n",
" spec_type_desc | \n",
" test_seq | \n",
" ... | \n",
" isolate_num | \n",
" quantity | \n",
" ab_itemid | \n",
" ab_name | \n",
" dilution_text | \n",
" dilution_comparison | \n",
" dilution_value | \n",
" interpretation | \n",
" comments | \n",
" stay_id | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0 | \n",
" 1562 | \n",
" 10004235 | \n",
" None | \n",
" 139903 | \n",
" 2196-02-24 00:00:00 | \n",
" 2196-02-24 12:40:00 | \n",
" 70012 | \n",
" BLOOD CULTURE | \n",
" 1 | \n",
" ... | \n",
" 1.0 | \n",
" None | \n",
" 90013.0 | \n",
" TOBRAMYCIN | \n",
" <=1 | \n",
" <= | \n",
" 1.00 | \n",
" S | \n",
" None | \n",
" 38926302 | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" 1558 | \n",
" 10004235 | \n",
" None | \n",
" 139903 | \n",
" 2196-02-24 00:00:00 | \n",
" 2196-02-24 12:40:00 | \n",
" 70012 | \n",
" BLOOD CULTURE | \n",
" 1 | \n",
" ... | \n",
" 1.0 | \n",
" None | \n",
" 90004.0 | \n",
" AMPICILLIN | \n",
" 8 | \n",
" = | \n",
" 8.00 | \n",
" S | \n",
" None | \n",
" 38926302 | \n",
"
\n",
" \n",
" 2 | \n",
" 2 | \n",
" 1559 | \n",
" 10004235 | \n",
" None | \n",
" 139903 | \n",
" 2196-02-24 00:00:00 | \n",
" 2196-02-24 12:40:00 | \n",
" 70012 | \n",
" BLOOD CULTURE | \n",
" 1 | \n",
" ... | \n",
" 1.0 | \n",
" None | \n",
" 90005.0 | \n",
" CEFAZOLIN | \n",
" <=4 | \n",
" <= | \n",
" 4.00 | \n",
" S | \n",
" None | \n",
" 38926302 | \n",
"
\n",
" \n",
" 3 | \n",
" 3 | \n",
" 1560 | \n",
" 10004235 | \n",
" None | \n",
" 139903 | \n",
" 2196-02-24 00:00:00 | \n",
" 2196-02-24 12:40:00 | \n",
" 70012 | \n",
" BLOOD CULTURE | \n",
" 1 | \n",
" ... | \n",
" 1.0 | \n",
" None | \n",
" 90008.0 | \n",
" TRIMETHOPRIM/SULFA | \n",
" <=1 | \n",
" <= | \n",
" 1.00 | \n",
" S | \n",
" None | \n",
" 38926302 | \n",
"
\n",
" \n",
" 4 | \n",
" 4 | \n",
" 1561 | \n",
" 10004235 | \n",
" None | \n",
" 139903 | \n",
" 2196-02-24 00:00:00 | \n",
" 2196-02-24 12:40:00 | \n",
" 70012 | \n",
" BLOOD CULTURE | \n",
" 1 | \n",
" ... | \n",
" 1.0 | \n",
" None | \n",
" 90012.0 | \n",
" GENTAMICIN | \n",
" <=1 | \n",
" <= | \n",
" 1.00 | \n",
" S | \n",
" None | \n",
" 38926302 | \n",
"
\n",
" \n",
" 5 | \n",
" 5 | \n",
" 1563 | \n",
" 10004235 | \n",
" None | \n",
" 139903 | \n",
" 2196-02-24 00:00:00 | \n",
" 2196-02-24 12:40:00 | \n",
" 70012 | \n",
" BLOOD CULTURE | \n",
" 1 | \n",
" ... | \n",
" 1.0 | \n",
" None | \n",
" 90017.0 | \n",
" CEFTAZIDIME | \n",
" <=1 | \n",
" <= | \n",
" 1.00 | \n",
" S | \n",
" None | \n",
" 38926302 | \n",
"
\n",
" \n",
" 6 | \n",
" 6 | \n",
" 1564 | \n",
" 10004235 | \n",
" None | \n",
" 139903 | \n",
" 2196-02-24 00:00:00 | \n",
" 2196-02-24 12:40:00 | \n",
" 70012 | \n",
" BLOOD CULTURE | \n",
" 1 | \n",
" ... | \n",
" 1.0 | \n",
" None | \n",
" 90018.0 | \n",
" CEFTRIAXONE | \n",
" <=1 | \n",
" <= | \n",
" 1.00 | \n",
" S | \n",
" None | \n",
" 38926302 | \n",
"
\n",
" \n",
" 7 | \n",
" 7 | \n",
" 1565 | \n",
" 10004235 | \n",
" None | \n",
" 139903 | \n",
" 2196-02-24 00:00:00 | \n",
" 2196-02-24 12:40:00 | \n",
" 70012 | \n",
" BLOOD CULTURE | \n",
" 1 | \n",
" ... | \n",
" 1.0 | \n",
" None | \n",
" 90019.0 | \n",
" CIPROFLOXACIN | \n",
" <=0.25 | \n",
" <= | \n",
" 0.25 | \n",
" S | \n",
" None | \n",
" 38926302 | \n",
"
\n",
" \n",
" 8 | \n",
" 8 | \n",
" 1566 | \n",
" 10004235 | \n",
" None | \n",
" 139903 | \n",
" 2196-02-24 00:00:00 | \n",
" 2196-02-24 12:40:00 | \n",
" 70012 | \n",
" BLOOD CULTURE | \n",
" 1 | \n",
" ... | \n",
" 1.0 | \n",
" None | \n",
" 90022.0 | \n",
" AMPICILLIN/SULBACTAM | \n",
" <=2 | \n",
" <= | \n",
" 2.00 | \n",
" S | \n",
" None | \n",
" 38926302 | \n",
"
\n",
" \n",
" 9 | \n",
" 9 | \n",
" 1567 | \n",
" 10004235 | \n",
" None | \n",
" 139903 | \n",
" 2196-02-24 00:00:00 | \n",
" 2196-02-24 12:40:00 | \n",
" 70012 | \n",
" BLOOD CULTURE | \n",
" 1 | \n",
" ... | \n",
" 1.0 | \n",
" None | \n",
" 90026.0 | \n",
" PIPERACILLIN/TAZO | \n",
" <=4 | \n",
" <= | \n",
" 4.00 | \n",
" S | \n",
" None | \n",
" 38926302 | \n",
"
\n",
" \n",
"
\n",
"
10 rows × 26 columns
\n",
"
"
],
"text/plain": [
" index microevent_id subject_id hadm_id micro_specimen_id \\\n",
"0 0 1562 10004235 None 139903 \n",
"1 1 1558 10004235 None 139903 \n",
"2 2 1559 10004235 None 139903 \n",
"3 3 1560 10004235 None 139903 \n",
"4 4 1561 10004235 None 139903 \n",
"5 5 1563 10004235 None 139903 \n",
"6 6 1564 10004235 None 139903 \n",
"7 7 1565 10004235 None 139903 \n",
"8 8 1566 10004235 None 139903 \n",
"9 9 1567 10004235 None 139903 \n",
"\n",
" chartdate charttime spec_itemid spec_type_desc \\\n",
"0 2196-02-24 00:00:00 2196-02-24 12:40:00 70012 BLOOD CULTURE \n",
"1 2196-02-24 00:00:00 2196-02-24 12:40:00 70012 BLOOD CULTURE \n",
"2 2196-02-24 00:00:00 2196-02-24 12:40:00 70012 BLOOD CULTURE \n",
"3 2196-02-24 00:00:00 2196-02-24 12:40:00 70012 BLOOD CULTURE \n",
"4 2196-02-24 00:00:00 2196-02-24 12:40:00 70012 BLOOD CULTURE \n",
"5 2196-02-24 00:00:00 2196-02-24 12:40:00 70012 BLOOD CULTURE \n",
"6 2196-02-24 00:00:00 2196-02-24 12:40:00 70012 BLOOD CULTURE \n",
"7 2196-02-24 00:00:00 2196-02-24 12:40:00 70012 BLOOD CULTURE \n",
"8 2196-02-24 00:00:00 2196-02-24 12:40:00 70012 BLOOD CULTURE \n",
"9 2196-02-24 00:00:00 2196-02-24 12:40:00 70012 BLOOD CULTURE \n",
"\n",
" test_seq ... isolate_num quantity ab_itemid ab_name \\\n",
"0 1 ... 1.0 None 90013.0 TOBRAMYCIN \n",
"1 1 ... 1.0 None 90004.0 AMPICILLIN \n",
"2 1 ... 1.0 None 90005.0 CEFAZOLIN \n",
"3 1 ... 1.0 None 90008.0 TRIMETHOPRIM/SULFA \n",
"4 1 ... 1.0 None 90012.0 GENTAMICIN \n",
"5 1 ... 1.0 None 90017.0 CEFTAZIDIME \n",
"6 1 ... 1.0 None 90018.0 CEFTRIAXONE \n",
"7 1 ... 1.0 None 90019.0 CIPROFLOXACIN \n",
"8 1 ... 1.0 None 90022.0 AMPICILLIN/SULBACTAM \n",
"9 1 ... 1.0 None 90026.0 PIPERACILLIN/TAZO \n",
"\n",
" dilution_text dilution_comparison dilution_value interpretation comments \\\n",
"0 <=1 <= 1.00 S None \n",
"1 8 = 8.00 S None \n",
"2 <=4 <= 4.00 S None \n",
"3 <=1 <= 1.00 S None \n",
"4 <=1 <= 1.00 S None \n",
"5 <=1 <= 1.00 S None \n",
"6 <=1 <= 1.00 S None \n",
"7 <=0.25 <= 0.25 S None \n",
"8 <=2 <= 2.00 S None \n",
"9 <=4 <= 4.00 S None \n",
"\n",
" stay_id \n",
"0 38926302 \n",
"1 38926302 \n",
"2 38926302 \n",
"3 38926302 \n",
"4 38926302 \n",
"5 38926302 \n",
"6 38926302 \n",
"7 38926302 \n",
"8 38926302 \n",
"9 38926302 \n",
"\n",
"[10 rows x 26 columns]"
]
},
"execution_count": 69,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_sql(\"SELECT * FROM microbiologyevents LIMIT 10\", conn)"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" spec_type_desc | \n",
" total | \n",
"
\n",
" \n",
" \n",
" \n",
" 53 | \n",
" URINE | \n",
" 186509 | \n",
"
\n",
" \n",
" 6 | \n",
" BLOOD CULTURE | \n",
" 177616 | \n",
"
\n",
" \n",
" 47 | \n",
" SWAB | \n",
" 15476 | \n",
"
\n",
" \n",
" 35 | \n",
" PERITONEAL FLUID | \n",
" 5923 | \n",
"
\n",
" \n",
" 45 | \n",
" STOOL | \n",
" 5426 | \n",
"
\n",
" \n",
" 18 | \n",
" CSF;SPINAL FLUID | \n",
" 3847 | \n",
"
\n",
" \n",
" 1 | \n",
" ABSCESS | \n",
" 3011 | \n",
"
\n",
" \n",
" 32 | \n",
" JOINT FLUID | \n",
" 2341 | \n",
"
\n",
" \n",
" 52 | \n",
" TISSUE | \n",
" 1825 | \n",
"
\n",
" \n",
" 44 | \n",
" SPUTUM | \n",
" 1587 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" spec_type_desc total\n",
"53 URINE 186509\n",
"6 BLOOD CULTURE 177616\n",
"47 SWAB 15476\n",
"35 PERITONEAL FLUID 5923\n",
"45 STOOL 5426\n",
"18 CSF;SPINAL FLUID 3847\n",
"1 ABSCESS 3011\n",
"32 JOINT FLUID 2341\n",
"52 TISSUE 1825\n",
"44 SPUTUM 1587"
]
},
"execution_count": 70,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_sql(\"SELECT spec_type_desc, COUNT(0) total FROM microbiologyevents GROUP BY spec_type_desc\", conn).sort_values(\"total\", ascending=False).head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### edstays"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"La table edstays contient la liste de tout les passages aux urgences. \n",
"Sont enregistré : l'identifiant du patient, l'identifiant du séjour si le patient est hospitalisé en aval de son passage aux urgences, l'identifiant du passage aux urgences, la date et heure d'entrée et sortie.\n",
"\n",
"Plus de détail ici : https://mimic.mit.edu/docs/iv/modules/ed/edstays/"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" index | \n",
" subject_id | \n",
" hadm_id | \n",
" stay_id | \n",
" intime | \n",
" outtime | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0 | \n",
" 10000115 | \n",
" NaN | \n",
" 38081480 | \n",
" 2154-12-10 02:04:00 | \n",
" 2154-12-10 02:16:00 | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" 10000115 | \n",
" NaN | \n",
" 30295111 | \n",
" 2154-12-17 16:37:00 | \n",
" 2154-12-17 17:38:00 | \n",
"
\n",
" \n",
" 2 | \n",
" 2 | \n",
" 10000473 | \n",
" NaN | \n",
" 33267868 | \n",
" 2138-03-15 20:07:00 | \n",
" 2138-03-15 20:26:00 | \n",
"
\n",
" \n",
" 3 | \n",
" 3 | \n",
" 10000764 | \n",
" 27897940.0 | \n",
" 35420907 | \n",
" 2132-10-14 19:31:00 | \n",
" 2132-10-14 23:32:59 | \n",
"
\n",
" \n",
" 4 | \n",
" 4 | \n",
" 10001038 | \n",
" NaN | \n",
" 34301067 | \n",
" 2149-08-07 08:51:00 | \n",
" 2149-08-07 08:55:00 | \n",
"
\n",
" \n",
" 5 | \n",
" 5 | \n",
" 10001757 | \n",
" NaN | \n",
" 30596003 | \n",
" 2170-10-15 11:06:00 | \n",
" 2170-10-15 11:26:00 | \n",
"
\n",
" \n",
" 6 | \n",
" 6 | \n",
" 10002563 | \n",
" NaN | \n",
" 36308013 | \n",
" 2192-11-08 14:00:00 | \n",
" 2192-11-08 14:23:00 | \n",
"
\n",
" \n",
" 7 | \n",
" 7 | \n",
" 10003299 | \n",
" 28891311.0 | \n",
" 33780565 | \n",
" 2178-12-10 17:04:00 | \n",
" 2178-12-10 21:36:00 | \n",
"
\n",
" \n",
" 8 | \n",
" 8 | \n",
" 10003299 | \n",
" 21404960.0 | \n",
" 39680380 | \n",
" 2179-06-30 09:18:00 | \n",
" 2179-06-30 16:35:45 | \n",
"
\n",
" \n",
" 9 | \n",
" 9 | \n",
" 10003299 | \n",
" 29323205.0 | \n",
" 32908139 | \n",
" 2181-10-22 11:46:00 | \n",
" 2181-10-22 19:09:15 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" index subject_id hadm_id stay_id intime \\\n",
"0 0 10000115 NaN 38081480 2154-12-10 02:04:00 \n",
"1 1 10000115 NaN 30295111 2154-12-17 16:37:00 \n",
"2 2 10000473 NaN 33267868 2138-03-15 20:07:00 \n",
"3 3 10000764 27897940.0 35420907 2132-10-14 19:31:00 \n",
"4 4 10001038 NaN 34301067 2149-08-07 08:51:00 \n",
"5 5 10001757 NaN 30596003 2170-10-15 11:06:00 \n",
"6 6 10002563 NaN 36308013 2192-11-08 14:00:00 \n",
"7 7 10003299 28891311.0 33780565 2178-12-10 17:04:00 \n",
"8 8 10003299 21404960.0 39680380 2179-06-30 09:18:00 \n",
"9 9 10003299 29323205.0 32908139 2181-10-22 11:46:00 \n",
"\n",
" outtime \n",
"0 2154-12-10 02:16:00 \n",
"1 2154-12-17 17:38:00 \n",
"2 2138-03-15 20:26:00 \n",
"3 2132-10-14 23:32:59 \n",
"4 2149-08-07 08:55:00 \n",
"5 2170-10-15 11:26:00 \n",
"6 2192-11-08 14:23:00 \n",
"7 2178-12-10 21:36:00 \n",
"8 2179-06-30 16:35:45 \n",
"9 2181-10-22 19:09:15 "
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_sql(\"SELECT * FROM edstays LIMIT 10\", conn)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### patients"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"La table patient contient la liste des patients ainsi que les information relative à leurs age et genre. \n",
"La base MIMIC-IV étant pseudonymisé, toutes les dates ont été falsifié, la table patient nous donne une approximation de la véritable date des enregistrement dans le champs anchor_year_group.\n",
"\n",
"Plus d'informations ici : https://mimic.mit.edu/docs/iv/modules/core/patients/"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" index | \n",
" subject_id | \n",
" gender | \n",
" anchor_age | \n",
" anchor_year | \n",
" anchor_year_group | \n",
" dod | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0 | \n",
" 10021917 | \n",
" M | \n",
" 54 | \n",
" 2147 | \n",
" 2017 - 2019 | \n",
" None | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" 10033879 | \n",
" F | \n",
" 28 | \n",
" 2173 | \n",
" 2011 - 2013 | \n",
" None | \n",
"
\n",
" \n",
" 2 | \n",
" 2 | \n",
" 10036909 | \n",
" M | \n",
" 50 | \n",
" 2167 | \n",
" 2011 - 2013 | \n",
" None | \n",
"
\n",
" \n",
" 3 | \n",
" 3 | \n",
" 10050355 | \n",
" F | \n",
" 55 | \n",
" 2168 | \n",
" 2014 - 2016 | \n",
" None | \n",
"
\n",
" \n",
" 4 | \n",
" 4 | \n",
" 10050389 | \n",
" M | \n",
" 43 | \n",
" 2133 | \n",
" 2014 - 2016 | \n",
" None | \n",
"
\n",
" \n",
" 5 | \n",
" 5 | \n",
" 10050411 | \n",
" F | \n",
" 52 | \n",
" 2150 | \n",
" 2017 - 2019 | \n",
" None | \n",
"
\n",
" \n",
" 6 | \n",
" 6 | \n",
" 10068290 | \n",
" M | \n",
" 29 | \n",
" 2151 | \n",
" 2014 - 2016 | \n",
" None | \n",
"
\n",
" \n",
" 7 | \n",
" 7 | \n",
" 10127166 | \n",
" F | \n",
" 51 | \n",
" 2149 | \n",
" 2008 - 2010 | \n",
" None | \n",
"
\n",
" \n",
" 8 | \n",
" 8 | \n",
" 10143090 | \n",
" M | \n",
" 48 | \n",
" 2155 | \n",
" 2011 - 2013 | \n",
" None | \n",
"
\n",
" \n",
" 9 | \n",
" 9 | \n",
" 10148710 | \n",
" M | \n",
" 67 | \n",
" 2137 | \n",
" 2008 - 2010 | \n",
" None | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" index subject_id gender anchor_age anchor_year anchor_year_group dod\n",
"0 0 10021917 M 54 2147 2017 - 2019 None\n",
"1 1 10033879 F 28 2173 2011 - 2013 None\n",
"2 2 10036909 M 50 2167 2011 - 2013 None\n",
"3 3 10050355 F 55 2168 2014 - 2016 None\n",
"4 4 10050389 M 43 2133 2014 - 2016 None\n",
"5 5 10050411 F 52 2150 2017 - 2019 None\n",
"6 6 10068290 M 29 2151 2014 - 2016 None\n",
"7 7 10127166 F 51 2149 2008 - 2010 None\n",
"8 8 10143090 M 48 2155 2011 - 2013 None\n",
"9 9 10148710 M 67 2137 2008 - 2010 None"
]
},
"execution_count": 74,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_sql(\"SELECT * FROM patients LIMIT 10\", conn)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### triage"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"La table triage contient les données de \"tri\" des patients. \n",
"Le circuit classique d'un passage aux urgences est le suivant :\n",
"- Enregistrement administratif du patient\n",
"- Patient reçu rapidement par un infirmier (en France on parle d'infirmier d'accueil et d'orientation, où IAO)\n",
"- Un niveau de sévèrité est attribué au patient par l'IAO\n",
"- Le médecin voit ensuite le patient, dès qu'il le peut, par ordre de priorité et d'attente\n",
"- Si le médecin a besoin de plus d'éléments, il prescrit des examens complémentaires au patient\n",
"\n",
"Ainsi, l'ensemble des informations disponible à l'arrivé du patient sont enregistré dans cette table."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Le champs chiefcomplaint est un champs de texte libre, le reste contient des variables quantitative de paramètre vitaux recueillis par l'IAO. \n",
"Ce champs peux contenir une liste, qui est alors séparé par une virgule. \n",
"Il s'agit d'un champs qui a fait l'objet d'une désidentification, ainsi si une information personnelle était présente dans la liste, elle est remplacé par le caractère \"___\" (3 underscore à la suite).\n"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" index | \n",
" subject_id | \n",
" stay_id | \n",
" temperature | \n",
" heartrate | \n",
" resprate | \n",
" o2sat | \n",
" sbp | \n",
" dbp | \n",
" pain | \n",
" acuity | \n",
" chiefcomplaint | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0 | \n",
" 15585360 | \n",
" 37573921 | \n",
" 97.0 | \n",
" 87.0 | \n",
" 18.0 | \n",
" 100.0 | \n",
" 150.0 | \n",
" 71.0 | \n",
" 10.0 | \n",
" 3.0 | \n",
" | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" 17192424 | \n",
" 34160628 | \n",
" 98.6 | \n",
" 82.0 | \n",
" NaN | \n",
" 100.0 | \n",
" 111.0 | \n",
" 81.0 | \n",
" 3.0 | \n",
" 3.0 | \n",
" | \n",
"
\n",
" \n",
" 2 | \n",
" 2 | \n",
" 15248757 | \n",
" 32172727 | \n",
" 97.1 | \n",
" 112.0 | \n",
" 20.0 | \n",
" 100.0 | \n",
" 147.0 | \n",
" 97.0 | \n",
" 8.0 | \n",
" 4.0 | \n",
" | \n",
"
\n",
" \n",
" 3 | \n",
" 3 | \n",
" 16648037 | \n",
" 38946064 | \n",
" 98.5 | \n",
" 59.0 | \n",
" 18.0 | \n",
" 99.0 | \n",
" 160.0 | \n",
" 86.0 | \n",
" 2.0 | \n",
" 2.0 | \n",
" | \n",
"
\n",
" \n",
" 4 | \n",
" 4 | \n",
" 13492931 | \n",
" 39828574 | \n",
" 100.6 | \n",
" 90.0 | \n",
" 16.0 | \n",
" 96.0 | \n",
" 107.0 | \n",
" 55.0 | \n",
" 0.0 | \n",
" 3.0 | \n",
" ' | \n",
"
\n",
" \n",
" 5 | \n",
" 5 | \n",
" 11475777 | \n",
" 38193311 | \n",
" 97.1 | \n",
" 85.0 | \n",
" 16.0 | \n",
" 100.0 | \n",
" 138.0 | \n",
" 86.0 | \n",
" 7.0 | \n",
" 3.0 | \n",
" - | \n",
"
\n",
" \n",
" 6 | \n",
" 6 | \n",
" 16615356 | \n",
" 38143467 | \n",
" 98.5 | \n",
" 100.0 | \n",
" 16.0 | \n",
" 100.0 | \n",
" 130.0 | \n",
" 94.0 | \n",
" 8.0 | \n",
" 3.0 | \n",
" - | \n",
"
\n",
" \n",
" 7 | \n",
" 7 | \n",
" 17748455 | \n",
" 33063682 | \n",
" 99.4 | \n",
" 152.0 | \n",
" 18.0 | \n",
" 96.0 | \n",
" 136.0 | \n",
" 77.0 | \n",
" 0.0 | \n",
" 1.0 | \n",
" - | \n",
"
\n",
" \n",
" 8 | \n",
" 8 | \n",
" 13718556 | \n",
" 30494050 | \n",
" 98.5 | \n",
" 84.0 | \n",
" 14.0 | \n",
" 100.0 | \n",
" 124.0 | \n",
" 92.0 | \n",
" 4.0 | \n",
" 3.0 | \n",
" - | \n",
"
\n",
" \n",
" 9 | \n",
" 9 | \n",
" 13908077 | \n",
" 39580901 | \n",
" 97.0 | \n",
" 96.0 | \n",
" 18.0 | \n",
" 98.0 | \n",
" 129.0 | \n",
" 44.0 | \n",
" 4.0 | \n",
" 2.0 | \n",
" - | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" index subject_id stay_id temperature heartrate resprate o2sat \\\n",
"0 0 15585360 37573921 97.0 87.0 18.0 100.0 \n",
"1 1 17192424 34160628 98.6 82.0 NaN 100.0 \n",
"2 2 15248757 32172727 97.1 112.0 20.0 100.0 \n",
"3 3 16648037 38946064 98.5 59.0 18.0 99.0 \n",
"4 4 13492931 39828574 100.6 90.0 16.0 96.0 \n",
"5 5 11475777 38193311 97.1 85.0 16.0 100.0 \n",
"6 6 16615356 38143467 98.5 100.0 16.0 100.0 \n",
"7 7 17748455 33063682 99.4 152.0 18.0 96.0 \n",
"8 8 13718556 30494050 98.5 84.0 14.0 100.0 \n",
"9 9 13908077 39580901 97.0 96.0 18.0 98.0 \n",
"\n",
" sbp dbp pain acuity chiefcomplaint \n",
"0 150.0 71.0 10.0 3.0 \n",
"1 111.0 81.0 3.0 3.0 \n",
"2 147.0 97.0 8.0 4.0 \n",
"3 160.0 86.0 2.0 2.0 \n",
"4 107.0 55.0 0.0 3.0 ' \n",
"5 138.0 86.0 7.0 3.0 - \n",
"6 130.0 94.0 8.0 3.0 - \n",
"7 136.0 77.0 0.0 1.0 - \n",
"8 124.0 92.0 4.0 3.0 - \n",
"9 129.0 44.0 4.0 2.0 - "
]
},
"execution_count": 77,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_sql(\"SELECT * FROM triage LIMIT 10\", conn)"
]
},
{
"cell_type": "code",
"execution_count": 93,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" index | \n",
" subject_id | \n",
" stay_id | \n",
" temperature | \n",
" heartrate | \n",
" resprate | \n",
" o2sat | \n",
" sbp | \n",
" dbp | \n",
" pain | \n",
" acuity | \n",
" chiefcomplaint | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1305 | \n",
" 12601251 | \n",
" 34627547 | \n",
" 98.3 | \n",
" 81.0 | \n",
" 16.0 | \n",
" 100.0 | \n",
" 172.0 | \n",
" 65.0 | \n",
" 3.0 | \n",
" 3.0 | \n",
" Abdominal distention, Abd pain, Dyspnea on exe... | \n",
"
\n",
" \n",
" 1 | \n",
" 1308 | \n",
" 16663465 | \n",
" 36113926 | \n",
" 98.8 | \n",
" 58.0 | \n",
" 18.0 | \n",
" 98.0 | \n",
" 176.0 | \n",
" 71.0 | \n",
" 7.0 | \n",
" 2.0 | \n",
" Abdominal distention, Abd pain, S/P KIDNEY TRA... | \n",
"
\n",
" \n",
" 2 | \n",
" 1331 | \n",
" 14248362 | \n",
" 34323484 | \n",
" 98.3 | \n",
" 90.0 | \n",
" 17.0 | \n",
" 98.0 | \n",
" 140.0 | \n",
" 70.0 | \n",
" 0.0 | \n",
" 2.0 | \n",
" Abdominal distention, Abnormal labs, Jaundice,... | \n",
"
\n",
" \n",
" 3 | \n",
" 1332 | \n",
" 11493670 | \n",
" 35944304 | \n",
" 96.4 | \n",
" 80.0 | \n",
" 17.0 | \n",
" 99.0 | \n",
" 105.0 | \n",
" 65.0 | \n",
" 0.0 | \n",
" 2.0 | \n",
" Abdominal distention, Abnormal labs, Jaundice,... | \n",
"
\n",
" \n",
" 4 | \n",
" 1340 | \n",
" 15032149 | \n",
" 34358091 | \n",
" 96.6 | \n",
" 51.0 | \n",
" 20.0 | \n",
" 100.0 | \n",
" 128.0 | \n",
" 77.0 | \n",
" 2.0 | \n",
" 2.0 | \n",
" Abdominal distention, Abnormal sodium level, A... | \n",
"
\n",
" \n",
" 5 | \n",
" 1341 | \n",
" 18160217 | \n",
" 37565547 | \n",
" 98.1 | \n",
" 82.0 | \n",
" 16.0 | \n",
" 100.0 | \n",
" 116.0 | \n",
" 55.0 | \n",
" 0.0 | \n",
" 3.0 | \n",
" Abdominal distention, Abnormal sodium level, T... | \n",
"
\n",
" \n",
" 6 | \n",
" 1345 | \n",
" 14539683 | \n",
" 30348440 | \n",
" 96.9 | \n",
" 80.0 | \n",
" 14.0 | \n",
" 100.0 | \n",
" 128.0 | \n",
" 58.0 | \n",
" NaN | \n",
" 2.0 | \n",
" Abdominal distention, Altered mental status, J... | \n",
"
\n",
" \n",
" 7 | \n",
" 1346 | \n",
" 12491283 | \n",
" 39692564 | \n",
" 97.7 | \n",
" 88.0 | \n",
" 18.0 | \n",
" 99.0 | \n",
" 116.0 | \n",
" 70.0 | \n",
" 7.0 | \n",
" 2.0 | \n",
" Abdominal distention, Altered mental status, J... | \n",
"
\n",
" \n",
" 8 | \n",
" 1350 | \n",
" 16220647 | \n",
" 39977711 | \n",
" 98.3 | \n",
" 94.0 | \n",
" 16.0 | \n",
" 100.0 | \n",
" 121.0 | \n",
" 63.0 | \n",
" 6.0 | \n",
" 3.0 | \n",
" Abdominal distention, Anemia, Dyspnea, RUQ abd... | \n",
"
\n",
" \n",
" 9 | \n",
" 1368 | \n",
" 14912902 | \n",
" 36669324 | \n",
" 98.4 | \n",
" 119.0 | \n",
" 18.0 | \n",
" 96.0 | \n",
" 124.0 | \n",
" 76.0 | \n",
" 7.0 | \n",
" 2.0 | \n",
" Abdominal distention, Back pain, L Shoulder pa... | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" index subject_id stay_id temperature heartrate resprate o2sat \\\n",
"0 1305 12601251 34627547 98.3 81.0 16.0 100.0 \n",
"1 1308 16663465 36113926 98.8 58.0 18.0 98.0 \n",
"2 1331 14248362 34323484 98.3 90.0 17.0 98.0 \n",
"3 1332 11493670 35944304 96.4 80.0 17.0 99.0 \n",
"4 1340 15032149 34358091 96.6 51.0 20.0 100.0 \n",
"5 1341 18160217 37565547 98.1 82.0 16.0 100.0 \n",
"6 1345 14539683 30348440 96.9 80.0 14.0 100.0 \n",
"7 1346 12491283 39692564 97.7 88.0 18.0 99.0 \n",
"8 1350 16220647 39977711 98.3 94.0 16.0 100.0 \n",
"9 1368 14912902 36669324 98.4 119.0 18.0 96.0 \n",
"\n",
" sbp dbp pain acuity \\\n",
"0 172.0 65.0 3.0 3.0 \n",
"1 176.0 71.0 7.0 2.0 \n",
"2 140.0 70.0 0.0 2.0 \n",
"3 105.0 65.0 0.0 2.0 \n",
"4 128.0 77.0 2.0 2.0 \n",
"5 116.0 55.0 0.0 3.0 \n",
"6 128.0 58.0 NaN 2.0 \n",
"7 116.0 70.0 7.0 2.0 \n",
"8 121.0 63.0 6.0 3.0 \n",
"9 124.0 76.0 7.0 2.0 \n",
"\n",
" chiefcomplaint \n",
"0 Abdominal distention, Abd pain, Dyspnea on exe... \n",
"1 Abdominal distention, Abd pain, S/P KIDNEY TRA... \n",
"2 Abdominal distention, Abnormal labs, Jaundice,... \n",
"3 Abdominal distention, Abnormal labs, Jaundice,... \n",
"4 Abdominal distention, Abnormal sodium level, A... \n",
"5 Abdominal distention, Abnormal sodium level, T... \n",
"6 Abdominal distention, Altered mental status, J... \n",
"7 Abdominal distention, Altered mental status, J... \n",
"8 Abdominal distention, Anemia, Dyspnea, RUQ abd... \n",
"9 Abdominal distention, Back pain, L Shoulder pa... "
]
},
"execution_count": 93,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_sql(\"SELECT * FROM triage WHERE LENGTH(chiefcomplaint) > 50 LIMIT 10\", conn)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### medrecon"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"La table medrecon contient la liste des traitements habituels du patient. \n",
"Par traitement habituel, on entends les traitement que le patient prenait avant même son passage aux urgences.\n",
"\n",
"Ces traitements sont tous identifiés à l'aide d'identifiant produit :\n",
"- NDC - National Drug Classification (de la NDA)\n",
"- GSN - Generic sequence number\n",
"\n",
"Les traitements ont été regroupés à l'aide d'une classification nommé \"Enhanced Therapeutic Class\", chaque groupe est identifié par un code `etccode` associé au libelé du groupe `etcdescription`.\n",
"\n",
"Une analyse plus poussée des traitements pourrait être nécessaires, elle imposerait d'exploiter le code produit NDC. \n",
"L'idéal est d'obtenir un code ATC, qui est une classification fonctionnelle très pratique pour identifier l'indication des médicaments. \n",
"Une approche pourrait être de se reposer sur l'API de la `National Library of Medicine` (https://lhncbc.nlm.nih.gov/RxNav/). \n",
"Une travail similaire a été effectué par d'autres personnes : https://github.com/fabkury/ndc_map"
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" index | \n",
" subject_id | \n",
" stay_id | \n",
" charttime | \n",
" name | \n",
" gsn | \n",
" ndc | \n",
" etc_rn | \n",
" etccode | \n",
" etcdescription | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0 | \n",
" 12363835 | \n",
" 34078373 | \n",
" 2177-02-13 03:31:00 | \n",
" Caltrate 600 + D | \n",
" 63395 | \n",
" 11845014031 | \n",
" 1 | \n",
" 6143.0 | \n",
" Minerals and Electrolytes - Calcium Replacemen... | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" 12363835 | \n",
" 34078373 | \n",
" 2177-02-13 03:31:00 | \n",
" Coumadin | \n",
" 6562 | \n",
" 15330026801 | \n",
" 1 | \n",
" 806.0 | \n",
" Anticoagulants - Coumarin | \n",
"
\n",
" \n",
" 2 | \n",
" 2 | \n",
" 12363835 | \n",
" 34078373 | \n",
" 2177-02-13 03:31:00 | \n",
" Flonase | \n",
" 18368 | \n",
" 12280027016 | \n",
" 1 | \n",
" 301.0 | \n",
" Nasal Corticosteroids | \n",
"
\n",
" \n",
" 3 | \n",
" 3 | \n",
" 12363835 | \n",
" 34078373 | \n",
" 2177-02-13 03:31:00 | \n",
" lidocaine | \n",
" 43256 | \n",
" 10544037630 | \n",
" 1 | \n",
" 3932.0 | \n",
" Dermatological - Topical Local Anesthetic Amides | \n",
"
\n",
" \n",
" 4 | \n",
" 4 | \n",
" 12363835 | \n",
" 34078373 | \n",
" 2177-02-13 03:31:00 | \n",
" omeprazole | \n",
" 43137 | \n",
" 16714074801 | \n",
" 1 | \n",
" 445.0 | \n",
" Gastric Acid Secretion Reducing Agents - Proto... | \n",
"
\n",
" \n",
" 5 | \n",
" 5 | \n",
" 12363835 | \n",
" 34078373 | \n",
" 2177-02-13 03:31:00 | \n",
" ranitidine HCl | \n",
" 11674 | \n",
" 10544036030 | \n",
" 1 | \n",
" 443.0 | \n",
" Gastric Acid Secretion Reducers - Histamine H2... | \n",
"
\n",
" \n",
" 6 | \n",
" 6 | \n",
" 12363835 | \n",
" 34078373 | \n",
" 2177-02-13 03:31:00 | \n",
" Tylenol Extra Strength | \n",
" 4490 | \n",
" 10003011501 | \n",
" 1 | \n",
" 577.0 | \n",
" Analgesic or Antipyretic Non-Opioid | \n",
"
\n",
" \n",
" 7 | \n",
" 7 | \n",
" 15128994 | \n",
" 30058281 | \n",
" 2167-08-29 04:27:00 | \n",
" Dialyvite | \n",
" 60020 | \n",
" 10542000010 | \n",
" 1 | \n",
" 690.0 | \n",
" B-Complex Vitamin Combinations | \n",
"
\n",
" \n",
" 8 | \n",
" 8 | \n",
" 15128994 | \n",
" 30058281 | \n",
" 2167-08-29 04:27:00 | \n",
" allopurinol | \n",
" 2535 | \n",
" 11289101302 | \n",
" 1 | \n",
" 2602.0 | \n",
" Hyperuricemia Therapy - Xanthine Oxidase Inhib... | \n",
"
\n",
" \n",
" 9 | \n",
" 9 | \n",
" 15128994 | \n",
" 30058281 | \n",
" 2167-08-29 04:27:00 | \n",
" albuterol sulfate | \n",
" 28090 | \n",
" 21695042308 | \n",
" 1 | \n",
" 5970.0 | \n",
" Asthma/COPD Therapy - Beta 2-Adrenergic Agents... | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" index subject_id stay_id charttime name \\\n",
"0 0 12363835 34078373 2177-02-13 03:31:00 Caltrate 600 + D \n",
"1 1 12363835 34078373 2177-02-13 03:31:00 Coumadin \n",
"2 2 12363835 34078373 2177-02-13 03:31:00 Flonase \n",
"3 3 12363835 34078373 2177-02-13 03:31:00 lidocaine \n",
"4 4 12363835 34078373 2177-02-13 03:31:00 omeprazole \n",
"5 5 12363835 34078373 2177-02-13 03:31:00 ranitidine HCl \n",
"6 6 12363835 34078373 2177-02-13 03:31:00 Tylenol Extra Strength \n",
"7 7 15128994 30058281 2167-08-29 04:27:00 Dialyvite \n",
"8 8 15128994 30058281 2167-08-29 04:27:00 allopurinol \n",
"9 9 15128994 30058281 2167-08-29 04:27:00 albuterol sulfate \n",
"\n",
" gsn ndc etc_rn etccode \\\n",
"0 63395 11845014031 1 6143.0 \n",
"1 6562 15330026801 1 806.0 \n",
"2 18368 12280027016 1 301.0 \n",
"3 43256 10544037630 1 3932.0 \n",
"4 43137 16714074801 1 445.0 \n",
"5 11674 10544036030 1 443.0 \n",
"6 4490 10003011501 1 577.0 \n",
"7 60020 10542000010 1 690.0 \n",
"8 2535 11289101302 1 2602.0 \n",
"9 28090 21695042308 1 5970.0 \n",
"\n",
" etcdescription \n",
"0 Minerals and Electrolytes - Calcium Replacemen... \n",
"1 Anticoagulants - Coumarin \n",
"2 Nasal Corticosteroids \n",
"3 Dermatological - Topical Local Anesthetic Amides \n",
"4 Gastric Acid Secretion Reducing Agents - Proto... \n",
"5 Gastric Acid Secretion Reducers - Histamine H2... \n",
"6 Analgesic or Antipyretic Non-Opioid \n",
"7 B-Complex Vitamin Combinations \n",
"8 Hyperuricemia Therapy - Xanthine Oxidase Inhib... \n",
"9 Asthma/COPD Therapy - Beta 2-Adrenergic Agents... "
]
},
"execution_count": 97,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_sql(\"SELECT * FROM medrecon LIMIT 10\", conn)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### diagnosis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Bien que ne faisant pas partit de l'objet du projet, la table `diagnosis` a été inclut dans l'éventualité où nous aurions besoin de l'explorer. \n",
"L'ensemble des diagnostic sont identifié à l'aide de la classification internationale des maladie dans sa version 9 (ICD-9)."
]
},
{
"cell_type": "code",
"execution_count": 98,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" index | \n",
" subject_id | \n",
" stay_id | \n",
" seq_num | \n",
" icd_code | \n",
" icd_version | \n",
" icd_title | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0 | \n",
" 15825222 | \n",
" 39090953 | \n",
" 1 | \n",
" 486 | \n",
" 9 | \n",
" PNEUMONIA,ORGANISM UNSPECIFIED | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" 15825222 | \n",
" 39090953 | \n",
" 2 | \n",
" 4254 | \n",
" 9 | \n",
" PRIM CARDIOMYOPATHY NEC | \n",
"
\n",
" \n",
" 2 | \n",
" 2 | \n",
" 11554870 | \n",
" 37245764 | \n",
" 1 | \n",
" 5609 | \n",
" 9 | \n",
" INTESTINAL OBSTRUCT NOS | \n",
"
\n",
" \n",
" 3 | \n",
" 3 | \n",
" 19748558 | \n",
" 30511202 | \n",
" 1 | \n",
" 49392 | \n",
" 9 | \n",
" ASTHMA, UNSPECIFIED, WITH ACUTE EXACERBATION | \n",
"
\n",
" \n",
" 4 | \n",
" 4 | \n",
" 18008347 | \n",
" 34907903 | \n",
" 1 | \n",
" 7842 | \n",
" 9 | \n",
" SWELLING IN HEAD & NECK | \n",
"
\n",
" \n",
" 5 | \n",
" 5 | \n",
" 15475846 | \n",
" 30308014 | \n",
" 1 | \n",
" 78079 | \n",
" 9 | \n",
" OTHER MALAISE AND FATIGUE | \n",
"
\n",
" \n",
" 6 | \n",
" 6 | \n",
" 15475846 | \n",
" 30308014 | \n",
" 2 | \n",
" 27651 | \n",
" 9 | \n",
" DEHYDRATION | \n",
"
\n",
" \n",
" 7 | \n",
" 7 | \n",
" 16538197 | \n",
" 33556881 | \n",
" 1 | \n",
" 6084 | \n",
" 9 | \n",
" MALE GEN INFLAM DIS NEC | \n",
"
\n",
" \n",
" 8 | \n",
" 8 | \n",
" 18438529 | \n",
" 35200231 | \n",
" 1 | \n",
" 30500 | \n",
" 9 | \n",
" ALCOHOL ABUSE-UNSPEC | \n",
"
\n",
" \n",
" 9 | \n",
" 9 | \n",
" 17185991 | \n",
" 37532781 | \n",
" 1 | \n",
" 7840 | \n",
" 9 | \n",
" HEADACHE | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" index subject_id stay_id seq_num icd_code icd_version \\\n",
"0 0 15825222 39090953 1 486 9 \n",
"1 1 15825222 39090953 2 4254 9 \n",
"2 2 11554870 37245764 1 5609 9 \n",
"3 3 19748558 30511202 1 49392 9 \n",
"4 4 18008347 34907903 1 7842 9 \n",
"5 5 15475846 30308014 1 78079 9 \n",
"6 6 15475846 30308014 2 27651 9 \n",
"7 7 16538197 33556881 1 6084 9 \n",
"8 8 18438529 35200231 1 30500 9 \n",
"9 9 17185991 37532781 1 7840 9 \n",
"\n",
" icd_title \n",
"0 PNEUMONIA,ORGANISM UNSPECIFIED \n",
"1 PRIM CARDIOMYOPATHY NEC \n",
"2 INTESTINAL OBSTRUCT NOS \n",
"3 ASTHMA, UNSPECIFIED, WITH ACUTE EXACERBATION \n",
"4 SWELLING IN HEAD & NECK \n",
"5 OTHER MALAISE AND FATIGUE \n",
"6 DEHYDRATION \n",
"7 MALE GEN INFLAM DIS NEC \n",
"8 ALCOHOL ABUSE-UNSPEC \n",
"9 HEADACHE "
]
},
"execution_count": 98,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_sql(\"SELECT * FROM diagnosis LIMIT 10\", conn)"
]
}
],
"metadata": {
"interpreter": {
"hash": "c304935560631f5a20c1bdabb506947800ccd82d813704000c078f0735b9b818"
},
"kernelspec": {
"display_name": "Python 3.9.9 64-bit ('base': conda)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}