+.. highlight:: shell
+Contributions are welcome, and they are greatly appreciated! Every little bit
+helps, and credit will always be given.
+You can contribute in many ways:
+Types of Contributions
+Report Bugs
+Report bugs at
+If you are reporting a bug, please include:
+* Your operating system name and version.
+* Any details about your local setup that might be helpful in troubleshooting.
+* Detailed steps to reproduce the bug.
+Fix Bugs
+Look through the GitHub issues for bugs. Anything tagged with "bug" and "help
+wanted" is open to whoever wants to implement it.
+Implement Features
+Look through the GitHub issues for features. Anything tagged with "enhancement"
+and "help wanted" is open to whoever wants to implement it.
+Write Documentation
+py_thesis_toolbox could always use more documentation, whether as part of the
+official py_thesis_toolbox docs, in docstrings, or even on the web in blog posts,
+articles, and such.
+Submit Feedback
+The best way to send feedback is to file an issue at
+If you are proposing a feature:
+* Explain in detail how it would work.
+* Keep the scope as narrow as possible, to make it easier to implement.
+* Remember that this is a volunteer-driven project, and that contributions
+  are welcome :)
+Get Started!
+Ready to contribute? Here's how to set up `py_thesis_toolbox` for local development.
+1. Fork the `py_thesis_toolbox` repo on GitHub.
+2. Clone your fork locally::
+    $ git clone
+3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development::
+    $ mkvirtualenv py_thesis_toolbox
+    $ cd py_thesis_toolbox/
+    $ python develop
+4. Create a branch for local development::
+    $ git checkout -b name-of-your-bugfix-or-feature
+   Now you can make your changes locally.
+5. When you're done making changes, check that your changes pass flake8 and the
+   tests, including testing other Python versions with tox::
+    $ flake8 py_thesis_toolbox tests
+    $ python test or pytest
+    $ tox
+   To get flake8 and tox, just pip install them into your virtualenv.
+6. Commit your changes and push your branch to GitHub::
+    $ git add .
+    $ git commit -m "Your detailed description of your changes."
+    $ git push origin name-of-your-bugfix-or-feature
+7. Submit a pull request through the GitHub website.
+Pull Request Guidelines
+Before you submit a pull request, check that it meets these guidelines:
+1. The pull request should include tests.
+2. If the pull request adds functionality, the docs should be updated. Put
+   your new functionality into a function with a docstring, and add the
+   feature to the list in README.rst.
+3. The pull request should work for Python 3.5, 3.6, 3.7 and 3.8, and for PyPy. Check
+   and make sure that the tests pass for all supported Python versions.
+To run a subset of tests::
+    $ python -m unittest tests.test_py_thesis_toolbox
+A reminder for the maintainers on how to deploy.
+Make sure all your changes are committed (including an entry in HISTORY.rst).
+Then run::
+    $ bump2version patch # possible: major / minor / patch
+    $ git push
+    $ git push --tags
+Travis will then deploy to PyPI if tests pass.

+0.0.1 (2021-03-30)
+* First release on PyPI.

 # py_thesis_toolbox
-Tools for analysing thesis data
+Tools for analysing thesis data.
+Describes the data, then applies a series of univariate tests.
+## Installing the plugin
+    git clone
+    cd py_thesis_toolbox
+    pip install -r requirements.txt
+    pip install .
+## Usage
+### Analysing a dataset
+Analysing a dataset runs the following steps:
+- Descriptive analysis: mean, number of subjects, median, interquartile range
+- Explanatory descriptive analysis (description within each explanatory subgroup)
+- Application of a statistical test during the explanatory descriptive analysis
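The quantities listed for the descriptive analysis can be sketched without any third-party dependency. This is a minimal illustration, not the package's implementation: `describe_quantitative` is a hypothetical helper, and the 95% confidence interval uses the normal approximation (mean ± 1.96 × standard error).

```python
import math
import statistics


def describe_quantitative(values):
    """Summarise a list of numbers (illustrative helper, not part of the package)."""
    n = len(values)
    mean = statistics.fmean(values)
    # method="inclusive" interpolates linearly between data points,
    # like the default interpolation of pandas' Series.quantile.
    q25, median, q75 = statistics.quantiles(values, n=4, method="inclusive")
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    sem = std / math.sqrt(n)        # standard error of the mean
    return {
        "n": n,
        "mean": mean,
        "median": median,
        "Q25": q25,
        "Q75": q75,
        "std": std,
        "std_mean": sem,
        "ci_95": [mean - 1.96 * sem, mean + 1.96 * sem],
    }


summary = describe_quantitative([2, 4, 4, 4, 5, 5, 7, 9])
print(summary)
```

The interquartile range is simply `summary["Q75"] - summary["Q25"]`.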
+**@TODO: implement the creation of a multivariate model**
+    from thesis_analysis import analyseStatistiques
+    analyses = analyseStatistiques(df)
+    analyses.analyse_univarie(
+        variable_interet,
+        variables_explicatives
+    )
+#### variable_interet
+The **variable of interest** argument is a dictionary describing a set of qualitative or quantitative variables.
+The dictionary must have the form:
+    variable_name: variable_type   ("qualitative" or "quantitative")
+    ...
+#### variables_explicatives
+List of explanatory variables.
+All of these variables must be qualitative.
+It is a list:
+    variable_name
+    ...
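As a concrete illustration of the two arguments, the sketch below builds them with hypothetical column names and checks their shape. `check_inputs` is not part of the package; it is only a suggestion, motivated by the fact that `analyse_univarie` appears to skip any variable whose type is neither `"qualitative"` nor `"quantitative"`.

```python
# Hypothetical column names, used only to illustrate the expected shapes.
variable_interet = {
    "age": "quantitative",
    "tabagisme": "qualitative",
}
variables_explicatives = ["sexe", "groupe_traitement"]

ALLOWED_TYPES = {"qualitative", "quantitative"}


def check_inputs(variable_interet, variables_explicatives):
    """Return a list of problems found in the two arguments (empty if none)."""
    problems = []
    for name, kind in variable_interet.items():
        if kind not in ALLOWED_TYPES:
            problems.append(f"{name}: unknown type {kind!r}")
    if not isinstance(variables_explicatives, list):
        problems.append("variables_explicatives must be a list")
    return problems


problems = check_inputs(variable_interet, variables_explicatives)
print(problems)
```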
+### Application d'un test spécifique
+    from thesis_analysis.test import testQualitatif, testQuantitatif
+    test = testQualitatif(df, y, x)
+    test.best_test()
+Applies the best possible test for a given pair of x and y.
+Parameters:
+- df: DataFrame containing the whole dataset
+- y: variable of interest, on which the impact of x is measured
+- x: variable whose impact on y is measured
+The **best_test** function determines the best test applicable to the data.
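The selection rules can be sketched as plain functions. This is a simplified reading of the thresholds visible in the package source (expected counts of 5 and 3, groups of 30, normality and equal-variance checks), not the implementation itself: the function names are hypothetical, the returned labels are illustrative, and the quantitative variant takes the outcome of each check as a boolean instead of running the checks.

```python
def expected_counts(table):
    """Expected counts under independence for a list-of-lists contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def choose_qualitative_test(table):
    """Return the name of the first test whose validity condition holds."""
    expected = [e for row in expected_counts(table) for e in row]
    dof = (len(table) - 1) * (len(table[0]) - 1)
    if min(expected) >= 5:
        return "khi2"
    if min(expected) >= 3 and dof == 1:
        return "khi2_yates"
    if len(table) == 2 and len(table[0]) == 2:
        return "fisher"
    return "no_test"


def choose_quantitative_test(n_groups, all_n_at_least_30, normal, equal_variance):
    """Walk the decision tree for a quantitative y split by a qualitative x."""
    if n_groups == 2:
        if all_n_at_least_30:
            return "z_test"
        if not normal:
            return "Mann-Whitney Wilcoxon"
        return "t_test" if equal_variance else "t_test Welch"
    if normal and equal_variance:
        return "ANOVA_1W"
    return "Kruskal_Wallis"


print(choose_qualitative_test([[10, 20], [30, 40]]))
print(choose_quantitative_test(2, False, True, False))
```

Keeping the choice separate from the computation makes each branch easy to test on small hand-built tables.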
+A series of tests can also be run manually.
+The list of tests can be obtained by running:
+    dir(test)
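Since `dir()` also returns private and inherited dunder names, it can help to filter its output. The sketch below uses a hypothetical stand-in class, `DummyTest`, so that it runs without the package installed; the same `public_methods` helper (also hypothetical) could be applied to a real test object.

```python
class DummyTest:
    """Stand-in for a test object; the real classes expose more methods."""

    def best_test(self):
        return None

    def khi2(self):
        return None

    def fisher(self):
        return None

    def _no_test(self):
        return None


def public_methods(obj):
    """List callable attributes whose name does not start with an underscore."""
    return [
        name
        for name in dir(obj)
        if not name.startswith("_") and callable(getattr(obj, name))
    ]


methods = public_methods(DummyTest())
print(methods)
```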

+.. include:: ../AUTHORS.rst

+import math
+import pandas as pd
+from .test import testQualitatif, testQuantitatif
+class analyseStatistiques ():
+    """
+        Analyses a dataset.
+        Univariate analyses:
+            Description of the data
+            Application of the statistical tests
+        Input: dataset (pandas DataFrame)
+    """
+    def __init__ (self, df):
+        # Store the DataFrame
+        self.df = df
+    def _describe_qualitative (self, data):
+        """
+            Calculate the count (n) and proportion (p) of each modality of the qualitative variable
+            Input : data, Pandas Series containing data to describe
+        """
+        table = data.value_counts()
+        description = pd.DataFrame({'n':table, 'p':table/table.sum()}) \
+            .to_dict("index")
+        description["total"] = table.sum()
+        return(description)
+    def _get_sub_table(self, variable, axes):
+        # Select the data to analyse and drop rows with missing values
+        if (axes is None):
+            temp_data = self.df[[variable]]
+        else:
+            temp_data = self.df[[variable]+axes]
+        temp_data = temp_data.dropna()
+        return(temp_data)
+    def _analyse_univarie_qualitative (self, variable, axes = None):
+        temp_data = self._get_sub_table(variable, [])
+        # Start from an empty result dictionary
+        analyse = {}
+        analyse["n"] = temp_data.shape[0]
+        ## Global: ignoring the analysis axes
+        analyse["global"] = self._describe_qualitative(temp_data[variable])
+        ## Specific: within each analysis axis
+        if (axes is not None):
+            analyse["sous_groupes"] = {}
+            analyse["test"] = {}
+            for axe in axes:  
+                temp_data = self._get_sub_table(variable, [axe])
+                # Axe values
+                axe_values = temp_data[axe] \
+                    .drop_duplicates().values.tolist()
+                # Description
+                analyse["sous_groupes"][axe] = {}
+                for values in axe_values:
+                    analyse["sous_groupes"][axe][values] = self._describe_qualitative(
+                        temp_data[
+                            temp_data[axe] == values
+                        ] \
+                        .reset_index(drop = True)[variable]
+                    )
+                # Statistical test
+                analyse["test"][axe] = testQualitatif(temp_data, variable,axe).best_test()
+        analyse["type"] = "qualitative"
+        return analyse
+    def _describe_quantitative (self, data):
+        """
+            Calculate n, mean, median, Q25, Q75, std, standard error of the mean (std_mean) and 95% CI for quantitative data
+            Input : data, Pandas Series containing data to describe
+        """
+        # Dict containing data
+        description = {}
+        description["n"] = data.shape[0]
+        description["mean"] = data.mean()
+        description["median"] = data.median()
+        description["Q25"] = data.quantile(0.25)
+        description["Q75"] = data.quantile(0.75)
+        description["std"] = data.std()
+        description["std_mean"] = description["std"]/math.sqrt(description["n"])
+        description["ci_95"] = [description["mean"]-1.96*description["std_mean"], 
+                                description["mean"]+1.96*description["std_mean"]]        
+        return description
+    def _analyse_univarie_quantitative (self, variable, axes = None):
+        # Select the data to analyse
+        temp_data = self._get_sub_table(variable, [])
+        # Start from an empty result dictionary
+        analyse = {}
+        analyse["n"] = temp_data.shape[0]
+        ## Global: ignoring the analysis axes
+        analyse["global"] = self._describe_quantitative(temp_data[variable])
+        ## Specific: within each analysis axis
+        if (axes is not None):
+            analyse["sous_groupes"] = {}
+            analyse["test"] = {}
+            for axe in axes:  
+                temp_data = self._get_sub_table(variable, [axe])
+                # Axe values
+                axe_values = temp_data[axe] \
+                    .drop_duplicates().values.tolist()
+                # Description
+                analyse["sous_groupes"][axe] = {}
+                for values in axe_values:
+                    analyse["sous_groupes"][axe][values] = self._describe_quantitative(
+                        temp_data[
+                            temp_data[axe] == values
+                        ] \
+                        .reset_index(drop = True)[variable]
+                    )
+                # Statistical test
+                analyse["test"][axe] = testQuantitatif(temp_data, variable,axe).best_test()
+        analyse["type"] = "quantitative"
+        return analyse
+    def analyse_univarie (self, variables, axes = None):
+        """
+            Univariate descriptive analysis
+                variables: dictionary listing the variables to analyse, of the form:
+                    key = variable name, value = variable type: quantitative or qualitative
+                axes: analysis axes, which must be qualitative. List of variable names.
+        """
+        # Output results
+        resultats = {}
+        for variable, type_variable in variables.items():
+            if type_variable == 'qualitative':
+                resultats[variable] = self._analyse_univarie_qualitative(variable, axes)
+            elif type_variable == 'quantitative':
+                resultats[variable] = self._analyse_univarie_quantitative(variable, axes)                
+        return(resultats)

+import pandas as pd
+import numpy as np
+from scipy.stats import chi2_contingency, fisher_exact
+from scipy.stats import normaltest, kstest, levene, ttest_ind, mannwhitneyu, f_oneway, kruskal
+import statsmodels.stats.weightstats as ws
+# Qualitative
+class testQualitatif ():
+    """
+        Applies the most suitable qualitative test to a dataset
+            df: dataset
+            y: variable to test
+            x: variable whose impact is measured
+    """
+    def __init__ (self, df, y, x):
+        # Build the contingency table
+        self.contingency = self._get_contingency(df, y, x)
+    def _get_contingency (self, df, y, x):
+        contingency = pd.DataFrame(
+            {"n":df.groupby(x)[y].value_counts()}
+        ).reset_index() \
+        .pivot_table(index = [x], columns = [y]) \
+        .fillna(0)
+        contingency.columns = contingency.columns.droplevel(0)
+        return(contingency.values.astype(int))
+    def best_test (self):
+        """
+            Selects the best possible test
+        """
+        # Priority order
+        ## 1. Chi-squared
+        ## 2. Chi-squared with Yates' correction
+        ## 3. Fisher's exact test
+        ## 4. No test
+        order_test = {
+            "khi2":[self.khi2,[False]],
+            "khi2_yates":[self.khi2,[True]],
+            "fisher":[self.fisher,[]],
+            "no_test":[self._no_test, []]
+        }
+        ## Run the tests in order and return the first valid one
+        for test_name, test in order_test.items():
+            test_result = test[0](*test[1])
+            if (test_result["valid"] == True):
+                return (test_name, test_result)
+    def khi2 (self, yates_correction = False):
+        """
+            Chi-squared test
+            Parameter:
+                yates_correction: if True, apply Yates' continuity correction
+        """
+        # Run the test
+        test_result = chi2_contingency(self.contingency, correction=yates_correction)
+        # Validity: without correction, every expected count must be >= 5;
+        # with Yates' correction, every expected count must be >= 3 and dof must be 1
+        if yates_correction == False:
+            khi2_valid = len([True for x in test_result[3] for y in x if y < 5]) == 0
+        else:
+            khi2_valid = (len([True for x in test_result[3] for y in x if y < 3]) == 0) \
+                        & (test_result[2] == 1)
+        # Structure the result
+        output_result = dict(zip(
+            ["statistic","p_value", "dof", "theorical_values","observed_values","yates_correction", "valid"],
+            list(test_result)+[self.contingency, yates_correction, khi2_valid]
+        ))
+        return output_result
+    def fisher (self):
+        """
+            Fisher's exact test
+        """
+        # Check the applicability condition: the contingency table must be 2x2
+        valid = (self.contingency.shape == (2,2))
+        if (valid == True):
+            fisher_result = fisher_exact(self.contingency)
+            output_result = dict(zip(
+                ["statistic", "p_value", "observed_values","valid"],
+                list(fisher_result)+[self.contingency, valid]
+            ))
+        else:
+            output_result = {"valid":valid}
+        return output_result
+    def _no_test (self):
+        """
+            Returns the absence of a test (fallback when no other test is valid)
+        """
+        output_result = {"valid":True}
+        return output_result

+import pandas as pd
+import numpy as np
+from scipy.stats import chi2_contingency, fisher_exact
+from scipy.stats import normaltest, kstest, levene, ttest_ind, mannwhitneyu, f_oneway, kruskal
+import statsmodels.stats.weightstats as ws
+class testQuantitatif ():
+    """
+        Applies the most suitable quantitative test to a dataset
+            df: DataFrame containing the data
+            y: variable to test
+            x: variable whose impact is measured
+    """
+    def __init__ (self, df, y, x):
+        # Store the dataset and the variable names
+        self.df = df
+        self.x = x
+        self.y = y
+        # Determine the modalities (distinct values) of x
+        self.x_shapes = self.df[x].drop_duplicates()
+        # Collect the values of y for each modality of x
+        self.y_values = {}
+        for x_value in self.x_shapes:
+            self.y_values[x_value] = df[
+                            df[x] == x_value
+                        ][y] \
+                        .values
+        # Compute the elements used to check test validity
+        self.x_shape = self.x_shapes.shape[0]
+        self.n_sup_30 = ((df.groupby(x).count()[y] >= 30).sum() == 2)
+    def _check_all_group_normal(self):
+        # Run the normality test
+        normal_test_result = self.normal_distribution()
+        # Check that every group is normally distributed
+        valid = len([False for x in normal_test_result.values() if x["p_value"] < 0.05]) == 0
+        return valid
+    def best_test (self, normal = None):
+        """
+            Selects the best possible test
+            Parameter:
+                normal: boolean, if True a normal distribution is assumed without running a normality test
+        """
+        # Decision tree
+        ## x_shape (number of groups):
+        ### 2:
+        #### N1, N2 >= 30: Z-test
+        #### Otherwise (at least one group < 30):
+        ##### Every group normally distributed?
+        ###### Equal variances: t-test
+        ###### Unequal variances: Welch's test
+        ##### Not normally distributed: Mann-Whitney-Wilcoxon
+        ### > 2:
+        #### Equal variances and normal distribution: ANOVA
+        #### Otherwise: Kruskal-Wallis test
+        if self.x_shape == 2:
+            if self.n_sup_30:
+                # Apply the Z-test
+                result = self.z_test()
+                test_applied = "z_test"
+            else:
+                # Check normality
+                if normal or self._check_all_group_normal():
+                    # Check equality of variances (variance_equity is a method and must be called)
+                    if self.variance_equity()["p_value"] >= 0.05:
+                        # Equal variances are assumed: t-test
+                        welch = False
+                        test_applied = "t_test"
+                    else:
+                        # Equal variances are not assumed: Welch's test
+                        welch = True
+                        test_applied = "t_test Welch"
+                    result = self.t_test(welch)
+                else:
+                    test_applied = "Mann-Whitney Wilcoxon"
+                    result = self.mwwilcoxon()
+        else:
+            # Check normality and equality of variances
+            if (normal or self._check_all_group_normal()) and (self.variance_equity()["p_value"] >= 0.05):
+                # Anova
+                test_applied = "ANOVA_1W"
+                result = self.anova_1w()
+            else:
+                test_applied = "Kruskal_Wallis"
+                result = self.kruskal_wallis()
+        return (test_applied, result)
+    def _apply_test (self, test_function, params = None):
+        # Run the test on the y values of every group;
+        # params holds optional keyword arguments for test_function
+        test_result = test_function(*list(self.y_values.values()),
+                                **(params or {}))
+        output_result = dict(zip(
+            ["statistic","p_value"],
+            [test_result.statistic, test_result.pvalue]
+        ))
+        return(output_result)
+    def normal_distribution (self):
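+        # Apply the normality test to each category.
+        # Note: kstest(values, "norm") compares against the standard normal N(0, 1);
+        # the values are not standardised beforehand.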
+        norm_test_result = dict(zip(
+            self.y_values.keys(),
+            [   dict(zip(
+                    ["statistic", "p_value"],
+                    [x.statistic, x.pvalue]
+                ))
+                for x in
+                [kstest(y_values, "norm") for y_values in self.y_values.values()]         
+            ]
+        ))
+        return (norm_test_result)
+    def z_test (self):
+        # Run the test
+        test_result = ws.ztest(*list(self.y_values.values()))
+        output_result = dict(zip(
+            ["statistic","p_value"],
+            list(test_result)
+        ))
+        return(output_result)
+    def t_test (self, welch = False):
+        # Run the test; Welch's variant is the one that does NOT assume equal variances
+        output_result = self._apply_test(ttest_ind, {"equal_var":not welch})
+        return(output_result)
+    def mwwilcoxon (self):
+        # Run the test
+        output_result = self._apply_test(mannwhitneyu)
+        return(output_result)
+    def variance_equity (self):
+        """
+            Levene's test, centred on the mean
+        """
+        # Run the test
+        output_result = self._apply_test(levene,{"center":"mean"})
+        return(output_result)
+    def anova_1w (self):
+        """
+            One-way ANOVA: compares the group means through an analysis of variance
+        """
+        # Run the test
+        output_result = self._apply_test(f_oneway)
+        return(output_result)
+    def kruskal_wallis (self):
+        # Run the test
+        output_result = self._apply_test(kruskal)
+        return(output_result)
+    def _no_test (self):
+        """
+            Returns the absence of a test (fallback when no other test is valid)
+        """
+        output_result = {"valid":True}
+        return output_result

