Metadata-Version: 1.0
Name: pandasToBrat
Version: 1.0
Summary: Function for Brat folder administration from Python and Pandas object.
Home-page: UNKNOWN
Author: Ali BELLAMINE
Author-email: contact@alibellamine.me
License: UNKNOWN
Description: # pandasToBrat
Ali BELLAMINE - contact@alibellamine.me
_Last version: 1.0 - 28/10/2020_
## What is pandasToBrat?
pandasToBrat is a library to manage brat configuration and brat data from a Python interface.
### What can it do?
- Read the brat annotations and relations configuration into a Python dictionary
- Write the brat annotations and relations configuration from a Python dictionary
- Read brat text data into a pandas DataFrame
- Write brat text files from pandas Series
- Read brat annotations and relations
- Write brat annotations and relations from a pandas DataFrame
### What does it not support?
- Keyboard shortcut configuration
- Event, Attribute, Modification, Normalization and Notes annotations
- Relation type in relations configuration
## How to use it?
### Installation
Clone the current repository:
```
git clone [LIBRARY_PATH]
```
Install the dependencies with pip:
```
pip install -r requirements.txt
```
Then install the library:
```
pip install -e .
```
### Loading a brat folder
Instantiate the brat library with the folder path:
```
from pandasToBrat import pandasToBrat
bratData = pandasToBrat(FOLDER_PATH)
```
### Parameters
Parameters are stored in a dictionary:
```
{
    "entities":ENTITIES_CONFIGURATION_DATA,
    "relations":RELATIONS_CONFIGURATION_DATA
}
```
#### Entities configuration data
Dictionary formatted as:
```
{
    LABEL_NAME:{
        LABEL_NAME_CHILD1:True,
        LABEL_NAME_CHILD2:True,
        LABEL_NAME_CHILD3:{
            LABEL_NAME_CHILD3_CHILD1:True
        }
    }
}
```
Each entry is an entity.
An entity is either set to True, meaning it has no children, or has one or more children, in which case it contains a dictionary.
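As an illustration, the nested entities dictionary can be walked recursively to list every label it declares together with its parent. This is a minimal sketch and not part of pandasToBrat: the label names and the `list_entity_labels` helper are hypothetical.

```python
# Hypothetical entities configuration: the label names are illustrative only.
entities = {
    "Person": True,              # entity without children
    "Location": {                # entity with children
        "City": True,
        "Country": {
            "Region": True,      # child of a child
        },
    },
}


def list_entity_labels(config, parent=None):
    """Return (label, parent) pairs for every entity in a nested configuration."""
    labels = []
    for label, children in config.items():
        labels.append((label, parent))
        # A dictionary value means the entity has children; True means it has none.
        if isinstance(children, dict):
            labels.extend(list_entity_labels(children, parent=label))
    return labels


labels = list_entity_labels(entities)
for label, parent in labels:
    print(label, "<-", parent)
```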
#### Relations configuration data
Dictionary formatted as:
```
{
    RELATION_NAME:{
        "args":[ENTITIES_NAME,...]
    }
}
```
Each entry of the dictionary is a relation.
Each relation has a name and is defined by a sub-dictionary containing an args entry.
The args entry contains the list of entities concerned by the relation.
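Because the args entry refers to entity names, it can be useful to check a relations dictionary against the entities before writing it. The sketch below assumes a flat entities dictionary; the entity and relation names and the `unknown_relation_args` helper are hypothetical and not part of pandasToBrat.

```python
# Hypothetical configuration: entity and relation names are illustrative only.
entities = {"Drug": True, "Disease": True}
relations = {
    "Treats": {"args": ["Drug", "Disease"]},
    "CausedBy": {"args": ["Disease", "Gene"]},  # "Gene" is not a declared entity
}


def unknown_relation_args(relations, entity_names):
    """Map each relation name to the args that are not declared entities."""
    problems = {}
    for name, definition in relations.items():
        missing = [arg for arg in definition["args"] if arg not in entity_names]
        if missing:
            problems[name] = missing
    return problems


problems = unknown_relation_args(relations, set(entities))
print(problems)
```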
#### Read and write parameters
##### Getting parameters
You can read the current parameters using the dedicated method:
```
bratData.read_conf()
```
##### Writing parameters
You can write parameters using the dedicated method:
```
bratData.write_conf(entities = ENTITIES_CONFIGURATION, relations = RELATIONS_CONFIGURATION)
```
ENTITIES_CONFIGURATION is a dictionary formatted as described in the "Entities configuration data" chapter.
RELATIONS_CONFIGURATION is a dictionary formatted as described in the "Relations configuration data" chapter.
### Text
Text is stored in a pandas DataFrame with two columns:
- id: document id, which is contained in the .txt filename
- text_data: document data
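A minimal sketch of a DataFrame in this layout, built by hand with pandas. The document ids and texts are made up; only the `id` and `text_data` column names come from the description above.

```python
import pandas as pd

# Made-up documents; only the column names follow the layout described above.
texts = pd.DataFrame(
    {
        "id": [1, 2],
        "text_data": [
            "Aspirin relieves headache.",
            "Paris is the capital of France.",
        ],
    }
)

# Each column is a pandas Series with one value per document.
text_id_series = texts["id"]
text_series = texts["text_data"]
print(len(text_id_series) == len(text_series))
```

Two Series of the same size, one holding ids and one holding text, are also what the text writing method takes as input.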
#### Read and write text
##### Getting text data
```
bratData.read_text()
```
##### Sending text data
```
bratData.write_text(text_id=TEXT_ID_SERIES, text = TEXT_SERIES, empty = EMPTY_PARAMETER, overWriteAnnotations = OVERWRITE_ANNOTATIONS_PARAMETERS)
```
The required parameters are text_id and text. Both are pandas Series of the same size: the first contains the unique document ids and the second the document text data.
The empty parameter is used to empty the current folder. If set to True, the brat folder is emptied of all text and annotation data. The configuration is not erased.
The overWriteAnnotations parameter is used to overwrite the current annotation (.ann) file with an empty one. It is useful if you want to remove the existing annotations when you are modifying a text file.
This way, you can:
- Overwrite all data, with empty set to True
- Overwrite only new data, with empty set to False and overWriteAnnotations set to True: new files are written; if the id already exists it is overwritten, if not it is ignored.
### Annotations
Annotations are stored in a dictionary:
```
{
    "annotations":ANNOTATIONS_ANNOTATIONS,
    "relations":RELATIONS_ANNOTATIONS
}
```
#### Annotations format
Annotations are words labeled with entities.
They are formatted as a pandas DataFrame containing the following columns:
- id: document id; one document can have multiple annotations
- type_id: annotation number inside the document, from T1 to Tn, where n is the number of annotated strings; it is used to match annotations with relations
- word: the annotated string
- label: the entity related to the annotated string
- start: the start offset of the annotated string
- end: the end offset of the annotated string
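The sketch below builds a small annotations DataFrame with these columns by hand. The text, labels and ids are made up. It assumes the end offset is exclusive (the first position after the annotated string), which is the brat standoff convention; whether pandasToBrat reports offsets exactly this way is an assumption.

```python
import pandas as pd

text = "Aspirin relieves headache."

# Made-up annotations for one document; the labels are illustrative only.
rows = []
for number, (word, label) in enumerate(
    [("Aspirin", "Drug"), ("headache", "Disease")], start=1
):
    start = text.find(word)   # offset of the first character of the word
    end = start + len(word)   # assumed exclusive: first offset after the word
    rows.append(
        {"id": 1, "type_id": f"T{number}", "word": word,
         "label": label, "start": start, "end": end}
    )

annotations = pd.DataFrame(
    rows, columns=["id", "type_id", "word", "label", "start", "end"]
)

# With an exclusive end offset, slicing the text gives back the annotated string.
for row in annotations.itertuples():
    print(row.type_id, row.label, text[row.start:row.end])
```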
#### Relations format
Relations are links between annotations.
They are formatted as a pandas DataFrame containing the following columns:
- id: document id; one document can have multiple relations
- type_id: relation number inside the document, from R1 to Rn, where n is the number of relations
- relation: the relation name
- ArgX: the annotated entities linked by the relation; each column refers to one entity, and the entity id corresponds to the "type_id" column of the annotations DataFrame
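To show how the ArgX columns point back to the annotations, the sketch below resolves each argument to the word it refers to. All data is made up, and the `Arg1` and `Arg2` column names are an assumption about how the ArgX columns are named.

```python
import pandas as pd

# Made-up annotations and relations for one document.
annotations = pd.DataFrame(
    {
        "id": [1, 1],
        "type_id": ["T1", "T2"],
        "word": ["Aspirin", "headache"],
        "label": ["Drug", "Disease"],
    }
)
relations = pd.DataFrame(
    {
        "id": [1],
        "type_id": ["R1"],
        "relation": ["Treats"],
        "Arg1": ["T1"],  # assumed names for the ArgX columns
        "Arg2": ["T2"],
    }
)

# Look up table from (document id, annotation type_id) to the annotated word.
lookup = annotations.set_index(["id", "type_id"])["word"].to_dict()

# Add one column per argument holding the word the argument points to.
resolved = relations.copy()
for column in ["Arg1", "Arg2"]:
    resolved[column + "_word"] = [
        lookup[(doc, arg)] for doc, arg in zip(relations["id"], relations[column])
    ]

print(resolved[["relation", "Arg1_word", "Arg2_word"]])
```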
#### Read and write annotations
##### Getting annotations data
```
bratData.read_annotation()
```
#### Sending annotations data
##### Write the annotations subpart of annotations
```
bratData.write_annotations(df, text_id, word, label, start, end, overwrite=OVERWRITE_OPTION)
```
The first parameter is the DataFrame containing the annotations.
It should be formatted as described in the "Annotations format" subpart.
The text_id, word, label, start and end parameters are the names of the DataFrame columns that contain the corresponding data.
The overwrite option can be set to True to overwrite existing annotations; otherwise the DataFrame's data is added to the existing annotations data.
##### Write the relations subpart of annotations
```
bratData.write_relations(df, text_id, relation, overwrite=OVERWRITE_OPTION)
```
The first parameter is the DataFrame containing the relations.
It should be formatted as described in the "Relations format" subpart.
The text_id and relation parameters are the names of the DataFrame columns that contain the corresponding data.
The other columns should contain the type_id of the related entities, as output by the read_annotation method.
The overwrite option can be set to True to overwrite existing annotations; otherwise the DataFrame's data is added to the existing annotations data.
Platform: UNKNOWN