Corpus builder

The lcpcli tool ships with a helper Python class, Corpus, for preparing LCP corpora.

The tutorial uses the Corpus class to process SRT files and import a video corpus into LCP.

The various tests in the lcpcli repository give concrete examples of how to use the Corpus class.

The following repositories also use the Corpus class to convert existing data sets:

`Corpus`

You need to instantiate the Corpus class to create a new corpus.

Arguments:

name (str, mandatory) is the name of the corpus
document (str, optional, default "Document") is the name of the document-level layer of the corpus
segment (str, optional, default "Segment") is the name of the sentence-level layer of the corpus
token (str, optional, default "Token") is the name of the word-level layer of the corpus
authors (str, optional, default "placeholder") is the name(s) of the author(s) of the corpus
institution (str, optional, default "") is the name of the institution associated with the corpus
description (str, optional, default "") is a description of the corpus, as it will be presented to end users
date (str, optional, default "placeholder") is the date when the corpus was curated
revision (int | float, optional, default 1) is the revision number of the corpus
url (str, optional, default "placeholder") is the source URL of the corpus
license (str | None, optional, default None) is the code of the license of the corpus

The values of authors, institution, description, date, revision, url and license can be modified in LCP after import.

Example

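```python
from lcpcli.builder import Corpus

# Rename the three mandatory layers: documents are books, segments are sentences, tokens are words
c = Corpus("my great corpus", document="Book", segment="Sentence", token="Word")
```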
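Since only name is mandatory, it can help to see which metadata arguments would still hold their defaults before import. The sketch below copies the defaults listed above into a dictionary that can be unpacked into the Corpus constructor. corpus_kwargs and still_placeholder are hypothetical helpers, not part of lcpcli.

```python
# Defaults of the optional Corpus(...) arguments, as documented above.
# corpus_kwargs and still_placeholder are hypothetical helpers, not part of lcpcli.
CORPUS_DEFAULTS = {
    "document": "Document",
    "segment": "Segment",
    "token": "Token",
    "authors": "placeholder",
    "institution": "",
    "description": "",
    "date": "placeholder",
    "revision": 1,
    "url": "placeholder",
    "license": None,
}


def corpus_kwargs(**overrides):
    """Merge overrides into the documented defaults, rejecting misspelled argument names."""
    unknown = sorted(set(overrides) - set(CORPUS_DEFAULTS))
    if unknown:
        raise TypeError(f"unknown Corpus argument(s): {', '.join(unknown)}")
    return {**CORPUS_DEFAULTS, **overrides}


def still_placeholder(kwargs):
    """List the arguments that still hold the literal value "placeholder"."""
    return [key for key, value in kwargs.items() if value == "placeholder"]


kwargs = corpus_kwargs(document="Book", segment="Sentence", token="Word", authors="A. Author")
print(still_placeholder(kwargs))  # ['date', 'url']
# With lcpcli installed: Corpus("my great corpus", **kwargs)
```

Because these metadata values can be edited in LCP after import, a leftover placeholder is not fatal; listing them up front simply makes it easier to fill them in before import.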

Instance methods

An instance of the Corpus class exposes an open set of methods whose names must start with a capital letter. Each such call creates an entity in the corpus with the passed attributes and returns it as an instance of the Layer class.

Every corpus should contain at least one entity of each mandatory layer, created by calling the methods named after the values passed as document, segment and token when instantiating the Corpus class (Document, Segment and Token by default).
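To make the "open set of methods" concrete, here is a self-contained toy that mimics the documented behavior with Python's `__getattr__` hook. ToyCorpus and ToyLayer are hypothetical stand-ins, not lcpcli's implementation; they only record which layers were created so that the three-layer rule can be checked.

```python
class ToyLayer:
    """Hypothetical stand-in for Layer: a layer name plus the arguments it was created with."""

    def __init__(self, name, args, attributes):
        self.name = name
        self.args = list(args)
        self.attributes = dict(attributes)


class ToyCorpus:
    """Hypothetical stand-in mimicking the documented Corpus behavior (not lcpcli's code)."""

    def __init__(self, name, document="Document", segment="Segment", token="Token"):
        self.name = name
        self.mandatory = (document, segment, token)
        self.entities = []  # every entity created through a capitalized method

    def __getattr__(self, attr):
        # Python calls __getattr__ only when normal lookup fails, so any
        # undefined capitalized name becomes a factory for that layer.
        if not attr[:1].isupper():
            raise AttributeError(attr)

        def create(*args, **attributes):
            entity = ToyLayer(attr, args, attributes)
            self.entities.append(entity)
            return entity

        return create

    def missing_layers(self):
        """Mandatory layers for which no entity has been created yet."""
        created = {entity.name for entity in self.entities}
        return [layer for layer in self.mandatory if layer not in created]


c = ToyCorpus("my great corpus", token="Word")
doc = c.Document(c.Segment(c.Word("hello"), c.Word("world")))
print(doc.name, [child.name for child in doc.args])  # Document ['Segment']
print(c.missing_layers())  # []
```

In this toy, a lowercase name such as `c.document` raises AttributeError, which mirrors the rule that only capitalized names create entities.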

Example

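```python
from lcpcli.builder import Corpus

# Default layer names: Document, Segment and Token
c = Corpus("my great corpus")

# Each capitalized call creates an entity; nesting the calls puts
# the tokens inside the segment and the segment inside the document
c.Document(
    c.Segment(
        c.Token("hello"),
        c.Token("world")
    )
)

# Write the CSV files and the configuration JSON file
c.make("path/to/output/")
```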

`make`

Writes all the CSV files and the configuration JSON file of the corpus to the passed directory.

make is the only valid method whose name starts with a lowercase letter.
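After make has run, a quick standard-library check can confirm that the destination contains CSV files and exactly one JSON configuration file. check_output is a hypothetical helper; it assumes nothing about the individual file names beyond their .csv and .json extensions, and it assumes the destination held no other JSON files beforehand.

```python
import json
from pathlib import Path


def check_output(destination):
    """Return the CSV file names and the parsed configuration written to destination.

    Hypothetical helper: it relies only on the .csv/.json extensions,
    not on any particular file naming used by lcpcli.
    """
    out = Path(destination)
    csv_names = sorted(path.name for path in out.glob("*.csv"))
    json_paths = sorted(out.glob("*.json"))
    if not csv_names:
        raise ValueError(f"no CSV files found in {out}")
    if len(json_paths) != 1:
        raise ValueError(f"expected one JSON configuration file in {out}, found {len(json_paths)}")
    config = json.loads(json_paths[0].read_text(encoding="utf-8"))
    return csv_names, config


# Usage after c.make("path/to/output/"):
# csv_names, config = check_output("path/to/output/")
```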

Arguments:

destination (str, mandatory) is the path of the directory where the output files are written
is_global (dict, optional, default {}) maps layers to the names of attributes whose possible values are defined globally, such as upos on tokens
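upos is the standard example of an attribute with globally defined values: Universal Dependencies fixes its tag set at 17 tags. Before declaring such an attribute global, it can be worth checking that every token value belongs to that set. The sketch below is a hypothetical pre-flight check; the shape shown for is_global (a layer name mapped to a list of attribute names) is an assumption that this page does not confirm.

```python
# The 17 universal part-of-speech tags of Universal Dependencies v2.
UD_UPOS = frozenset({
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
})


def unknown_values(values, allowed=UD_UPOS):
    """Return values outside the allowed set, once each, in first-seen order."""
    unknown = []
    for value in values:
        if value not in allowed and value not in unknown:
            unknown.append(value)
    return unknown


token_upos = ["PRON", "VERB", "NN", "PUNCT", "NN"]
print(unknown_values(token_upos))  # ['NN']

# Assumed shape of the argument: layer name -> list of attribute names.
# c.make("path/to/output/", is_global={"Token": ["upos"]})
```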
