The DQD Query Language
For a list of keywords used in DQD, see DQD keywords
Introduction
LCP's query language DQD (Descriptive Query Definition) lets you look for matches based on a set of constraints and output the results in various formats.

The picture above illustrates a simple query looking for co-occurrences of "cat" and "dog" within the same sentence (the Constraints part of the query) and asking to output them as a plain list of matches (the Results part of the query).
Getting started
The picture below relates a simple query on the BNC corpus (on the left, "look for all occurrences of dogs") to its structure (on the right).

Corpus-specific queries
The first thing to note about DQD is that it adapts to the structure of each corpus. The terms Segment, Token and form in the query all come from the corpus diagram; another corpus could use Word instead of Token, in which case the DQD query would use that term instead.
The same is true of all entities and attributes in the corpus: in the BNC corpus, the part-of-speech of each token was labeled according to two different conventions, hence each token comes with two attributes named xpos1 and xpos2. Queries can define constraints on either attribute (for example, xpos2 = "VERB").
For this reason, it is important to know the structure of a corpus when writing a DQD query, which is why LCP displays a diagram along with the query editor.
Entities
Entities are instantiated by providing the name of their annotation layer, followed by a (unique) label, which can be used to reference the entity later on. The simple line Segment s instantiates an entity labeled s on the annotation layer Segment.
The query above declares a second entity, labeled t, on the annotation layer named Token. The operator @ requires that it overlap character-wise with another entity, in this case the segment labeled s. As visible in the diagram, each token is fully contained in a segment, so overlapping here means being part of a segment.
The @ operator is important: without it, you would be looking for any possible combination of a segment and a token "dogs" in the corpus, because the query would define no relation constraint between the segment and the token. So even if "dogs" appeared only once in the entire corpus, a corpus of 1000 segments would yield 1000 matches, one for each pairing of that token with a segment. If "dogs" appeared twice, you would get 2000 matches, and so on. The number of matches grows multiplicatively and can lead to significant querying times.
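The arithmetic above can be reproduced with a small Python model. This is only an illustrative sketch, not how LCP evaluates queries: the Span class, the overlaps helper, the character offsets and the toy corpus are all hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Span:
    """A hypothetical entity covering the characters [start, end)."""
    start: int
    end: int
    form: str = ""


def overlaps(a: Span, b: Span) -> bool:
    """True when the two spans share at least one character position."""
    return a.start < b.end and b.start < a.end


# Toy corpus: 1000 segments of 10 characters each (made-up offsets).
segments = [Span(i * 10, i * 10 + 10) for i in range(1000)]
# A single token "dogs", located inside the segment covering [420, 430).
tokens = [Span(423, 427, "dogs")]

dogs = [t for t in tokens if t.form == "dogs"]

# No relation constraint: every (segment, token) pair counts as a match.
unconstrained = list(product(segments, dogs))

# Overlap constraint (the role played by @): only the containing segment remains.
constrained = [(s, t) for s, t in product(segments, dogs) if overlaps(s, t)]

print(len(unconstrained))  # 1000
print(len(constrained))    # 1
```

Adding a second "dogs" token to the toy list doubles the unconstrained count to 2000, while the constrained count only grows to 2.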
Constraints
Simple constraints usually use the format left operator right. When the constraint appears in the scope of an entity (as signaled by indentation), left and right can be the name of an attribute of that entity.
The constraint in the example above follows this pattern: left is form, operator is =, and right is "dogs". LCP allows corpus curators to define arbitrary attributes for each layer, but form is mandatory at the token level. This constraint states that we are looking for tokens whose surface form is "dogs".
Annotation layers can come with any number of attributes. For example, the Document layer in this corpus has four attributes named date, title, keyWords and classCode. It is standard for tokens to also define an attribute named lemma. Had we written the constraint lemma = "dog" instead, we would have matched token occurrences whose surface form could be either "dog" or "dogs".
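The difference between constraining form and constraining lemma can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the token list and the matches helper are hypothetical and only mimic the "left operator right" pattern with the = operator; they are not part of LCP.

```python
# Hypothetical tokens, each carrying the two attributes discussed above.
toy_tokens = [
    {"form": "dog", "lemma": "dog"},
    {"form": "dogs", "lemma": "dog"},
    {"form": "cats", "lemma": "cat"},
]


def matches(token, attribute, value):
    """Mimic a simple `attribute = "value"` constraint on a single token."""
    return token[attribute] == value


# form = "dogs" keeps only tokens with exactly that surface form.
by_form = [t["form"] for t in toy_tokens if matches(t, "form", "dogs")]

# lemma = "dog" keeps every inflected form that shares the lemma.
by_lemma = [t["form"] for t in toy_tokens if matches(t, "lemma", "dog")]

print(by_form)   # ['dogs']
print(by_lemma)  # ['dog', 'dogs']
```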
form and lemma are assumed to be strings. Accordingly, we surround the test value with double quotes; DQD does not accept strings surrounded by single quotes. You can also use forward slashes / to define unanchored regular expressions (use ^ and $ for anchoring purposes).
Sequences
To look for a sequence of tokens rather than isolated tokens, one can use the keyword sequence.
Sets
By default, each entity produces one match for each occurrence in the corpus. For example, if one sentence contains three occurrences of "dogs", the results will report three distinct hits for that sentence. To prevent that behavior, one can use the keyword set.
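The contrast between one hit per occurrence and one grouped hit per sentence can be pictured as follows. This Python sketch only models the idea of grouping: the hit tuples and sentence identifiers are made up, and the actual shape of the results returned by LCP is not specified here.

```python
from collections import defaultdict

# Hypothetical hits: (sentence id, token position) pairs for the form "dogs".
hits = [("s1", 2), ("s1", 7), ("s1", 11), ("s2", 4)]

# Default behaviour described above: one match per occurrence.
per_occurrence = [(sentence, [pos]) for sentence, pos in hits]

# Set-like behaviour: occurrences in the same sentence form a single match.
grouped = defaultdict(list)
for sentence, pos in hits:
    grouped[sentence].append(pos)
per_sentence = list(grouped.items())

print(len(per_occurrence))  # 4
print(per_sentence)         # [('s1', [2, 7, 11]), ('s2', [4])]
```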
Results
DQD queries come with a constraints part and a results part. There are three types of results: