Writing Glop Queries

Glop is a powerful query engine based on the Prolog programming language. It is intended primarily for users who already know Prolog. If you do not, you will probably find it easier to use Genie instead.

Using Glop

Each Glop query is essentially a Prolog program. Because all of Prolog's functionality is available, Glop provides very powerful query capabilities for the database of experimental results. However, this flexibility also imposes a restriction. Since users could use Prolog to change the database, we cannot make an executable version of Glop available on our main server. Instead, you must download the necessary source files and data so you can run Glop on your own computer. See the Glop instructions for downloading, installing, and using Glop.

How the Data is Stored

Glop uses the same data as Genie, but in a different format. If you have not familiarized yourself with the records, sets, and lists used by this database, please refer to Understanding the Schema. To look up the exact names of tables and fields currently in the database, see the current schema.

Selecting a Tuple from a Table

Each table is stored as a single predicate, with the set of tuples as its only argument. For example, the binding assay table could be abbreviated

binding_assay({[Tuple1|RemainingTuples]}).

The table name is written as a functor, and the set of tuples is stored as a Prolog list inside curly braces ("{ }") commonly used to denote sets. Each tuple in the table has the form

(Reference, Item_No, Description, Corrections, Contributor, Assay)

Query 1: To examine each tuple in the binding assay table, we could use the predicates member and findall like this:

:-
	binding_assay({BindingAssaySet}),
	findall(
		BindingAssayTuple,
		member(BindingAssayTuple, BindingAssaySet),
		Results
	),
	report(binding_assay({Results})).

Inside findall, the member goal unifies the variable BindingAssayTuple with one of the tuples in BindingAssaySet. After collecting that tuple, findall backtracks into member to obtain the next tuple, and continues until every tuple in the table has been examined.
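To experiment with this pattern without the Glop data files, a minimal sketch can supply a mock table and a stand-in for report. Everything in the mock table is an assumption made for illustration: the field wrappers (reference, item_no, and so on) and all values are invented, and show_table is a hypothetical helper, not Glop's report predicate.

```prolog
:- use_module(library(lists)).   % member/2

% Hypothetical two-tuple table with six fields per tuple.
% Field wrappers and values are invented for illustration only.
binding_assay({[
    (reference(ref1), item_no(1), description('first mock entry'),
     corrections(none), contributor(someone), assay(mock_a)),
    (reference(ref2), item_no(1), description('second mock entry'),
     corrections(none), contributor(someone), assay(mock_b))
]}).

% Same shape as Query 1, but returning the results instead of reporting them.
all_binding_assays(Results) :-
    binding_assay({BindingAssaySet}),
    findall(Tuple, member(Tuple, BindingAssaySet), Results).

% Hypothetical stand-in for report/1: prints one tuple per line.
show_table(Table) :-
    Table =.. [Name, {Tuples}],
    show_tuples(Name, Tuples).

show_tuples(_, []).
show_tuples(Name, [T | Ts]) :-
    write(Name), write(': '), writeq(T), nl,
    show_tuples(Name, Ts).
```

Calling all_binding_assays(R), show_table(binding_assay({R})) would then print both mock tuples.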

Identifying Fields

Glop identifies fields by both their name and their location within the tuple, as specified in the current schema. For example, Query 2 finds all of the articles published in 1996:

/* 1 */    :-	reference_info({RefSet}),
/* 2 */    	findall(
/* 3 */    		Reference,
/* 4 */    		(
/* 5 */    		    member(Reference, RefSet), 
/* 6 */    		    Reference = (_, _, _, _, _, year(1996), _, _, _)
/* 7 */    		),
/* 8 */    		Results
/* 9 */    	),
/* 10 */    	report(reference_info({Results})).

In line 6, the field name, year, is used as the functor of a structure containing the desired data value. Because year is the sixth of nine fields shown in the schema for the reference_info table, we make the year structure the sixth of nine arguments of an anonymous structure representing a reference_info tuple.
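The positional matching can be tried on a small mock table. The sketch below assumes only what the text states (nine fields, with year in sixth position); the key wrapper and the x placeholders are invented, and published_in is a hypothetical helper name.

```prolog
:- use_module(library(lists)).   % member/2

% Mock reference_info table: nine fields, year(...) sixth.
% key(...) and the x placeholders are not real schema fields.
reference_info({[
    (key(a), x, x, x, x, year(1995), x, x, x),
    (key(b), x, x, x, x, year(1996), x, x, x),
    (key(c), x, x, x, x, year(1996), x, x, x)
]}).

% Collect every tuple whose sixth field is year(Year).
published_in(Year, Results) :-
    reference_info({RefSet}),
    findall(Reference,
            ( member(Reference, RefSet),
              Reference = (_, _, _, _, _, year(Year), _, _, _)
            ),
            Results).
```

Because the pattern names the field and fixes its position, a tuple with year(1996) in any other position would not match.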

Specifying Data Values

Like Genie, Glop requires any string that contains spaces, carriage returns, or punctuation to be enclosed in single quotes, 'like this'. Any string that begins with a capital letter must also be enclosed in single quotes, since Prolog treats each word starting with a capital letter as a variable. Glop is case sensitive, so 'A BIG Word' is not the same as 'a big word'. Numbers should not be in quotes.
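These quoting rules follow from ordinary Prolog syntax, so they can be checked with standard built-in predicates. The sketch below uses only standard Prolog; quoting_demo is a hypothetical name.

```prolog
% Each goal below succeeds, illustrating one quoting rule.
quoting_demo :-
    atom('A BIG Word'),             % quoted: contains spaces and capitals
    'A BIG Word' \== 'a big word',  % atoms are case sensitive
    humhbb == 'humhbb',             % quotes are optional for a plain lowercase word
    var(Human),                     % an unquoted capitalised word is a variable
    Human = 'Human',                % the quoted form is an ordinary atom
    atom(Human),
    number(1996),                   % numbers are written without quotes
    '1996' \== 1996.                % a quoted number is an atom, not a number
```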

Using Records

A record is simply a collection of subfields which are grouped together because they have some logical association with each other. Many records are defined as types in the Type Definition Macros section of the current schema. For example, each region is defined as a REGION_DESCRIPTOR, which is defined in the schema as follows:

TYPEDEF REGION_DESCRIPTOR: RECORD (
	     origin: STRING  #FOREIGN_KEY origin_info
	     start:  INTEGER
	     stop:   INTEGER
	)  #RESTRICT stop >= start

Recall that anything following a '#' is a comment and is ignored by the computer. If we want to use the variables St and Sp to retrieve the start and stop positions of a region whose origin is 'humhbb' in a conserved tuple, our query would include the line:

Tuple = (_, _, _, _, _, region(origin('humhbb'), start(St), stop(Sp)), _),

Notice that region, the name of the field in the conserved table, becomes the functor of a structure representing the record. The type of the record, REGION_DESCRIPTOR, is not mentioned in the query.
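A small sketch of record matching, assuming a mock conserved table with seven fields and the region record in sixth position, as in the pattern above. The x placeholders, the second origin, and all numbers are invented; region_bounds is a hypothetical helper.

```prolog
:- use_module(library(lists)).   % member/2

% Mock conserved table: region record is the sixth of seven fields.
conserved({[
    (x, x, x, x, x, region(origin('humhbb'), start(100), stop(250)), x),
    (x, x, x, x, x, region(origin(other),    start(5),   stop(9)),   x)
]}).

% Enumerate start/stop positions of regions with the given origin.
region_bounds(Origin, St, Sp) :-
    conserved({Set}),
    member(Tuple, Set),
    Tuple = (_, _, _, _, _, region(origin(Origin), start(St), stop(Sp)), _).
```

For instance, region_bounds('humhbb', St, Sp) would bind St and Sp from the first mock tuple only.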

Using Sets and Lists

In Glop, a set is represented as a Prolog list enclosed in curly braces, and a list is represented as a Prolog list enclosed in parentheses:

An example of a set: {[ASetElement | RemainingSetElements]}
An example of a list: ([ListItem1 | RemainingListItems])

To access the elements in the set (or list), instantiate a variable to the contents of the curly braces (or parentheses), and then use the variable anywhere that we could use a Prolog list.
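In plain Prolog terms, {X} is the structure '{}'(X), while parentheses around a list add no structure at all. A minimal sketch of unwrapping both forms, using hypothetical predicate names:

```prolog
% Unwrap a Glop set: {Elements} is the term '{}'(Elements).
set_elements({Elements}, Elements).

% Count the elements of a sample set.
demo_set(N) :-
    Set = {[a, b, c]},
    set_elements(Set, Elements),
    length(Elements, N).

% A parenthesised list is just the list itself.
demo_list(First) :-
    List = ([x, y, z]),
    List = [First | _].
```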

Query 3: For example, suppose we want to find all DNA transfer experiments with a reversed orientation on any construct segment. We could write the query like this:

/* 1 */    :- 
/* 2 */    	dna_transfer_experiment({DNASet}),
/* 3 */    	findall(
/* 4 */    		DTExperiment,
/* 5 */    		(
/* 6 */    		    member(DTExperiment, DNASet),
/* 7 */    		    DTExperiment = (_, _, _, _, _, construct({ConstructSet}), _),
/* 8 */    		    member((construct_segment((CSegmentList)), _), ConstructSet),
/* 9 */    		    member(Segment, CSegmentList),
/* 10 */    		    Segment = (segment(_, orientation('reversed'), _))
/* 11 */    		),
/* 12 */    		Results
/* 13 */    	),
/* 14 */    	report(dna_transfer_experiment({Results})).  

The line numbers are provided for reference purposes only, and would not need to be included in the actual query. As usual, the predicates findall and member do most of the work in this query. Lines 2 and 6 work together to select a tuple from the set of tuples in the dna_transfer_experiment table. Line 7 retrieves the set of constructs. Each construct in the ConstructSet is a record with two fields, construct_segment and parent. We use the structure representing this record as the first argument of the member predicate in line 8 to instantiate CSegmentList to the list of construct segments. Line 9 selects one segment from this list, and line 10 discards the segment if its orientation is not 'reversed'. Finally, we use the report predicate to display the results.

Using Variants

When referring to a variant in Glop, we use a plus symbol and the variant tag as a functor to a structure containing the fields of the record.

Query 4 finds all binding assay tuples with gelshift assays where the probe overlaps the region corresponding to 8700-8800 in the human sequence:

/* 1 */       :-
/* 2 */    	binding_assay({BindingAssaySet}),
/* 3 */    	findall(
/* 4 */    		BindingAssay,
/* 5 */    		(
/* 6 */    		    member(BindingAssay,BindingAssaySet),
/* 7 */    		    BindingAssay=(_,_,_,_,_,assay(+Assay)),
/* 8 */    		    Assay = gelshift(probe(+beta_g_region(Region, _)), _, _, _, _, _, _),
/* 9 */    		    overlaps(Region, region(origin(human), start(8700), stop(8800)))
/* 10 */    		),
/* 11 */    		BindingAssayList
/* 12 */    	),
/* 13 */    	report(binding_assay({BindingAssayList})).

This query involves two variant records, assay in lines 7 and 8, and probe in line 8. Notice that, in line 7, the variable Assay is instantiated to the tag name and variant record (gelshift(probe(+beta_g_region(Region, _)), _, _, _, _, _, _)), without the plus sign, so the plus sign is not needed in line 8. The predicates member, overlaps, and report are described below.
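To see how the plus sign and the tag behave as ordinary Prolog terms, the following sketch extracts the variant tag from mock tuples. The gelshift structure has seven arguments as in Query 4; other_assay, p1, and the x placeholders are invented, and assay_tag and tags are hypothetical helpers.

```prolog
:- use_module(library(lists)).   % member/2

% Mock tuples with six fields; the assay field holds +Tag(...).
tagged_assays([
    (x, x, x, x, x, assay(+gelshift(probe(p1), x, x, x, x, x, x))),
    (x, x, x, x, x, assay(+other_assay(x)))
]).

% Strip the plus sign, then read the tag as the functor name.
assay_tag(Tuple, Tag) :-
    Tuple = (_, _, _, _, _, assay(+Assay)),
    functor(Assay, Tag, _).

tags(Tags) :-
    tagged_assays(Tuples),
    findall(Tag, (member(T, Tuples), assay_tag(T, Tag)), Tags).
```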

Useful General Predicates

When writing Glop queries, you can use any Prolog predicates, Glop predicates, or predicates you have written. This section describes the predicates that are used frequently in our sample queries.

member

The member(Element, List) predicate succeeds if the Element is in the List. The member predicate is defined in the SICStus Prolog "lists" library. If you are using a different version of Prolog, defining your own member predicate should be straightforward.
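If your Prolog lacks a lists library, the usual two-clause definition is enough. It is named my_member here (a hypothetical name) so that it cannot clash with a library definition.

```prolog
% Element is the head of the list ...
my_member(Element, [Element | _]).
% ... or a member of the tail.
my_member(Element, [_ | Rest]) :-
    my_member(Element, Rest).
```

On backtracking this definition enumerates the elements in list order, which is the behaviour the sample queries rely on.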

findall

The predicate findall(Object, Goal, List) creates a List of all Objects that satisfy the specified Goal. If Object is a variable, it must be used in the Goal. If Object is a structure that contains variables, then at least one of those variables must be used in the Goal. The findall predicate is a standard part of Prolog.

The findall predicate is often used to examine each tuple in a table, ignoring tuples that do not meet certain criteria and adding tuples that do satisfy the criteria to the result set. When findall is used this way, Object is a variable representing a single tuple, Goal expresses the criteria, and List is a variable representing the set of results.

To select one tuple from the set of possibilities, Goal starts with the member predicate to instantiate Object to a specific tuple. The criteria that test the tuple follow member, as shown.

	binding_assay({BindingAssaySet}),
	findall(
		Object,
		(
		    member(Object, BindingAssaySet),
		    true    % replace true with your criteria
		),
		List
	),
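A concrete instance of this skeleton, including the case where Object is a structure containing variables, might look like the sketch below. The mock data and the predicate names are invented for illustration.

```prolog
:- use_module(library(lists)).   % member/2

% Mock data: (Reference, year(Year)) pairs.
years([(r1, year(1995)), (r2, year(1996)), (r3, year(1996))]).

% Object is a variable: collect references from a given year.
refs_from(Year, Refs) :-
    years(Set),
    findall(Ref, member((Ref, year(Year)), Set), Refs).

% Object is a structure (Ref-Y) whose variables appear in the Goal.
ref_year_pairs(Pairs) :-
    years(Set),
    findall(Ref-Y, member((Ref, year(Y)), Set), Pairs).
```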

Useful Glop Predicates

The following predicates allow us to examine the data that is specific to this database. These predicates are implemented as part of Glop.

overlaps

The predicate overlaps(TestRegion, SpecifiedRegion) succeeds when the two regions have the same origin and the offsets overlap. The overlaps predicate works the same way as the OVERLAPS function in Genie.

	overlaps(Region, region(origin(human),start(8658),stop(8677)))
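Glop's own implementation is not shown here. As a rough sketch of the stated behaviour, and assuming that start and stop are inclusive integer offsets, an overlap test could be written as follows. The name overlaps_sketch is hypothetical, chosen so that it does not shadow the real predicate.

```prolog
% Assumed semantics: same origin, and the closed intervals
% [S1, E1] and [S2, E2] share at least one position.
overlaps_sketch(region(origin(O), start(S1), stop(E1)),
                region(origin(O), start(S2), stop(E2))) :-
    S1 =< E2,
    S2 =< E1.
```

Using the same variable O in both arguments is what enforces the "same origin" condition.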

contains_region

The predicate contains_region(ContainerRegion, ContainedRegion) succeeds if the ContainerRegion completely contains the ContainedRegion. It is similar to the COVERS function in Genie.

	contains_region(region(origin(human),start(7750),stop(9230)),Region)
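Under the same assumptions (inclusive integer offsets, and regions compared only when they share an origin, which the text does not state explicitly for this predicate), containment could be sketched like this. The name contains_region_sketch is hypothetical and this is not Glop's actual code.

```prolog
% Assumed semantics: same origin, and the contained interval [S, E]
% lies entirely within the container interval [CS, CE].
contains_region_sketch(region(origin(O), start(CS), stop(CE)),
                       region(origin(O), start(S),  stop(E))) :-
    CS =< S,
    E  =< CE.
```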

report

The predicate report(Table) displays the results of your query in the same format that Genie uses. If you would like to use any other programs on the Globin Gene Server to process your results, they need to be in this format. If you just want to see the results yourself, you can use another predicate to display the results.

When we create a query using findall, we get a set of tuples as an intermediate result. To convert this set to a table format, we create a structure whose functor is the table name and whose only argument is the variable containing the set of tuples (BindingAssayList) enclosed in curly braces representing a Glop set, as shown in the last line of Query 4:

/* 1 */       :-
/* 2 */    	binding_assay({BindingAssaySet}),
/* 3 */    	findall(
/* 4 */    		BindingAssay,
/* 5 */    		(
/* 6 */    		    member(BindingAssay,BindingAssaySet),
/* 7 */    		    BindingAssay=(_,_,_,_,_,assay(+Assay)),
/* 8 */    		    Assay = gelshift(probe(+beta_g_region(Region, _)), _, _, _, _, _, _),
/* 9 */    		    overlaps(Region, region(origin(human), start(8700), stop(8800)))
/* 10 */    		),
/* 11 */    		BindingAssayList
/* 12 */    	),
/* 13 */    	report(binding_assay({BindingAssayList})).
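If you only want to inspect the results yourself, a simple display predicate can replace report. The sketch below is one possibility; print_tuples and count_and_print are hypothetical helpers, not part of Glop, and their output is not in Genie's format.

```prolog
% Print each tuple on its own line, quoting atoms where needed.
print_tuples([]).
print_tuples([Tuple | Rest]) :-
    writeq(Tuple), nl,
    print_tuples(Rest).

% Print the tuples and also return how many there were.
count_and_print(Tuples, Count) :-
    length(Tuples, Count),
    print_tuples(Tuples).
```

In Query 4, the final goal could then be written as count_and_print(BindingAssayList, N) instead of the call to report.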

Additional Hints

Outline for Writing One-Table Queries

To write a Glop query involving one table, you can follow the outline below. To illustrate, each step is followed by the numbers of the lines that accomplish that step in Query 5.

  • set up the criteria (lines 1-14),
  • specify the query (lines 16-26),
  • print the results (line 27).

    Adding Predicates to Express Criteria

    When writing complex queries, it is often helpful to write predicates expressing the criteria. The following query adds on to Query 4.

    Query 5: Retrieve all binding assay tuples where the probe (for gelshifts) or one of the regional effects (for non-gelshifts) overlaps the region corresponding to 8700-8800 in the human sequence.

    With this query, we check the probe or regional effect, depending on the assay. Assay is a variant record, so different attributes can be stored for each type of assay. Notice that regional_effect is the third attribute of in_vivo_footprint assays, but the seventh attribute of in_vitro_footprint and methylation_interference assays.

    /* 1 */      q(gelshift(probe(+beta_g_region(Region,_)),_,_,_,_,_,_)) :-
    /* 2 */    	overlaps(Region,region(origin(human),start(8700),stop(8800))).
    /* 3 */      q(in_vitro_footprint(_,_,_,_,_,_,regional_effect({RegionalEffectSet}),_)) :-
    /* 4 */    	member((Region,_,_,_,_),RegionalEffectSet),
    /* 5 */    	overlaps(Region,region(origin(human),start(8700),stop(8800))),
    /* 6 */    	!.
    /* 7 */      q(methylation_interference(_,_,_,_,_,_,regional_effect({RegionalEffectSet}),_)) :-
    /* 8 */    	member((Region,_,_,_,_),RegionalEffectSet),
    /* 9 */    	overlaps(Region,region(origin(human),start(8700),stop(8800))),
    /* 10 */    	!.
    /* 11 */      q(in_vivo_footprint(_,_,regional_effect({RegionalEffectSet}))) :-
    /* 12 */    	member((Region,_,_,_,_),RegionalEffectSet),
    /* 13 */    	overlaps(Region,region(origin(human),start(8700),stop(8800))),
    /* 14 */    	!.
    /* 15 */    
    /* 16 */      :-
    /* 17 */    	binding_assay({BindingAssaySet}),
    /* 18 */    	findall(
    /* 19 */    		BindingAssay,
    /* 20 */    		(
    /* 21 */    			member(BindingAssay,BindingAssaySet),
    /* 22 */    			BindingAssay=(_,_,_,_,_,assay(+Assay)),
    /* 23 */    			q(Assay)
    /* 24 */    		),
    /* 25 */    		BindingAssayList
    /* 26 */    	),
    /* 27 */    	report(binding_assay({BindingAssayList})).
    
    More Hints

    Another Example: Query 6

    Retrieve all DNA transfer experiment tuples having in one of their constructs a segment with a beta-globin region that overlaps the AP1 region (corresponding to 8658-8677 in the human sequence) and is completely contained in the HS2 region (corresponding to 7750-9230 in the human sequence), but no other segments overlap the LCR (corresponding to anything < 15,000 in the human sequence). In addition, one of the segments in this construct must feature a particular gene (e.g. beta-globin) as a reporter.

    This query follows the same outline as Query 5. It uses four predicates to express the criteria:

    q
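    succeeds if the ConstructSet contains a ConstructSegmentList for which q1 succeeds, q2 has exactly one solution, and q3 succeeds. Because the segment matched by q1 lies inside the LCR, q2 always finds that segment; requiring exactly one solution ensures that no other segment overlaps the LCR.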
    q1
    succeeds if the ConstructSegmentList contains a segment whose beta-globin region overlaps the AP1 region and is completely contained in the HS2 region
    q2
    succeeds if the ConstructSegmentList contains a segment whose beta-globin region overlaps the LCR
    q3
    succeeds if the ConstructSegmentList features a beta-globin reporter.

    q(ConstructSet) :-
    	member((construct_segment(ConstructSegmentList),_),ConstructSet),
    	q1(ConstructSegmentList),
    	findall(_,q2(ConstructSegmentList),AnswerList), length(AnswerList,1),
    	q3(ConstructSegmentList), !.

    q1(ConstructSegmentList) :-
    	member(segment(dna_fragment(+beta_g_region(Region,_)),_,_),ConstructSegmentList),
    	overlaps(Region,region(origin(human),start(8658),stop(8677))),
    	contains_region(region(origin(human),start(7750),stop(9230)),Region), !.

    q2(ConstructSegmentList) :-
    	member(segment(dna_fragment(+beta_g_region(Region,_)),_,_),ConstructSegmentList),
    	overlaps(Region,region(origin(human),start(-999999),stop(15000))).

    q3(ConstructSegmentList) :-
    	member(segment(_,_,feature({FeatureSet})),ConstructSegmentList),
    	member(feature_element(+reporter(gene(beta_globin))),FeatureSet), !.

    :-
    	dna_transfer_experiment({DTESet}),
    	findall(
    		DTE,
    		(
    			member(DTE,DTESet),
    			DTE=(_,_,_,_,_,construct({ConstructSet}),_),
    			q(ConstructSet)
    		),
    		DTEList
    	),
    	report(dna_transfer_experiment({DTEList})).
    
    


    More Hints

    Creating a Custom Format for Your Results

    Instead of using the report predicate to display results as Genie would, you can define your own predicates to provide the information you want in an easy-to-read format. Query 7 defines five predicates to display a summary of the data.

    Counting

    Counting is not directly supported by Glop, but you can use Prolog to count tuples or data values. Query 7 uses predicates from SICStus Prolog's lists library to accomplish its counting tasks.
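One common way to count in Prolog is to collect the matching tuples with findall and take the length of the result. The sketch below illustrates that idea on a mock table; it is not taken from Query 7, and the table, its contents, and count_for are invented for illustration.

```prolog
:- use_module(library(lists)).   % member/2

% Mock table: the first field stands in for the reference.
mock_table({[(ref_a, x), (ref_b, x), (ref_a, x)]}).

% Count the tuples whose first field is Ref.
count_for(Ref, Count) :-
    mock_table({Set}),
    findall(T, (member(T, Set), T = (Ref, _)), Matches),
    length(Matches, Count).
```

A reference that appears in no tuple yields a count of 0 rather than failure, because findall returns an empty list when the goal has no solutions.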


    Query 7

    Query Overview

    For each reference, count the number of tuples and
    (a) show number of DNA transfer experiments and constructs; and
    (b) show number of binding assays and assay types.

    This query involves the following steps: