Skip to content

FitLayout/fitlayout-python-client

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

FitLayout - Python Client

(c) 2025 Radek Burget ([email protected])

The FitLayout client connects to the FitLayout REST API server and allows obtaining artifact data, performing SPARQL queries on the artifact repository, modifying the repository content, or running remote FitLayout services. Its primary purpose is the implementation of algorithms for analyzing web pages in Python, where FitLayout serves as a source of data about rendered web pages, including details about the appearance and layout of individual content elements.

Usage

FitLayout uses the RDF data model for representing the pages and other artifacts. It defines a set of ontologies for describing the individual artifacts. The Python client is based on RDFLib that provides the RDF API for Python. The main FitLayoutClient class provides several functions for obtaining the artifact data.

A simple usage of the API:

from flclient import FitLayoutClient, BOX

fl = FitLayoutClient("http://localhost:8400/api", "repositoryId")

# Get the IRIs of all rendered page in the repository
pageIris = fl.artifacts(BOX.Page)

# Get the first page IRI (as an example)
pageIri = next(pageIris)

# Get the RDF graph of the first page (a rdflib Graph object)
pageGraph = fl.get_artifact(pageIri)

# Print all properties of the page itself (excluding "pngImage" which is too large)
# This uses the RDFLib API for filtering the RDF triples.
for s, p, o in pageGraph.triples((pageIri, None, None)):
    prop = p.fragment # Omit the namespace from the property IRI
    if prop == "pngImage":
        continue
    print(f"{prop}: {o}")

# Get all content boxes from the page
for s, p, o in pageGraph.triples((None, BOX.belongsTo, pageIri)):
    boxIri = s
    # Get the box text (the box:text property)
    # See the ontology documentation for other box properties
    for bs, bp, bo in pageGraph.triples((boxIri, BOX.text, None)):
        print(f"Box: {boxIri}, text: {bo}")

The most efficient way to retrieve data for further analysis (e.g., machine learning) is to use SPARQL queries over the artifact repository. The following example uses one SPARQL query to retrieve all AreaTree artifacts and another SPARQL query to retrieve information about all distinguishable visual areas contained in a given area tree:

from flclient import FitLayoutClient, default_prefix_string

fl = FitLayoutClient("http://localhost:8400/api", "repositoryId")

# Find all AreaTree IRIs in the repository.
# It returns a CSV with a single column 'areaTreeIri'.
listQuery = default_prefix_string() + """
    SELECT ?areaTreeIri
    WHERE {
        ?areaTreeIri rdf:type segm:AreaTree
    }
"""

# Execute the SPARQL query.
area_tree_rows = fl.sparql(listQuery)

# Get the first AreaTree IRI (as an example)
area_tree_iri = next(area_tree_rows)['areaTreeIri']
print("Area tree IRI:", area_tree_iri)

# A SPARQL query for finding area properties within the specified area tree
# For each area, retrieve its background color, text color, font size, position, dimensions, and text (if any)
# See the FitLayout ontology documentation for more details on the properties and relationships used.
areaQuery = default_prefix_string() + """
    SELECT (?c AS ?uri) ?backgroundColor ?color ?fontSize ?x ?y ?w ?h ?text
    WHERE {
        ?c rdf:type segm:Area .
        ?c segm:belongsTo <""" + str(area_tree_iri) + """> .
        ?c segm:containsBox ?box
        OPTIONAL { ?c box:backgroundColor ?backgroundColor } .
        OPTIONAL { ?box box:color ?color } .
        ?c box:fontSize ?fontSize .
        OPTIONAL { ?c segm:text ?text } .
        ?c box:bounds ?b . 

        ?b box:positionX ?x .
        ?b box:positionY ?y .
        ?b box:width ?w .
        ?b box:height ?h .
    }
"""

# Execute the SPARQL query
results = fl.sparql(areaQuery)

# Process the returned results.
for row in results:
    print(row)

Server setup

The Python client requires a FitLayout server to be running and accessible. The easiest way to set up a server is to use the fitlayout-local docker image, which provides both the server and a web GUI for adding and post-processing new web pages to the repository.

About

Python client library for the FitLayout REST API

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published