Package: backend.tweets

Submodules

datasets

Python Module to handle data loading

class backend.tweets.datasets.Dataset(name: str, data_format: str, filename: str)

Bases: object

(Data)class encapsulating the information for a single Dataset. Dataset instances are saved into a global DATA_DICTIONARY (see below), which acts as a proxy for accessing the available datasets.

This mechanism acts as a surrogate for a database and is therefore potentially subject to change in the future.

name

Chosen dataset unique name

Type

str

data_format

Format of the data (e.g. CSV, GeoJSON)

Type

str

filename

Name of the datafile (including file extension)

Type

str

property data

Reads and returns a pd.DataFrame of the data.

Raises

NotImplementedError – When the file type is not supported for reading.

data_format: str = None
filename: str = None
property is_valid

Returns True if the file can be found at the source path provided.

Raises

FileNotFoundError – When the file does not exist at the given path.

name: str = None
property source_path

Gets the source path for the file.

Raises

FileNotFoundError – When the file does not exist at the given path.
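
A minimal usage sketch of the class (the dataset name and filename below are illustrative only, not files shipped with the package):

from backend.tweets.datasets import Dataset

# Hypothetical dataset: the name and filename are illustrative only.
tweets_csv = Dataset(name="example_tweets", data_format="CSV",
                     filename="example_tweets.csv")

if tweets_csv.is_valid:      # FileNotFoundError if the file is missing
    df = tweets_csv.data     # pd.DataFrame; NotImplementedError for unsupported formats
    print(df.head())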

backend.tweets.datasets.generate_la_keys(data_filename: str = 'la_keys.geojson')

Generate the Local Authorities Keys.

This function merges demographics and boundaries of Local Authorities (as retrieved from “buondaries_LAs.geojson”) to generate an LA lookup table used for quicker geographical attribution of tweets.
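
A rough sketch of the kind of merge this performs, assuming a demographics table and the LA boundaries GeoJSON share a Local Authority code column (the demographics path, the join key, and the merge strategy below are assumptions, not the actual implementation):

import geopandas as gpd
import pandas as pd

def generate_la_keys_sketch(demographics_csv: str,
                            boundaries_path: str = "buondaries_LAs.geojson",
                            out_path: str = "la_keys.geojson") -> gpd.GeoDataFrame:
    """Merge LA demographics with LA boundaries into a single lookup table."""
    demographics = pd.read_csv(demographics_csv)    # hypothetical demographics table
    boundaries = gpd.read_file(boundaries_path)     # LA boundary polygons
    la_keys = boundaries.merge(demographics, on="lad19cd", how="left")  # assumed join key
    la_keys.to_file(out_path, driver="GeoJSON")
    return la_keys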

backend.tweets.datasets.load_local_authorities() → backend.tweets.datasets.Dataset

Load the Local Authorities Keys Dataset, or generate it if not found.

backend.tweets.datasets.load_tweets() → backend.tweets.datasets.Dataset

Load the Tweets Dataset.
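
Typical usage of the two loaders (a short sketch; the columns of the resulting DataFrames depend on the underlying files):

from backend.tweets.datasets import load_local_authorities, load_tweets

tweets_ds = load_tweets()            # Dataset wrapping the tweets file
la_ds = load_local_authorities()     # Dataset wrapping la_keys.geojson (generated if missing)

tweets_df = tweets_ds.data
la_df = la_ds.data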

pipelines

Module containing a class for creating custom transformation pipelines, a series of functions used for transforming Twitter data, and a Twitter transformation pipeline.

class backend.tweets.pipelines.Pipeline

Bases: abc.ABC

ABC implementing the general template of a Dynamic Pipeline to process the Twitter dataset.

The pipeline is composed of a sequence of (name, function) pairs. Valid (Pipe) functions must conform to the following signature:

Callable[[pd.DataFrame], pd.DataFrame]

Therefore, only one parameter is expected, i.e. the dataset in the form of a pandas.DataFrame, and a DataFrame is returned (to be fed as input to the next step).

(Concrete) pipeline definitions are obtained via subclassing: steps are hard-coded so as to have pre-defined and controllable behaviour. However, it is always possible to register extra steps into a pipeline via the register method.
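
For illustration, a hypothetical concrete pipeline could be defined as below; the step functions are placeholders that assume a 'text' column, and the only requirement taken from the base class is that create_pipeline returns the ordered list of (name, function) pairs:

from typing import Callable, List, Tuple

import pandas as pd

from backend.tweets.pipelines import Pipeline

Pipe = Callable[[pd.DataFrame], pd.DataFrame]


def drop_empty_text(data: pd.DataFrame) -> pd.DataFrame:
    """Placeholder step: drop rows with a missing 'text' value."""
    return data.dropna(subset=["text"])


def lowercase_text(data: pd.DataFrame) -> pd.DataFrame:
    """Placeholder step: normalise the 'text' column to lower case."""
    data = data.copy()
    data["text"] = data["text"].str.lower()
    return data


class ExamplePipeline(Pipeline):
    """Hypothetical pipeline with hard-coded, pre-defined steps."""

    def create_pipeline(self) -> List[Tuple[str, Pipe]]:
        return [
            ("drop_empty_text", drop_empty_text),
            ("lowercase_text", lowercase_text),
        ]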

apply(data: pandas.core.frame.DataFrame, verbosity: int = 0) → pandas.core.frame.DataFrame

Executes the pipeline on the input data.

Parameters
  • data (pd.DataFrame) – Input data to initialise the pipeline

  • verbosity (int, optional) – Controls the verbosity of the execution of the pipeline, by default 0 (no verbosity)

Returns

New copy of the data after the execution of all the steps of the pipeline.

Return type

pd.DataFrame

abstract create_pipeline() → List[Tuple[str, Callable[[pandas.core.frame.DataFrame], pandas.core.frame.DataFrame]]]
register(op: Tuple[str, Callable[[pandas.core.frame.DataFrame], pandas.core.frame.DataFrame]])

Register an extra (name, function) step into the pipeline.

The op argument must be a (name, function) pair whose function matches the Pipe signature described above.
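
A hedged example of registering an extra step on a pipeline instance (the step itself is hypothetical and assumes a 'text' column; where the step is placed in the sequence is not specified here):

import pandas as pd

from backend.tweets.pipelines import TwitterPipeline


def drop_retweets(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical extra step: remove retweets based on the 'text' column."""
    return data[~data["text"].str.startswith("RT @")]


pipeline = TwitterPipeline()
pipeline.register(("drop_retweets", drop_retweets))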

class backend.tweets.pipelines.TwitterPipeline

Bases: backend.tweets.pipelines.Pipeline

Implementation of the Pipeline class for reading and preparing tweets.

create_pipeline() → List[Tuple[str, Callable[[pandas.core.frame.DataFrame], pandas.core.frame.DataFrame]]]
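
Putting the loader and the pipeline together, a typical end-to-end call might look like this (a sketch; the exact columns of the cleaned data depend on the pipeline steps):

from backend.tweets.datasets import load_tweets
from backend.tweets.pipelines import TwitterPipeline

tweets = load_tweets().data                                  # raw tweets as a pd.DataFrame
clean_tweets = TwitterPipeline().apply(tweets, verbosity=1)  # returns a new copy of the data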
backend.tweets.pipelines.convert_coordinates(data, col='geo.coordinates')

Takes a pandas DataFrame with a geo.coordinates column (col) and adds the latitude and longitude to their own columns for easy conversion to GeoJSON.
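
A minimal sketch of what such a transformation might do, assuming each non-null entry of the column holds a (lat, long) pair (the ordering of the pair is an assumption about the tweet payload):

import pandas as pd


def convert_coordinates_sketch(data: pd.DataFrame,
                               col: str = "geo.coordinates") -> pd.DataFrame:
    """Split a coordinate-pair column into separate 'lat' and 'long' columns."""
    data = data.copy()
    coords = data[col].dropna()
    data.loc[coords.index, "lat"] = coords.str[0]   # assumed order: latitude first
    data.loc[coords.index, "long"] = coords.str[1]
    return data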

backend.tweets.pipelines.match_local_authorities(bbox: Sequence[Tuple[float, float]], la_df: pandas.core.frame.DataFrame, return_all: bool = False) → Union[Tuple[str, str, str], pandas.core.frame.DataFrame]

Get the Intersection over Union for the Local Authorities that overlap with the bounding box. Requires a ‘geometry’ column in the Local Authorities geopandas DataFrame. Returns a DataFrame of the Local Authorities of interest.

Parameters
  • bbox (BoundingBox) – Bounding box coordinates of the tweet

  • la_df (pd.DataFrame) – (pandas) DataFrame containing information of the Local Authorities (keys) to match

  • return_all (bool) – Flag controlling whether to return only the top matching local authorities or all of them (ranked by likelihood). By default, False.

Returns

A tuple containing the name, the code, and the reference of the top matching LA, or all of them (in the form of a pd.DataFrame).
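
The matching score is the intersection over union between the tweet’s bounding box and each LA geometry. A minimal sketch of that score using shapely (the coordinates and column names below are illustrative, not the actual implementation):

from shapely.geometry import Polygon


def iou(bbox_polygon: Polygon, la_polygon: Polygon) -> float:
    """Intersection over Union of a tweet bounding box and an LA boundary."""
    intersection = bbox_polygon.intersection(la_polygon).area
    union = bbox_polygon.union(la_polygon).area
    return intersection / union if union else 0.0


# Hypothetical bounding box given as (long, lat) corner coordinates.
bbox = Polygon([(-3.3, 51.4), (-3.1, 51.4), (-3.1, 51.6), (-3.3, 51.6)])

# Ranking candidate LAs (assuming a 'geometry' column, as stated above):
# la_df["likelihood"] = la_df["geometry"].apply(lambda geom: iou(bbox, geom))
# best_match = la_df.sort_values("likelihood", ascending=False).iloc[0]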

backend.tweets.pipelines.match_reference_la(data)

Choose the LA with the highest likelihood and add the LA and LHB to the dataset.
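
A hedged sketch of the selection step, assuming the ranked candidates carry a likelihood score as in the sketch above (the column name is an assumption):

import pandas as pd


def pick_top_la(candidates: pd.DataFrame) -> pd.Series:
    """Pick the candidate Local Authority row with the highest likelihood."""
    return candidates.loc[candidates["likelihood"].idxmax()]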