Package: backend.tweets
Submodules
datasets
Python Module to handle data loading
-
class
backend.tweets.datasets.
Dataset
(name: str, data_format: str, filename: str)¶ Bases:
object
(Data)Class encapsulating information for a single Dataset. Dataset instances will be saved into a global DATA_DICTIONARY (see below) acting as a proxy to access available datasets.
This mechanism acts as a surrogate for a database and is therefore potentially subject to change in the future.
-
name
¶ Chosen dataset unique name
- Type
str
-
data_format
¶ Format of the data (e.g. CSV, GeoJSON)
- Type
str
-
filename
¶ Name of the datafile (including file extension)
- Type
str
-
property
data
¶ Reads and returns a pd.DataFrame of the data.
- Raises
NotImplementedError – When reading of the file type is not supported.
-
data_format
: str = None¶
-
filename
: str = None¶
-
property
is_valid
¶ Returns True if the file can be found at the source path provided.
- Raises
FileNotFoundError – When the file does not exist at the given path.
-
name
: str = None¶
-
property
source_path
¶ Gets the source path for the file.
- Raises
FileNotFoundError – When the file does not exist at the given path.
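The registry mechanism described for Dataset can be illustrated with a minimal sketch. This is not the package's actual implementation: the DATA_FOLDER location, the registration inside __post_init__, and the CSV-only reader are assumptions made for illustration.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict

# Assumption: data files live in a local "data" folder (hypothetical location).
DATA_FOLDER = Path("data")

# Global registry acting as a proxy to the available datasets.
DATA_DICTIONARY: Dict[str, "Dataset"] = {}


@dataclass
class Dataset:
    name: str
    data_format: str
    filename: str

    def __post_init__(self) -> None:
        # Assumption: instances register themselves under their unique name.
        DATA_DICTIONARY[self.name] = self

    @property
    def source_path(self) -> Path:
        path = DATA_FOLDER / self.filename
        if not path.exists():
            raise FileNotFoundError(f"No such file: {path}")
        return path

    @property
    def is_valid(self) -> bool:
        # Accessing source_path raises FileNotFoundError for a missing file.
        return self.source_path.exists()

    @property
    def data(self):
        import pandas as pd  # imported lazily so the class loads without pandas

        if self.data_format.lower() == "csv":
            return pd.read_csv(self.source_path)
        raise NotImplementedError(
            f"Reading '{self.data_format}' files is not supported in this sketch."
        )


tweets = Dataset(name="tweets", data_format="csv", filename="no_such_file.csv")
```

Looking up DATA_DICTIONARY["tweets"] then returns the same instance, and accessing tweets.source_path raises FileNotFoundError until the file exists in the data folder.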
-
-
backend.tweets.datasets.
generate_la_keys
(data_filename: str = 'la_keys.geojson')¶ Generate the Local Authorities Keys.
This function merges demographics and boundaries of Local Authorities (as retrieved from “buondaries_LAs.geojson”) to generate an LA lookup table, used to attribute a geographical location to tweets more quickly.
Load the Local Authorities Keys Dataset, or generate it if not found.
-
backend.tweets.datasets.
load_tweets
() → backend.tweets.datasets.Dataset¶ Load the Tweets Dataset
pipelines
Module containing class for creating custom transformation pipelines, a series of functions used for transforming Twitter data, and a Twitter transformation pipeline.
-
class
backend.tweets.pipelines.
Pipeline
¶ Bases:
abc.ABC
ABC implementing the general template of a Dynamic Pipeline to process the Twitter dataset.
The pipeline is composed of a sequence of (name, function) pairs. Valid (Pipe) functions must comply with the following signature:
Callable[[pd.DataFrame], pd.DataFrame]
Therefore, only one parameter is expected, i.e. the dataset in the form of a pandas.DataFrame, and a DataFrame is returned (to be fed as input for the next step).
(Concrete) pipeline definitions are obtained via subclassing: steps are hard-coded so as to have pre-defined and controllable behaviours. However, it is always possible to register extra steps into a pipeline via the register method.
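The template described above could be sketched as follows. This is a simplified illustration, not the package's code: the internal _steps list, the ExamplePipeline subclass, and the two step functions are hypothetical names introduced here.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Tuple

import pandas as pd

Pipe = Callable[[pd.DataFrame], pd.DataFrame]
Step = Tuple[str, Pipe]


class Pipeline(ABC):
    def __init__(self) -> None:
        # Assumption: hard-coded steps are collected once, at construction time.
        self._steps: List[Step] = list(self.create_pipeline())

    @abstractmethod
    def create_pipeline(self) -> List[Step]:
        """Return the hard-coded (name, function) steps of the pipeline."""

    def register(self, op: Step) -> None:
        # Extra steps are appended after the hard-coded ones.
        self._steps.append(op)

    def apply(self, data: pd.DataFrame, verbosity: int = 0) -> pd.DataFrame:
        result = data.copy()  # work on a copy; the input is left untouched
        for name, func in self._steps:
            if verbosity > 0:
                print(f"Running step: {name}")
            result = func(result)  # the output of one step feeds the next
        return result


def drop_missing_text(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical step: discard rows without text.
    return df.dropna(subset=["text"])


def lowercase_text(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical step: normalise the text to lower case.
    return df.assign(text=df["text"].str.lower())


class ExamplePipeline(Pipeline):
    def create_pipeline(self) -> List[Step]:
        return [("drop_missing_text", drop_missing_text)]


pipeline = ExamplePipeline()
pipeline.register(("lowercase_text", lowercase_text))

tweets = pd.DataFrame({"text": ["Hello", None, "WORLD"]})
cleaned = pipeline.apply(tweets)
```

Because every step takes and returns a DataFrame, steps can be reordered or extended without changing apply.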
-
apply
(data: pandas.core.frame.DataFrame, verbosity: int = 0) → pandas.core.frame.DataFrame¶ Executes the pipeline on input data
- Parameters
data (pd.DataFrame) – Input data to initialise the pipeline
verbosity (int, optional) – Controls the verbosity of the execution of the pipeline, by default 0 (no verbosity)
- Returns
New copy of the data after the execution of all the steps of the pipeline.
- Return type
pd.DataFrame
-
abstract
create_pipeline
() → List[Tuple[str, Callable[[pandas.core.frame.DataFrame], pandas.core.frame.DataFrame]]]¶
-
register
(op: Tuple[str, Callable[[pandas.core.frame.DataFrame], pandas.core.frame.DataFrame]])¶ Registers an extra step, given as a (name, function) pair, into the pipeline.
-
-
class
backend.tweets.pipelines.
TwitterPipeline
¶ Bases:
backend.tweets.pipelines.Pipeline
Implementation of the Pipeline class for reading and preparing tweets.
-
create_pipeline
() → List[Tuple[str, Callable[[pandas.core.frame.DataFrame], pandas.core.frame.DataFrame]]]¶
-
-
backend.tweets.pipelines.
convert_coordinates
(data, col='geo.coordinates')¶ Takes a pandas DataFrame with a coordinates column (col, by default geo.coordinates) and adds the lat and long values as their own columns for easy conversion to GeoJSON.
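A minimal sketch of this kind of conversion is shown below. It assumes that each non-missing entry of the column is a [lat, long] pair and that the new columns are named lat and long; neither is confirmed by this page, and the function name is hypothetical.

```python
import pandas as pd


def convert_coordinates_sketch(data: pd.DataFrame, col: str = "geo.coordinates") -> pd.DataFrame:
    out = data.copy()
    # Assumption: entries are [lat, long] pairs; anything else is treated as missing.
    out["lat"] = out[col].apply(lambda c: c[0] if isinstance(c, (list, tuple)) else None)
    out["long"] = out[col].apply(lambda c: c[1] if isinstance(c, (list, tuple)) else None)
    return out


tweets = pd.DataFrame({"geo.coordinates": [[51.48, -3.18], None]})
converted = convert_coordinates_sketch(tweets)
```

Rows without coordinates keep a missing value in both new columns, so they can be filtered out before exporting to GeoJSON.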
Get the Intersection over Union (IoU) for the Local Authorities that overlap with the bounding box. Requires a ‘geometry’ column in the LA GeoPandas DataFrame. Returns a DataFrame of the local authorities of interest.
- Parameters
bbox (BoundingBox) – Bounding box coordinates of the tweet
la_df (pd.DataFrame) – (pandas) DataFrame containing information of the Local Authorities (keys) to match
return_all (bool) – Flag controlling whether to return only the top matching local authority or all of them (ranked by likelihood). By default, False.
- Returns
A tuple containing the name, the code, and the reference of the top matching LA, or all of them (in the form of a pd.DataFrame)
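To illustrate the Intersection over Union score used for matching, here is a simplified sketch on axis-aligned rectangles. Real Local Authority geometries are polygons, so this only demonstrates the metric; the Box tuple layout and the function names are assumptions introduced for the example.

```python
from typing import Tuple

# Assumption: a box is (min_x, min_y, max_x, max_y).
Box = Tuple[float, float, float, float]


def box_area(box: Box) -> float:
    # Degenerate or inverted boxes have zero area.
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def intersection_over_union(a: Box, b: Box) -> float:
    # The overlap of two rectangles is itself a (possibly empty) rectangle.
    overlap = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    inter_area = box_area(overlap)
    union_area = box_area(a) + box_area(b) - inter_area
    return inter_area / union_area if union_area > 0 else 0.0


tweet_bbox = (0.0, 0.0, 2.0, 2.0)
la_bbox = (1.0, 1.0, 3.0, 3.0)
score = intersection_over_union(tweet_bbox, la_bbox)  # 1 / 7
```

A score of 1.0 means the two areas coincide and 0.0 means they do not overlap, which is what allows candidate Local Authorities to be ranked by likelihood.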
-
backend.tweets.pipelines.
match_reference_la
(data)¶ Choose LA with highest likelihood. Add LA and LHB to dataset.