Eclipse SUMO - Simulation of Urban MObility
ParquetFormatter Class Reference

Output formatter for Parquet output.

#include <ParquetFormatter.h>

Public Member Functions

bool closeTag (std::ostream &into, const std::string &comment="")
 Closes the most recently opened tag.
 
OutputFormatterType getType ()
 Returns the type of formatter being used.
 
void openTag (std::ostream &into, const std::string &xmlElement)
 Keeps track of an open XML tag by adding a new element to the stack.
 
void openTag (std::ostream &into, const SumoXMLTag &xmlElement)
 Keeps track of an open XML tag by adding a new element to the stack.
 
 ParquetFormatter (const std::string &columnNames, const std::string &compression="", const int batchSize=1000000)
 Constructor.
 
void setExpectedAttributes (const SumoXMLAttrMask &expected, const int depth=2)
 Sets the expected attributes to write. This is used for tracking which attributes are expected in table-like outputs. It should not be necessary, but at least in the initial phase of implementing CSV and Parquet output it helps a lot in tracking down errors.
 
template<>
void writeAttr (std::ostream &, const std::string &attr, const int &val)
 
template<class T >
void writeAttr (std::ostream &, const std::string &attr, const T &val)
 
template<>
void writeAttr (std::ostream &, const SumoXMLAttr attr, const int &val, const bool isNull)
 
template<class T >
void writeAttr (std::ostream &, const SumoXMLAttr attr, const T &val, const bool isNull=false)
 writes a named attribute
 
template<>
void writeAttr (std::ostream &into, const std::string &attr, const double &val)
 
template<>
void writeAttr (std::ostream &into, const SumoXMLAttr attr, const double &val, const bool isNull)
 
virtual void writePadding (std::ostream &into, const std::string &val)
 Writes some whitespace to format the output. This method is only implemented for XML output.
 
virtual void writePreformattedTag (std::ostream &into, const std::string &val)
 Writes a preformatted tag to the device but ensures that any pending tags are closed. This method is only implemented for XML output.
 
void writeTime (std::ostream &into, const SumoXMLAttr attr, const SUMOTime val)
 
virtual bool writeXMLHeader (std::ostream &into, const std::string &rootElement, const std::map< SumoXMLAttr, std::string > &attrs, bool writeMetadata, bool includeConfig)
 Writes an XML header with optional configuration.
 
bool wroteHeader () const
 Returns whether a header has been written. Useful to detect whether a file is being used by multiple sources.
 
virtual ~ParquetFormatter ()
 Destructor.
 

Private Member Functions

void checkAttr (const SumoXMLAttr attr)
 
const std::string getAttrString (const std::string &attrString)
 

Private Attributes

const int myBatchSize
 the number of rows to write per batch
 
std::vector< std::shared_ptr< arrow::ArrayBuilder > > myBuilders
 the content array builders for the table
 
bool myCheckColumns = false
 whether the columns should be checked for completeness
 
parquet::Compression::type myCompression = parquet::Compression::UNCOMPRESSED
 the compression to use
 
std::string myCurrentTag
 the currently read tag (only valid when generating the header)
 
SumoXMLAttrMask myExpectedAttrs
 the attributes which are expected for a complete row (including null values)
 
const std::string myHeaderFormat
 the format to use for the column names
 
int myMaxDepth = 0
 the maximum depth of the XML hierarchy
 
std::unique_ptr< parquet::arrow::FileWriter > myParquetWriter
 the output stream writer
 
std::shared_ptr< arrow::Schema > mySchema = arrow::schema({})
 the table schema
 
SumoXMLAttrMask mySeenAttrs
 the attributes already seen (including null values)
 
const OutputFormatterType myType
 the type of formatter being used (XML, CSV, Parquet, etc.)
 
std::vector< std::shared_ptr< arrow::Scalar > > myValues
 the current attribute / column values
 
bool myWroteHeader = false
 whether the schema has been constructed completely
 
std::vector< int > myXMLStack
 The number of attributes in the currently open XML elements.
 

Detailed Description

Output formatter for Parquet output.

Definition at line 65 of file ParquetFormatter.h.

Constructor & Destructor Documentation

◆ ParquetFormatter()

ParquetFormatter::ParquetFormatter ( const std::string &  columnNames,
const std::string &  compression = "",
const int  batchSize = 1000000 
)

Constructor.

Definition at line 83 of file ParquetFormatter.cpp.

References myCompression, WRITE_ERRORF, and WRITE_WARNINGF.
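
The constructor receives the compression as a string and stores it in myCompression, which defaults to UNCOMPRESSED; the references to WRITE_ERRORF and WRITE_WARNINGF suggest that unsupported values are reported. A minimal sketch of such a mapping follows. The Codec enum, the CodecChoice struct, the parseCompression function and the accepted names are all hypothetical and stand in for the Arrow/Parquet types; they are not taken from ParquetFormatter.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical stand-in for parquet::Compression::type; not the Arrow enum.
enum class Codec { Uncompressed, Snappy, Gzip, Zstd };

struct CodecChoice {
    Codec codec;      // codec to use
    bool recognized;  // false if the name was unknown and the default was used
};

// Maps a user-supplied name to a codec. The accepted names are assumptions
// made for this sketch. An empty string selects no compression.
CodecChoice parseCompression(std::string name) {
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (name.empty() || name == "uncompressed") return {Codec::Uncompressed, true};
    if (name == "snappy") return {Codec::Snappy, true};
    if (name == "gzip") return {Codec::Gzip, true};
    if (name == "zstd") return {Codec::Zstd, true};
    // Unknown name: fall back to the default and let the caller emit a warning.
    return {Codec::Uncompressed, false};
}
```

Returning a flag instead of logging inside the function keeps the sketch free of any logging dependency; a caller would issue the warning when recognized is false.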

◆ ~ParquetFormatter()

virtual ParquetFormatter::~ParquetFormatter ( )
inline virtual

Destructor.

Definition at line 72 of file ParquetFormatter.h.

Member Function Documentation

◆ checkAttr()

void ParquetFormatter::checkAttr ( const SumoXMLAttr  attr)
inline private

Definition at line 161 of file ParquetFormatter.h.

References myCheckColumns, myExpectedAttrs, myMaxDepth, mySeenAttrs, myXMLStack, TLF, and toString().

Referenced by writeAttr(), writeAttr(), and writeAttr().

◆ closeTag()

bool ParquetFormatter::closeTag ( std::ostream &  into,
const std::string &  comment = "" 
)
virtual

Closes the most recently opened tag.

Parameters
[in] into  The output stream to use
Returns
Whether a further element existed in the stack and could be closed
Todo:
it is not verified that the topmost element was closed

Implements OutputFormatter.

Definition at line 131 of file ParquetFormatter.cpp.

References myBatchSize, myBuilders, myCheckColumns, myCompression, myExpectedAttrs, myMaxDepth, myParquetWriter, mySchema, mySeenAttrs, myValues, myWroteHeader, myXMLStack, toString(), WRITE_ERRORF, and WRITE_WARNING.
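
closeTag() references myBatchSize, myBuilders and myParquetWriter, which suggests that completed rows are collected and written out in batches of myBatchSize rows. The following sketch illustrates that batching pattern with standard containers only. BatchBuffer and its members are hypothetical names; the real class uses Arrow array builders and a Parquet file writer instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical row buffer: collects rows and hands them over in batches.
class BatchBuffer {
public:
    explicit BatchBuffer(std::size_t batchSize) : myBatchSize(batchSize) {}

    // Adds one completed row and flushes when the batch is full.
    void addRow(const std::vector<std::string>& row) {
        myRows.push_back(row);
        if (myRows.size() >= myBatchSize) {
            flush();
        }
    }

    // Writes out whatever is buffered (to be called once more at the end).
    void flush() {
        if (myRows.empty()) {
            return;
        }
        myFlushedRows += myRows.size();
        ++myFlushCount;
        myRows.clear();  // a real writer would emit a batch of rows here
    }

    std::size_t flushCount() const { return myFlushCount; }
    std::size_t flushedRows() const { return myFlushedRows; }
    std::size_t pendingRows() const { return myRows.size(); }

private:
    const std::size_t myBatchSize;
    std::vector<std::vector<std::string>> myRows;
    std::size_t myFlushCount = 0;
    std::size_t myFlushedRows = 0;
};
```

A larger batch size means fewer write calls at the cost of more buffered rows in memory, which is the usual trade-off behind a parameter like batchSize.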

◆ getAttrString()

const std::string ParquetFormatter::getAttrString ( const std::string &  attrString)
inline private

Definition at line 146 of file ParquetFormatter.h.

References myCurrentTag, myHeaderFormat, and mySchema.

Referenced by writeAttr(), writeAttr(), writeAttr(), writeAttr(), writeAttr(), writeAttr(), and writeTime().

◆ getType()

OutputFormatterType OutputFormatter::getType ( )
inline inherited

Returns the type of formatter being used.

Returns
the formatter type

Definition at line 150 of file OutputFormatter.h.

References OutputFormatter::myType.

Referenced by OutputDevice::writeAttr(), OutputDevice::writeAttr(), OutputDevice::writeFuncAttr(), and OutputDevice::writeOptionalAttr().

◆ openTag() [1/2]

void ParquetFormatter::openTag ( std::ostream &  into,
const std::string &  xmlElement 
)
virtual

Keeps track of an open XML tag by adding a new element to the stack.

Parameters
[in] into  The output stream to use (unused)
[in] xmlElement  Name of element to open (unused)

Implements OutputFormatter.

Definition at line 107 of file ParquetFormatter.cpp.

References myCurrentTag, myMaxDepth, myValues, myWroteHeader, myXMLStack, and WRITE_WARNINGF.

◆ openTag() [2/2]

void ParquetFormatter::openTag ( std::ostream &  into,
const SumoXMLTag xmlElement 
)
virtual

Keeps track of an open XML tag by adding a new element to the stack.

Parameters
[in] into  The output stream to use (unused)
[in] xmlElement  Name of element to open (unused)

Implements OutputFormatter.

Definition at line 119 of file ParquetFormatter.cpp.

References myCurrentTag, myMaxDepth, myValues, myWroteHeader, myXMLStack, toString(), and WRITE_WARNINGF.
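
openTag() and closeTag() keep a stack (myXMLStack) holding the number of attributes of each currently open element. One way such a stack can flatten nested elements into table rows is sketched below: values of outer elements stay in the row while inner elements are opened and closed. TagTracker and its behaviour are assumptions made for illustration, not the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tracker: one stack entry per open element, holding the number
// of attribute values that element contributed to the current row.
class TagTracker {
public:
    void openTag() { myStack.push_back(0); }

    void writeAttr(const std::string& value) {
        if (myStack.empty()) {
            return;  // no open element; ignored in this sketch
        }
        myValues.push_back(value);
        ++myStack.back();
    }

    // Returns false if there was no open element to close.
    bool closeTag() {
        if (myStack.empty()) {
            return false;
        }
        // Drop the values of the closed element; outer values stay for reuse.
        myValues.resize(myValues.size() - static_cast<std::size_t>(myStack.back()));
        myStack.pop_back();
        return true;
    }

    const std::vector<std::string>& row() const { return myValues; }
    std::size_t depth() const { return myStack.size(); }

private:
    std::vector<int> myStack;
    std::vector<std::string> myValues;
};
```

With this layout, an outer element's attribute (for example a time stamp) is written once and then appears in every row produced by its child elements.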

◆ setExpectedAttributes()

void ParquetFormatter::setExpectedAttributes ( const SumoXMLAttrMask expected,
const int  depth = 2 
)
inline virtual

Sets the expected attributes to write. This is used for tracking which attributes are expected in table-like outputs. It should not be necessary, but at least in the initial phase of implementing CSV and Parquet output it helps a lot in tracking down errors.

Parameters
[in] expected  which attributes are to be written (at the deepest XML level)
[in] depth  the maximum XML hierarchy depth (excluding the root)

Reimplemented from OutputFormatter.

Definition at line 139 of file ParquetFormatter.h.

References myCheckColumns, myExpectedAttrs, and myMaxDepth.
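
The expected attributes are kept as a mask (myExpectedAttrs) and compared with the attributes already seen (mySeenAttrs). A minimal sketch of such a completeness check with std::bitset follows. The Attr ids, the AttrMask alias and the RowChecker class are hypothetical; the real code uses SumoXMLAttr and SumoXMLAttrMask.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical attribute ids and mask; the real mask type is SumoXMLAttrMask.
enum Attr : std::size_t { ATTR_ID = 0, ATTR_SPEED = 1, ATTR_POS = 2, ATTR_COUNT = 3 };
using AttrMask = std::bitset<ATTR_COUNT>;

class RowChecker {
public:
    explicit RowChecker(const AttrMask& expected) : myExpected(expected) {}

    // Records an attribute; returns false if it was not expected.
    bool see(Attr attr) {
        mySeen.set(attr);
        return myExpected.test(attr);
    }

    // A row is complete when exactly the expected attributes were seen.
    bool rowComplete() const { return mySeen == myExpected; }

    // Attributes that were expected but never written.
    AttrMask missing() const { return myExpected & ~mySeen; }

    // Clears the seen attributes before the next row starts.
    void reset() { mySeen.reset(); }

private:
    AttrMask myExpected;
    AttrMask mySeen;
};
```

Because null values also count as seen, a writer that wants a fixed set of columns has to record every expected attribute for each row, even when it has no value.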

◆ writeAttr() [1/6]

template<>
void ParquetFormatter::writeAttr ( std::ostream &  ,
const std::string &  attr,
const int &  val 
)
inline

◆ writeAttr() [2/6]

template<class T >
void ParquetFormatter::writeAttr ( std::ostream &  ,
const std::string &  attr,
const T &  val 
)
inline

◆ writeAttr() [3/6]

template<>
void ParquetFormatter::writeAttr ( std::ostream &  ,
const SumoXMLAttr  attr,
const int &  val,
const bool  isNull 
)
inline

◆ writeAttr() [4/6]

template<class T >
void ParquetFormatter::writeAttr ( std::ostream &  ,
const SumoXMLAttr  attr,
const T &  val,
const bool  isNull = false 
)
inline

writes a named attribute

Parameters
[in] attr  The attribute (name)
[in] val  The attribute value
[in] isNull  The given value is not set

Definition at line 104 of file ParquetFormatter.h.

References checkAttr(), getAttrString(), myBuilders, mySchema, myValues, myWroteHeader, and toString().

Referenced by writeTime().
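
The isNull parameter marks a value as not set while still naming its attribute, and myWroteHeader records whether the schema has been constructed completely. The sketch below shows one way these two ideas can interact: columns are added while the schema is still open, and a null cell keeps its column in place. Cell, ColumnRow and fixSchema are hypothetical names; the real class stores values in Arrow scalars and builders (myValues, myBuilders).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical cell: a value plus a flag for "not set".
struct Cell {
    std::string value;
    bool isNull;
};

class ColumnRow {
public:
    // Returns false if the column is unknown and the schema is already fixed.
    bool writeAttr(const std::string& name, const std::string& val, bool isNull = false) {
        const Cell cell{isNull ? std::string() : val, isNull};
        for (std::size_t i = 0; i < myNames.size(); ++i) {
            if (myNames[i] == name) {
                myCells[i] = cell;  // known column: overwrite the current value
                return true;
            }
        }
        if (mySchemaFixed) {
            return false;
        }
        myNames.push_back(name);  // schema still open: add a new column
        myCells.push_back(cell);
        return true;
    }

    void fixSchema() { mySchemaFixed = true; }
    std::size_t columnCount() const { return myNames.size(); }
    const Cell& cell(std::size_t i) const { return myCells[i]; }

private:
    std::vector<std::string> myNames;
    std::vector<Cell> myCells;
    bool mySchemaFixed = false;
};
```

Writing a null for an attribute in the first row is what makes the column part of the schema, so later rows can carry real values for it.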

◆ writeAttr() [5/6]

template<>
void ParquetFormatter::writeAttr ( std::ostream &  into,
const std::string &  attr,
const double &  val 
)
inline

◆ writeAttr() [6/6]

template<>
void ParquetFormatter::writeAttr ( std::ostream &  into,
const SumoXMLAttr  attr,
const double &  val,
const bool  isNull 
)
inline

◆ writePadding()

virtual void OutputFormatter::writePadding ( std::ostream &  into,
const std::string &  val 
)
inline virtual inherited

Writes some whitespace to format the output. This method is only implemented for XML output.

Parameters
[in] into  The output stream to use
[in] val  The whitespace

Reimplemented in PlainXMLFormatter.

Definition at line 136 of file OutputFormatter.h.

References UNUSED_PARAMETER.

Referenced by OutputDevice::writePadding().

◆ writePreformattedTag()

virtual void OutputFormatter::writePreformattedTag ( std::ostream &  into,
const std::string &  val 
)
inline virtual inherited

Writes a preformatted tag to the device but ensures that any pending tags are closed. This method is only implemented for XML output.

Parameters
[in] into  The output stream to use
[in] val  The preformatted data

Reimplemented in PlainXMLFormatter.

Definition at line 125 of file OutputFormatter.h.

References UNUSED_PARAMETER.

Referenced by OutputDevice::writePreformattedTag().

◆ writeTime()

void ParquetFormatter::writeTime ( std::ostream &  into,
const SumoXMLAttr  attr,
const SUMOTime  val 
)
inline virtual

◆ writeXMLHeader()

virtual bool OutputFormatter::writeXMLHeader ( std::ostream &  into,
const std::string &  rootElement,
const std::map< SumoXMLAttr, std::string > &  attrs,
bool  writeMetadata,
bool  includeConfig 
)
inline virtual inherited

Writes an XML header with optional configuration.

If something has already been written (myXMLStack is not empty), nothing is written and false is returned. The default implementation does nothing and returns false.

Parameters
[in] into  The output stream to use
[in] rootElement  The root element to use
[in] attrs  Additional attributes to save within the rootElement
[in] includeConfig  whether the current config should be included as XML comment
Returns
whether something has been written

Reimplemented in PlainXMLFormatter.

Definition at line 77 of file OutputFormatter.h.

References UNUSED_PARAMETER.

Referenced by OutputDevice::writeXMLHeader().
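
Since ParquetFormatter inherits writeXMLHeader() unchanged, it gets the default behaviour described above: nothing is written and false is returned. The pattern of a do-nothing default that only some formatters override can be sketched as follows. BaseFormatter, XmlLikeFormatter and the two-argument writeHeader are simplified, hypothetical stand-ins, not the SUMO classes or signatures.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical base class: the default header writer does nothing.
class BaseFormatter {
public:
    virtual ~BaseFormatter() {}

    virtual bool writeHeader(std::ostream& into, const std::string& rootElement) {
        (void)into;         // unused in the default implementation
        (void)rootElement;  // unused in the default implementation
        return false;
    }
};

// Hypothetical XML-style formatter: writes a header once, then refuses.
class XmlLikeFormatter : public BaseFormatter {
public:
    bool writeHeader(std::ostream& into, const std::string& rootElement) override {
        if (myWroteHeader) {
            return false;
        }
        into << "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<" << rootElement << ">\n";
        myWroteHeader = true;
        return true;
    }

private:
    bool myWroteHeader = false;
};
```

Callers can then treat all formatters uniformly and use the return value to learn whether a header was actually emitted.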

◆ wroteHeader()

bool ParquetFormatter::wroteHeader ( ) const
inline virtual

Returns whether a header has been written. Useful to detect whether a file is being used by multiple sources.

Returns
Whether a header has been written

Implements OutputFormatter.

Definition at line 135 of file ParquetFormatter.h.

References myWroteHeader.

Field Documentation

◆ myBatchSize

const int ParquetFormatter::myBatchSize
private

the number of rows to write per batch

Definition at line 177 of file ParquetFormatter.h.

Referenced by closeTag().

◆ myBuilders

std::vector<std::shared_ptr<arrow::ArrayBuilder> > ParquetFormatter::myBuilders
private

the content array builders for the table

Definition at line 189 of file ParquetFormatter.h.

Referenced by closeTag(), writeAttr(), writeAttr(), writeAttr(), writeAttr(), writeAttr(), writeAttr(), and writeTime().

◆ myCheckColumns

bool ParquetFormatter::myCheckColumns = false
private

whether the columns should be checked for completeness

Definition at line 204 of file ParquetFormatter.h.

Referenced by checkAttr(), closeTag(), setExpectedAttributes(), writeAttr(), writeAttr(), and writeAttr().

◆ myCompression

parquet::Compression::type ParquetFormatter::myCompression = parquet::Compression::UNCOMPRESSED
private

the compression to use

Definition at line 174 of file ParquetFormatter.h.

Referenced by closeTag(), and ParquetFormatter().

◆ myCurrentTag

std::string ParquetFormatter::myCurrentTag
private

the currently read tag (only valid when generating the header)

Definition at line 180 of file ParquetFormatter.h.

Referenced by getAttrString(), openTag(), and openTag().

◆ myExpectedAttrs

SumoXMLAttrMask ParquetFormatter::myExpectedAttrs
private

the attributes which are expected for a complete row (including null values)

Definition at line 207 of file ParquetFormatter.h.

Referenced by checkAttr(), closeTag(), and setExpectedAttributes().

◆ myHeaderFormat

const std::string ParquetFormatter::myHeaderFormat
private

the format to use for the column names

Definition at line 171 of file ParquetFormatter.h.

Referenced by getAttrString().

◆ myMaxDepth

int ParquetFormatter::myMaxDepth = 0
private

the maximum depth of the XML hierarchy

Definition at line 198 of file ParquetFormatter.h.

Referenced by checkAttr(), closeTag(), openTag(), openTag(), and setExpectedAttributes().

◆ myParquetWriter

std::unique_ptr<parquet::arrow::FileWriter> ParquetFormatter::myParquetWriter
private

the output stream writer

Definition at line 186 of file ParquetFormatter.h.

Referenced by closeTag().

◆ mySchema

std::shared_ptr<arrow::Schema> ParquetFormatter::mySchema = arrow::schema({})
private

the table schema

Definition at line 183 of file ParquetFormatter.h.

Referenced by closeTag(), getAttrString(), writeAttr(), writeAttr(), writeAttr(), writeAttr(), writeAttr(), writeAttr(), and writeTime().

◆ mySeenAttrs

SumoXMLAttrMask ParquetFormatter::mySeenAttrs
private

the attributes already seen (including null values)

Definition at line 210 of file ParquetFormatter.h.

Referenced by checkAttr(), and closeTag().

◆ myType

const OutputFormatterType OutputFormatter::myType
private inherited

the type of formatter being used (XML, CSV, Parquet, etc.)

Definition at line 168 of file OutputFormatter.h.

Referenced by OutputFormatter::getType().

◆ myValues

std::vector<std::shared_ptr<arrow::Scalar> > ParquetFormatter::myValues
private

the current attribute / column values

Definition at line 195 of file ParquetFormatter.h.

Referenced by closeTag(), openTag(), openTag(), writeAttr(), writeAttr(), writeAttr(), writeAttr(), writeAttr(), writeAttr(), and writeTime().

◆ myWroteHeader

bool ParquetFormatter::myWroteHeader = false
private

whether the schema has been constructed completely

Definition at line 201 of file ParquetFormatter.h.

Referenced by closeTag(), openTag(), openTag(), writeAttr(), writeAttr(), writeAttr(), writeAttr(), writeAttr(), writeAttr(), writeTime(), and wroteHeader().

◆ myXMLStack

std::vector<int> ParquetFormatter::myXMLStack
private

The number of attributes in the currently open XML elements.

Definition at line 192 of file ParquetFormatter.h.

Referenced by checkAttr(), closeTag(), openTag(), and openTag().


The documentation for this class was generated from the following files: