Eclipse SUMO - Simulation of Urban MObility
Output formatter for Parquet output.
#include <ParquetFormatter.h>
Public Member Functions | |
| bool | closeTag (std::ostream &into, const std::string &comment="") |
| Closes the most recently opened tag. | |
| OutputFormatterType | getType () |
| Returns the type of formatter being used. | |
| void | openTag (std::ostream &into, const std::string &xmlElement) |
| Keeps track of an open XML tag by adding a new element to the stack. | |
| void | openTag (std::ostream &into, const SumoXMLTag &xmlElement) |
| Keeps track of an open XML tag by adding a new element to the stack. | |
| ParquetFormatter (const std::string &columnNames, const std::string &compression="", const int batchSize=1000000) | |
| Constructor. | |
| void | setExpectedAttributes (const SumoXMLAttrMask &expected, const int depth=2) |
| Sets the expected attributes to write. This is used to track which attributes are expected in table-like outputs. It should not be necessary, but during the initial phase of implementing CSV and Parquet output it helps considerably in tracking down errors. | |
| template<> | |
| void | writeAttr (std::ostream &, const std::string &attr, const int &val) |
| template<class T > | |
| void | writeAttr (std::ostream &, const std::string &attr, const T &val) |
| template<> | |
| void | writeAttr (std::ostream &, const SumoXMLAttr attr, const int &val, const bool isNull) |
| template<class T > | |
| void | writeAttr (std::ostream &, const SumoXMLAttr attr, const T &val, const bool isNull=false) |
| writes a named attribute | |
| template<> | |
| void | writeAttr (std::ostream &into, const std::string &attr, const double &val) |
| template<> | |
| void | writeAttr (std::ostream &into, const SumoXMLAttr attr, const double &val, const bool isNull) |
| virtual void | writePadding (std::ostream &into, const std::string &val) |
| Writes some whitespace to format the output. This method is only implemented for XML output. | |
| virtual void | writePreformattedTag (std::ostream &into, const std::string &val) |
| Writes a preformatted tag to the device but ensures that any pending tags are closed. This method is only implemented for XML output. | |
| void | writeTime (std::ostream &into, const SumoXMLAttr attr, const SUMOTime val) |
| virtual bool | writeXMLHeader (std::ostream &into, const std::string &rootElement, const std::map< SumoXMLAttr, std::string > &attrs, bool writeMetadata, bool includeConfig) |
| Writes an XML header with optional configuration. | |
| bool | wroteHeader () const |
| Returns whether a header has been written. Useful to detect whether a file is being used by multiple sources. | |
| virtual | ~ParquetFormatter () |
| Destructor. | |
Private Member Functions | |
| void | checkAttr (const SumoXMLAttr attr) |
| template<class ATTR_TYPE , class BUILDER > | |
| void | checkBuilder (const ATTR_TYPE &attr, const std::shared_ptr< arrow::DataType > &(*dataType)()) |
| const std::string | getAttrString (const std::string &attrString) |
Private Attributes | |
| const int | myBatchSize |
| the number of rows to write per batch | |
| std::vector< std::shared_ptr< arrow::ArrayBuilder > > | myBuilders |
| the content array builders for the table | |
| bool | myCheckColumns = false |
| whether the columns should be checked for completeness | |
| parquet::Compression::type | myCompression = parquet::Compression::UNCOMPRESSED |
| the compression to use | |
| std::string | myCurrentTag |
| the currently read tag (only valid when generating the header) | |
| SumoXMLAttrMask | myExpectedAttrs |
| the attributes which are expected for a complete row (including null values) | |
| const std::string | myHeaderFormat |
| the format to use for the column names | |
| int | myMaxDepth = 2 |
| the maximum depth of the XML hierarchy | |
| bool | myNeedsWrite = false |
| whether there is still unwritten data | |
| std::unique_ptr< parquet::arrow::FileWriter > | myParquetWriter |
| the output stream writer | |
| std::shared_ptr< arrow::Schema > | mySchema = arrow::schema({}) |
| the table schema | |
| SumoXMLAttrMask | mySeenAttrs |
| the attributes already seen (including null values) | |
| const OutputFormatterType | myType |
| the type of formatter being used (XML, CSV, Parquet, etc.) | |
| std::vector< std::shared_ptr< arrow::Scalar > > | myValues |
| the current attribute / column values | |
| bool | myWroteHeader = false |
| whether the schema has been constructed completely | |
| std::vector< int > | myXMLStack |
| The number of attributes in the currently open XML elements. | |
Output formatter for Parquet output.
Definition at line 65 of file ParquetFormatter.h.
| ParquetFormatter::ParquetFormatter (const std::string &columnNames, const std::string &compression="", const int batchSize=1000000) |
Constructor.
Definition at line 83 of file ParquetFormatter.cpp.
References myCompression, WRITE_ERRORF, and WRITE_WARNINGF.
| virtual ParquetFormatter::~ParquetFormatter () | inline, virtual |
Destructor.
Definition at line 72 of file ParquetFormatter.h.
| void ParquetFormatter::checkAttr (const SumoXMLAttr attr) | inline, private |
Definition at line 152 of file ParquetFormatter.h.
References myCheckColumns, myExpectedAttrs, myMaxDepth, mySeenAttrs, myXMLStack, TLF, and toString().
Referenced by writeAttr(), writeAttr(), and writeAttr().
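| template<class ATTR_TYPE , class BUILDER > void ParquetFormatter::checkBuilder (const ATTR_TYPE &attr, const std::shared_ptr< arrow::DataType > &(*dataType)()) | inline, private |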
Definition at line 162 of file ParquetFormatter.h.
References getAttrString(), myBuilders, myNeedsWrite, mySchema, myValues, myWroteHeader, and toString().
| bool ParquetFormatter::closeTag (std::ostream &into, const std::string &comment="") | virtual |
Closes the most recently opened tag.
| [in] | into | The output stream to use |
Implements OutputFormatter.
Definition at line 131 of file ParquetFormatter.cpp.
References myBatchSize, myBuilders, myCheckColumns, myCompression, myExpectedAttrs, myMaxDepth, myNeedsWrite, myParquetWriter, mySchema, mySeenAttrs, myValues, myWroteHeader, myXMLStack, toString(), WRITE_ERRORF, and WRITE_WARNING.
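closeTag() references myBatchSize, myBuilders, myNeedsWrite and myParquetWriter, which suggests that finished rows are buffered and handed to the writer in batches. The sketch below shows that buffering pattern in isolation. `BatchBuffer` and its members are hypothetical names, rows are plain strings instead of Arrow builders, and the "sink" only counts rows.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row buffer; the real class uses Arrow array builders instead.
class BatchBuffer {
public:
    explicit BatchBuffer(std::size_t batchSize) : myBatchSize(batchSize) {}

    // Stores one finished row and flushes as soon as the batch is full.
    void addRow(std::vector<std::string> row) {
        myRows.push_back(std::move(row));
        if (myRows.size() >= myBatchSize) {
            flush();
        }
    }

    // Hands all buffered rows to the sink (here: only counted) and clears the buffer.
    // Must also be called once at the end to write a final, partial batch.
    void flush() {
        if (myRows.empty()) {
            return;
        }
        myFlushedRows += myRows.size();
        ++myFlushCount;
        myRows.clear();
    }

    std::size_t pending() const { return myRows.size(); }
    std::size_t flushedRows() const { return myFlushedRows; }
    std::size_t flushCount() const { return myFlushCount; }

private:
    const std::size_t myBatchSize;
    std::vector<std::vector<std::string>> myRows;
    std::size_t myFlushedRows = 0;
    std::size_t myFlushCount = 0;
};
```

A larger batch size means fewer, larger writes at the cost of more memory held between flushes. The documented default is 1000000 rows.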
| const std::string ParquetFormatter::getAttrString (const std::string &attrString) | inline, private |
Definition at line 137 of file ParquetFormatter.h.
References myCurrentTag, myHeaderFormat, and mySchema.
Referenced by checkBuilder().
| OutputFormatterType OutputFormatter::getType () | inline, inherited |
Returns the type of formatter being used.
Definition at line 150 of file OutputFormatter.h.
References OutputFormatter::myType.
Referenced by OutputDevice::isXML(), OutputDevice::writeAttr(), OutputDevice::writeAttr(), OutputDevice::writeFuncAttr(), and OutputDevice::writeOptionalAttr().
| void ParquetFormatter::openTag (std::ostream &into, const std::string &xmlElement) | virtual |
Keeps track of an open XML tag by adding a new element to the stack.
| [in] | into | The output stream to use (unused) |
| [in] | xmlElement | Name of element to open (unused) |
Implements OutputFormatter.
Definition at line 107 of file ParquetFormatter.cpp.
References myCurrentTag, myMaxDepth, myValues, myWroteHeader, myXMLStack, and WRITE_WARNINGF.
| void ParquetFormatter::openTag (std::ostream &into, const SumoXMLTag &xmlElement) | virtual |
Keeps track of an open XML tag by adding a new element to the stack.
| [in] | into | The output stream to use (unused) |
| [in] | xmlElement | Name of element to open (unused) |
Implements OutputFormatter.
Definition at line 119 of file ParquetFormatter.cpp.
References myCurrentTag, myMaxDepth, myValues, myWroteHeader, myXMLStack, toString(), and WRITE_WARNINGF.
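The members myXMLStack ("the number of attributes in the currently open XML elements"), myValues and myMaxDepth suggest how a nested XML stream can be flattened into table rows: each opened element remembers how many values existed before it, and closing the element at the deepest level emits a row and then discards that element's values. The sketch below illustrates this reading. It is an assumption about the mechanism, not a copy of the SUMO implementation, and `Flattener` with all its members is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical flattener: turns nested open/attr/close calls into rows.
class Flattener {
public:
    explicit Flattener(std::size_t maxDepth) : myMaxDepth(maxDepth) {}

    // Remembers how many values were present before this element was opened.
    void openTag() {
        myStack.push_back(myValues.size());
    }

    // Appends one attribute value to the current row.
    void writeAttr(const std::string& value) {
        myValues.push_back(value);
    }

    // Emits a row when the deepest level is closed, then drops the values
    // of the closed element while keeping those of its ancestors.
    void closeTag() {
        if (myStack.empty()) {
            return;
        }
        if (myStack.size() == myMaxDepth) {
            myRows.push_back(myValues);
        }
        myValues.resize(myStack.back());
        myStack.pop_back();
    }

    const std::vector<std::vector<std::string>>& rows() const { return myRows; }

private:
    const std::size_t myMaxDepth;
    std::vector<std::size_t> myStack;
    std::vector<std::string> myValues;
    std::vector<std::vector<std::string>> myRows;
};
```

With a maximum depth of 2, a parent element carrying a time attribute and two child elements carrying one attribute each yields two rows that both repeat the parent's time value.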
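| void ParquetFormatter::setExpectedAttributes (const SumoXMLAttrMask &expected, const int depth=2) | inline, virtual |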
Sets the expected attributes to write. This is used to track which attributes are expected in table-like outputs. It should not be necessary, but during the initial phase of implementing CSV and Parquet output it helps considerably in tracking down errors.
| [in] | expected | which attributes are to be written (at the deepest XML level) |
| [in] | depth | the maximum XML hierarchy depth (excluding the root) |
Reimplemented from OutputFormatter.
Definition at line 130 of file ParquetFormatter.h.
References myCheckColumns, myExpectedAttrs, and myMaxDepth.
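The expected/seen bookkeeping described here can be illustrated with two bit masks: one holding the attributes a complete row must contain, one holding the attributes written so far. The sketch below assumes a mask type built on `std::bitset`. The `Attr` enum and the `RowChecker` class are hypothetical and far smaller than SUMO's real attribute set.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

// Hypothetical attribute ids; SUMO's real SumoXMLAttr enum is much larger.
enum Attr : std::size_t { ATTR_ID = 0, ATTR_TIME, ATTR_SPEED, ATTR_COUNT };
using AttrMask = std::bitset<ATTR_COUNT>;

// Tracks which expected attributes have been written for the current row.
class RowChecker {
public:
    // Enables checking and stores the attributes a complete row must contain.
    void setExpected(const AttrMask& expected) {
        myExpected = expected;
        myCheck = true;
    }

    // Marks an attribute as written (null values count as written too).
    void see(Attr attr) {
        mySeen.set(attr);
    }

    // Returns the expected attributes that have not been written yet.
    std::vector<Attr> missing() const {
        std::vector<Attr> result;
        const AttrMask open = myExpected & ~mySeen;
        for (std::size_t i = 0; i < ATTR_COUNT; ++i) {
            if (open.test(i)) {
                result.push_back(static_cast<Attr>(i));
            }
        }
        return result;
    }

    // A row is complete if checking is off or nothing expected is missing.
    bool complete() const {
        return !myCheck || (myExpected & ~mySeen).none();
    }

    // Resets the seen mask before the next row starts.
    void nextRow() {
        mySeen.reset();
    }

private:
    bool myCheck = false;
    AttrMask myExpected;
    AttrMask mySeen;
};
```

Counting null values as "seen" matches the descriptions of myExpectedAttrs and mySeenAttrs, which both mention null values explicitly.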
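| template<> void ParquetFormatter::writeAttr (std::ostream &, const std::string &attr, const int &val) | inline |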
Definition at line 267 of file ParquetFormatter.h.
References myCheckColumns, and myValues.
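| template<class T > void ParquetFormatter::writeAttr (std::ostream &, const std::string &attr, const T &val) | inline |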
Definition at line 111 of file ParquetFormatter.h.
References myCheckColumns, myValues, and toString().
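| template<> void ParquetFormatter::writeAttr (std::ostream &, const SumoXMLAttr attr, const int &val, const bool isNull) | inline |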
Definition at line 248 of file ParquetFormatter.h.
References checkAttr(), and myValues.
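| template<class T > void ParquetFormatter::writeAttr (std::ostream &, const SumoXMLAttr attr, const T &val, const bool isNull=false) | inline |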
writes a named attribute
| [in] | attr | The attribute (name) |
| [in] | val | The attribute value |
| [in] | isNull | The given value is not set |
Definition at line 104 of file ParquetFormatter.h.
References checkAttr(), myValues, and toString().
Referenced by writeTime().
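| template<> void ParquetFormatter::writeAttr (std::ostream &into, const std::string &attr, const double &val) | inline |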
Definition at line 255 of file ParquetFormatter.h.
References myCheckColumns, and myValues.
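| template<> void ParquetFormatter::writeAttr (std::ostream &into, const SumoXMLAttr attr, const double &val, const bool isNull) | inline |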
Definition at line 236 of file ParquetFormatter.h.
References checkAttr(), myValues, SUMO_ATTR_X, and SUMO_ATTR_Y.
| virtual void OutputFormatter::writePadding (std::ostream &into, const std::string &val) | inline, virtual, inherited |
Writes some whitespace to format the output. This method is only implemented for XML output.
| [in] | into | The output stream to use |
| [in] | val | The whitespace |
Reimplemented in PlainXMLFormatter.
Definition at line 136 of file OutputFormatter.h.
References UNUSED_PARAMETER.
Referenced by OutputDevice::writePadding().
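| virtual void OutputFormatter::writePreformattedTag (std::ostream &into, const std::string &val) | inline, virtual, inherited |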
Writes a preformatted tag to the device but ensures that any pending tags are closed. This method is only implemented for XML output.
| [in] | into | The output stream to use |
| [in] | val | The preformatted data |
Reimplemented in PlainXMLFormatter.
Definition at line 125 of file OutputFormatter.h.
References UNUSED_PARAMETER.
Referenced by OutputDevice::writePreformattedTag().
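| void ParquetFormatter::writeTime (std::ostream &into, const SumoXMLAttr attr, const SUMOTime val) | inline, virtual |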
Implements OutputFormatter.
Definition at line 117 of file ParquetFormatter.h.
References gHumanReadableTime, myValues, STEPS2TIME, time2string(), and writeAttr().
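The references to gHumanReadableTime, STEPS2TIME and time2string() suggest that writeTime() stores either a formatted time string or a numeric value in seconds, depending on a global switch. The sketch below shows that branch with a `std::variant` cell (C++17). All names are hypothetical. It also assumes that times are stored in milliseconds and that the readable form is HH:MM:SS, which may differ from what time2string() actually produces.

```cpp
#include <cstdio>
#include <string>
#include <variant>

using Millis = long long;  // assumption: simulation time is stored in milliseconds

// Assumed equivalent of STEPS2TIME: converts milliseconds to seconds.
double stepsToSeconds(Millis t) {
    return static_cast<double>(t) / 1000.0;
}

// Assumed stand-in for time2string: formats non-negative times as HH:MM:SS.
std::string toClockString(Millis t) {
    const long long s = t / 1000;
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%02lld:%02lld:%02lld",
                  s / 3600, (s % 3600) / 60, s % 60);
    return buf;
}

// A table cell that holds either seconds as a number or a readable string.
using TimeCell = std::variant<double, std::string>;

// Chooses the representation the way a human-readable switch would.
TimeCell makeTimeCell(Millis t, bool humanReadable) {
    if (humanReadable) {
        return toClockString(t);
    }
    return stepsToSeconds(t);
}
```

One consequence of such a switch is that the column type of a time attribute depends on a run-time option: numeric in one case, string in the other.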
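| virtual bool OutputFormatter::writeXMLHeader (std::ostream &into, const std::string &rootElement, const std::map< SumoXMLAttr, std::string > &attrs, bool writeMetadata, bool includeConfig) | inline, virtual, inherited |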
Writes an XML header with optional configuration.
If something has already been written (myXMLStack is not empty), nothing is written and false is returned. The default implementation does nothing and returns false.
| [in] | into | The output stream to use |
| [in] | rootElement | The root element to use |
| [in] | attrs | Additional attributes to save within the rootElement |
| [in] | includeConfig | whether the current config should be included as XML comment |
Reimplemented in PlainXMLFormatter.
Definition at line 77 of file OutputFormatter.h.
References UNUSED_PARAMETER.
Referenced by OutputDevice::writeXMLHeader().
| bool ParquetFormatter::wroteHeader () const | inline, virtual |
Returns whether a header has been written. Useful to detect whether a file is being used by multiple sources.
Implements OutputFormatter.
Definition at line 126 of file ParquetFormatter.h.
References myWroteHeader.
| const int ParquetFormatter::myBatchSize | private |
the number of rows to write per batch
Definition at line 192 of file ParquetFormatter.h.
Referenced by closeTag().
| std::vector< std::shared_ptr< arrow::ArrayBuilder > > ParquetFormatter::myBuilders | private |
the content array builders for the table
Definition at line 204 of file ParquetFormatter.h.
Referenced by checkBuilder(), and closeTag().
| bool ParquetFormatter::myCheckColumns = false | private |
whether the columns should be checked for completeness
Definition at line 219 of file ParquetFormatter.h.
Referenced by checkAttr(), closeTag(), setExpectedAttributes(), writeAttr(), writeAttr(), and writeAttr().
| parquet::Compression::type ParquetFormatter::myCompression = parquet::Compression::UNCOMPRESSED | private |
the compression to use
Definition at line 189 of file ParquetFormatter.h.
Referenced by closeTag(), and ParquetFormatter().
| std::string ParquetFormatter::myCurrentTag | private |
the currently read tag (only valid when generating the header)
Definition at line 195 of file ParquetFormatter.h.
Referenced by getAttrString(), openTag(), and openTag().
| SumoXMLAttrMask ParquetFormatter::myExpectedAttrs | private |
the attributes which are expected for a complete row (including null values)
Definition at line 225 of file ParquetFormatter.h.
Referenced by checkAttr(), closeTag(), and setExpectedAttributes().
| const std::string ParquetFormatter::myHeaderFormat | private |
the format to use for the column names
Definition at line 186 of file ParquetFormatter.h.
Referenced by getAttrString().
| int ParquetFormatter::myMaxDepth = 2 | private |
the maximum depth of the XML hierarchy
Definition at line 213 of file ParquetFormatter.h.
Referenced by checkAttr(), closeTag(), openTag(), openTag(), and setExpectedAttributes().
| bool ParquetFormatter::myNeedsWrite = false | private |
whether there is still unwritten data
Definition at line 222 of file ParquetFormatter.h.
Referenced by checkBuilder(), and closeTag().
| std::unique_ptr< parquet::arrow::FileWriter > ParquetFormatter::myParquetWriter | private |
the output stream writer
Definition at line 201 of file ParquetFormatter.h.
Referenced by closeTag().
| std::shared_ptr< arrow::Schema > ParquetFormatter::mySchema = arrow::schema({}) | private |
the table schema
Definition at line 198 of file ParquetFormatter.h.
Referenced by checkBuilder(), closeTag(), and getAttrString().
| SumoXMLAttrMask ParquetFormatter::mySeenAttrs | private |
the attributes already seen (including null values)
Definition at line 228 of file ParquetFormatter.h.
Referenced by checkAttr(), and closeTag().
| const OutputFormatterType OutputFormatter::myType | private, inherited |
the type of formatter being used (XML, CSV, Parquet, etc.)
Definition at line 168 of file OutputFormatter.h.
Referenced by OutputFormatter::getType().
| std::vector< std::shared_ptr< arrow::Scalar > > ParquetFormatter::myValues | private |
the current attribute / column values
Definition at line 210 of file ParquetFormatter.h.
Referenced by checkBuilder(), closeTag(), openTag(), openTag(), writeAttr(), writeAttr(), writeAttr(), writeAttr(), writeAttr(), writeAttr(), and writeTime().
| bool ParquetFormatter::myWroteHeader = false | private |
whether the schema has been constructed completely
Definition at line 216 of file ParquetFormatter.h.
Referenced by checkBuilder(), closeTag(), openTag(), openTag(), and wroteHeader().
| std::vector< int > ParquetFormatter::myXMLStack | private |
The number of attributes in the currently open XML elements.
Definition at line 207 of file ParquetFormatter.h.
Referenced by checkAttr(), closeTag(), openTag(), and openTag().