================
Revision history
================
This chapter describes improvements compared to earlier versions of cutplace.
Version 0.9.0, 2021-12-26
=========================
* Removed support for Python 2.
* Changed build tool to `poetry `_.
* Changed test tool to `pytest `_.
Version 0.8.9, 2021-12-25
=========================
This is the last version that supposedly works with Python 2. There is no
actual release for it because the Python build process and tools changed too
much to keep it working both with Python 2 and 3 with reasonable effort. The
next version will drop Python 2 and rework the build process.
* Fixed "field type part must be a single word" error in Python 3 because at
some point ``generate_tokens`` in the standard library startet adding a
spurious newline character at the end (issue
`#121 `_).
* Removed dependency to external argparse that caused headache with Python 3's
internal argparse.
* Updated several dependencies to last version that works with both Python 2
and Python 3.
* Removed ability to build the documentation with Python 2.
Version 0.8.8, 2015-11-13
=========================
* Changed development status to "Production/Stable" as cutplace has been
processing millions of data rows on a daily base for a couple of months
now.
* Improved validation of Excel dates and times. Fields of type
:ref:`field-format-datetime` now check the format specified by the rule
and in case it only contains date or time components only extracts those.
(API note: internally, Excel dates and times returned by the low level
function :py:meth:`cutplace.rowio.excel_rows()` still use the full
`YYYY-MM-DD hh:mm:ss` format. This change mostly concerns
:py:class:`cutplace.validio.Reader` and
:py:meth:`cutplace.fields.DateTimeFieldFormat.validated_value()`.)
* Cleaned up CID and data files for documentation, examples and tests (
issue `#107 `_). There
are fewer files now and they have multiple uses. Furthermore examples in
the documentation now match the CID's in the :file:`examples` folder.
Partially this can be attributed to parts of the documentation now
including RST files that are updated during the build process.
* Added Python 3.5 as a supported version.
Version 0.8.7, 2015-07-18
=========================
* Fixed that errors detected by field formats during declaration got
suppressed and typically resulted in more confusing errors later.
* Fixed :py:exc:`NameError` on platforms without :py:mod:`tkinter`.
* Improved documentation:
* Fixed missing example for own field and check in
:ref:`using-own-check-and-field-formats` (issue
`#33 `_).
* Improved API:
* Fixed that :py:meth:`cutplace.rows` also yielded header rows instead of
only data rows.
* Cleaned up API documentation (typos, links).
Version 0.8.6, 2015-07-14
=========================
* Fixed installation from source distribution by upgrading to
`Pyscaffold `_
2.2.1. Earlier versions used
`versioneer `_ for version
numbering and forgot to include ``versionneer.py`` in the distribution
archive. The current version uses
`setuptools_scm `_ which only
needs a clean dependency instead of additional source code (issue
`#108 `_).
* Added command line option :option:`--gui` to open a graphical user
interface for validation (issue
`#77 `_).
* Improved error message for broken field declarations in CIDs detected by
the field format constructor, which now include the name of the field and
the location in the CID.
* Improved error message when attempting to use a 0 byte as item delimiter in
CID's for delimited data. (Python's :py:mod:`csv` module fails with a
:py:exc:`TypeError` because the low level implementation is based on C).
* Improved API: Cleaned up :py:class:`cutplace.Cid`:
* Changed :py:meth:`cutplace.Cid.add_check` to require an
:py:class:`cutplace.fields.AbstractCheck` as parameter and added
:py:meth:`cutplace.Cid.add_check_row` to accept a row.
* Changed :py:meth:`cutplace.Cid.add_field_format` to require an
:py:class:`cutplace.fields.AbstractFieldFormat` as parameter and added
:py:meth:`cutplace.Cid.add_field_format_row` to accept a row.
Version 0.8.5, 2015-03-09
=========================
* Fixed data format property :ref:`header `, which was ignored (issue
`#93 `_).
* Added :ref:`constant-field` field format (issue
`#92 `_).
Version 0.8.4, 2015-03-01
=========================
* Fixed validated writing of header rows by disabling validation of
the header.
* Fixed reading of non ASCII values from ODS under Python 2.
* Fixed default decimal separator, which now is dot (``.``) instead of
comma (``,``). Interestingly enough in practice this never really mattered
as long as there was no thousands separator (the default), in which case a
decimal value using a dot as actual decimal separator simply preserved it
and got accepted anyway.
* Added rule to :ref:`field-format-decimal` fields which allows to specify a
range and precision (issue
`#10 `_, contributed by
Patrick Heuberger).
* Improved documentation: cleaned up section on :ref:`exit-code`.
* Improved API documentation: added a section on :ref:`writing-data`.
* Improved API: changed validation of length for fixed field values:
:py:class:`cutplace.Writer` rejects too long values with a
:py:exc:`~cutplace.errors.FieldValueError` and automatically pads too short
values with trailing blanks while the low level
:py:class:`cutplace.rowio.FixedRowWriter` rejects both cases with an
:py:exc:`AssertionError`. Furthermore the length of fixed values is now
checked before validating it against the field format rule.
Version 0.8.3, 2015-01-31
=========================
* Added option :option:`--until` to increase performance by skipping
validation of field format and checks after a specified number of rows
(issue `#86 `_).
* Fixed reading of Excel error cells.
* Improved API:
* Removed shortcuts for exceptions from :py:mod:`cutplace`. Use the
originals in :py:mod:`cutplace.errors` instead.
* Added convenience function :py:func:`cutplace.validate` and
:py:func:`cutplace.rows` to validate and read data with a
single line of source code.
* Added :py:class:`cutplace.Writer` for validated writing of delimited and
fixed data (issue
`#84 `_).
* Improved API documentation.
Version 0.8.2, 2015-01-19
=========================
* Changed syntax for ranges to prefer ellipsis (``...``) over colon (``:``)
because it expresses the intended meaning more clearly. The colon is still
supported so existing CIDs keep working, but the documentation and examples
use the new syntax.
* Improved error reporting when parsing CIDs. In particular all errors
related to the data format include a specific location, and some errors
provide more information about the context they occurred in.
* Cleaned up :option:`--help`:
* Removed description of obsolete option :option:`--cid-encoding`.
* Cleaned up option groups with only one option.
* Cleaned up sequence of options which is now sorted alphabetically.
* Cleaned up notes on :doc:`development` to reflect changes of 0.8.0.
Version 0.8.1, 2015-01-11
=========================
* Fixed ranges for `Integer` fields with a length of one digit, which caused
a :py:exc:`ValueError`.
* Added Python 2 support to universal wheel for distribution.
Version 0.8.0, 2015-01-11
=========================
This version is a major rework of the whole code base in order to to fix some
long standing bug and migrate it to Python 3.2+ while retaining support for
Python 2.6+. A big thank you goes to Patrick Heuberger, Jakob Neuberger and
Patrick Prohaska for doing this as a school project for
`HTL Wiener Neustadt `_.
In summary, the changes are:
* A few long standing bugs have finally been fixed, in particular:
* Fixed that command line client gets stuck on CID in ODS format
with syntax error
(`issue #46 `_)
* Fixed that delimited format fails when last char of field is escaped
(`issue #49 `_)
* Fixed ImportError: No module named xlrd
(`issue #50 `_)
* The documentation is now available at
http://cutplace.readthedocs.org/en/latest/.
* Cutplace interface definitions are now abbreviated as CID, replacing the
acronym ICD (interface control document). Nevertheless the file format remains
the same so existing data descriptions can be used as is.
* The distribution now uses the `wheel format `_
instead of egg. A source distribution is still available as ZIP.
Rarely used functionality that seemed a good idea to have at some time has
been removed. If you deem of these features critical, feel free to submit a
pull request or to open an
`issue `_ and
request a reimplementation:
* The :command:`cutplace` command line options :option:`--accept` and
:option:`--reject` are gone and all output options related to it. If you
still need a filter to build a file that preserves all valid rows and
removes rejected ones, a few line of Python code can do the trick::
from cutplace import Cid, Reader
cid = Cid('.../some_cid.ods')
reader = Reader(cid, '.../some_data.csv')
for row in reader.rows(on_error='continue'):
# Do something with ``row``.
pass
* The command line option :option:`--listencodings` is gone. Instead refer to
the
`standard encodings `_
listed in the Python documentation.
* The command line option :option:`--cid-encoding` is gone. If you need non
ASCII characters, use ODS format or CSV with UTF-8.
* The command line option :option:`--web` (and all related options) to launch
a small web server with a validation form is gone. Eventually there is
going to be a GUI client, refer to
`issue #77 `_.
* The tool :command:`cutsniff` to build a draft CID is gone as it only takes a few
minutes to build a draft anyway. Furthermore, the plain CSV results always needed
quite some work to get a more presentable format concerning layout and colors.
The API (see :ref:`modindex`) has been reworked too and is cleaner and more
pythonic now. The project structure applies most of the
`Simple Rules For Building Great Python Packages `_.
The basic project structure and build process are provided by
`Pyscaffold `_.
* All essential functions can be accessed after a simple ``import cutplace``. The
various sub modules are needed only for special requirements.
* All errors raised by ``cutplace`` are collected in :py:mod:`cutplace.errors`.
Version 0.7.1, 2012-05-20
=========================
* Changed error location of failed row checks to use the first column instead
of a number one past the actual number of columns (issue #42).
* Changed ``Pattern`` field format to allow shell patterns instead of only
simple DOS patterns (issue #37).
* Improved :command:`cutsniff`:
* Added sniffing of numeric fields (#48).
* Added first none empty field value as example.
* Moved project and repository to
(issue #47).
* Improved API:
* Added validating writer, see ``interface.Writer`` for more information
(issue #45).
* Added property ``example`` for ``*FieldFormat`` (issue #41).
* Cleaned up build and the section on "Jenkins" so that everything works
as described even if Jenkins runs as deamon with MacPorts.
Version 0.7.0, 2012-01-09
=========================
* Added command line option ``--plugins`` to specify a folder where cutplace
looks for plugins declaring additional field formats and checks. For
details, see :ref:`using-own-check-and-field-formats`.
* Changed ``interface.validatedRows(..., errors="yield")`` to yield
``tools.ErrorInfo`` in case of error instead of ``Exception``.
* Reduced memory foot print of CSV reading (Ticket #32). As a side effect,
all formats now read and validate in separate threads, which should
result in a slight performance improvement on systems with multiple CPU
cores.
* Cleaned up developer reports (Ticket #40). Most of the reports are now
built using Jenkins as described in "Jenkins", the only exception
being the profiler report to monitor performance. Also changed build
instructions to favor ``ant`` over ``setup.py``.
* Cleaned up API:
* :command:`cutplace` and :command:`cutsniff` have a similar ``main()`` that
returns an integer exit code without actually calling ``sys.exit()``.
* Cleaned up formatting to conform to PEP8 style.
Version 0.6.8, 2011-07-26
=========================
* Fixed "see also" location in error messages caused by ``IsUniqueCheck``
which used the current location as original location.
* Fixed ``AttributeError`` when using the API method
``AbstractFieldFormat.getFieldValueFor()``.
* Fixed ``ImportError`` during installation on systems lacking the Python
profiler.
Version 0.6.7, 2011-05-24
=========================
* Added option ``--names`` to :command:`cutsniff` to specify field names as comma
separated list of names. Without this option, the names found in the last
row specified by ``--head`` are used. Without this option, fields names will
have generated values the user manually will have to change in order to get
meaningful names.
Version 0.6.6, 2011-05-18
=========================
* Cleaned up debugging output.
Version 0.6.5, 2011-05-17
=========================
* Added command line option ``--header`` to :command:`cutsniff` to exclude header
rows from analysis.
* Fixed build error in case module coverage was not installed by making
coverage a required module again.
Version 0.6.4, 2011-03-19
=========================
* Added :command:`cutsniff`, a tool to create an ICD by analyzing an existing data
file.
* #21: Fixed automatic detection of Excel format when reading ICDs using the
web interface. (Tickte #21).
* Fixed ``AttributeError`` when data format was set to "delimited".
Version 0.6.3, 2010-10-25
=========================
* Fixed ``InterfaceControlDocument.checkNames`` which actually contained the
field names. Additionally, checkNames now contains the names in the order
they were declared in the ICD. Consequently the checks are performed in this
order during validation unlike until now, where the internal hashcode
decided the order of checks. (Ticket #35)
* Improved documentation, in particular:
* Added more information on writing field format and checks of your own. It
still lacks details on how to actually use these in an ICD though.
(Ticket #33)
* Cleaned up introductions of most chapters with the intention to make them
easier to comprehend.
* Changed public instance variables to properties. This allows to mark many of
them as read only, and also makes them show up in the API reference.
(Ticket #34).
Version 0.6.2, 2010-09-29
=========================
* Added input location for error messages caused by failed checks.
(Ticket #26, #27 and #28)
* Added error message if a field name is a Python keyword such as
``class`` or ``if``. This avoids strange error messages if later an
``IsUnique`` check refers to such a field. (Ticket #20)
* Changed style for error messages referring to locations in CSV, ODS
and Excel data to R1C1. For example, "R17C23" points to row 15,
column 23.
* Changed internal modules to use "_" as prefix in name. This removes them
from the API documentation. Furthermore, module ``tools`` has been split into
public ``tools`` and internal ``_tools``.
* Changed interface for listeners of validation events:
* Renamed `ValidationListener` to `BaseValidationListener`.
* Added parameter `location` to `acceptedRow()` which is of type
`tools.InputLocation`.
* Cleaned up API documentation, using reStructured Text as output format
and adding a tutorial in chapter :doc:`api`.
* Cleaned up logging to slightly improve performance.
Version 0.6.1, 2010-04-25
=========================
* Added data format properties "decimal delimiter" (default: ".") and
"thousands delimiter" (default: none). Fields of type `Decimal` take them
into account. See also: Ticket #24.
* Added detailed error locations to some errors detected when reading the
ICD.
* Changed choice fields to be case sensitive.
* Changed choice fields to support values in quotes. That way it is also
possible to use escape sequences within values. Values with non ASCII
characters (such as umlauts) have to be quotes now. See also: Ticket #25.
* Renamed module `cutplace.range` to `cutplace.ranges` to avoid name clash
with the built in Python function `range()`. In case you have an older
version of cutplace installed and plan to import the cutplace Python
module using::
from cutplace import * # ugly, avoid anyway
you will have to manually remove the files :file:`cutplace/range.py`
and :file:`cutplace/range.pyc` (in case it exists).
* Added API documentation available from
.
Version 0.6.0, 2010-03-29
=========================
* Changed license from GPL to LGPL so closed source application can import
the cutplace Python module.
* Fixed validation of empty dates with DateTime fields.
* Added support for letters, hex numbers and symbolic names in ranges.
* Added support for letters, escaped characters, hex numbers and symbolic
names in item delimiters for data formats.
* Added auto detection of item delimiters tab ("\\t", ASCII 9) and vertical
bar (|). [Josef Wolte]
* Cleaned up code for field validation.
Version 0.5.8, 2009-10-12
=========================
* Changed Unicode encoding errors to result in the row to be rejected similar
to a row with an invalid field instead of a simple message in the console.
* Changed command line exit code to 1 instead of 0 in case validation errors
were found in any data file specified.
* Changed command line exit code to 4 instead of 0 for errors that could not
be handled or reported otherwise (usually hinting at a bug in the code).
This case also results in a stack trace to be printed.
Version 0.5.7, 2009-09-07
=========================
* Fixed validation of empty Choice fields that according to the ICD were
allowed to be empty but nevertheless were rejected.
* Fixed a strange error when run using Jython 2.5.0 on certain platforms.
The exact message was: ``TypeError: 'type' object is not iterable``.
Version 0.5.6, 2009-08-19
=========================
* Added a short summary at the end of validation. Depending on the result,
this can be either for instance ``eggs.csv: accepted 123 rows`` or
``eggs.csv: rejected 7 of 123 rows. 2 final checks failed.``.
* Changed default for ``--log`` from``info`` to ``warning``.
* Improved confusing error message when a field value is rejected because
of improper length.
* Fixed ``ImportError`` when run using Jython 2.5, which does not support the
Python standard module ``webbrowser``. Attempting to use ``--browser`` will
result in an error message nevertheless.
Version 0.5.5, 2009-07-26
=========================
* Added summary to validation results shown by web interface.
* Fixed validation of Excel data using the web interface.
* Cleaned up reporting of errors not related to validation via web interface.
The resulting web page now is less cluttered and the HTTP result is a
consistent 40x error.
Version 0.5.4, 2009-07-21
=========================
* Fixed ``--split`` which did not actually write any files. (Ticket #19)
* Fixed encoding error when reading data from Excel files that used cell
formats of type data, error or time.
* Fixed validation of Decimal fields, which resulted in a
``NotImplementedError``.
* Fixed internal handling of ranges with a default, which resulted in a
``NameError``.
Version 0.5.3, 2009-07-18
=========================
* Added command line option ``--split`` to store accepted and rejected data in two
separated files. See also: ticket #17.
* Fixed handling of non ASCII data, which did not work properly with all
formats. Now cutplace consistently uses Unicode strings to internally
represent data items. See also: ticket #18.
* Improved error messages and removed stack trace in cases where it does not
add anything of value such as for I/O errors.
* Changed development status from alpha to beta.
Version 0.5.2, 2009-06-11
=========================
* Fixed missing setup script.
Version 0.5.1, 2009-06-11
=========================
* Added support for ICDs in Excel and ODS format for built in web server.
* Changed representation of integer number read from Excel data: instead
of for example "123.0" this now renders as "123".
* Improved memory usage for data and ICDs in ODS format.
* Fixed reading of ICDs in Excel and ODS format.
* Fixed TypeError when the CSV delimiters specified in the ICD were encoded
in Unicode.
Version 0.5.0, 2009-06-02
=========================
* Fixed handling of Excel numbers, dates and times. Refer to the
section on Excel data format for details.
* Changed order for field format (again): It now is
name/example/empty/length/type/rule instead of
name/example/empty/type/length/rule.
* Changed optional items for field format: now the field name is the
only thing required. If no type is specified, "Text" is used.
* Added a proper tutorial that starts with a very simple ICD and
improves it step by step. The old tutorial presented one huge ICD
and attempted to explain everything in it, which could easily
overwhelm the reader.
* Migrated documentation from DocBook to RestructuredText.
* Improved build and installation process (``setup.py``).
Version 0.4.4, 2009-05-23
=========================
* Fixed checks when validating more than one data file from the command line.
Until now the checks did preserve internal state information needed to
perform the check. For instance, IsUnique check remembered the keys of all
rows read so far. So when a data file contained a row with a key that already
showed up in an earlier data file, the check failed. To prevent this from
happening, ``validate()`` now resets all checks. See also: Ticket #9.
* Fixed detection of characters outside of the "Allowed characters" range.
Apparently this never worked until now.
* Fixed handling of empty choices consisting only of white space.
* Fixed detection of fixed fields without length.
* Fixed handling of white space in field items of fixed length data.
* Added plenty of test cases and consequently performed a couple of minor
fixes, improvements and clean ups.
Version 0.4.3, 2009-05-18
=========================
* Fixed auto detection of delimiters in a CSV file, which got broken when
switching to Python's built in CSV reader with version 0.3.1. See also:
Ticket #8.
Version 0.4.2, 2009-05-17
=========================
* Added validation for data format property "Allowed characters", which can be
used with all data formats.
* Added data format property "Header" to specify the number of header rows that
should be skipped without validation. This property can be used with all data
formats.
* Added data format property "Sheet" to specify the number of the sheet to
validate in spreadsheet data formats (Excel and ODS).
* Added complex ranges that consist of several sub ranges separated by a comma
(,). For example: "10:20, 30:40" means that a value must be between 10 and 20
or 30 and 40.
* Moved forums to http://apps.sourceforge.net/phpbb/cutplace/.
* Moved project site and issue tracker to
http://apps.sourceforge.net/trac/cutplace/.
* Fixed handling of data rows with too few or too many items.
* Cleaned up error handling and error messages.
Version 0.4.1, 2009-05-10
=========================
* Added support for Excel and ODS data formats.
Version 0.4.0, 2009-05-06
=========================
* Added support for ICDs stored in Excel format. In order for this to work, the
xlrd Python package needs to be installed. It is available from
http://pypi.python.org/pypi/xlrd.
* Changed ICD format: Inserted a new column after the field name and before the
field type that can contain an optional example value. This enables readers
to quickly grasp the meaning of a field by taking a glimpse at the first few
columns instead of having to "decipher" the field type and rule.
Version 0.3.1, 2009-05-03
=========================
* Added proper error messages for several possible error the user might make
when writing an ICD. So far these errors resulted into confusing messages
about failed assertions, attempted ``NoneType`` accesses and the like.
* Added requirement that field names in the ICD only use ASCII letters, digits
and underscore (_). This is necessary to prevent Python errors in checks that
refer to field values using Python variables, such as DistinctCount and
IsUnique.
* Changed CSV parser to use Python's built in one. This works around the
following issues:
- Improved performance when working with CSV data (about 4 times faster).
- Error when reading valid CSV data that contained nothing but a single item
separator.
However, it also introduces new issues:
- Increased memory usage when working with CSV data because ``csv.reader``
does not fit well with the ``AbstractParser`` class. Currently the whole
file is read into memory.
- Lack of any error detection in the CSV structure. For example, unclosed
quotes at the end or inconsistent line feeds do not raise any errors.
* On the long run, cutplace will need its own CSV parser. If only this would
not be so boring to code...
* Improved error messages for broken field names and types in the ICD.
Version 0.3.0, 2009-04-28
=========================
* Fixed error messages in case field name or type was missing in ICD.
* Fixed handling of percent sign (%) in ``DateTime`` field format.
* Changed syntax to specify ranges like field lengths or rules for ``Integer``
fields formats. Use ":" instead of "...".
There are basically two reasons for this change: Firstly, this looks more
Python-like and thus more consistent with other parts of the ICD like the
"Checks" section which also uses Python syntax in various places. Secondly,
this avoids issues with Excel which under certain circumstances changes the 3
characters in "..." to a single character ellipsis. Using ":" still is not
without issues though: if you use a spreadsheet application to author ICDs,
most of them think of a value like "1:60" (which could for example specify a
field length between 1 and 60 characters) to refer to a time of 1 hour and 60
minutes. To avoid any confusion, disable the cell format auto detection of
the spreadsheet application by changing all cells to contain "Text".
Version 0.2.2, 2009-04-07
=========================
* Added support to use data encodings other than ASCII by specifying them in
the data format section of the ICD using the encoding property.
* Added support for fixed data format.
* Added command line option ``--browse`` to be used together with ``--web`` in
order to open the validation page in the web browser.
* Added command line option ``--icd-encoding`` to specify the character encoding
to be used with ICDs in CSV format.
Version 0.2.1, 2009-03-29
=========================
* Added support for ICDs in ODS format for command line client.
* Added ``cutplace.exe`` for Windows, which will be generated during
installation.
* Added automatic installation of setuptools when you try to build cutplace
using the Subversion repository. This feature is provided by ``ez_setup.py``,
which is available from the setuptools site.
* Fixed cutplace script, which did exit with an ``ExitQuietlyOptionError`` for
options that just showed some information and exited (such as ``--help``).
Version 0.2.0, 2009-03-27
=========================
* Added option ``--web`` and ``--port`` to launch web server providing a simple
graphical user interface for validation.
* Changed ``--listencodings`` to ``--list-encodings``.
Version 0.1.2, 2009-03-22
=========================
* Added ``DistinctCount`` check.
* Added ``IsUnique`` check.
* Added command line option ``--trace``.
* Added support to validate an ICD when no data are specified in the command
line.
* Cleaned up error messages.
Version 0.1.1, 2009-03-17
=========================
* Initial release.