Package DateHandler :: Module _DateParser :: Class DateParser

Class DateParser


Convert a text string into a Date object. If the text cannot be parsed as a date, the text string itself is assigned to the Date object.

Instance Methods
 
__init__(self)
 
re_longest_first(self, keys)
Returns a string for a RE group which contains the given keys sorted so that longest keys match first.
 
init_strings(self)
This method compiles regular expression strings for matching dates.
 
match_calendar(self, text, cal)
Try parsing calendar.
 
match_quality(self, text, qual)
Try matching quality.
 
match_span(self, text, cal, qual, date)
Try matching span date.
 
match_range(self, text, cal, qual, date)
Try matching range date.
 
match_bce(self, text)
Try matching BCE qualifier.
 
match_modifier(self, text, cal, qual, bc, date)
Try matching date with modifier.
 
set_date(self, date, text)
Parses the text and sets the date according to the parsing.
 
invert_year(self, subdate)
 
parse(self, text)
Parses the text, returning a Date object.

Class Variables
  month_to_int = {u'10ber': 12, u'10bre': 12, u'10bris': 12, u'7...
  modifier_to_int = {'about': 3, 'abt': 3, 'abt.': 3, 'aft': 2, ...
  modifier_after_to_int = {}
  hebrew_to_int = {'adari': 6, 'adarii': 7, 'av': 12, 'elul': 13...
  french_to_int = {u'brumaire': 2, u'extra': 13, u'floréal': 8, ...
  islamic_to_int = {'dhu hijja': 12, 'dhu l-hijja': 12, 'dhu l-q...
  persian_to_int = {'aban': 8, 'azar': 9, 'bahman': 11, 'dey': 1...
  bce = ['B.C.E.', 'B.C.E', 'B.C.', 'BCE', 'B.C', 'BC']
  calendar_to_int = {'f': 3, 'french': 3, 'french republican': 3...
  quality_to_int = {'calc': 2, 'calc.': 2, 'calculated': 2, 'est...
Method Details

re_longest_first(self, keys)

Returns a string for a regular expression group that contains the given keys, sorted so that the longest keys match first. Any '.' characters are quoted.
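
The ordering matters because a regular expression alternation takes the first alternative that matches, so 'abt' listed before 'abt.' would leave the trailing dot unmatched. The following is a minimal standalone sketch of the behaviour described above, not the method's actual source; it is written as a plain function rather than a method.

```python
import re

def re_longest_first(keys):
    """Build an RE group whose alternatives are ordered longest first."""
    # Sort a copy so that the caller's list is left untouched.
    ordered = sorted(keys, key=len, reverse=True)
    # Quote '.' so that it matches a literal dot instead of any character.
    return "(" + "|".join(key.replace(".", r"\.") for key in ordered) + ")"

group = re_longest_first(["abt", "abt.", "about"])
match = re.match(group, "abt. 1850")
print(group)           # the generated group string
print(match.group(1))  # the key that was matched
```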

init_strings(self)

This method compiles regular expression strings for matching dates.

Most of the regular expressions can stay as they are for most languages; span and range will most likely need to change. Whatever is changed, a language-specific parser can first call DateParser.init_strings(self) so that the invariant expressions do not need to be coded repeatedly. All differences can then be coded after the DateParser.init_strings(self) call, so that they override the definitions from this method. See DateParserRU() as an example.
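
The override pattern described above might look roughly like the sketch below. BaseParser is a hypothetical stand-in for DateParser so that the example runs on its own, and the attribute names (_numeric, _span, _range), the patterns, and the GermanParser class are assumptions made for illustration, not taken from the real module.

```python
import re

class BaseParser:
    """Hypothetical stand-in for DateParser (attribute names are assumed)."""

    def __init__(self):
        self.init_strings()

    def init_strings(self):
        # An expression that does not depend on the language.
        self._numeric = re.compile(r"(\d+)/(\d+)/(\d+)")
        # Keyword-based expressions that a translation has to replace.
        self._span = re.compile(
            r"from\s+(?P<start>.+)\s+to\s+(?P<stop>.+)", re.IGNORECASE)
        self._range = re.compile(
            r"between\s+(?P<start>.+)\s+and\s+(?P<stop>.+)", re.IGNORECASE)

class GermanParser(BaseParser):
    """Hypothetical language-specific parser."""

    def init_strings(self):
        # Call the base method first so the invariant expressions exist.
        BaseParser.init_strings(self)
        # Anything assigned after the call overrides the base definition.
        self._span = re.compile(
            r"von\s+(?P<start>.+)\s+bis\s+(?P<stop>.+)", re.IGNORECASE)
        self._range = re.compile(
            r"zwischen\s+(?P<start>.+)\s+und\s+(?P<stop>.+)", re.IGNORECASE)

parser = GermanParser()
span = parser._span.match("von 1900 bis 1910")
print(span.group("start"), span.group("stop"))
```

Only the span and range expressions are redefined here; the numeric expression is inherited unchanged from the base call.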

match_calendar(self, text, cal)

Try parsing calendar.

Return calendar index and the text with calendar removed.
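
As an illustration of this contract, the sketch below looks for a calendar name in parentheses and strips it from the text. It is a standalone function, not the method's source: the parenthesised notation, the pattern, and the order of the returned pair are assumptions, and the dictionary holds only the calendar_to_int entries visible on this page.

```python
import re

# Only the entries listed on this page; the rest are elided there.
calendar_to_int = {
    "f": 3, "french": 3, "french republican": 3,
    "g": 0, "gregorian": 0,
    "h": 2, "hebrew": 2,
}

# Longest names first, so "french republican" is tried before "french".
_names = sorted(calendar_to_int, key=len, reverse=True)
_cal = re.compile(
    r"(.*)\s+\((%s)\)( ?.*)" % "|".join(map(re.escape, _names)),
    re.IGNORECASE)

def match_calendar(text, cal):
    """Return (text without the calendar, calendar index)."""
    match = _cal.match(text)
    if match:
        cal = calendar_to_int[match.group(2).lower()]
        text = match.group(1) + match.group(3)
    # Without a match, both arguments come back unchanged.
    return (text, cal)

print(match_calendar("5 Nisan 5765 (Hebrew)", 0))
print(match_calendar("1 Jan 1900", 0))
```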

match_quality(self, text, qual)

Try matching quality.

Return quality index and the text with quality removed.
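
A comparable sketch for the quality step is shown below, using the quality_to_int values listed under Class Variable Details. It assumes that the quality keyword leads the text and that the result is returned as a (text, quality) pair; both are simplifications made for this example rather than documented behaviour.

```python
import re

quality_to_int = {
    "calc": 2, "calc.": 2, "calculated": 2,
    "est": 1, "est.": 1, "estimated": 1,
}

# Longest keywords first, so "est." is tried before "est".
_names = sorted(quality_to_int, key=len, reverse=True)
_qual = re.compile(
    r"\s*(%s)\s+(.+)" % "|".join(map(re.escape, _names)),
    re.IGNORECASE)

def match_quality(text, qual):
    """Return (text without the quality keyword, quality index)."""
    match = _qual.match(text)
    if match:
        qual = quality_to_int[match.group(1).lower()]
        text = match.group(2)
    # Without a match, both arguments come back unchanged.
    return (text, qual)

print(match_quality("est. about 1850", 0))
print(match_quality("1850", 0))
```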

match_span(self, text, cal, qual, date)

Try matching span date.

On success, set the date and return 1. On failure return 0.

match_range(self, text, cal, qual, date)

Try matching range date.

On success, set the date and return 1. On failure return 0.

match_bce(self, text)

Try matching BCE qualifier.

Return BCE (True/False) and the text with matched part removed.
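
The bce class variable is already ordered longest first, so it can be joined directly into an alternation. The sketch below shows one way the qualifier could be detected and removed; the pattern and the order of the returned pair are assumptions, and the function is standalone rather than the method itself.

```python
import re

bce = ["B.C.E.", "B.C.E", "B.C.", "BCE", "B.C", "BC"]

# re.escape quotes the dots so that they only match literal dots.
_bce_re = re.compile(
    r"(.*)\s+(%s)( ?.*)" % "|".join(map(re.escape, bce)))

def match_bce(text):
    """Return (text without the BCE qualifier, True/False)."""
    match = _bce_re.match(text)
    if match:
        return (match.group(1) + match.group(3), True)
    return (text, False)

print(match_bce("44 B.C."))
print(match_bce("1900"))
```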

match_modifier(self, text, cal, qual, bc, date)

Try matching date with modifier.

On success, set the date and return 1. On failure return 0.


Class Variable Details

month_to_int

Value:
{u'10ber': 12,
 u'10bre': 12,
 u'10bris': 12,
 u'7ber': 9,
 u'7bre': 9,
 u'7bris': 9,
 u'8ber': 10,
 u'8bre': 10,
...

modifier_to_int

Value:
{'about': 3,
 'abt': 3,
 'abt.': 3,
 'aft': 2,
 'aft.': 2,
 'after': 2,
 'around': 3,
 'bef': 1,
...

hebrew_to_int

Value:
{'adari': 6,
 'adarii': 7,
 'av': 12,
 'elul': 13,
 'heshvan': 2,
 'iyyar': 9,
 'kislev': 3,
 'nisan': 8,
...

french_to_int

Value:
{u'brumaire': 2,
 u'extra': 13,
 u'floréal': 8,
 u'frimaire': 3,
 u'fructidor': 12,
 u'germinal': 7,
 u'messidor': 10,
 u'nivôse': 4,
...

islamic_to_int

Value:
{'dhu hijja': 12,
 'dhu l-hijja': 12,
 'dhu l-qa`da': 11,
 'dhu qadah': 11,
 'jumaada al-thaany': 6,
 'jumaada i': 5,
 'jumaada ii': 5,
 'jumaada-ul-akhir': 6,
...

persian_to_int

Value:
{'aban': 8,
 'azar': 9,
 'bahman': 11,
 'dey': 10,
 'esfand': 12,
 'farvardin': 1,
 'khordad': 3,
 'mehr': 7,
...

calendar_to_int

Value:
{'f': 3,
 'french': 3,
 'french republican': 3,
 'g': 0,
 'gregorian': 0,
 'h': 2,
 'hebrew': 2,
 'i': 5,
...

quality_to_int

Value:
{'calc': 2,
 'calc.': 2,
 'calculated': 2,
 'est': 1,
 'est.': 1,
 'estimated': 1}